Introducing the KafkaLM series with KafkaLM 13B German

We’re excited to announce the release of KafkaLM-13B-German-V0.1: a 13-billion-parameter model based on Llama 2, further pre-trained on a large German-language dataset by Björn Plüster and LAION. Subsequently, the model was fine-tuned on popular high-quality open-source instruction datasets (translated from English into German). The result is a language model that not only provides fact-oriented answers but also demonstrates linguistic creativity.


Why KafkaLM?

Just like the renowned author Franz Kafka, the model is capable of tackling complex or “twisted” topics while remaining comprehensible. It displays a flair for evocative language, yet stays precise enough for business use.

Our primary aim is to empower the German AI community by offering a language model that can be effortlessly applied to everyday and professional tasks—be it customer support, content creation, or complex research queries. The model delivers clear, detailed answers in German without omitting relevant facts.


Data & Prompt Format

KafkaLM-13B-German-V0.1 was trained on an 8k-filtered version of the seedboxai/multitask_german_examples_32k dataset. A structured prompt format with system, user, and assistant blocks ensures consistent and reliable responses from the model.
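The system/user/assistant structure can be sketched as a small prompt builder. Note that the exact block delimiters below (`<|system|>`, `<|user|>`, `<|assistant|>`, and the `</s>` terminators) are an assumption for illustration; check the model card for the precise format the model was trained with.

```python
# Sketch of a single-turn prompt builder for a system/user/assistant format.
# The delimiter tokens used here are assumptions, not confirmed by the post;
# consult the model card on the Hugging Face Hub for the trained format.

def build_prompt(system: str, user: str) -> str:
    """Assemble a single-turn prompt from system and user blocks,
    ending with an open assistant block for the model to complete."""
    return (
        f"<|system|>\n{system}</s>\n"
        f"<|user|>\n{user}</s>\n"
        f"<|assistant|>\n"
    )

prompt = build_prompt(
    "Du bist ein hilfreicher Assistent.",  # "You are a helpful assistant."
    "Fasse die Hauptwerke von Franz Kafka kurz zusammen.",
)
print(prompt)
```

Keeping the prompt assembly in one helper makes it easy to swap in the correct delimiters later without touching the rest of the generation code.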

Getting Started

Thanks to seamless integration with the Hugging Face ecosystem, it’s easy to get started. With just a few lines of code, you can load the model and generate responses.
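A minimal loading sketch using the `transformers` library might look like the following. The repository id `seedboxai/KafkaLM-13B-German-V0.1` is an assumption based on the model name in this post; verify it on the Hugging Face Hub before running. The heavy imports are deferred into the function so the sketch can be read and imported without the full stack installed.

```python
# Minimal sketch: load KafkaLM-13B-German and generate a completion.
# The repo id below is assumed from the model name in this post.
model_id = "seedboxai/KafkaLM-13B-German-V0.1"  # assumption: verify on the Hub

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the model and tokenizer, then generate a German completion.
    Imports are deferred so this module loads without torch/transformers."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # half precision to fit a 13B model in memory
        device_map="auto",          # spread layers across available devices
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate("Fasse die Hauptwerke von Franz Kafka kurz zusammen."))
```

On hardware without enough GPU memory for float16 weights, a quantized variant (e.g. via `load_in_4bit=True` with bitsandbytes) is a common fallback.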