This repo provides an overview of open-source machine learning projects, and the companies behind them, that are based in Germany.
The criterion for being listed here is that the project has its roots in Germany, or that at least a large share of the developers working on the project or for the company are located in Germany.
Contributions to this list are very welcome 🤗 Be it corrections, additions or suggestions - feel free to open an Issue or Pull Request.
The following sections group the projects by machine learning category and include links to their social media profiles.
The projects listed here all provide frameworks to perform Natural Language Processing tasks.
Name | Description | Links |
---|---|---|
Explosion | Best known for the spaCy library, one of the most popular Python packages for NLP. Their profile is worth checking for other great repos like spacy-llm and curated-transformers. | |
deepset | Another young Berlin-based company, best known for its LLM framework Haystack. It first stepped into the limelight by training a BERT-based German language model. | |
flair | Developed at Humboldt University Berlin, flair is a simple and powerful framework for state-of-the-art NLP. | |
DeepL | Based in Cologne, DeepL has provided high-quality machine translation, especially for German, since 2017. Their open-source library makes it easy to integrate their technology into any Python project. | |
small-text | Originating from a research project at Leipzig University, small-text offers a modular and comprehensive Python library for building experiments and applications focused on active learning for text classification. | |
Here you can find projects that mainly focus on solving Computer Vision problems, including tasks like image classification, object detection and object segmentation.
Name | Description | Links |
---|---|---|
Mobius Labs | A relatively small and unknown company, but their repos are definitely worth checking out, especially Half-Quadratic Quantization and Aana. | |
The projects here focus on Generative AI tasks like LLMs, text-to-image, text-to-video, text-to-audio or similar. Some of the projects and companies listed here might not have popular repositories on GitHub, but instead release ML models with freely accessible weights (mostly on Hugging Face).
Name | Description | Links |
---|---|---|
OpenGPT-X | This project is tightly connected to Occiglot and backed by some big companies and institutions (e.g. Fraunhofer, DFKI, IONOS). It is dedicated to creating multilingual LLMs with a focus on open source. | |
Black Forest Labs | Announced at the beginning of August 2024, this company has already stirred up the AI community with its text-to-image model family called FLUX. | |
Vago Solutions | Vago Solutions mainly focuses on creating German LLMs (called SauerkrautLM) and has already made more than 20 of them accessible in its Hugging Face repository. | |
Name | Description | Links |
---|---|---|
dltHub | dltHub are the creators of data load tool (dlt). While dlt might not strictly be a machine learning library, I still decided to include it here, as it eases the pain of data collection, which is an integral part of the ML lifecycle. | |
Trafilatura | Originally released to collect data for linguistic research and lexicography at the Berlin-Brandenburg Academy of Sciences, Trafilatura is now widely used in AI, NLP and LLM applications. | |
Building, training and deploying machine learning models can be a real struggle in today's overflowing ML landscape. These projects try to take most of the effort and frustration out of the process.
Name | Description | Links |
---|---|---|
ZenML | This Munich-based company develops a framework that lets you build, train and deploy ML pipelines in a simple and reproducible way. | |
dstack | Another MLOps-centred company originating from Munich. dstack specializes in making it easy to build, train and deploy your ML models on different cloud providers. | |
Flower Labs | Flower Labs offers an open-source framework for federated learning, which can be especially helpful when working with distributed and sensitive data. | |
AIME | While the core business of AIME is selling HPC servers, workstations and GPU cloud space, they have also open-sourced a series of projects for hosting and serving ML models, e.g. aime-ml-containers and aime-api-server. | |
These companies and projects mainly focus on neural search applications and related topics like multimodal embeddings.
Name | Description | Links |
---|---|---|
Jina AI | Jina AI has a big output of open-source libraries for a lot of use cases, but is best known for its library, simply called jina, that lets you build and deploy multimodal ML applications. | |
Qdrant | Straight from the vibrant Berlin-based start-up scene, Qdrant specializes in neural search applications and multimodal embeddings. They also have a lively Discord community. | |
mixedbread.ai | Still very new to the scene, but they have already released an amazing sentence embedding model. | |
Here are all the projects that don't fit into one of the other categories (or that fit into more than one).
Name | Description | Links |
---|---|---|
Superduper | Freshly rebranded (formerly SuperDuperDB), the team from Superduper aims to make every database and storage system capable of AI, without needing specialized vector databases or the like. | |
LAION e.V. | LAION is a non-profit organization that aims to create free and open-source models and datasets. They have a big community and have already released many interesting projects, like Open Assistant and CLAP. | |
Name | Description | Links |
---|---|---|
Occiglot | Occiglot is a collective of researchers who want to develop open-source language models for and by Europe. Although not entirely rooted in Germany, it is heavily funded by German institutions and many of its active researchers are from Germany. | |
And last but not least, a little shout-out to Johannes Rieke and his great (albeit a little outdated) collection of Berlin-based machine learning start-ups 😉