

Sentence Transformers Joins Hugging Face as Community-Driven Open-Source Project


These articles are AI-generated summaries. Please check the original sources for full details.

Sentence Transformers Transitions to Hugging Face

Sentence Transformers, a widely used open-source library for generating high-quality sentence embeddings, has officially joined Hugging Face. The project will draw on Hugging Face's infrastructure to continue its development and make it more widely accessible. The library, originally developed at the Ubiquitous Knowledge Processing (UKP) Lab at TU Darmstadt, will remain community-driven and open-source under its existing Apache 2.0 license.

Key Highlights

  • Transition Announcement: Sentence Transformers is now part of the Hugging Face ecosystem.
  • Maintainership: Tom Aarsen of Hugging Face, who has maintained the project since late 2023, will continue to lead it.
  • Infrastructure Benefits: The project will benefit from Hugging Face's continuous integration and testing, helping the library keep pace with advances in Information Retrieval and Natural Language Processing (NLP).
  • Community-Driven: Sentence Transformers will remain a community-driven, open-source project with contributions welcomed from researchers, developers, and enthusiasts.
  • License: The project will continue to operate under the Apache 2.0 license.

Background and History

Sentence Transformers (also known as SentenceBERT or SBERT) was created in 2019 by Dr. Nils Reimers at the UKP Lab, under the supervision of Prof. Dr. Iryna Gurevych. The library addresses limitations of standard BERT embeddings for sentence-level semantic tasks by utilizing a Siamese network architecture to produce semantically meaningful sentence embeddings.
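To make the Siamese idea concrete, the sketch below passes both sentences through the same encoder, pools token vectors into one fixed-size sentence vector, and compares the results with cosine similarity. It is a toy illustration under stated assumptions, not the library's implementation: the `TOKEN_VECTORS` table and the `encode` and `siamese_similarity` helpers are hypothetical, and a real SBERT model would produce contextual token vectors from a fine-tuned BERT-style transformer before pooling (mean pooling is used here as one common pooling strategy).

```python
import math

# Hypothetical toy token vectors standing in for a transformer's token outputs.
TOKEN_VECTORS = {
    "a": [0.1, 0.1, 0.1],
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.0],
    "sits": [0.1, 0.8, 0.1],
    "rests": [0.2, 0.7, 0.1],
    "stock": [0.0, 0.1, 0.9],
    "rises": [0.1, 0.3, 0.8],
}
DIM = 3


def encode(sentence: str) -> list[float]:
    """Shared encoder: the same function embeds every input sentence."""
    tokens = [TOKEN_VECTORS[t] for t in sentence.lower().split() if t in TOKEN_VECTORS]
    if not tokens:
        return [0.0] * DIM
    # Mean pooling: average the token vectors into one sentence vector.
    return [sum(vec[i] for vec in tokens) / len(tokens) for i in range(DIM)]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def siamese_similarity(s1: str, s2: str) -> float:
    """Both branches use identical 'weights' (encode), as in a Siamese network."""
    return cosine(encode(s1), encode(s2))


print(siamese_similarity("a cat sits", "a kitten rests"))
print(siamese_similarity("a cat sits", "stock rises"))
```

Because each sentence is embedded independently, embeddings can be computed once and compared cheaply afterwards, which is the practical advantage over feeding every sentence pair through BERT jointly.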

  • 2019: Initial release by Dr. Nils Reimers at TU Darmstadt.
  • 2020: Multilingual support added, extending to over 400 languages.
  • 2021: Support for pair-wise sentence scoring using Cross Encoder and Sentence Transformer models was added, with contributions from Nandan Thakur and Dr. Johannes Daxenberger. Version 2.0 also integrated the library with the Hugging Face Hub.
  • Late 2023: Tom Aarsen from Hugging Face took over maintainership, introducing modernized training for Sentence Transformer models (v3.0), as well as improvements to Cross Encoder (v4.0) and Sparse Encoder (v5.0) models.
  • Funding: The UKP Lab’s development was supported by grants from the German Research Foundation (DFG), German Federal Ministry of Education and Research (BMBF), and Hessen State Ministry for Higher Education, Research and the Arts (HMWK).

Impact and Adoption

Sentence Transformers has become widely adopted in NLP research, where it is used for tasks such as:

  • Semantic search
  • Semantic textual similarity
  • Clustering
  • Paraphrase mining
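Semantic search, the first task listed, typically reduces to ranking precomputed corpus embeddings by their similarity to a query embedding. The sketch below shows that ranking step in plain Python, assuming the embeddings already exist; the three-dimensional vectors are hypothetical placeholders for what a model's `encode` method would return, and the `semantic_search` function here is an illustrative helper, not the library's own `sentence_transformers.util.semantic_search`.

```python
import math


def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def semantic_search(query_vec, corpus_vecs, top_k=2):
    """Return (index, score) pairs for the top_k corpus entries most similar to the query."""
    q = normalize(query_vec)
    scores = []
    for idx, vec in enumerate(corpus_vecs):
        d = normalize(vec)
        # For unit-length vectors, the dot product equals cosine similarity.
        scores.append((idx, sum(a * b for a, b in zip(q, d))))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Hypothetical embeddings; in practice these would come from a trained model.
corpus = [
    "How do I reset my password?",
    "Best pizza in town",
    "Steps to recover a forgotten login",
]
corpus_vecs = [[0.9, 0.1, 0.2], [0.1, 0.9, 0.1], [0.7, 0.2, 0.3]]
query_vec = [0.85, 0.15, 0.25]

for idx, score in semantic_search(query_vec, corpus_vecs):
    print(f"{score:.3f}  {corpus[idx]}")
```

For large corpora, normalizing the corpus once up front and using a vectorized or approximate nearest-neighbor index would replace the per-query loop shown here.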

As of the announcement, over 16,000 Sentence Transformers models are publicly available on the Hugging Face Hub, serving more than a million monthly unique users. The project’s success is attributed to its modular design, strong empirical performance, and active community involvement.

Acknowledgements

Hugging Face expressed gratitude to the UKP Lab, particularly Dr. Nils Reimers and Prof. Dr. Iryna Gurevych, for their dedication to the project. The company also thanked the broader community for its contributions, including model submissions, bug reports, feature requests, documentation improvements, and real-world applications.
