Nomic builds AI models and conducts open research in embeddings, language models, and dimensionality reduction.
At Nomic, we believe that AI research should be open, transparent, and accessible to everyone. It should also be useful and solve real-world problems. Our research efforts and direction are driven by the hard problems we discover while working with our customers.
Research & Publications
Explore our work across embeddings, language models, and dimensionality reduction.
Embeddings
Training Sparse MoE Text Embedding Models
Under Review
Introduces the first general-purpose mixture-of-experts text embedding model, which achieves state-of-the-art performance on the MIRACL benchmark. The model is truly open source, meaning the training data, weights, and code are available and permissively licensed.
Read Paper
CoRNStack: High-Quality Contrastive Data for Better Code Ranking
ICLR 2025
An open dataset for training state-of-the-art code embedding models. Work done in collaboration with the University of Illinois at Urbana-Champaign.
Read Paper
Nomic Embed: Training a Reproducible Long Context Text Embedder
TMLR 2024
The first truly open (i.e. open data, weights, and code) text embedding model that outperforms OpenAI Ada. Work done in collaboration with Cornell University.
100+ citations
Read Paper
Nomic Embed Vision: Expanding the Latent Space
ArXiv 2024
The first multimodal embedding model to achieve high performance on text-text, text-image, and image-image tasks with a single unified latent space.
Read Paper
Embedding Based Inference on Generative Models
ArXiv 2024
An extension of Data Kernel methods to black-box settings. Work done in collaboration with Johns Hopkins University.
Read Paper
Language Models
Tracking the Perspectives of Interacting Language Models
EMNLP 2024
Develops and studies metrics for understanding information diffusion in communication networks of LLMs. Work done in collaboration with Johns Hopkins University.
Read Paper
GPT4All: An Ecosystem of Open Source Compressed Language Models
EMNLP 2023
How the first open source LLM to surpass GPT-3.5's performance grew from a model into a movement. Work done in collaboration with the GPT4All community.
150+ citations
Read Paper
Comparing Foundation Models using Data Kernels
ArXiv 2023
A method for statistically rigorous comparison of embedding spaces without labeled data. Work done in collaboration with Johns Hopkins University.
Read Paper
Dimensionality Reduction
The Landscape of Biomedical Research
Cell Patterns Cover 2024
The first systematic study of the entirety of PubMed from an information cartography perspective. Work done in collaboration with the University of Tübingen.
Cover Story
Read Paper
Mapping Wikipedia with BERT and UMAP
IEEE Vis 2022
The first systematic study of the entirety of English Wikipedia from an information cartography perspective. Work done in collaboration with New York University.
Watch Talk