Chroma Datasets

Making it easy to load data into Chroma since 2023

pip install chroma_datasets

Current Datasets

chroma_datasets is generally backed by Hugging Face datasets, but this is not a requirement.

How to use

The following will:

  1. Download the 2022 State of the Union
  2. Chunk it up for you
  3. Embed it using Chroma's default open-source embedding function
  4. Import it into Chroma

import chromadb
from chroma_datasets import StateOfTheUnion
from chroma_datasets.utils import import_into_chroma

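# Create an in-memory Chroma client
chroma_client = chromadb.Client()
# Download, chunk, embed, and import the dataset into a collection
collection = import_into_chroma(chroma_client=chroma_client, dataset=StateOfTheUnion)
# Query the collection by text and print the closest matches
result = collection.query(query_texts=["The United States of America"])
print(result)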
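
The printed result is a nested dictionary, which can be hard to read. The sketch below shows one possible way to flatten it into a list of hits. It assumes the result has the shape Chroma's query call normally returns: keys such as "ids", "documents", "distances", and "metadatas", each holding one inner list per query text. The helper name flatten_query_result and the sample values are hypothetical and are not part of chroma_datasets.

```python
from typing import Any, Dict, List


def _column(result: Dict[str, Any], key: str, query_index: int, size: int) -> List[Any]:
    """Return one query's values for `key`, or a list of None if the field is absent."""
    values = result.get(key)
    if values is None:
        return [None] * size
    return values[query_index]


def flatten_query_result(result: Dict[str, Any], query_index: int = 0) -> List[Dict[str, Any]]:
    """Turn one query's slice of a Chroma-style result into a list of hit dicts.

    Hypothetical helper: it assumes every field is a list with one inner
    list per query text, which is an assumption about the result shape.
    """
    ids = result["ids"][query_index]
    documents = _column(result, "documents", query_index, len(ids))
    distances = _column(result, "distances", query_index, len(ids))
    metadatas = _column(result, "metadatas", query_index, len(ids))
    return [
        {"id": i, "document": doc, "distance": dist, "metadata": meta}
        for i, doc, dist, meta in zip(ids, documents, distances, metadatas)
    ]


if __name__ == "__main__":
    # Made-up sample standing in for `result` from the example above.
    sample = {
        "ids": [["doc-1", "doc-2"]],
        "documents": [["first chunk of text", "second chunk of text"]],
        "distances": [[0.12, 0.34]],
        "metadatas": None,
    }
    for hit in flatten_query_result(sample):
        print(hit["id"], hit["distance"], hit["document"])
```

With the real example, flatten_query_result(result) would be called in place of print(result). A smaller distance generally indicates a closer match.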

Learn how to create and contribute a package at chroma-core/chroma_datasets.