OLM API Reference

Data API

Datasets, tokenizers, and data-loading utilities for OLM.

Modules

Module                                  Public API
olm.data.datasets                       BaseTextDataset, DataLoader, FineWebEduDataset, HuggingFaceTextDataset, LocalTextDataset
olm.data.datasets.base_dataset          BaseTextDataset
olm.data.datasets.data_loader           DataLoader
olm.data.datasets.fineweb_edu           FineWebEduDataset
olm.data.datasets.hf_dataset            FineWebEduDataset, HuggingFaceTextDataset
olm.data.datasets.local_dataset         LocalTextDataset
olm.data.tokenization.base              TokenizerBase
olm.data.tokenization.hf_tokenizer      HFTokenizer
olm.data.tokenization.hf_train_custom   HFTokenizerTrainCustom
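
The table above maps each module path to the names it exports. As a minimal sketch of how those names might be imported and checked, the snippet below copies the mapping into a dictionary and resolves each name with Python's standard importlib. The PUBLIC_API dictionary and the load_public_names helper are hypothetical and are not part of OLM. The split between module path and class name is read off the table, and no constructor signatures are assumed, so nothing is instantiated.

```python
import importlib

# Module path -> public names, as read off the table above (assumed split).
PUBLIC_API = {
    "olm.data.datasets": [
        "BaseTextDataset",
        "DataLoader",
        "FineWebEduDataset",
        "HuggingFaceTextDataset",
        "LocalTextDataset",
    ],
    "olm.data.datasets.base_dataset": ["BaseTextDataset"],
    "olm.data.datasets.data_loader": ["DataLoader"],
    "olm.data.datasets.fineweb_edu": ["FineWebEduDataset"],
    "olm.data.datasets.hf_dataset": ["FineWebEduDataset", "HuggingFaceTextDataset"],
    "olm.data.datasets.local_dataset": ["LocalTextDataset"],
    "olm.data.tokenization.base": ["TokenizerBase"],
    "olm.data.tokenization.hf_tokenizer": ["HFTokenizer"],
    "olm.data.tokenization.hf_train_custom": ["HFTokenizerTrainCustom"],
}


def load_public_names(module_path, names):
    """Hypothetical helper: import module_path and return {name: object}.

    Returns an empty dict if the module cannot be imported (for example,
    when OLM is not installed). Names the module does not define are skipped.
    """
    try:
        module = importlib.import_module(module_path)
    except ImportError:
        return {}
    return {name: getattr(module, name) for name in names if hasattr(module, name)}


if __name__ == "__main__":
    # Report, per module, which documented names could not be resolved.
    for module_path, names in PUBLIC_API.items():
        found = load_public_names(module_path, names)
        missing = sorted(set(names) - set(found))
        print(module_path, "ok" if not missing else f"missing: {missing}")
```

In ordinary use a direct import such as `from olm.data.datasets import LocalTextDataset` is the shorter form. The helper is only useful for checking an installed version against this reference.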