Datasets, tokenizers, and OLM data loading.
Modules
| Module | Public API |
|---|---|
| olm.data.datasets | BaseTextDataset, DataLoader, FineWebEduDataset, HuggingFaceTextDataset, LocalTextDataset |
| olm.data.datasets.base_dataset | BaseTextDataset |
| olm.data.datasets.data_loader | DataLoader |
| olm.data.datasets.fineweb_edu | FineWebEduDataset |
| olm.data.datasets.hf_dataset | FineWebEduDataset, HuggingFaceTextDataset |
| olm.data.datasets.local_dataset | LocalTextDataset |
| olm.data.tokenization.base | TokenizerBase |
| olm.data.tokenization.hf_tokenizer | HFTokenizer |
| olm.data.tokenization.hf_train_custom | HFTokenizerTrainCustom |
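
The table can be checked against an installed copy of the package. The following is a minimal sketch, not part of the package: the helper names `missing_names` and `check_index` are hypothetical, and the module and class names are copied from the table as-is. It only tests that each listed name is exposed by its module and makes no assumption about constructors or signatures.

```python
import importlib

# Module -> public names, copied verbatim from the table above.
PUBLIC_API = {
    "olm.data.datasets": [
        "BaseTextDataset", "DataLoader", "FineWebEduDataset",
        "HuggingFaceTextDataset", "LocalTextDataset",
    ],
    "olm.data.datasets.base_dataset": ["BaseTextDataset"],
    "olm.data.datasets.data_loader": ["DataLoader"],
    "olm.data.datasets.fineweb_edu": ["FineWebEduDataset"],
    "olm.data.datasets.hf_dataset": ["FineWebEduDataset", "HuggingFaceTextDataset"],
    "olm.data.datasets.local_dataset": ["LocalTextDataset"],
    "olm.data.tokenization.base": ["TokenizerBase"],
    "olm.data.tokenization.hf_tokenizer": ["HFTokenizer"],
    "olm.data.tokenization.hf_train_custom": ["HFTokenizerTrainCustom"],
}


def missing_names(module_name, names):
    """Return the entries of `names` that `module_name` does not expose.

    Hypothetical helper: if the module cannot be imported at all,
    every name is reported as missing.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return list(names)
    return [name for name in names if not hasattr(module, name)]


def check_index(index=PUBLIC_API):
    """Map each module to its missing names, omitting modules that match."""
    report = {}
    for module_name, names in index.items():
        missing = missing_names(module_name, names)
        if missing:
            report[module_name] = missing
    return report


if __name__ == "__main__":
    # An empty report means every listed name resolved.
    for module_name, missing in check_index().items():
        print(f"{module_name}: missing {', '.join(missing)}")
```

An empty result from `check_index()` means the installed package exposes every name in the table; any entry in the result points to a row that may be out of date.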