
olm.data.datasets.local_dataset

Classes

LocalTextDataset(*args, **kwargs): Dataset that streams text from local .txt files in a directory.

class olm.data.datasets.local_dataset.BaseTextDataset(*args: Any, **kwargs: Any)

Bases: IterableDataset, ABC

Abstract base class for text-based streaming datasets.

Handles tokenization, token buffering, and sequence generation independently of the text source. Subclasses must implement _get_text_iterator to yield text chunks.
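The contract above (the base class owns tokenization and buffering, and a subclass only supplies raw text through _get_text_iterator) can be illustrated with a minimal sketch. It is not the olm implementation. SketchTextDataset, ListDataset, the tokenize callable, and seq_len are hypothetical names chosen for illustration. To stay dependency-free, the sketch subclasses only ABC rather than torch's IterableDataset. It also assumes that leftover tokens shorter than seq_len are dropped; the real class may pad them, carry them over, or emit seq_len + 1 tokens for next-token labels.

```python
from abc import ABC, abstractmethod
from typing import Callable, Iterator, List


class SketchTextDataset(ABC):
    """Hypothetical, torch-free stand-in for BaseTextDataset (illustration only)."""

    def __init__(self, tokenize: Callable[[str], List[int]], seq_len: int) -> None:
        self.tokenize = tokenize  # hypothetical: any str -> list[int] callable
        self.seq_len = seq_len    # hypothetical: tokens per emitted sequence

    @abstractmethod
    def _get_text_iterator(self) -> Iterator[str]:
        """Subclasses yield raw text chunks; the base class does the rest."""

    def __iter__(self) -> Iterator[List[int]]:
        buffer: List[int] = []
        for text in self._get_text_iterator():
            # Tokenize each chunk and append it to a running token buffer.
            buffer.extend(self.tokenize(text))
            # Emit fixed-length sequences whenever enough tokens are buffered.
            while len(buffer) >= self.seq_len:
                yield buffer[: self.seq_len]
                buffer = buffer[self.seq_len :]
        # Assumption: a trailing partial sequence is simply dropped here.


class ListDataset(SketchTextDataset):
    """Hypothetical subclass that serves text from an in-memory list."""

    def __init__(self, chunks: List[str], **kwargs) -> None:
        super().__init__(**kwargs)
        self.chunks = chunks

    def _get_text_iterator(self) -> Iterator[str]:
        yield from self.chunks


# Character codes serve as a toy tokenizer; sequences span chunk boundaries.
ds = ListDataset(["abcde", "fgh"], tokenize=lambda s: [ord(c) for c in s], seq_len=3)
print(list(ds))  # [[97, 98, 99], [100, 101, 102]]
```

Because buffering lives in the base class, a sequence can straddle two text chunks ("de" + "f" above), and subclasses never deal with tokens at all.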

class olm.data.datasets.local_dataset.LocalTextDataset(*args: Any, **kwargs: Any)

Bases: BaseTextDataset

Dataset that streams text from local .txt files in a directory.
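A subclass like this only needs to turn a directory into a stream of text chunks. The sketch below shows one plausible way to do that with pathlib. The function name iter_local_text and its encoding parameter are hypothetical, and the following behaviours are assumptions, not documented behaviour of LocalTextDataset: files are read in sorted order, non-recursively, with each file yielded as a single chunk.

```python
import tempfile
from pathlib import Path
from typing import Iterator


def iter_local_text(directory: str, encoding: str = "utf-8") -> Iterator[str]:
    """Hypothetical sketch of a _get_text_iterator for local .txt files."""
    # Sort paths so the stream order is deterministic across runs.
    # Assumption: only top-level *.txt files are read (no recursion).
    for path in sorted(Path(directory).glob("*.txt")):
        # Assumption: each file is yielded whole as one text chunk.
        yield path.read_text(encoding=encoding)


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "b.txt").write_text("world", encoding="utf-8")
    Path(tmp, "a.txt").write_text("hello", encoding="utf-8")
    Path(tmp, "notes.md").write_text("skipped", encoding="utf-8")  # not .txt
    chunks = list(iter_local_text(tmp))

print(chunks)  # ['hello', 'world']
```

For very large files, yielding line by line (iterating over an open file handle) would keep memory bounded, at the cost of many small chunks for the base class to buffer.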

class olm.data.datasets.local_dataset.Union

Bases: object

Represents a union type, e.g. int | str.