olm.data.datasets.local_dataset
Classes
| LocalTextDataset(*args, **kwargs) | Dataset that streams text from local .txt files in a directory. |
|---|---|
class olm.data.datasets.local_dataset.BaseTextDataset(*args: Any, **kwargs: Any)
Bases: IterableDataset, ABC
Abstract base class for text-based streaming datasets.
Handles tokenization buffering and sequence generation generically. Subclasses must implement _get_text_iterator to yield text chunks.
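The buffering pattern described above can be illustrated with a minimal, dependency-free sketch. Everything here is an assumption made for illustration: the class names `SketchTextDataset` and `InMemoryTextDataset`, the `tokenize` and `seq_len` parameters, and the choice to drop leftover tokens are hypothetical and are not taken from the olm source. Only the `_get_text_iterator` hook name comes from the documentation above.

```python
from abc import ABC, abstractmethod
from typing import Callable, Iterable, Iterator, List


class SketchTextDataset(ABC):
    """Hypothetical stand-in for the buffering pattern; not the olm class."""

    def __init__(self, tokenize: Callable[[str], List[int]], seq_len: int) -> None:
        self.tokenize = tokenize  # assumed: any callable mapping text to token ids
        self.seq_len = seq_len    # assumed: fixed length of each emitted sequence

    @abstractmethod
    def _get_text_iterator(self) -> Iterator[str]:
        """Yield raw text chunks; subclasses decide where they come from."""

    def __iter__(self) -> Iterator[List[int]]:
        buffer: List[int] = []
        for text in self._get_text_iterator():
            buffer.extend(self.tokenize(text))
            # Emit as many full sequences as the buffer currently holds.
            while len(buffer) >= self.seq_len:
                yield buffer[: self.seq_len]
                buffer = buffer[self.seq_len :]
        # Leftover tokens shorter than seq_len are dropped in this sketch.


class InMemoryTextDataset(SketchTextDataset):
    """Toy subclass that serves text chunks from a list."""

    def __init__(
        self,
        chunks: Iterable[str],
        tokenize: Callable[[str], List[int]],
        seq_len: int,
    ) -> None:
        super().__init__(tokenize, seq_len)
        self.chunks = list(chunks)

    def _get_text_iterator(self) -> Iterator[str]:
        yield from self.chunks


def char_tokenize(text: str) -> List[int]:
    """Toy tokenizer: one token per character (its Unicode code point)."""
    return [ord(c) for c in text]


sequences = list(InMemoryTextDataset(["abc", "defgh"], char_tokenize, seq_len=3))
```

In this sketch the subclass only supplies text; token buffering and sequence slicing live in the base class, which is the division of labour the description above suggests.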
class olm.data.datasets.local_dataset.LocalTextDataset(*args: Any, **kwargs: Any)
Bases: BaseTextDataset
Dataset that streams text from local .txt files in a directory.
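As a rough illustration of streaming text from a directory of .txt files, the following sketch reads each file in fixed-size chunks so that no file has to fit in memory. The function name `iter_txt_chunks`, the `chunk_chars` parameter, the sorted non-recursive file order, and the UTF-8 encoding are all assumptions for this example; the actual constructor arguments and traversal behaviour of LocalTextDataset are not documented here.

```python
import tempfile
from pathlib import Path
from typing import Iterator


def iter_txt_chunks(directory: str, chunk_chars: int = 1024) -> Iterator[str]:
    """Yield text chunks from every .txt file directly under `directory`.

    Hypothetical helper: files are visited in sorted order and read
    `chunk_chars` characters at a time.
    """
    for path in sorted(Path(directory).glob("*.txt")):
        with path.open("r", encoding="utf-8") as handle:
            while True:
                chunk = handle.read(chunk_chars)
                if not chunk:  # empty string signals end of file
                    break
                yield chunk


# Small demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("hello world", encoding="utf-8")
    Path(tmp, "b.txt").write_text("more text", encoding="utf-8")
    Path(tmp, "notes.md").write_text("ignored", encoding="utf-8")
    chunks = list(iter_txt_chunks(tmp, chunk_chars=5))
```

A generator like this could serve as the body of a `_get_text_iterator` implementation, with the base class handling tokenization of the yielded chunks.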
class olm.data.datasets.local_dataset.Union
Bases: object
Represents a union type, for example int | str.