OLM API Reference

`olm.data.tokenization.hf_tokenizer`

Source: src/olm/data/tokenization/hf_tokenizer.py:1

Classes

HFTokenizer(model_path: str)

Bases: olm.data.tokenization.base.TokenizerBase

Source: src/olm/data/tokenization/hf_tokenizer.py:8

Methods

decode(self, tokens: torch.Tensor) -> str

Source: src/olm/data/tokenization/hf_tokenizer.py:28

Decodes a single 1D tensor of token IDs back into a string.
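`decode` takes one 1D tensor, so a 2D batch (for example, padded model outputs) has to be decoded one row at a time. The sketch below is a hypothetical helper and is not part of `olm`. `decode_batch`, its `pad_id` argument, and the `SupportsDecode` protocol are assumptions made for illustration. It works with any object that exposes a `decode(tokens) -> str` method, which `HFTokenizer` does.

```python
from typing import Optional, Protocol

import torch


class SupportsDecode(Protocol):
    """Anything with a single-sequence decode method, e.g. HFTokenizer."""

    def decode(self, tokens: torch.Tensor) -> str: ...


def decode_batch(
    tokenizer: SupportsDecode,
    batch: torch.Tensor,
    pad_id: Optional[int] = None,
) -> list[str]:
    """Hypothetical helper: decode a 2D (batch, seq_len) tensor row by row."""
    if batch.dim() != 2:
        raise ValueError(f"expected a 2D tensor, got shape {tuple(batch.shape)}")
    texts = []
    for row in batch:  # iterating a 2D tensor yields 1D rows
        if pad_id is not None:
            row = row[row != pad_id]  # drop padding positions; result stays 1D
        texts.append(tokenizer.decode(row))
    return texts
```

Stripping `pad_id` before decoding is optional. It only matters if the underlying tokenizer would otherwise render pad tokens as text.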

encode(self, text: str, add_special_tokens: bool = True) -> torch.Tensor

Source: src/olm/data/tokenization/hf_tokenizer.py:13

Encodes a single string into a 1D PyTorch tensor of input IDs. Padding is implicitly disabled for single inputs.
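Because `encode` returns one unpadded 1D tensor per string, batching is left to the caller. The sketch below shows one way to do it with PyTorch's `torch.nn.utils.rnn.pad_sequence`. It is an illustration, not `olm`'s own batching code. The `collate` function is hypothetical, the token IDs stand in for real `encode` outputs, and `pad_id=0` is an assumed placeholder for the tokenizer's actual pad token ID.

```python
import torch
from torch.nn.utils.rnn import pad_sequence


def collate(encoded: list[torch.Tensor], pad_id: int) -> tuple[torch.Tensor, torch.Tensor]:
    """Right-pad 1D ID tensors into a (batch, max_len) tensor plus an attention mask."""
    input_ids = pad_sequence(encoded, batch_first=True, padding_value=pad_id)
    # Mask is 1 for real tokens and 0 for padding. It is built from the lengths,
    # so a genuine token that happens to equal pad_id is still marked as real.
    lengths = torch.tensor([t.numel() for t in encoded])
    mask = (torch.arange(input_ids.size(1)) < lengths.unsqueeze(1)).long()
    return input_ids, mask


# Stand-ins for tokenizer.encode("...") results (hypothetical IDs).
seqs = [torch.tensor([101, 7592, 102]), torch.tensor([101, 102])]
ids, mask = collate(seqs, pad_id=0)
print(ids.tolist())   # [[101, 7592, 102], [101, 102, 0]]
print(mask.tolist())  # [[1, 1, 1], [1, 1, 0]]
```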

save(self, path: str) -> None

Source: src/olm/data/tokenization/hf_tokenizer.py:36

Saves the tokenizer in Hugging Face format. `path` must be a directory.
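Because `save` expects a directory, a caller may want to create that directory first and reject paths that point to an existing file. The helper below is a hypothetical convenience wrapper and is not part of `olm`. `save_to_dir` and the `SupportsSave` protocol are illustrative names. The helper relies only on the documented `save(path)` method and the standard library. It makes no assumptions about which files the underlying Hugging Face tokenizer writes into the directory.

```python
from pathlib import Path
from typing import Protocol, Union


class SupportsSave(Protocol):
    """Anything with a directory-based save method, e.g. HFTokenizer."""

    def save(self, path: str) -> None: ...


def save_to_dir(tokenizer: SupportsSave, path: Union[str, Path]) -> Path:
    """Hypothetical helper: ensure `path` is a directory, then call tokenizer.save()."""
    target = Path(path)
    if target.exists() and not target.is_dir():
        raise NotADirectoryError(f"{target} exists and is not a directory")
    target.mkdir(parents=True, exist_ok=True)  # create missing parents too
    tokenizer.save(str(target))  # save() is documented as taking a str path
    return target
```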