Source: src/olm/data/tokenization/hf_tokenizer.py:1
Classes
HFTokenizer(model_path: str)
Bases: olm.data.tokenization.base.TokenizerBase
Source: src/olm/data/tokenization/hf_tokenizer.py:8
Methods
decode(self, tokens: torch.Tensor) -> str
Source: src/olm/data/tokenization/hf_tokenizer.py:28
Decodes a single 1D tensor of token IDs back into a string.
encode(self, text: str, add_special_tokens: bool = True) -> torch.Tensor
Source: src/olm/data/tokenization/hf_tokenizer.py:13
Encodes a single string into a 1D PyTorch tensor of input IDs. Padding is implicitly disabled for single inputs.
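The encode and decode signatures above suggest a simple round trip. The sketch below is illustrative only: `round_trip` is a hypothetical helper, not part of the library, and it works with any object exposing these two methods. The import path in the commented usage is inferred from the source file location, and "path/to/model" is a placeholder. Special tokens are turned off by default in the helper because this page does not say whether decode strips them.

```python
def round_trip(tokenizer, text, add_special_tokens=False):
    """Encode `text`, then decode the result with the same tokenizer.

    `tokenizer` is any object exposing the encode/decode signatures
    documented above (for example, an HFTokenizer instance).
    """
    # encode() is documented to return a 1D tensor of input IDs.
    token_ids = tokenizer.encode(text, add_special_tokens=add_special_tokens)
    # decode() takes that 1D sequence of IDs and returns a string.
    decoded = tokenizer.decode(token_ids)
    return token_ids, decoded


# Hypothetical usage (import path and model path are assumptions):
# from olm.data.tokenization.hf_tokenizer import HFTokenizer
# tok = HFTokenizer("path/to/model")
# ids, text = round_trip(tok, "hello world")
```

Whether the decoded string matches the input exactly depends on the underlying tokenizer's normalization, so an equality check is best treated as a sanity test rather than a guarantee.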
save(self, path: str) -> None
Source: src/olm/data/tokenization/hf_tokenizer.py:36
Saves the tokenizer in Hugging Face format.
path must be a directory, not a file.
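Because save expects a directory, a caller may want to validate the target first. The helper below is a minimal sketch under that assumption: `save_to_directory` is a hypothetical name, and this page does not state whether save creates a missing directory itself, so the helper creates it defensively.

```python
import os


def save_to_directory(tokenizer, path):
    """Call tokenizer.save(path) after making sure `path` is a directory."""
    # Reject a path that already exists as a regular file.
    if os.path.exists(path) and not os.path.isdir(path):
        raise NotADirectoryError(f"{path!r} exists and is not a directory")
    # Create the directory (and any parents) if it is missing.
    os.makedirs(path, exist_ok=True)
    tokenizer.save(path)
    # Return the names of the files now present, for quick inspection.
    return sorted(os.listdir(path))
```

The returned file list depends entirely on what the tokenizer writes, so no particular file names are assumed here.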