olm.nn.blocks.LM¶
class olm.nn.blocks.LM(*args: Any, **kwargs: Any)¶
Bases: Block
A simple Language Model (LM) architecture.
This model consists of an embedding layer, a stack of Transformer blocks, and a final output projection to the vocabulary size. It is designed for causal language modeling (next-token prediction).
Structure: Input IDs -> Embedding -> [TransformerBlock] x N -> OutputHead -> Logits
- Parameters:
- vocab_size (int) – Size of the vocabulary.
- embed_dim (int) – Dimension of the embeddings and hidden states.
- num_heads (int) – Number of attention heads in Transformer blocks.
- num_layers (int) – Number of Transformer blocks.
- max_seq_len (int) – Maximum sequence length for the model.
- dropout (float, optional) – Dropout probability. Defaults to 0.0.
- causal (bool, optional) – Whether to use causal masking. Defaults to True.
- ff_multiplier (float, optional) – Multiplier for the FFN hidden dimension. Defaults to 2.5.
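A common convention is to size the feed-forward hidden layer as the embedding dimension scaled by ff_multiplier; the exact formula (including rounding) used by olm is not stated in this docstring, so the sketch below is an assumption, not the library's implementation:

```python
# Hypothetical sketch of how ff_multiplier might size the FFN hidden
# layer inside each TransformerBlock. The int() truncation here is an
# assumption; olm may round or align the width differently.
embed_dim = 512
ff_multiplier = 2.5  # the documented default
ffn_hidden = int(embed_dim * ff_multiplier)
print(ffn_hidden)  # 512 * 2.5 = 1280
```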
layers¶
The sequence of layers in the model.
- Type: nn.ModuleList
__init__(vocab_size: int, embed_dim: int, num_heads: int, num_layers: int, max_seq_len: int, dropout: float = 0.0, causal: bool = True, ff_multiplier: float = 2.5)¶
Methods¶
| __init__(vocab_size, embed_dim, num_heads, ...) |  |
|---|---|
| forward(x) | Apply each block to the input in sequence. |
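The forward pass applies each entry of `layers` to the input in sequence. A minimal stand-in sketch of that pipeline, with plain callables in place of the real embedding, Transformer blocks, and output head (names and structure here are illustrative, not olm's actual code):

```python
# Toy sketch of LM.forward's "apply each block in sequence" loop.
# TinyLM is a hypothetical stand-in: `layers` mimics nn.ModuleList,
# and simple lambdas replace the embedding / Transformer / head stages.
class TinyLM:
    def __init__(self, layers):
        self.layers = layers  # ordered pipeline of callables

    def forward(self, x):
        # Same control flow as the documented forward(x): each layer's
        # output feeds the next layer's input.
        for layer in self.layers:
            x = layer(x)
        return x

lm = TinyLM([lambda x: x + 1, lambda x: x * 2])
print(lm.forward(3))  # (3 + 1) * 2 = 8
```

In the real model, the callables are an embedding layer, N Transformer blocks, and the output head projecting hidden states to vocab_size logits.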