Source: src/olm/models/facebook/opt.py:1
Classes
OPT125M()
Bases: olm.models.facebook.opt.OPTModel
Source: src/olm/models/facebook/opt.py:131
OPT 125M Model Definition.
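As a rough check on what "125M" refers to, the sketch below estimates the parameter count of a Pre-LayerNorm decoder-only Transformer. The hyperparameters (vocabulary 50272, width 768, FFN width 3072, 12 layers, 2048 learned positions) are those of the published OPT-125M configuration. Whether olm's OPT125M uses exactly these values is an assumption, and `opt_param_count` is a hypothetical helper, not part of olm.

```python
def opt_param_count(vocab_size, embed_dim, intermediate_size, num_layers,
                    max_positions, tie_weights=True):
    """Rough parameter count for a Pre-LayerNorm decoder-only Transformer."""
    token_emb = vocab_size * embed_dim
    pos_emb = max_positions * embed_dim  # learned positional embeddings (assumed)
    # Q, K, V and output projections, each with a bias vector.
    attn = 4 * (embed_dim * embed_dim + embed_dim)
    # Two linear layers (up and down projection), each with a bias vector.
    ffn = (embed_dim * intermediate_size + intermediate_size
           + intermediate_size * embed_dim + embed_dim)
    norms = 2 * 2 * embed_dim  # two LayerNorms per block, weight + bias each
    final_norm = 2 * embed_dim
    # A tied output projection reuses the token embedding matrix.
    head = 0 if tie_weights else vocab_size * embed_dim
    return (token_emb + pos_emb + num_layers * (attn + ffn + norms)
            + final_norm + head)


# Assumed values: the published OPT-125M hyperparameters.
total = opt_param_count(vocab_size=50272, embed_dim=768,
                        intermediate_size=3072, num_layers=12,
                        max_positions=2048)
print(f"{total / 1e6:.1f}M parameters")
```

With tied weights the estimate lands at roughly 125 million. Untying the output projection would add another `vocab_size * embed_dim` parameters.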
Methods
forward(self, x: torch.Tensor) -> torch.Tensor (inherited from Block)
Source: src/olm/nn/structure/block.py:26
Apply each block to the input in sequence.
Parameters
x: Input tensor.
Returns
Output tensor after all blocks have been applied.
OPTBlock(embed_dim: int, intermediate_size: int, num_heads: int, dropout: float = 0.1)
Bases: olm.nn.structure.block.Block
Source: src/olm/models/facebook/opt.py:13
A single Transformer block for the OPT architecture.
Composes a Residual Multi-Head Attention block and a Residual ReLU Feed-Forward block, both utilizing Pre-LayerNorm.
Structure
x = x + MultiHeadAttention(LayerNorm(x))
x = x + ReLU(LayerNorm(x))
Parameters
embed_dim (int): The dimension of the model.
intermediate_size (int): The hidden dimension of the feed-forward network.
num_heads (int): Number of attention heads.
dropout (float): Dropout probability.
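The two residual updates listed under Structure can be sketched in plain PyTorch as follows. This is an illustrative stand-in, not the olm implementation: `PreLNBlockSketch` is a hypothetical class, the feed-forward layout (Linear, ReLU, Linear, Dropout) is an assumption, and causal masking is left out to keep the focus on the Pre-LayerNorm residual pattern.

```python
import torch
from torch import nn


class PreLNBlockSketch(nn.Module):
    """Illustrative Pre-LayerNorm block; not the olm OPTBlock itself."""

    def __init__(self, embed_dim: int, intermediate_size: int,
                 num_heads: int, dropout: float = 0.1):
        super().__init__()
        self.attn_norm = nn.LayerNorm(embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads,
                                          dropout=dropout, batch_first=True)
        self.ffn_norm = nn.LayerNorm(embed_dim)
        # Assumed feed-forward layout: expand, ReLU, project back.
        self.ffn = nn.Sequential(
            nn.Linear(embed_dim, intermediate_size),
            nn.ReLU(),
            nn.Linear(intermediate_size, embed_dim),
            nn.Dropout(dropout),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x = x + MultiHeadAttention(LayerNorm(x))
        h = self.attn_norm(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        # x = x + FeedForward(LayerNorm(x)), with ReLU inside the FFN
        x = x + self.ffn(self.ffn_norm(x))
        return x


block = PreLNBlockSketch(embed_dim=64, intermediate_size=256, num_heads=4).eval()
x = torch.randn(2, 10, 64)  # [batch, seq_len, embed_dim]
y = block(x)
print(y.shape)
```

Because normalization happens inside each residual branch, the skip path carries `x` through unnormalized, and the output keeps the input shape.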
Methods
forward(self, x: torch.Tensor) -> torch.Tensor (inherited from Block)
Source: src/olm/nn/structure/block.py:26
Apply each block to the input in sequence.
Parameters
x: Input tensor.
Returns
Output tensor after all blocks have been applied.
OPTModel(vocab_size, embed_dim, intermediate_size, num_layers, num_heads, dropout=0.1, tie_weights=True)
Bases: olm.nn.structure.block.Block
Source: src/olm/models/facebook/opt.py:69
OPT Model Definition.
Implements a decoder-only Transformer with the following OPT design choices:
- Pre-normalization with LayerNorm
- Multi-Head Attention with Causal Masking
- ReLU activation in Feed-Forward Networks
- Tied output projection through OutputHead by default
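To make the causal masking item concrete, here is a minimal sketch of how future positions are typically excluded before the attention softmax. `causal_attention_weights` is a hypothetical helper written for illustration. How olm builds or applies its mask internally is not shown in this reference.

```python
import torch


def causal_attention_weights(scores: torch.Tensor) -> torch.Tensor:
    """Mask future positions in [..., seq_len, seq_len] scores, then softmax."""
    seq_len = scores.size(-1)
    # True strictly above the diagonal: key positions that lie in the future.
    future = torch.triu(
        torch.ones(seq_len, seq_len, dtype=torch.bool, device=scores.device),
        diagonal=1,
    )
    # -inf scores become exactly zero weight after the softmax.
    return torch.softmax(scores.masked_fill(future, float("-inf")), dim=-1)


# With all-zero scores, each query spreads its weight evenly over
# itself and the positions before it.
weights = causal_attention_weights(torch.zeros(4, 4))
print(weights)
```

Each row still sums to 1, and position i places no weight on any position after i, so a token's output cannot depend on later tokens.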
Forward
Accepts token IDs shaped [batch, seq_len] and returns logits shaped [batch, seq_len, vocab_size].
Parameters
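vocab_size (int): Vocabulary size.
embed_dim (int): Embedding dimension.
intermediate_size (int): Hidden dimension of the feed-forward network (FFN).
num_layers (int): Number of layers.
num_heads (int): Number of attention heads.
dropout (float, optional): Dropout probability. Defaults to 0.1.
tie_weights (bool, optional): Whether to tie the output projection. Defaults to True.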
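The shape contract and the `tie_weights=True` default in the signature can be illustrated with a small plain-PyTorch sketch. It assumes that tying means the output projection shares the token embedding matrix, which is the usual meaning. The names below are stand-ins, and olm's OutputHead may implement this differently.

```python
import torch
from torch import nn

vocab_size, embed_dim = 100, 32

embedding = nn.Embedding(vocab_size, embed_dim)
output_proj = nn.Linear(embed_dim, vocab_size, bias=False)
# Tie: both modules now share one [vocab_size, embed_dim] Parameter.
output_proj.weight = embedding.weight

token_ids = torch.randint(0, vocab_size, (2, 10))  # [batch, seq_len]
hidden = embedding(token_ids)   # stand-in for the full Transformer stack
logits = output_proj(hidden)    # [batch, seq_len, vocab_size]
print(logits.shape)
```

Sharing the matrix removes `vocab_size * embed_dim` parameters from the model, and gradients from both the input lookup and the output projection accumulate on the same weights.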
Methods
forward(self, x: torch.Tensor) -> torch.Tensor (inherited from Block)
Source: src/olm/nn/structure/block.py:26
Apply each block to the input in sequence.
Parameters
x: Input tensor.
Returns
Output tensor after all blocks have been applied.