`olm.models.facebook.opt`

Source: src/olm/models/facebook/opt.py:1

Classes

`OPT125M()`

Bases: olm.models.facebook.opt.OPTModel

Source: src/olm/models/facebook/opt.py:131

OPT 125M Model Definition.

Methods

`forward(self, x: torch.Tensor) -> torch.Tensor` (inherited from `Block`)

Source: src/olm/nn/structure/block.py:26

Apply each block to the input in sequence.

Parameters

x: Input tensor.

Returns

Output tensor after all blocks have been applied.

`OPTBlock(embed_dim: int, intermediate_size: int, num_heads: int, dropout: float = 0.1)`

Bases: olm.nn.structure.block.Block

Source: src/olm/models/facebook/opt.py:13

A single Transformer block for the OPT architecture.

Composes a Residual Multi-Head Attention block and a Residual ReLU Feed-Forward block, both utilizing Pre-LayerNorm.

Structure

x = x + MultiHeadAttention(LayerNorm(x))
x = x + ReLU(LayerNorm(x))
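The two residual updates above can be sketched in plain PyTorch. This is a minimal illustration under stated assumptions, not the `olm` implementation: the class name `PreLNBlockSketch` is hypothetical, `torch.nn.MultiheadAttention` stands in for whatever attention module the library uses, the second update is read as a two-layer ReLU feed-forward network (per the description above), and the placement of dropout is a guess.

```python
import torch
from torch import nn


class PreLNBlockSketch(nn.Module):
    """Hypothetical stand-in for a Pre-LayerNorm decoder block (not olm's OPTBlock)."""

    def __init__(self, embed_dim: int, intermediate_size: int, num_heads: int, dropout: float = 0.1):
        super().__init__()
        self.attn_norm = nn.LayerNorm(embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, dropout=dropout, batch_first=True)
        self.ffn_norm = nn.LayerNorm(embed_dim)
        # Assumed feed-forward layout: expand, ReLU, project back, dropout.
        self.ffn = nn.Sequential(
            nn.Linear(embed_dim, intermediate_size),
            nn.ReLU(),
            nn.Linear(intermediate_size, embed_dim),
            nn.Dropout(dropout),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        seq_len = x.size(1)
        # Boolean mask: True marks (query, key) pairs that must NOT attend, i.e. future tokens.
        causal_mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device), diagonal=1
        )
        # x = x + MultiHeadAttention(LayerNorm(x))
        h = self.attn_norm(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal_mask, need_weights=False)
        x = x + attn_out
        # x = x + FeedForward(LayerNorm(x)), with ReLU as the activation
        x = x + self.ffn(self.ffn_norm(x))
        return x
```

Normalizing before each sublayer leaves the residual path itself unnormalized, so the input and output share the shape [batch, seq_len, embed_dim] and blocks of this kind can be stacked directly.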

Parameters

embed_dim (int): The dimension of the model.
intermediate_size (int): The hidden dimension of the feed-forward network.
num_heads (int): Number of attention heads.
dropout (float): Dropout probability.

Methods

`forward(self, x: torch.Tensor) -> torch.Tensor` (inherited from `Block`)

Source: src/olm/nn/structure/block.py:26

Apply each block to the input in sequence.

Parameters

x: Input tensor.

Returns

Output tensor after all blocks have been applied.

`OPTModel(vocab_size, embed_dim, intermediate_size, num_layers, num_heads, dropout=0.1, tie_weights=True)`

Bases: olm.nn.structure.block.Block

Source: src/olm/models/facebook/opt.py:69

OPT Model Definition.

Implements a decoder-only Transformer with specific OPT optimizations:

Pre-normalization with LayerNorm
Multi-Head Attention with Causal Masking
ReLU activation in Feed-Forward Networks
Tied output projection through OutputHead by default

Forward

Accepts token IDs shaped [batch, seq_len] and returns logits shaped [batch, seq_len, vocab_size].
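The shape contract and the tied output projection can be illustrated with a toy model. This is a sketch only: `TiedLMSketch` is a hypothetical name, the Transformer layers are omitted entirely, and sharing the embedding matrix with a bias-free linear head is one common way to tie weights, which may differ from how `OutputHead` does it.

```python
import torch
from torch import nn


class TiedLMSketch(nn.Module):
    """Hypothetical toy model: embedding plus output head, no Transformer layers."""

    def __init__(self, vocab_size: int, embed_dim: int, tie_weights: bool = True):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.head = nn.Linear(embed_dim, vocab_size, bias=False)
        if tie_weights:
            # Both weights have shape [vocab_size, embed_dim], so one parameter can serve both.
            self.head.weight = self.embed.weight

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        hidden = self.embed(token_ids)  # [batch, seq_len, embed_dim]
        return self.head(hidden)        # [batch, seq_len, vocab_size]


model = TiedLMSketch(vocab_size=100, embed_dim=16)
token_ids = torch.randint(0, 100, (2, 7))  # [batch=2, seq_len=7]
logits = model(token_ids)
print(logits.shape)
```

With tying enabled, the embedding and the output projection are the same parameter object, so the model stores one [vocab_size, embed_dim] matrix instead of two.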

Parameters

vocab_size (int): Vocabulary size.
embed_dim (int): Embedding dimension.
intermediate_size (int): FFN dimension.
num_layers (int): Number of layers.
num_heads (int): Number of heads.
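dropout (float, optional): Dropout probability. Defaults to 0.1.
tie_weights (bool, optional): Whether to tie the output projection through OutputHead. Defaults to True.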

Methods

`forward(self, x: torch.Tensor) -> torch.Tensor` (inherited from `Block`)

Source: src/olm/nn/structure/block.py:26

Apply each block to the input in sequence.

Parameters

x: Input tensor.

Returns

Output tensor after all blocks have been applied.
