OLM API Reference

`olm.models.facebook.opt`

Source: src/olm/models/facebook/opt.py:1

Classes

OPT125M()

Bases: olm.models.facebook.opt.OPTModel

Source: src/olm/models/facebook/opt.py:131

OPT 125M Model Definition.

Methods

forward(self, x: torch.Tensor) -> torch.Tensor (inherited from Block)

Source: src/olm/nn/structure/block.py:26

Apply each block to the input in sequence.

Parameters

  • x: Input tensor.

Returns

Output tensor after all blocks have been applied.
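
The sequential behaviour described above can be pictured with a small pure-Python sketch. It is illustrative only: `apply_in_sequence` and the toy lambdas are hypothetical stand-ins, not part of OLM, and the real `Block.forward` operates on `torch.Tensor` inputs and sub-modules.

```python
from typing import Callable, Sequence


def apply_in_sequence(blocks: Sequence[Callable], x):
    """Feed x through each block in order, passing each output to the next block."""
    for block in blocks:
        x = block(x)
    return x


# Toy stand-ins for sub-blocks: plain functions instead of neural-network layers.
blocks = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
result = apply_in_sequence(blocks, 5)  # ((5 + 1) * 2) - 3
```

The order of the blocks matters: each one receives the previous block's output, not the original input.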

OPTBlock(embed_dim: int, intermediate_size: int, num_heads: int, dropout: float = 0.1)

Bases: olm.nn.structure.block.Block

Source: src/olm/models/facebook/opt.py:13

A single Transformer block for the OPT architecture.

Composes a Residual Multi-Head Attention block and a Residual ReLU Feed-Forward block, both utilizing Pre-LayerNorm.

Structure

x = x + MultiHeadAttention(LayerNorm(x))
x = x + ReLU(LayerNorm(x))
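
As a rough illustration of this pre-LayerNorm residual structure, the sketch below builds an equivalent block from stock `torch.nn` modules. It is not OLM's `OPTBlock`: the class name `PreLNBlockSketch`, the two-layer feed-forward layout, and the causal mask built inside `forward` are all assumptions made for the example.

```python
import torch
from torch import nn


class PreLNBlockSketch(nn.Module):
    """Illustrative pre-LayerNorm Transformer block; not OLM's OPTBlock."""

    def __init__(self, embed_dim: int, intermediate_size: int,
                 num_heads: int, dropout: float = 0.1):
        super().__init__()
        self.attn_norm = nn.LayerNorm(embed_dim)
        self.attn = nn.MultiheadAttention(
            embed_dim, num_heads, dropout=dropout, batch_first=True
        )
        self.ffn_norm = nn.LayerNorm(embed_dim)
        # Assumed feed-forward layout: Linear -> ReLU -> Linear -> Dropout.
        self.ffn = nn.Sequential(
            nn.Linear(embed_dim, intermediate_size),
            nn.ReLU(),
            nn.Linear(intermediate_size, embed_dim),
            nn.Dropout(dropout),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        seq_len = x.size(1)
        # Boolean mask: True marks future positions a query may NOT attend to.
        causal_mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device),
            diagonal=1,
        )
        # x = x + MultiHeadAttention(LayerNorm(x))
        h = self.attn_norm(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal_mask, need_weights=False)
        x = x + attn_out
        # x = x + FeedForward(LayerNorm(x)), with ReLU inside the feed-forward
        x = x + self.ffn(self.ffn_norm(x))
        return x


block = PreLNBlockSketch(embed_dim=64, intermediate_size=256, num_heads=4).eval()
x = torch.randn(2, 10, 64)  # [batch, seq_len, embed_dim]
y = block(x)                # same shape as x
```

Normalizing before each sub-layer leaves the residual path itself unnormalized, which is the defining trait of the Pre-LayerNorm arrangement.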

Parameters

  • embed_dim (int): The dimension of the model.
  • intermediate_size (int): The hidden dimension of the feed-forward network.
  • num_heads (int): Number of attention heads.
  • dropout (float): Dropout probability.

Methods

forward(self, x: torch.Tensor) -> torch.Tensor (inherited from Block)

Source: src/olm/nn/structure/block.py:26

Apply each block to the input in sequence.

Parameters

  • x: Input tensor.

Returns

Output tensor after all blocks have been applied.

OPTModel(vocab_size, embed_dim, intermediate_size, num_layers, num_heads, dropout=0.1, tie_weights=True)

Bases: olm.nn.structure.block.Block

Source: src/olm/models/facebook/opt.py:69

OPT Model Definition.

Implements a decoder-only Transformer with the following OPT design choices:

  • Pre-normalization with LayerNorm
  • Multi-Head Attention with Causal Masking
  • ReLU activation in Feed-Forward Networks
  • Tied output projection through OutputHead by default

Forward

Accepts token IDs shaped [batch, seq_len] and returns logits shaped [batch, seq_len, vocab_size].
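
The shape contract and the tied output projection can be sketched with plain `torch.nn` layers. This is a simplified stand-in under stated assumptions: it omits the Transformer layers entirely, and it assumes weight tying means the output projection shares its matrix with the token embedding, which may differ from how OLM's `OutputHead` is implemented.

```python
import torch
from torch import nn

vocab_size, embed_dim = 100, 32
batch, seq_len = 2, 7

embedding = nn.Embedding(vocab_size, embed_dim)
output_head = nn.Linear(embed_dim, vocab_size, bias=False)
# Weight tying (assumed meaning): the head reuses the embedding matrix.
# Both parameters have shape [vocab_size, embed_dim], so they can be shared.
output_head.weight = embedding.weight

token_ids = torch.randint(0, vocab_size, (batch, seq_len))  # [batch, seq_len]
hidden = embedding(token_ids)  # [batch, seq_len, embed_dim]
logits = output_head(hidden)   # [batch, seq_len, vocab_size]
```

With tying, the embedding and the projection share one parameter tensor, so the `vocab_size * embed_dim` matrix is stored and updated once.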

Parameters

  • vocab_size (int): Vocabulary size.
  • embed_dim (int): Embedding dimension.
  • intermediate_size (int): FFN dimension.
  • num_layers (int): Number of layers.
  • num_heads (int): Number of heads.
  • dropout (float, optional): Dropout probability. Defaults to 0.1.
  • tie_weights (bool, optional): Whether to tie the output projection weights. Defaults to True.

Methods

forward(self, x: torch.Tensor) -> torch.Tensor (inherited from Block)

Source: src/olm/nn/structure/block.py:26

Apply each block to the input in sequence.

Parameters

  • x: Input tensor.

Returns

Output tensor after all blocks have been applied.