`olm.models.allenai.olmo`

Source: src/olm/models/allenai/olmo.py:1

Classes

`OLMoBlock(embed_dim: int, intermediate_size: int, num_heads: int, max_seq_len: int, dropout: float)`

Bases: olm.nn.structure.block.Block

Source: src/olm/models/allenai/olmo.py:10

A single Transformer block for the OLMo architecture.

Structure

`x = x + Attn(LN(x))`

`x = x + SwiGLU(LN(x))`
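With the feed-forward expanded, the two residual updates can be written as follows. This assumes the standard SwiGLU formulation from the GLU-variants literature. The source does not show the FFN internals, so the weight names $W_{\text{gate}}$, $W_{\text{up}}$ (each `intermediate_size` x `embed_dim`) and $W_{\text{down}}$ (`embed_dim` x `intermediate_size`) are illustrative.

```latex
\begin{aligned}
x' &= x + \mathrm{Attn}(\mathrm{LN}(x)) \\
y  &= x' + \mathrm{SwiGLU}(\mathrm{LN}(x')) \\
\mathrm{SwiGLU}(h) &= W_{\text{down}}\big(\mathrm{SiLU}(W_{\text{gate}}\,h) \odot W_{\text{up}}\,h\big),
\qquad \mathrm{SiLU}(z) = z\,\sigma(z)
\end{aligned}
```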

Parameters

embed_dim (int): Model dimension.
intermediate_size (int): FFN hidden dimension.
num_heads (int): Number of attention heads.
max_seq_len (int): Maximum sequence length (context window).
dropout (float): Dropout probability.
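The following is a minimal PyTorch sketch of a block with this structure and these constructor parameters. It is not the `olm` implementation. `SketchOLMoBlock` and `SwiGLU` are hypothetical names. The sketch also rests on several assumptions of its own: `nn.LayerNorm` for LN, `nn.MultiheadAttention` with a causal mask precomputed up to `max_seq_len`, bias-free FFN projections, and dropout applied to both residual branches.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwiGLU(nn.Module):
    """Hypothetical gated FFN: down(silu(gate(x)) * up(x))."""

    def __init__(self, embed_dim: int, intermediate_size: int) -> None:
        super().__init__()
        self.gate = nn.Linear(embed_dim, intermediate_size, bias=False)
        self.up = nn.Linear(embed_dim, intermediate_size, bias=False)
        self.down = nn.Linear(intermediate_size, embed_dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))


class SketchOLMoBlock(nn.Module):
    """Hypothetical pre-norm block: x + Attn(LN(x)), then x + SwiGLU(LN(x))."""

    def __init__(self, embed_dim: int, intermediate_size: int, num_heads: int,
                 max_seq_len: int, dropout: float) -> None:
        super().__init__()
        self.ln_attn = nn.LayerNorm(embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, dropout=dropout,
                                          batch_first=True)
        self.ln_ffn = nn.LayerNorm(embed_dim)
        self.ffn = SwiGLU(embed_dim, intermediate_size)
        self.drop = nn.Dropout(dropout)
        # Assumed causal mask for the full context; True marks positions a query may NOT attend to.
        mask = torch.triu(torch.ones(max_seq_len, max_seq_len, dtype=torch.bool), diagonal=1)
        self.register_buffer("causal_mask", mask, persistent=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        seq_len = x.size(1)  # x: [batch, seq_len, embed_dim]
        h = self.ln_attn(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=self.causal_mask[:seq_len, :seq_len],
                                need_weights=False)
        x = x + self.drop(attn_out)                   # attention residual
        x = x + self.drop(self.ffn(self.ln_ffn(x)))   # SwiGLU residual
        return x


block = SketchOLMoBlock(embed_dim=64, intermediate_size=172, num_heads=4,
                        max_seq_len=128, dropout=0.0)
x = torch.randn(2, 16, 64)
y = block(x)  # same shape as x: [2, 16, 64]
```

Because each update adds to `x`, the block preserves the `[batch, seq_len, embed_dim]` shape. That property is what lets blocks be stacked.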

Methods

`forward(self, x: torch.Tensor) -> torch.Tensor` (inherited from `Block`)

Source: src/olm/nn/structure/block.py:26

Apply each block to the input in sequence.

Parameters

x: Input tensor.

Returns

Output tensor after all blocks have been applied.
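"Apply each block to the input in sequence" describes a chained container in which each child's output becomes the next child's input. The sketch below assumes `Block` works this way, in the manner of `torch.nn.Sequential`. The internals of `olm.nn.structure.block.Block` are not shown here, so `SequentialSketch`, `AddOne`, and `Double` are hypothetical.

```python
import torch
import torch.nn as nn


class SequentialSketch(nn.Module):
    """Hypothetical container: apply each child block to the input in order."""

    def __init__(self, *blocks: nn.Module) -> None:
        super().__init__()
        self.blocks = nn.ModuleList(blocks)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            x = block(x)  # the output of one block feeds the next
        return x


class AddOne(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + 1


class Double(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * 2


out = SequentialSketch(AddOne(), Double())(torch.tensor([1.0, 2.0]))
# (x + 1) * 2 -> tensor([4., 6.])
```

Order matters: swapping the two children computes `x * 2 + 1` instead.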

`OLMoModel(vocab_size: int, embed_dim: int, intermediate_size: int, num_layers: int, num_heads: int, max_seq_len: int = 2048, dropout: float = 0.0, tie_weights: bool = True)`

Bases: olm.nn.structure.block.Block

Source: src/olm/models/allenai/olmo.py:68

Base class for the OLMo (Open Language Model) architecture.

Structure

Embedding -> [OLMoBlock] x num_layers -> LayerNorm -> OutputHead (weights tied to the embedding when `tie_weights=True`, the default).

Forward

Accepts token IDs shaped [batch, seq_len] and returns logits shaped [batch, seq_len, vocab_size].
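The sketch below follows the same pipeline and shape contract. It is hypothetical (`SketchOLMoModel` is not the `olm` class). For brevity it stands in PyTorch's pre-norm `nn.TransformerEncoderLayer`, used with a causal mask, for `OLMoBlock`. Its feed-forward is ReLU rather than SwiGLU. Weight tying is done by sharing one `[vocab_size, embed_dim]` parameter between `nn.Embedding` and the bias-free output `nn.Linear`. Whether `olm` ties weights this way is an assumption.

```python
import torch
import torch.nn as nn


class SketchOLMoModel(nn.Module):
    """Hypothetical: Embedding -> blocks -> LayerNorm -> output head (optionally tied)."""

    def __init__(self, vocab_size: int, embed_dim: int, intermediate_size: int,
                 num_layers: int, num_heads: int, max_seq_len: int = 2048,
                 dropout: float = 0.0, tie_weights: bool = True) -> None:
        super().__init__()
        self.max_seq_len = max_seq_len
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Stand-in for OLMoBlock (assumption): a pre-norm PyTorch encoder layer used causally.
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(embed_dim, num_heads, intermediate_size, dropout,
                                       batch_first=True, norm_first=True)
            for _ in range(num_layers)
        )
        self.norm = nn.LayerNorm(embed_dim)
        self.head = nn.Linear(embed_dim, vocab_size, bias=False)
        if tie_weights:
            # Both weights have shape [vocab_size, embed_dim], so one Parameter can serve both.
            self.head.weight = self.embed.weight

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        seq_len = token_ids.size(1)  # token_ids: [batch, seq_len]
        if seq_len > self.max_seq_len:
            raise ValueError("sequence longer than max_seq_len")
        # True above the diagonal blocks attention to future positions.
        mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
        x = self.embed(token_ids)  # [batch, seq_len, embed_dim]
        for block in self.blocks:
            x = block(x, src_mask=mask)
        return self.head(self.norm(x))  # [batch, seq_len, vocab_size]


model = SketchOLMoModel(vocab_size=100, embed_dim=32, intermediate_size=64,
                        num_layers=2, num_heads=4, max_seq_len=16)
tokens = torch.randint(0, 100, (2, 8))
logits = model(tokens)  # shape [2, 8, 100]
```

Tying removes one `vocab_size x embed_dim` matrix from the parameter count. `Module.parameters()` yields a shared parameter only once.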

Methods

`forward(self, x: torch.Tensor) -> torch.Tensor` (inherited from `Block`)

Source: src/olm/nn/structure/block.py:26

Apply each block to the input in sequence.

Parameters

x: Input tensor.

Returns

Output tensor after all blocks have been applied.

`OLMo_7B()`

Bases: olm.models.allenai.olmo.OLMoModel

Source: src/olm/models/allenai/olmo.py:113

OLMo 7B model: a preset `OLMoModel` whose constructor takes no arguments.
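Because `OLMo_7B()` takes no arguments, constructing it allocates weights for the full preset. The arithmetic below gives a rough estimate of the weight-only memory footprint, using the nominal 7B parameter count. The exact count depends on the preset's configuration, which is not listed here. `weight_memory_gb` is a hypothetical helper, and the estimate ignores activations, gradients, and optimizer state.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Hypothetical helper: weight-only footprint in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Nominal 7B parameters; the real preset's count may differ somewhat.
for dtype_name, nbytes in [("float32", 4), ("bfloat16", 2)]:
    print(dtype_name, weight_memory_gb(7e9, nbytes), "GB")
```

At 4 bytes per parameter (float32) this comes to about 28 GB. At 2 bytes (bfloat16) it is about 14 GB.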

Methods

`forward(self, x: torch.Tensor) -> torch.Tensor` (inherited from `Block`)

Source: src/olm/nn/structure/block.py:26

Apply each block to the input in sequence.

Parameters

x: Input tensor.

Returns

Output tensor after all blocks have been applied.
