olm.nn.embeddings

class olm.nn.embeddings.AbsolutePositionalEmbedding(*args: Any, **kwargs: Any)

Bases: PositionalEmbeddingBase

Absolute (Learned) Positional Embedding.

This is the standard positional embedding used in the original Transformer paper and models like GPT-2. It learns a separate embedding vector for each position in the sequence, up to a maximum sequence length.

These embeddings are typically added to token embeddings before passing through the transformer blocks.

forward(x: torch.Tensor, seq_positions: torch.LongTensor | None = None) → torch.Tensor

Apply absolute positional embedding to input tensor x.

  • Parameters:
  • x – Token embeddings of shape (batch_size, seq_len, embed_dim).
  • seq_positions – Optional tensor of shape (batch_size, seq_len) containing position indices. If None, positions 0..seq_len-1 are assumed for each sequence in the batch.
  • Returns: Tensor of same shape as x, with positional embeddings added.
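The forward semantics above can be sketched in plain PyTorch. This is a minimal illustration, not olm's actual implementation; the class name and the max_seq_len parameter here are assumptions for the sketch.

```python
import torch
import torch.nn as nn

class AbsolutePositionalEmbeddingSketch(nn.Module):
    """Sketch of a learned absolute positional embedding (hypothetical)."""

    def __init__(self, max_seq_len: int, embed_dim: int):
        super().__init__()
        # One learned embedding vector per position, up to max_seq_len.
        self.pos_embed = nn.Embedding(max_seq_len, embed_dim)

    def forward(self, x: torch.Tensor, seq_positions=None) -> torch.Tensor:
        batch_size, seq_len, _ = x.shape
        if seq_positions is None:
            # Default: positions 0..seq_len-1, shared across the batch.
            seq_positions = torch.arange(seq_len, device=x.device).expand(batch_size, seq_len)
        # Look up position vectors and add them to the token embeddings.
        return x + self.pos_embed(seq_positions)

tokens = torch.randn(2, 5, 8)  # (batch_size, seq_len, embed_dim)
pe = AbsolutePositionalEmbeddingSketch(max_seq_len=16, embed_dim=8)
out = pe(tokens)
print(out.shape)  # torch.Size([2, 5, 8]) — same shape as the input
```

Because the position table is learned up to a fixed max_seq_len, sequences longer than that limit cannot be embedded without resizing the table.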

class olm.nn.embeddings.Embedding(*args: Any, **kwargs: Any)

Bases: Module

Token Embedding layer.

Wraps the standard PyTorch embedding with a clean interface. Maps integer token indices to dense vectors.

  • Parameters:
  • vocab_size (int) – Size of the vocabulary.
  • embedding_dim (int) – Dimensionality of the word embeddings.

embedding

The underlying PyTorch embedding layer.

  • Type: nn.Embedding

forward(x: torch.Tensor) → torch.Tensor

Forward pass of the Embedding layer.

  • Parameters: x (torch.Tensor) – Input tensor of shape (batch_size, seq_len) containing token IDs.
  • Returns: Output tensor of shape (batch_size, seq_len, embedding_dim).
  • Return type: torch.Tensor
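Since the class wraps nn.Embedding, its forward behavior can be demonstrated directly with the underlying PyTorch layer; the vocab_size and embedding_dim values below are illustrative.

```python
import torch
import torch.nn as nn

# The documented wrapper delegates to nn.Embedding, so the shapes
# behave as shown here: vocab_size rows, embedding_dim columns.
embed = nn.Embedding(num_embeddings=100, embedding_dim=16)  # vocab_size=100

# Token IDs of shape (batch_size=2, seq_len=3); each ID must be < vocab_size.
token_ids = torch.tensor([[3, 7, 42], [0, 1, 99]])
vectors = embed(token_ids)
print(vectors.shape)  # torch.Size([2, 3, 16])
```

Each integer ID indexes one row of the learned weight matrix, so the output simply appends an embedding_dim axis to the input shape.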

Modules

positional
token_embed