olm.nn.embeddings

class olm.nn.embeddings.AbsolutePositionalEmbedding(*args: Any, **kwargs: Any)

Bases: PositionalEmbeddingBase

Absolute (Learned) Positional Embedding.

This is the standard positional embedding used in the original Transformer paper and models like GPT-2. It learns a separate embedding vector for each position in the sequence, up to a maximum sequence length.

These embeddings are typically added to token embeddings before passing through the transformer blocks.

forward(x: torch.Tensor, seq_positions: torch.LongTensor | None = None) → torch.Tensor

Apply absolute positional embedding to input tensor x.

  • Parameters:
  • x – Token embeddings of shape (batch_size, seq_len, embed_dim).
  • seq_positions – Optional tensor of shape (batch_size, seq_len) containing position indices. If None, positions 0..seq_len-1 are assumed for each sequence in the batch.
  • Returns: Tensor of same shape as x, with positional embeddings added.
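The forward semantics above can be sketched in plain PyTorch. This is a minimal illustration, not olm's actual implementation; the class name and the max_seq_len parameter here are assumptions for the sketch.

```python
import torch
import torch.nn as nn

class AbsolutePositionalEmbeddingSketch(nn.Module):
    """Sketch of a learned absolute positional embedding (hypothetical)."""

    def __init__(self, max_seq_len: int, embed_dim: int):
        super().__init__()
        # One learned embedding vector per position, up to max_seq_len.
        self.pos_embed = nn.Embedding(max_seq_len, embed_dim)

    def forward(self, x: torch.Tensor, seq_positions=None) -> torch.Tensor:
        batch_size, seq_len, _ = x.shape
        if seq_positions is None:
            # Default: positions 0..seq_len-1, shared across the batch.
            seq_positions = torch.arange(seq_len, device=x.device).expand(batch_size, seq_len)
        # Look up position vectors and add them to the token embeddings.
        return x + self.pos_embed(seq_positions)

tokens = torch.randn(2, 5, 8)  # (batch_size, seq_len, embed_dim)
pe = AbsolutePositionalEmbeddingSketch(max_seq_len=16, embed_dim=8)
out = pe(tokens)
print(out.shape)  # torch.Size([2, 5, 8]) — same shape as the input
```

Because the position table is learned up to a fixed max_seq_len, sequences longer than that limit cannot be embedded without resizing the table.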

class olm.nn.embeddings.Embedding(*args: Any, **kwargs: Any)

Bases: Module

Token Embedding layer.

Wraps the standard PyTorch embedding with a clean interface. Maps integer token indices to dense vectors.

  • Parameters:
  • vocab_size (int) – Size of the vocabulary.
  • embedding_dim (int) – Dimensionality of the word embeddings.

embedding

The underlying PyTorch embedding layer.

  • Type: nn.Embedding

forward(x: torch.Tensor) → torch.Tensor

Forward pass of the Embedding layer.

  • Parameters: x (torch.Tensor) – Input tensor of shape (batch_size, seq_len) containing token IDs.
  • Returns: Output tensor of shape (batch_size, seq_len, embedding_dim).
  • Return type: torch.Tensor
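Since the class wraps nn.Embedding, its forward behavior can be demonstrated directly with the underlying PyTorch layer; the vocab_size and embedding_dim values below are illustrative.

```python
import torch
import torch.nn as nn

# The documented wrapper delegates to nn.Embedding, so the shapes
# behave as shown here: vocab_size rows, embedding_dim columns.
embed = nn.Embedding(num_embeddings=100, embedding_dim=16)  # vocab_size=100

# Token IDs of shape (batch_size=2, seq_len=3); each ID must be < vocab_size.
token_ids = torch.tensor([[3, 7, 42], [0, 1, 99]])
vectors = embed(token_ids)
print(vectors.shape)  # torch.Size([2, 3, 16])
```

Each integer ID indexes one row of the learned weight matrix, so the output simply appends an embedding_dim axis to the input shape.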

Modules

positional
token_embed