olm.nn.embeddings.positional.absolute¶
Classes¶
| Class | Description |
|---|---|
| AbsolutePositionalEmbedding(*args, **kwargs) | Absolute (Learned) Positional Embedding. |
class olm.nn.embeddings.positional.absolute.AbsolutePositionalEmbedding(*args: Any, **kwargs: Any)¶
Bases: PositionalEmbeddingBase
Absolute (Learned) Positional Embedding.
This is the standard positional embedding used in the original Transformer paper and models like GPT-2. It learns a separate embedding vector for each position in the sequence, up to a maximum sequence length.
These embeddings are typically added to token embeddings before passing through the transformer blocks.
forward(x: torch.Tensor, seq_positions: torch.LongTensor | None = None) → torch.Tensor¶
Apply absolute positional embedding to input tensor x.
- Parameters:
  - x – token embeddings of shape (batch_size, seq_len, embed_dim)
  - seq_positions – optional tensor of shape (batch_size, seq_len) containing position indices. If None, positions 0..seq_len-1 are assumed for each sequence in the batch.
- Returns: Tensor of the same shape as x, with positional embeddings added.
class olm.nn.embeddings.positional.absolute.PositionalEmbeddingBase(*args: Any, **kwargs: Any)¶
Bases: Module, ABC
Abstract base class for all positional embedding implementations.
Positional embeddings add information about token positions in a sequence to help the model understand order and relative positions. Different positional embedding strategies have different properties:
- Learned (Absolute): Simple, effective, but limited to max_seq_len
- Sinusoidal: Deterministic, can extrapolate to longer sequences
- RoPE: Applied to Q/K directly, enables relative position modeling
- ALiBi: Adds bias to attention scores, excellent extrapolation
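The sinusoidal variant listed above can be sketched as follows, using the closed-form encoding from the original Transformer paper (an assumption about the formulation; olm's own sinusoidal implementation may differ in detail):

```python
import math
import torch

def sinusoidal_positions(seq_len: int, embed_dim: int) -> torch.Tensor:
    """Deterministic sinusoidal positional encodings.

    Because the values come from a formula rather than a learned table,
    seq_len can exceed any length seen during training, which is why
    sinusoidal encodings can extrapolate to longer sequences.
    """
    position = torch.arange(seq_len).unsqueeze(1).float()      # (seq_len, 1)
    # Geometric progression of frequencies across the even dimensions.
    div_term = torch.exp(torch.arange(0, embed_dim, 2).float()
                         * (-math.log(10000.0) / embed_dim))   # (embed_dim/2,)
    pe = torch.zeros(seq_len, embed_dim)
    pe[:, 0::2] = torch.sin(position * div_term)  # even dims: sine
    pe[:, 1::2] = torch.cos(position * div_term)  # odd dims: cosine
    return pe

enc = sinusoidal_positions(seq_len=50, embed_dim=16)
assert enc.shape == (50, 16)
```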
All positional embedding implementations should inherit from this base class and implement the forward method.
extra_repr() → str¶
String representation of the module for debugging.
Override this in subclasses to provide useful information.
abstractmethod forward(*args, **kwargs) → torch.Tensor¶
Apply positional information to input tensor(s).
The signature and behavior of this method vary by implementation:
- Some add to the token embeddings (Absolute, Sinusoidal)
- Some rotate query/key representations (RoPE)
- Some return a bias to add to attention scores (ALiBi)
- Returns: Transformed tensor(s) with positional information applied
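A new positional embedding would subclass PositionalEmbeddingBase and implement forward. The sketch below uses a hypothetical stand-in for the base class (the real one lives in olm.nn.embeddings.positional.absolute) and a deliberately trivial no-op subclass to show the inheritance pattern:

```python
from abc import ABC, abstractmethod

import torch
import torch.nn as nn

class PositionalEmbeddingBase(nn.Module, ABC):
    """Stand-in for olm's base class, for illustration only."""

    @abstractmethod
    def forward(self, *args, **kwargs) -> torch.Tensor:
        """Apply positional information to input tensor(s)."""

    def extra_repr(self) -> str:
        # Subclasses override this to surface useful debug information.
        return ""

class NoOpPositionalEmbedding(PositionalEmbeddingBase):
    """Hypothetical implementation that returns its input unchanged."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x

    def extra_repr(self) -> str:
        # Shows up inside repr(module), e.g. NoOpPositionalEmbedding(strategy=noop)
        return "strategy=noop"

emb = NoOpPositionalEmbedding()
x = torch.randn(2, 4, 8)
assert torch.equal(emb(x), x)
```

Because forward is marked @abstractmethod, attempting to instantiate a subclass that does not implement it raises a TypeError, which enforces the contract described above.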