olm.nn.embeddings.positional.rope

Classes

PartialRotaryPositionalEmbedding(*args, **kwargs) Partial Rotary Positional Embedding (LLaMA-style RoPE).
RotaryPositionalEmbedding(*args, **kwargs) Rotary Positional Embedding (RoPE) as described in “RoFormer: Enhanced Transformer with Rotary Position Embedding” (arXiv 2104.09864).

class olm.nn.embeddings.positional.rope.PartialRotaryPositionalEmbedding(*args: Any, **kwargs: Any)

Bases: PositionalEmbeddingBase

Partial Rotary Positional Embedding (LLaMA-style RoPE).

Only applies rotary embeddings to a fraction of the head dimensions, leaving the remaining dimensions unchanged. This is the approach used in models like LLaMA, where typically 25-50% of dimensions are rotated.

For example, with head_dim=128 and rotary_percentage=0.5, only the first 64 dimensions are rotated, while the last 64 dimensions pass through unchanged.
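The split described above can be sketched as a standalone function (a hypothetical `apply_partial_rope` helper, not olm's actual implementation): the first `rotary_percentage` of the head dimensions are rotated with the standard RoPE frequencies, and the rest are concatenated back unchanged.

```python
import torch

def apply_partial_rope(x: torch.Tensor, rotary_percentage: float = 0.5,
                       base: float = 10000.0) -> torch.Tensor:
    """Rotate only the first rotary_percentage of head_dim; pass the rest through.

    x: (batch_size, seq_len, num_heads, head_dim). Illustrative sketch only.
    """
    batch, seq_len, num_heads, head_dim = x.shape
    rot_dim = int(head_dim * rotary_percentage)
    x_rot, x_pass = x[..., :rot_dim], x[..., rot_dim:]

    # Standard RoPE frequencies, computed over the rotated dims only
    inv_freq = base ** (-torch.arange(0, rot_dim, 2, dtype=torch.float32) / rot_dim)
    pos = torch.arange(seq_len, dtype=torch.float32)
    angles = torch.outer(pos, inv_freq)            # (seq_len, rot_dim / 2)
    cos = angles.cos()[None, :, None, :]           # broadcast over batch and heads
    sin = angles.sin()[None, :, None, :]

    # Rotate consecutive (even, odd) dimension pairs
    x_even, x_odd = x_rot[..., 0::2], x_rot[..., 1::2]
    rotated = torch.stack(
        (x_even * cos - x_odd * sin, x_even * sin + x_odd * cos), dim=-1
    ).flatten(-2)
    return torch.cat((rotated, x_pass), dim=-1)
```

With `head_dim=128` and `rotary_percentage=0.5`, the last 64 output dimensions are bitwise identical to the input, and the rotation leaves per-token norms unchanged.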

forward(x: torch.Tensor, seq_positions: torch.LongTensor | None = None) → torch.Tensor

Apply partial rotary positional embedding to input tensor x.

  • Parameters:
  • x – shape (batch_size, seq_len, num_heads, head_dim)
  • seq_positions – optional tensor of shape (batch_size, seq_len) with position indices. If None, assumes positions are 0..seq_len-1 for each batch.
  • Returns: Tensor of same shape as x, with partial RoPE applied.

class olm.nn.embeddings.positional.rope.PositionalEmbeddingBase(*args: Any, **kwargs: Any)

Bases: Module, ABC

Abstract base class for all positional embedding implementations.

Positional embeddings add information about token positions in a sequence to help the model understand order and relative positions. Different positional embedding strategies have different properties:

  • Learned (Absolute): Simple, effective, but limited to max_seq_len
  • Sinusoidal: Deterministic, can extrapolate to longer sequences
  • RoPE: Applied to Q/K directly, enables relative position modeling
  • ALiBi: Adds bias to attention scores, excellent extrapolation

All positional embedding implementations should inherit from this base class and implement the forward method.
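A minimal subclass might look like the sketch below. Since olm's internals are not shown here, a stand-in `PositionalEmbeddingBase` is defined locally for illustration; the `SinusoidalEmbedding` subclass is a hypothetical example of the additive (Sinusoidal) strategy from the list above.

```python
from abc import ABC, abstractmethod

import torch
import torch.nn as nn


class PositionalEmbeddingBase(nn.Module, ABC):
    """Stand-in for olm's base class, for illustration only."""

    @abstractmethod
    def forward(self, *args, **kwargs) -> torch.Tensor:
        """Apply positional information to input tensor(s)."""


class SinusoidalEmbedding(PositionalEmbeddingBase):
    """Minimal additive sinusoidal embedding (illustrative)."""

    def __init__(self, d_model: int, max_seq_len: int = 2048, base: float = 10000.0):
        super().__init__()
        pos = torch.arange(max_seq_len, dtype=torch.float32)[:, None]
        inv_freq = base ** (-torch.arange(0, d_model, 2, dtype=torch.float32) / d_model)
        pe = torch.zeros(max_seq_len, d_model)
        pe[:, 0::2] = torch.sin(pos * inv_freq)
        pe[:, 1::2] = torch.cos(pos * inv_freq)
        self.register_buffer("pe", pe)         # not a parameter; moves with .to()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch_size, seq_len, d_model); add precomputed encodings
        return x + self.pe[: x.size(1)]

    def extra_repr(self) -> str:
        # Override to surface useful configuration in repr(module)
        return f"d_model={self.pe.size(1)}, max_seq_len={self.pe.size(0)}"
```

Overriding `extra_repr` as shown makes the configuration visible when the module is printed, which is the debugging hook the base class documents.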

extra_repr() → str

String representation of the module for debugging.

Override this in subclasses to provide useful information.

abstractmethod forward(*args, **kwargs) → torch.Tensor

Apply positional information to input tensor(s).

The signature and behavior of this method vary by implementation:

  • Some add to embeddings (Absolute, Sinusoidal)
  • Some rotate representations (RoPE)
  • Some return a bias to add to attention scores (ALiBi)

  • Returns: Transformed tensor(s) with positional information applied

class olm.nn.embeddings.positional.rope.RotaryPositionalEmbedding(*args: Any, **kwargs: Any)

Bases: PositionalEmbeddingBase

Rotary Positional Embedding (RoPE) as described in “RoFormer: Enhanced Transformer with Rotary Position Embedding” (arXiv 2104.09864).

This module precomputes sin/cos rotation frequencies for a given head dimension and then applies them to query/key representations by interleaving real/imaginary parts (or, equivalently, rotating pairs of dimensions).
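The real/imaginary interleaving can be sketched with PyTorch's complex tensor support (a hypothetical `rope_complex` helper, not olm's actual code): each consecutive dimension pair is treated as one complex number and multiplied by e^(i·m·θ_j) for position m and frequency θ_j.

```python
import torch

def rope_complex(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """RoPE via complex multiplication; dim pairs act as real/imag parts.

    x: (batch_size, seq_len, num_heads, head_dim), head_dim even. Sketch only.
    """
    batch, seq_len, num_heads, head_dim = x.shape
    inv_freq = base ** (-torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim)
    angles = torch.outer(torch.arange(seq_len, dtype=torch.float32), inv_freq)
    # Unit-magnitude complex rotations e^{i * m * theta_j}: (seq_len, head_dim/2)
    rot = torch.polar(torch.ones_like(angles), angles)
    # Pair up the last dim as (real, imag) and view as complex
    x_c = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
    out = x_c * rot[None, :, None, :]              # broadcast over batch/heads
    return torch.view_as_real(out).flatten(-2).type_as(x)
```

Because each pair is multiplied by a unit complex number, the transform is a pure rotation: per-token norms are preserved, and position 0 (angle 0) is the identity.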

forward(x: torch.Tensor, seq_positions: torch.LongTensor | None = None) → torch.Tensor

Apply rotary positional embedding to input tensor x.

  • Parameters:
  • x – shape (batch_size, seq_len, num_heads, head_dim)
  • seq_positions – optional tensor of shape (batch_size, seq_len) with position indices. If None, assumes positions are 0..seq_len-1 for each batch.
  • Returns: Tensor of same shape as x, with RoPE applied.
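The relative-position property that motivates RoPE can be checked directly: the dot product between a rotated query and a rotated key depends only on the offset between their positions, ⟨R_m q, R_n k⟩ = ⟨q, R_(n−m) k⟩. The sketch below (with a hypothetical standalone `rope` function, not the olm module itself) demonstrates this for single vectors.

```python
import torch

def rope(x: torch.Tensor, positions: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate (seq_len, head_dim) vectors by their positions. Sketch only."""
    seq_len, head_dim = x.shape
    inv_freq = base ** (-torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim)
    ang = positions.float()[:, None] * inv_freq    # (seq_len, head_dim / 2)
    cos, sin = ang.cos(), ang.sin()
    xe, xo = x[..., 0::2], x[..., 1::2]
    return torch.stack((xe * cos - xo * sin, xe * sin + xo * cos), dim=-1).flatten(-2)

torch.manual_seed(0)
q = torch.randn(1, 64)
k = torch.randn(1, 64)

def score(m: int, n: int) -> torch.Tensor:
    """Attention logit between query at position m and key at position n."""
    qm = rope(q, torch.tensor([m]))
    kn = rope(k, torch.tensor([n]))
    return (qm * kn).sum()

# Same offset (3), different absolute positions: scores match
assert torch.allclose(score(0, 3), score(10, 13), atol=1e-4)
```

This is why RoPE is applied to Q and K rather than added to the token embeddings: the position information ends up in the attention scores as a function of relative distance.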