olm.nn.embeddings.positional.rope

Classes

PartialRotaryPositionalEmbedding(*args, **kwargs) Partial Rotary Positional Embedding (LLaMA-style RoPE).
RotaryPositionalEmbedding(*args, **kwargs) Rotary Positional Embedding (RoPE) as described in “RoFormer: Enhanced Transformer with Rotary Position Embedding” (arXiv 2104.09864).

class olm.nn.embeddings.positional.rope.PartialRotaryPositionalEmbedding(*args: Any, **kwargs: Any)

Bases: PositionalEmbeddingBase

Partial Rotary Positional Embedding (LLaMA-style RoPE).

Only applies rotary embeddings to a fraction of the head dimensions, leaving the remaining dimensions unchanged. This is the approach used in models like LLaMA, where typically 25-50% of dimensions are rotated.

For example, with head_dim=128 and rotary_percentage=0.5, only the first 64 dimensions are rotated, while the last 64 dimensions pass through unchanged.
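The split described above can be sketched as a standalone function (a hypothetical `apply_partial_rope` helper, not olm's actual implementation): the first `rotary_percentage` of the head dimensions are rotated with the standard RoPE frequencies, and the rest are concatenated back unchanged.

```python
import torch

def apply_partial_rope(x: torch.Tensor, rotary_percentage: float = 0.5,
                       base: float = 10000.0) -> torch.Tensor:
    """Rotate only the first rotary_percentage of head_dim; pass the rest through.

    x: (batch_size, seq_len, num_heads, head_dim). Illustrative sketch only.
    """
    batch, seq_len, num_heads, head_dim = x.shape
    rot_dim = int(head_dim * rotary_percentage)
    x_rot, x_pass = x[..., :rot_dim], x[..., rot_dim:]

    # Standard RoPE frequencies, computed over the rotated dims only
    inv_freq = base ** (-torch.arange(0, rot_dim, 2, dtype=torch.float32) / rot_dim)
    pos = torch.arange(seq_len, dtype=torch.float32)
    angles = torch.outer(pos, inv_freq)            # (seq_len, rot_dim / 2)
    cos = angles.cos()[None, :, None, :]           # broadcast over batch and heads
    sin = angles.sin()[None, :, None, :]

    # Rotate consecutive (even, odd) dimension pairs
    x_even, x_odd = x_rot[..., 0::2], x_rot[..., 1::2]
    rotated = torch.stack(
        (x_even * cos - x_odd * sin, x_even * sin + x_odd * cos), dim=-1
    ).flatten(-2)
    return torch.cat((rotated, x_pass), dim=-1)
```

With `head_dim=128` and `rotary_percentage=0.5`, the last 64 output dimensions are bitwise identical to the input, and the rotation leaves per-token norms unchanged.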

forward(x: torch.Tensor, seq_positions: torch.LongTensor | None = None) → torch.Tensor

Apply partial rotary positional embedding to input tensor x.

  • Parameters:
  • x – shape (batch_size, seq_len, num_heads, head_dim)
  • seq_positions – optional tensor of shape (batch_size, seq_len) with position indices. If None, assumes positions are 0..seq_len-1 for each batch.
  • Returns: Tensor of same shape as x, with partial RoPE applied.

class olm.nn.embeddings.positional.rope.PositionalEmbeddingBase(*args: Any, **kwargs: Any)

Bases: Module, ABC

Abstract base class for all positional embedding implementations.

Positional embeddings add information about token positions in a sequence to help the model understand order and relative positions. Different positional embedding strategies have different properties:

  • Learned (Absolute): Simple, effective, but limited to max_seq_len
  • Sinusoidal: Deterministic, can extrapolate to longer sequences
  • RoPE: Applied to Q/K directly, enables relative position modeling
  • ALiBi: Adds bias to attention scores, excellent extrapolation

All positional embedding implementations should inherit from this base class and implement the forward method.
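A minimal subclass might look like the sketch below. Since olm's internals are not shown here, a stand-in `PositionalEmbeddingBase` is defined locally for illustration; the `SinusoidalEmbedding` subclass is a hypothetical example of the additive (Sinusoidal) strategy from the list above.

```python
from abc import ABC, abstractmethod

import torch
import torch.nn as nn


class PositionalEmbeddingBase(nn.Module, ABC):
    """Stand-in for olm's base class, for illustration only."""

    @abstractmethod
    def forward(self, *args, **kwargs) -> torch.Tensor:
        """Apply positional information to input tensor(s)."""


class SinusoidalEmbedding(PositionalEmbeddingBase):
    """Minimal additive sinusoidal embedding (illustrative)."""

    def __init__(self, d_model: int, max_seq_len: int = 2048, base: float = 10000.0):
        super().__init__()
        pos = torch.arange(max_seq_len, dtype=torch.float32)[:, None]
        inv_freq = base ** (-torch.arange(0, d_model, 2, dtype=torch.float32) / d_model)
        pe = torch.zeros(max_seq_len, d_model)
        pe[:, 0::2] = torch.sin(pos * inv_freq)
        pe[:, 1::2] = torch.cos(pos * inv_freq)
        self.register_buffer("pe", pe)         # not a parameter; moves with .to()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch_size, seq_len, d_model); add precomputed encodings
        return x + self.pe[: x.size(1)]

    def extra_repr(self) -> str:
        # Override to surface useful configuration in repr(module)
        return f"d_model={self.pe.size(1)}, max_seq_len={self.pe.size(0)}"
```

Overriding `extra_repr` as shown makes the configuration visible when the module is printed, which is the debugging hook the base class documents.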

extra_repr() → str

String representation of the module for debugging.

Override this in subclasses to provide useful information.

abstractmethod forward(*args, **kwargs) → torch.Tensor

Apply positional information to input tensor(s).

The signature and behavior of this method vary by implementation:

  • Some add to embeddings (Absolute, Sinusoidal)
  • Some rotate representations (RoPE)
  • Some return a bias to add to attention scores (ALiBi)

  • Returns: Transformed tensor(s) with positional information applied

class olm.nn.embeddings.positional.rope.RotaryPositionalEmbedding(*args: Any, **kwargs: Any)

Bases: PositionalEmbeddingBase

Rotary Positional Embedding (RoPE) as described in “RoFormer: Enhanced Transformer with Rotary Position Embedding” (arXiv 2104.09864).

This module precomputes sin/cos rotation frequencies for a given head dimension and then applies them to query/key representations by interleaving real/imaginary parts (or, equivalently, rotating pairs of dimensions).
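The real/imaginary interleaving can be sketched with PyTorch's complex tensor support (a hypothetical `rope_complex` helper, not olm's actual code): each consecutive dimension pair is treated as one complex number and multiplied by e^(i·m·θ_j) for position m and frequency θ_j.

```python
import torch

def rope_complex(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """RoPE via complex multiplication; dim pairs act as real/imag parts.

    x: (batch_size, seq_len, num_heads, head_dim), head_dim even. Sketch only.
    """
    batch, seq_len, num_heads, head_dim = x.shape
    inv_freq = base ** (-torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim)
    angles = torch.outer(torch.arange(seq_len, dtype=torch.float32), inv_freq)
    # Unit-magnitude complex rotations e^{i * m * theta_j}: (seq_len, head_dim/2)
    rot = torch.polar(torch.ones_like(angles), angles)
    # Pair up the last dim as (real, imag) and view as complex
    x_c = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
    out = x_c * rot[None, :, None, :]              # broadcast over batch/heads
    return torch.view_as_real(out).flatten(-2).type_as(x)
```

Because each pair is multiplied by a unit complex number, the transform is a pure rotation: per-token norms are preserved, and position 0 (angle 0) is the identity.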

forward(x: torch.Tensor, seq_positions: torch.LongTensor | None = None) → torch.Tensor

Apply rotary positional embedding to input tensor x.

  • Parameters:
  • x – shape (batch_size, seq_len, num_heads, head_dim)
  • seq_positions – optional tensor of shape (batch_size, seq_len) with position indices. If None, assumes positions are 0..seq_len-1 for each batch.
  • Returns: Tensor of same shape as x, with RoPE applied.
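The relative-position property that motivates RoPE can be checked directly: the dot product between a rotated query and a rotated key depends only on the offset between their positions, ⟨R_m q, R_n k⟩ = ⟨q, R_(n−m) k⟩. The sketch below (with a hypothetical standalone `rope` function, not the olm module itself) demonstrates this for single vectors.

```python
import torch

def rope(x: torch.Tensor, positions: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate (seq_len, head_dim) vectors by their positions. Sketch only."""
    seq_len, head_dim = x.shape
    inv_freq = base ** (-torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim)
    ang = positions.float()[:, None] * inv_freq    # (seq_len, head_dim / 2)
    cos, sin = ang.cos(), ang.sin()
    xe, xo = x[..., 0::2], x[..., 1::2]
    return torch.stack((xe * cos - xo * sin, xe * sin + xo * cos), dim=-1).flatten(-2)

torch.manual_seed(0)
q = torch.randn(1, 64)
k = torch.randn(1, 64)

def score(m: int, n: int) -> torch.Tensor:
    """Attention logit between query at position m and key at position n."""
    qm = rope(q, torch.tensor([m]))
    kn = rope(k, torch.tensor([n]))
    return (qm * kn).sum()

# Same offset (3), different absolute positions: scores match
assert torch.allclose(score(0, 3), score(10, 13), atol=1e-4)
```

This is why RoPE is applied to Q and K rather than added to the token embeddings: the position information ends up in the attention scores as a function of relative distance.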