olm.nn.blocks.output_head

Classes

OutputHead(*args, **kwargs) – Final output projection layer for the Language Model.

class olm.nn.blocks.output_head.Block(*args: Any, **kwargs: Any)

Bases: Module

Lightweight sequential container for composable submodules.

Similar to nn.Sequential, but exposes the underlying list for inspection or dynamic manipulation by higher-level builders.

  • Parameters: blocks – Ordered list of modules applied to the input in sequence.

blocks

ModuleList storing the ordered blocks.

forward(x: torch.Tensor) → torch.Tensor

Apply each block to the input in sequence.

  • Parameters: x – Input tensor.
  • Returns: Output tensor after all blocks have been applied.
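The behaviour described above can be illustrated with a minimal sketch. The class below, SketchBlock, is a hypothetical stand-in written for this example, not the olm implementation; it assumes only what the description states, namely that the submodules are held in a ModuleList named blocks and applied in order.

```python
import torch
from torch import nn


class SketchBlock(nn.Module):
    """Hypothetical stand-in for Block; not the olm implementation."""

    def __init__(self, blocks):
        super().__init__()
        # ModuleList registers each submodule so its parameters are tracked.
        self.blocks = nn.ModuleList(blocks)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Apply each block in order, feeding each output into the next block.
        for block in self.blocks:
            x = block(x)
        return x


block = SketchBlock([nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4)])
out = block(torch.randn(2, 8))

# Because the list is exposed, it can be inspected or extended after construction.
block.blocks.append(nn.Tanh())
out_after = block(torch.randn(2, 8))
```

Unlike nn.Sequential, where submodules are usually reached by indexing the container itself, this layout keeps them in a named attribute, which may be more convenient for builders that rewrite the list.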

class olm.nn.blocks.output_head.LayerNorm(*args: Any, **kwargs: Any)

Bases: NormBase

Layer Normalization layer.

Implements Layer Normalization as described in “Layer Normalization” (https://arxiv.org/abs/1607.06450). Normalizes the input across the feature dimension (the last dimension, of size d_model).

  • Parameters:
  • d_model (int) – Size of the feature dimension to normalize over.
  • eps (float, optional) – Small constant added to the variance for numerical stability. Defaults to 1e-5.
  • device (torch.device, optional) – Target device.
  • dtype (torch.dtype, optional) – Target data type.

gamma

Learnable scale parameter.

  • Type: nn.Parameter

beta

Learnable shift parameter.

  • Type: nn.Parameter

forward(x: torch.Tensor) → torch.Tensor

Forward pass of LayerNorm.

  • Parameters: x (torch.Tensor) – Input tensor of shape (batch_size, sequence_length, d_model).
  • Returns: Normalized output tensor of the same shape.
  • Return type: torch.Tensor
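As a rough illustration of the computation, the sketch below follows the formulation in the cited paper: each feature vector is centred, scaled by its standard deviation, then multiplied by gamma and shifted by beta. The function name layer_norm_sketch is hypothetical, and the use of the biased variance and the initial values gamma = 1, beta = 0 are assumptions based on common practice, not details confirmed for olm.

```python
import torch
import torch.nn.functional as F


def layer_norm_sketch(x, gamma, beta, eps=1e-5):
    """Hypothetical reference computation; not the olm implementation."""
    # Statistics are taken over the last (feature) dimension only.
    mean = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, keepdim=True, unbiased=False)
    # eps keeps the division stable when the variance is close to zero.
    x_hat = (x - mean) / torch.sqrt(var + eps)
    # Learnable scale (gamma) and shift (beta), broadcast over batch and sequence.
    return gamma * x_hat + beta


d_model = 16
x = torch.randn(2, 5, d_model)   # (batch_size, sequence_length, d_model)
gamma = torch.ones(d_model)      # assumed initial scale
beta = torch.zeros(d_model)      # assumed initial shift

y = layer_norm_sketch(x, gamma, beta)
# PyTorch's built-in functional form, used here only as a cross-check.
reference = F.layer_norm(x, (d_model,), gamma, beta, eps=1e-5)
```

The output has the same shape as the input, and each position's feature vector is normalized independently of the rest of the batch.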

class olm.nn.blocks.output_head.Linear(*args: Any, **kwargs: Any)

Bases: Linear

forward(x)

class olm.nn.blocks.output_head.OutputHead(*args: Any, **kwargs: Any)

Bases: Block

Final output projection layer for the Language Model.

Applies a LayerNorm to the final hidden states, then a Linear projection from embed_dim to vocab_size. Structure: LayerNorm -> Linear(vocab_size).

  • Parameters:
  • embed_dim (int) – The dimension of the embedding space.
  • vocab_size (int) – The size of the vocabulary.
  • bias (bool, optional) – Whether to include bias in the linear layer. Defaults to False.

layers

The normalization and linear layers.

  • Type: nn.ModuleList
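The LayerNorm -> Linear(vocab_size) structure can be sketched as follows. SketchOutputHead is a hypothetical stand-in that uses torch.nn.LayerNorm and torch.nn.Linear in place of the olm classes documented above; the constructor arguments mirror the documented parameters, but the internals are an assumption.

```python
import torch
from torch import nn


class SketchOutputHead(nn.Module):
    """Hypothetical stand-in for OutputHead; not the olm implementation."""

    def __init__(self, embed_dim: int, vocab_size: int, bias: bool = False):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.LayerNorm(embed_dim),                      # normalize hidden states
            nn.Linear(embed_dim, vocab_size, bias=bias),  # project to vocabulary logits
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x)
        return x


head = SketchOutputHead(embed_dim=32, vocab_size=100)
hidden = torch.randn(2, 5, 32)   # (batch_size, sequence_length, embed_dim)
logits = head(hidden)            # (batch_size, sequence_length, vocab_size)
```

The result holds one unnormalized score per vocabulary entry at each position; a softmax or cross-entropy loss would typically be applied downstream.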