
olm.nn.norms

class olm.nn.norms.LayerNorm(*args: Any, **kwargs: Any)

Bases: NormBase

Layer Normalization layer.

Implements Layer Normalization as described in “Layer Normalization” (https://arxiv.org/abs/1607.06450). Normalizes the input across the features dimension.

  • Parameters:
  • d_model (int) – The dimension of the model to normalize.
  • eps (float, optional) – Small constant for numerical stability. Defaults to 1e-5.
  • device (torch.device, optional) – Target device.
  • dtype (torch.dtype, optional) – Target data type.

gamma

Learnable scale parameter.

  • Type: nn.Parameter

beta

Learnable shift parameter.

  • Type: nn.Parameter

forward(x: torch.Tensor) → torch.Tensor

Forward pass of LayerNorm.

  • Parameters: x (torch.Tensor) – Input tensor of shape (batch_size, sequence_length, d_model).
  • Returns: Normalized output tensor of the same shape.
  • Return type: torch.Tensor
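The normalization described above can be sketched in plain PyTorch. This is a minimal reference illustration of the math, not olm's actual implementation; the function name `layer_norm_ref` is hypothetical.

```python
import torch

def layer_norm_ref(x: torch.Tensor, gamma: torch.Tensor, beta: torch.Tensor,
                   eps: float = 1e-5) -> torch.Tensor:
    # Normalize across the features (last) dimension, as the docs describe:
    # subtract the per-position mean, divide by the per-position std,
    # then apply the learnable scale (gamma) and shift (beta).
    mean = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, keepdim=True, unbiased=False)
    return gamma * (x - mean) / torch.sqrt(var + eps) + beta

# Usage: input of shape (batch_size, sequence_length, d_model)
d_model = 8
x = torch.randn(2, 4, d_model)
gamma = torch.ones(d_model)   # scale, initialized to 1
beta = torch.zeros(d_model)   # shift, initialized to 0
y = layer_norm_ref(x, gamma, beta)
```

With gamma at 1 and beta at 0, each d_model-sized slice of the output has zero mean and unit variance; this matches `torch.nn.functional.layer_norm` with the same eps.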

class olm.nn.norms.RMSNorm(*args: Any, **kwargs: Any)

Bases: NormBase

RMSNorm (Root Mean Square Layer Normalization) layer.

Implements RMSNorm as described in “Root Mean Square Layer Normalization” (https://arxiv.org/abs/1910.07467). A simplified version of LayerNorm that skips mean centering, normalizing only by the root mean square of the features, which preserves LayerNorm’s re-scaling invariance.

  • Parameters:
  • d_model (int) – The dimension of the model to normalize.
  • eps (float, optional) – Small constant for numerical stability. Defaults to 1e-5.
  • device (torch.device, optional) – Target device.
  • dtype (torch.dtype, optional) – Target data type.

weight

Learnable scale parameter.

  • Type: nn.Parameter

forward(x: torch.Tensor) → torch.Tensor

Forward pass of RMSNorm.

  • Parameters: x (torch.Tensor) – Input tensor of shape (batch_size, sequence_length, d_model).
  • Returns: Normalized output tensor of the same shape.
  • Return type: torch.Tensor
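The RMS normalization above can likewise be sketched in plain PyTorch. Again this is a minimal reference illustration of the math, not olm's actual implementation; `rms_norm_ref` is a hypothetical name.

```python
import torch

def rms_norm_ref(x: torch.Tensor, weight: torch.Tensor,
                 eps: float = 1e-5) -> torch.Tensor:
    # Divide by the root mean square over the features (last) dimension,
    # then apply the learnable scale. Unlike LayerNorm, no mean is
    # subtracted and there is no shift parameter.
    rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
    return weight * x / rms

# Usage: input of shape (batch_size, sequence_length, d_model)
d_model = 8
x = torch.randn(2, 4, d_model)
weight = torch.ones(d_model)  # scale, initialized to 1
y = rms_norm_ref(x, weight)
```

The re-scaling invariance is easy to see here: multiplying x by a constant c scales both the numerator and the RMS by c (up to eps), so the output is essentially unchanged.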

Modules

base
layer_norm
rms_norm