olm.nn.activations

class olm.nn.activations.ActivationBase(*args: Any, **kwargs: Any)

Bases: Module, ABC

Abstract base class for all activation functions.

Ensures a consistent interface for activation layers, handling device and dtype initialization. Subclasses must implement the forward method.

device

The device the module is on.

  • Type: torch.device, optional

dtype

The data type of the module parameters.

  • Type: torch.dtype

abstractmethod forward(x: torch.Tensor) → torch.Tensor

Apply activation to x.
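As a sketch of the interface described above, a subclass only needs to implement forward. The stand-in base below mirrors the documented contract but is an assumption, not olm's actual implementation (olm's base additionally handles device and dtype initialization), and Square is a hypothetical example activation:

```python
from abc import ABC, abstractmethod

import torch
import torch.nn as nn


# Minimal stand-in for the documented interface; olm's actual base
# class also handles device/dtype initialization.
class _ActivationBase(nn.Module, ABC):
    @abstractmethod
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Apply activation to x."""


class Square(_ActivationBase):
    # Hypothetical example activation, not part of olm.
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * x


out = Square()(torch.tensor([1.0, -2.0, 3.0]))
```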

class olm.nn.activations.ELU(*args: Any, **kwargs: Any)

Bases: ActivationBase

ELU activation wrapper.

forward(x: torch.Tensor) → torch.Tensor

Apply activation to x.

class olm.nn.activations.GELU(*args: Any, **kwargs: Any)

Bases: ActivationBase

GELU activation wrapper.

forward(x: torch.Tensor) → torch.Tensor

Apply activation to x.

class olm.nn.activations.GLU(*args: Any, **kwargs: Any)

Bases: ActivationBase

GLU activation wrapper.

forward(x: torch.Tensor) → torch.Tensor

Apply activation to x.

class olm.nn.activations.GeGLU(*args: Any, **kwargs: Any)

Bases: ActivationBase

GeGLU activation function.

Implements the GeGLU variant from “GLU Variants Improve Transformer”:

GeGLU(x, W, V) = GELU(xW) * (xV)

Here: GeGLU(x) = GELU(gate) * value

  • Parameters:
  • device (torch.device, optional) – Target device.
  • dtype (torch.dtype, optional) – Target data type.

forward(x: torch.Tensor) → torch.Tensor

Forward pass of GeGLU.

  • Parameters: x (torch.Tensor) – Input tensor.
  • Returns: Output tensor with half the last dimension.
  • Return type: torch.Tensor
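The documented behavior can be sketched in plain torch. This mirrors the formula above (chunk the pre-projected input, GELU the gate half), not olm's actual source:

```python
import torch
import torch.nn.functional as F


def geglu(x: torch.Tensor) -> torch.Tensor:
    # Split the pre-projected input into gate and value halves,
    # then gate with GELU: GeGLU(x) = GELU(gate) * value.
    gate, value = x.chunk(2, dim=-1)
    return F.gelu(gate) * value


x = torch.randn(4, 8)
y = geglu(x)  # last dimension halved: (4, 8) -> (4, 4)
```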

class olm.nn.activations.Identity(*args: Any, **kwargs: Any)

Bases: ActivationBase

Identity activation wrapper.

forward(x: torch.Tensor) → torch.Tensor

Apply activation to x.

class olm.nn.activations.LeakyReLU(*args: Any, **kwargs: Any)

Bases: ActivationBase

LeakyReLU activation wrapper.

forward(x: torch.Tensor) → torch.Tensor

Apply activation to x.

class olm.nn.activations.LiGLU(*args: Any, **kwargs: Any)

Bases: ActivationBase

LiGLU activation function.

Implements the LiGLU (Linear GLU) variant:

LiGLU(x, W, V) = (xW) * (xV)

Here: LiGLU(x) = gate * value (no activation on the gate)

  • Parameters:
  • device (torch.device, optional) – Target device.
  • dtype (torch.dtype, optional) – Target data type.

forward(x: torch.Tensor) → torch.Tensor

Forward pass of LiGLU.

  • Parameters: x (torch.Tensor) – Input tensor.
  • Returns: Output tensor with half the last dimension.
  • Return type: torch.Tensor

class olm.nn.activations.Mish(*args: Any, **kwargs: Any)

Bases: ActivationBase

Mish activation wrapper.

forward(x: torch.Tensor) → torch.Tensor

Apply activation to x.

class olm.nn.activations.PReLU(*args: Any, **kwargs: Any)

Bases: ActivationBase

PReLU activation wrapper.

forward(x: torch.Tensor) → torch.Tensor

Apply activation to x.

class olm.nn.activations.ReGLU(*args: Any, **kwargs: Any)

Bases: ActivationBase

ReGLU activation function.

Implements the ReGLU variant from “GLU Variants Improve Transformer”:

ReGLU(x, W, V) = ReLU(xW) * (xV)

Here: ReGLU(x) = ReLU(gate) * value

  • Parameters:
  • device (torch.device, optional) – Target device.
  • dtype (torch.dtype, optional) – Target data type.

forward(x: torch.Tensor) → torch.Tensor

Forward pass of ReGLU.

  • Parameters: x (torch.Tensor) – Input tensor.
  • Returns: Output tensor with half the last dimension.
  • Return type: torch.Tensor
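ReGLU and LiGLU follow the same chunk-and-gate pattern as the other GLU variants documented here; only the gate nonlinearity changes. A plain-torch sketch of the documented formulas (assumed, not olm's source):

```python
import torch
import torch.nn.functional as F


def reglu(x: torch.Tensor) -> torch.Tensor:
    # ReGLU(x) = ReLU(gate) * value over a chunked input.
    gate, value = x.chunk(2, dim=-1)
    return F.relu(gate) * value


def liglu(x: torch.Tensor) -> torch.Tensor:
    # LiGLU applies no activation to the gate: gate * value.
    gate, value = x.chunk(2, dim=-1)
    return gate * value
```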

class olm.nn.activations.ReLU(*args: Any, **kwargs: Any)

Bases: ActivationBase

ReLU activation wrapper.

forward(x: torch.Tensor) → torch.Tensor

Apply activation to x.

class olm.nn.activations.SELU(*args: Any, **kwargs: Any)

Bases: ActivationBase

SELU activation wrapper.

forward(x: torch.Tensor) → torch.Tensor

Apply activation to x.

class olm.nn.activations.SiLU(*args: Any, **kwargs: Any)

Bases: ActivationBase

SiLU (Swish) activation wrapper.

forward(x: torch.Tensor) → torch.Tensor

Apply activation to x.

class olm.nn.activations.Sigmoid(*args: Any, **kwargs: Any)

Bases: ActivationBase

Sigmoid activation wrapper.

forward(x: torch.Tensor) → torch.Tensor

Apply activation to x.

class olm.nn.activations.Softmax(*args: Any, **kwargs: Any)

Bases: ActivationBase

Softmax activation wrapper.

forward(x: torch.Tensor) → torch.Tensor

Apply activation to x.

class olm.nn.activations.Softplus(*args: Any, **kwargs: Any)

Bases: ActivationBase

Softplus activation wrapper.

forward(x: torch.Tensor) → torch.Tensor

Apply activation to x.

class olm.nn.activations.SwiGLU(*args: Any, **kwargs: Any)

Bases: ActivationBase

SwiGLU activation function.

Implements the SwiGLU activation as described in “GLU Variants Improve Transformer”. It applies the SiLU activation to one half of the input (the gate) and multiplies it by the other half (the value).

Equation:

SwiGLU(x, W, V) = Swish_1(xW) * (xV)

Here, the input x is assumed to be already projected/concatenated so that it can be chunked in half:

SwiGLU(x) = x_1 * SiLU(x_2), where x = [x_1, x_2]

  • Parameters:
  • device (torch.device, optional) – Target device.
  • dtype (torch.dtype, optional) – Target data type.

forward(x: torch.Tensor) → torch.Tensor

Forward pass of SwiGLU.

  • Parameters: x (torch.Tensor) – Input tensor. Expected to have an even last dimension size.
  • Returns: Output tensor with half the last dimension of the input.
  • Return type: torch.Tensor
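The chunking behavior described above can be sketched in plain torch, following the documented formula SwiGLU(x) = x_1 * SiLU(x_2) with x = [x_1, x_2] (a sketch, not olm's actual implementation):

```python
import torch
import torch.nn.functional as F


def swiglu(x: torch.Tensor) -> torch.Tensor:
    # Chunk the pre-projected input as x = [x_1, x_2] and gate
    # x_1 with SiLU(x_2), per the documented formula.
    x1, x2 = x.chunk(2, dim=-1)
    return x1 * F.silu(x2)


x = torch.randn(2, 6)  # even last dimension required
y = swiglu(x)          # (2, 6) -> (2, 3)
```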

class olm.nn.activations.Tanh(*args: Any, **kwargs: Any)

Bases: ActivationBase

Tanh activation wrapper.

forward(x: torch.Tensor) → torch.Tensor

Apply activation to x.

Modules

base
clu
elu
geglu
gelu
glu
identity
leaky_relu
liglu
mish
prelu
reglu
relu
selu
sigmoid
silu
softmax
softplus
swiglu
tanh