olm.nn.feedforward.classic_ffn

Classes

ClassicFFN(*args, **kwargs) — Standard Multi-Layer Perceptron (MLP) used in Transformer blocks.

class olm.nn.feedforward.classic_ffn.ClassicFFN(*args: Any, **kwargs: Any)

Bases: FeedForwardBase

Standard Multi-Layer Perceptron (MLP) used in Transformer blocks.

Implements a position-wise feed-forward network consisting of two linear transformations with a non-linear activation function in between.

Structure: Input -> Linear(embed_dim -> hidden_dim) -> Activation -> Dropout -> Linear(hidden_dim -> embed_dim) -> Dropout

hidden_dim

Dimension of the inner hidden layer.

  • Type: int

up_proj

Projection from embedding dim to hidden dim.

act

Activation function.

  • Type: nn.Module

down_proj

Projection from hidden dim to embedding dim.

dropout

Dropout layer.

  • Type: nn.Dropout

forward(x)

Forward pass of the feedforward network.

  • Parameters: x (torch.Tensor) – Input tensor of shape (batch, seq_len, embed_dim).
  • Returns: Output tensor of shape (batch, seq_len, embed_dim).
  • Return type: torch.Tensor
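The structure described above can be sketched as a standalone PyTorch module. This is a minimal illustrative equivalent, not the olm implementation: the class name `ClassicFFNSketch` and the constructor signature (`embed_dim`, `hidden_dim`, `p_drop`) are assumptions, since the actual `ClassicFFN` takes `*args, **kwargs`.

```python
import torch
import torch.nn as nn


class ClassicFFNSketch(nn.Module):
    """Sketch of the documented structure:
    Linear(embed_dim -> hidden_dim) -> Activation -> Dropout
    -> Linear(hidden_dim -> embed_dim) -> Dropout.
    """

    def __init__(self, embed_dim: int, hidden_dim: int, p_drop: float = 0.1):
        super().__init__()
        self.up_proj = nn.Linear(embed_dim, hidden_dim)
        self.act = nn.GELU()
        self.down_proj = nn.Linear(hidden_dim, embed_dim)
        self.dropout = nn.Dropout(p_drop)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (batch, seq_len, embed_dim) -> (batch, seq_len, hidden_dim)
        h = self.dropout(self.act(self.up_proj(x)))
        # (batch, seq_len, hidden_dim) -> (batch, seq_len, embed_dim)
        return self.dropout(self.down_proj(h))


x = torch.randn(2, 8, 64)
ffn = ClassicFFNSketch(embed_dim=64, hidden_dim=256)
assert ffn(x).shape == x.shape  # output shape matches input shape
```

Note that the input and output dimensions are identical, so the block can be dropped into a residual connection without any reshaping.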

class olm.nn.feedforward.classic_ffn.FeedForwardBase(*args: Any, **kwargs: Any)

Bases: Module, ABC

Abstract base class for feedforward networks in a transformer block.

Defines the interface for FFNs/MLPs. Subclasses must implement the forward method.

embed_dim

The input and output dimension.

  • Type: int

abstractmethod forward(x: torch.Tensor) → torch.Tensor

Forward pass of the feedforward network.

  • Parameters: x (torch.Tensor) – Input tensor of shape (batch, seq_len, embed_dim).
  • Returns: Output tensor of shape (batch, seq_len, embed_dim).
  • Return type: torch.Tensor
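A subclass only has to implement `forward` while preserving the (batch, seq_len, embed_dim) contract. The sketch below mirrors that interface with a stand-in base class (`FeedForwardBaseSketch` and `IdentityFFN` are hypothetical names, not part of olm):

```python
from abc import ABC, abstractmethod

import torch
import torch.nn as nn


class FeedForwardBaseSketch(nn.Module, ABC):
    """Stand-in for FeedForwardBase: stores embed_dim and requires forward()."""

    def __init__(self, embed_dim: int):
        super().__init__()
        self.embed_dim = embed_dim

    @abstractmethod
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Map (batch, seq_len, embed_dim) -> (batch, seq_len, embed_dim)."""


class IdentityFFN(FeedForwardBaseSketch):
    """Trivial subclass: satisfies the interface without transforming x."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x


x = torch.randn(1, 4, 32)
assert IdentityFFN(32)(x).shape == x.shape
```

Because `forward` is abstract, instantiating the base class directly raises a `TypeError`; only concrete subclasses can be constructed.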

class olm.nn.feedforward.classic_ffn.GELU(*args: Any, **kwargs: Any)

Bases: ActivationBase

GELU activation wrapper.

forward(x: torch.Tensor) → torch.Tensor

Apply activation to x.

class olm.nn.feedforward.classic_ffn.Linear(*args: Any, **kwargs: Any)

Bases: Linear

forward(x)