olm.nn.feedforward¶
class olm.nn.feedforward.ClassicFFN(*args: Any, **kwargs: Any)¶
Bases: FeedForwardBase
Standard Multi-Layer Perceptron (MLP) used in Transformer blocks.
Implements a position-wise feed-forward network consisting of two linear transformations with a non-linear activation function in between.
Structure: Input -> Linear(embed_dim -> hidden_dim) -> Activation -> Dropout -> Linear(hidden_dim -> embed_dim) -> Dropout
hidden_dim¶
Dimension of the inner hidden layer.
- Type: int
up_proj¶
Projection from embedding dim to hidden dim.
- Type: Linear
act¶
Activation function.
- Type: nn.Module
down_proj¶
Projection from hidden dim to embedding dim.
- Type: Linear
dropout¶
Dropout layer.
- Type: nn.Dropout
forward(x)¶
Forward pass of the feedforward network.
- Parameters: x (torch.Tensor) – Input tensor of shape (batch, seq_len, embed_dim).
- Returns: Output tensor of shape (batch, seq_len, embed_dim).
- Return type: torch.Tensor
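Since ClassicFFN's constructor signature is not listed above, the documented structure can be sketched in plain PyTorch as follows; the class name, the GELU activation, and the constructor parameters are illustrative assumptions, not the library's actual API.

```python
import torch
import torch.nn as nn

class ClassicFFNSketch(nn.Module):
    """Minimal sketch of the structure documented for ClassicFFN.

    Note: the activation choice (GELU) is an assumption; the docs only
    say "Activation function".
    """

    def __init__(self, embed_dim: int, hidden_dim: int, dropout: float = 0.0):
        super().__init__()
        self.up_proj = nn.Linear(embed_dim, hidden_dim)    # embed_dim -> hidden_dim
        self.act = nn.GELU()
        self.down_proj = nn.Linear(hidden_dim, embed_dim)  # hidden_dim -> embed_dim
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Linear -> Activation -> Dropout -> Linear -> Dropout
        return self.dropout(self.down_proj(self.dropout(self.act(self.up_proj(x)))))

ffn = ClassicFFNSketch(embed_dim=64, hidden_dim=256)
out = ffn(torch.randn(2, 10, 64))
print(out.shape)  # torch.Size([2, 10, 64])
```

The shape contract matches the `forward` docs: the last dimension goes embed_dim -> hidden_dim -> embed_dim, so input and output shapes are identical.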
class olm.nn.feedforward.FeedForwardBase(*args: Any, **kwargs: Any)¶
Bases: Module, ABC
Abstract base class for feedforward networks in a transformer block.
Defines the interface for FFNs/MLPs. Subclasses must implement the forward method.
embed_dim¶
The input and output dimension.
- Type: int
abstractmethod forward(x: torch.Tensor) → torch.Tensor¶
Forward pass of the feedforward network.
- Parameters: x (torch.Tensor) – Input tensor of shape (batch, seq_len, embed_dim).
- Returns: Output tensor of shape (batch, seq_len, embed_dim).
- Return type: torch.Tensor
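A hypothetical minimal subclass illustrating the base-class contract (store embed_dim, implement forward, preserve the (batch, seq_len, embed_dim) shape). Plain nn.Module stands in for FeedForwardBase here since olm itself is not imported; the class and attribute names are illustrative.

```python
import torch
import torch.nn as nn

# In real code this would inherit from olm.nn.feedforward.FeedForwardBase
# and thereby be forced (via ABC) to override forward().
class TinyFFN(nn.Module):
    def __init__(self, embed_dim: int):
        super().__init__()
        self.embed_dim = embed_dim  # input and output dimension, per the base class
        self.proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Must map (batch, seq_len, embed_dim) -> (batch, seq_len, embed_dim)
        return self.proj(x)
```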
class olm.nn.feedforward.GeGLUFFN(*args: Any, **kwargs: Any)¶
Bases: FeedForwardBase
Feed-Forward Network using GeGLU activation.
Implements: x = DownProj(GeGLU(UpProj(x))). UpProj expands to 2 * hidden_dim to support splitting for the gate.
- Parameters:
- embed_dim (int) – Input dimension.
- hidden_dim (int , optional) – Hidden dimension. Defaults to 4 * embed_dim if None.
- dropout (float , optional) – Dropout probability. Defaults to 0.0.
- bias (bool , optional) – Whether to use bias in linear layers. Defaults to True.
- ff_multiplier (float , optional) – Expansion factor if hidden_dim is None. Defaults to 4.0.
forward(x)¶
Forward pass of the feedforward network.
- Parameters: x (torch.Tensor) – Input tensor of shape (batch, seq_len, embed_dim).
- Returns: Output tensor of shape (batch, seq_len, embed_dim).
- Return type: torch.Tensor
class olm.nn.feedforward.SwiGLUFFN(*args: Any, **kwargs: Any)¶
Bases: FeedForwardBase
SwiGLU-based feed-forward network used in modern Transformers (e.g., LLaMA, PaLM).
This layer implements the gated linear unit with Swish (SiLU) activation, which has been shown to improve performance over standard GELU/ReLU FFNs.
Structure: Input -> Linear(embed_dim -> 2 * hidden_dim) [splits into Gate and Value] -> SwiGLU(Gate * SiLU(Value)) -> Linear(hidden_dim -> embed_dim) -> Dropout
- Parameters:
- embed_dim (int) – The dimension of the input and output.
- hidden_dim (int , optional) – The intermediate inner dimension. If None, defaults to int(ff_multiplier * embed_dim).
- dropout (float , optional) – Dropout probability. Defaults to 0.0.
- bias (bool , optional) – Whether to use bias in linear layers. Defaults to True.
- ff_multiplier (float , optional) – Multiplier for default hidden dimension. Defaults to 2.5 (commonly 8/3 for SwiGLU).
up_proj¶
Projects and splits input into gate and value parts.
- Type: Linear
act¶
The activation function.
- Type: SwiGLU
down_proj¶
Projects back to embedding dimension.
- Type: Linear
dropout¶
Dropout layer.
- Type: nn.Dropout
forward(x)¶
Forward pass of the feedforward network.
- Parameters: x (torch.Tensor) – Input tensor of shape (batch, seq_len, embed_dim).
- Returns: Output tensor of shape (batch, seq_len, embed_dim).
- Return type: torch.Tensor
Modules¶
- base
- classic_ffn
- geglu_ffn
- swiglu_ffn