Source: src/olm/nn/feedforward/__init__.py:1
Classes
ClassicFFN(embed_dim, hidden_dim=None, activation_fn=None, dropout=0.0, bias=True)
Bases: olm.nn.feedforward.base.FeedForwardBase
Source: src/olm/nn/feedforward/classic_ffn.py:7
Standard Multi-Layer Perceptron (MLP) used in Transformer blocks.
Implements a position-wise feed-forward network consisting of two linear transformations with a non-linear activation function in between.
Structure
Input -> Linear(embed_dim -> hidden_dim) -> Activation -> Dropout -> Linear(hidden_dim -> embed_dim) -> Dropout
Attributes
hidden_dim(int): Dimension of the inner hidden layer.
up_proj(Linear): Projection from embedding dim to hidden dim.
act(nn.Module): Activation function.
down_proj(Linear): Projection from hidden dim to embedding dim.
dropout(nn.Dropout): Dropout layer.
Methods
forward(self, x: torch.Tensor) -> torch.Tensor
Source: src/olm/nn/feedforward/classic_ffn.py:51
Apply the position-wise feed-forward network.
Parameters
x(torch.Tensor): Hidden states shaped [batch, seq_len, embed_dim].
Returns
torch.Tensor: Hidden states shaped [batch, seq_len, embed_dim].
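The structure listed for ClassicFFN can be sketched in plain PyTorch as below. This is a minimal illustration, not the library's implementation: the class name `SketchFFN` is hypothetical, and the fallbacks for `hidden_dim` (4 * embed_dim) and `activation_fn` (GELU) are assumptions, since this page does not state what the real defaults resolve to.

```python
import torch
from torch import nn


class SketchFFN(nn.Module):
    """Hypothetical stand-in that mirrors the documented ClassicFFN structure."""

    def __init__(self, embed_dim, hidden_dim=None, activation_fn=None,
                 dropout=0.0, bias=True):
        super().__init__()
        # Assumption: fall back to 4 * embed_dim when hidden_dim is None.
        self.hidden_dim = hidden_dim if hidden_dim is not None else 4 * embed_dim
        self.up_proj = nn.Linear(embed_dim, self.hidden_dim, bias=bias)
        # Assumption: GELU when no activation is supplied.
        self.act = activation_fn if activation_fn is not None else nn.GELU()
        self.down_proj = nn.Linear(self.hidden_dim, embed_dim, bias=bias)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # Linear -> Activation -> Dropout -> Linear -> Dropout
        x = self.dropout(self.act(self.up_proj(x)))
        return self.dropout(self.down_proj(x))


ffn = SketchFFN(embed_dim=16, hidden_dim=64)
x = torch.randn(2, 5, 16)   # [batch, seq_len, embed_dim]
y = ffn(x)                  # same shape as the input
```

Because both projections act on the last dimension only, every position in the sequence is transformed independently, which is what "position-wise" refers to.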
ClassicMoEFFN(embed_dim: int, num_experts: int = 8, num_shared_experts: int = 0, top_k: int = 2, hidden_dim: int = None, activation_fn=None, dropout: float = 0.0, bias: bool = True, **kwargs)
Bases: olm.nn.feedforward.moe_base.MoEFeedForwardBase
Source: src/olm/nn/feedforward/classic_moe.py:4
Mixture of Experts version of ClassicFFN.
Parameters
embed_dim(int): Input and output dimension.
num_experts(int): Number of experts.
num_shared_experts(int): Number of shared experts.
top_k(int): Number of experts to route to.
hidden_dim(int, optional): Hidden dimension of each expert.
activation_fn(nn.Module, optional): Activation function for experts.
dropout(float, optional): Dropout probability.
bias(bool, optional): Whether to use bias in linear layers.
Methods
forward(self, x: torch.Tensor) -> torch.Tensor (inherited from MoEFeedForwardBase)
Source: src/olm/nn/feedforward/moe_base.py:100
Forward pass with MoE routing.
Parameters
x(torch.Tensor): Hidden states shaped [batch, seq_len, embed_dim].
Returns
torch.Tensor: Hidden states shaped [batch, seq_len, embed_dim].
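To make the role of `top_k` concrete, the following pure-Python sketch routes a single token to its highest-scoring experts. It is an assumption about how top-k routing commonly works, not a description of MoEFeedForwardBase: the function `route_top_k`, the softmax over router logits, and the renormalisation of the kept weights are all illustrative choices.

```python
import math


def route_top_k(router_logits, top_k=2):
    """Return (expert_index, weight) pairs for one token."""
    # Softmax over all experts, shifted by the max for numerical stability.
    peak = max(router_logits)
    exps = [math.exp(v - peak) for v in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the top_k most probable experts.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Assumption: renormalise so the kept weights sum to 1 (some routers skip this).
    kept = sum(probs[i] for i in chosen)
    return [(i, probs[i] / kept) for i in chosen]


logits = [0.1, 2.0, -1.0, 0.5, 0.0, 1.5, -0.3, 0.2]  # one token, num_experts = 8
selection = route_top_k(logits, top_k=2)
```

With these logits the token is sent to experts 1 and 5, and the token's output would be the weighted sum of those two experts' outputs.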
FeedForwardBase(embed_dim: int, **kwargs)
Bases: Module, ABC
Source: src/olm/nn/feedforward/base.py:5
Abstract base class for feedforward networks in a transformer block.
Defines the interface for FFNs/MLPs. Subclasses must implement the forward method.
Attributes
embed_dim(int): The input and output dimension.
Methods
forward(self, x: torch.Tensor) -> torch.Tensor
Source: src/olm/nn/feedforward/base.py:25
Forward pass of the feedforward network.
Parameters
x(torch.Tensor): Input tensor of shape (batch, seq_len, embed_dim).
Returns
torch.Tensor: Output tensor of shape (batch, seq_len, embed_dim).
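The contract described for FeedForwardBase (store `embed_dim`, require subclasses to implement `forward`) can be illustrated with a local stand-in. The names `FeedForwardSketch` and `LinearFFN` are hypothetical, and the stand-in only assumes what this page documents about the real base class.

```python
from abc import ABC, abstractmethod

import torch
from torch import nn


class FeedForwardSketch(nn.Module, ABC):
    """Hypothetical stand-in for the documented abstract interface."""

    def __init__(self, embed_dim: int, **kwargs):
        super().__init__()
        self.embed_dim = embed_dim

    @abstractmethod
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Map (batch, seq_len, embed_dim) to the same shape."""


class LinearFFN(FeedForwardSketch):
    """Toy subclass: one square linear layer, enough to satisfy the interface."""

    def __init__(self, embed_dim: int, **kwargs):
        super().__init__(embed_dim, **kwargs)
        self.proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)


layer = LinearFFN(embed_dim=8)
out = layer(torch.zeros(2, 3, 8))

# The abstract base itself cannot be instantiated.
try:
    FeedForwardSketch(embed_dim=8)
    base_instantiable = True
except TypeError:
    base_instantiable = False
```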
GeGLUFFN(embed_dim: int, hidden_dim: int = None, dropout: float = 0.0, bias: bool = True, ff_multiplier: float = 4.0)
Bases: olm.nn.feedforward.base.FeedForwardBase
Source: src/olm/nn/feedforward/geglu_ffn.py:8
Feed-Forward Network using GeGLU activation.
Implements: x = DownProj(GeGLU(UpProj(x))). UpProj expands to 2 * hidden_dim so that its output can be split into a gate half and a value half.
Parameters
embed_dim(int): Input and output dimension.
hidden_dim(int, optional): Hidden dimension. Defaults to 4 * embed_dim if None.
dropout(float, optional): Dropout probability. Defaults to 0.0.
bias(bool, optional): Whether to use bias in linear layers. Defaults to True.
ff_multiplier(float, optional): Expansion factor used when hidden_dim is None. Defaults to 4.0.
Methods
forward(self, x: torch.Tensor) -> torch.Tensor
Source: src/olm/nn/feedforward/geglu_ffn.py:54
Apply GeGLU feed-forward projection.
Parameters
x(torch.Tensor): Hidden states shaped [batch, seq_len, embed_dim].
Returns
torch.Tensor: Hidden states shaped [batch, seq_len, embed_dim].
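The split-and-gate step that GeGLU performs on the UpProj output can be shown on a single vector in pure Python. This is a sketch under stated assumptions: the helper names are hypothetical, the exact erf-based GELU is used, and treating the first half as the gate is a guess, since this page does not say which half the real layer gates with.

```python
import math


def gelu(v):
    # Exact (erf-based) GELU.
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def geglu(projected):
    """Split a 2 * hidden_dim vector in half and gate one half with the other."""
    half = len(projected) // 2
    gate, value = projected[:half], projected[half:]
    # Assumption: the first half is the gate; the real layer may use the opposite order.
    return [gelu(g) * v for g, v in zip(gate, value)]


# Pretend UpProj output for one position, with hidden_dim = 4.
up_out = [1.0, -2.0, 0.0, 3.0, 0.5, 0.5, 2.0, -1.0]
hidden = geglu(up_out)  # length hidden_dim, ready for DownProj
```

The output has hidden_dim entries, half the width of the UpProj output, which is why DownProj maps from hidden_dim rather than from 2 * hidden_dim.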
GeGLUMoEFFN(embed_dim: int, num_experts: int = 8, num_shared_experts: int = 0, top_k: int = 2, hidden_dim: int = None, dropout: float = 0.0, bias: bool = True, ff_multiplier: float = 4.0, **kwargs)
Bases: olm.nn.feedforward.moe_base.MoEFeedForwardBase
Source: src/olm/nn/feedforward/geglu_moe.py:4
Mixture of Experts version of GeGLUFFN.
Methods
forward(self, x: torch.Tensor) -> torch.Tensor (inherited from MoEFeedForwardBase)
Source: src/olm/nn/feedforward/moe_base.py:100
Forward pass with MoE routing.
Parameters
x(torch.Tensor): Hidden states shaped [batch, seq_len, embed_dim].
Returns
torch.Tensor: Hidden states shaped [batch, seq_len, embed_dim].
SwiGLUFFN(embed_dim: int, hidden_dim: int = None, dropout: float = 0.0, bias: bool = True, ff_multiplier: float = 2.5)
Bases: olm.nn.feedforward.base.FeedForwardBase
Source: src/olm/nn/feedforward/swiglu_ffn.py:8
SwiGLU-based feed-forward network used in modern Transformers (e.g., LLaMA, PaLM).
This layer implements the gated linear unit with Swish (SiLU) activation, which has been shown to improve performance over standard GELU/ReLU FFNs.
Structure
Input -> Linear(embed_dim -> 2 * hidden_dim) [Splits into Gate and Value] -> SwiGLU(Gate * SiLU(Value)) -> Linear(hidden_dim -> embed_dim) -> Dropout
Parameters
embed_dim(int): The dimension of the input and output.
hidden_dim(int, optional): The intermediate inner dimension. If None, defaults to int(ff_multiplier * embed_dim).
dropout(float, optional): Dropout probability. Defaults to 0.0.
bias(bool, optional): Whether to use bias in linear layers. Defaults to True.
ff_multiplier(float, optional): Multiplier for the default hidden dimension. Defaults to 2.5 (8/3 is also a common choice for SwiGLU).
Attributes
up_proj(Linear): Projects and splits input into gate and value parts.
act(SwiGLU): The activation function.
down_proj(Linear): Projects back to embedding dimension.
dropout(nn.Dropout): Dropout layer.
Methods
forward(self, x: torch.Tensor) -> torch.Tensor
Source: src/olm/nn/feedforward/swiglu_ffn.py:68
Apply SwiGLU feed-forward projection.
Parameters
x(torch.Tensor): Hidden states shaped [batch, seq_len, embed_dim].
Returns
torch.Tensor: Hidden states shaped [batch, seq_len, embed_dim].
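The smaller `ff_multiplier` for SwiGLUFFN makes sense once weight counts are compared: the up projection is twice as wide as in a non-gated FFN, so a smaller hidden dimension gives a similar budget. The arithmetic below is a sketch that ignores biases; the helper names are hypothetical, and the 4x expansion used for the non-gated baseline is a conventional assumption rather than something this page states.

```python
def classic_ffn_weights(embed_dim, hidden_dim):
    # up_proj (embed_dim x hidden_dim) plus down_proj (hidden_dim x embed_dim).
    return 2 * embed_dim * hidden_dim


def swiglu_ffn_weights(embed_dim, hidden_dim=None, ff_multiplier=2.5):
    if hidden_dim is None:
        hidden_dim = int(ff_multiplier * embed_dim)  # documented default rule
    # up_proj emits 2 * hidden_dim (gate and value); down_proj takes hidden_dim.
    return embed_dim * 2 * hidden_dim + hidden_dim * embed_dim


embed_dim = 768
classic = classic_ffn_weights(embed_dim, 4 * embed_dim)      # assumed 4x expansion
swiglu_default = swiglu_ffn_weights(embed_dim)               # hidden_dim = 1920
swiglu_8_3 = swiglu_ffn_weights(embed_dim, hidden_dim=2048)  # 8/3 * 768, passed explicitly
```

Under these assumptions a multiplier of 8/3 matches the 4x non-gated weight count exactly, while the default of 2.5 lands slightly below it.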
SwiGLUMoEFFN(embed_dim: int, num_experts: int = 8, num_shared_experts: int = 0, top_k: int = 2, hidden_dim: int = None, dropout: float = 0.0, bias: bool = True, ff_multiplier: float = 2.5, **kwargs)
Bases: olm.nn.feedforward.moe_base.MoEFeedForwardBase
Source: src/olm/nn/feedforward/swiglu_moe.py:4
Mixture of Experts version of SwiGLUFFN.
Methods
forward(self, x: torch.Tensor) -> torch.Tensor (inherited from MoEFeedForwardBase)
Source: src/olm/nn/feedforward/moe_base.py:100
Forward pass with MoE routing.
Parameters
x(torch.Tensor): Hidden states shaped [batch, seq_len, embed_dim].
Returns
torch.Tensor: Hidden states shaped [batch, seq_len, embed_dim].
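A rough way to reason about `num_experts`, `num_shared_experts`, and `top_k` is to compare the weights a model stores with the weights a single token actually uses. The sketch below rests on assumptions not confirmed by this page: each expert is taken to be a full SwiGLU FFN, the router is taken to be a single bias-free linear layer of size embed_dim x num_experts, and shared experts are taken to process every token. The function name is hypothetical.

```python
def moe_weight_counts(embed_dim, hidden_dim, num_experts=8,
                      num_shared_experts=0, top_k=2):
    """Return (total, active_per_token) weight counts, ignoring biases."""
    # One SwiGLU expert: up_proj to 2 * hidden_dim plus down_proj from hidden_dim.
    per_expert = 3 * embed_dim * hidden_dim
    # Assumed router: one linear layer scoring every routed expert.
    router = embed_dim * num_experts
    total = (num_experts + num_shared_experts) * per_expert + router
    # Each token uses top_k routed experts plus all shared experts.
    active = (top_k + num_shared_experts) * per_expert + router
    return total, active


total, active = moe_weight_counts(embed_dim=768, hidden_dim=1920)
```

With the documented defaults (8 experts, top_k of 2, no shared experts) a token touches about a quarter of the stored expert weights.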