olm.nn.feedforward.geglu_ffn

`GeGLUFFN`(*args, **kwargs)	Feed-Forward Network using GeGLU activation.

Bases: Module, ABC

Abstract base class for feedforward networks in a transformer block.

Defines the interface for FFNs/MLPs. Subclasses must implement the forward method.

The input and output dimension.

Forward pass of the feedforward network.

Parameters: x (torch.Tensor) – Input tensor of shape (batch, seq_len, embed_dim).
Returns: Output tensor of shape (batch, seq_len, embed_dim).
Return type: torch.Tensor
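
The interface described above can be sketched as follows. This is a minimal illustration, not the library's source: the class names `FeedForwardSketch` and `IdentityFFN` are hypothetical, and storing the dimension as an `embed_dim` attribute is an assumption based on the shapes documented for `forward`.

```python
from abc import ABC, abstractmethod

import torch
from torch import nn


class FeedForwardSketch(nn.Module, ABC):
    """Hypothetical stand-in for the abstract base class described above."""

    def __init__(self, embed_dim: int) -> None:
        super().__init__()
        # Assumed attribute name for "the input and output dimension".
        self.embed_dim = embed_dim

    @abstractmethod
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Map (batch, seq_len, embed_dim) to (batch, seq_len, embed_dim)."""


class IdentityFFN(FeedForwardSketch):
    """Trivial subclass that satisfies the interface by returning its input."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x


x = torch.randn(2, 5, 16)
out = IdentityFFN(embed_dim=16)(x)

# The base class itself cannot be instantiated because forward is abstract.
try:
    FeedForwardSketch(embed_dim=16)
    base_is_abstract = False
except TypeError:
    base_is_abstract = True
```

Combining `nn.Module` with `ABC` means a subclass that forgets to implement `forward` fails at construction time rather than at the first call.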

GeGLU activation function.

Implements the GeGLU variant from “GLU Variants Improve Transformer”:

GeGLU(x, W, V) = GELU(xW) * (xV)

Here: GeGLU(x) = GELU(gate) * value, where gate and value come from splitting the input.

Forward pass of GeGLU.
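
A minimal sketch of the activation, assuming the input's last dimension is split into two equal halves and that the first half is the gate. The ordering of the halves and the class name `GeGLUSketch` are assumptions made for illustration; the library may order them differently.

```python
import torch
import torch.nn.functional as F
from torch import nn


class GeGLUSketch(nn.Module):
    """Illustrative GeGLU: GELU(gate) * value over the last dimension."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split the last dimension into two equal halves.
        gate, value = x.chunk(2, dim=-1)
        # The GELU-activated gate scales the value half element-wise.
        return F.gelu(gate) * value


x = torch.randn(2, 5, 64)
y = GeGLUSketch()(x)  # last dimension is halved: 64 -> 32
```

Because the gate and value share one input tensor, the output's last dimension is half the input's, which is why the preceding projection has to be twice as wide.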

Feed-Forward Network using GeGLU activation.

Implements: x = DownProj(GeGLU(UpProj(x))). UpProj expands to 2 * hidden_dim so that its output can be split into the gate and the value.

Parameters:
embed_dim (int) – Input dimension.
hidden_dim (int, optional) – Hidden dimension. If None, defaults to ff_multiplier * embed_dim (4 * embed_dim with the default ff_multiplier).
dropout (float, optional) – Dropout probability. Defaults to 0.0.
bias (bool, optional) – Whether to use bias in linear layers. Defaults to True.
ff_multiplier (float, optional) – Expansion factor if hidden_dim is None. Defaults to 4.0.
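
The way these parameters interact can be worked through with plain arithmetic. The helpers `resolve_hidden_dim` and `linear_param_count` below are hypothetical, and truncating `ff_multiplier * embed_dim` with `int()` is an assumption about how a fractional product would be handled.

```python
from typing import Optional


def resolve_hidden_dim(
    embed_dim: int,
    hidden_dim: Optional[int] = None,
    ff_multiplier: float = 4.0,
) -> int:
    """Use hidden_dim if given, otherwise derive it from the multiplier."""
    if hidden_dim is not None:
        return hidden_dim
    return int(ff_multiplier * embed_dim)  # assumed rounding rule


def linear_param_count(in_features: int, out_features: int, bias: bool = True) -> int:
    """Number of weights, plus biases when enabled, in one linear layer."""
    return in_features * out_features + (out_features if bias else 0)


embed_dim = 512
hidden = resolve_hidden_dim(embed_dim)  # 4.0 * 512

# The up projection is twice as wide so it can feed both gate and value.
up_params = linear_param_count(embed_dim, 2 * hidden)
down_params = linear_param_count(hidden, embed_dim)
```

With the defaults, the up projection holds roughly twice as many parameters as the down projection, which is the cost of the gated activation at a fixed `hidden_dim`.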

Forward pass of the feedforward network.

Parameters: x (torch.Tensor) – Input tensor of shape (batch, seq_len, embed_dim).
Returns: Output tensor of shape (batch, seq_len, embed_dim).
Return type: torch.Tensor
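
Putting the pieces together, the documented data flow could look like the sketch below. It is not the library's implementation: the class name `GeGLUFFNSketch`, the attribute names `up_proj` and `down_proj`, the gate-first split order, and applying dropout after the down projection are all assumptions.

```python
from typing import Optional

import torch
import torch.nn.functional as F
from torch import nn


class GeGLUFFNSketch(nn.Module):
    """Illustrative FFN: DownProj(GeGLU(UpProj(x)))."""

    def __init__(
        self,
        embed_dim: int,
        hidden_dim: Optional[int] = None,
        dropout: float = 0.0,
        bias: bool = True,
        ff_multiplier: float = 4.0,
    ) -> None:
        super().__init__()
        if hidden_dim is None:
            hidden_dim = int(ff_multiplier * embed_dim)
        # Twice the hidden width: one half for the gate, one for the value.
        self.up_proj = nn.Linear(embed_dim, 2 * hidden_dim, bias=bias)
        self.down_proj = nn.Linear(hidden_dim, embed_dim, bias=bias)
        self.dropout = nn.Dropout(dropout)  # placement is an assumption

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate, value = self.up_proj(x).chunk(2, dim=-1)
        hidden = F.gelu(gate) * value
        return self.dropout(self.down_proj(hidden))


ffn = GeGLUFFNSketch(embed_dim=16)
x = torch.randn(2, 5, 16)
y = ffn(x)  # same shape as x: (batch, seq_len, embed_dim)
```

The input and output shapes match, so a block like this can sit on a transformer's residual path without any extra reshaping.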

Bases: Linear