Source: src/olm/nn/feedforward/classic_moe.py:1
Classes
ClassicMoEFFN(embed_dim: int, num_experts: int = 8, num_shared_experts: int = 0, top_k: int = 2, hidden_dim: int = None, activation_fn=None, dropout: float = 0.0, bias: bool = True, **kwargs)
Bases: olm.nn.feedforward.moe_base.MoEFeedForwardBase
Source: src/olm/nn/feedforward/classic_moe.py:4
Mixture of Experts version of ClassicFFN.
Parameters
embed_dim (int): Input and output dimension.
num_experts (int): Number of experts.
num_shared_experts (int): Number of shared experts.
top_k (int): Number of experts to route to.
hidden_dim (int, optional): Hidden dimension of each expert.
activation_fn (nn.Module, optional): Activation function for experts.
dropout (float, optional): Dropout probability.
bias (bool, optional): Whether to use bias in linear layers.
Methods
forward(self, x: torch.Tensor) -> torch.Tensor (inherited from MoEFeedForwardBase)
Source: src/olm/nn/feedforward/moe_base.py:100
Forward pass with MoE routing.
Parameters
x (torch.Tensor): Hidden states shaped [batch, seq_len, embed_dim].
Returns
torch.Tensor: Hidden states shaped [batch, seq_len, embed_dim].
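Example

The page does not describe how the routing is implemented, so the following is only a dependency-free sketch of generic top-k expert routing for a single token, not the code in moe_base.py. The names `softmax`, `route_token`, `router_weights`, and `experts` are hypothetical. Renormalizing the gate weights over the selected experts is an assumption, and shared experts (`num_shared_experts`) are not modeled.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(x, router_weights, experts, top_k=2):
    """Send one token vector to its top_k experts and mix their outputs.

    Hypothetical helper for illustration only; not part of olm.
    """
    # One router logit per expert: dot product of the token with that expert's row.
    logits = [sum(w * v for w, v in zip(row, x)) for row in router_weights]
    probs = softmax(logits)

    # Indices of the top_k most probable experts, best first.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]

    # Assumption: gate weights are renormalized over the chosen experts only.
    norm = sum(probs[i] for i in chosen)

    out = [0.0] * len(x)
    for i in chosen:
        gate = probs[i] / norm
        y = experts[i](x)  # each expert maps embed_dim -> embed_dim
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, chosen


# Toy setup: 4 "experts" that simply scale the input, embed_dim = 2.
experts = [lambda v, s=s: [s * t for t in v] for s in (1.0, 2.0, 3.0, 4.0)]
router_weights = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]

out, chosen = route_token([1.0, 0.0], router_weights, experts, top_k=2)
print(chosen)    # indices of the selected experts
print(len(out))  # output keeps the input dimension
```

In this toy setup only two of the four experts run for the token, and the output has the same dimension as the input, matching the [batch, seq_len, embed_dim] in and out shapes documented for forward.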