olm.nn.activations¶
class olm.nn.activations.ActivationBase(*args: Any, **kwargs: Any)¶
Bases: Module, ABC
Abstract base class for all activation functions.
Ensures a consistent interface for activation layers, handling device and dtype initialization. Subclasses must implement the forward method.
device¶
The device the module is on.
- Type: torch.device, optional
dtype¶
The data type of the module parameters.
- Type: torch.dtype
abstractmethod forward(x: torch.Tensor) → torch.Tensor¶
Apply activation to x.
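The subclassing contract can be sketched with a minimal stand-in for `ActivationBase` (the real class lives in `olm.nn.activations`; the `Squared` activation below is purely hypothetical, added only to show what a subclass must provide):

```python
import torch
import torch.nn as nn
from abc import ABC, abstractmethod

# Minimal stand-in for ActivationBase, shown only to illustrate the
# subclassing contract; the real implementation is in olm.nn.activations.
class ActivationBase(nn.Module, ABC):
    def __init__(self, device=None, dtype=None):
        super().__init__()
        self.device = device
        self.dtype = dtype

    @abstractmethod
    def forward(self, x: torch.Tensor) -> torch.Tensor: ...

# Hypothetical custom activation built on the base class.
class Squared(ActivationBase):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * x

act = Squared()
print(act(torch.tensor([2.0, -3.0])))  # tensor([4., 9.])
```

Because `forward` is abstract, instantiating `ActivationBase` directly raises `TypeError`; only concrete subclasses like the wrappers below can be constructed.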
class olm.nn.activations.ELU(*args: Any, **kwargs: Any)¶
Bases: ActivationBase
ELU activation wrapper.
forward(x: torch.Tensor) → torch.Tensor¶
Apply activation to x.
class olm.nn.activations.GELU(*args: Any, **kwargs: Any)¶
Bases: ActivationBase
GELU activation wrapper.
forward(x: torch.Tensor) → torch.Tensor¶
Apply activation to x.
class olm.nn.activations.GLU(*args: Any, **kwargs: Any)¶
Bases: ActivationBase
GLU activation wrapper.
forward(x: torch.Tensor) → torch.Tensor¶
Apply activation to x.
class olm.nn.activations.GeGLU(*args: Any, **kwargs: Any)¶
Bases: ActivationBase
GeGLU activation function.
Implements the GeGLU variant from “GLU Variants Improve Transformer”:

GeGLU(x, W, V) = GELU(xW) * (xV)

Here the projections are assumed to have been applied upstream, so: GeGLU(x) = GELU(gate) * value.
- Parameters:
- device (torch.device, optional) – Target device.
- dtype (torch.dtype, optional) – Target data type.
forward(x: torch.Tensor) → torch.Tensor¶
Forward pass of GeGLU.
- Parameters: x (torch.Tensor) – Input tensor.
- Returns: Output tensor with half the last dimension.
- Return type: torch.Tensor
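A functional sketch of the operation, assuming the pre-projected input is chunked into `(gate, value)` halves along the last dimension (which half serves as the gate is an assumption here, not confirmed by the source):

```python
import torch
import torch.nn.functional as F

def geglu(x: torch.Tensor) -> torch.Tensor:
    # Chunk the pre-projected input into gate and value halves
    # (the chunk order is an assumption), then apply
    # GeGLU(x) = GELU(gate) * value.
    gate, value = x.chunk(2, dim=-1)
    return F.gelu(gate) * value

x = torch.randn(4, 8)
print(geglu(x).shape)  # torch.Size([4, 4])
```

Note the halving of the last dimension: an input of width 8 produces an output of width 4, which is why upstream projections typically double the hidden size.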
class olm.nn.activations.Identity(*args: Any, **kwargs: Any)¶
Bases: ActivationBase
Identity activation wrapper.
forward(x: torch.Tensor) → torch.Tensor¶
Apply activation to x.
class olm.nn.activations.LeakyReLU(*args: Any, **kwargs: Any)¶
Bases: ActivationBase
LeakyReLU activation wrapper.
forward(x: torch.Tensor) → torch.Tensor¶
Apply activation to x.
class olm.nn.activations.LiGLU(*args: Any, **kwargs: Any)¶
Bases: ActivationBase
LiGLU activation function.
Implements the LiGLU variant (Linear GLU):

LiGLU(x, W, V) = (xW) * (xV)

Here the projections are assumed to have been applied upstream, so: LiGLU(x) = gate * value (no activation on the gate).
- Parameters:
- device (torch.device, optional) – Target device.
- dtype (torch.dtype, optional) – Target data type.
forward(x: torch.Tensor) → torch.Tensor¶
Forward pass of LiGLU.
- Parameters: x (torch.Tensor) – Input tensor.
- Returns: Output tensor with half the last dimension.
- Return type: torch.Tensor
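Since LiGLU applies no nonlinearity, the operation reduces to an elementwise product of the two halves, as this sketch (under the same chunk-order assumption as the other variants) shows:

```python
import torch

def liglu(x: torch.Tensor) -> torch.Tensor:
    # LiGLU applies no activation: just the elementwise product
    # of the two halves of the last dimension.
    gate, value = x.chunk(2, dim=-1)
    return gate * value

x = torch.tensor([[1.0, 2.0, 3.0, 4.0]])
print(liglu(x))  # tensor([[3., 8.]])
```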
class olm.nn.activations.Mish(*args: Any, **kwargs: Any)¶
Bases: ActivationBase
Mish activation wrapper.
forward(x: torch.Tensor) → torch.Tensor¶
Apply activation to x.
class olm.nn.activations.PReLU(*args: Any, **kwargs: Any)¶
Bases: ActivationBase
PReLU activation wrapper.
forward(x: torch.Tensor) → torch.Tensor¶
Apply activation to x.
class olm.nn.activations.ReGLU(*args: Any, **kwargs: Any)¶
Bases: ActivationBase
ReGLU activation function.
Implements the ReGLU variant from “GLU Variants Improve Transformer”:

ReGLU(x, W, V) = ReLU(xW) * (xV)

Here the projections are assumed to have been applied upstream, so: ReGLU(x) = ReLU(gate) * value.
- Parameters:
- device (torch.device, optional) – Target device.
- dtype (torch.dtype, optional) – Target data type.
forward(x: torch.Tensor) → torch.Tensor¶
Forward pass of ReGLU.
- Parameters: x (torch.Tensor) – Input tensor.
- Returns: Output tensor with half the last dimension.
- Return type: torch.Tensor
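A functional sketch of ReGLU under the same gate/value chunking assumption as the other variants; negative gate entries zero out the corresponding value entries:

```python
import torch
import torch.nn.functional as F

def reglu(x: torch.Tensor) -> torch.Tensor:
    # ReGLU(x) = ReLU(gate) * value on the chunked input
    # (the chunk order is an assumption).
    gate, value = x.chunk(2, dim=-1)
    return F.relu(gate) * value

x = torch.tensor([[-1.0, 2.0, 5.0, 3.0]])
print(reglu(x))  # tensor([[0., 6.]])
```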
class olm.nn.activations.ReLU(*args: Any, **kwargs: Any)¶
Bases: ActivationBase
ReLU activation wrapper.
forward(x: torch.Tensor) → torch.Tensor¶
Apply activation to x.
class olm.nn.activations.SELU(*args: Any, **kwargs: Any)¶
Bases: ActivationBase
SELU activation wrapper.
forward(x: torch.Tensor) → torch.Tensor¶
Apply activation to x.
class olm.nn.activations.SiLU(*args: Any, **kwargs: Any)¶
Bases: ActivationBase
SiLU (Swish) activation wrapper.
forward(x: torch.Tensor) → torch.Tensor¶
Apply activation to x.
class olm.nn.activations.Sigmoid(*args: Any, **kwargs: Any)¶
Bases: ActivationBase
Sigmoid activation wrapper.
forward(x: torch.Tensor) → torch.Tensor¶
Apply activation to x.
class olm.nn.activations.Softmax(*args: Any, **kwargs: Any)¶
Bases: ActivationBase
Softmax activation wrapper.
forward(x: torch.Tensor) → torch.Tensor¶
Apply activation to x.
class olm.nn.activations.Softplus(*args: Any, **kwargs: Any)¶
Bases: ActivationBase
Softplus activation wrapper.
forward(x: torch.Tensor) → torch.Tensor¶
Apply activation to x.
class olm.nn.activations.SwiGLU(*args: Any, **kwargs: Any)¶
Bases: ActivationBase
SwiGLU activation function.
Implements the SwiGLU activation as described in “GLU Variants Improve Transformer”. It applies the SiLU activation to one half of the input (the gate) and multiplies it by the other half (the value).
Equation: SwiGLU(x, W, V) = Swish_1(xW) * (xV)

Here the input x is assumed to be already projected/concatenated, so it is chunked as x = [x_1, x_2] and SwiGLU(x) = x_1 * SiLU(x_2).
- Parameters:
- device (torch.device, optional) – Target device.
- dtype (torch.dtype, optional) – Target data type.
forward(x: torch.Tensor) → torch.Tensor¶
Forward pass of SwiGLU.
- Parameters: x (torch.Tensor) – Input tensor. Expected to have an even last dimension size.
- Returns: Output tensor with half the last dimension of the input.
- Return type: torch.Tensor
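Following the class docstring's chunking x = [x_1, x_2], the operation can be sketched as below (the chunk order, and which half receives the SiLU, follow the docstring and are otherwise assumptions):

```python
import torch
import torch.nn.functional as F

def swiglu(x: torch.Tensor) -> torch.Tensor:
    # Chunk x = [x_1, x_2] along the last dimension and compute
    # x_1 * SiLU(x_2), per the class docstring; the projection
    # producing x is assumed to have happened upstream.
    x1, x2 = x.chunk(2, dim=-1)
    return x1 * F.silu(x2)

x = torch.randn(2, 3, 8)
print(swiglu(x).shape)  # torch.Size([2, 3, 4])
```

As with the other GLU variants, the even last dimension is split in half, so the output width is half the input width.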
class olm.nn.activations.Tanh(*args: Any, **kwargs: Any)¶
Bases: ActivationBase
Tanh activation wrapper.
forward(x: torch.Tensor) → torch.Tensor¶
Apply activation to x.
Modules¶
- base
- clu
- elu
- geglu
- gelu
- glu
- identity
- leaky_relu
- liglu
- mish
- prelu
- reglu
- relu
- selu
- sigmoid
- silu
- softmax
- softplus
- swiglu
- tanh