Source: src/olm/models/openai/__init__.py:1
Classes
GPT2()
Bases: olm.models.openai.gpt2.GPT2Model
Source: src/olm/models/openai/gpt2.py:63
GPT-2 Small (124M parameters).
Methods
forward(self, x: torch.Tensor) -> torch.Tensor (inherited from Block)
Source: src/olm/nn/structure/block.py:26
Apply each block to the input in sequence.
Parameters
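x: Input tensor. For this model, token IDs shaped [batch, seq_len].
Returns
Output tensor after all blocks have been applied. For this model, logits shaped [batch, seq_len, vocab_size].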
GPT2Large()
Bases: olm.models.openai.gpt2.GPT2Model
Source: src/olm/models/openai/gpt2.py:85
GPT-2 Large (774M parameters).
Methods
forward(self, x: torch.Tensor) -> torch.Tensor (inherited from Block)
Source: src/olm/nn/structure/block.py:26
Apply each block to the input in sequence.
Parameters
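x: Input tensor. For this model, token IDs shaped [batch, seq_len].
Returns
Output tensor after all blocks have been applied. For this model, logits shaped [batch, seq_len, vocab_size].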
GPT2Medium()
Bases: olm.models.openai.gpt2.GPT2Model
Source: src/olm/models/openai/gpt2.py:74
GPT-2 Medium (355M parameters).
Methods
forward(self, x: torch.Tensor) -> torch.Tensor (inherited from Block)
Source: src/olm/nn/structure/block.py:26
Apply each block to the input in sequence.
Parameters
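x: Input tensor. For this model, token IDs shaped [batch, seq_len].
Returns
Output tensor after all blocks have been applied. For this model, logits shaped [batch, seq_len, vocab_size].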
GPT2Model(vocab_size: int, embed_dim: int, num_layers: int, num_heads: int, max_seq_len: int, dropout: float = 0.1, tie_weights: bool = True)
Bases: olm.nn.structure.block.Block
Source: src/olm/models/openai/gpt2.py:34
Base class for GPT-2 models.
Structure
Token embedding + learned positional embedding -> GPT2Block x N (N = num_layers) -> OutputHead (weights tied to the token embedding when tie_weights is True).
Forward
Accepts token IDs shaped [batch, seq_len] and returns logits shaped
[batch, seq_len, vocab_size].
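The structure and shapes described above can be illustrated with a small PyTorch sketch. This is not olm's implementation: TinyGPT2Sketch is a hypothetical stand-in, torch.nn.TransformerEncoderLayer (pre-norm, GELU, causal mask) substitutes for GPT2Block, and a bias-free nn.Linear substitutes for OutputHead. Only the constructor arguments mirror the signature documented here.

```python
import torch
from torch import nn


class TinyGPT2Sketch(nn.Module):
    """Hypothetical stand-in for illustration; not olm's GPT2Model."""

    def __init__(self, vocab_size, embed_dim, num_layers, num_heads,
                 max_seq_len, dropout=0.1, tie_weights=True):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, embed_dim)   # token embedding
        self.pos_emb = nn.Embedding(max_seq_len, embed_dim)  # learned positions
        # Assumption: a pre-norm encoder layer stands in for GPT2Block.
        self.blocks = nn.ModuleList([
            nn.TransformerEncoderLayer(
                embed_dim, num_heads, dim_feedforward=4 * embed_dim,
                dropout=dropout, activation="gelu",
                batch_first=True, norm_first=True,
            )
            for _ in range(num_layers)
        ])
        self.ln_f = nn.LayerNorm(embed_dim)
        self.head = nn.Linear(embed_dim, vocab_size, bias=False)
        if tie_weights:
            # Share one parameter between the input embedding and output head.
            self.head.weight = self.tok_emb.weight

    def forward(self, x):
        # x: token IDs shaped [batch, seq_len]
        seq_len = x.shape[1]
        positions = torch.arange(seq_len, device=x.device)
        h = self.tok_emb(x) + self.pos_emb(positions)
        # Causal mask: -inf above the diagonal blocks attention to the future.
        mask = torch.triu(
            torch.full((seq_len, seq_len), float("-inf"), device=x.device),
            diagonal=1,
        )
        for block in self.blocks:
            h = block(h, src_mask=mask)
        return self.head(self.ln_f(h))  # [batch, seq_len, vocab_size]


model = TinyGPT2Sketch(vocab_size=100, embed_dim=32, num_layers=2,
                       num_heads=4, max_seq_len=16)
ids = torch.randint(0, 100, (2, 8))
with torch.no_grad():
    logits = model(ids)
```

With tie_weights=True the head and the token embedding are the same parameter object, so the vocabulary matrix is counted once.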
Methods
forward(self, x: torch.Tensor) -> torch.Tensor (inherited from Block)
Source: src/olm/nn/structure/block.py:26
Apply each block to the input in sequence.
Parameters
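x: Input tensor. For this model, token IDs shaped [batch, seq_len].
Returns
Output tensor after all blocks have been applied. For this model, logits shaped [batch, seq_len, vocab_size].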
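Since forward is inherited from Block, the forward pass amounts to sequential composition of the contained blocks. A minimal plain-Python sketch of that idea follows; apply_in_sequence is a hypothetical helper used only for illustration, not olm's Block code.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def apply_in_sequence(blocks: Iterable[Callable[[T], T]], x: T) -> T:
    """Feed x through each block in order; each output is the next input."""
    for block in blocks:
        x = block(x)
    return x


# Two toy "blocks": add one, then double.
result = apply_in_sequence([lambda v: v + 1, lambda v: v * 2], 3)
```

Order matters: swapping the two toy blocks gives a different result, which is why the embedding stage must come first and the output head last.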
GPT2XL()
Bases: olm.models.openai.gpt2.GPT2Model
Source: src/olm/models/openai/gpt2.py:96
GPT-2 XL (1.5B parameters).
Methods
forward(self, x: torch.Tensor) -> torch.Tensor (inherited from Block)
Source: src/olm/nn/structure/block.py:26
Apply each block to the input in sequence.
Parameters
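x: Input tensor. For this model, token IDs shaped [batch, seq_len].
Returns
Output tensor after all blocks have been applied. For this model, logits shaped [batch, seq_len, vocab_size].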
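The sizes quoted for GPT2, GPT2Medium, GPT2Large, and GPT2XL can be sanity-checked with a short calculation. The sketch below assumes the hyperparameters of the original OpenAI GPT-2 release (vocabulary 50257, context 1024, and the per-size widths and depths in RELEASE_PRESETS); this section does not show whether olm's presets use exactly these values. gpt2_param_count and RELEASE_PRESETS are hypothetical names, not part of olm.

```python
def gpt2_param_count(vocab_size, embed_dim, num_layers, max_seq_len,
                     tie_weights=True):
    """Count parameters of a standard GPT-2 style decoder (with biases)."""
    d = embed_dim
    embeddings = vocab_size * d + max_seq_len * d
    # Per block: two LayerNorms (4d), attention QKV plus output projection
    # (4d^2 + 4d), and an MLP with a 4x hidden layer (8d^2 + 5d).
    per_block = 12 * d * d + 13 * d
    final_norm = 2 * d
    # A tied output head reuses the token embedding and adds nothing.
    head = 0 if tie_weights else vocab_size * d
    return embeddings + num_layers * per_block + final_norm + head


# Assumption: values from the original GPT-2 release, not read from olm.
RELEASE_PRESETS = {
    "small": {"embed_dim": 768, "num_layers": 12},
    "medium": {"embed_dim": 1024, "num_layers": 24},
    "large": {"embed_dim": 1280, "num_layers": 36},
    "xl": {"embed_dim": 1600, "num_layers": 48},
}

counts = {
    name: gpt2_param_count(vocab_size=50257, max_seq_len=1024, **cfg)
    for name, cfg in RELEASE_PRESETS.items()
}
for name, n in counts.items():
    print(f"{name}: {n / 1e6:.0f}M")
```

Under these assumptions the totals round to the 124M, 355M, 774M, and 1.5B figures listed for the four classes. The head count (num_heads) does not appear because it only splits embed_dim across heads and does not change the number of weights.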