OLM integrates Weights & Biases (wandb) through a single callback. Add it to any trainer to get live charts of loss, perplexity, learning rate, throughput, and system metrics, plus optional gradient histograms, checkpoint artifacts, alerts, and hyperparameter sweeps.
Install the extra and authenticate:
pip install "openlanguagemodel[wandb]"
wandb login
If wandb is not installed, OLM degrades gracefully and the rest of the library works unchanged.
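A fallback like the one described above is commonly built on an optional import. The following is a minimal sketch of that pattern, not OLM's actual code; the `WANDB_AVAILABLE` flag and the `log_metrics` helper are hypothetical names used only for illustration.

```python
# Hypothetical sketch of an optional-dependency guard; not OLM's actual code.
try:
    import wandb  # only present if the "wandb" extra was installed
    WANDB_AVAILABLE = True
except ImportError:
    wandb = None
    WANDB_AVAILABLE = False


def log_metrics(metrics: dict, step: int) -> bool:
    """Log to wandb when possible; return whether anything was sent."""
    if not WANDB_AVAILABLE or wandb.run is None:
        return False  # no wandb, or no active run: silently skip logging
    wandb.log(metrics, step=step)
    return True
```

With this shape, training code can call the helper unconditionally and logging simply becomes a no-op when wandb is missing.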
Basic usage
Construct a WandBCallback and pass it to the trainer. It automatically captures your model size, optimizer, and training configuration.
from olm.logging import WandBCallback
from olm.train import Trainer

wandb_cb = WandBCallback(
    project="my-language-model",
    name="gpt2-baseline",
    tags=["gpt2", "fineweb"],
)

trainer = Trainer(
    model, optimizer, loader,
    device="cuda", context_length=1024,
    callbacks=[wandb_cb],
)
trainer.train(epochs=1, max_steps=10_000)
At every step, this logs train/loss, train/perplexity, train/learning_rate, train/tokens_per_sec, and GPU/CPU statistics.
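For reference, perplexity is conventionally the exponential of the mean cross-entropy loss measured in nats. The sketch below assumes train/perplexity follows that convention, which this page does not state explicitly; the function name is illustrative.

```python
import math


def perplexity_from_loss(mean_loss: float) -> float:
    """Convert a mean cross-entropy loss (natural log) to perplexity."""
    return math.exp(mean_loss)


# A loss of ln(50) corresponds to a perplexity of 50: the model is as
# uncertain as if it were choosing uniformly among 50 tokens.
ppl = perplexity_from_loss(math.log(50))
```

Under this assumption the two charts carry the same information, but perplexity makes small late-stage changes in loss easier to read.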
Gradient and weight tracking
Watch for vanishing or exploding gradients with histograms:
wandb_cb = WandBCallback(
    project="my-language-model",
    log_gradients=True,  # gradient histograms
    watch_model=True,    # wandb.watch() for detailed tracking
    watch_freq=1000,
)
Checkpoint artifacts
Version your model checkpoints as wandb artifacts so every run is reproducible:
wandb_cb = WandBCallback(project="my-language-model", log_model=True)
Alerts
Be notified when a metric crosses a threshold — for example, if the loss diverges:
wandb_cb = WandBCallback(
    project="my-language-model",
    alert_thresholds={
        "train/loss": {"max": 10.0},           # alert if loss climbs above 10
        "train/learning_rate": {"min": 1e-6},  # alert if the LR collapses
    },
)
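To make the threshold semantics concrete, here is a minimal sketch of how a min/max check over logged metrics could work. It is an illustration under assumptions, not the callback's actual implementation; `find_violations` is a hypothetical helper.

```python
from typing import Dict, List


def find_violations(
    metrics: Dict[str, float],
    thresholds: Dict[str, Dict[str, float]],
) -> List[str]:
    """Return one message for each metric outside its allowed range."""
    messages = []
    for name, bounds in thresholds.items():
        if name not in metrics:
            continue  # metric was not logged this step
        value = metrics[name]
        if "max" in bounds and value > bounds["max"]:
            messages.append(f"{name}={value} is above max {bounds['max']}")
        if "min" in bounds and value < bounds["min"]:
            messages.append(f"{name}={value} is below min {bounds['min']}")
    return messages


thresholds = {
    "train/loss": {"max": 10.0},
    "train/learning_rate": {"min": 1e-6},
}
# Only the loss is out of range here, so one message is produced.
violations = find_violations(
    {"train/loss": 12.5, "train/learning_rate": 3e-4}, thresholds
)
```

Each message could then be forwarded to W&B with `wandb.alert(title=..., text=...)`; how often the callback re-sends an alert for a metric that stays out of range is not specified here.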
Hyperparameter sweeps
OLM provides helpers to launch Bayesian (or grid/random) sweeps. Start from the template and customize it:
from olm.logging import create_sweep, get_sweep_config_template
import wandb
config = get_sweep_config_template("bayes")
config["parameters"]["learning_rate"] = {"min": 1e-5, "max": 1e-3}
config["parameters"]["batch_size"] = {"values": [16, 32, 64]}
sweep_id = create_sweep(config, project="my-language-model")

def train_run():
    wandb.init()
    cfg = wandb.config
    # build the trainer using cfg.learning_rate, cfg.batch_size, ...
    # trainer.train(...)

wandb.agent(sweep_id, function=train_run, count=20)
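For orientation, a W&B sweep configuration is a plain dictionary with a `method`, a `metric` to optimize, and a `parameters` map. The sketch below shows what the customized config might look like after the edits above; the exact contents returned by `get_sweep_config_template` are an assumption, and the metric name is borrowed from the metrics logged by the callback.

```python
# Assumed shape of the customized sweep config; the real template may differ.
sweep_config = {
    "method": "bayes",  # W&B also accepts "grid" and "random"
    "metric": {"name": "train/loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"min": 1e-5, "max": 1e-3},  # continuous range
        "batch_size": {"values": [16, 32, 64]},       # discrete choices
    },
}
```

Bayesian search needs the `metric` entry to know what to optimize, so it is worth checking that the metric name matches one the run actually logs.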
Offline mode
For air-gapped machines, log locally and sync later with wandb sync:
wandb_cb = WandBCallback(project="my-language-model", offline=True)
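W&B itself also honors the `WANDB_MODE` environment variable, which can be handy when the same script must run both online and offline. As a sketch, setting it before the run starts has a similar effect; whether `offline=True` does this internally is an assumption.

```python
import os

# Must be set before the wandb run is initialized,
# i.e. before the callback starts logging.
os.environ["WANDB_MODE"] = "offline"
```

Offline runs are written under the local wandb/ directory and can be uploaded later by passing the run directory to wandb sync.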
Distributed training
The callback is distributed-aware: only rank 0 logs, so there are no duplicate runs. Add it to a DDPTrainer or FSDPTrainer exactly as you would to a single-GPU Trainer:
import torch
from olm.logging import WandBCallback
from olm.train.trainer import DDPTrainer

trainer = DDPTrainer(
    model, torch.optim.AdamW, loader,
    device=device, context_length=1024,
    callbacks=[WandBCallback(project="distributed-run")],
)
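The rank-0 behavior can be pictured with a small guard like the one below. This is a sketch of the general technique, not the callback's actual code: it assumes the job is launched with torchrun, which exports a `RANK` environment variable to each worker, and `is_main_process` is a hypothetical helper.

```python
import os


def is_main_process() -> bool:
    """True on rank 0, or when not running under a distributed launcher."""
    # torchrun exports RANK for every worker; default to 0 for
    # ordinary single-process runs where the variable is absent.
    return int(os.environ.get("RANK", "0")) == 0


if is_main_process():
    print("rank 0: this process would create the wandb run")
```

All other ranks skip run creation and logging entirely, which is what prevents duplicate runs from appearing in the project.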
Next steps
- See examples/wandb_example.py for a complete, runnable script.
- Browse the full WandBCallback API for every option.