OLM API Reference

`olm.core.dist`

Source: src/olm/core/dist.py:1

Distributed training utilities for PyTorch DDP and FSDP.

Provides wrappers and helpers for v2 single-node multi-GPU training using PyTorch's native distributed backends. Multi-node launch recipes are planned for the v4 roadmap.

Functions

all_gather(tensor: torch.Tensor) -> list[torch.Tensor]

Source: src/olm/core/dist.py:129

Gather tensors from all processes.

all_reduce(tensor: torch.Tensor, op: torch.distributed.distributed_c10d.ReduceOp = <RedOpType.SUM: 0>, async_op: bool = False) -> torch.distributed.distributed_c10d.Work | None

Source: src/olm/core/dist.py:107

Reduce tensor across all processes.

Parameters

  • tensor: Tensor to reduce (modified in-place).
  • op: Reduction operation (SUM, AVG, PRODUCT, MIN, MAX).
  • async_op: If True, returns Work handle for async operation.

Example

loss = torch.tensor([2.5])
all_reduce(loss, op=dist.ReduceOp.AVG)

barrier() -> None

Source: src/olm/core/dist.py:101

Synchronize all processes.

broadcast(tensor: torch.Tensor, src: int = 0) -> None

Source: src/olm/core/dist.py:139

Broadcast tensor from src rank to all others.

cleanup_distributed() -> None

Source: src/olm/core/dist.py:95

Cleanup distributed process group.

get_backend() -> str | None

Source: src/olm/core/dist.py:163

Get current distributed backend.

get_local_rank() -> int

Source: src/olm/core/dist.py:32

Get local rank on this machine.

get_rank() -> int

Source: src/olm/core/dist.py:22

Get rank of current process (0 if not distributed).

get_world_size() -> int

Source: src/olm/core/dist.py:27

Get total number of processes (1 if not distributed).

is_distributed() -> bool

Source: src/olm/core/dist.py:17

Check if distributed training is initialized.

is_main_process() -> bool

Source: src/olm/core/dist.py:37

Check if this is rank 0.

main_process_only(func: Callable) -> Callable

Source: src/olm/core/dist.py:151

Decorator to execute function only on rank 0.

Source: src/olm/core/dist.py:145

Print only on rank 0.

setup_distributed(backend: str | None = None, init_method: str = 'env://', timeout_minutes: int = 30) -> None

Source: src/olm/core/dist.py:42

Initialize distributed process group from environment variables.

Parameters

  • backend: 'nccl', 'gloo', or None (auto-detect).
  • init_method: Initialization method. Defaults to 'env://'.
  • timeout_minutes: Timeout for operations. Environment variables (set by torchrun): RANK, WORLD_SIZE, LOCAL_RANK, MASTER_ADDR, MASTER_PORT

Example

# Run with: torchrun --nproc_per_node=4 train.py
setup_distributed()