
Distributed Utils

Note

These functionalities are available via the Lux.DistributedUtils module.

Backends

Lux.MPIBackend Type
julia
MPIBackend(comm = nothing)

Create an MPI backend for distributed training. Users should not use this function directly. Instead, use DistributedUtils.get_distributed_backend(MPIBackend).

source
Lux.NCCLBackend Type
julia
NCCLBackend(comm = nothing, mpi_backend = nothing)

Create an NCCL backend for distributed training. Users should not use this function directly. Instead, use DistributedUtils.get_distributed_backend(NCCLBackend).

source

Initialization

Lux.DistributedUtils.initialize Function
julia
initialize(backend::Type{<:AbstractLuxDistributedBackend}; kwargs...)

Initialize the given backend. Users can supply cuda_devices and amdgpu_devices to initialize the backend with the given devices. These can be set to missing to prevent initialization of the corresponding device type. If set to nothing and the backend is functional, GPUs are assigned in a round-robin fashion. Finally, a list of integers can be supplied to initialize the backend with the given devices.

Possible values for backend are:

  • MPIBackend: MPI backend for distributed training. Requires MPI.jl to be installed.

  • NCCLBackend: NCCL backend for CUDA distributed training. Requires CUDA.jl, MPI.jl, and NCCL.jl to be installed. This also wraps MPI backend for non-CUDA communications.

source
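
For illustration, a minimal initialization sketch (assuming MPI.jl is installed, plus CUDA.jl and NCCL.jl for the NCCL path, and that the script is launched with an MPI launcher such as mpiexec):

julia
using Lux, MPI, NCCL

# Initialize the backend once per process; use MPIBackend for CPU-only runs.
DistributedUtils.initialize(NCCLBackend)

# Device selection is optional: leaving cuda_devices as `nothing` assigns GPUs
# round-robin, while an explicit list pins the processes to the given devices.
# DistributedUtils.initialize(NCCLBackend; cuda_devices=[0, 1, 2, 3])
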
Lux.DistributedUtils.initialized Function
julia
initialized(backend::Type{<:AbstractLuxDistributedBackend})

Check if the given backend is initialized.

source
Lux.DistributedUtils.get_distributed_backend Function
julia
get_distributed_backend(backend::Type{<:AbstractLuxDistributedBackend})

Get the distributed backend for the given backend type. Possible values are:

  • MPIBackend: MPI backend for distributed training. Requires MPI.jl to be installed.

  • NCCLBackend: NCCL backend for CUDA distributed training. Requires CUDA.jl, MPI.jl, and NCCL.jl to be installed. This also wraps MPI backend for non-CUDA communications.

Danger

initialize(backend; kwargs...) must be called before calling this function.

source
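
A hedged sketch of the full setup sequence (the @assert is purely illustrative):

julia
DistributedUtils.initialize(NCCLBackend)                      # must come first
@assert DistributedUtils.initialized(NCCLBackend)
backend = DistributedUtils.get_distributed_backend(NCCLBackend)

This backend object is what the helper functions and communication primitives below consume.
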

Helper Functions

Lux.DistributedUtils.local_rank Function
julia
local_rank(backend::AbstractLuxDistributedBackend)

Get the local rank for the given backend.

source
Lux.DistributedUtils.total_workers Function
julia
total_workers(backend::AbstractLuxDistributedBackend)

Get the total number of workers for the given backend.

source
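
These helpers are typically used to make per-process decisions, e.g. logging only from rank 0. A small sketch, reusing the backend from above:

julia
rank     = DistributedUtils.local_rank(backend)
nworkers = DistributedUtils.total_workers(backend)

# Avoid duplicated logs by printing from a single process.
if rank == 0
    @info "Running with $nworkers workers"
end
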

Communication Primitives

Lux.DistributedUtils.allreduce! Function
julia
allreduce!(backend::AbstractLuxDistributedBackend, sendrecvbuf, op)
allreduce!(backend::AbstractLuxDistributedBackend, sendbuf, recvbuf, op)

Backend-agnostic API to perform an allreduce operation on the buffer sendrecvbuf (in place) or on sendbuf, storing the result in recvbuf.

op allows a special DistributedUtils.avg operation that averages the result across all workers.

source
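
As an illustration, averaging a gradient-like buffer in place across all workers (g is just a placeholder array):

julia
g = rand(Float32, 16)  # stand-in for a local gradient buffer

# In-place allreduce; DistributedUtils.avg averages instead of summing.
DistributedUtils.allreduce!(backend, g, DistributedUtils.avg)
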
Lux.DistributedUtils.bcast! Function
julia
bcast!(backend::AbstractLuxDistributedBackend, sendrecvbuf; root::Int=0)
bcast!(backend::AbstractLuxDistributedBackend, sendbuf, recvbuf; root::Int=0)

Backend-agnostic API to broadcast the buffer sendrecvbuf (in place) or sendbuf into recvbuf across all workers. The value at root is broadcast to all other workers.

source
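
A sketch of broadcasting a buffer from the root process (rank 0) to all workers:

julia
buf = DistributedUtils.local_rank(backend) == 0 ? ones(Float32, 8) : zeros(Float32, 8)

# After this call, every worker holds the root's values.
DistributedUtils.bcast!(backend, buf; root=0)
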
Lux.DistributedUtils.reduce! Function
julia
reduce!(backend::AbstractLuxDistributedBackend, sendrecvbuf, op; root::Int=0)
reduce!(backend::AbstractLuxDistributedBackend, sendbuf, recvbuf, op; root::Int=0)

Backend-agnostic API to perform a reduce operation on the buffer sendrecvbuf (in place) or on sendbuf, storing the result in recvbuf.

op allows a special DistributedUtils.avg operation that averages the result across all workers.

source
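
A sketch of reducing per-worker statistics onto the root process:

julia
local_stats = rand(Float32, 4)  # stand-in for per-worker statistics

# The averaged result ends up on the root process.
DistributedUtils.reduce!(backend, local_stats, DistributedUtils.avg; root=0)
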
Lux.DistributedUtils.synchronize!! Function
julia
synchronize!!(backend::AbstractLuxDistributedBackend, ps; root::Int=0)

Synchronize the given structure ps using the given backend. The value at root is broadcast to all other workers.

source
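
This is commonly called right after setting up the model so that every process starts from identical parameters and states. A hedged sketch (assumes model is already defined):

julia
using Random

ps, st = Lux.setup(Random.default_rng(), model)

# Broadcast the root's parameters and states to all workers; the synchronized
# structures are returned (note the `!!` convention).
ps = DistributedUtils.synchronize!!(backend, ps; root=0)
st = DistributedUtils.synchronize!!(backend, st; root=0)
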

Optimizers.jl Integration

Lux.DistributedUtils.DistributedOptimizer Type
julia
DistributedOptimizer(backend::AbstractLuxDistributedBackend, optimizer)

Wrap the optimizer in a DistributedOptimizer. Before updating the parameters, this averages the gradients across the processes using Allreduce.

Arguments

  • optimizer: An Optimizer compatible with the Optimisers.jl package

source
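
A sketch of combining this with Optimisers.jl (assumes ps holds the model parameters from Lux.setup):

julia
using Optimisers

opt = DistributedUtils.DistributedOptimizer(backend, Optimisers.Adam(0.001f0))
opt_state = Optimisers.setup(opt, ps)

# The optimizer state should also start out identical on every worker.
opt_state = DistributedUtils.synchronize!!(backend, opt_state; root=0)
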

MLUtils.jl Integration

Lux.DistributedUtils.DistributedDataContainer Type
julia
DistributedDataContainer(backend::AbstractLuxDistributedBackend, data)

data must be compatible with the MLUtils.jl interface. The returned container is also compatible with the MLUtils.jl interface and partitions the dataset across the available processes.

Load MLUtils.jl

MLUtils.jl must be installed and loaded before using this.

source
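
A sketch of wrapping a dataset and feeding it to a DataLoader (assumes MLUtils.jl is loaded; x is a placeholder array of samples):

julia
using MLUtils

x = rand(Float32, 32, 1024)  # placeholder dataset: 1024 samples of size 32

# Each process only iterates over its own shard of the data.
ddata  = DistributedUtils.DistributedDataContainer(backend, x)
loader = MLUtils.DataLoader(ddata; batchsize=16, shuffle=true)
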
