LuxLib

Backend for Lux.jl

Index

LuxLib.alpha_dropout
LuxLib.batchnorm
LuxLib.dropout
LuxLib.fast_activation!!
LuxLib.fused_conv_bias_activation
LuxLib.fused_dense_bias_activation
LuxLib.groupnorm
LuxLib.instancenorm
LuxLib.layernorm

Fully Connected Layers

# LuxLib.fused_dense_bias_activation — Function.

fused_dense_bias_activation(σ::F, weight::AbstractMatrix, x::AbstractMatrix,
    b::Union{Nothing, AbstractVector}) where {F}

Compute σ.(weight * x .+ b) with the best possible implementation available. Currently this implementation attempts to minimize reallocations by reusing the output buffer for multiple operations.

Arguments

σ: Activation function
weight: Weight matrix
x: Input matrix
b: Bias vector (can be nothing)

Notes on implementation

Despite the name, currently only the activation (σ) is fused with the bias addition. This is equivalent to a matrix multiply followed by NNlib.bias_act!, though this function does not call those operations.
If any of the inputs don't support setindex! (i.e., immutable arrays), we fall back to the generic non-mutating implementation.
Maximum memory reuse and operation fusion are guaranteed for ChainRules-compatible AD backends or backends that support mutation. Backends like Tracker and ReverseDiff fall back to the generic implementation.
For CUDA arrays, this uses a special fused implementation via cuBLASLt.
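
The reference semantics can be written down directly. The sketch below is a hypothetical helper (`dense_bias_act_reference`, not a LuxLib function) that computes σ.(weight * x .+ b) with a single output buffer: `mul!` writes the product in place, and the bias add and activation are then fused into one in-place broadcast. It does not use cuBLASLt or any LuxLib internals, and `relu` is defined locally for illustration.

```julia
using LinearAlgebra

# Hypothetical reference helper (not part of LuxLib): computes σ.(W * x .+ b),
# reusing the matmul output buffer for the bias add and the activation.
function dense_bias_act_reference(σ::F, W::AbstractMatrix, x::AbstractMatrix,
        b::Union{Nothing, AbstractVector}) where {F}
    T = promote_type(eltype(W), eltype(x))
    y = similar(x, T, size(W, 1), size(x, 2))  # the only allocation
    mul!(y, W, x)                               # y = W * x, written in place
    if b === nothing
        y .= σ.(y)                              # activation only, in place
    else
        y .= σ.(y .+ b)                         # bias add + activation in one broadcast
    end
    return y
end

relu(v) = max(zero(v), v)  # local stand-in for an activation function

W = [1.0 -2.0; 0.5 1.0]
x = [1.0 0.0; 1.0 2.0]
b = [0.5, -1.0]
y = dense_bias_act_reference(relu, W, x, b)
```

The result matches the unfused expression relu.(W * x .+ b), but only one output array is allocated.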

Convolutional Layers

# LuxLib.fused_conv_bias_activation — Function.

fused_conv_bias_activation(σ::F, weight::AbstractArray, x::AbstractArray,
    b::Union{Nothing, AbstractArray}, cdims::ConvDims) where {F}

Compute σ.(conv(x, weight, cdims) .+ b) with the best possible implementation available. Where possible, the operations are fused into a single kernel, and the output buffer is reused across operations to minimize reallocations.

Arguments

σ: Activation function
weight: Weight tensor
x: Input tensor
b: Bias tensor (can be nothing)
cdims: ConvDims object

Notes on implementation

For CUDA arrays, this uses fused cuDNN kernels when the activation is identity or relu. For other activations, it tries to fuse the operations on the Julia side.
If any of the inputs don't support setindex! (i.e., immutable arrays), we fall back to the generic non-mutating implementation.
Maximum memory reuse and operation fusion are guaranteed for ChainRules-compatible AD backends or backends that support mutation. Backends like Tracker and ReverseDiff fall back to the generic implementation.
For mixed-precision inputs on GPU, the inputs are type-promoted to the highest precision, with a warning.
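
To make the reference semantics concrete, here is a naive sketch for the 1D case. It is a hypothetical helper (`conv1d_bias_act_reference`, not a LuxLib or NNlib function) that assumes a (width, channels, batch) layout, stride 1, no padding, and no dilation, and applies the kernel without flipping (cross-correlation), which may differ from the kernel orientation NNlib uses for a given ConvDims. Like the GPU path described above, it promotes mixed-precision inputs to the widest element type, but it does not emit a warning.

```julia
# Hypothetical reference (not LuxLib): σ.(conv(x, w) .+ b) for 1D inputs in
# (width, channels, batch) layout, stride 1, no padding, no dilation.
# Assumption: the kernel is applied without flipping (cross-correlation).
function conv1d_bias_act_reference(σ::F, w::AbstractArray{<:Any, 3},
        x::AbstractArray{<:Any, 3}, b::Union{Nothing, AbstractVector}) where {F}
    K, Cin, Cout = size(w)
    Win, Cin_x, N = size(x)
    @assert Cin == Cin_x "channel mismatch"
    # Promote mixed-precision inputs to the widest element type.
    T = promote_type(eltype(w), eltype(x), b === nothing ? Bool : eltype(b))
    Wout = Win - K + 1
    y = zeros(T, Wout, Cout, N)
    for n in 1:N, co in 1:Cout, i in 1:Wout
        acc = zero(T)
        for ci in 1:Cin, k in 1:K
            acc += w[k, ci, co] * x[i + k - 1, ci, n]
        end
        y[i, co, n] = acc
    end
    # The bias is per output channel; reshape it so it broadcasts along dim 2.
    b === nothing || (y .+= reshape(b, 1, Cout, 1))
    y .= σ.(y)  # activation applied in place on the same buffer
    return y
end

x = reshape(Float32[1, 2, 3, 4], 4, 1, 1)  # width 4, 1 channel, batch 1
w = reshape([1.0, -1.0], 2, 1, 1)          # Float64 kernel: mixed precision
y = conv1d_bias_act_reference(identity, w, x, [0.5])
```

Here the Float32 input and Float64 kernel produce a Float64 output of width 3.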

Dropout

# LuxLib.alpha_dropout — Function.

alpha_dropout(rng::AbstractRNG, x, p, ::Val{training})
alpha_dropout(rng::AbstractRNG, x, p, ::Val{training}, α, A, B)

Alpha Dropout: dropout that keeps the mean and variance of the output the same as those of the input. For details see [1]. Use the second call signature to avoid recomputing the constants for a fixed dropout probability.

Arguments

rng: Random number generator
x: Input Array
p: Probability of an element to be dropped out
Val(training): If true, dropout is applied to x with probability p. Otherwise, x is returned
α: -1.7580993408473766, the limit of selu(x) as x → -∞, i.e. α = -λβ, where λ and β are the SELU scale and alpha constants
A: Scaling factor for the mean
B: Scaling factor for the variance

Returns

Output Array after applying alpha dropout
Updated state for the random number generator
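
The constants α, A, and B depend only on p, which is why the second signature lets you compute them once. The sketch below follows the formulas in [1] with p as the drop probability. The names (`alpha_dropout_constants`, `alpha_dropout_sketch`) and the explicitly written SELU constants are for illustration only and are not LuxLib API. Unlike the documented function, it returns only the output array, not an updated RNG.

```julia
using Random

# Hypothetical sketch (not LuxLib's code) of alpha dropout from Klambauer et al. (2017).
const SELU_λ = 1.0507009873554805      # SELU scale
const SELU_β = 1.6732632423543772      # SELU "alpha", called β here as in the docs above
const ALPHA_SAT = -SELU_λ * SELU_β      # lim selu(x) as x → -∞, ≈ -1.7580993408473766

# Constants that depend only on the drop probability p, so they can be reused.
function alpha_dropout_constants(p)
    α = ALPHA_SAT
    A = inv(sqrt((1 - p) * (1 + p * α^2)))  # rescales the variance back to 1
    B = -A * α * p                          # shifts the mean back to 0
    return α, A, B
end

# Dropped entries are set to α, then the affine map A * x̃ + B is applied.
function alpha_dropout_sketch(rng::AbstractRNG, x::AbstractArray, p, α, A, B)
    keep = rand(rng, size(x)...) .> p       # keep each entry with probability 1 - p
    return A .* ifelse.(keep, x, α) .+ B
end

rng = MersenneTwister(0)
x = randn(rng, 100_000)
α, A, B = alpha_dropout_constants(0.2)     # computed once for a fixed p
y = alpha_dropout_sketch(rng, x, 0.2, α, A, B)
```

For standard-normal inputs, the output mean stays near 0 and the variance near 1.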

References

[1] Klambauer, Günter, et al. "Self-normalizing neural networks." Advances in neural information processing systems 30 (2017).

# LuxLib.dropout — Function.

dropout(rng::AbstractRNG, x, p, ::Val{training}, invp; dims)
dropout(rng::AbstractRNG, x, mask, p, ::Val{training}, ::Val{update_mask}, invp;
        dims)

Dropout: a simple way to prevent neural networks from overfitting. For details see [1].

Arguments

rng: Random number generator
x: Input Array
mask: Dropout mask. If not used, it is constructed automatically
p: Probability of an element to be dropped out
Val(training): If true, dropout is applied to x with probability p along dims. Otherwise, x is returned
Val(update_mask): If true, a new mask is generated and used. Otherwise, the provided mask is used directly
invp: Inverse of the keep probability, computed as 1 / (1 - p)

Keyword Arguments

dims: Dimensions along which dropout is applied

Returns

Output Array after applying dropout
Dropout Mask (if training == false, the returned value is meaningless)
Updated state for the random number generator
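
The sketch below illustrates inverted dropout as described by the arguments above: each mask entry is 0 with probability p and invp = 1 / (1 - p) otherwise, so the expected output equals the input. `dropout_sketch` is a hypothetical helper, not LuxLib's implementation, and its reading of `dims` (independent mask entries only along `dims`, broadcast across all other dimensions) is an assumption.

```julia
using Random

# Hypothetical sketch (not LuxLib's implementation) of inverted dropout.
# p: drop probability; invp = 1 / (1 - p) rescales kept entries so E[y] == x.
# Assumption: mask entries are drawn independently only along `dims`; the mask
# has size 1 in every other dimension and is broadcast across it.
function dropout_sketch(rng::AbstractRNG, x::AbstractArray, p, invp; dims = :)
    mask_size = dims === Colon() ? size(x) :
                ntuple(d -> d in dims ? size(x, d) : 1, ndims(x))
    mask = ifelse.(rand(rng, mask_size...) .> p, invp, zero(invp))
    return x .* mask, mask
end
```

With dims = 2 on a matrix, whole columns are kept (and scaled by invp) or zeroed together.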

References

[1] Srivastava, Nitish, et al. "Dropout: a simple way to prevent neural networks from overfitting." The journal of machine learning research 15.1 (2014): 1929-1958.

Normalization

# LuxLib.batchnorm — Function.

batchnorm(x, scale, bias, running_mean, running_var, σ=identity; momentum, epsilon,
    training)

Batch Normalization. For details see [1].

Batch Normalization computes the mean and variance for each $D_{1} \times \ldots \times D_{N-2} \times 1 \times D_{N}$ input slice and normalises the input accordingly.

Arguments

x: Input to be Normalized
scale: Scale factor ($γ$) (can be nothing)
bias: Bias factor ($β$) (can be nothing)
running_mean: Running mean (can be nothing)
running_var: Running variance (can be nothing)
σ: Activation function (default: identity)

Keyword Arguments

momentum: Momentum for updating running mean and variance
epsilon: Value added to the denominator for numerical stability
training: Set to Val(true) if running in training mode

Returns

Normalized array of the same size as x, and a NamedTuple containing the updated running mean and variance.
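
For reference, the per-channel computation can be sketched in plain Julia. `batchnorm_sketch` is hypothetical: it assumes the channel dimension is N - 1, omits the activation σ, and uses an assumed momentum convention (running = (1 - momentum) * running + momentum * batch statistic, with the unbiased batch variance feeding the running variance) that may not match LuxLib exactly.

```julia
# Hypothetical sketch (not LuxLib's code) of batch normalization in training
# mode. The channel dimension is assumed to be N - 1, e.g. (W, H, C, N).
# Statistics are reduced over every dimension except the channel dimension.
# Assumed convention: running = (1 - momentum) * running + momentum * batch_stat.
function batchnorm_sketch(x::AbstractArray, γ, β, running_mean, running_var;
        momentum = 0.1, epsilon = 1e-5)
    N = ndims(x)
    red = Tuple(d for d in 1:N if d != N - 1)   # dims to reduce over
    m = prod(size(x, d) for d in red)           # elements per channel
    μ = sum(x; dims = red) ./ m                 # size C along dim N - 1, 1 elsewhere
    σ² = sum(abs2, x .- μ; dims = red) ./ m     # biased variance, used to normalize
    y = (x .- μ) ./ sqrt.(σ² .+ epsilon)
    shape = ntuple(d -> d == N - 1 ? size(x, N - 1) : 1, N)
    γ === nothing || (y = y .* reshape(γ, shape))
    β === nothing || (y = y .+ reshape(β, shape))
    rm = (1 - momentum) .* running_mean .+ momentum .* vec(μ)
    rv = (1 - momentum) .* running_var .+ momentum .* vec(σ²) .* (m / (m - 1))
    return y, (; running_mean = rm, running_var = rv)
end
```

Each channel of the output has (approximately) zero mean, and the running statistics move a fraction `momentum` toward the batch statistics.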

Performance Considerations

If the input is a 2D, 4D, or 5D CuArray with element type Float16, Float32, or Float64, the cuDNN code path is used. In all other cases, a broadcasting fallback is used, which is not highly optimized.

References

[1] Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." International conference on machine learning. PMLR, 2015.

# LuxLib.groupnorm — Function.

groupnorm(x, scale, bias; groups, epsilon)

Group Normalization. For details see [1].

This op is similar to batch normalization, but statistics are shared across equally sized groups of channels and not across the batch dimension. Thus, group normalization does not depend on the batch composition and does not require maintaining internal state for storing statistics.

Arguments

x: Input to be Normalized
scale: Scale factor ($γ$) (can be nothing)
bias: Bias factor ($β$) (can be nothing)

Keyword Arguments

groups: Number of groups
epsilon: Value added to the denominator for numerical stability

Returns

The normalized array is returned.
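
A minimal sketch of the statistics described above, assuming a 4D (W, H, C, N) input whose channel count is divisible by `groups` and that consecutive channels form a group. `groupnorm_sketch` is hypothetical and unoptimized; it is not the KernelAbstractions kernel.

```julia
# Hypothetical sketch (not LuxLib's kernel) of group normalization for a 4D
# input in (W, H, C, N) layout. Channels are split into `groups` equal groups;
# mean and variance are computed per (group, sample), never across the batch.
function groupnorm_sketch(x::AbstractArray{<:Any, 4}, γ, β; groups::Int,
        epsilon = 1e-5)
    W, H, C, N = size(x)
    @assert C % groups == 0 "channels must be divisible by groups"
    G = C ÷ groups                               # channels per group
    xg = reshape(x, W, H, G, groups, N)          # split the channel dimension
    μ = sum(xg; dims = (1, 2, 3)) ./ (W * H * G)
    σ² = sum(abs2, xg .- μ; dims = (1, 2, 3)) ./ (W * H * G)
    y = reshape((xg .- μ) ./ sqrt.(σ² .+ epsilon), W, H, C, N)
    γ === nothing || (y = y .* reshape(γ, 1, 1, C, 1))  # per-channel affine
    β === nothing || (y = y .+ reshape(β, 1, 1, C, 1))
    return y
end
```

Because each sample is normalized on its own, normalizing a single sample gives the same result as slicing it out of a batched call, which is the batch independence noted above.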

Performance Considerations

The most common case of this op, where x is a 4D array, is optimized using KernelAbstractions (KA) and has a fast custom backward pass. All other cases use a fallback implementation that is not especially optimized.

We have tested the KA code path for Float16 and it works, but gradient accumulation is extremely fragile. Hence, Float16 inputs use the fallback implementation.

If the batch size is small (< 16), the fallback implementation will be faster than the KA version. However, the fallback cannot be selected through the groupnorm interface directly.

References

[1] Wu, Yuxin, and Kaiming He. "Group normalization." Proceedings of the European conference on computer vision (ECCV). 2018.

# LuxLib.instancenorm — Function.

instancenorm(x, scale, bias, σ = identity; epsilon, training)

Instance Normalization. For details see [1].

Instance Normalization computes the mean and variance for each $D_{1} \times \ldots \times D_{N-2} \times 1 \times 1$ input slice and normalises the input accordingly.

Arguments

x: Input to be Normalized (must be at least 3D)
scale: Scale factor ($γ$) (can be nothing)
bias: Bias factor ($β$) (can be nothing)
σ: Activation function (default: identity)

Keyword Arguments

epsilon: Value added to the denominator for numerical stability
training: Set to Val(true) if running in training mode

Returns

Normalized array of the same size as x, and a NamedTuple containing the updated running mean and variance.
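
The slice shape above means statistics are taken over the spatial dimensions only, once per channel and per sample. `instancenorm_sketch` is a hypothetical illustration of that reduction; it ignores the activation and the training flag and assumes the channel dimension is N - 1.

```julia
# Hypothetical sketch (not LuxLib's code): instance normalization reduces over
# the spatial dims 1:N-2 only, separately for every channel and every sample,
# i.e. over D_1 × … × D_{N-2} × 1 × 1 slices.
function instancenorm_sketch(x::AbstractArray, γ, β; epsilon = 1e-5)
    N = ndims(x)
    @assert N >= 3 "input must be at least 3D"
    red = Tuple(1:(N - 2))                      # spatial dimensions
    m = prod(size(x, d) for d in red)           # elements per (channel, sample)
    μ = sum(x; dims = red) ./ m
    σ² = sum(abs2, x .- μ; dims = red) ./ m
    y = (x .- μ) ./ sqrt.(σ² .+ epsilon)
    shape = ntuple(d -> d == N - 1 ? size(x, N - 1) : 1, N)
    γ === nothing || (y = y .* reshape(γ, shape))
    β === nothing || (y = y .+ reshape(β, shape))
    return y
end
```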

References

[1] Ulyanov, Dmitry, Andrea Vedaldi, and Victor Lempitsky. "Instance normalization: The missing ingredient for fast stylization." arXiv preprint arXiv:1607.08022 (2016).

# LuxLib.layernorm — Function.

layernorm(x, scale, bias, σ = identity; dims, epsilon)

Layer Normalization. For details see [1].

Given an input array $x$, this layer computes

$$y = \frac{x - \mathbb{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta$$

and applies the activation function σ elementwise to y.
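
A direct, unoptimized transcription of the formula, using the biased variance along `dims`. `layernorm_sketch` is hypothetical; it assumes `dims` is an integer or a tuple of integers and that γ and β are already shaped to broadcast against x.

```julia
# Hypothetical sketch (not LuxLib's code) of
# y = (x - E[x]) / sqrt(Var[x] + ϵ) * γ + β, followed by σ elementwise.
# Mean and (biased) variance are taken along `dims`.
function layernorm_sketch(x::AbstractArray, γ, β, σ = identity; dims, epsilon = 1e-5)
    m = prod(size(x, d) for d in dims)   # number of elements per normalized slice
    μ = sum(x; dims) ./ m
    v = sum(abs2, x .- μ; dims) ./ m
    y = (x .- μ) ./ sqrt.(v .+ epsilon)
    γ === nothing || (y = y .* γ)        # γ and β must broadcast against x
    β === nothing || (y = y .+ β)
    return σ.(y)
end
```

For example, with dims = 1 each column of a matrix is normalized independently.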

Arguments

x: Input to be Normalized
scale: Scale factor ($γ$) (can be nothing)
bias: Bias factor ($β$) (can be nothing)
σ: Activation function (default: identity)

Keyword Arguments

dims: Dimensions along which the mean and standard deviation of x are computed
epsilon: Value added to the denominator for numerical stability

Returns

Normalized array of the same size as x.

References

[1] Ba, Jimmy Lei, Jamie Ryan Kiros, and Geoffrey E. Hinton. "Layer normalization." arXiv preprint arXiv:1607.06450 (2016).

Apply Activation

# LuxLib.fast_activation!! — Function.

fast_activation!!(σ::F, x) where {F}

Compute σ.(x) with the best possible implementation available. If x can be overwritten in place, it is; if x is an immutable array, the function falls back to the generic implementation.

Note

This function doesn't replace σ with NNlib.fast_act(σ, ...); the user must do that if needed.

Arguments

σ: Activation function
x: Input array

Returns

Output Array with the same size as x
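
The `!!` convention (mutate when possible, otherwise allocate, and always use the return value) can be illustrated with a hypothetical two-method function, `activation!!`. The mutability test here is simply dispatch on `Array`; LuxLib's actual check for arrays that support setindex! is not reproduced.

```julia
# Hypothetical sketch (not LuxLib's implementation) of the "!!" convention.
# Assumption: only `Array` is treated as mutable; everything else is copied.
activation!!(σ::F, x::Array) where {F} = (x .= σ.(x); x)  # overwrite x in place
activation!!(σ::F, x::AbstractArray) where {F} = σ.(x)      # immutable fallback
```

Callers should always use the returned value, since for immutable inputs (such as ranges) x itself is left unchanged.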
