LuxLib
Backend for Lux.jl
Index
LuxLib.alpha_dropout
LuxLib.batchnorm
LuxLib.dropout
LuxLib.groupnorm
LuxLib.instancenorm
LuxLib.layernorm
Dropout
alpha_dropout(rng::AbstractRNG, x, p, ::Val{training})
alpha_dropout(rng::AbstractRNG, x, p, ::Val{training}, α, A, B)
Alpha Dropout: Dropout ensuring that the mean and variance of the output remain the same as those of the input. For details see [1]. Use the second call signature to avoid recomputing the constants for a fixed dropout probability.
Arguments
rng: Random number generator
x: Input Array
p: Probability of an element to be dropped out
Val(training): If true then dropout is applied on x with probability p. Else, x is returned
α: -1.7580993408473766. Computed in the limit as x tends to -∞, where selu(x) = -λβ = α
A: Scaling factor for the mean
B: Scaling factor for the variance
Returns
Output Array after applying alpha dropout
Updated state for the random number generator
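Example
A minimal usage sketch based on the signature above, assuming LuxLib and Random are loaded; the input shape and the dropout probability p = 0.5 are arbitrary illustrative choices, not values prescribed by the library:
using LuxLib, Random

rng = Random.default_rng()
x = randn(rng, Float32, 10, 3)               # arbitrary 10×3 input

# Training mode: elements are dropped with probability p = 0.5
y, rng_new = alpha_dropout(rng, x, 0.5f0, Val(true))

# Inference mode: x is returned unchanged
y_test, _ = alpha_dropout(rng, x, 0.5f0, Val(false))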
References
[1] Klambauer, Günter, et al. "Self-normalizing neural networks." Advances in neural information processing systems 30 (2017).
dropout(rng::AbstractRNG, x, p, ::Val{training}, invp; dims)
dropout(rng::AbstractRNG, x, mask, p, ::Val{training}, ::Val{update_mask}, invp; dims)
Dropout: A Simple Way to Prevent Neural Networks from Overfitting. For details see [1].
Arguments
rng: Random number generator
x: Input Array
mask: Dropout Mask. If not used then it is constructed automatically
p: Probability of an element to be dropped out
Val(training): If true then dropout is applied on x with probability p along dims. Else, x is returned
Val(update_mask): If true then the mask is generated and used. Else, the mask provided is directly used
invp: Inverse of the probability
Keyword Arguments
dims: Dimensions along which dropout is applied
invp: Inverse of the probability (1 / p)
Returns
Output Array after applying dropout
Dropout Mask (if training == false, the returned value is meaningless)
Updated state for the random number generator
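Example
A minimal usage sketch based on the two signatures above, assuming LuxLib and Random are loaded; the input shape, p = 0.5, and dims = Colon() (dropout applied elementwise over all dimensions) are illustrative choices:
using LuxLib, Random

rng = Random.default_rng()
x = randn(rng, Float32, 4, 8)
p = 0.5f0

# First signature: the mask is constructed automatically
y, mask, rng_new = dropout(rng, x, p, Val(true), inv(p); dims=Colon())

# Second signature: reuse the mask generated above (update_mask = Val(false))
y2, _, _ = dropout(rng_new, x, mask, p, Val(true), Val(false), inv(p); dims=Colon())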
References
[1] Srivastava, Nitish, et al. "Dropout: a simple way to prevent neural networks from overfitting." The journal of machine learning research 15.1 (2014): 1929-1958.
Normalization
batchnorm(x, scale, bias, running_mean, running_var; momentum, epsilon, training)
Batch Normalization. For details see [1].
Batch Normalization computes the mean and variance for each D_1 × ⋯ × D_{N-2} × 1 × D_N input slice and normalises the input accordingly.
Arguments
x: Input to be Normalized
scale: Scale factor (γ) (can be nothing)
bias: Bias factor (β) (can be nothing)
running_mean: Running mean (can be nothing)
running_var: Running variance (can be nothing)
Keyword Arguments
momentum: Momentum for updating running mean and variance
epsilon: Value added to the denominator for numerical stability
training: Set to Val(true) if running in training mode
Returns
Normalized Array of same size as x, and a NamedTuple containing the updated running mean and variance.
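Example
A minimal usage sketch based on the signature above, assuming a 4D Float32 input of shape W × H × C × N with C = 3 channels; the shapes, momentum, and epsilon values are illustrative choices:
using LuxLib, Random

rng = Random.default_rng()
x = randn(rng, Float32, 8, 8, 3, 4)     # W × H × C × N with C = 3 channels
scale = ones(Float32, 3)                # per-channel scale (γ)
bias = zeros(Float32, 3)                # per-channel bias (β)
running_mean = zeros(Float32, 3)
running_var = ones(Float32, 3)

y, stats = batchnorm(x, scale, bias, running_mean, running_var;
                     momentum=0.1f0, epsilon=1f-5, training=Val(true))
# stats is a NamedTuple holding the updated running mean and variance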
Performance Considerations
If the input array is a 2D, 4D, or 5D CuArray with element type Float16, Float32, or Float64, then the CUDNN code path will be used. In all other cases, a broadcasting fallback is used which is not highly optimized.
References
[1] Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." International conference on machine learning. PMLR, 2015.
groupnorm(x, scale, bias; groups, epsilon)
Group Normalization. For details see [1].
This op is similar to batch normalization, but statistics are shared across equally-sized groups of channels and not shared across batch dimension. Thus, group normalization does not depend on the batch composition and does not require maintaining internal state for storing statistics.
Arguments
x: Input to be Normalized
scale: Scale factor (γ) (can be nothing)
bias: Bias factor (β) (can be nothing)
Keyword Arguments
groups: Number of groups
epsilon: Value added to the denominator for numerical stability
Returns
The normalized array is returned.
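Example
A minimal usage sketch based on the signature above, assuming a 4D Float32 input with C = 6 channels split into 3 groups; the shapes and epsilon are illustrative choices:
using LuxLib, Random

rng = Random.default_rng()
x = randn(rng, Float32, 8, 8, 6, 4)     # W × H × C × N with C = 6 channels
scale = ones(Float32, 6)                # per-channel scale (γ)
bias = zeros(Float32, 6)                # per-channel bias (β)

# 6 channels split into 3 groups of 2
y = groupnorm(x, scale, bias; groups=3, epsilon=1f-5)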
Performance Considerations
The most common case of this Op (x is a 4D array) is optimized using KernelAbstractions and has a fast custom backwards pass implemented. All other cases have a fallback implementation which is not especially optimized.
We have tested the code path for Float16 and it works, but gradient accumulation is extremely fragile. Hence, for Float16 inputs, it uses the fallback implementation.
If the batch size is small (< 16), then the fallback implementation will be faster than the KA version. However, this customization is not possible using the direct groupnorm interface.
References
[1] Wu, Yuxin, and Kaiming He. "Group normalization." Proceedings of the European conference on computer vision (ECCV). 2018.
instancenorm(x, scale, bias; epsilon, training)
Instance Normalization. For details see [1].
Instance Normalization computes the mean and variance for each D_1 × ⋯ × D_{N-2} × 1 × 1 input slice and normalises the input accordingly.
Arguments
x: Input to be Normalized (must be at least 3D)
scale: Scale factor (γ) (can be nothing)
bias: Bias factor (β) (can be nothing)
Keyword Arguments
epsilon: Value added to the denominator for numerical stability
training: Set to Val(true) if running in training mode
Returns
Normalized Array of same size as x, and a NamedTuple containing the updated running mean and variance.
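Example
A minimal usage sketch based on the signature above, assuming a 3D Float32 input of shape W × C × N; the shapes and epsilon are illustrative choices:
using LuxLib, Random

rng = Random.default_rng()
x = randn(rng, Float32, 16, 3, 4)       # at least 3D: W × C × N
scale = ones(Float32, 3)                # per-channel scale (γ)
bias = zeros(Float32, 3)                # per-channel bias (β)

y, stats = instancenorm(x, scale, bias; epsilon=1f-5, training=Val(true))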
References
[1] Ulyanov, Dmitry, Andrea Vedaldi, and Victor Lempitsky. "Instance normalization: The missing ingredient for fast stylization." arXiv preprint arXiv:1607.08022 (2016).
layernorm(x, scale, bias; dims, epsilon)
Layer Normalization. For details see [1].
Given an input array x, this layer computes y = (x - E[x]) / sqrt(Var[x] + ϵ) * γ + β, where γ is the scale and β is the bias, with the statistics computed along dims.
Arguments
x: Input to be Normalized
scale: Scale factor (γ) (can be nothing)
bias: Bias factor (β) (can be nothing)
Keyword Arguments
dims: Dimensions along which the mean and std of x is computed
epsilon: Value added to the denominator for numerical stability
Returns
Normalized Array of same size as x.
References
[1] Ba, Jimmy Lei, Jamie Ryan Kiros, and Geoffrey E. Hinton. "Layer normalization." arXiv preprint arXiv:1607.06450 (2016).