# LuxLib

Backend for Lux.jl
## Dropout
### LuxLib.alpha_dropout — Function
```julia
alpha_dropout(rng::AbstractRNG, x, p, ::Val{training})
alpha_dropout(rng::AbstractRNG, x, p, ::Val{training}, α, A, B)
```
Alpha Dropout: a dropout variant that ensures the mean and variance of the output remain the same as those of the input. For details see [1]. Use the second call signature to avoid recomputing the constants for a fixed dropout probability.
**Arguments**
* `rng`: Random number generator
* `x`: Input Array
* `p`: Probability of an element to be dropped out
* `Val(training)`: If `true` then dropout is applied on `x` with probability `p`. Else, `x` is returned
* `α`: `-1.7580993408473766`. Computed as the limit of `selu(x)` as `x` tends to `-∞`
* `A`: Scaling factor for the mean
* `B`: Scaling factor for the variance
**Returns**
* Output Array after applying alpha dropout
* Updated state for the random number generator
**References**
[1] Klambauer, Günter, et al. "Self-normalizing neural networks." Advances in neural information processing systems 30 (2017).
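A minimal usage sketch of the first call signature (the array shape and dropout probability here are illustrative):
```julia
using LuxLib, Random

rng = Random.default_rng()
x = randn(rng, Float32, 10, 2)

# Training mode: elements are dropped with probability 0.5 and the output is
# rescaled/shifted so that its mean and variance match the input
y, rng_new = alpha_dropout(rng, x, 0.5f0, Val(true))

# Inference mode: `x` is returned unchanged
y_test, _ = alpha_dropout(rng, x, 0.5f0, Val(false))
```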
### LuxLib.dropout — Function
```julia
dropout(rng::AbstractRNG, x, p, ::Val{training}, invp; dims)
dropout(rng::AbstractRNG, x, mask, p, ::Val{training}, ::Val{update_mask}, invp;
dims)
```
Dropout: a simple way to prevent neural networks from overfitting. For details see [1].
**Arguments**
* `rng`: Random number generator
* `x`: Input Array
* `mask`: Dropout mask. If not provided, it is constructed automatically
* `p`: Probability of an element to be dropped out
* `Val(training)`: If `true` then dropout is applied on `x` with probability `p` along `dims`. Else, `x` is returned
* `Val(update_mask)`: If `true` then the mask is generated and used. Else, the `mask` provided is directly used
* `invp`: Inverse of the probability ($\frac{1}{p}$)
**Keyword Arguments**
* `dims`: Dimensions along which dropout is applied
**Returns**
* Output Array after applying dropout
* Dropout Mask (if `training == false`, the returned value is meaningless)
* Updated state for the random number generator
**References**
[1] Srivastava, Nitish, et al. "Dropout: a simple way to prevent neural networks from overfitting." The journal of machine learning research 15.1 (2014): 1929-1958.
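A minimal usage sketch of the first call signature above (the array shape and the choice `dims = Colon()`, i.e. dropping elements over the whole array, are illustrative):
```julia
using LuxLib, Random

rng = Random.default_rng()
x = randn(rng, Float32, 4, 3)
p = 0.5f0

# Training mode: a mask is generated internally and applied along `dims`
y, mask, rng_new = dropout(rng, x, p, Val(true), inv(p); dims=Colon())

# Inference mode: `x` is returned as-is and the returned mask is meaningless
y_test, _, _ = dropout(rng, x, p, Val(false), inv(p); dims=Colon())
```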
## Normalization
### LuxLib.batchnorm — Function
```julia
batchnorm(x, scale, bias, running_mean, running_var; momentum, epsilon, training)
```
Batch Normalization. For details see [1].
Batch Normalization computes the mean and variance for each $D_1 \times ... \times D_{N - 2} \times 1 \times D_N$ input slice and normalises the input accordingly.
**Arguments**
* `x`: Input to be Normalized
* `scale`: Scale factor ($\gamma$) (can be `nothing`)
* `bias`: Bias factor ($\beta$) (can be `nothing`)
* `running_mean`: Running mean (can be `nothing`)
* `running_var`: Running variance (can be `nothing`)
**Keyword Arguments**
* `momentum`: Momentum for updating running mean and variance
* `epsilon`: Value added to the denominator for numerical stability
* `training`: Set to `Val(true)` if running in training mode
**Returns**
Normalized array of the same size as `x`, and a `NamedTuple` containing the updated running mean and variance.
**Performance Considerations**
If the input array is a 2D, 4D, or 5D `CuArray` with element type `Float16`, `Float32`, or `Float64`, then the cuDNN code path is used. In all other cases, a broadcasting fallback is used, which is not highly optimized.
**References**
[1] Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." International conference on machine learning. PMLR, 2015.
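A minimal usage sketch for a 4D input, assuming the `W × H × C × N` layout implied by the slice description above and per-channel parameter vectors (shapes and keyword values are illustrative):
```julia
using LuxLib

x = randn(Float32, 8, 8, 3, 4)    # W × H × C × N
scale = ones(Float32, 3)          # γ, one entry per channel
bias = zeros(Float32, 3)          # β, one entry per channel
running_mean = zeros(Float32, 3)
running_var = ones(Float32, 3)

y, stats = batchnorm(x, scale, bias, running_mean, running_var;
                     momentum=0.1f0, epsilon=1f-5, training=Val(true))
# `stats` is a NamedTuple carrying the updated running mean and variance
```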
### LuxLib.groupnorm — Function
```julia
groupnorm(x, scale, bias; groups, epsilon)
```
Group Normalization. For details see [1].
This op is similar to batch normalization, but statistics are shared across equally-sized groups of channels and are not shared across the batch dimension. Thus, group normalization does not depend on the batch composition and does not require maintaining internal state for storing statistics.
**Arguments**
* `x`: Input to be Normalized
* `scale`: Scale factor ($\gamma$) (can be `nothing`)
* `bias`: Bias factor ($\beta$) (can be `nothing`)
**Keyword Arguments**
* `groups`: Number of groups
* `epsilon`: Value added to the denominator for numerical stability
**Returns**
The normalized array is returned.
**Performance Considerations**
The most common case of this op (`x` being a 4D array) is optimized using KernelAbstractions and has a fast custom backward pass implemented. All other cases use a fallback implementation which is not especially optimized.
We have tested the code path for `Float16` and it works, but gradient accumulation is extremely fragile. Hence, for `Float16` inputs, the fallback implementation is used.
If the batch size is small (< 16), the fallback implementation will be faster than the KernelAbstractions version; however, this switch cannot be made through the direct `groupnorm` interface.
**References**
[1] Wu, Yuxin, and Kaiming He. "Group normalization." Proceedings of the European conference on computer vision (ECCV). 2018.
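A minimal usage sketch, assuming per-channel `scale`/`bias` vectors; the number of channels must be divisible by `groups` (shapes and keyword values are illustrative):
```julia
using LuxLib

x = randn(Float32, 8, 8, 6, 4)   # W × H × C × N with C = 6 channels
scale = ones(Float32, 6)         # γ, one entry per channel
bias = zeros(Float32, 6)         # β, one entry per channel

# Statistics are computed over 3 groups of 2 channels each
y = groupnorm(x, scale, bias; groups=3, epsilon=1f-5)
```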
### LuxLib.instancenorm — Function
```julia
instancenorm(x, scale, bias; epsilon, training)
```
Instance Normalization. For details see [1].
Instance Normalization computes the mean and variance for each $D_1 \times ... \times D_{N - 2} \times 1 \times 1$ input slice and normalises the input accordingly.
**Arguments**
* `x`: Input to be Normalized (must be at least 3D)
* `scale`: Scale factor ($\gamma$) (can be `nothing`)
* `bias`: Bias factor ($\beta$) (can be `nothing`)
**Keyword Arguments**
* `epsilon`: Value added to the denominator for numerical stability
* `training`: Set to `Val(true)` if running in training mode
**Returns**
Normalized array of the same size as `x`, and a `NamedTuple` containing the updated running mean and variance.
**References**
[1] Ulyanov, Dmitry, Andrea Vedaldi, and Victor Lempitsky. "Instance normalization: The missing ingredient for fast stylization." arXiv preprint arXiv:1607.08022 (2016).
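A minimal usage sketch for a 4D input, assuming per-channel `scale`/`bias` vectors (shapes and keyword values are illustrative):
```julia
using LuxLib

x = randn(Float32, 8, 8, 3, 4)   # must be at least 3D
scale = ones(Float32, 3)         # γ, one entry per channel
bias = zeros(Float32, 3)         # β, one entry per channel

y, stats = instancenorm(x, scale, bias; epsilon=1f-5, training=Val(true))
# `stats` is a NamedTuple carrying the updated running mean and variance
```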
### LuxLib.layernorm — Function
```julia
layernorm(x, scale, bias; dims, epsilon)
```
Layer Normalization. For details see [1].
Given an input array $x$, this layer computes
$$
y = \frac{x - \mathbb{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} \cdot \gamma + \beta
$$
**Arguments**
* `x`: Input to be Normalized
* `scale`: Scale factor ($\gamma$) (can be `nothing`)
* `bias`: Bias factor ($\beta$) (can be `nothing`)
**Keyword Arguments**
* `dims`: Dimensions along which the mean and standard deviation of `x` are computed
* `epsilon`: Value added to the denominator for numerical stability
**Returns**
Normalized array of the same size as `x`.
**References**
[1] Ba, Jimmy Lei, Jamie Ryan Kiros, and Geoffrey E. Hinton. "Layer normalization." arXiv preprint arXiv:1607.06450 (2016).
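A minimal usage sketch (the shapes, `dims`, and `epsilon` are illustrative; when provided, `scale` and `bias` are assumed to broadcast against `x`):
```julia
using LuxLib

# Normalize each sample (last dimension is the batch) over the first two dims
x = randn(Float32, 4, 3, 2)
y = layernorm(x, nothing, nothing; dims=(1, 2), epsilon=1f-5)

# With affine parameters γ and β
scale = ones(Float32, 4, 3, 1)
bias = zeros(Float32, 4, 3, 1)
y_affine = layernorm(x, scale, bias; dims=(1, 2), epsilon=1f-5)
```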