# LuxLib

Backend for Lux.jl

## Apply Activation

## LuxLib.API.fast_activation Function

`fast_activation(σ::F, x::AbstractArray) where {F}`

Compute `σ.(x)` with the best possible implementation available. On CPUs we unroll the loop and use LoopVectorization.jl to vectorize the computation. On GPUs we simply use broadcasting.

Note

This function doesn't replace `σ` with `NNlib.fast_act(σ, ...)`; that needs to be done by the user if needed.

**Arguments**

- `σ`: Activation function
- `x`: Input array

**Returns**

- Output array with the same size as `x`
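
A minimal usage sketch, assuming `fast_activation` is available from LuxLib and `gelu` comes from NNlib; the input shape is illustrative:

```julia
using LuxLib, NNlib, Random

x = randn(Random.default_rng(), Float32, 4, 8)

# Elementwise activation with the fastest available implementation
y = fast_activation(gelu, x)

# Up to backend-specific optimizations, this matches plain broadcasting
y ≈ gelu.(x)  # true
```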

## LuxLib.API.fast_activation!! Function

`fast_activation!!(σ::F, x::AbstractArray) where {F}`

Compute `σ.(x)` with the best possible implementation available. If it is possible to rewrite `x` in-place, it does so. If `x` is an immutable array, it falls back to the generic implementation.

Note

This function doesn't replace `σ` with `NNlib.fast_act(σ, ...)`; that needs to be done by the user if needed.

Tip

Certain activation functions are replaced with specialized implementations from SLEEFPirates.jl for FP32. This might lead to faster performance but can cause a slight decrease in accuracy (in the floating point limit).

**Arguments**

- `σ`: Activation function
- `x`: Input array

**Returns**

- Output array with the same size as `x`
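
A short sketch of the recommended call pattern (rebind the result rather than relying on mutation); `relu` from NNlib is assumed:

```julia
using LuxLib, NNlib, Random

x = rand(Random.default_rng(), Float32, 16, 32)

# May overwrite `x` in-place; always use the returned value
x = fast_activation!!(relu, x)
```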

## Batched Operations

## LuxLib.API.batched_matmul Function

`batched_matmul(x, y)`

Computes the batched matrix multiplication of `x` and `y`. For more details see the NNlib documentation on `NNlib.batched_mul`. This function is mostly a wrapper around `batched_mul` but attempts to be faster on CPUs.
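
A minimal sketch with 3D inputs, where multiplication is batched over the last dimension (shapes are illustrative):

```julia
using LuxLib, Random

rng = Random.default_rng()

# (4×5) * (5×6) for each of the 8 batch slices
x = randn(rng, Float32, 4, 5, 8)
y = randn(rng, Float32, 5, 6, 8)

z = batched_matmul(x, y)
size(z)  # (4, 6, 8)
```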

## Bias Activation

## LuxLib.API.bias_activation Function

`bias_activation(σ, x, bias)`

Applies the activation function `σ` elementwise to the result of broadcasted addition of `x` and `bias` along the penultimate dimension. A vector `x` is treated as a matrix with a single last dimension.

**Arguments**

- `σ`: Activation function
- `x`: Input to be transformed
- `bias`: Bias to be added. Can be `nothing`.

See also `bias_activation!!`, `fast_activation`.
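
A minimal sketch for a (features × batch) matrix, where the bias vector is added along the feature (penultimate) dimension; shapes are illustrative:

```julia
using LuxLib, Random

rng = Random.default_rng()

x = randn(rng, Float32, 3, 5)
bias = randn(rng, Float32, 3)

y = bias_activation(tanh, x, bias)

# Equivalent (up to fusion) to broadcasting manually
y ≈ tanh.(x .+ bias)  # true
```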

## LuxLib.API.bias_activation!! Function

`bias_activation!!(σ, x, bias)`

Same as `bias_activation` but might update `x` in-place if possible. Users should not rely on `x` being mutated; it is recommended to use it like `y = bias_activation!!(σ, x, bias)`. If `x` is updated in-place, `y` aliases `x`.

See also `bias_activation`, `fast_activation!!`.
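
A brief sketch of the call pattern (`relu` from NNlib is assumed; always work with the returned value):

```julia
using LuxLib, NNlib, Random

rng = Random.default_rng()
x = randn(rng, Float32, 3, 5)
bias = randn(rng, Float32, 3)

# May reuse the memory of `x`; use `y`, not `x`, afterwards
y = bias_activation!!(relu, x, bias)
```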

## Convolutional Layers

## LuxLib.API.fused_conv_bias_activation Function

```
fused_conv_bias_activation(σ::F, weight::AbstractArray, x::AbstractArray,
b::Optional{<:AbstractVector}, cdims::ConvDims) where {F}
```

Computes `σ.(conv(x, weight, cdims) .+ b)` (`b` is not exactly broadcasted like this; rather it is reshaped and broadcast to the penultimate dimension) with the best possible implementation available. This operation fuses operations into a single kernel if possible, and minimizes reallocations by reusing the output buffer for multiple operations.

**Arguments**

- `σ`: Activation function
- `weight`: Weight tensor
- `x`: Input tensor
- `b`: Bias tensor (can be `nothing`)
- `cdims`: `ConvDims` object

**Notes on implementation**

- For CUDA arrays, this uses fused cuDNN kernels when the activation is `identity` or `relu`. For other activations, it tries to fuse the operations on the Julia side.
- If any of the inputs don't support setindexing (i.e. immutable arrays), we fall back to the generic non-mutating implementation.
- Maximum memory reuse and operation fusion is guaranteed for ChainRules-compatible AD backends or backends that support mutation. Backends like `Tracker` and `ReverseDiff` fall back to the generic implementation.
- For mixed-precision inputs on GPU, we type-promote the inputs to the highest precision, with a warning.
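
A hedged sketch of a single fused convolution call, assuming NNlib's `DenseConvDims` constructor and `relu`; the WHCN shapes are illustrative:

```julia
using LuxLib, NNlib, Random

rng = Random.default_rng()

x = randn(rng, Float32, 8, 8, 3, 4)         # WHCN input: 3 channels, batch of 4
weight = randn(rng, Float32, 3, 3, 3, 16)   # 3×3 kernel, 3 input → 16 output channels
b = randn(rng, Float32, 16)

cdims = DenseConvDims(x, weight)            # construct the ConvDims via NNlib

y = fused_conv_bias_activation(relu, weight, x, b, cdims)
size(y)  # (6, 6, 16, 4)
```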

## Dropout

## LuxLib.API.alpha_dropout Function

```
alpha_dropout(rng::AbstractRNG, x, p, training)
alpha_dropout(rng::AbstractRNG, x, p, training, α, A, B)
```

Alpha Dropout: Dropout ensuring that the mean and variance of the output remain the same as those of the input. For details see [1]. Use the second call signature to avoid recomputing the constants for a fixed dropout probability.

**Arguments**

- `rng`: Random number generator
- `x`: Input array
- `p`: Probability of an element to be dropped out
- `training`: Set to `Val(true)` or `True()` if running in training mode. Can be set to `nothing` to automatically determine if the function is being called within an autodiff context
- `α`: `-1.7580993408473766`. Computed at the limit `x` tends to infinity, `selu(x) = -λβ = α`
- `A`: Scaling factor for the mean
- `B`: Scaling factor for the variance

**Returns**

- Output array after applying alpha dropout
- Updated state for the random number generator
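
A short usage sketch of the first call signature (the use of the default RNG and the shapes are illustrative):

```julia
using LuxLib, Random

rng = Random.default_rng()
x = randn(rng, Float32, 10, 2)

# Training mode: elements are dropped and the output is rescaled/shifted
# so that its mean and variance match the input's
y_train, rng_new = alpha_dropout(rng, x, 0.5f0, Val(true))

# Inference mode: dropout is not applied
y_test, _ = alpha_dropout(rng, x, 0.5f0, Val(false))
```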

**References**

[1] Klambauer, Günter, et al. "Self-normalizing neural networks." Advances in neural information processing systems 30 (2017).

## LuxLib.API.dropout Function

```
dropout(rng::AbstractRNG, x, p, training, invp, dims)
dropout(rng::AbstractRNG, x, mask, p, training, update_mask::Union{Val, StaticBool},
invp, dims)
```

Dropout: A Simple Way to Prevent Neural Networks from Overfitting. For details see [1].

**Arguments**

- `rng`: Random number generator
- `x`: Input array
- `mask`: Dropout mask. If not used then it is constructed automatically
- `p`: Probability of an element to be dropped out
- `training`: Set to `Val(true)` or `True()` if running in training mode. Can be set to `nothing` to automatically determine if the function is being called within an autodiff context
- `update_mask`: If `Val(true)` or `True()` then the mask is generated and used. Else, the `mask` provided is directly used
- `invp`: Inverse multiplied to the mask. Calculated as `invp = 1 / (1 - p)`.

**Returns**

- Output array after applying dropout
- Dropout mask (if `training == false`, the returned value is meaningless)
- Updated state for the random number generator
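
A usage sketch of the first call signature; passing `:` for `dims` (dropping elements independently across all dimensions) is an assumption based on the common default:

```julia
using LuxLib, Random

rng = Random.default_rng()
x = randn(rng, Float32, 4, 8)
p = 0.5f0

# Training mode: returns the dropped-out array, the mask, and the updated RNG
y, mask, rng_new = dropout(rng, x, p, Val(true), 1 / (1 - p), :)

# Inference mode: `x` is passed through unchanged
y_inf, _, _ = dropout(rng, x, p, Val(false), 1 / (1 - p), :)
```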

**References**

[1] Srivastava, Nitish, et al. "Dropout: a simple way to prevent neural networks from overfitting." The journal of machine learning research 15.1 (2014): 1929-1958.

## Fully Connected Layers

## LuxLib.API.fused_dense_bias_activation Function

```
fused_dense_bias_activation(σ::F, weight::AbstractMatrix, x::AbstractMatrix,
b::Optional{<:AbstractVector}) where {F}
```

Compute `σ.(weight * x .+ b)` with the best possible implementation available. Currently this implementation attempts to minimize reallocations by reusing the output buffer for multiple operations.

**Arguments**

- `σ`: Activation function
- `weight`: Weight matrix
- `x`: Input matrix
- `b`: Bias vector (can be `nothing`)

**Notes on implementation**

- If any of the inputs don't support setindexing (i.e. immutable arrays), we fall back to the generic non-mutating implementation.
- Maximum memory reuse and operation fusion is guaranteed for ChainRules-compatible AD backends or backends that support mutation. Backends like `Tracker` and `ReverseDiff` fall back to the generic implementation.
- For CUDA arrays, this uses a special fused implementation via cuBLASLt.
- For small CPU arrays, we use LoopVectorization.jl. On `x86_64` we use Octavian for medium-sized matrices. This is overridden if special BLAS implementations are loaded (currently `MKL`, `AppleAccelerate`, and `BLISBLAS`).
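
A minimal sketch of a fused dense layer; `gelu` from NNlib and the matrix shapes are illustrative assumptions:

```julia
using LuxLib, NNlib, Random

rng = Random.default_rng()

weight = randn(rng, Float32, 32, 16)   # (out_features × in_features)
x = randn(rng, Float32, 16, 4)         # (in_features × batch)
b = randn(rng, Float32, 32)

y = fused_dense_bias_activation(gelu, weight, x, b)
size(y)  # (32, 4)
```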

## Normalization

## LuxLib.API.batchnorm Function

```
batchnorm(x, scale, bias, running_mean, running_var, training,
σ=identity, momentum = 0.1f0, epsilon = eps(eltype(x)) ^ (5 // 7))
```

Batch Normalization. For details see [1].

Batch Normalization computes the mean and variance for each `D_1 × ⋯ × D_{N-2} × 1 × D_N` input slice and normalises the input accordingly.

**Arguments**

- `x`: Input to be normalized
- `scale`: Scale factor (`γ`) (can be `nothing`)
- `bias`: Bias factor (`β`) (can be `nothing`)
- `running_mean`: Running mean (can be `nothing`)
- `running_var`: Running variance (can be `nothing`)
- `training`: Set to `Val(true)` or `True()` if running in training mode. Can be set to `nothing` to automatically determine if the function is being called within an autodiff context
- `σ`: Activation function (default: `identity`)
- `momentum`: Momentum for updating running mean and variance (default: `0.1f0`)
- `epsilon`: Value added to the denominator for numerical stability (default: `eps(eltype(x)) ^ (5 / 7)`)

**Returns**

Normalized array of the same size as `x`, and a named tuple containing the updated running mean and variance.
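
A short sketch with a 4D WHCN input, where the channel dimension is the penultimate one; the shapes and use of `relu` are illustrative:

```julia
using LuxLib, NNlib, Random

rng = Random.default_rng()

x = randn(rng, Float32, 8, 8, 4, 2)     # (W, H, C, N) with C = 4
scale = ones(Float32, 4)
bias = zeros(Float32, 4)
running_mean = zeros(Float32, 4)
running_var = ones(Float32, 4)

y, stats = batchnorm(x, scale, bias, running_mean, running_var, Val(true), relu, 0.1f0)
size(y)   # (8, 8, 4, 2); `stats` holds the updated running mean and variance
```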

**References**

[1] Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." International conference on machine learning. PMLR, 2015.

## LuxLib.API.groupnorm Function

```
groupnorm(x, scale, bias, groups::Int, σ::F=identity,
epsilon::Real=eps(eltype(x)) ^ (5 // 7))
```

Group Normalization. For details see [1].

This op is similar to batch normalization, but statistics are shared across equally-sized groups of channels and not shared across batch dimension. Thus, group normalization does not depend on the batch composition and does not require maintaining internal state for storing statistics.

**Arguments**

- `x`: Input to be normalized
- `scale`: Scale factor (`γ`) (can be `nothing`)
- `bias`: Bias factor (`β`) (can be `nothing`)
- `groups`: Number of groups
- `σ`: Activation function (default: `identity`)
- `epsilon`: Value added to the denominator for numerical stability (default: `eps(eltype(x)) ^ (5 / 7)`)

**Returns**

The normalized array is returned.
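
A short sketch splitting 8 channels into 4 groups; the shapes and use of `relu` are illustrative:

```julia
using LuxLib, NNlib, Random

rng = Random.default_rng()

x = randn(rng, Float32, 8, 8, 8, 2)     # (W, H, C, N) with C = 8
scale = ones(Float32, 8)
bias = zeros(Float32, 8)

y = groupnorm(x, scale, bias, 4, relu)  # 4 groups of 2 channels each
size(y)  # (8, 8, 8, 2)
```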

**References**

[1] Wu, Yuxin, and Kaiming He. "Group normalization." Proceedings of the European conference on computer vision (ECCV). 2018.

## LuxLib.API.instancenorm Function

```
instancenorm(x, scale, bias, training, act, epsilon = eps(eltype(x)) ^ (5 // 7))
instancenorm(x, scale, bias, running_mean, running_var, training, act, momentum,
epsilon = eps(eltype(x)) ^ (5 // 7))
```

Instance Normalization. For details see [1].

Instance Normalization computes the mean and variance for each `D_1 × ⋯ × D_{N-2} × 1 × 1` input slice and normalises the input accordingly.

**Arguments**

- `x`: Input to be normalized (must be at least 3D)
- `scale`: Scale factor (`γ`) (can be `nothing`)
- `bias`: Bias factor (`β`) (can be `nothing`)
- `running_mean`: Running mean (can be `nothing`)
- `running_var`: Running variance (can be `nothing`)
- `training`: Set to `Val(true)` or `True()` if running in training mode. Can be set to `nothing` to automatically determine if the function is being called within an autodiff context
- `σ`: Activation function (default: `identity`)
- `epsilon`: Value added to the denominator for numerical stability (default: `eps(eltype(x)) ^ (5 / 7)`)
- `momentum`: Momentum for updating running mean and variance (default: `0.1f0`)

**Returns**

Normalized array of the same size as `x`, and a named tuple containing the updated running mean and variance.
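
A sketch of the second call signature with running statistics; the shapes (at least 3D input) and use of `relu` are illustrative:

```julia
using LuxLib, NNlib, Random

rng = Random.default_rng()

x = randn(rng, Float32, 16, 4, 2)       # (length, channels, batch)
scale = ones(Float32, 4)
bias = zeros(Float32, 4)
running_mean = zeros(Float32, 4)
running_var = ones(Float32, 4)

y, stats = instancenorm(x, scale, bias, running_mean, running_var, Val(true), relu, 0.1f0)
size(y)  # (16, 4, 2)
```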

**References**

[1] Ulyanov, Dmitry, Andrea Vedaldi, and Victor Lempitsky. "Instance normalization: The missing ingredient for fast stylization." arXiv preprint arXiv:1607.08022 (2016).

## LuxLib.API.layernorm Function

```
layernorm(x::AbstractArray{xT, N}, scale, bias, σ = identity, dims=1:(N - 1),
epsilon = eps(eltype(x)) ^ (5 / 7)) where {xT, N}
```

Layer Normalization. For details see [1].

Given an input array `x`, this layer computes

`y = (x - E[x]) / sqrt(Var[x] + ϵ) * γ + β`

and applies the activation function `σ` elementwise to `y`.

**Arguments**

- `x`: Input to be normalized
- `scale`: Scale factor (`γ`) (can be `nothing`)
- `bias`: Bias factor (`β`) (can be `nothing`)
- `σ`: Activation function (default: `identity`)
- `dims`: Dimensions along which the mean and std of `x` is computed. If `nothing` is passed, the dims are inferred based on the dimensions of scale and bias. For example, if `x` is `N` dimensional and `scale` and `bias` are `M` dimensional, then the dims will be `1:(N - M)`.
- `epsilon`: Value added to the denominator for numerical stability (default: `eps(eltype(x)) ^ (5 / 7)`)

**Returns**

Normalized array of the same size as `x`.
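
A sketch for a (features × batch) matrix; the trailing singleton dimension on `scale` and `bias` is an assumption made so they broadcast against `x`:

```julia
using LuxLib, NNlib, Random

rng = Random.default_rng()

x = randn(rng, Float32, 8, 4)     # (features, batch); default dims = 1:(N - 1) = 1
scale = ones(Float32, 8, 1)
bias = zeros(Float32, 8, 1)

y = layernorm(x, scale, bias, relu)
size(y)  # (8, 4)
```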

**References**

[1] Ba, Jimmy Lei, Jamie Ryan Kiros, and Geoffrey E. Hinton. "Layer normalization." arXiv preprint arXiv:1607.06450 (2016).

## Helper Functions

## LuxLib.internal_operation_mode Function

```
internal_operation_mode(xs::Tuple)
internal_operation_mode(x::AbstractArray)
```

Returns the internal operation mode for the given array(s). This is useful to define custom implementations using different backends like simple Julia broadcasting, Kernel Abstractions, Loop Vectorization, etc.

Currently supported modes are:

- `GenericBroadcastOp`: This is the fallback for most types. For the following types this is the preferred mode:
  - Arrays with `fast_scalar_indexing` set to `False`
  - Static Arrays
  - ReverseDiff Arrays
  - Tracker Arrays
  - ForwardDiff.Dual Arrays
- `GPUBroadcastOp{dev}`: GPU arrays where `dev` is obtained from `get_device_type(xs)`. Dispatches for this option should preferably use `KernelAbstractions` or specialized vendor kernels.
- `LoopedArrayOp`: CPU arrays that can be optimized using SIMD loops, ideally using `LoopVectorization.jl` or `Polyester.jl`.
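
A minimal sketch of querying the operation mode; which concrete mode is returned depends on the array type and loaded packages, so the comments below are assumptions for a plain CPU `Array`:

```julia
using LuxLib, Random

x = randn(Random.default_rng(), Float32, 32, 32)

# For a plain CPU `Array` this typically selects the SIMD-loop path
mode = LuxLib.internal_operation_mode(x)

# Multiple arrays can be inspected together by passing a tuple
mode_pair = LuxLib.internal_operation_mode((x, similar(x)))
```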