Built-In Layers
Containers
Lux.BranchLayer Type
BranchLayer(layers...; fusion=nothing)
BranchLayer(; fusion=nothing, name=nothing, layers...)
Takes an input x and passes it through all the layers and returns a tuple of the outputs. If fusion is provided, applies fusion to the tuple of outputs.
Arguments
- Layers can be specified in two formats:
  - A list of N Lux layers
  - Specified as N keyword arguments.
Keyword Arguments
- fusion: An optional layer or function to apply to the tuple of outputs. If fusion = nothing, returns the tuple as-is (default behavior). If fusion is provided, returns fusion((layer_1(x), layer_2(x), ..., layer_N(x))).
Extended Help
Inputs
- x: Will be directly passed to each of the layers
Returns
- If fusion = nothing: Tuple (layer_1(x), layer_2(x), ..., layer_N(x)) (naming changes if using the kwargs API)
- If fusion is provided: fusion((layer_1(x), layer_2(x), ..., layer_N(x)))
- Updated state of the layers (and fusion if it's a layer)
Parameters
- Parameters of each layer wrapped in a NamedTuple with fields = layer_1, layer_2, ..., layer_N (naming changes if using the kwargs API)
- If fusion is an AbstractLuxLayer, parameters include both layers and fusion
States
- States of each layer wrapped in a NamedTuple with fields = layer_1, layer_2, ..., layer_N (naming changes if using the kwargs API)
- If fusion is an AbstractLuxLayer, states include both layers and fusion
Comparison with Parallel
This is slightly different from Parallel(nothing, layers...):
- If the input is a tuple, Parallel will pass each element individually to each layer.
- BranchLayer essentially assumes 1 input comes in and is branched out into N outputs.
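The difference in routing can be sketched with NoOpLayer, which returns its input unchanged. This is a minimal illustration, assuming a Lux 1.x installation; the variable names are chosen here for the example only.

```julia
using Lux, Random

rng = Random.Xoshiro(0)
x1 = rand(rng, Float32, 2)
x2 = rand(rng, Float32, 2)

# NoOpLayer returns its input unchanged, which makes the routing easy to see.
branch = BranchLayer(NoOpLayer(), NoOpLayer())
parallel = Parallel(nothing, NoOpLayer(), NoOpLayer())

ps_b, st_b = Lux.setup(rng, branch)
ps_p, st_p = Lux.setup(rng, parallel)

# BranchLayer hands the whole tuple to every layer.
y_branch, _ = branch((x1, x2), ps_b, st_b)

# Parallel hands the i-th tuple element to the i-th layer.
y_parallel, _ = parallel((x1, x2), ps_p, st_p)
```

Under these assumptions, y_branch holds two copies of the tuple (x1, x2), while y_parallel is just (x1, x2).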
Example
An easy way to replicate an input to an NTuple is to do
julia> BranchLayer(NoOpLayer(), NoOpLayer(), NoOpLayer())
BranchLayer(
layer_(1-3) = NoOpLayer(),
) # Total: 0 parameters,
# plus 0 states.
Lux.Chain Type
Chain(layers...; name=nothing)
Chain(; layers..., name=nothing)
Collects multiple layers / functions to be called in sequence on a given input.
Arguments
- Layers can be specified in two formats:
  - A list of N Lux layers
  - Specified as N keyword arguments.
Extended Help
Inputs
Input x is passed sequentially to each layer, and must conform to the input requirements of the internal layers.
Returns
- Output after sequentially applying all the layers to x
- Updated model states
Parameters
- Parameters of each layer wrapped in a NamedTuple with fields = layer_1, layer_2, ..., layer_N (naming changes if using the kwargs API)
States
- States of each layer wrapped in a NamedTuple with fields = layer_1, layer_2, ..., layer_N (naming changes if using the kwargs API)
Miscellaneous Properties
- Allows indexing and field access syntax. We can access the ith layer by m[i] or m.layer_i. We can also index using ranges or arrays.
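The indexing and field-access behavior described above can be tried as follows. This is a small sketch assuming Lux 1.x; the layer sizes are arbitrary.

```julia
using Lux, Random

rng = Random.Xoshiro(0)
model = Chain(Dense(2 => 3, relu), Dense(3 => 3, relu), Dense(3 => 2))
ps, st = Lux.setup(rng, model)

first_layer = model[1]   # integer index: the first layer
same_layer = model.layer_1   # field access: the same layer under its default name
head = model[1:2]        # range index: a shorter Chain holding layers 1 and 2

x = rand(rng, Float32, 2, 4)   # 2 features, batch of 4
y, st_new = model(x, ps, st)   # layers are applied in order
```

The parameter NamedTuple uses the same layer_1, layer_2, ... names, so ps.layer_1 holds the parameters of model[1].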
Example
julia> Chain(Dense(2, 3, relu), BatchNorm(3), Dense(3, 2))
Chain(
layer_1 = Dense(2 => 3, relu), # 9 parameters
layer_2 = BatchNorm(3, affine=true, track_stats=true), # 6 parameters, plus 7 non-trainable
layer_3 = Dense(3 => 2), # 8 parameters
) # Total: 23 parameters,
# plus 7 states.
julia> Chain(Dense(2, 3, relu), BatchNorm(3), Dense(3, 2); name="MyFancyChain")
MyFancyChain(
layer_1 = Dense(2 => 3, relu), # 9 parameters
layer_2 = BatchNorm(3, affine=true, track_stats=true), # 6 parameters, plus 7 non-trainable
layer_3 = Dense(3 => 2), # 8 parameters
) # Total: 23 parameters,
# plus 7 states.
Lux.PairwiseFusion Type
PairwiseFusion(connection, layers...; name=nothing)
PairwiseFusion(connection; name=nothing, layers...)
PairwiseFusion(; connection, layers..., name=nothing)
x1 → layer1 → y1 ↘
                  connection → layer2 → y2 ↘
              x2 ↗                          connection → y3
                                        x3 ↗
Arguments
- connection: Takes 2 inputs and combines them
- layers: AbstractLuxLayers. Layers can be specified in two formats:
  - A list of N Lux layers
  - Specified as N keyword arguments.
Extended Help
Inputs
Layer behaves differently based on input type:
- If the input x is a tuple of length N + 1, then the layers must be a tuple of length N. The computation is as follows
y = x[1]
for i in 1:N
    y = connection(x[i + 1], layers[i](y))
end
- Any other kind of input
y = x
for i in 1:N
    y = connection(x, layers[i](y))
end
Returns
- See Inputs section for how the return value is computed
- Updated model state for all the contained layers
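The two input cases can be traced with ordinary functions standing in for layers. The helper pairwise_fusion below is hypothetical and written only to mirror the documented loops; it is not part of Lux and ignores parameters and states.

```julia
# Hypothetical helper that mirrors the documented loop; not part of Lux.
function pairwise_fusion(connection, layers, x)
    y = x isa Tuple ? x[1] : x
    for (i, layer) in enumerate(layers)
        # Tuple input: fuse with the next tuple element. Otherwise: fuse with x itself.
        skip = x isa Tuple ? x[i + 1] : x
        y = connection(skip, layer(y))
    end
    return y
end

layers = (v -> 2v, v -> v + 1)

y_tuple = pairwise_fusion(+, layers, (1, 10, 100))  # 100 + ((10 + 2 * 1) + 1) = 113
y_single = pairwise_fusion(+, layers, 1)            # 1 + ((1 + 2 * 1) + 1) = 5
```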
Parameters
- Parameters of each layer wrapped in a NamedTuple with fields = layer_1, layer_2, ..., layer_N (naming changes if using the kwargs API)
States
- States of each layer wrapped in a NamedTuple with fields = layer_1, layer_2, ..., layer_N (naming changes if using the kwargs API)
Lux.Parallel Type
Parallel(connection, layers...; name=nothing)
Parallel(connection; name=nothing, layers...)
Parallel(; connection, layers..., name=nothing)
Create a layer which passes an input to each path in layers, before reducing the output with connection.
Arguments
- connection: An N-argument function that is called after passing the input through each layer, OR an AbstractLuxLayer that takes a tuple of N inputs. If connection = nothing, we return a tuple: Parallel(nothing, f, g)(x, y) = (f(x), g(y))
- Layers can be specified in two formats:
  - A list of N Lux layers
  - Specified as N keyword arguments.
Extended Help
Inputs
- x: If x is not a tuple, then the return value is computed as connection([l(x) for l in layers]...). Otherwise, one element of the tuple is passed to each layer, thus Parallel(+, f, g)(x, y) = f(x) + g(y).
Returns
- See the Inputs section for how the output is computed
- Updated state of the layers (and connection if it's a layer)
Parameters
- Parameters of each layer wrapped in a NamedTuple with fields = layer_1, layer_2, ..., layer_N (naming changes if using the kwargs API)
- If connection is an AbstractLuxLayer, parameters include both layers and connection
States
- States of each layer wrapped in a NamedTuple with fields = layer_1, layer_2, ..., layer_N (naming changes if using the kwargs API)
- If connection is an AbstractLuxLayer, states include both layers and connection
See also SkipConnection which is Parallel with one identity.
Example
julia> model = Parallel(nothing, Dense(2, 1), Dense(2, 1))
Parallel(
layer_(1-2) = Dense(2 => 1), # 6 (3 x 2) parameters
) # Total: 6 parameters,
# plus 0 states.
julia> using Random;
rng = Random.seed!(123);
ps, st = Lux.setup(rng, model);
x1 = randn(rng, Float32, 2);
x2 = randn(rng, Float32, 2);
julia> size.(first(model((x1, x2), ps, st)))
((1,), (1,))
Lux.SkipConnection Type
SkipConnection(layers, connection; name=nothing)
SkipConnection(; layers, connection, name=nothing)
Create a skip connection which consists of a layer or Chain of consecutive layers and a shortcut connection linking the block's input to the output through a user-supplied 2-argument callable. The first argument to the callable will be propagated through the given layer while the second is the unchanged, "skipped" input.
The simplest "ResNet"-type connection is just SkipConnection(layer, +).
Arguments
- layer: Layer or Chain of layers to be applied to the input
- connection:
  - A 2-argument function that takes layer(input) and the input OR
  - An AbstractLuxLayer that takes (layer(input), input) as input
Extended Help
Inputs
- x: Will be passed directly to layer
Returns
- Output of connection(layer(input), input)
- Updated state of layer
Parameters
- Parameters of layer OR
- If connection is an AbstractLuxLayer, then NamedTuple with fields :layers and :connection
States
- States of layer OR
- If connection is an AbstractLuxLayer, then NamedTuple with fields :layers and :connection
See Parallel for a more general implementation.
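As a sketch of the residual form SkipConnection(layer, +), the snippet below compares the layer output with a hand-computed connection(layer(x), x). It assumes Lux 1.x and relies on the statement above that, with a plain function as connection, the parameters and states are those of the wrapped layer.

```julia
using Lux, Random

rng = Random.Xoshiro(0)
layer = Dense(4 => 4, tanh)
model = SkipConnection(layer, +)

# With a plain function as `connection`, ps and st are those of `layer`.
ps, st = Lux.setup(rng, model)

x = rand(rng, Float32, 4, 3)
y, _ = model(x, ps, st)

# Recompute the same result by hand: connection(layer(x), x)
fx, _ = layer(x, ps, st)
y_manual = fx + x
```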
Lux.RepeatedLayer Type
RepeatedLayer(model; repeats::Val = Val(10), input_injection::Val = Val(false))
Iteratively applies model for repeats number of times. The initial input is passed into the model repeatedly if input_injection = Val(true). This layer unrolls the computation; semantically, however, it is the same as:
For input_injection = Val(false):
res = x
for i in 1:repeats
    res, st = model(res, ps, st)
end
For input_injection = Val(true):
res = x
for i in 1:repeats
    res, st = model((res, x), ps, st)
end
repeats is expected to be a reasonably small number (below 20); beyond that, compile times for gradients might be unreasonably high.
Arguments
- model must be an AbstractLuxLayer
Keyword Arguments
- repeats: Number of times to apply the model
- input_injection: If true, then the input is passed to the model along with the output
Extended Help
Inputs
- x: Input as described above
Returns
- Output is computed as described above
- Updated state of the model
Parameters
- Parameters of model
States
- State of model
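The equivalence between RepeatedLayer and the hand-written loop (for input_injection = Val(false)) can be checked as below. This is a sketch assuming Lux 1.x; apply_repeatedly is a hypothetical helper introduced only for the comparison.

```julia
using Lux, Random

rng = Random.Xoshiro(0)
layer = Dense(3 => 3, tanh)
model = RepeatedLayer(layer; repeats=Val(3))
ps, st = Lux.setup(rng, model)   # parameters and states are those of `layer`

x = rand(rng, Float32, 3, 2)
y, _ = model(x, ps, st)

# Hypothetical helper: the documented loop for input_injection = Val(false).
function apply_repeatedly(layer, x, ps, st, n)
    res = x
    for _ in 1:n
        res, st = layer(res, ps, st)
    end
    return res
end

y_loop = apply_repeatedly(layer, x, ps, st, 3)
```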
Lux.AlternatePrecision Type
AlternatePrecision{T}(layer)
AlternatePrecision(::Type{T}, layer)
This layer is used to convert the input to a different precision (T), execute the layer, and then convert the output back to the original precision.
Arguments
- T: The eltype of the input to the layer
- layer: The layer to execute
Inputs
- x: AbstractArray
Returns
- y: Output of the layer
- Updated state of the layer
Convolutional Layers
Lux.Conv Type
Conv(k::NTuple{N,Integer}, (in_chs => out_chs)::Pair{<:Integer,<:Integer},
activation=identity; init_weight=nothing, init_bias=nothing, stride=1,
pad=0, dilation=1, groups=1, use_bias=True(), cross_correlation=False())
Standard convolutional layer.
Conv 2D
Image data should be stored in WHCN order (width, height, channels, batch). In other words, a 100 x 100 RGB image would be a 100 x 100 x 3 x 1 array, and a batch of 50 would be a 100 x 100 x 3 x 50 array. This has N = 2 spatial dimensions, and needs a kernel size like (5, 5), a 2-tuple of integers. To take convolutions along N feature dimensions, this layer expects as input an array with ndims(x) == N + 2, where size(x, N + 1) == in_chs is the number of input channels, and size(x, ndims(x)) is the number of observations in a batch.
Warning
Frameworks like PyTorch perform cross-correlation in their convolution layers, whereas this layer performs a true convolution by default. Pass cross_correlation=true to use cross-correlation instead.
Arguments
- k: Tuple of integers specifying the size of the convolutional kernel. E.g., for 2D convolutions length(k) == 2
- in_chs: Number of input channels
- out_chs: Number of output channels
- activation: Activation function
Extended Help
Keyword Arguments
- init_weight: Controls the initialization of the weight parameter. If nothing, then we use kaiming_uniform with gain computed on the basis of the activation function (taken from PyTorch nn.init.calculate_gain).
- init_bias: Controls the initialization of the bias parameter. If nothing, then we use a uniform distribution with bounds -bound and bound where bound = inv(sqrt(fan_in)).
- stride: Should be either a single integer or a tuple with N integers
- dilation: Should be either a single integer or a tuple with N integers
- pad: Specifies the number of elements added to the borders of the data array. It can be
  - a single integer for equal padding all around,
  - a tuple of N integers, to apply the same padding at begin/end of each spatial dimension,
  - a tuple of 2*N integers, for asymmetric padding, or
  - the singleton SamePad(), to calculate padding such that size(output,d) == size(x,d) / stride (possibly rounded) for each spatial dimension.
  - Periodic padding can be achieved by preceding the layer with a WrappedFunction(x -> NNlib.pad_circular(x, N_pad; dims=pad_dims))
- groups: Expected to be an Int. It specifies the number of groups to divide a convolution into (set groups = in_chs for Depthwise Convolutions). in_chs and out_chs must be divisible by groups.
- use_bias: Trainable bias can be disabled entirely by setting this to false.
- cross_correlation: If true, perform cross-correlation instead of convolution. Prior to v1, Lux used to have a CrossCor layer which performed cross-correlation. This was removed in v1 in favor of Conv with cross_correlation=true.
Inputs
- x: Data satisfying ndims(x) == N + 2 && size(x, N + 1) == in_chs, i.e. size(x) = (I_N, ..., I_1, C_in, N)
Returns
- Output of the convolution y of size (O_N, ..., O_1, C_out, N)
- Empty NamedTuple()
Parameters
- weight: Convolution kernel
- bias: Bias (present if use_bias=true)
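The WHCN layout and the effect of pad can be illustrated with a small 2D example. This is a sketch assuming Lux 1.x; the image size and channel counts are arbitrary.

```julia
using Lux, Random

rng = Random.Xoshiro(0)
x = rand(rng, Float32, 32, 32, 3, 2)   # WHCN: 32 x 32 image, 3 channels, batch of 2

# No padding, stride 1: each spatial dimension shrinks to 32 - 5 + 1 = 28.
conv = Conv((5, 5), 3 => 16, relu)
ps, st = Lux.setup(rng, conv)
y, _ = conv(x, ps, st)

# SamePad() at stride 1 keeps the spatial size unchanged.
same = Conv((5, 5), 3 => 16, relu; pad=SamePad())
ps_same, st_same = Lux.setup(rng, same)
y_same, _ = same(x, ps_same, st_same)
```

The kernel is stored as ps.weight with size (5, 5, 3, 16), that is (k..., in_chs, out_chs) when groups = 1.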
Lux.ConvTranspose Type
ConvTranspose(k::NTuple{N,Integer}, (in_chs => out_chs)::Pair{<:Integer,<:Integer},
activation=identity; init_weight=glorot_uniform, init_bias=zeros32,
stride=1, pad=0, outpad=0, dilation=1, groups=1, use_bias=True(),
cross_correlation=False())
Standard convolutional transpose layer.
Arguments
- k: Tuple of integers specifying the size of the convolutional kernel. E.g., for 2D convolutions length(k) == 2
- in_chs: Number of input channels
- out_chs: Number of output channels
- activation: Activation function
Keyword Arguments
- init_weight: Controls the initialization of the weight parameter. If nothing, then we use kaiming_uniform with gain computed on the basis of the activation function (taken from PyTorch nn.init.calculate_gain).
- init_bias: Controls the initialization of the bias parameter. If nothing, then we use a uniform distribution with bounds -bound and bound where bound = inv(sqrt(fan_in)).
- stride: Should be either a single integer or a tuple with N integers
- dilation: Should be either a single integer or a tuple with N integers
- pad: Specifies the number of elements added to the borders of the data array. It can be
  - a single integer for equal padding all around,
  - a tuple of N integers, to apply the same padding at begin/end of each spatial dimension,
  - a tuple of 2*N integers, for asymmetric padding, or
  - the singleton SamePad(), to calculate padding such that size(output,d) == size(x,d) * stride (possibly rounded) for each spatial dimension.
- groups: Expected to be an Int. It specifies the number of groups to divide a convolution into (set groups = in_chs for Depthwise Convolutions). in_chs and out_chs must be divisible by groups.
- use_bias: Trainable bias can be disabled entirely by setting this to false.
- cross_correlation: If true, perform transposed cross-correlation instead of transposed convolution.
- outpad: To preserve Conv invertibility when stride > 1, outpad can be used to increase the size of the output in the desired dimensions. Whereas pad is used to zero-pad the input, outpad only affects the output shape.
Extended Help
Inputs
- x: Data satisfying ndims(x) == N + 2 && size(x, N + 1) == in_chs, i.e. size(x) = (I_N, ..., I_1, C_in, N)
Returns
- Output of the convolution transpose y of size (O_N, ..., O_1, C_out, N)
- Empty NamedTuple()
Parameters
- weight: Convolution Transpose kernel
- bias: Bias (present if use_bias=true)
Dropout Layers
Lux.AlphaDropout Type
AlphaDropout(p::Real)
AlphaDropout layer.
Arguments
- p: Probability of Dropout
  - if p = 0 then NoOpLayer is returned.
  - if p = 1 then WrappedFunction(Base.Fix1(broadcast, zero)) is returned.
Inputs
- x: Must be an AbstractArray
Returns
- x with dropout mask applied if training=Val(true) else just x
- State with updated rng
States
- rng: Pseudo Random Number Generator
- training: Used to check if training/inference mode
Call Lux.testmode to switch to test mode.
See also Dropout, VariationalHiddenDropout
Lux.Dropout Type
Dropout(p; dims=:)
Dropout layer.
Arguments
- p: Probability of Dropout (if p = 0 then NoOpLayer is returned)
Keyword Arguments
- To apply dropout along certain dimension(s), specify the dims keyword. e.g. Dropout(p; dims = (3,4)) will randomly zero out entire channels on WHCN input (also called 2D dropout).
Inputs
- x: Must be an AbstractArray
Returns
- x with dropout mask applied if training=Val(true) else just x
- State with updated rng
States
- rng: Pseudo Random Number Generator
- training: Used to check if training/inference mode
Call Lux.testmode to switch to test mode.
See also AlphaDropout, VariationalHiddenDropout
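The training/test distinction can be seen on an all-ones input. This is a minimal sketch assuming Lux 1.x and the usual inverted-dropout scaling, in which kept entries are multiplied by 1 / (1 - p).

```julia
using Lux, Random

rng = Random.Xoshiro(0)
drop = Dropout(0.5f0)
ps, st = Lux.setup(rng, drop)

x = ones(Float32, 4, 8)

# Training mode (the state after setup): entries are zeroed or rescaled.
y_train, st_train = drop(x, ps, st)

# Test mode: the input passes through unchanged.
y_test, _ = drop(x, ps, Lux.testmode(st_train))
```

With p = 0.5 every entry of y_train is either 0 or 2, and y_test equals x.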
Lux.VariationalHiddenDropout Type
VariationalHiddenDropout(p; dims=:)
VariationalHiddenDropout layer. The only difference from Dropout is that the mask is retained until Lux.update_state(l, :update_mask, Val(true)) is called.
Arguments
- p: Probability of Dropout (if p = 0 then NoOpLayer is returned)
Keyword Arguments
- To apply dropout along certain dimension(s), specify the dims keyword. e.g. VariationalHiddenDropout(p; dims = 3) will randomly zero out entire channels on WHCN input (also called 2D dropout).
Inputs
- x: Must be an AbstractArray
Returns
- x with dropout mask applied if training=Val(true) else just x
- State with updated rng
States
- rng: Pseudo Random Number Generator
- training: Used to check if training/inference mode
- mask: Dropout mask. Initially set to nothing. After every run, contains the mask applied in that call
- update_mask: Stores whether new mask needs to be generated in the current call
Call Lux.testmode to switch to test mode.
See also AlphaDropout, Dropout
Pooling Layers
Lux.AdaptiveLPPool Type
AdaptiveLPPool(output_size; p=2)
Adaptive LP Pooling layer. Calculates the necessary window size such that its output has size(y)[1:N] == output_size.
Arguments
- output_size: Size of the first N dimensions for the output
GPU Support
This layer is currently only supported on CPU.
Inputs
- x: Expects as input an array with ndims(x) == N + 2, i.e. channel and batch dimensions, after the N feature dimensions, where N = length(output_size).
Returns
- Output of size (out..., C, N)
- Empty NamedTuple()
Lux.AdaptiveMaxPool Type
AdaptiveMaxPool(output_size)
Adaptive Max Pooling layer. Calculates the necessary window size such that its output has size(y)[1:N] == output_size.
Arguments
- output_size: Size of the first N dimensions for the output
Inputs
- x: Expects as input an array with ndims(x) == N + 2, i.e. channel and batch dimensions, after the N feature dimensions, where N = length(output_size).
Returns
- Output of size (out..., C, N)
- Empty NamedTuple()
Lux.AdaptiveMeanPool Type
AdaptiveMeanPool(output_size)
Adaptive Mean Pooling layer. Calculates the necessary window size such that its output has size(y)[1:N] == output_size.
Arguments
- output_size: Size of the first N dimensions for the output
Inputs
- x: Expects as input an array with ndims(x) == N + 2, i.e. channel and batch dimensions, after the N feature dimensions, where N = length(output_size).
Returns
- Output of size (out..., C, N)
- Empty NamedTuple()
Lux.GlobalLPPool Type
GlobalLPPool(; p=2)
Global LP Pooling layer. Transforms (w, h, c, b)-shaped input into (1, 1, c, b)-shaped output, by performing LP pooling on the complete (w, h)-shaped feature maps.
GPU Support
This layer is currently only supported on CPU.
Inputs
- x: Data satisfying ndims(x) > 2, i.e. size(x) = (I_N, ..., I_1, C, N)
Returns
- Output of the pooling y of size (1, ..., 1, C, N)
- Empty NamedTuple()
Lux.GlobalMaxPool Type
GlobalMaxPool()
Global Max Pooling layer. Transforms (w, h, c, b)-shaped input into (1, 1, c, b)-shaped output, by performing max pooling on the complete (w, h)-shaped feature maps.
Inputs
- x: Data satisfying ndims(x) > 2, i.e. size(x) = (I_N, ..., I_1, C, N)
Returns
- Output of the pooling y of size (1, ..., 1, C, N)
- Empty NamedTuple()
Lux.GlobalMeanPool Type
GlobalMeanPool()
Global Mean Pooling layer. Transforms (w, h, c, b)-shaped input into (1, 1, c, b)-shaped output, by performing mean pooling on the complete (w, h)-shaped feature maps.
Inputs
- x: Data satisfying ndims(x) > 2, i.e. size(x) = (I_N, ..., I_1, C, N)
Returns
- Output of the pooling y of size (1, ..., 1, C, N)
- Empty NamedTuple()
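A single 4 x 4 feature map is enough to see what the global pooling layers compute. This is a sketch assuming Lux 1.x; the input values 1 to 16 are chosen so the results are easy to check by hand.

```julia
using Lux, Random

rng = Random.Xoshiro(0)
x = reshape(Float32.(1:16), 4, 4, 1, 1)   # one 4 x 4 feature map holding 1, 2, ..., 16

gmean = GlobalMeanPool()
gmax = GlobalMaxPool()

# Pooling layers have no parameters, so setup returns empty NamedTuples.
ps_mean, st_mean = Lux.setup(rng, gmean)
ps_max, st_max = Lux.setup(rng, gmax)

y_mean, _ = gmean(x, ps_mean, st_mean)   # mean of 1:16 is 8.5
y_max, _ = gmax(x, ps_max, st_max)       # maximum of 1:16 is 16
```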
Lux.LPPool Type
LPPool(window; stride=window, pad=0, dilation=1, p=2)
LP Pooling layer, which replaces all pixels in a block of size window with the reduction operation: lp.
Arguments
- window: Tuple of integers specifying the size of the window. E.g., for 2D pooling length(window) == 2
Keyword Arguments
- stride: Should be either a single integer or a tuple with N integers
- dilation: Should be either a single integer or a tuple with N integers
- pad: Specifies the number of elements added to the borders of the data array. It can be
  - a single integer for equal padding all around,
  - a tuple of N integers, to apply the same padding at begin/end of each spatial dimension,
  - a tuple of 2*N integers, for asymmetric padding, or
  - the singleton SamePad(), to calculate padding such that size(output,d) == size(x,d) / stride (possibly rounded) for each spatial dimension.
GPU Support
This layer is currently only supported on CPU.
Extended Help
Inputs
- x: Data satisfying ndims(x) == N + 2, i.e. size(x) = (I_N, ..., I_1, C, N)
Returns
- Output of the pooling y of size (O_N, ..., O_1, C, N)
- Empty NamedTuple()
Lux.MaxPool Type
MaxPool(window; stride=window, pad=0, dilation=1)
Max Pooling layer, which replaces all pixels in a block of size window with the reduction operation: max.
Arguments
- window: Tuple of integers specifying the size of the window. E.g., for 2D pooling length(window) == 2
Keyword Arguments
- stride: Should be either a single integer or a tuple with N integers
- dilation: Should be either a single integer or a tuple with N integers
- pad: Specifies the number of elements added to the borders of the data array. It can be
  - a single integer for equal padding all around,
  - a tuple of N integers, to apply the same padding at begin/end of each spatial dimension,
  - a tuple of 2*N integers, for asymmetric padding, or
  - the singleton SamePad(), to calculate padding such that size(output,d) == size(x,d) / stride (possibly rounded) for each spatial dimension.
Extended Help
Inputs
- x: Data satisfying ndims(x) == N + 2, i.e. size(x) = (I_N, ..., I_1, C, N)
Returns
- Output of the pooling y of size (O_N, ..., O_1, C, N)
- Empty NamedTuple()
Lux.MeanPool Type
MeanPool(window; stride=window, pad=0, dilation=1)
Mean Pooling layer, which replaces all pixels in a block of size window with the reduction operation: mean.
Arguments
- window: Tuple of integers specifying the size of the window. E.g., for 2D pooling length(window) == 2
Keyword Arguments
- stride: Should be either a single integer or a tuple with N integers
- dilation: Should be either a single integer or a tuple with N integers
- pad: Specifies the number of elements added to the borders of the data array. It can be
  - a single integer for equal padding all around,
  - a tuple of N integers, to apply the same padding at begin/end of each spatial dimension,
  - a tuple of 2*N integers, for asymmetric padding, or
  - the singleton SamePad(), to calculate padding such that size(output,d) == size(x,d) / stride (possibly rounded) for each spatial dimension.
Extended Help
Inputs
- x: Data satisfying ndims(x) == N + 2, i.e. size(x) = (I_N, ..., I_1, C, N)
Returns
- Output of the pooling y of size (O_N, ..., O_1, C, N)
- Empty NamedTuple()
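Windowed pooling with the default stride (equal to the window) can be checked by hand on a 4 x 4 input. This is a sketch assuming Lux 1.x; the values 1 to 16 are laid out column by column, so x[i, j, 1, 1] = i + 4 * (j - 1).

```julia
using Lux, Random

rng = Random.Xoshiro(0)
x = reshape(Float32.(1:16), 4, 4, 1, 1)

maxpool = MaxPool((2, 2))     # stride defaults to the window size
meanpool = MeanPool((2, 2))

ps_max, st_max = Lux.setup(rng, maxpool)
ps_mean, st_mean = Lux.setup(rng, meanpool)

y_max, _ = maxpool(x, ps_max, st_max)
y_mean, _ = meanpool(x, ps_mean, st_mean)

# Each output entry summarizes one non-overlapping 2 x 2 block of the input.
max_blocks = y_max[:, :, 1, 1]     # [6 14; 8 16]
mean_blocks = y_mean[:, :, 1, 1]   # [3.5 11.5; 5.5 13.5]
```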
Recurrent Layers
Lux.GRUCell Type
GRUCell((in_dims, out_dims)::Pair{<:Int,<:Int}; use_bias=true, train_state::Bool=false,
init_weight=glorot_uniform, init_recurrent_weight=init_weight,
init_bias=nothing, init_state=zeros32)
Gated Recurrent Unit (GRU) Cell
Arguments
- in_dims: Input Dimension
- out_dims: Output (Hidden State) Dimension
- use_bias: Set to false to deactivate bias
- train_state: Trainable initial hidden state can be activated by setting this to true
- init_bias: Initializer for bias. Must be a tuple containing 3 functions. If a single value is passed, it is copied into a 3 element tuple. If nothing, then we use a uniform distribution with bounds -bound and bound where bound = inv(sqrt(out_dims)).
- init_weight: Initializer for weight. Must be a tuple containing 3 functions. If a single value is passed, it is copied into a 3 element tuple. If nothing, then we use a uniform distribution with bounds -bound and bound where bound = inv(sqrt(out_dims)).
- init_recurrent_weight: Initializer for recurrent weight. Must be a tuple containing 3 functions. If a single value is passed, it is copied into a 3 element tuple. If nothing, then we use a uniform distribution with bounds -bound and bound where bound = inv(sqrt(out_dims)).
- init_state: Initializer for hidden state
Inputs
- Case 1a: Only a single input x of shape (in_dims, batch_size), train_state is set to false - Creates a hidden state using init_state and proceeds to Case 2.
- Case 1b: Only a single input x of shape (in_dims, batch_size), train_state is set to true - Repeats hidden_state from parameters to match the shape of x and proceeds to Case 2.
- Case 2: Tuple (x, (h, )) is provided, then the output and a tuple containing the updated hidden state is returned.
Returns
- Tuple containing
  - Output of shape (out_dims, batch_size)
  - Tuple containing new hidden state
- Updated model state
Parameters
- weight_ih: Concatenated weights to map from input space.
- weight_hh: Concatenated weights to map from hidden space.
- bias_ih: Concatenated bias vector for the input space (not present if use_bias=false).
- bias_hh: Concatenated bias vector for the hidden space (not present if use_bias=false).
- hidden_state: Initial hidden state vector (not present if train_state=false).
States
- rng: Controls the randomness (if any) in the initial state generation
Lux.LSTMCell Type
LSTMCell(in_dims => out_dims; use_bias::Bool=true, train_state::Bool=false,
train_memory::Bool=false, init_weight=nothing,
init_recurrent_weight=init_weight,
init_bias=nothing, init_state=zeros32, init_memory=zeros32)
Long Short-Term Memory (LSTM) Cell
Arguments
- in_dims: Input Dimension
- out_dims: Output (Hidden State & Memory) Dimension
- use_bias: Set to false to deactivate bias
- train_state: Trainable initial hidden state can be activated by setting this to true
- train_memory: Trainable initial memory can be activated by setting this to true
- init_bias: Initializer for bias. Must be a tuple containing 4 functions. If a single value is passed, it is copied into a 4 element tuple. If nothing, then we use a uniform distribution with bounds -bound and bound where bound = inv(sqrt(out_dims)).
- init_weight: Initializer for weight. Must be a tuple containing 4 functions. If a single value is passed, it is copied into a 4 element tuple. If nothing, then we use a uniform distribution with bounds -bound and bound where bound = inv(sqrt(out_dims)).
- init_recurrent_weight: Initializer for recurrent weight. Must be a tuple containing 4 functions. If a single value is passed, it is copied into a 4 element tuple. If nothing, then we use a uniform distribution with bounds -bound and bound where bound = inv(sqrt(out_dims)).
- init_state: Initializer for hidden state
- init_memory: Initializer for memory
Inputs
- Case 1a: Only a single input x of shape (in_dims, batch_size), train_state is set to false, train_memory is set to false - Creates a hidden state using init_state, hidden memory using init_memory and proceeds to Case 2.
- Case 1b: Only a single input x of shape (in_dims, batch_size), train_state is set to true, train_memory is set to false - Repeats hidden_state vector from the parameters to match the shape of x, creates hidden memory using init_memory and proceeds to Case 2.
- Case 1c: Only a single input x of shape (in_dims, batch_size), train_state is set to false, train_memory is set to true - Creates a hidden state using init_state, repeats the memory vector from parameters to match the shape of x and proceeds to Case 2.
- Case 1d: Only a single input x of shape (in_dims, batch_size), train_state is set to true, train_memory is set to true - Repeats the hidden state and memory vectors from the parameters to match the shape of x and proceeds to Case 2.
- Case 2: Tuple (x, (h, c)) is provided, then the output and a tuple containing the updated hidden state and memory is returned.
Returns
- Tuple containing
  - Output of shape (out_dims, batch_size)
  - Tuple containing new hidden state and new memory
- Updated model state
Parameters
- weight_ih: Concatenated weights to map from input space.
- weight_hh: Concatenated weights to map from hidden space.
- bias_ih: Bias vector for the input-hidden connection (not present if use_bias=false)
- bias_hh: Concatenated bias vector for the hidden-hidden connection (not present if use_bias=false)
- hidden_state: Initial hidden state vector (not present if train_state=false)
- memory: Initial memory vector (not present if train_memory=false)
States
- rng: Controls the randomness (if any) in the initial state generation
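Cases 1a and 2 can be exercised back to back: the first call creates the hidden state and memory, and the second call feeds them back in. This is a sketch assuming Lux 1.x; the dimensions are arbitrary.

```julia
using Lux, Random

rng = Random.Xoshiro(0)
cell = LSTMCell(3 => 5)
ps, st = Lux.setup(rng, cell)

x1 = rand(rng, Float32, 3, 4)   # (in_dims, batch_size)
x2 = rand(rng, Float32, 3, 4)

# Case 1a: only x is given, so h and c are created with init_state and init_memory.
(y1, (h1, c1)), st1 = cell(x1, ps, st)

# Case 2: pass the previous hidden state and memory along with the next input.
(y2, (h2, c2)), st2 = cell((x2, (h1, c1)), ps, st1)
```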
Lux.RNNCell Type
RNNCell(in_dims => out_dims, activation=tanh; use_bias=True(), train_state=False(),
init_bias=nothing, init_weight=nothing, init_recurrent_weight=init_weight,
init_state=zeros32)
An Elman RNN cell with activation (typically set to tanh or relu).
Arguments
- in_dims: Input Dimension
- out_dims: Output (Hidden State) Dimension
- activation: Activation function
- use_bias: Set to false to deactivate bias
- train_state: Trainable initial hidden state can be activated by setting this to true
- init_bias: Initializer for bias. If nothing, then we use a uniform distribution with bounds -bound and bound where bound = inv(sqrt(out_dims)).
- init_weight: Initializer for weight. If nothing, then we use a uniform distribution with bounds -bound and bound where bound = inv(sqrt(out_dims)).
- init_recurrent_weight: Initializer for recurrent weight. If nothing, then we use a uniform distribution with bounds -bound and bound where bound = inv(sqrt(out_dims)).
- init_state: Initializer for hidden state
Inputs
- Case 1a: Only a single input x of shape (in_dims, batch_size), train_state is set to false - Creates a hidden state using init_state and proceeds to Case 2.
- Case 1b: Only a single input x of shape (in_dims, batch_size), train_state is set to true - Repeats hidden_state from parameters to match the shape of x and proceeds to Case 2.
- Case 2: Tuple (x, (h, )) is provided, then the output and a tuple containing the updated hidden state is returned.
Returns
- Tuple containing
  - Output of shape (out_dims, batch_size)
  - Tuple containing new hidden state
- Updated model state
Parameters
- weight_ih: Maps the input to the hidden state.
- weight_hh: Maps the hidden state to the hidden state.
- bias_ih: Bias vector for the input-hidden connection (not present if use_bias=false)
- bias_hh: Bias vector for the hidden-hidden connection (not present if use_bias=false)
- hidden_state: Initial hidden state vector (not present if train_state=false)
States
- rng: Controls the randomness (if any) in the initial state generation
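Stepping a cell manually over two time steps shows how the carry is threaded through. This is a sketch assuming Lux 1.x; the dimensions are arbitrary.

```julia
using Lux, Random

rng = Random.Xoshiro(0)
cell = RNNCell(3 => 5)
ps, st = Lux.setup(rng, cell)

x1 = rand(rng, Float32, 3, 4)   # (in_dims, batch_size)
x2 = rand(rng, Float32, 3, 4)

# Case 1a: no hidden state supplied, so one is created with init_state.
(y1, carry1), st1 = cell(x1, ps, st)

# Case 2: feed the returned carry (a 1-tuple holding h) back in with the next input.
(y2, carry2), st2 = cell((x2, carry1), ps, st1)
```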
Lux.Recurrence Type
Recurrence(cell;
ordering::AbstractTimeSeriesDataBatchOrdering=BatchLastIndex(),
return_sequence::Bool=false)
Wraps a recurrent cell (like RNNCell, LSTMCell, GRUCell) to automatically operate over a sequence of inputs.
Relation to Flux.Recur
This is completely distinct from Flux.Recur. It doesn't make the cell stateful; rather, it allows operating on an entire sequence of inputs at once. See StatefulRecurrentCell for functionality similar to Flux.Recur.
Arguments
- cell: A recurrent cell. See RNNCell, LSTMCell, GRUCell, for how the inputs/outputs of a recurrent cell must be structured.
Keyword Arguments
- return_sequence: If true returns the entire sequence of outputs, else returns only the last output. Defaults to false.
- ordering: The ordering of the batch and time dimensions in the input. Defaults to BatchLastIndex(). Alternatively can be set to TimeLastIndex().
Extended Help
Inputs
- If x is a
  - Tuple or Vector: Each element is fed to the cell sequentially.
  - Array (except a Vector): It is sliced along the penultimate dimension and each slice is fed to the cell sequentially.
Returns
- Output of the cell for the entire sequence.
- Updated state of the cell.
Tip
Frameworks like TensorFlow have a special implementation of StackedRNNCells to handle sequentially composed RNN cells. In Lux, one can simply stack multiple Recurrence blocks in a Chain to achieve the same.
Chain(
    Recurrence(RNNCell(inputsize => latentsize); return_sequence=true),
    Recurrence(RNNCell(latentsize => latentsize); return_sequence=true),
    :
    x -> stack(x; dims=2)
)
For some discussion on this topic, see https://github.com/LuxDL/Lux.jl/issues/472.
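The effect of return_sequence can be seen on a (features, time, batch) array, which matches the default BatchLastIndex() ordering. This is a sketch assuming Lux 1.x; the sizes are arbitrary.

```julia
using Lux, Random

rng = Random.Xoshiro(0)
x = rand(rng, Float32, 3, 7, 4)   # 3 features, 7 time steps, batch of 4

# Default: only the output of the last time step is returned.
last_only = Recurrence(RNNCell(3 => 5))
ps, st = Lux.setup(rng, last_only)
y_last, _ = last_only(x, ps, st)

# return_sequence=true: one output per time step.
full = Recurrence(RNNCell(3 => 5); return_sequence=true)
ps_full, st_full = Lux.setup(rng, full)
ys, _ = full(x, ps_full, st_full)
```

Here y_last has size (5, 4), while ys holds 7 outputs of size (5, 4) each, which is why the Chain in the tip above ends with a stack call.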
Lux.StatefulRecurrentCell Type
StatefulRecurrentCell(cell)
Wraps a recurrent cell (like RNNCell, LSTMCell, GRUCell) and makes it stateful.
To avoid undefined behavior, once the processing of a single sequence of data is complete, update the state with Lux.update_state(st, :carry, nothing).
Arguments
- cell: A recurrent cell. See RNNCell, LSTMCell, GRUCell, for how the inputs/outputs of a recurrent cell must be structured.
Inputs
- Input to the cell.
Returns
- Output of the cell for the entire sequence.
- Updated state of the cell and updated carry.
States
- NamedTuple containing:
  - cell: Same as cell.
  - carry: The carry state of the cell.
Lux.BidirectionalRNN Type
BidirectionalRNN(cell::AbstractRecurrentCell,
backward_cell::Union{AbstractRecurrentCell, Nothing}=nothing;
merge_mode::Union{Function, Nothing}=vcat,
ordering::AbstractTimeSeriesDataBatchOrdering=BatchLastIndex())
Bidirectional RNN wrapper.
Arguments
- cell: A recurrent cell. See RNNCell, LSTMCell, GRUCell, for how the inputs/outputs of a recurrent cell must be structured.
- backward_cell: An optional backward recurrent cell. If backward_cell is nothing, the RNN layer instance passed as the cell argument will be used to generate the backward layer automatically. in_dims of backward_cell should be consistent with in_dims of cell
Keyword Arguments
- merge_mode: Function by which outputs of the forward and backward RNNs will be combined. Default value is vcat. If nothing, the outputs will not be combined.
- ordering: The ordering of the batch and time dimensions in the input. Defaults to BatchLastIndex(). Alternatively can be set to TimeLastIndex().
Extended Help
Inputs
- If x is a
  - Tuple or Vector: Each element is fed to the cell sequentially.
  - Array (except a Vector): It is sliced along the penultimate dimension and each slice is fed to the cell sequentially.
Returns
- Merged output of the cell and backward_cell for the entire sequence.
- Updated state of the cell and backward_cell.
Parameters
- NamedTuple with cell and backward_cell.
States
- Same as cell and backward_cell.
Linear Layers
Lux.Bilinear Type
Bilinear((in1_dims, in2_dims) => out, activation=identity; init_weight=nothing,
init_bias=nothing, use_bias=True())
Bilinear(in12_dims => out, activation=identity; init_weight=nothing,
init_bias=nothing, use_bias=True())
Create a fully connected layer between two inputs and an output, and otherwise similar to Dense. Its output, given vectors x and y, is another vector z with, for all i in 1:out:
z[i] = activation(x' * W[i, :, :] * y + bias[i])
If x and y are matrices, then each column of the output z = B(x, y) is of this form, with B the Bilinear layer.
Arguments
- in1_dims: number of input dimensions of x
- in2_dims: number of input dimensions of y
- in12_dims: If specified, then in1_dims = in2_dims = in12_dims
- out: number of output dimensions
- activation: activation function
Keyword Arguments
- init_weight: initializer for the weight matrix (weight = init_weight(rng, out_dims, in1_dims, in2_dims)). If nothing, then we use uniform distribution with bounds -bound and bound where bound = inv(sqrt(in1_dims)).
- init_bias: initializer for the bias vector (ignored if use_bias=false). If nothing, then we use uniform distribution with bounds -bound and bound where bound = inv(sqrt(in1_dims)).
- use_bias: Trainable bias can be disabled entirely by setting this to false
Input
- A 2-Tuple containing
  - x must be an AbstractArray with size(x, 1) == in1_dims
  - y must be an AbstractArray with size(y, 1) == in2_dims
- If the input is an AbstractArray, then x = y
Returns
- AbstractArray with dimensions (out_dims, size(x, 2))
- Empty NamedTuple()
Parameters
- weight: Weight Matrix of size (out_dims, in1_dims, in2_dims)
- bias: Bias of size (out_dims, 1) (present if use_bias=true)
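To make the formula z[i] = activation(x' * W[i, :, :] * y + bias[i]) concrete, the sketch below recomputes a Bilinear output entry by entry and compares it with the layer's result. The sizes are illustrative assumptions, and the default identity activation is used so that the activation drops out of the formula.

```julia
using Lux, Random

rng = Random.default_rng()
in1, in2, out, batch = 3, 4, 2, 5     # illustrative sizes
layer = Bilinear((in1, in2) => out)
ps, st = Lux.setup(rng, layer)

x = randn(rng, Float32, in1, batch)
y = randn(rng, Float32, in2, batch)
z, _ = layer((x, y), ps, st)

# Recompute every entry from the formula, one output index and one column at a time.
z_manual = [x[:, b]' * ps.weight[i, :, :] * y[:, b] + ps.bias[i]
            for i in 1:out, b in 1:batch]

z ≈ z_manual
```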
Lux.Dense Type
Dense(in_dims => out_dims, activation=identity; init_weight=nothing,
init_bias=nothing, use_bias=True())
Create a traditional fully connected layer, whose forward pass is given by: y = activation.(weight * x .+ bias)
Arguments
- in_dims: number of input dimensions
- out_dims: number of output dimensions
- activation: activation function
Keyword Arguments
- init_weight: initializer for the weight matrix (weight = init_weight(rng, out_dims, in_dims)). If nothing, then we use kaiming_uniform with gain computed on the basis of the activation function (taken from PyTorch nn.init.calculate_gain).
- init_bias: initializer for the bias vector (ignored if use_bias=false). If nothing, then we use uniform distribution with bounds -bound and bound where bound = inv(sqrt(in_dims)).
- use_bias: Trainable bias can be disabled entirely by setting this to false
Input
- x must be an AbstractArray with size(x, 1) == in_dims
Returns
- AbstractArray with dimensions (out_dims, ...) where ... are the dimensions of x
- Empty NamedTuple()
Parameters
- weight: Weight Matrix of size (out_dims, in_dims)
- bias: Bias of size (out_dims, 1) (present if use_bias=true)
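As a small sanity check of the forward pass y = activation.(weight * x .+ bias), the sketch below compares a Dense layer's output with the same expression written out by hand. The sizes and the choice of relu are illustrative assumptions.

```julia
using Lux, Random

rng = Random.default_rng()
layer = Dense(4 => 3, relu)
ps, st = Lux.setup(rng, layer)

x = randn(rng, Float32, 4, 6)         # 4 features, batch of 6
y, _ = layer(x, ps, st)

# The same computation spelled out with the layer's parameters.
y_manual = relu.(ps.weight * x .+ ps.bias)

y ≈ y_manual
```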
Lux.Scale Type
Scale(dims, activation=identity; init_weight=ones32, init_bias=zeros32, use_bias=True())
Create a Sparsely Connected Layer with a very specific structure (only Diagonal Elements are non-zero). The forward pass is given by: y = activation.(weight .* x .+ bias)
Arguments
- dims: size of the learnable scale and bias parameters.
- activation: activation function
Keyword Arguments
- init_weight: initializer for the weight array (weight = init_weight(rng, dims...))
- init_bias: initializer for the bias vector (ignored if use_bias=false)
- use_bias: Trainable bias can be disabled entirely by setting this to false
Input
- x must be an Array of size (dims..., B) or (dims...[0], ..., dims[k]) for k ≤ size(dims)
Returns
- Array of size (dims..., B) or (dims...[0], ..., dims[k]) for k ≤ size(dims)
- Empty NamedTuple()
Parameters
- weight: Weight Array of size (dims...)
- bias: Bias of size (dims...)
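A brief sketch of the elementwise forward pass y = activation.(weight .* x .+ bias): with the default initializers (ones32 and zeros32) the layer starts out as the identity, and hand-written parameters (an assumption made purely for illustration) show the scale and shift explicitly.

```julia
using Lux, Random

rng = Random.default_rng()
layer = Scale(3)
ps, st = Lux.setup(rng, layer)

x = randn(rng, Float32, 3, 5)
y, _ = layer(x, ps, st)               # weight = 1, bias = 0, so y equals x

# Hypothetical hand-picked parameters: scale by 2 and shift by 1.
ps_custom = (weight=fill(2.0f0, 3), bias=fill(1.0f0, 3))
y_custom, _ = layer(x, ps_custom, st)
```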
Attention Layers
Lux.MultiHeadAttention Type
MultiHeadAttention(dims; nheads=1, dense_kwargs=(; use_bias=False()),
attention_dropout_probability=0.0f0,
is_causal::Union{Bool,Nothing}=nothing)
The multi-head dot-product attention layer used in Transformer architectures [1].
Arguments
- dims: The embedding dimensions of inputs, intermediate tensors and outputs. In the most general case, it is given as
  - a) (q_in_dim, k_in_dim, v_in_dim) => (qk_dim, v_dim) => out_dim.
  It can also take simpler forms:
  - b) dims::Int;
  - c) in_dim::Int => (qk_dim, v_dim) => out_dim;
  - d) in_dim::Int => qkv_dim => out_dim.
Keyword Arguments
- nheads: number of heads.
- attention_dropout_probability: dropout probability for the attention scores.
- dense_kwargs: keyword arguments for the Dense layers. Default use_bias=false.
- is_causal: whether the attention is causal. If this is provided, the attention mask will be automatically created (passing in the mask argument is not allowed and will throw an error).
Forward Pass Signature(s)
(m::MultiHeadAttention)(qkv, ps, st::NamedTuple)
(m::MultiHeadAttention)((q, kv), ps, st::NamedTuple)
(m::MultiHeadAttention)((q, k, v, [mask = nothing]), ps, st::NamedTuple)
Inputs
- qkv: a single input tensor for query, key and value. This corresponds to self-attention.
- (q, kv): a tuple of two input tensors for query and key-value.
- (q, k, v): a tuple of three input tensors for query, key and value.
- mask: an optional mask to apply to the attention scores. This must be broadcastable to the shape of the attention scores (kv_len, q_len, nheads, batch_size).
The query tensor q is expected to have shape (q_in_dim, q_len, batch_size), the key tensor k is expected to have shape (k_in_dim, kv_len, batch_size), and the value tensor v is expected to have shape (v_in_dim, kv_len, batch_size).
Returns
- A tuple of two elements. The first element is the output tensor of shape (out_dim, q_len, batch_size) and the second element is the attention scores of shape (kv_len, q_len, nheads, batch_size).
- A NamedTuple of the states of the layer.
Extended Help
Examples
julia> m = MultiHeadAttention(64; nheads=8);
julia> ps, st = Lux.setup(Random.default_rng(), m);
julia> q = randn(Float32, 64, 10, 32);
julia> k = randn(Float32, 64, 20, 32);
julia> v = randn(Float32, 64, 20, 32);
julia> (y, α), st_new = m((q, k, v), ps, st);
julia> size(y)
(64, 10, 32)
julia> size(α)
(20, 10, 8, 32)
julia> (y, α), st_new = m(q, ps, st); # self-attention
julia> size(y)
(64, 10, 32)
julia> size(α)
(10, 10, 8, 32)
Embedding Layers
Lux.Embedding Type
Embedding(in_dims => out_dims; init_weight=rand32)
A lookup table that stores embeddings of dimension out_dims for a vocabulary of size in_dims. When the vocabulary is multi-dimensional, the input is expected to be a tuple of Cartesian indices.
This layer is often used to store word embeddings and retrieve them using indices.
Arguments
- in_dims: number(s) of input dimensions
- out_dims: number of output dimensions
Keyword Arguments
- init_weight: initializer for the weight matrix (weight = init_weight(rng, out_dims, in_dims...))
Input
Integer OR
Abstract Vector of Integers OR
Abstract Array of Integers OR
Tuple of Integers OR
Tuple of Abstract Vectors of Integers OR
Tuple of Abstract Arrays of Integers
Returns
Returns the embedding corresponding to each index in the input. For an N dimensional input, an N + 1 dimensional output is returned.
Empty
NamedTuple()
Gradients with Tracker.jl
Tracker.jl produces incorrect gradients for this layer if indices in the input are repeated. Don't use this layer with Tracker.jl if you need to compute gradients.
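The lookup behaviour can be illustrated with a short sketch. The vocabulary size (10), embedding size (4) and index values below are arbitrary assumptions; the point is that each index selects one column of the weight matrix and that an N-dimensional index array yields an N + 1 dimensional output.

```julia
using Lux, Random

rng = Random.default_rng()
layer = Embedding(10 => 4)            # vocabulary of 10, embeddings of length 4
ps, st = Lux.setup(rng, layer)

idx = [1, 3, 3, 7]                    # a vector of indices (repeats are allowed)
y, _ = layer(idx, ps, st)             # one embedding column per index

idx2 = [1 2 3; 4 5 6]                 # a 2 x 3 matrix of indices
y2, _ = layer(idx2, ps, st)           # output gains a leading embedding dimension
```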
Lux.RotaryPositionalEmbedding Type
RotaryPositionalEmbedding(
dim::IntegerType;
max_sequence_length::IntegerType=4096,
base::IntegerType=10000,
low_memory_variant::Bool=true,
)
Rotary Positional Embedding. For details see Su et al. [2].
The traditional implementation rotates consecutive pairs of elements in the feature dimension while the default implementation rotates pairs with stride half the feature dimensions for efficiency.
Arguments
- dim: The feature dimensions to be rotated. If the input feature is larger than dim then the rest is left unchanged.
Keyword Arguments
- base: The base used to compute angular frequency for each dimension in the positional encodings. Default: 10000.
- max_sequence_length: The maximum sequence length. Default: 4096.
- low_memory_variant: If true then cos and sin cache have leading dimension of dim ÷ 2. If false then cos and sin cache have leading dimension of dim. Default: true.
Input
- 4D AbstractArray such that
  - size(x, 1) == dim
  - size(x, 3) ≤ max_sequence_length
Returns
- 4D AbstractArray of the same size as the input.
States
- NamedTuple containing cos_cache and sin_cache.
Lux.SinusoidalPositionalEmbedding Type
SinusoidalPositionalEmbedding(
dims::IntegerType; min_freq=0.0001f0, max_freq=1.0f0,
scale=nothing, full_turns::Bool=false
)
Sinusoidal Positional Embedding. For details see Vaswani et al. [1].
Arguments
dims: The dimensionality of the resulting positional embeddings.
Keyword Arguments
- min_freq: The minimum frequency expected. Default: 0.0001f0.
- max_freq: The maximum frequency expected. Default: 1.0f0.
- scale: A multiplicative scale for the embeddings. Default: $\sqrt{2/dims}$.
- full_turns: If true multiply the frequencies with $2\pi$. Default: false.
Input
- AbstractArray
Returns
- If the input array is of size (insz...,) then the output is of size (dims, insz...).
States
- NamedTuple containing sigmas.
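A minimal shape-level sketch of the input/output contract described above, assuming a Lux version that ships SinusoidalPositionalEmbedding. The embedding size (16) and the position values are illustrative assumptions.

```julia
using Lux, Random

rng = Random.default_rng()
layer = SinusoidalPositionalEmbedding(16)
ps, st = Lux.setup(rng, layer)

positions = Float32.(0:9)             # 10 positions as a vector
y, _ = layer(positions, ps, st)       # expected size: (16, 10)

grid = rand(rng, Float32, 10, 3)      # a 10 x 3 array of positions
y2, _ = layer(grid, ps, st)           # expected size: (16, 10, 3)
```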
Functional API
Lux.apply_rotary_embedding Function
apply_rotary_embedding(x::AbstractArray{T,4}, cos_cache::AbstractMatrix,
sin_cache::AbstractMatrix; head_dim::Integer, seq_dim::Integer)
apply_rotary_embedding(x::AbstractArray{T,4}, input_positions::AbstractVector{<:Integer},
cos_cache::AbstractMatrix, sin_cache::AbstractMatrix;
head_dim::Integer, seq_dim::Integer)
Apply rotary embedding to the input x using the cos_cache and sin_cache parameters. If input_positions is provided, then we extract the cosine and sine cache for the corresponding positions in the sequence. Otherwise, we use the entire cache up to the sequence length of x.
Arguments
- x: 4D AbstractArray.
- cos_cache: Cache of cosine values. Generated using compute_rotary_embedding_params.
- sin_cache: Cache of sine values. Generated using compute_rotary_embedding_params.
- seq_dim: Dimension of the sequence. Must be between 1 and 4.
- input_positions: Positions in the sequence to extract the cosine and sine cache for. If not provided, then we use the entire cache up to the sequence length of x.
Returns
- Output of the rotary embedding.
Lux.compute_rotary_embedding_params Function
compute_rotary_embedding_params(head_dim::Integer, max_sequence_length::Integer;
base::Number, dtype::Type{T}=Float32,
low_memory_variant::Bool=true)
Computes the cosine and sine cache for rotary positional embeddings.
Arguments
- head_dim: The feature dimensions to be rotated.
- max_sequence_length: The maximum sequence length. Default: 4096.
Keyword Arguments
- base: The base used to compute angular frequency for each dimension in the positional encodings. Default: 10000.
- dtype: The data type of the cache. Default: Float32.
- low_memory_variant: If true then cos and sin cache have leading dimension of head_dim ÷ 2. If false then cos and sin cache have leading dimension of head_dim. Default: true.
Misc. Helper Layers
Lux.FlattenLayer Type
FlattenLayer(; N = nothing)
Flattens the passed array into a matrix.
Keyword Arguments
- N: Flatten the first N dimensions of the input array. If nothing, then all dimensions (except the last) are flattened. Note that the batch dimension is never flattened.
Inputs
- x: AbstractArray
Returns
- AbstractMatrix of size (:, size(x, ndims(x))) if N is nothing else the first N dimensions of the input array are flattened.
- Empty NamedTuple()
Example
julia> model = FlattenLayer()
FlattenLayer{Nothing}(nothing)
julia> rng = Random.default_rng();
Random.seed!(rng, 0);
ps, st = Lux.setup(rng, model);
x = randn(rng, Float32, (2, 2, 2, 2));
julia> y, st_new = model(x, ps, st);
size(y)
(8, 2)
Lux.Maxout Type
Maxout(layers...)
Maxout(; layers...)
Maxout(f::Function, n_alts::Int)
This contains a number of internal layers, each of which receives the same input. Its output is the elementwise maximum of the internal layers' outputs.
Maxout over linear dense layers satisfies the universal approximation theorem [3].
See also Parallel to reduce with other operators.
Arguments
- Layers can be specified in three formats:
  - A list of N Lux layers
  - Specified as N keyword arguments.
  - A no argument function f and an integer n_alts which specifies the number of layers.
Extended Help
Inputs
- x: Input that is passed to each of the layers
Returns
- Output is computed by taking elementwise max of the outputs of the individual layers.
- Updated state of the layers
Parameters
- Parameters of each layer wrapped in a NamedTuple with fields = layer_1, layer_2, ..., layer_N (naming changes if using the kwargs API)
States
- States of each layer wrapped in a NamedTuple with fields = layer_1, layer_2, ..., layer_N (naming changes if using the kwargs API)
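The elementwise-maximum behaviour can be checked with a short sketch that uses the function-plus-count constructor. The Dense sizes and the number of alternatives are illustrative assumptions; the separate Dense instance is only a helper for recomputing each branch by hand.

```julia
using Lux, Random

rng = Random.default_rng()
model = Maxout(() -> Dense(4 => 3), 2)   # two Dense alternatives
ps, st = Lux.setup(rng, model)

x = randn(rng, Float32, 4, 6)
y, _ = model(x, ps, st)

# Recompute each branch with its own parameters and take the elementwise max.
dense = Dense(4 => 3)
y1, _ = dense(x, ps.layer_1, st.layer_1)
y2, _ = dense(x, ps.layer_2, st.layer_2)

y ≈ max.(y1, y2)
```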
Lux.NoOpLayer Type
NoOpLayer()
As the name suggests, this layer does nothing but allows pretty printing of layers. Whatever input is passed is returned.
Example
julia> model = NoOpLayer()
NoOpLayer()
julia> rng = Random.default_rng();
Random.seed!(rng, 0);
ps, st = Lux.setup(rng, model);
x = 1
1
julia> y, st_new = model(x, ps, st)
(1, NamedTuple())
Lux.ReshapeLayer Type
ReshapeLayer(dims)
Reshapes the passed array to have a size of (dims..., :)
Arguments
- dims: The new dimensions of the array (excluding the last dimension).
Inputs
- x: AbstractArray of any shape which can be reshaped in (dims..., size(x, ndims(x)))
Returns
- AbstractArray of size (dims..., size(x, ndims(x)))
- Empty NamedTuple()
Example
julia> model = ReshapeLayer((2, 2))
ReshapeLayer(output_dims = (2, 2, :))
julia> rng = Random.default_rng();
Random.seed!(rng, 0);
ps, st = Lux.setup(rng, model);
x = randn(rng, Float32, (4, 1, 3));
julia> y, st_new = model(x, ps, st);
size(y)
(2, 2, 3)
Lux.SelectDim Type
SelectDim(dim, i)
Return a view of all the data of the input x where the index for dimension dim equals i. Equivalent to view(x,:,:,...,i,:,:,...) where i is any valid index for index slot dim (e.g. an integer or a unit range). Note that it may be inefficient to use non-contiguous views.
Arguments
- dim: Dimension for indexing
- i: Index or indices for dimension dim
Inputs
- x: AbstractArray that can be indexed with view(x,:,:,...,i,:,:,...)
Returns
- view(x,:,:,...,i,:,:,...) where i is in position dim
- Empty NamedTuple()
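For instance, selecting the first slice along the third dimension might look like the sketch below. The array contents and sizes are illustrative assumptions.

```julia
using Lux, Random

rng = Random.default_rng()
layer = SelectDim(3, 1)               # index 1 along dimension 3
ps, st = Lux.setup(rng, layer)

x = reshape(collect(Float32, 1:24), 2, 3, 4)
y, _ = layer(x, ps, st)

y == x[:, :, 1]                       # same data as plain indexing
```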
Lux.WrappedFunction Type
WrappedFunction(f)
Wraps a stateless and parameterless function. Might be used when a function is added to Chain. For example, Chain(x -> relu.(x)) would not work and the right thing to do would be Chain((x, ps, st) -> (relu.(x), st)). An easier thing to do would be Chain(WrappedFunction(Base.Fix1(broadcast, relu)))
Arguments
- f: Some function.
Inputs
- x: will be directly passed to f
Returns
- Output of f(x)
- Empty NamedTuple()
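A short sketch of wrapping a plain function so that it can sit inside a Chain. The Dense sizes are illustrative assumptions, and relu is taken from the activation functions that Lux re-exports.

```julia
using Lux, Random

rng = Random.default_rng()
model = Chain(Dense(4 => 3), WrappedFunction(x -> relu.(x)))
ps, st = Lux.setup(rng, model)

x = randn(rng, Float32, 4, 6)
y, _ = model(x, ps, st)

all(y .>= 0)                          # relu leaves no negative entries
```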
Lux.ReverseSequence Type
ReverseSequence(dim = nothing)
Reverse the specified dimension dim of the passed array
Arguments
- dim: Dimension that needs to be reversed. If nothing, for AbstractVector{T} it reverses itself (dimension 1), for other arrays, reverse the dimension ndims(x) - 1.
Inputs
- x: AbstractArray.
Returns
- AbstractArray with the same dimensions as the input
- Empty NamedTuple()
Example
julia> model = ReverseSequence()
ReverseSequence{Nothing}(nothing)
julia> rng = Random.default_rng();
Random.seed!(rng, 0);
ps, st = Lux.setup(rng, model);
x = [1.0, 2.0, 3.0];
julia> y, st_new = model(x, ps, st)
([3.0, 2.0, 1.0], NamedTuple())
Normalization Layers
Lux.BatchNorm Type
BatchNorm(chs::Integer, activation=identity; init_bias=zeros32, init_scale=ones32,
affine=True(), track_stats=True(), epsilon=1f-5, momentum=0.1f0)
Batch Normalization layer.
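BatchNorm computes the mean and variance for each $D_1 \times \dots \times D_{N-2} \times 1 \times D_N$ input slice and normalises the input accordingly.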
Arguments
- chs: Size of the channel dimension in your data. Given an array with N dimensions, call the N-1th the channel dimension. For a batch of feature vectors this is just the data dimension, for WHCN images it's the usual channel dimension.
- activation: After normalization, elementwise activation activation is applied.
Keyword Arguments
- If track_stats=true, accumulates mean and variance statistics in training phase that will be used to renormalize the input in test phase.
- epsilon: a value added to the denominator for numerical stability
- momentum: the value used for the running_mean and running_var computation
- If affine=true, it also applies a shift and a rescale to the input through learnable per-channel bias and scale parameters.
  - init_bias: Controls how the bias is initialized
  - init_scale: Controls how the scale is initialized
Extended Help
Inputs
- x: Array where size(x, N - 1) = chs
Returns
- y: Normalized Array
- Updated model state
Parameters
- affine=true
  - bias: Bias of shape (chs,)
  - scale: Scale of shape (chs,)
- affine=false - Empty NamedTuple()
States
- Statistics if track_stats=true
  - running_mean: Running mean of shape (chs,)
  - running_var: Running variance of shape (chs,)
- Statistics if track_stats=false
  - running_mean: nothing
  - running_var: nothing
- training: Used to check if training/inference mode
Use Lux.testmode during inference.
Example
julia> Chain(Dense(784 => 64), BatchNorm(64, relu), Dense(64 => 10), BatchNorm(10))
Chain(
layer_1 = Dense(784 => 64), # 50_240 parameters
layer_2 = BatchNorm(64, relu, affine=true, track_stats=true), # 128 parameters, plus 129 non-trainable
layer_3 = Dense(64 => 10), # 650 parameters
layer_4 = BatchNorm(10, affine=true, track_stats=true), # 20 parameters, plus 21 non-trainable
) # Total: 51_038 parameters,
# plus 150 states.
Warning
Passing a batch size of 1 during training will result in an error.
See also GroupNorm, InstanceNorm, LayerNorm, WeightNorm
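The difference between the training and inference phases can be sketched as follows. The channel count, batch size and the shift applied to the data are illustrative assumptions. In training mode the batch statistics are used and the running statistics in the returned state are updated; after Lux.testmode the stored running statistics are used instead.

```julia
using Lux, Random, Statistics

rng = Random.default_rng()
bn = BatchNorm(3)
ps, st = Lux.setup(rng, bn)

x = randn(rng, Float32, 3, 8) .+ 2.0f0   # 3 channels, batch of 8, mean near 2

# Training phase (the default state): normalise with the batch statistics.
y_train, st_train = bn(x, ps, st)
channel_means = mean(y_train; dims=2)    # close to zero for every channel

# Inference phase: switch the state to test mode and reuse the running statistics.
st_test = Lux.testmode(st_train)
y_test, _ = bn(x, ps, st_test)
```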
Lux.GroupNorm Type
GroupNorm(chs::Integer, groups::Integer, activation=identity; init_bias=zeros32,
init_scale=ones32, affine=true, epsilon=1f-5)
Group Normalization layer.
Arguments
- chs: Size of the channel dimension in your data. Given an array with N dimensions, call the N-1th the channel dimension. For a batch of feature vectors this is just the data dimension, for WHCN images it's the usual channel dimension.
- groups: the number of groups along which the statistics are computed. The number of channels must be an integer multiple of the number of groups.
- activation: After normalization, elementwise activation activation is applied.
Keyword Arguments
- epsilon: a value added to the denominator for numerical stability
- If affine=true, it also applies a shift and a rescale to the input through learnable per-channel bias and scale parameters.
  - init_bias: Controls how the bias is initialized
  - init_scale: Controls how the scale is initialized
Extended Help
Inputs
- x: Array where size(x, N - 1) = chs and ndims(x) > 2
Returns
- y: Normalized Array
- Updated model state
Parameters
- affine=true
  - bias: Bias of shape (chs,)
  - scale: Scale of shape (chs,)
- affine=false - Empty NamedTuple()
States
- training: Used to check if training/inference mode
Use Lux.testmode during inference.
Example
julia> Chain(Dense(784 => 64), GroupNorm(64, 4, relu), Dense(64 => 10), GroupNorm(10, 5))
Chain(
layer_1 = Dense(784 => 64), # 50_240 parameters
layer_2 = GroupNorm(64, 4, relu, affine=true), # 128 parameters
layer_3 = Dense(64 => 10), # 650 parameters
layer_4 = GroupNorm(10, 5, affine=true), # 20 parameters
) # Total: 51_038 parameters,
# plus 0 states.
See also BatchNorm, InstanceNorm, LayerNorm, WeightNorm
Lux.InstanceNorm Type
InstanceNorm(chs::Integer, activation=identity; init_bias=zeros32, init_scale=ones32,
affine=False(), track_stats=False(), epsilon=1f-5, momentum=0.1f0)
Instance Normalization. For details see Ulyanov et al. [4].
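Instance Normalization computes the mean and variance for each $D_1 \times \dots \times D_{N-2} \times 1 \times 1$ input slice and normalises the input accordingly.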
Arguments
- chs: Size of the channel dimension in your data. Given an array with N dimensions, call the N-1th the channel dimension. For a batch of feature vectors this is just the data dimension, for WHCN images it's the usual channel dimension.
- activation: After normalization, elementwise activation activation is applied.
Keyword Arguments
- If track_stats=true, accumulates mean and variance statistics in training phase that will be used to renormalize the input in test phase.
- epsilon: a value added to the denominator for numerical stability
- momentum: the value used for the running_mean and running_var computation
- If affine=true, it also applies a shift and a rescale to the input through learnable per-channel bias and scale parameters.
  - init_bias: Controls how the bias is initialized
  - init_scale: Controls how the scale is initialized
Extended Help
Inputs
- x: Array where size(x, N - 1) = chs and ndims(x) > 2
Returns
- y: Normalized Array
- Updated model state
Parameters
- affine=true
  - bias: Bias of shape (chs,)
  - scale: Scale of shape (chs,)
- affine=false - Empty NamedTuple()
States
- Statistics if track_stats=true
  - running_mean: Running mean of shape (chs,)
  - running_var: Running variance of shape (chs,)
- Statistics if track_stats=false
  - running_mean: nothing
  - running_var: nothing
- training: Used to check if training/inference mode
Use Lux.testmode during inference.
Example
julia> Chain(Dense(784 => 64), InstanceNorm(64, relu; affine=true), Dense(64 => 10),
InstanceNorm(10, relu; affine=true))
Chain(
layer_1 = Dense(784 => 64), # 50_240 parameters
layer_2 = InstanceNorm(64, relu, affine=true, track_stats=false), # 128 parameters, plus 1 non-trainable
layer_3 = Dense(64 => 10), # 650 parameters
layer_4 = InstanceNorm(10, relu, affine=true, track_stats=false), # 20 parameters, plus 1 non-trainable
) # Total: 51_038 parameters,
# plus 2 states.
See also BatchNorm, GroupNorm, LayerNorm, WeightNorm
Lux.LayerNorm Type
LayerNorm(shape::NTuple{N, Int}, activation=identity; epsilon=1f-5, dims=Colon(),
affine=true, init_bias=zeros32, init_scale=ones32)
Computes mean and standard deviation over the whole input array, and uses these to normalize the whole array. Optionally applies an elementwise affine transformation afterwards.
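Given an input array $x$, this layer computes
$$y = \frac{x - \mathbb{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} \cdot \gamma + \beta$$
where $\gamma$ and $\beta$ are trainable parameters if affine=true.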
Arguments
- shape: Broadcastable shape of input array excluding the batch dimension.
- activation: After normalization, elementwise activation activation is applied.
Keyword Arguments
- epsilon: a value added to the denominator for numerical stability.
- dims: Dimensions to normalize the array over.
- If affine=true, it also applies a shift and a rescale to the input through learnable per-element bias and scale parameters.
  - init_bias: Controls how the bias is initialized
  - init_scale: Controls how the scale is initialized
Extended Help
Inputs
- x: AbstractArray
Returns
- y: Normalized Array
- Empty NamedTuple()
Parameters
- affine=false: Empty NamedTuple()
- affine=true
  - bias: Bias of shape (shape..., 1)
  - scale: Scale of shape (shape..., 1)
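As a hedged numerical check of the normalization described above, the sketch below compares LayerNorm with a hand-written version. It assumes the default dims=Colon() normalizes over every dimension except the batch dimension and that the variance is the uncorrected (population) variance. With the default initializers the scale is one and the bias is zero, so the affine step has no effect. The sizes are illustrative.

```julia
using Lux, Random, Statistics

rng = Random.default_rng()
ln = LayerNorm((4,))
ps, st = Lux.setup(rng, ln)

x = randn(rng, Float32, 4, 8)         # 4 features, batch of 8
y, _ = ln(x, ps, st)

# Hand-written normalization per batch column, with epsilon = 1f-5.
μ = mean(x; dims=1)
σ² = var(x; dims=1, corrected=false)
y_manual = (x .- μ) ./ sqrt.(σ² .+ 1.0f-5)

y ≈ y_manual
```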
Lux.WeightNorm Type
WeightNorm(layer::AbstractLuxLayer, which_params::NTuple{N, Symbol},
dims::Union{Tuple, Nothing}=nothing)
Applies weight normalization to a parameter in the given layer.
Weight normalization is a reparameterization that decouples the magnitude of a weight tensor from its direction. This updates the parameters in which_params (e.g. weight) using two parameters: one specifying the magnitude (e.g. weight_g) and one specifying the direction (e.g. weight_v).
Arguments
- layer whose parameters are being reparameterized
- which_params: parameter names for the parameters being reparameterized
- By default, a norm over the entire array is computed. Pass dims to modify the dimension.
Inputs
- x: Should be of valid type for input to layer
Returns
- Output from layer
- Updated model state of layer
Parameters
- normalized: Parameters of layer that are being normalized
- unnormalized: Parameters of layer that are not being normalized
States
- Same as that of layer
Lux.RMSNorm Type
RMSNorm(normalized_shape::Dims; epsilon=1.0f-5, affine=true)
RMSNorm(dim::Integer...; kwargs...)
Root Mean Square Normalization layer. It normalizes the input by computing the root mean square (RMS) of the first N dimensions of the input where N is the length of normalized_shape.
Arguments
- normalized_shape: The input shape from which the RMS normalization factor is computed. The input is expected to have a shape that can be broadcast with normalized_shape.
Keyword Arguments
- epsilon: A small value for numerical stability.
- affine: If true, learns a scale parameter.
Extended Help
Inputs
- x: Array of size (normalized_shape..., *, *..., *)
Returns
- y: Normalized Array of same shape as x
- Empty NamedTuple()
Upsampling
Lux.PixelShuffle Type
PixelShuffle(r::Int)
Pixel shuffling layer with upscale factor r. Usually used for generating higher resolution images while upscaling them.
See NNlib.pixel_shuffle for more details.
Arguments
- r: Upscale factor
Inputs
- x: For 4D-arrays representing N images, the operation converts input size(x) == (W, H, r² x C, N) to output of size (r x W, r x H, C, N). For D-dimensional data, it expects ndims(x) == D + 2 with channel and batch dimensions, and divides the number of channels by rᴰ.
Returns
- Output of size (r x W, r x H, C, N) for 4D-arrays, and (r x W, r x H, ..., C, N) for D-dimensional data, where D = ndims(x) - 2
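The shape arithmetic can be illustrated with a small sketch. The input size is an arbitrary assumption: with r = 2 and 8 input channels, the output has 8 / 2² = 2 channels and each spatial dimension doubles.

```julia
using Lux, Random

rng = Random.default_rng()
layer = PixelShuffle(2)
ps, st = Lux.setup(rng, layer)

x = randn(rng, Float32, 3, 3, 8, 1)   # (W, H, r² x C, N) with r = 2, C = 2
y, _ = layer(x, ps, st)
size(y)                               # (r x W, r x H, C, N)
```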
Lux.Upsample Type
Upsample(mode = :nearest; [scale, size, align_corners=false])
Upsample(scale, mode = :nearest)
Upsampling Layer.
Layer Construction
Option 1
- mode: Set to :nearest, :linear, :bilinear or :trilinear
Exactly one of two keywords must be specified:
- If scale is a number, this applies to all but the last two dimensions (channel and batch) of the input. It may also be a tuple, to control dimensions individually.
- Alternatively, keyword size accepts a tuple, to directly specify the leading dimensions of the output.
Option 2
- If scale is a number, this applies to all but the last two dimensions (channel and batch) of the input. It may also be a tuple, to control dimensions individually.
- mode: Set to :nearest, :bilinear or :trilinear
Currently supported upsampling modes and corresponding NNlib's methods are:
- :nearest -> NNlib.upsample_nearest
- :bilinear -> NNlib.upsample_bilinear
- :trilinear -> NNlib.upsample_trilinear
Extended Help
Other Keyword Arguments
- align_corners: If true, the corner pixels of the input and output tensors are aligned, thus preserving the values at those pixels. This only has effect when mode is one of :bilinear or :trilinear.
Inputs
- x: For the input dimensions look into the documentation for the corresponding NNlib function
  - As a rule of thumb, :nearest should work with arrays of arbitrary dimensions
  - :bilinear works with 4D Arrays
  - :trilinear works with 5D Arrays
Returns
- Upsampled Input of size size or of size (I_1 x scale[1], ..., I_N x scale[N], C, N)
- Empty NamedTuple()
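The two keyword options of the first constructor can be sketched as below. The input size, the scale factor and the target size are illustrative assumptions. scale multiplies the leading (spatial) dimensions, while size sets them directly.

```julia
using Lux, Random

rng = Random.default_rng()
x = randn(rng, Float32, 4, 4, 3, 1)   # (W, H, C, N)

# Option 1 with `scale`: every spatial dimension is doubled.
up_scale = Upsample(:nearest; scale=2)
ps1, st1 = Lux.setup(rng, up_scale)
y_scale, _ = up_scale(x, ps1, st1)

# Option 1 with `size`: the spatial output size is given explicitly.
up_size = Upsample(:bilinear; size=(6, 5))
ps2, st2 = Lux.setup(rng, up_size)
y_size, _ = up_size(x, ps2, st2)
```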