Built-In Layers
Index
Lux.AdaptiveLPPool
Lux.AdaptiveMaxPool
Lux.AdaptiveMeanPool
Lux.AlphaDropout
Lux.BatchNorm
Lux.BidirectionalRNN
Lux.Bilinear
Lux.BranchLayer
Lux.Chain
Lux.Conv
Lux.ConvTranspose
Lux.Dense
Lux.Dropout
Lux.Embedding
Lux.FlattenLayer
Lux.GRUCell
Lux.GlobalLPPool
Lux.GlobalMaxPool
Lux.GlobalMeanPool
Lux.GroupNorm
Lux.InstanceNorm
Lux.LPPool
Lux.LSTMCell
Lux.LayerNorm
Lux.MaxPool
Lux.Maxout
Lux.MeanPool
Lux.NoOpLayer
Lux.PairwiseFusion
Lux.Parallel
Lux.PixelShuffle
Lux.RNNCell
Lux.Recurrence
Lux.RepeatedLayer
Lux.ReshapeLayer
Lux.ReverseSequence
Lux.Scale
Lux.SelectDim
Lux.SkipConnection
Lux.StatefulRecurrentCell
Lux.Upsample
Lux.VariationalHiddenDropout
Lux.WeightNorm
Lux.WrappedFunction
Containers
BranchLayer(layers...)
BranchLayer(; name=nothing, layers...)
Takes an input x and passes it through all the layers and returns a tuple of the outputs.
Arguments
- Layers can be specified in two formats:
  - A list of N Lux layers
  - Specified as N keyword arguments.
Extended Help
Inputs
- x: Will be directly passed to each of the layers
Returns
- Tuple: (layer_1(x), layer_2(x), ..., layer_N(x)) (naming changes if using the kwargs API)
- Updated state of the layers
Parameters
- Parameters of each layer wrapped in a NamedTuple with fields = layer_1, layer_2, ..., layer_N (naming changes if using the kwargs API)
States
- States of each layer wrapped in a NamedTuple with fields = layer_1, layer_2, ..., layer_N (naming changes if using the kwargs API)
Comparison with Parallel
This is slightly different from Parallel(nothing, layers...):
- If the input is a tuple, Parallel will pass each element individually to each layer.
- BranchLayer essentially assumes 1 input comes in and is branched out into N outputs.
Example
An easy way to replicate an input to an NTuple is to do
julia> BranchLayer(NoOpLayer(), NoOpLayer(), NoOpLayer())
BranchLayer(
layer_1 = NoOpLayer(),
layer_2 = NoOpLayer(),
layer_3 = NoOpLayer(),
) # Total: 0 parameters,
# plus 0 states.
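As a minimal sketch of the branching behaviour described above, the following calls a BranchLayer on a single input and destructures the tuple of outputs. The layer sizes and batch size are arbitrary choices for illustration, not values taken from the Lux documentation.

```julia
using Lux, Random

rng = Random.default_rng()

# Two branches that receive the same input (sizes are illustrative).
model = BranchLayer(Dense(2 => 3), Dense(2 => 4))
ps, st = Lux.setup(rng, model)

x = randn(rng, Float32, 2, 5)            # 2 features, batch of 5

# The first return value is a tuple with one entry per branch.
(y1, y2), st_new = model(x, ps, st)

println(size(y1), " ", size(y2))
```

Each branch sees the whole input x, so the outputs differ only through the branch layers themselves.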
Chain(layers...; name=nothing)
Chain(; layers..., name=nothing)
Collects multiple layers / functions to be called in sequence on a given input.
Arguments
- Layers can be specified in two formats:
  - A list of N Lux layers
  - Specified as N keyword arguments.
Extended Help
Inputs
- Input x is passed sequentially to each layer, and must conform to the input requirements of the internal layers.
Returns
- Output after sequentially applying all the layers to x
- Updated model states
Parameters
- Parameters of each layer wrapped in a NamedTuple with fields = layer_1, layer_2, ..., layer_N (naming changes if using the kwargs API)
States
- States of each layer wrapped in a NamedTuple with fields = layer_1, layer_2, ..., layer_N (naming changes if using the kwargs API)
Miscellaneous Properties
- Allows indexing and field access syntax. We can access the i-th layer by m[i] or m.layer_i. We can also index using ranges or arrays.
Example
julia> Chain(Dense(2, 3, relu), BatchNorm(3), Dense(3, 2))
Chain(
layer_1 = Dense(2 => 3, relu), # 9 parameters
layer_2 = BatchNorm(3, affine=true, track_stats=true), # 6 parameters, plus 7
layer_3 = Dense(3 => 2), # 8 parameters
) # Total: 23 parameters,
# plus 7 states.
julia> Chain(Dense(2, 3, relu), BatchNorm(3), Dense(3, 2); name="MyFancyChain")
MyFancyChain(
layer_1 = Dense(2 => 3, relu), # 9 parameters
layer_2 = BatchNorm(3, affine=true, track_stats=true), # 6 parameters, plus 7
layer_3 = Dense(3 => 2), # 8 parameters
) # Total: 23 parameters,
# plus 7 states.
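The indexing and field-access syntax mentioned under Miscellaneous Properties can be sketched as follows. The variable names (first_layer, sub) and the input size are illustrative assumptions.

```julia
using Lux, Random

model = Chain(Dense(2 => 3, relu), BatchNorm(3), Dense(3 => 2))
ps, st = Lux.setup(Random.default_rng(), model)

first_layer = model[1]        # same layer as model.layer_1
sub = model[1:2]              # indexing with a range gives a shorter Chain

x = rand(Float32, 2, 4)       # 2 features, batch of 4
y, st_new = model(x, ps, st)  # layers are applied in order

println(size(y))
```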
PairwiseFusion(connection, layers...; name=nothing)
PairwiseFusion(connection; name=nothing, layers...)
PairwiseFusion(; connection, layers..., name=nothing)
x1 → layer1 → y1 ↘
                  connection → layer2 → y2 ↘
              x2 ↗                          connection → y3
                                        x3 ↗
Arguments
- connection: Takes 2 inputs and combines them
- layers: AbstractLuxLayers. Layers can be specified in two formats:
  - A list of N Lux layers
  - Specified as N keyword arguments.
Extended Help
Inputs
Layer behaves differently based on input type:
- If the input x is a tuple of length N + 1, then the layers must be a tuple of length N. The computation is as follows
    y = x[1]
    for i in 1:N
        y = connection(x[i + 1], layers[i](y))
    end
- Any other kind of input
    y = x
    for i in 1:N
        y = connection(x, layers[i](y))
    end
Returns
- See Inputs section for how the return value is computed
- Updated model state for all the contained layers
Parameters
- Parameters of each layer wrapped in a NamedTuple with fields = layer_1, layer_2, ..., layer_N (naming changes if using the kwargs API)
States
- States of each layer wrapped in a NamedTuple with fields = layer_1, layer_2, ..., layer_N (naming changes if using the kwargs API)
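The tuple-input case can be checked against a hand-written version of the loop from the Inputs section. This is a sketch under assumptions: the layers l1 and l2, their sizes, and the choice of + as the connection are illustrative (+ is commutative, so the argument order of the connection does not affect this check).

```julia
using Lux, Random

rng = Random.default_rng()
l1 = Dense(2 => 2)
l2 = Dense(2 => 2)
model = PairwiseFusion(+, l1, l2)         # N = 2 layers
ps, st = Lux.setup(rng, model)

# A tuple of N + 1 = 3 inputs.
x1 = rand(rng, Float32, 2, 4)
x2 = rand(rng, Float32, 2, 4)
x3 = rand(rng, Float32, 2, 4)
y, _ = model((x1, x2, x3), ps, st)

# The same computation written out step by step.
h1, _ = l1(x1, ps.layer_1, st.layer_1)
y2 = x2 + h1                              # connection(x[2], layers[1](y))
h2, _ = l2(y2, ps.layer_2, st.layer_2)
y_manual = x3 + h2                        # connection(x[3], layers[2](y))

println(y ≈ y_manual)
```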
Parallel(connection, layers...; name=nothing)
Parallel(connection; name=nothing, layers...)
Parallel(; connection, layers..., name=nothing)
Create a layer which passes an input to each path in layers, before reducing the output with connection.
Arguments
- connection: An N-argument function that is called after passing the input through each layer. If connection = nothing, we return a tuple: Parallel(nothing, f, g)(x, y) = (f(x), g(y))
- Layers can be specified in two formats:
  - A list of N Lux layers
  - Specified as N keyword arguments.
Extended Help
Inputs
- x: If x is not a tuple, then return is computed as connection([l(x) for l in layers]...). Else one is passed to each layer, thus Parallel(+, f, g)(x, y) = f(x) + g(y).
Returns
- See the Inputs section for how the output is computed
- Updated state of the layers
Parameters
- Parameters of each layer wrapped in a NamedTuple with fields = layer_1, layer_2, ..., layer_N (naming changes if using the kwargs API)
States
- States of each layer wrapped in a NamedTuple with fields = layer_1, layer_2, ..., layer_N (naming changes if using the kwargs API)
See also SkipConnection which is Parallel with one identity.
Example
julia> model = Parallel(nothing, Dense(2, 1), Dense(2, 1))
Parallel(
layer_1 = Dense(2 => 1), # 3 parameters
layer_2 = Dense(2 => 1), # 3 parameters
) # Total: 6 parameters,
# plus 0 states.
julia> using Random;
rng = Random.seed!(123);
ps, st = Lux.setup(rng, model);
x1 = randn(rng, Float32, 2);
x2 = randn(rng, Float32, 2);
julia> size.(first(model((x1, x2), ps, st)))
((1,), (1,))
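The example above uses connection = nothing with a tuple input. For the other case in the Inputs section (a non-tuple input reduced by a connection), a minimal sketch might look like this; the layers f and g and their sizes are illustrative assumptions.

```julia
using Lux, Random

rng = Random.default_rng()
f = Dense(2 => 3)
g = Dense(2 => 3)
model = Parallel(+, f, g)
ps, st = Lux.setup(rng, model)

x = rand(rng, Float32, 2, 4)      # not a tuple: every layer receives x
y, _ = model(x, ps, st)

# Reference computation: connection(f(x), g(x)).
yf, _ = f(x, ps.layer_1, st.layer_1)
yg, _ = g(x, ps.layer_2, st.layer_2)

println(y ≈ yf + yg)
```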
SkipConnection(layers, connection; name=nothing)
SkipConnection(; layers, connection, name=nothing)
Create a skip connection which consists of a layer or Chain of consecutive layers and a shortcut connection linking the block's input to the output through a user-supplied 2-argument callable. The first argument to the callable will be propagated through the given layer while the second is the unchanged, "skipped" input.
The simplest "ResNet"-type connection is just SkipConnection(layer, +).
Arguments
- layer: Layer or Chain of layers to be applied to the input
- connection:
  - A 2-argument function that takes layer(input) and the input OR
  - An AbstractLuxLayer that takes (layer(input), input) as input
Extended Help
Inputs
- x: Will be passed directly to layer
Returns
- Output of connection(layer(input), input)
- Updated state of layer
Parameters
- Parameters of layer OR
- If connection is an AbstractLuxLayer, then NamedTuple with fields :layers and :connection
States
- States of layer OR
- If connection is an AbstractLuxLayer, then NamedTuple with fields :layers and :connection
See Parallel for a more general implementation.
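A sketch of the simplest residual connection mentioned above, SkipConnection(layer, +), compared against the explicit computation. The inner layer and its size are illustrative; because the connection is a plain function here, the example assumes (as the Parameters and States sections state) that ps and st are those of the wrapped layer.

```julia
using Lux, Random

rng = Random.default_rng()
layer = Dense(4 => 4, relu)
model = SkipConnection(layer, +)
ps, st = Lux.setup(rng, model)

x = rand(rng, Float32, 4, 3)
y, _ = model(x, ps, st)

# connection(layer(input), input) written out by hand.
h, _ = layer(x, ps, st)

println(y ≈ h + x)
```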
RepeatedLayer(model; repeats::Val = Val(10), input_injection::Val = Val(false))
Iteratively applies model for repeats number of times. The initial input is passed into the model repeatedly if input_injection = Val(true). This layer unrolls the computation; semantically, however, it is the same as:
- input_injection = Val(false)
    res = x
    for i in 1:repeats
        res, st = model(res, ps, st)
    end
- input_injection = Val(true)
    res = x
    for i in 1:repeats
        res, st = model((res, x), ps, st)
    end
It is expected that repeats will be a reasonable number below 20; beyond that, compile times for gradients might be unreasonably high.
Arguments
- model must be an AbstractLuxLayer
Keyword Arguments
- repeats: Number of times to apply the model
- input_injection: If true, then the input is passed to the model along with the output
Extended Help
Inputs
- x: Input as described above
Returns
- Output is computed as described above
- Updated state of the model
Parameters
- Parameters of model
States
- State of model
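The equivalence between RepeatedLayer and the explicit loop (for input_injection = Val(false)) can be sketched as below. The helper unrolled is hypothetical and exists only for this comparison; the inner layer and the repeat count are illustrative.

```julia
using Lux, Random

rng = Random.default_rng()
inner = Dense(2 => 2, relu)
model = RepeatedLayer(inner; repeats=Val(3))
ps, st = Lux.setup(rng, model)     # parameters and state of `inner`

x = rand(rng, Float32, 2, 4)
y, _ = model(x, ps, st)

# Hypothetical helper: the loop from the description, wrapped in a
# function so that `res` and `st` are ordinary local variables.
function unrolled(layer, x, ps, st, n)
    res = x
    for _ in 1:n
        res, st = layer(res, ps, st)
    end
    return res
end

println(y ≈ unrolled(inner, x, ps, st, 3))
```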
Convolutional Layers
Conv(k::NTuple{N,Integer}, (in_chs => out_chs)::Pair{<:Integer,<:Integer},
activation=identity; init_weight=nothing, init_bias=nothing, stride=1,
pad=0, dilation=1, groups=1, use_bias=True(), cross_correlation=False())
Standard convolutional layer.
Conv 2D
Image data should be stored in WHCN order (width, height, channels, batch). In other words, a 100 x 100 RGB image would be a 100 x 100 x 3 x 1 array, and a batch of 50 would be a 100 x 100 x 3 x 50 array. This has N = 2 spatial dimensions, and needs a kernel size like (5, 5), a 2-tuple of integers. To take convolutions along N feature dimensions, this layer expects as input an array with ndims(x) == N + 2, where size(x, N + 1) == in_chs is the number of input channels, and size(x, ndims(x)) is the number of observations in a batch.
Warning
Frameworks like PyTorch perform cross-correlation in their convolution layers. Pass cross_correlation=true to use cross-correlation instead.
Arguments
- k: Tuple of integers specifying the size of the convolutional kernel. E.g., for 2D convolutions length(k) == 2
- in_chs: Number of input channels
- out_chs: Number of output channels
- activation: Activation function
Extended Help
Keyword Arguments
- init_weight: Controls the initialization of the weight parameter. If nothing, then we use kaiming_uniform with gain computed on the basis of the activation function (taken from PyTorch nn.init.calculate_gain).
- init_bias: Controls the initialization of the bias parameter. If nothing, then we use a uniform distribution with bounds -bound and bound where bound = inv(sqrt(fan_in)).
- stride: Should be either a single integer or a tuple with N integers
- dilation: Should be either a single integer or a tuple with N integers
- pad: Specifies the number of elements added to the borders of the data array. It can be
  - a single integer for equal padding all around,
  - a tuple of N integers, to apply the same padding at begin/end of each spatial dimension,
  - a tuple of 2*N integers, for asymmetric padding, or
  - the singleton SamePad(), to calculate padding such that size(output,d) == size(x,d) / stride (possibly rounded) for each spatial dimension.
  Periodic padding can be achieved by preceding the layer with a WrappedFunction(x -> NNlib.circular_pad(x, N_pad; dims=pad_dims))
- groups: Expected to be an Int. It specifies the number of groups to divide a convolution into (set groups = in_chs for depthwise convolutions). in_chs and out_chs must be divisible by groups.
- use_bias: Trainable bias can be disabled entirely by setting this to false.
- cross_correlation: If true, perform cross-correlation instead of convolution. Prior to v1, Lux used to have a CrossCor layer which performed cross-correlation. This was removed in v1 in favor of Conv with cross_correlation=true.
Inputs
- x: Data satisfying ndims(x) == N + 2 && size(x, N + 1) == in_chs, i.e. size(x) = (I_N, ..., I_1, C_in, N)
Returns
- Output of the convolution y of size (O_N, ..., O_1, C_out, N)
- Empty NamedTuple()
Parameters
- weight: Convolution kernel
- bias: Bias (present if use_bias=true)
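A short sketch of the WHCN layout and the resulting shapes for a 2D convolution. The image size, channel counts and padding are illustrative choices; pad=2 is picked so that a 5 x 5 kernel with stride 1 preserves the spatial size.

```julia
using Lux, Random

rng = Random.default_rng()
layer = Conv((5, 5), 3 => 16, relu; pad=2)   # N = 2 spatial dimensions
ps, st = Lux.setup(rng, layer)

x = rand(rng, Float32, 32, 32, 3, 4)         # WHCN: 32 x 32 RGB images, batch of 4
y, _ = layer(x, ps, st)

println(size(y))           # spatial size kept, channels 3 -> 16
println(size(ps.weight))   # kernel array for groups = 1
```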
ConvTranspose(k::NTuple{N,Integer}, (in_chs => out_chs)::Pair{<:Integer,<:Integer},
activation=identity; init_weight=glorot_uniform, init_bias=zeros32,
stride=1, pad=0, outpad=0, dilation=1, groups=1, use_bias=True(),
cross_correlation=False())
Standard convolutional transpose layer.
Arguments
- k: Tuple of integers specifying the size of the convolutional kernel. E.g., for 2D convolutions length(k) == 2
- in_chs: Number of input channels
- out_chs: Number of output channels
- activation: Activation function
Keyword Arguments
- init_weight: Controls the initialization of the weight parameter. If nothing, then we use kaiming_uniform with gain computed on the basis of the activation function (taken from PyTorch nn.init.calculate_gain).
- init_bias: Controls the initialization of the bias parameter. If nothing, then we use a uniform distribution with bounds -bound and bound where bound = inv(sqrt(fan_in)).
- stride: Should be either a single integer or a tuple with N integers
- dilation: Should be either a single integer or a tuple with N integers
- pad: Specifies the number of elements added to the borders of the data array. It can be
  - a single integer for equal padding all around,
  - a tuple of N integers, to apply the same padding at begin/end of each spatial dimension,
  - a tuple of 2*N integers, for asymmetric padding, or
  - the singleton SamePad(), to calculate padding such that size(output,d) == size(x,d) * stride (possibly rounded) for each spatial dimension.
- groups: Expected to be an Int. It specifies the number of groups to divide a convolution into (set groups = in_chs for depthwise convolutions). in_chs and out_chs must be divisible by groups.
- use_bias: Trainable bias can be disabled entirely by setting this to false.
- cross_correlation: If true, perform transposed cross-correlation instead of transposed convolution.
- outpad: To conserve Conv invertibility when stride > 1, outpad can be used to increase the size of the output in the desired dimensions. Whereas pad is used to zero-pad the input, outpad only affects the output shape.
Extended Help
Inputs
- x: Data satisfying ndims(x) == N + 2 && size(x, N + 1) == in_chs, i.e. size(x) = (I_N, ..., I_1, C_in, N)
Returns
- Output of the convolution transpose y of size (O_N, ..., O_1, C_out, N)
- Empty NamedTuple()
Parameters
- weight: Convolution Transpose kernel
- bias: Bias (present if use_bias=true)
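As a rough illustration of how a transposed convolution changes spatial size, the sketch below upsamples by a factor of 2. The kernel size, stride and channel counts are illustrative; with a 2 x 2 kernel, stride 2 and no padding, each spatial dimension is expected to double.

```julia
using Lux, Random

rng = Random.default_rng()
layer = ConvTranspose((2, 2), 16 => 3; stride=2)
ps, st = Lux.setup(rng, layer)

x = rand(rng, Float32, 16, 16, 16, 2)   # WHCN: 16 x 16 maps, 16 channels, batch of 2
y, _ = layer(x, ps, st)

println(size(y))
```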
Dropout Layers
AlphaDropout(p::Real)
AlphaDropout layer.
Arguments
- p: Probability of Dropout
  - if p = 0 then NoOpLayer is returned.
  - if p = 1 then WrappedFunction(Base.Fix1(broadcast, zero)) is returned.
Inputs
- x: Must be an AbstractArray
Returns
- x with dropout mask applied if training=Val(true) else just x
- State with updated rng
States
- rng: Pseudo Random Number Generator
- training: Used to check if training/inference mode
Call Lux.testmode to switch to test mode.
See also Dropout, VariationalHiddenDropout
Dropout(p; dims=:)
Dropout layer.
Arguments
- p: Probability of Dropout (if p = 0 then NoOpLayer is returned)
Keyword Arguments
- To apply dropout along certain dimension(s), specify the dims keyword. e.g. Dropout(p; dims = (3,4)) will randomly zero out entire channels on WHCN input (also called 2D dropout).
Inputs
- x: Must be an AbstractArray
Returns
- x with dropout mask applied if training=Val(true) else just x
- State with updated rng
States
- rng: Pseudo Random Number Generator
- training: Used to check if training/inference mode
Call Lux.testmode to switch to test mode.
See also AlphaDropout, VariationalHiddenDropout
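A minimal sketch of the training/test distinction for Dropout. The probability and input are illustrative; the sketch assumes the usual inverted-dropout scaling, in which surviving entries are multiplied by 1 / (1 - p).

```julia
using Lux, Random

rng = Random.default_rng()
layer = Dropout(0.5f0)
ps, st = Lux.setup(rng, layer)          # state starts in training mode

x = ones(Float32, 4, 8)

# Training mode: a random mask is applied and the rng in the state advances.
y_train, st_train = layer(x, ps, st)

# Test mode: the input passes through unchanged.
y_test, _ = layer(x, ps, Lux.testmode(st_train))

println(y_test == x)
```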
VariationalHiddenDropout(p; dims=:)
VariationalHiddenDropout layer. The only difference from Dropout is that the mask is retained until Lux.update_state(l, :update_mask, Val(true)) is called.
Arguments
- p: Probability of Dropout (if p = 0 then NoOpLayer is returned)
Keyword Arguments
- To apply dropout along certain dimension(s), specify the dims keyword. e.g. VariationalHiddenDropout(p; dims = 3) will randomly zero out entire channels on WHCN input (also called 2D dropout).
Inputs
- x: Must be an AbstractArray
Returns
- x with dropout mask applied if training=Val(true) else just x
- State with updated rng
States
- rng: Pseudo Random Number Generator
- training: Used to check if training/inference mode
- mask: Dropout mask. Initially set to nothing. After every run, contains the mask applied in that call
- update_mask: Stores whether new mask needs to be generated in the current call
Call Lux.testmode to switch to test mode.
See also AlphaDropout, Dropout
Pooling Layers
AdaptiveLPPool(output_size; p=2)
Adaptive LP Pooling layer. Calculates the necessary window size such that its output has size(y)[1:N] == output_size.
Arguments
- output_size: Size of the first N dimensions for the output
GPU Support
This layer is currently only supported on CPU.
Inputs
- x: Expects as input an array with ndims(x) == N + 2, i.e. channel and batch dimensions, after the N feature dimensions, where N = length(output_size).
Returns
- Output of size (out..., C, N)
- Empty NamedTuple()
AdaptiveMaxPool(output_size)
Adaptive Max Pooling layer. Calculates the necessary window size such that its output has size(y)[1:N] == output_size.
Arguments
- output_size: Size of the first N dimensions for the output
Inputs
- x: Expects as input an array with ndims(x) == N + 2, i.e. channel and batch dimensions, after the N feature dimensions, where N = length(output_size).
Returns
- Output of size (out..., C, N)
- Empty NamedTuple()
AdaptiveMeanPool(output_size)
Adaptive Mean Pooling layer. Calculates the necessary window size such that its output has size(y)[1:N] == output_size.
Arguments
- output_size: Size of the first N dimensions for the output
Inputs
- x: Expects as input an array with ndims(x) == N + 2, i.e. channel and batch dimensions, after the N feature dimensions, where N = length(output_size).
Returns
- Output of size (out..., C, N)
- Empty NamedTuple()
GlobalLPPool(; p=2)
Global LP Pooling layer. Transforms (w, h, c, b)-shaped input into (1, 1, c, b)-shaped output, by performing LP pooling on the complete (w, h)-shaped feature maps.
GPU Support
This layer is currently only supported on CPU.
Inputs
- x: Data satisfying ndims(x) > 2, i.e. size(x) = (I_N, ..., I_1, C, N)
Returns
- Output of the pooling y of size (1, ..., 1, C, N)
- Empty NamedTuple()
GlobalMaxPool()
Global Max Pooling layer. Transforms (w, h, c, b)-shaped input into (1, 1, c, b)-shaped output, by performing max pooling on the complete (w, h)-shaped feature maps.
Inputs
- x: Data satisfying ndims(x) > 2, i.e. size(x) = (I_N, ..., I_1, C, N)
Returns
- Output of the pooling y of size (1, ..., 1, C, N)
- Empty NamedTuple()
GlobalMeanPool()
Global Mean Pooling layer. Transforms (w, h, c, b)-shaped input into (1, 1, c, b)-shaped output, by performing mean pooling on the complete (w, h)-shaped feature maps.
Inputs
- x: Data satisfying ndims(x) > 2, i.e. size(x) = (I_N, ..., I_1, C, N)
Returns
- Output of the pooling y of size (1, ..., 1, C, N)
- Empty NamedTuple()
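The global pooling layers reduce each feature map to a single value. A small sketch, with an illustrative input size, that compares GlobalMeanPool against a direct mean over the spatial dimensions:

```julia
using Lux, Random
using Statistics: mean

rng = Random.default_rng()
layer = GlobalMeanPool()
ps, st = Lux.setup(rng, layer)           # both are empty NamedTuples

x = rand(rng, Float32, 8, 6, 3, 2)       # (w, h, c, b)
y, _ = layer(x, ps, st)

println(size(y))
println(y ≈ mean(x; dims=(1, 2)))        # mean over the (w, h) feature maps
```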
LPPool(window; stride=window, pad=0, dilation=1, p=2)
LP Pooling layer, which replaces all pixels in a block of size window with the reduction operation: lp.
Arguments
- window: Tuple of integers specifying the size of the window. E.g., for 2D pooling length(window) == 2
Keyword Arguments
- stride: Should be either a single integer or a tuple with N integers
- dilation: Should be either a single integer or a tuple with N integers
- pad: Specifies the number of elements added to the borders of the data array. It can be
  - a single integer for equal padding all around,
  - a tuple of N integers, to apply the same padding at begin/end of each spatial dimension,
  - a tuple of 2*N integers, for asymmetric padding, or
  - the singleton SamePad(), to calculate padding such that size(output,d) == size(x,d) / stride (possibly rounded) for each spatial dimension.
GPU Support
This layer is currently only supported on CPU.
Extended Help
Inputs
- x: Data satisfying ndims(x) == N + 2, i.e. size(x) = (I_N, ..., I_1, C, N)
Returns
- Output of the pooling y of size (O_N, ..., O_1, C, N)
- Empty NamedTuple()
MaxPool(window; stride=window, pad=0, dilation=1)
Max Pooling layer, which replaces all pixels in a block of size window with the reduction operation: max.
Arguments
- window: Tuple of integers specifying the size of the window. E.g., for 2D pooling length(window) == 2
Keyword Arguments
- stride: Should be either a single integer or a tuple with N integers
- dilation: Should be either a single integer or a tuple with N integers
- pad: Specifies the number of elements added to the borders of the data array. It can be
  - a single integer for equal padding all around,
  - a tuple of N integers, to apply the same padding at begin/end of each spatial dimension,
  - a tuple of 2*N integers, for asymmetric padding, or
  - the singleton SamePad(), to calculate padding such that size(output,d) == size(x,d) / stride (possibly rounded) for each spatial dimension.
Extended Help
Inputs
- x: Data satisfying ndims(x) == N + 2, i.e. size(x) = (I_N, ..., I_1, C, N)
Returns
- Output of the pooling y of size (O_N, ..., O_1, C, N)
- Empty NamedTuple()
MeanPool(window; stride=window, pad=0, dilation=1)
Mean Pooling layer, which replaces all pixels in a block of size window with the reduction operation: mean.
Arguments
- window: Tuple of integers specifying the size of the window. E.g., for 2D pooling length(window) == 2
Keyword Arguments
- stride: Should be either a single integer or a tuple with N integers
- dilation: Should be either a single integer or a tuple with N integers
- pad: Specifies the number of elements added to the borders of the data array. It can be
  - a single integer for equal padding all around,
  - a tuple of N integers, to apply the same padding at begin/end of each spatial dimension,
  - a tuple of 2*N integers, for asymmetric padding, or
  - the singleton SamePad(), to calculate padding such that size(output,d) == size(x,d) / stride (possibly rounded) for each spatial dimension.
Extended Help
Inputs
- x: Data satisfying ndims(x) == N + 2, i.e. size(x) = (I_N, ..., I_1, C, N)
Returns
- Output of the pooling y of size (O_N, ..., O_1, C, N)
- Empty NamedTuple()
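To illustrate how window and stride determine the output size of the windowed pooling layers, here is a small sketch. The input size and the stride=1 choice for MeanPool are illustrative assumptions.

```julia
using Lux, Random

rng = Random.default_rng()
x = rand(rng, Float32, 8, 8, 3, 2)            # WHCN input

maxpool = MaxPool((2, 2))                     # stride defaults to the window
ps_max, st_max = Lux.setup(rng, maxpool)
ymax, _ = maxpool(x, ps_max, st_max)

meanpool = MeanPool((2, 2); stride=1)         # overlapping windows
ps_mean, st_mean = Lux.setup(rng, meanpool)
ymean, _ = meanpool(x, ps_mean, st_mean)

println(size(ymax), " ", size(ymean))

# The first output pixel is the maximum of the first 2 x 2 block.
println(ymax[1, 1, 1, 1] == maximum(x[1:2, 1:2, 1, 1]))
```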
Recurrent Layers
GRUCell((in_dims, out_dims)::Pair{<:Int,<:Int}; use_bias=true, train_state::Bool=false,
init_weight=nothing, init_bias=nothing, init_state=zeros32)
Gated Recurrent Unit (GRU) Cell
Arguments
- in_dims: Input Dimension
- out_dims: Output (Hidden State) Dimension
- use_bias: Set to false to deactivate bias
- train_state: Trainable initial hidden state can be activated by setting this to true
- init_bias: Initializer for bias. Must be a tuple containing 3 functions. If a single value is passed, it is copied into a 3 element tuple. If nothing, then we use uniform distribution with bounds -bound and bound where bound = inv(sqrt(out_dims)).
- init_weight: Initializer for weight. Must be a tuple containing 3 functions. If a single value is passed, it is copied into a 3 element tuple. If nothing, then we use uniform distribution with bounds -bound and bound where bound = inv(sqrt(out_dims)).
- init_state: Initializer for hidden state
Inputs
- Case 1a: Only a single input x of shape (in_dims, batch_size), train_state is set to false - Creates a hidden state using init_state and proceeds to Case 2.
- Case 1b: Only a single input x of shape (in_dims, batch_size), train_state is set to true - Repeats hidden_state from parameters to match the shape of x and proceeds to Case 2.
- Case 2: Tuple (x, (h, )) is provided, then the output and a tuple containing the updated hidden state is returned.
Returns
- Tuple containing
  - Output of shape (out_dims, batch_size)
  - Tuple containing new hidden state
- Updated model state
Parameters
- weight_ih: Concatenated Weights to map from input space.
- weight_hh: Concatenated Weights to map from hidden space.
- bias_ih: Concatenated Bias vector for the input space (not present if use_bias=false).
- bias_hh: Concatenated Bias vector for the hidden space (not present if use_bias=false).
- hidden_state: Initial hidden state vector (not present if train_state=false).
States
- rng: Controls the randomness (if any) in the initial state generation
LSTMCell(in_dims => out_dims; use_bias::Bool=true, train_state::Bool=false,
train_memory::Bool=false, init_weight=nothing, init_bias=nothing,
init_state=zeros32, init_memory=zeros32)
Long Short-Term Memory (LSTM) Cell
Arguments
- in_dims: Input Dimension
- out_dims: Output (Hidden State & Memory) Dimension
- use_bias: Set to false to deactivate bias
- train_state: Trainable initial hidden state can be activated by setting this to true
- train_memory: Trainable initial memory can be activated by setting this to true
- init_bias: Initializer for bias. Must be a tuple containing 4 functions. If a single value is passed, it is copied into a 4 element tuple. If nothing, then we use uniform distribution with bounds -bound and bound where bound = inv(sqrt(out_dims)).
- init_weight: Initializer for weight. Must be a tuple containing 4 functions. If a single value is passed, it is copied into a 4 element tuple. If nothing, then we use uniform distribution with bounds -bound and bound where bound = inv(sqrt(out_dims)).
- init_state: Initializer for hidden state
- init_memory: Initializer for memory
Inputs
- Case 1a: Only a single input x of shape (in_dims, batch_size), train_state is set to false, train_memory is set to false - Creates a hidden state using init_state, hidden memory using init_memory and proceeds to Case 2.
- Case 1b: Only a single input x of shape (in_dims, batch_size), train_state is set to true, train_memory is set to false - Repeats hidden_state vector from the parameters to match the shape of x, creates hidden memory using init_memory and proceeds to Case 2.
- Case 1c: Only a single input x of shape (in_dims, batch_size), train_state is set to false, train_memory is set to true - Creates a hidden state using init_state, repeats the memory vector from parameters to match the shape of x and proceeds to Case 2.
- Case 1d: Only a single input x of shape (in_dims, batch_size), train_state is set to true, train_memory is set to true - Repeats the hidden state and memory vectors from the parameters to match the shape of x and proceeds to Case 2.
- Case 2: Tuple (x, (h, c)) is provided, then the output and a tuple containing the updated hidden state and memory is returned.
Returns
- Tuple containing
  - Output of shape (out_dims, batch_size)
  - Tuple containing new hidden state and new memory
- Updated model state
Parameters
- weight_ih: Concatenated Weights to map from input space.
- weight_hh: Concatenated Weights to map from hidden space
- bias_ih: Bias vector for the input-hidden connection (not present if use_bias=false)
- bias_hh: Concatenated Bias vector for the hidden-hidden connection (not present if use_bias=false)
- hidden_state: Initial hidden state vector (not present if train_state=false)
- memory: Initial memory vector (not present if train_memory=false)
States
- rng: Controls the randomness (if any) in the initial state generation
RNNCell(in_dims => out_dims, activation=tanh; use_bias=True(), train_state=False(),
init_bias=nothing, init_weight=nothing, init_state=zeros32)
An Elman RNN cell with activation (typically set to tanh or relu).
Arguments
- in_dims: Input Dimension
- out_dims: Output (Hidden State) Dimension
- activation: Activation function
- use_bias: Set to false to deactivate bias
- train_state: Trainable initial hidden state can be activated by setting this to true
- init_bias: Initializer for bias. If nothing, then we use uniform distribution with bounds -bound and bound where bound = inv(sqrt(out_dims)).
- init_weight: Initializer for weight. If nothing, then we use uniform distribution with bounds -bound and bound where bound = inv(sqrt(out_dims)).
- init_state: Initializer for hidden state
Inputs
- Case 1a: Only a single input x of shape (in_dims, batch_size), train_state is set to false - Creates a hidden state using init_state and proceeds to Case 2.
- Case 1b: Only a single input x of shape (in_dims, batch_size), train_state is set to true - Repeats hidden_state from parameters to match the shape of x and proceeds to Case 2.
- Case 2: Tuple (x, (h, )) is provided, then the output and a tuple containing the updated hidden state is returned.
Returns
- Tuple containing
  - Output of shape (out_dims, batch_size)
  - Tuple containing new hidden state
- Updated model state
Parameters
- weight_ih: Maps the input to the hidden state.
- weight_hh: Maps the hidden state to the hidden state.
- bias_ih: Bias vector for the input-hidden connection (not present if use_bias=false)
- bias_hh: Bias vector for the hidden-hidden connection (not present if use_bias=false)
- hidden_state: Initial hidden state vector (not present if train_state=false)
States
- rng: Controls the randomness (if any) in the initial state generation
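The input cases above can be sketched with an RNNCell as follows: the first call passes only x (Case 1a), and the second passes the carry returned by the first (Case 2). Dimensions and variable names are illustrative.

```julia
using Lux, Random

rng = Random.default_rng()
cell = RNNCell(3 => 5)
ps, st = Lux.setup(rng, cell)

x = rand(rng, Float32, 3, 4)                 # (in_dims, batch_size)

# Case 1a: only x is given, so the hidden state is created with init_state.
(y1, carry1), st1 = cell(x, ps, st)

# Case 2: pass the previous carry (a 1-tuple holding the hidden state).
(y2, carry2), st2 = cell((x, carry1), ps, st1)

println(size(y1), " ", size(only(carry2)))
```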
Recurrence(cell;
ordering::AbstractTimeSeriesDataBatchOrdering=BatchLastIndex(),
return_sequence::Bool=false)
Wraps a recurrent cell (like RNNCell, LSTMCell, GRUCell) to automatically operate over a sequence of inputs.
Relation to Flux.Recur
This is completely distinct from Flux.Recur. It doesn't make the cell stateful, rather allows operating on an entire sequence of inputs at once. See StatefulRecurrentCell for functionality similar to Flux.Recur.
Arguments
- cell: A recurrent cell. See RNNCell, LSTMCell, GRUCell, for how the inputs/outputs of a recurrent cell must be structured.
Keyword Arguments
- return_sequence: If true returns the entire sequence of outputs, else returns only the last output. Defaults to false.
- ordering: The ordering of the batch and time dimensions in the input. Defaults to BatchLastIndex(). Alternatively can be set to TimeLastIndex().
Extended Help
Inputs
- If x is a
  - Tuple or Vector: Each element is fed to the cell sequentially.
  - Array (except a Vector): It is sliced along the penultimate dimension and each slice is fed to the cell sequentially.
Returns
- Output of the cell for the entire sequence.
- Updated state of the cell.
Tip
Frameworks like TensorFlow have a special implementation of MultiRNNCell to handle sequentially composed RNN cells. In Lux, one can simply stack multiple Recurrence blocks in a Chain to achieve the same.
Chain(
    Recurrence(RNNCell(inputsize => latentsize); return_sequence=true),
    Recurrence(RNNCell(latentsize => latentsize); return_sequence=true),
    :
    x -> stack(x; dims=2)
)
For some discussion on this topic, see https://github.com/LuxDL/Lux.jl/issues/472.
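A minimal sketch of Recurrence over an array input with the default BatchLastIndex() ordering, so the time axis is the penultimate dimension. The sizes (3 features, 7 time steps, batch of 4) are illustrative.

```julia
using Lux, Random

rng = Random.default_rng()
x = rand(rng, Float32, 3, 7, 4)          # (features, time, batch)

# Return only the last output (the default).
last_only = Recurrence(RNNCell(3 => 5))
ps, st = Lux.setup(rng, last_only)
y_last, _ = last_only(x, ps, st)

# Return one output per time step.
full_seq = Recurrence(RNNCell(3 => 5); return_sequence=true)
ps_seq, st_seq = Lux.setup(rng, full_seq)
ys, _ = full_seq(x, ps_seq, st_seq)

println(size(y_last), " ", length(ys))
```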
StatefulRecurrentCell(cell)
Wraps a recurrent cell (like RNNCell, LSTMCell, GRUCell) and makes it stateful.
To avoid undefined behavior, once the processing of a single sequence of data is complete, update the state with Lux.update_state(st, :carry, nothing).
Arguments
- cell: A recurrent cell. See RNNCell, LSTMCell, GRUCell, for how the inputs/outputs of a recurrent cell must be structured.
Inputs
- Input to the cell.
Returns
- Output of the cell for the entire sequence.
- Updated state of the cell and updated carry.
States
- NamedTuple containing:
  - cell: Same as cell.
  - carry: The carry state of the cell.
BidirectionalRNN(cell::AbstractRecurrentCell,
backward_cell::Union{AbstractRecurrentCell, Nothing}=nothing;
merge_mode::Union{Function, Nothing}=vcat,
ordering::AbstractTimeSeriesDataBatchOrdering=BatchLastIndex())
Bidirectional RNN wrapper.
Arguments
- cell: A recurrent cell. See RNNCell, LSTMCell, GRUCell, for how the inputs/outputs of a recurrent cell must be structured.
- backward_cell: An optional backward recurrent cell. If backward_cell is nothing, the RNN layer instance passed as the cell argument will be used to generate the backward layer automatically. in_dims of backward_cell should be consistent with in_dims of cell
Keyword Arguments
- merge_mode: Function by which outputs of the forward and backward RNNs will be combined. The default value is vcat. If nothing, the outputs will not be combined.
- ordering: The ordering of the batch and time dimensions in the input. Defaults to BatchLastIndex(). Alternatively can be set to TimeLastIndex().
Extended Help
Inputs
- If x is a
  - Tuple or Vector: Each element is fed to the cell sequentially.
  - Array (except a Vector): It is sliced along the penultimate dimension and each slice is fed to the cell sequentially.
Returns
- Merged output of the cell and backward_cell for the entire sequence.
- Updated state of the cell and backward_cell.
Parameters
- NamedTuple with cell and backward_cell.
States
- Same as cell and backward_cell.
Linear Layers
Bilinear((in1_dims, in2_dims) => out, activation=identity; init_weight=nothing,
init_bias=nothing, use_bias=True())
Bilinear(in12_dims => out, activation=identity; init_weight=nothing,
init_bias=nothing, use_bias=True())
Create a fully connected layer between two inputs and an output, and otherwise similar to Dense. Its output, given vectors x & y, is another vector z with, for all i in 1:out:
z[i] = activation(x' * W[i, :, :] * y + bias[i])
If x and y are matrices, then each column of the output z = B(x, y) is of this form, with B the Bilinear layer.
Arguments
- in1_dims: number of input dimensions of x
- in2_dims: number of input dimensions of y
- in12_dims: If specified, then in1_dims = in2_dims = in12_dims
- out: number of output dimensions
- activation: activation function
Keyword Arguments
- init_weight: initializer for the weight matrix (weight = init_weight(rng, out_dims, in1_dims, in2_dims)). If nothing, then we use uniform distribution with bounds -bound and bound where bound = inv(sqrt(in1_dims)).
- init_bias: initializer for the bias vector (ignored if use_bias=false). If nothing, then we use uniform distribution with bounds -bound and bound where bound = inv(sqrt(in1_dims)).
- use_bias: Trainable bias can be disabled entirely by setting this to false
Input
- A 2-Tuple containing
  - x must be an AbstractArray with size(x, 1) == in1_dims
  - y must be an AbstractArray with size(y, 1) == in2_dims
- If the input is an AbstractArray, then x = y
Returns
- AbstractArray with dimensions (out_dims, size(x, 2))
- Empty NamedTuple()
Parameters
- weight: Weight Matrix of size (out_dims, in1_dims, in2_dims)
- bias: Bias of size (out_dims, 1) (present if use_bias=true)
Dense(in_dims => out_dims, activation=identity; init_weight=nothing,
init_bias=nothing, use_bias=True())
Create a traditional fully connected layer, whose forward pass is given by: y = activation.(weight * x .+ bias)
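As a minimal sketch of that forward pass in plain Julia (not Lux's implementation; dense_sketch and the locally defined relu are hypothetical helpers used only for illustration):

```julia
# y = activation.(weight * x .+ bias), written without Lux.
dense_sketch(weight, bias, x; activation=identity) = activation.(weight * x .+ bias)

relu(v) = max(v, zero(v))                 # defined locally to keep the sketch dependency-free

weight = [1.0 -1.0; 0.5 0.5; -2.0 0.0]    # size (out_dims, in_dims) = (3, 2)
bias = reshape([0.0, 1.0, 1.0], 3, 1)     # size (out_dims, 1)
x = [1.0 2.0; 3.0 0.0]                    # size (in_dims, batch) = (2, 2)

y = dense_sketch(weight, bias, x; activation=relu)   # size (out_dims, batch)
```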
Arguments
- in_dims: number of input dimensions
- out_dims: number of output dimensions
- activation: activation function
Keyword Arguments
- init_weight: initializer for the weight matrix (weight = init_weight(rng, out_dims, in_dims)). If nothing, then we use kaiming_uniform with gain computed on the basis of the activation function (taken from PyTorch nn.init.calculate_gain).
- init_bias: initializer for the bias vector (ignored if use_bias=false). If nothing, then we use uniform distribution with bounds -bound and bound where bound = inv(sqrt(in_dims)).
- use_bias: Trainable bias can be disabled entirely by setting this to false
Input
- x must be an AbstractArray with size(x, 1) == in_dims
Returns
- AbstractArray with dimensions (out_dims, ...) where ... are the remaining dimensions of x (all but the first)
- Empty NamedTuple()
Parameters
- weight: Weight Matrix of size (out_dims, in_dims)
- bias: Bias of size (out_dims, 1) (present if use_bias=true)
Embedding(in_dims => out_dims; init_weight=rand32)
A lookup table that stores embeddings of dimension out_dims for a vocabulary of size in_dims. When the vocabulary is multi-dimensional, the input is expected to be a tuple of Cartesian indices.
This layer is often used to store word embeddings and retrieve them using indices.
Arguments
- in_dims: number(s) of input dimensions
- out_dims: number of output dimensions
Keyword Arguments
- init_weight: initializer for the weight matrix (weight = init_weight(rng, out_dims, in_dims...))
Input
- Integer OR
- Abstract Vector of Integers OR
- Abstract Array of Integers OR
- Tuple of Integers OR
- Tuple of Abstract Vectors of Integers OR
- Tuple of Abstract Arrays of Integers
Returns
- Returns the embedding corresponding to each index in the input. For an N dimensional input, an N + 1 dimensional output is returned.
- Empty NamedTuple()
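The lookup and the "N dimensional input, N + 1 dimensional output" rule can be illustrated with ordinary array indexing. This sketch assumes a one-dimensional vocabulary and a weight matrix whose column j holds the embedding of index j; embed_sketch is a hypothetical name, not a Lux function.

```julia
out_dims, vocab = 2, 4
weight = reshape(collect(1.0:8.0), out_dims, vocab)   # column j = embedding of index j (assumed layout)

# Indexing the columns with an integer or an integer array performs the lookup.
embed_sketch(weight, idx) = weight[:, idx]

e1 = embed_sketch(weight, 3)              # single index: vector of length out_dims
e2 = embed_sketch(weight, [1 4; 2 2])     # 2-dimensional index array: 3-dimensional output
```

The output of the second call has size (out_dims, 2, 2): the index array's shape is kept and one leading embedding axis is added.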
Scale(dims, activation=identity; init_weight=ones32, init_bias=zeros32, use_bias=True())
Create a Sparsely Connected Layer with a very specific structure (only Diagonal Elements are non-zero). The forward pass is given by: y = activation.(weight .* x .+ bias)
Arguments
- dims: size of the learnable scale and bias parameters.
- activation: activation function
Keyword Arguments
- init_weight: initializer for the weight array (weight = init_weight(rng, dims...))
- init_bias: initializer for the bias vector (ignored if use_bias=false)
- use_bias: Trainable bias can be disabled entirely by setting this to false
Input
- x must be an Array of size (dims..., B) or (dims...[0], ..., dims[k]) for k ≤ size(dims)
Returns
- Array of size (dims..., B) or (dims...[0], ..., dims[k]) for k ≤ size(dims)
- Empty NamedTuple()
Parameters
- weight: Weight Array of size (dims...)
- bias: Bias of size (dims...)
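The elementwise forward pass y = activation.(weight .* x .+ bias) can be sketched in plain Julia as follows. scale_sketch is a hypothetical helper and the values are arbitrary; the point is that weight and bias of size (dims...) broadcast across the batch dimension.

```julia
# Elementwise scale and shift, broadcast over the trailing batch dimension.
scale_sketch(weight, bias, x; activation=identity) = activation.(weight .* x .+ bias)

weight = [2.0, 0.5, -1.0]              # size (dims...) with dims = (3,)
bias = [0.0, 1.0, 1.0]                 # size (dims...)
x = [1.0 2.0; 4.0 6.0; 3.0 -1.0]       # size (dims..., B) with B = 2

y = scale_sketch(weight, bias, x)      # same size as x
```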
Misc. Helper Layers
FlattenLayer(; N = nothing)
Flattens the passed array into a matrix.
Keyword Arguments
- N: Flatten the first N dimensions of the input array. If nothing, then all dimensions (except the last) are flattened. Note that the batch dimension is never flattened.
Inputs
- x: AbstractArray
Returns
- AbstractMatrix of size (:, size(x, ndims(x))) if N is nothing else the first N dimensions of the input array are flattened.
- Empty NamedTuple()
Example
julia> using Lux, Random
julia> model = FlattenLayer()
FlattenLayer{Nothing}(nothing)
julia> rng = Random.default_rng();
       Random.seed!(rng, 0);
       ps, st = Lux.setup(rng, model);
       x = randn(rng, Float32, (2, 2, 2, 2));
julia> y, st_new = model(x, ps, st);
       size(y)
(8, 2)
Maxout(layers...)
Maxout(; layers...)
Maxout(f::Function, n_alts::Int)
This contains a number of internal layers, each of which receives the same input. Its output is the elementwise maximum of the internal layers' outputs.
Maxout over linear dense layers satisfies the universal approximation theorem. See [1].
See also Parallel to reduce with other operators.
Arguments
- Layers can be specified in three formats:
  - A list of N Lux layers
  - Specified as N keyword arguments.
  - A no argument function f and an integer n_alts which specifies the number of layers.
Extended Help
Inputs
- x: Input that is passed to each of the layers
Returns
- Output is computed by taking elementwise max of the outputs of the individual layers.
- Updated state of the layers
Parameters
- Parameters of each layer wrapped in a NamedTuple with fields = layer_1, layer_2, ..., layer_N (naming changes if using the kwargs API)
States
- States of each layer wrapped in a NamedTuple with fields = layer_1, layer_2, ..., layer_N (naming changes if using the kwargs API)
References
[1] Goodfellow, Warde-Farley, Mirza, Courville & Bengio "Maxout Networks" https://arxiv.org/abs/1302.4389
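The reduction Maxout performs can be sketched with ordinary functions standing in for the internal layers. This is plain Julia rather than Lux; maxout_sketch, f1, f2 and f3 are hypothetical names, and parameters and states are left out for brevity.

```julia
# Apply every alternative to the same input, then take the elementwise maximum.
maxout_sketch(fs, x) = mapreduce(f -> f(x), (a, b) -> max.(a, b), fs)

f1(x) = 2 .* x     # stand-ins for the internal layers
f2(x) = x .+ 1
f3(x) = -x

y = maxout_sketch((f1, f2, f3), [-2.0, 0.0, 3.0])
```

Each output entry comes from whichever alternative is largest at that position, so different entries may be selected from different layers.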
NoOpLayer()
As the name suggests, this layer does nothing but allows pretty printing of layers. Whatever input is passed is returned.
Example
julia> using Lux, Random
julia> model = NoOpLayer()
NoOpLayer()
julia> rng = Random.default_rng();
       Random.seed!(rng, 0);
       ps, st = Lux.setup(rng, model);
       x = 1
1
julia> y, st_new = model(x, ps, st)
(1, NamedTuple())
ReshapeLayer(dims)
Reshapes the passed array to have a size of (dims..., :)
Arguments
- dims: The new dimensions of the array (excluding the last dimension).
Inputs
- x: AbstractArray of any shape which can be reshaped to (dims..., size(x, ndims(x)))
Returns
- AbstractArray of size (dims..., size(x, ndims(x)))
- Empty NamedTuple()
Example
julia> using Lux, Random
julia> model = ReshapeLayer((2, 2))
ReshapeLayer(output_dims = (2, 2, :))
julia> rng = Random.default_rng();
       Random.seed!(rng, 0);
       ps, st = Lux.setup(rng, model);
       x = randn(rng, Float32, (4, 1, 3));
julia> y, st_new = model(x, ps, st);
       size(y)
(2, 2, 3)
SelectDim(dim, i)
Return a view of all the data of the input x where the index for dimension dim equals i. Equivalent to view(x,:,:,...,i,:,:,...) where i is in position dim.
Arguments
- dim: Dimension for indexing
- i: Index for dimension dim
Inputs
- x: AbstractArray that can be indexed with view(x,:,:,...,i,:,:,...)
Returns
- view(x,:,:,...,i,:,:,...) where i is in position dim
- Empty NamedTuple()
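Julia's Base.selectdim has the semantics described above, so it is a convenient way to see what the layer returns. Whether the layer calls selectdim internally is an assumption here; the sketch only demonstrates the equivalence with view.

```julia
x = reshape(collect(1:24), 2, 3, 4)

v = selectdim(x, 2, 3)     # dim = 2, i = 3
w = view(x, :, 3, :)       # the equivalent explicit view, with i in position dim

same = v == w              # both select the same data
is_view = v isa SubArray   # no copy is made
```

Because the result is a view, it drops the selected dimension: here size(v) is (2, 4).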
WrappedFunction(f)
Wraps a stateless and parameterless function. Might be used when a function is added to Chain. For example, Chain(x -> relu.(x)) would not work and the right thing to do would be Chain((x, ps, st) -> (relu.(x), st)). An easier thing to do would be Chain(WrappedFunction(Base.Fix1(broadcast, relu))).
Arguments
- f: Some function.
Inputs
- x: s.t. hasmethod(f, (typeof(x),)) is true if :direct_call else hasmethod(f, (typeof(x), NamedTuple, NamedTuple)) is true
Returns
- Output of f(x)
- Empty NamedTuple()
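To see what the wrapping amounts to, the sketch below builds the Base.Fix1(broadcast, relu) function mentioned above and lifts it to the (x, ps, st) -> (y, st) calling convention in plain Julia. relu is defined locally and wrapped_sketch is a hypothetical stand-in for the layer, not Lux's implementation.

```julia
relu(v) = max(v, zero(v))            # local definition to keep the sketch dependency-free

f = Base.Fix1(broadcast, relu)       # f(x) is broadcast(relu, x), i.e. relu.(x)

# Lift a plain function to the three-argument layer convention; the state passes through unchanged.
wrapped_sketch(f) = (x, ps, st) -> (f(x), st)

layer = wrapped_sketch(f)
y, st = layer([-1.0, 0.5, 2.0], NamedTuple(), NamedTuple())
```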
ReverseSequence(dim = nothing)
Reverse the specified dimension dim of the passed array.
Arguments
- dim: Dimension that needs to be reversed. If nothing, for AbstractVector{T} it reverses itself (dimension 1), for other arrays, reverse the dimension ndims(x) - 1.
Inputs
- x: AbstractArray.
Returns
- AbstractArray with the same dimensions as the input
- Empty NamedTuple()
Example
julia> using Lux, Random
julia> model = ReverseSequence()
ReverseSequence{Nothing}(nothing)
julia> rng = Random.default_rng();
       Random.seed!(rng, 0);
       ps, st = Lux.setup(rng, model);
       x = [1.0, 2.0, 3.0];
julia> y, st_new = model(x, ps, st)
([3.0, 2.0, 1.0], NamedTuple())
Normalization Layers
BatchNorm(chs::Integer, activation=identity; init_bias=zeros32, init_scale=ones32,
affine=True(), track_stats=True(), epsilon=1f-5, momentum=0.1f0)
Batch Normalization layer.
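BatchNorm computes the mean and variance for each $D_1 \times \dots \times D_{N-2} \times 1 \times D_N$ input slice and normalises the input accordingly.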
Arguments
- chs: Size of the channel dimension in your data. Given an array with N dimensions, call the (N-1)th the channel dimension. For a batch of feature vectors this is just the data dimension, for WHCN images it's the usual channel dimension.
- activation: After normalization, the activation function is applied elementwise.
Keyword Arguments
- If track_stats=true, accumulates mean and variance statistics in training phase that will be used to renormalize the input in test phase.
- epsilon: a value added to the denominator for numerical stability
- momentum: the value used for the running_mean and running_var computation
- If affine=true, it also applies a shift and a rescale to the input through learnable per-channel bias and scale parameters.
  - init_bias: Controls how the bias is initialized
  - init_scale: Controls how the scale is initialized
Extended Help
Inputs
- x: Array where size(x, N - 1) = chs
Returns
- y: Normalized Array
- Updated model state
Parameters
- affine=true
  - bias: Bias of shape (chs,)
  - scale: Scale of shape (chs,)
- affine=false - Empty NamedTuple()
States
- Statistics if track_stats=true
  - running_mean: Running mean of shape (chs,)
  - running_var: Running variance of shape (chs,)
- Statistics if track_stats=false
  - running_mean: nothing
  - running_var: nothing
- training: Used to check if training/inference mode
Use Lux.testmode during inference.
Example
julia> Chain(Dense(784 => 64), BatchNorm(64, relu), Dense(64 => 10), BatchNorm(10))
Chain(
layer_1 = Dense(784 => 64), # 50_240 parameters
layer_2 = BatchNorm(64, relu, affine=true, track_stats=true), # 128 parameters, plus 129
layer_3 = Dense(64 => 10), # 650 parameters
layer_4 = BatchNorm(10, affine=true, track_stats=true), # 20 parameters, plus 21
) # Total: 51_038 parameters,
# plus 150 states.
Warning
Passing a batch size of 1 during training will result in an error.
See also GroupNorm, InstanceNorm, LayerNorm, WeightNorm
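The per-channel statistics that batch normalization relies on can be sketched in plain Julia with the Statistics standard library. This is an illustration of the training-mode arithmetic only, without the affine parameters, the activation or the running statistics; batchnorm_sketch is a hypothetical name, and using the uncorrected (biased) variance is an assumption.

```julia
using Statistics

# Normalize over every dimension except the channel dimension N - 1.
function batchnorm_sketch(x::AbstractArray{T,N}; epsilon=1f-5) where {T,N}
    reduce_dims = Tuple(d for d in 1:N if d != N - 1)
    μ = mean(x; dims=reduce_dims)                            # one mean per channel
    σ² = var(x; dims=reduce_dims, corrected=false, mean=μ)   # one variance per channel
    y = (x .- μ) ./ sqrt.(σ² .+ T(epsilon))
    return y, μ, σ²
end

x = randn(Float32, 4, 4, 3, 8)      # WHCN array with chs = 3
y, μ, σ² = batchnorm_sketch(x)
```

After this step every channel of y has mean close to 0 and variance close to 1 across the width, height and batch dimensions.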
GroupNorm(chs::Integer, groups::Integer, activation=identity; init_bias=zeros32,
init_scale=ones32, affine=true, epsilon=1f-5)
Group Normalization layer.
Arguments
- chs: Size of the channel dimension in your data. Given an array with N dimensions, call the (N-1)th the channel dimension. For a batch of feature vectors this is just the data dimension, for WHCN images it's the usual channel dimension.
- groups: the number of groups along which the statistics are computed. The number of channels must be an integer multiple of the number of groups.
- activation: After normalization, the activation function is applied elementwise.
Keyword Arguments
- epsilon: a value added to the denominator for numerical stability
- If affine=true, it also applies a shift and a rescale to the input through learnable per-channel bias and scale parameters.
  - init_bias: Controls how the bias is initialized
  - init_scale: Controls how the scale is initialized
Extended Help
Inputs
- x: Array where size(x, N - 1) = chs and ndims(x) > 2
Returns
- y: Normalized Array
- Updated model state
Parameters
- affine=true
  - bias: Bias of shape (chs,)
  - scale: Scale of shape (chs,)
- affine=false - Empty NamedTuple()
States
- training: Used to check if training/inference mode
Use Lux.testmode during inference.
Example
julia> Chain(Dense(784 => 64), GroupNorm(64, 4, relu), Dense(64 => 10), GroupNorm(10, 5))
Chain(
layer_1 = Dense(784 => 64), # 50_240 parameters
layer_2 = GroupNorm(64, 4, relu, affine=true), # 128 parameters
layer_3 = Dense(64 => 10), # 650 parameters
layer_4 = GroupNorm(10, 5, affine=true), # 20 parameters
) # Total: 51_038 parameters,
# plus 0 states.
See also BatchNorm, InstanceNorm, LayerNorm, WeightNorm
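Group normalization can be pictured as splitting the channel dimension into groups and normalizing each group of each sample separately. The plain-Julia sketch below covers 4D (WHCN) input only and omits the affine parameters and activation; groupnorm_sketch is a hypothetical name, and placing consecutive channels in the same group is an assumption about the grouping order.

```julia
using Statistics

function groupnorm_sketch(x::AbstractArray{T,4}, groups::Integer; epsilon=1f-5) where {T}
    W, H, C, B = size(x)
    C % groups == 0 || throw(ArgumentError("chs must be an integer multiple of groups"))
    xg = reshape(x, W, H, C ÷ groups, groups, B)    # split channels into groups
    μ = mean(xg; dims=(1, 2, 3))                    # statistics per (group, sample)
    σ² = var(xg; dims=(1, 2, 3), corrected=false, mean=μ)
    return reshape((xg .- μ) ./ sqrt.(σ² .+ T(epsilon)), W, H, C, B)
end

x = randn(Float32, 4, 4, 6, 2)      # chs = 6
y = groupnorm_sketch(x, 3)          # 3 groups of 2 channels each
```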
InstanceNorm(chs::Integer, activation=identity; init_bias=zeros32, init_scale=ones32,
affine=False(), track_stats=False(), epsilon=1f-5, momentum=0.1f0)
Instance Normalization. For details see [1].
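Instance Normalization computes the mean and variance for each $D_1 \times \dots \times D_{N-2} \times 1 \times 1$ input slice and normalises the input accordingly.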
Arguments
- chs: Size of the channel dimension in your data. Given an array with N dimensions, call the (N-1)th the channel dimension. For a batch of feature vectors this is just the data dimension, for WHCN images it's the usual channel dimension.
- activation: After normalization, the activation function is applied elementwise.
Keyword Arguments
- If track_stats=true, accumulates mean and variance statistics in training phase that will be used to renormalize the input in test phase.
- epsilon: a value added to the denominator for numerical stability
- momentum: the value used for the running_mean and running_var computation
- If affine=true, it also applies a shift and a rescale to the input through learnable per-channel bias and scale parameters.
  - init_bias: Controls how the bias is initialized
  - init_scale: Controls how the scale is initialized
Extended Help
Inputs
- x: Array where size(x, N - 1) = chs and ndims(x) > 2
Returns
- y: Normalized Array
- Updated model state
Parameters
- affine=true
  - bias: Bias of shape (chs,)
  - scale: Scale of shape (chs,)
- affine=false - Empty NamedTuple()
States
- Statistics if track_stats=true
  - running_mean: Running mean of shape (chs,)
  - running_var: Running variance of shape (chs,)
- Statistics if track_stats=false
  - running_mean: nothing
  - running_var: nothing
- training: Used to check if training/inference mode
Use Lux.testmode during inference.
Example
julia> Chain(Dense(784 => 64), InstanceNorm(64, relu; affine=true), Dense(64 => 10),
InstanceNorm(10, relu; affine=true))
Chain(
layer_1 = Dense(784 => 64), # 50_240 parameters
layer_2 = InstanceNorm(64, relu, affine=true, track_stats=false), # 128 parameters, plus 1
layer_3 = Dense(64 => 10), # 650 parameters
layer_4 = InstanceNorm(10, relu, affine=true, track_stats=false), # 20 parameters, plus 1
) # Total: 51_038 parameters,
# plus 2 states.
References
[1] Ulyanov, Dmitry, Andrea Vedaldi, and Victor Lempitsky. "Instance normalization: The missing ingredient for fast stylization." arXiv preprint arXiv:1607.08022 (2016).
See also BatchNorm, GroupNorm, LayerNorm, WeightNorm
LayerNorm(shape::NTuple{N, Int}, activation=identity; epsilon=1f-5, dims=Colon(),
affine=true, init_bias=zeros32, init_scale=ones32)
Computes mean and standard deviation over the whole input array, and uses these to normalize the whole array. Optionally applies an elementwise affine transformation afterwards.
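Given an input array $x$, this layer computes
$$y = \frac{x - \mathbb{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} \cdot \gamma + \beta$$
where $\gamma$ and $\beta$ are trainable parameters if affine=true.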
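A minimal plain-Julia sketch of this normalization (subtract the mean, divide by the standard deviation with epsilon added under the square root, then scale and shift) is given below. It is not Lux's implementation; layernorm_sketch, γ and β are illustrative names, and the uncorrected variance is an assumption.

```julia
using Statistics

# dims = : normalizes over the whole array, mirroring the default dims=Colon().
function layernorm_sketch(x, γ, β; epsilon=1f-5, dims=:)
    μ = mean(x; dims=dims)
    σ² = var(x; dims=dims, corrected=false)
    return (x .- μ) ./ sqrt.(σ² .+ epsilon) .* γ .+ β
end

x = Float32[1 2; 3 4]
γ = ones(Float32, 2, 1)      # scale, broadcast against x
β = zeros(Float32, 2, 1)     # bias, broadcast against x

y = layernorm_sketch(x, γ, β)
```

With γ set to ones and β to zeros, the output simply has mean close to 0 and variance close to 1 over the normalized dimensions.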
Inconsistent Defaults till v0.5.0
As of v0.5.0, the doc used to say affine::Bool=false, but the code actually had affine::Bool=true as the default. Now the doc reflects the code, so please check whether your assumptions about the default (if made) were invalid.
Arguments
- shape: Broadcastable shape of input array excluding the batch dimension.
- activation: After normalization, the activation function is applied elementwise.
Keyword Arguments
- epsilon: a value added to the denominator for numerical stability.
- dims: Dimensions to normalize the array over.
- If affine=true, it also applies a shift and a rescale to the input through learnable per-element bias and scale parameters.
  - init_bias: Controls how the bias is initialized
  - init_scale: Controls how the scale is initialized
Extended Help
Inputs
- x: AbstractArray
Returns
- y: Normalized Array
- Empty NamedTuple()
Parameters
- affine=false: Empty NamedTuple()
- affine=true
  - bias: Bias of shape (shape..., 1)
  - scale: Scale of shape (shape..., 1)
WeightNorm(layer::AbstractLuxLayer, which_params::NTuple{N, Symbol},
dims::Union{Tuple, Nothing}=nothing)
Applies weight normalization to a parameter in the given layer.
Weight normalization is a reparameterization that decouples the magnitude of a weight tensor from its direction. This updates the parameters in which_params (e.g. weight) using two parameters: one specifying the magnitude (e.g. weight_g) and one specifying the direction (e.g. weight_v).
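The reparameterization can be written as w = g * v / ||v||. The sketch below shows the default case of a norm over the entire array using the LinearAlgebra standard library; weightnorm_sketch is a hypothetical name and a scalar magnitude g is assumed for simplicity.

```julia
using LinearAlgebra

# Rebuild the weight from a magnitude g and a direction v.
weightnorm_sketch(g, v) = g .* v ./ norm(v)

v = [3.0 0.0; 0.0 4.0]     # direction parameter (e.g. weight_v); norm(v) is 5
g = 10.0                   # magnitude parameter (e.g. weight_g)

w = weightnorm_sketch(g, v)
```

Rescaling v leaves w unchanged, while changing g rescales w without altering its direction.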
Arguments
- layer whose parameters are being reparameterized
- which_params: parameter names for the parameters being reparameterized
- By default, a norm over the entire array is computed. Pass dims to modify the dimension.
Inputs
- x: Should be of valid type for input to layer
Returns
- Output from layer
- Updated model state of layer
Parameters
- normalized: Parameters of layer that are being normalized
- unnormalized: Parameters of layer that are not being normalized
States
- Same as that of layer
Upsampling
PixelShuffle(r::Int)
Pixel shuffling layer with upscale factor r. Usually used for generating higher resolution images while upscaling them.
See NNlib.pixel_shuffle for more details.
PixelShuffle is not a Layer, rather it returns a WrappedFunction with the function set to Base.Fix2(pixel_shuffle, r)
Arguments
- r: Upscale factor
Inputs
- x: For 4D-arrays representing N images, the operation converts input size(x) == (W, H, r² x C, N) to output of size (r x W, r x H, C, N). For D-dimensional data, it expects ndims(x) == D + 2 with channel and batch dimensions, and divides the number of channels by rᴰ.
Returns
- Output of size (r x W, r x H, C, N) for 4D-arrays, and (r x W, r x H, ..., C, N) for D-dimensional data, where D = ndims(x) - 2
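The shape bookkeeping above can be captured in a small helper that computes only the output size. It does not move any data and is not NNlib.pixel_shuffle; pixel_shuffle_size is a hypothetical name introduced for illustration.

```julia
# Output size of a pixel shuffle with upscale factor r, for an input of size sz.
function pixel_shuffle_size(sz::NTuple{N,Int}, r::Int) where {N}
    N > 2 || throw(ArgumentError("expected spatial, channel and batch dimensions"))
    D = N - 2                               # number of spatial dimensions
    C = sz[N - 1]                           # input channels
    C % r^D == 0 || throw(ArgumentError("number of channels must be divisible by r^D"))
    return (map(s -> r * s, sz[1:D])..., C ÷ r^D, sz[N])
end

out4 = pixel_shuffle_size((8, 8, 12, 5), 2)       # 4D input: 12 channels become 12 ÷ 2² = 3
out5 = pixel_shuffle_size((4, 4, 4, 16, 1), 2)    # 5D input: 16 channels become 16 ÷ 2³ = 2
```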
Upsample(mode = :nearest; [scale, size, align_corners=false])
Upsample(scale, mode = :nearest)
Upsampling Layer.
Layer Construction
Option 1
- mode: Set to :nearest, :linear, :bilinear or :trilinear
Exactly one of two keywords must be specified:
- If scale is a number, this applies to all but the last two dimensions (channel and batch) of the input. It may also be a tuple, to control dimensions individually.
- Alternatively, keyword size accepts a tuple, to directly specify the leading dimensions of the output.
Option 2
- If scale is a number, this applies to all but the last two dimensions (channel and batch) of the input. It may also be a tuple, to control dimensions individually.
- mode: Set to :nearest, :bilinear or :trilinear
Currently supported upsampling modes and corresponding NNlib's methods are:
- :nearest -> NNlib.upsample_nearest
- :bilinear -> NNlib.upsample_bilinear
- :trilinear -> NNlib.upsample_trilinear
Extended Help
Other Keyword Arguments
- align_corners: If true, the corner pixels of the input and output tensors are aligned, thus preserving the values at those pixels. This only has effect when mode is one of :bilinear or :trilinear.
Inputs
- x: For the input dimensions look into the documentation for the corresponding NNlib function
  - As a rule of thumb, :nearest should work with arrays of arbitrary dimensions
  - :bilinear works with 4D Arrays
  - :trilinear works with 5D Arrays
Returns
- Upsampled Input of size size or of size (I_1 x scale[1], ..., I_N x scale[N], C, N)
- Empty NamedTuple()
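For intuition about the :nearest mode and the output size formula, nearest-neighbour upsampling with integer scales can be imitated with Base.repeat. This is a plain-Julia sketch under that assumption, not a call into NNlib; upsample_nearest_sketch is a hypothetical name.

```julia
# Repeat every element scale[d] times along spatial dimension d; channel and batch are untouched.
function upsample_nearest_sketch(x::AbstractArray, scale::Tuple{Vararg{Int}})
    length(scale) == ndims(x) - 2 ||
        throw(ArgumentError("one scale per spatial dimension is expected"))
    return repeat(x; inner=(scale..., 1, 1))
end

x = reshape(collect(1.0:4.0), 2, 2, 1, 1)     # one 2 x 2 image, one channel
y = upsample_nearest_sketch(x, (2, 2))        # size (2 * 2, 2 * 2, 1, 1)
```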