Boltz

Accelerate ⚡ your ML research using pre-built Deep Learning Models with Lux.


Computer Vision Models

Classification Models: Native Lux Models

| MODEL NAME | FUNCTION | NAME | PRETRAINED | TOP 1 ACCURACY (%) | TOP 5 ACCURACY (%) |
|:-----------|:---------|:-----|:----------:|:------------------:|:------------------:|
| VGG | vgg | :vgg11 | ✅ | 67.35 | 87.91 |
| VGG | vgg | :vgg13 | ✅ | 68.40 | 88.48 |
| VGG | vgg | :vgg16 | ✅ | 70.24 | 89.80 |
| VGG | vgg | :vgg19 | ✅ | 71.09 | 90.27 |
| VGG | vgg | :vgg11_bn | ✅ | 69.09 | 88.94 |
| VGG | vgg | :vgg13_bn | ✅ | 69.66 | 89.49 |
| VGG | vgg | :vgg16_bn | ✅ | 72.11 | 91.02 |
| VGG | vgg | :vgg19_bn | ✅ | 72.95 | 91.32 |
| Vision Transformer | vision_transformer | :tiny | 🚫 | | |
| Vision Transformer | vision_transformer | :small | 🚫 | | |
| Vision Transformer | vision_transformer | :base | 🚫 | | |
| Vision Transformer | vision_transformer | :large | 🚫 | | |
| Vision Transformer | vision_transformer | :huge | 🚫 | | |
| Vision Transformer | vision_transformer | :giant | 🚫 | | |
| Vision Transformer | vision_transformer | :gigantic | 🚫 | | |
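
As a quick sanity check, the sketch below builds one of the pretrained VGG models from the table. It assumes the constructor pattern documented at the end of this page (FUNCTION(NAME; pretrained = ...)), that the call returns the model together with its parameters and states, and the usual 224×224 WHCN input layout.

```julia
using Boltz, Lux, Random

# Assumption: each table row maps to FUNCTION(NAME; pretrained=...) and the
# call returns (model, parameters, states).
model, ps, st = vgg(:vgg16; pretrained=true)

# Run one 224×224 RGB image through the network (WHCN layout assumed).
x = randn(Random.default_rng(), Float32, 224, 224, 3, 1)
y, _ = Lux.apply(model, x, ps, st)
```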

Building Blocks

# Boltz.ClassTokens — Type

```julia
ClassTokens(dim; init=Lux.zeros32)
```

Appends class tokens to an input with embedding dimension dim for use in many vision transformer models.

source
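
A minimal usage sketch, assuming the layer follows the standard Lux setup/apply convention, operates on (dim, patches, batch) inputs, and concatenates the class token along the patch dimension:

```julia
using Boltz, Lux, Random

rng = Random.default_rng()
layer = Boltz.ClassTokens(768)
ps, st = Lux.setup(rng, layer)

# (dim, n_patches, batch) -> (dim, n_patches + 1, batch), assuming the token
# is appended along the patch dimension.
x = randn(rng, Float32, 768, 196, 4)
y, _ = layer(x, ps, st)
```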


# Boltz.MultiHeadAttention — Type

```julia
MultiHeadAttention(in_planes::Int, number_heads::Int; qkv_bias::Bool=false,
                   attention_dropout_rate::T=0.0f0,
                   projection_dropout_rate::T=0.0f0) where {T}
```

Multi-head self-attention layer.

source
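
A construction sketch, assuming ViT-Base-style hyperparameters and a (features, tokens, batch) input layout:

```julia
using Boltz, Lux, Random

rng = Random.default_rng()
attn = Boltz.MultiHeadAttention(768, 12; qkv_bias=true,
                                attention_dropout_rate=0.1f0,
                                projection_dropout_rate=0.1f0)
ps, st = Lux.setup(rng, attn)

# Self-attention over 197 tokens (196 patches + 1 class token), batch of 4;
# the input layout is an assumption.
x = randn(rng, Float32, 768, 197, 4)
y, _ = attn(x, ps, st)
```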


# Boltz.ViPosEmbedding — Type

```julia
ViPosEmbedding(embedsize, npatches;
               init = (rng, dims...) -> randn(rng, Float32, dims...))
```

Positional embedding layer used by many vision transformer-like models.

source
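
A minimal sketch, assuming the layer adds a learned positional embedding to an (embedsize, npatches, batch) input:

```julia
using Boltz, Lux, Random

rng = Random.default_rng()
pos = Boltz.ViPosEmbedding(768, 197)  # embedsize = 768, npatches = 197
ps, st = Lux.setup(rng, pos)

x = randn(rng, Float32, 768, 197, 4)
y, _ = pos(x, ps, st)  # adds the learned embedding to x (assumed behavior)
```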


# Boltz.transformer_encoder — Function

```julia
transformer_encoder(in_planes, depth, number_heads; mlp_ratio = 4.0f0, dropout = 0.0f0)
```

Transformer encoder as used in the base ViT architecture (reference).

Arguments

  • in_planes: number of input channels

  • depth: number of attention blocks

  • number_heads: number of attention heads

  • mlp_ratio: ratio of the hidden dimension of the MLP block to in_planes

  • dropout: dropout rate

source
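
A construction sketch using the signature above; the hyperparameters and the (features, tokens, batch) input layout are illustrative assumptions:

```julia
using Boltz, Lux, Random

rng = Random.default_rng()
# 768 channels, 6 attention blocks, 12 heads.
encoder = Boltz.transformer_encoder(768, 6, 12; mlp_ratio=4.0f0, dropout=0.1f0)
ps, st = Lux.setup(rng, encoder)

x = randn(rng, Float32, 768, 197, 4)  # (features, tokens, batch) assumed
y, _ = encoder(x, ps, st)
```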


# Boltz.vgg — Function

```julia
vgg(imsize; config, inchannels, batchnorm = false, nclasses, fcsize, dropout)
```

Create a VGG model (reference).

Arguments

  • imsize: input image width and height as a tuple

  • config: the configuration for the convolution layers

  • inchannels: number of input channels

  • batchnorm: set to true to use batch normalization after each convolution

  • nclasses: number of output classes

  • fcsize: intermediate fully connected layer size (see Boltz._vgg_classifier_layers)

  • dropout: dropout level between fully connected layers

source
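
A construction sketch using the keyword arguments above. The config follows the (output_channels, num_convolutions) format documented under Boltz._vgg_convolutional_layers below, and the values mirror the standard VGG-11 layout for illustration:

```julia
using Boltz, Lux

# VGG-11-style configuration; each tuple is (output_channels, num_convolutions).
config = [(64, 1), (128, 1), (256, 2), (512, 2), (512, 2)]
model = Boltz.vgg((224, 224); config=config, inchannels=3, batchnorm=true,
                  nclasses=1000, fcsize=4096, dropout=0.5f0)
```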


Non-Public API

# Boltz._seconddimmean — Function

```julia
_seconddimmean(x)
```

Computes the mean of x along dimension 2.

source
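
A hypothetical re-implementation for illustration (not Boltz's actual code), assuming the reduced dimension is dropped:

```julia
using Statistics

# Mean over dimension 2, with that dimension removed from the result.
_seconddimmean_sketch(x) = dropdims(mean(x; dims=2); dims=2)

x = rand(Float32, 3, 5, 2)
size(_seconddimmean_sketch(x))  # (3, 2)
```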


# Boltz._fast_chunk — Function

```julia
_fast_chunk(x::AbstractArray, ::Val{n}, ::Val{dim})
```

Type-stable and faster version of MLUtils.chunk.

source
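
An illustrative sketch of the same idea (not Boltz's actual implementation): split x into n equal views along dimension dim, assuming size(x, dim) is divisible by n:

```julia
# Split x into n equal-sized views along dimension dim.
function _fast_chunk_sketch(x::AbstractArray, ::Val{n}, ::Val{dim}) where {n, dim}
    step = size(x, dim) ÷ n
    return [selectdim(x, dim, ((i - 1) * step + 1):(i * step)) for i in 1:n]
end

x = reshape(collect(1:12), 3, 4)
_fast_chunk_sketch(x, Val(2), Val(2))  # two 3×2 views along dimension 2
```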


# Boltz._flatten_spatial — Function

```julia
_flatten_spatial(x::AbstractArray{T, 4})
```

Flattens the first 2 dimensions of x into one, then permutes the resulting dimensions to the order (2, 1, 3).

source
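
An illustrative re-implementation (not Boltz's actual code), reading the docstring as (W, H, C, N) -> (C, W*H, N):

```julia
function _flatten_spatial_sketch(x::AbstractArray{T, 4}) where {T}
    # Merge the two spatial dimensions, then move channels first.
    y = reshape(x, size(x, 1) * size(x, 2), size(x, 3), size(x, 4))
    return permutedims(y, (2, 1, 3))
end

x = rand(Float32, 14, 14, 768, 4)
size(_flatten_spatial_sketch(x))  # (768, 196, 4)
```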


# Boltz._vgg_block — Function

```julia
_vgg_block(input_filters, output_filters, depth, batchnorm)
```

A VGG block of convolution layers (reference).

Arguments

  • input_filters: number of input feature maps

  • output_filters: number of output feature maps

  • depth: number of convolution/convolution + batch norm layers

  • batchnorm: set to true to include batch normalization after each convolution

source
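
A hypothetical call matching the signature above: two convolutions from 64 to 128 feature maps, each followed by batch normalization:

```julia
using Boltz

block = Boltz._vgg_block(64, 128, 2, true)
```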


# Boltz._vgg_classifier_layers — Function

```julia
_vgg_classifier_layers(imsize, nclasses, fcsize, dropout)
```

Create VGG classifier (fully connected) layers (reference).

Arguments

  • imsize: tuple (width, height, channels) indicating the size after the convolution layers (see Boltz._vgg_convolutional_layers)

  • nclasses: number of output classes

  • fcsize: input and output size of the intermediate fully connected layer

  • dropout: the dropout level between each fully connected layer

source
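
A hypothetical call matching the signature above, with illustrative VGG-style values:

```julia
using Boltz

# 7×7×512 feature maps in, 1000 classes, 4096-wide intermediate layer,
# dropout 0.5 between the fully connected layers.
head = Boltz._vgg_classifier_layers((7, 7, 512), 1000, 4096, 0.5f0)
```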


# Boltz._vgg_convolutional_layers — Function

```julia
_vgg_convolutional_layers(config, batchnorm, inchannels)
```

Create VGG convolution layers (reference).

Arguments

  • config: vector of tuples (output_channels, num_convolutions) for each block (see Boltz._vgg_block)

  • batchnorm: set to true to include batch normalization after each convolution

  • inchannels: number of input channels

source
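
A hypothetical call matching the signature above, with a VGG-11-style config and 3 input channels:

```julia
using Boltz

config = [(64, 1), (128, 1), (256, 2), (512, 2), (512, 2)]
features = Boltz._vgg_convolutional_layers(config, true, 3)
```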


Classification Models: Imported from Metalhead.jl

Tip

You need to load Flux and Metalhead before using these models.

| MODEL NAME | FUNCTION | NAME | PRETRAINED | TOP 1 ACCURACY (%) | TOP 5 ACCURACY (%) |
|:-----------|:---------|:-----|:----------:|:------------------:|:------------------:|
| AlexNet | alexnet | :alexnet | ✅ | 54.48 | 77.72 |
| ResNet | resnet | :resnet18 | 🚫 | 68.08 | 88.44 |
| ResNet | resnet | :resnet34 | 🚫 | 72.13 | 90.91 |
| ResNet | resnet | :resnet50 | 🚫 | 74.55 | 92.36 |
| ResNet | resnet | :resnet101 | 🚫 | 74.81 | 92.36 |
| ResNet | resnet | :resnet152 | 🚫 | 77.63 | 93.84 |
| ConvMixer | convmixer | :small | 🚫 | | |
| ConvMixer | convmixer | :base | 🚫 | | |
| ConvMixer | convmixer | :large | 🚫 | | |
| DenseNet | densenet | :densenet121 | 🚫 | | |
| DenseNet | densenet | :densenet161 | 🚫 | | |
| DenseNet | densenet | :densenet169 | 🚫 | | |
| DenseNet | densenet | :densenet201 | 🚫 | | |
| GoogleNet | googlenet | :googlenet | 🚫 | | |
| MobileNet | mobilenet | :mobilenet_v1 | 🚫 | | |
| MobileNet | mobilenet | :mobilenet_v2 | 🚫 | | |
| MobileNet | mobilenet | :mobilenet_v3_small | 🚫 | | |
| MobileNet | mobilenet | :mobilenet_v3_large | 🚫 | | |
| ResNeXT | resnext | :resnext50 | 🚫 | | |
| ResNeXT | resnext | :resnext101 | 🚫 | | |
| ResNeXT | resnext | :resnext152 | 🚫 | | |

These models can be created using <FUNCTION>(<NAME>; pretrained = <PRETRAINED>), substituting the values from the table above.
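
For example (a sketch assuming, as with the native models above, that the constructor returns the model with its parameters and states):

```julia
using Boltz, Flux, Metalhead  # Flux and Metalhead must be loaded first

model, ps, st = resnet(:resnet18; pretrained=false)
```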

Preprocessing

All the pretrained models require that the images be normalized with the parameters mean = [0.485f0, 0.456f0, 0.406f0] and std = [0.229f0, 0.224f0, 0.225f0].
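
A minimal sketch of that normalization; normalize_imagenet is a hypothetical helper, not part of Boltz, and assumes a WHCN image batch with values scaled to [0, 1]:

```julia
const IMAGENET_MEAN = reshape([0.485f0, 0.456f0, 0.406f0], 1, 1, 3, 1)
const IMAGENET_STD = reshape([0.229f0, 0.224f0, 0.225f0], 1, 1, 3, 1)

# Hypothetical helper: channel-wise normalization of a WHCN image batch.
normalize_imagenet(x::AbstractArray{Float32, 4}) = (x .- IMAGENET_MEAN) ./ IMAGENET_STD
```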