J. Su, M. Ahmed, Y. Lu, S. Pan, W. Bo and Y. Liu. RoFormer: Enhanced transformer with rotary position embedding. Neurocomputing 568, 127063 (2024).
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser and I. Polosukhin. Attention is all you need. Advances in neural information processing systems 30 (2017).
I. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville and Y. Bengio. Maxout networks. In: International conference on machine learning (PMLR, 2013); pp. 1319–1327.
D. Ulyanov, A. Vedaldi and V. Lempitsky. Instance normalization: The missing ingredient for fast stylization, arXiv preprint arXiv:1607.08022 (2016).
T.-Y. Lin, P. Goyal, R. Girshick, K. He and P. Dollár. Focal loss for dense object detection. In: Proceedings of the IEEE international conference on computer vision (2017); pp. 2980–2988.
F. Milletari, N. Navab and S.-A. Ahmadi. V-net: Fully convolutional neural networks for volumetric medical image segmentation. In: 2016 fourth international conference on 3D vision (3DV) (IEEE, 2016); pp. 565–571.
R. Hadsell, S. Chopra and Y. LeCun. Dimensionality reduction by learning an invariant mapping. In: 2006 IEEE computer society conference on computer vision and pattern recognition (CVPR'06), Vol. 2 (IEEE, 2006); pp. 1735–1742.
G. Klambauer, T. Unterthiner, A. Mayr and S. Hochreiter. Self-normalizing neural networks. Advances in neural information processing systems 30 (2017).
N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever and R. Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15, 1929–1958 (2014).
S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International conference on machine learning (PMLR, 2015); pp. 448–456.
Y. Wu and K. He. Group normalization. In: Proceedings of the European conference on computer vision (ECCV) (2018); pp. 3–19.
J. L. Ba, J. R. Kiros and G. E. Hinton. Layer normalization, arXiv preprint arXiv:1607.06450 (2016).