CMC Microsystems Atlas 800 AI Competition: Recommended Models for SOTA Model Migration

MindSpore

MindSpore is an open-source AI framework for mobile, edge, and cloud scenarios, and it is the framework best supported on the Atlas 800 server.

We recommend that you use MindSpore to implement one of the following deep learning models and train it on the Atlas 800 server's NPU until it reaches the target baseline accuracy.
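Before training on the NPU, a MindSpore script has to select the Ascend backend. The sketch below shows one way to do this with `mindspore.set_context`. It assumes that setting an unsupported `device_target` raises an error, so it falls back to the CPU. The helper name `configure_mindspore` is hypothetical. On an Atlas 800 server with the Ascend build of MindSpore installed, it should report `"Ascend"`.

```python
def configure_mindspore(prefer_ascend: bool = True) -> str:
    """Point MindSpore at the Ascend NPU if possible and return the active device target.

    Hypothetical helper: a sketch, not an official MindSpore utility.
    """
    try:
        import mindspore as ms
    except ImportError:
        return "unavailable"  # MindSpore is not installed in this environment
    target = "Ascend" if prefer_ascend else "CPU"
    try:
        # GRAPH_MODE compiles the network into a static graph, which is
        # typically used for training performance on Ascend.
        ms.set_context(mode=ms.GRAPH_MODE, device_target=target)
    except Exception:
        # Assumption: a package built without Ascend support rejects "Ascend",
        # so fall back to the CPU backend instead.
        ms.set_context(mode=ms.GRAPH_MODE, device_target="CPU")
    # Report what MindSpore actually ended up using.
    return ms.get_context("device_target")
```

Call `configure_mindspore()` once at the top of the training script, before building the network.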

Model	Paper	Code	Dataset	Accuracy Baseline
CoAtNet	CoAtNet: Marrying Convolution and Attention for All Data Sizes	https://paperswithcode.com/paper/coatnet-marrying-convolution-and-attention	ImageNet	Top-1 Acc: 85%
Swin Transformer V2	Swin Transformer V2: Scaling Up Capacity and Resolution	https://paperswithcode.com/paper/swin-transformer-v2-scaling-up-capacity-and	ImageNet	Top-1 Acc: 85%
Focal Transformer	https://arxiv.org/pdf/2107.00641.pdf	-	ImageNet-1k	Top-1 Acc: 83.6%
Conformer	https://arxiv.org/pdf/2105.03889.pdf	-	ImageNet-1k	Top-1 Acc: 81.31%
Twins	https://arxiv.org/pdf/2104.13840.pdf	-	ImageNet-1k	Top-1 Acc: 81.2%
VAN	https://arxiv.org/pdf/2202.09741.pdf	-	ImageNet-1k	Top-1 Acc: 82.8%
ConvMixer	https://openreview.net/forum?id=TVHS5Y4dNvM	-	ImageNet-1k	Top-1 Acc: 81.37%
BEiT	https://arxiv.org/abs/2106.08254	https://github.com/microsoft/unilm/tree/master/beit	ImageNet-1k	Top-1 Acc: 85.2%
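The baselines above are top-1 accuracies. A prediction counts as correct only when the highest-scoring class equals the ground-truth label. The sketch below computes this metric from a batch of logits and compares it with the table's baseline for a given model. It uses NumPy for illustration. The `BASELINES` dictionary and helper names are hypothetical. In practice the logits would be accumulated over the full ImageNet-1k validation set (50,000 images).

```python
import numpy as np

# Top-1 baselines (%) copied from the table above.
BASELINES = {
    "CoAtNet": 85.0,
    "Swin Transformer V2": 85.0,
    "Focal Transformer": 83.6,
    "Conformer": 81.31,
    "Twins": 81.2,
    "VAN": 82.8,
    "ConvMixer": 81.37,
    "BEiT": 85.2,
}


def top1_accuracy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of samples whose highest-scoring class equals the label."""
    preds = np.argmax(logits, axis=1)  # predicted class index per sample
    return float(np.mean(preds == labels))


def meets_baseline(model: str, accuracy: float) -> bool:
    """Compare a fractional accuracy (0..1) with the model's baseline in percent."""
    return accuracy * 100.0 >= BASELINES[model]


# Toy example: 4 samples, 3 classes; the last sample is misclassified.
logits = np.array([[0.1, 2.0, 0.3], [1.5, 0.2, 0.1], [0.2, 0.1, 3.0], [0.9, 1.1, 0.0]])
labels = np.array([1, 0, 2, 0])
acc = top1_accuracy(logits, labels)  # 0.75
print(acc, meets_baseline("CoAtNet", acc))  # 0.75 False
```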

PyTorch

PyTorch is a popular AI framework. Upstream PyTorch supports only CPUs and GPUs, so the Atlas 800 server provides a modified version that runs on the NPU, with minor changes to some APIs (refer to Ascend PyTorch).
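One of the most visible API changes is the device name. Tensors and models are moved to an `"npu"` device instead of `"cuda"`. The sketch below picks the best available device. It assumes that newer Ascend releases ship the adapter as a separate `torch_npu` package, and that older releases patch `torch` directly and expose `torch.npu`. Check the Ascend PyTorch documentation for the exact version on your server. The helper `pick_device` is hypothetical.

```python
def pick_device() -> str:
    """Return a device string, preferring the Ascend NPU, then a CUDA GPU, then the CPU.

    Hypothetical helper: a sketch, not part of the Ascend PyTorch API.
    """
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed at all
    try:
        # Assumption: newer Ascend releases provide the adapter as `torch_npu`;
        # importing it registers the "npu" device with torch.
        import torch_npu  # noqa: F401
    except Exception:
        # Adapter or Ascend drivers missing; older Ascend PyTorch builds patch
        # `torch` itself, so `torch.npu` may still exist.
        pass
    npu = getattr(torch, "npu", None)
    if npu is not None and hasattr(npu, "is_available") and npu.is_available():
        return "npu:0"
    if torch.cuda.is_available():
        return "cuda:0"
    return "cpu"
```

In a migrated training script, replace hard-coded `.cuda()` calls with `.to(pick_device())` for both the model and each batch of tensors.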

We recommend that you migrate one of the following deep learning models to Ascend PyTorch and train it on the Atlas 800 server's NPU until it reaches the target baseline accuracy.

Model	Paper	Code	Dataset	Accuracy Baseline
AWSRN	Lightweight Image Super-Resolution with Adaptive Weighted Learning Network	https://github.com/ChaofWang/AWSRN	DIV2K, Set5, Set14, B100, Urban100, Manga109	refer to the original paper’s result
GFN	Gated Fusion Network for Joint Image Deblurring and Super-Resolution	https://github.com/jacquelinelala/GFN	GOPRO_Large	refer to the original paper’s result
ESPCN	Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network	https://github.com/Lornatang/ESPCN-PyTorch, https://github.com/leftthomas/espcn	DIV2K, DIV8K, Flickr2K, OST, T91, Set5, Set14, BSDS100, BSDS200	refer to the original paper’s result
GridDehazeNet	GridDehazeNet: Attention-Based Multi-Scale Network for Image Dehazing	https://github.com/proteus1991/GridDehazeNet	RESIDE	refer to the original paper’s result
SRGAN	Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network	https://github.com/Lornatang/SRGAN-PyTorch	DIV2K, DIV8K, Flickr2K, OST, T91, Set5, Set14, BSDS100, BSDS200	refer to the original paper’s result
multilogue-net	Multilogue-Net: A Context Aware RNN for Multi-modal Emotion Detection and Sentiment Analysis in Conversation	https://github.com/amanshenoy/multilogue-net	CMU-MOSEI	refer to the original paper’s result
infomax	Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis	https://github.com/declare-lab/multimodal-infomax	CMU-MOSI, CMU-MOSEI	refer to the original paper’s result