MindSpore

MindSpore is an open-source AI framework for mobile, edge, and cloud scenarios, and it is the framework most compatible with the Atlas 800 server.

We recommend using MindSpore to implement one of the following deep learning models and training it on the Atlas 800 server with NPUs until it reaches the target baseline accuracy.

| Model | Paper | Code | Dataset | Accuracy Baseline |
| --- | --- | --- | --- | --- |
| CoAtNet | CoAtNet: Marrying Convolution and Attention for All Data Sizes | https://paperswithcode.com/paper/coatnet-marrying-convolution-and-attention | ImageNet | Top-1 Acc: 85% |
| Swin Transformer V2 | Swin Transformer V2: Scaling Up Capacity and Resolution | https://paperswithcode.com/paper/swin-transformer-v2-scaling-up-capacity-and | ImageNet | Top-1 Acc: 85% |
| Focal Transformer | https://arxiv.org/pdf/2107.00641.pdf |  | ImageNet-1k | Top-1 Acc: 83.6% |
| Conformer | https://arxiv.org/pdf/2105.03889.pdf |  | ImageNet-1k | Top-1 Acc: 81.31% |
| Twins | https://arxiv.org/pdf/2104.13840.pdf |  | ImageNet-1k | Top-1 Acc: 81.2% |
| VAN | https://arxiv.org/pdf/2202.09741.pdf |  | ImageNet-1k | Top-1 Acc: 82.8% |
| ConvMixer | https://openreview.net/forum?id=TVHS5Y4dNvM |  | ImageNet-1k | Top-1 Acc: 81.37% |
| BEiT | https://arxiv.org/abs/2106.08254 | https://github.com/microsoft/unilm/tree/master/beit | ImageNet-1k | Top-1 Acc: 85.2% |
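
To make the accuracy targets concrete, here is a minimal, framework-independent sketch of how a top-1 accuracy figure can be computed and compared against a baseline from the table. The helper names `top1_accuracy` and `meets_baseline` and the toy scores are hypothetical illustrations, not part of MindSpore or of any evaluation script referenced here. A real run would evaluate on the full ImageNet validation set.

```python
from typing import Sequence


def top1_accuracy(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Return top-1 accuracy in percent for per-sample class scores."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    correct = 0
    for scores, label in zip(logits, labels):
        # The predicted class is the index of the highest score.
        predicted = max(range(len(scores)), key=scores.__getitem__)
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


def meets_baseline(accuracy: float, baseline: float) -> bool:
    """True when the measured accuracy reaches the target baseline."""
    return accuracy >= baseline


# Toy data: four samples, three classes (hypothetical values).
logits = [
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.2, 0.3, 0.5],
    [0.6, 0.3, 0.1],
]
labels = [1, 0, 2, 1]

acc = top1_accuracy(logits, labels)
print(f"Top-1 Acc: {acc:.2f}%")
print("Baseline reached:", meets_baseline(acc, 85.0))
```

In this toy batch three of the four predictions match their labels, so the 85% target used for the comparison is not met.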

PyTorch

PyTorch is a popular AI framework. Stock PyTorch supports only CPUs and GPUs. The PyTorch build on the Atlas 800 server is a modified version that supports NPUs, with minor changes to some APIs (refer to Ascend PyTorch).
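
As a rough illustration of what such an API change can look like, the sketch below picks a device string and falls back to GPU or CPU when no NPU is usable. It assumes the Ascend adapter is installed as the `torch_npu` plugin package, which registers `torch.npu`. The build on your server may expose the NPU differently, so treat the Ascend PyTorch documentation as authoritative. The helper name `pick_device` is hypothetical.

```python
def pick_device() -> str:
    """Return 'npu:0' if an Ascend NPU is usable, else 'cuda:0' or 'cpu'."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed, so there is nothing to select

    try:
        # Assumption: the Ascend adapter ships as the torch_npu plugin,
        # and importing it makes torch.npu available.
        import torch_npu  # noqa: F401
        if torch.npu.is_available():
            return "npu:0"
    except Exception:
        pass  # plugin missing or NPU runtime not usable on this machine

    if torch.cuda.is_available():
        return "cuda:0"
    return "cpu"


device = pick_device()
print("Selected device:", device)
```

A migrated training script would then typically move the model and each batch with `.to(device)` instead of hard-coding `"cuda"`.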

We recommend migrating one of the following deep learning models to Ascend PyTorch and training it on the Atlas 800 server with NPUs until it reaches the target baseline accuracy.

| Model | Paper | Code | Dataset | Accuracy Baseline |
| --- | --- | --- | --- | --- |
| AWSRN | Lightweight Image Super-Resolution with Adaptive Weighted Learning Network | https://github.com/ChaofWang/AWSRN | DIV2K, Set5, Set14, B100, Urban100, Manga109 | refer to the original paper’s result |
| GFN | Gated Fusion Network for Joint Image Deblurring and Super-Resolution | https://github.com/jacquelinelala/GFN | GOPRO_Large | refer to the original paper’s result |
| ESPCN | Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network | https://github.com/Lornatang/ESPCN-PyTorch<br>https://github.com/leftthomas/espcn | DIV2K, DIV8K, Flickr2K, OST, T91, Set5, Set14, BSDS100 and BSDS200 | refer to the original paper’s result |
| GridDehazeNet | GridDehazeNet: Attention-Based Multi-Scale Network for Image Dehazing | https://github.com/proteus1991/GridDehazeNet | RESIDE | refer to the original paper’s result |
| SRGAN | Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network | https://github.com/Lornatang/SRGAN-PyTorch | DIV2K, DIV8K, Flickr2K, OST, T91, Set5, Set14, BSDS100 and BSDS200 | refer to the original paper’s result |
| multilogue-net | Multilogue-Net: A Context Aware RNN for Multi-modal Emotion Detection and Sentiment Analysis in Conversation | https://github.com/amanshenoy/multilogue-net | CMU-MOSEI | refer to the original paper’s result |
| infomax | Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis | https://github.com/declare-lab/multimodal-infomax | CMU-MOSI and CMU-MOSEI | CMU-MOSI and CMU-MOSEI |