Huawei Atlas 800I A2 LLM Deployment in Practice (Part 1): Driver/Firmware Installation and MCU Upgrade
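This guide walks through deploying the Huawei Atlas 800I A2 inference server. It first introduces the server's hardware configuration, which is built around Kunpeng 920 CPUs and Ascend 910 AI processors. It then covers the driver, firmware, and MCU installation workflow, spelling out how the steps differ between a first installation and an overwrite installation. It also shows how to check the operating system and the NPU chip status before installing, how to obtain the required packages, and how to create the run user. Finally, it uses command-line examples to walk through installing the driver and firmware and upgrading the MCU firmware, so that the server is fully functional.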
Server Configuration
AI server: Huawei Atlas 800I A2 inference server
| Component | Specification |
|---|---|
| CPU | 4 × Kunpeng 920 5250 (48 cores each) |
| NPU | 8 × Ascend 910B4 (32 GB each) |
| Memory | 1024 GB |
| Storage | System: 2 × 450 GB SSD (RAID 1); data: 4 × 3.5 TB NVMe SSD |
| Operating system | openEuler 22.03 LTS |
Atlas 800I A2 Overview
The Atlas 800I A2 inference server is an AI inference device built on Kunpeng 920 CPUs and Ascend 910 AI processors.


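Main components:

| No. | Component | No. | Component |
|---|---|---|---|
| 1 | Copper busbar module | 2 | NPU module |
| 3 | Drive backplane | 4 | Reinforcement crossbeam |
| 5 | NPU carrier board | 6 | Parameter-plane interface card |
| 7 | Chassis | 8 | CPU mainboard air duct |
| 9 | Drives | 10 | Fan module |
| 11 | CPU heat sink | 12 | DIMM |
| 13 | Flexible I/O card (optional) | 14 | CPU mainboard |
| 15 | Power supply cage | 16 | Power supply unit |
| 17 | Riser module 2 | 18 | Riser module 1 |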
Installation Workflow

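graph LR
    A[Start] --> B[Confirm the OS]
    B --> C[Obtain the packages]
    C --> D[Create the run user]
    D --> E{Check the environment}
    E -- First installation --> F[Install driver]
    F --> H[Install firmware]
    E -- Overwrite installation --> G[Install firmware]
    G --> I[Install driver]
    H --> J[Upgrade MCU]
    I --> J
    J --> K[End]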
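- First installation: the hardware is fresh from the factory with no driver installed, or a driver and firmware were installed earlier but have since been uninstalled. Install in the order "driver -> firmware".
- Overwrite installation: a driver and firmware were installed earlier and have not been uninstalled, and you are installing them again. Install in the order "firmware -> driver".
- The devices ship with an initial MCU version. To ensure all functions work properly, upgrade the MCU to the version that matches the driver and firmware. For the MCU upgrade procedure, see the "Physical Machine Upgrade > Upgrading the MCU" chapter of the "Atlas A2 Central Inference and Training Hardware 24.1.0 NPU Driver and Firmware Upgrade Guide".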
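The ordering rule above can be expressed as a small shell check. This is a minimal sketch, assuming that "installed and not uninstalled" is detected the way this article does it in the environment check, by looking for the drv_pcie_host module in lsmod output; the function name install_order is hypothetical.

```shell
#!/bin/sh
# Sketch: choose the install order from the first-install / overwrite rule.
# Assumption: a driver that is "installed and not uninstalled" shows up as the
# drv_pcie_host module in lsmod output (the same module the article checks for).

# install_order (hypothetical helper): reads lsmod output on stdin.
install_order() {
    if grep -q '^drv_pcie_host[[:space:]]'; then
        echo "overwrite: firmware -> driver"
    else
        echo "first install: driver -> firmware"
    fi
}

# On the server: lsmod | install_order
```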
 
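Confirm the Operating System
Check which operating systems the server supports
Refer to the Ascend Computing Compatibility Query Assistant.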

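Notes:
- 87. virt-manager management is not supported by default.
- 92. When installing the OS on SAS/SATA drives behind the SoC SAS controller, choose the direct installation method. To check RAID controller and OS compatibility, add the RAID controller card model to the OS compatibility query.
- 125. kernel-5.10.0-60.70.0.94.oe2203.aarch64 or a later kernel is recommended.
- 127. Compatible only with kernel version 4.19.90-24.4.
- 128. Compatible only with kernel version 4.19.90-52.22.
- 129. Compatible only with kernel version 4.19.90-2107.6.0.0098.oe1.bclinux.aarch64.
- 134. A minimal OS installation is recommended.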
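Note 125 recommends a minimum kernel for openEuler 22.03. A minimal sketch of checking the running kernel against it, assuming GNU `sort -V` ordering is an acceptable stand-in for RPM version comparison (close, but not identical in every edge case); kernel_at_least is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: compare the running kernel with the version recommended in note 125.
# Assumption: GNU `sort -V` ordering is close enough to RPM version ordering here.
RECOMMENDED_KERNEL="5.10.0-60.70.0.94.oe2203.aarch64"

# kernel_at_least CURRENT MINIMUM (hypothetical helper): succeeds if CURRENT >= MINIMUM.
kernel_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

if kernel_at_least "$(uname -r)" "$RECOMMENDED_KERNEL"; then
    echo "kernel $(uname -r) meets the recommendation"
else
    echo "kernel $(uname -r) is older than $RECOMMENDED_KERNEL"
fi
```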
 
Check the OS architecture and version
uname -m && cat /etc/*release
aarch64
openEuler release 22.03 LTS
NAME="openEuler"
VERSION="22.03 LTS"
ID="openEuler"
VERSION_ID="22.03"
PRETTY_NAME="openEuler 22.03 LTS"
ANSI_COLOR="0;31"
openEuler release 22.03 LTS
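To turn this check into a pass/fail test, the sketch below sources /etc/os-release (the file that provides the ID and VERSION_ID lines above) in a subshell and compares the result with the aarch64 + openEuler 22.03 combination used in this article. The helper name check_platform is hypothetical, and the expected values are simply this server's, not a list of supported systems.

```shell
#!/bin/sh
# Sketch: pass/fail check for the platform used in this article
# (aarch64 + openEuler 22.03). /etc/os-release is shell-sourceable by design,
# so it is read in a subshell to avoid polluting the caller's variables.

# check_platform ARCH OS_RELEASE_FILE (hypothetical helper)
check_platform() {
    info=$( . "$2" && printf '%s %s' "$ID" "$VERSION_ID" )
    if [ "$1" = "aarch64" ] && [ "$info" = "openEuler 22.03" ]; then
        echo "OK: $info on $1"
    else
        echo "MISMATCH: ${info:-unknown} on $1"
        return 1
    fi
}

if [ -r /etc/os-release ]; then
    check_platform "$(uname -m)" /etc/os-release || true   # report only
fi
```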
Check CPU information
lscpu
Architecture:           aarch64
  CPU op-mode(s):       64-bit
  Byte Order:           Little Endian
CPU(s):                 192
  On-line CPU(s) list:  0-191
Vendor ID:              HiSilicon
  BIOS Vendor ID:       HiSilicon
  Model name:           Kunpeng-920
    BIOS Model name:    HUAWEI Kunpeng 920 5250
    Model:              0
    Thread(s) per core: 1
    Core(s) per socket: 48
    Socket(s):          4
    Stepping:           0x1
    Frequency boost:    disabled
    CPU max MHz:        2600.0000
    CPU min MHz:        200.0000
    BogoMIPS:           200.00
    Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm ssbs
Caches (sum of all):    
  L1d:                  12 MiB (192 instances)
  L1i:                  12 MiB (192 instances)
  L2:                   96 MiB (192 instances)
  L3:                   192 MiB (8 instances)
NUMA:                   
  NUMA node(s):         8
  NUMA node0 CPU(s):    0-23
  NUMA node1 CPU(s):    24-47
  NUMA node2 CPU(s):    48-71
  NUMA node3 CPU(s):    72-95
  NUMA node4 CPU(s):    96-119
  NUMA node5 CPU(s):    120-143
  NUMA node6 CPU(s):    144-167
  NUMA node7 CPU(s):    168-191
Vulnerabilities:        
  Itlb multihit:        Not affected
  L1tf:                 Not affected
  Mds:                  Not affected
  Meltdown:             Not affected
  Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:           Mitigation; __user pointer sanitization
  Spectre v2:           Not affected
  Srbds:                Not affected
  Tsx async abort:      Not affected
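The lscpu figures are self-consistent: 4 sockets × 48 cores per socket × 1 thread per core = 192 logical CPUs, which also equals 8 NUMA nodes × 24 CPUs. A hypothetical cpu_topology helper can redo that arithmetic from lscpu output; it is a sketch that assumes the English field names shown above.

```shell
#!/bin/sh
# Sketch: recompute Socket(s) x Core(s) per socket x Thread(s) per core and
# compare it with the CPU(s) line. Assumes English lscpu field names.

# cpu_topology (hypothetical helper): reads lscpu output on stdin,
# exits 0 if the product matches the reported CPU count.
cpu_topology() {
    awk -F: '
        {
            key = $1; val = $2
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
        }
        key == "CPU(s)"             { total = val + 0 }
        key == "Socket(s)"          { sockets = val + 0 }
        key == "Core(s) per socket" { cores = val + 0 }
        key == "Thread(s) per core" { threads = val + 0 }
        END {
            calc = sockets * cores * threads
            printf "%d x %d x %d = %d (lscpu reports %d)\n", sockets, cores, threads, calc, total
            exit (calc == total ? 0 : 1)
        }'
}

# On the server: lscpu | cpu_topology
```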
Obtain the Software Packages
ll
total 123M
-rw-r--r--. 1 root root 6.8M Jul 17 10:29 Ascend-hdk-910b-mcu_25.50.10.zip
-rw-r--r--. 1 root root 116M Jul 17 10:28 Ascend-hdk-910b-npu-driver_25.0.rc1.1_linux-aarch64.run
-rw-r--r--. 1 root root 277K Jul 17 10:28 Ascend-hdk-910b-npu-firmware_7.7.0.1.231.run
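The version strings embedded in these file names are the ones npu-smi reports later (25.0.rc1.1 for the driver, 25.50.10 for the MCU). A minimal sketch for extracting them, assuming the names keep the Ascend-hdk-<chip>-<component>_<version>[_<platform>].<ext> pattern seen here; pkg_version is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: extract the version from an Ascend package file name.
# Assumption: names follow Ascend-hdk-<chip>-<component>_<version>[_<platform>].<ext>

# pkg_version FILE (hypothetical helper)
pkg_version() {
    name=$(basename "$1")
    name=${name#*_}       # drop "Ascend-hdk-910b-npu-driver_"
    name=${name%.*}       # drop the extension (.run / .zip / .hpm)
    echo "${name%%_*}"    # drop a trailing "_linux-aarch64" if present
}

for f in Ascend-hdk-910b-*; do
    [ -e "$f" ] && echo "$f -> $(pkg_version "$f")"
done
```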
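Create the Run User
The installation user is the account used to install the driver and firmware. The run user is the account that runs the driver and firmware for inference or training workloads after installation is complete.
- If the user and user group you create are HwHiAiUser, you do not need to specify a run user when installing the packages; HwHiAiUser is the default.
- If the user and user group are anything other than HwHiAiUser (including root), you must specify the run user when installing the packages, with --install-username=username --install-usergroup=usergroup. Unless you need a particular name for the run user, HwHiAiUser is therefore recommended.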
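The rule above, plus the root-specific --install-for-all flag this article uses when installing the driver as root, can be captured in a helper that prints the extra installer options. A sketch only: install_user_args is a hypothetical name, and the flags are exactly those quoted in this article.

```shell
#!/bin/sh
# Sketch: print the extra installer options for a given run user/group.
# HwHiAiUser:HwHiAiUser -> nothing (defaults apply); anything else -> explicit
# user/group; root additionally gets --install-for-all (as used in this article).

# install_user_args USER GROUP (hypothetical helper)
install_user_args() {
    user=$1 group=$2
    if [ "$user" = "HwHiAiUser" ] && [ "$group" = "HwHiAiUser" ]; then
        return 0
    fi
    printf '%s' "--install-username=$user --install-usergroup=$group"
    [ "$user" = "root" ] && printf ' %s' "--install-for-all"
    printf '\n'
}

# Example (not run here):
# ./Ascend-hdk-910b-npu-driver_25.0.rc1.1_linux-aarch64.run --full $(install_user_args root root)
```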
 
Create the run user as follows:
- Log in to the server as root.
- Run the following commands to create the run user:
 
groupadd usergroup
useradd -g usergroup -d /home/username -m username -s /bin/bash
Example:
groupadd HwHiAiUser
useradd -g HwHiAiUser -d /home/HwHiAiUser -m HwHiAiUser -s /bin/bash
In this walkthrough I simply use the root user.
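If you do create a dedicated run user, re-running groupadd/useradd on a machine where they already exist fails. A minimal, idempotent sketch that only prints the commands still needed and changes nothing by itself; plan_run_user is a hypothetical helper, and the /etc/group fallback is an assumption for systems without getent.

```shell
#!/bin/sh
# Sketch: print only the groupadd/useradd commands that are still needed, so the
# step can be re-run safely. Nothing is changed; review the output, then pipe
# it to sh as root. Falls back to /etc/group if getent is unavailable.

# plan_run_user USER GROUP (hypothetical helper)
plan_run_user() {
    user=$1 group=$2
    getent group "$group" >/dev/null 2>&1 ||
        grep -q "^${group}:" /etc/group 2>/dev/null ||
        echo "groupadd $group"
    id -u "$user" >/dev/null 2>&1 ||
        echo "useradd -g $group -d /home/$user -m $user -s /bin/bash"
}

# Example: plan_run_user HwHiAiUser HwHiAiUser | sh
```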
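Check the Environment
Check whether the packages are already installed
Run lsmod | grep drv_pcie_host to check whether the packages have been installed on the system.
- If there is no output, no packages are installed and you can install them directly.
- If there is output, packages are already installed: uninstall the driver package first, then install the new version.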
 
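Check that the NPU chips are present
Run lspci | grep d802. If the server has N (N > 0) NPU chips and the output contains exactly N lines with "d802", all NPU chips are present. This server has 8: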
01:00.0 Processing accelerators: Huawei Technologies Co., Ltd. Device d802 (rev 20)
02:00.0 Processing accelerators: Huawei Technologies Co., Ltd. Device d802 (rev 20)
41:00.0 Processing accelerators: Huawei Technologies Co., Ltd. Device d802 (rev 20)
42:00.0 Processing accelerators: Huawei Technologies Co., Ltd. Device d802 (rev 20)
81:00.0 Processing accelerators: Huawei Technologies Co., Ltd. Device d802 (rev 20)
82:00.0 Processing accelerators: Huawei Technologies Co., Ltd. Device d802 (rev 20)
c1:00.0 Processing accelerators: Huawei Technologies Co., Ltd. Device d802 (rev 20)
c2:00.0 Processing accelerators: Huawei Technologies Co., Ltd. Device d802 (rev 20)
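Counting the lines by eye works for eight devices; the sketch below automates the N-lines rule. check_npu_presence is a hypothetical helper, and its default of 8 is simply this server's NPU count.

```shell
#!/bin/sh
# Sketch: compare the number of lspci lines containing "d802" with the
# expected NPU count (default 8, this server's count).

# check_npu_presence [EXPECTED] (hypothetical helper): reads lspci output on stdin.
check_npu_presence() {
    expected=${1:-8}
    found=$(grep -c 'd802')
    echo "found $found NPU(s) with device ID d802, expected $expected"
    [ "$found" -eq "$expected" ]
}

# On the server: lspci | check_npu_presence 8
```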
Install the Driver
Make the package executable
chmod +x Ascend-hdk-910b-npu-driver_25.0.rc1.1_linux-aarch64.run
Verify the package's consistency and integrity
./Ascend-hdk-910b-npu-driver_25.0.rc1.1_linux-aarch64.run --check
Makeself logfile: /root/log/makeself/makeself.log
Verifying archive integrity...  100%   SHA256 checksums are OK. All good.
Verifying archive integrity...  100%   SHA256 checksums are OK. All good.
Uncompressing ASCEND DRIVER RUN PACKAGE  100%  
[Driver] [2025-07-17 10:31:56] [INFO]Start time: 2025-07-17 10:31:56
[Driver] [2025-07-17 10:31:56] [INFO]LogFile: /var/log/ascend_seclog/ascend_install.log
[Driver] [2025-07-17 10:31:56] [INFO]OperationLogFile: /var/log/ascend_seclog/operation.log
[Driver] [2025-07-17 10:31:56] [INFO]End time: 2025-07-17 10:31:56
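The chmod and --check steps can be wrapped so that a failed check stops the installation. This sketch assumes the .run package exits with a non-zero status when the integrity check fails; the article only shows the successful case, so confirm this against the installer's documentation. verify_pkg is a hypothetical name.

```shell
#!/bin/sh
# Sketch: make a .run package executable and run its --check, stopping on failure.
# Assumption: the installer exits non-zero when the integrity check fails
# (only the successful "SHA256 checksums are OK" case is shown in this article).

# verify_pkg PACKAGE (hypothetical helper)
verify_pkg() {
    pkg=$1
    [ -f "$pkg" ] || { echo "missing: $pkg" >&2; return 1; }
    chmod +x "$pkg" || return 1
    case $pkg in
        */*) run=$pkg ;;      # already has a path component
        *)   run=./$pkg ;;    # run from the current directory
    esac
    if "$run" --check; then
        echo "integrity OK: $pkg"
    else
        echo "integrity check FAILED: $pkg" >&2
        return 1
    fi
}

# verify_pkg Ascend-hdk-910b-npu-driver_25.0.rc1.1_linux-aarch64.run || exit 1
```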
Run the installation
The packages install to /usr/local/Ascend by default.
./Ascend-hdk-910b-npu-driver_25.0.rc1.1_linux-aarch64.run --full --install-username=root --install-usergroup=root --install-for-all
Verifying archive integrity...  100%   SHA256 checksums are OK. All good.
Uncompressing ASCEND DRIVER RUN PACKAGE  100%  
[Driver] [2025-07-17 10:35:21] [INFO]Start time: 2025-07-17 10:35:21
[Driver] [2025-07-17 10:35:21] [INFO]LogFile: /var/log/ascend_seclog/ascend_install.log
[Driver] [2025-07-17 10:35:21] [INFO]OperationLogFile: /var/log/ascend_seclog/operation.log
[Driver] [2025-07-17 10:35:21] [WARNING]Do not power off or restart the system during the installation/upgrade
[Driver] [2025-07-17 10:35:21] [INFO]set username and usergroup, root:root
[Driver] [2025-07-17 10:35:22] [INFO]driver install type: DKMS
[Driver] [2025-07-17 10:35:22] [INFO]upgradePercentage:10%
[Driver] [2025-07-17 10:35:30] [INFO]upgradePercentage:30%
[Driver] [2025-07-17 10:35:30] [INFO]upgradePercentage:40%
[Driver] [2025-07-17 10:35:44] [INFO]upgradePercentage:90%
[Driver] [2025-07-17 10:35:45] [INFO]Waiting for device startup...
[Driver] [2025-07-17 10:35:49] [INFO]Device startup success
[Driver] [2025-07-17 10:36:06] [INFO]upgradePercentage:100%
[Driver] [2025-07-17 10:36:24] [INFO]Driver package installed successfully! The new version takes effect immediately. 
[Driver] [2025-07-17 10:36:24] [INFO]End time: 2025-07-17 10:36:24
Note: when installing the driver as root, pass --install-username=root --install-usergroup=root --install-for-all.
Verify that the driver loaded successfully
npu-smi info
+------------------------------------------------------------------------------------------------+
| npu-smi 25.0.rc1.1               Version: 25.0.rc1.1                                           |
+---------------------------+---------------+----------------------------------------------------+
| NPU   Name                | Health        | Power(W)    Temp(C)           Hugepages-Usage(page)|
| Chip                      | Bus-Id        | AICore(%)   Memory-Usage(MB)  HBM-Usage(MB)        |
+===========================+===============+====================================================+
| 0     910B4               | OK            | 87.2        33                0    / 0             |
| 0                         | 0000:C1:00.0  | 0           0    / 0          2654 / 32768         |
+===========================+===============+====================================================+
| 1     910B4               | OK            | 82.7        33                0    / 0             |
| 0                         | 0000:C2:00.0  | 0           0    / 0          2654 / 32768         |
+===========================+===============+====================================================+
| 2     910B4               | OK            | 82.7        33                0    / 0             |
| 0                         | 0000:81:00.0  | 0           0    / 0          2654 / 32768         |
+===========================+===============+====================================================+
| 3     910B4               | OK            | 89.1        34                0    / 0             |
| 0                         | 0000:82:00.0  | 0           0    / 0          2654 / 32768         |
+===========================+===============+====================================================+
| 4     910B4               | OK            | 84.5        39                0    / 0             |
| 0                         | 0000:01:00.0  | 0           0    / 0          2654 / 32768         |
+===========================+===============+====================================================+
| 5     910B4               | OK            | 84.8        37                0    / 0             |
| 0                         | 0000:02:00.0  | 0           0    / 0          2654 / 32768         |
+===========================+===============+====================================================+
| 6     910B4               | OK            | 87.1        38                0    / 0             |
| 0                         | 0000:41:00.0  | 0           0    / 0          2654 / 32768         |
+===========================+===============+====================================================+
| 7     910B4               | OK            | 81.6        37                0    / 0             |
| 0                         | 0000:42:00.0  | 0           0    / 0          2653 / 32768         |
+===========================+===============+====================================================+
+---------------------------+---------------+----------------------------------------------------+
| NPU     Chip              | Process id    | Process name             | Process memory(MB)      |
+===========================+===============+====================================================+
| No running processes found in NPU 0                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 1                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 2                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 3                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 4                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 5                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 6                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 7                                                            |
+===========================+===============+====================================================+
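With eight NPUs it is easy to miss a single unhealthy row. A sketch that summarises the Health column, assuming the table layout printed by npu-smi 25.0.rc1.1 above (other versions may lay it out differently); npu_health_summary is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: summarise the Health column of `npu-smi info`.
# Assumption: the table layout shown in this article, where each NPU's first row
# looks like "| 0     910B4   | OK   | ...".

# npu_health_summary (hypothetical helper): reads npu-smi info output on stdin,
# exits 0 only if at least one NPU was found and all are OK.
npu_health_summary() {
    awk -F'|' '
        $2 ~ /^ *[0-9]+ +910/ {
            split($2, cols, " ")          # cols[1] = NPU ID, cols[2] = chip name
            health = $3
            gsub(/[ \t]/, "", health)
            total++
            if (health == "OK") ok++
            else bad = bad " " cols[1] "(" health ")"
        }
        END {
            printf "%d/%d NPUs healthy%s\n", ok, total, ((bad == "") ? "" : (", not OK:" bad))
            exit ((total > 0 && ok == total) ? 0 : 1)
        }'
}

# On the server: npu-smi info | npu_health_summary
```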
Install the Firmware
Make the package executable
chmod +x Ascend-hdk-910b-npu-firmware_7.7.0.1.231.run 
Verify the package's consistency and integrity
./Ascend-hdk-910b-npu-firmware_7.7.0.1.231.run --check
Makeself logfile: /root/log/makeself/makeself.log
Verifying archive integrity...  100%   SHA256 checksums are OK. All good.
Verifying archive integrity...  100%   SHA256 checksums are OK. All good.
Uncompressing ASCEND-HDK-910B-NPU FIRMWARE RUN PACKAGE  100%  
[Firmware] [2025-07-17 10:42:55] [INFO]Start time: 2025-07-17 10:42:55
[Firmware] [2025-07-17 10:42:55] [INFO]LogFile: /var/log/ascend_seclog/ascend_install.log
[Firmware] [2025-07-17 10:42:55] [INFO]OperationLogFile: /var/log/ascend_seclog/operation.log
[Firmware] [2025-07-17 10:42:56] [INFO]End time: 2025-07-17 10:42:56
Run the installation
./Ascend-hdk-910b-npu-firmware_7.7.0.1.231.run --full
Verifying archive integrity...  100%   SHA256 checksums are OK. All good.
Uncompressing ASCEND-HDK-910B-NPU FIRMWARE RUN PACKAGE  100%  
[Firmware] [2025-07-17 10:43:18] [INFO]Start time: 2025-07-17 10:43:18
[Firmware] [2025-07-17 10:43:18] [INFO]LogFile: /var/log/ascend_seclog/ascend_install.log
[Firmware] [2025-07-17 10:43:18] [INFO]OperationLogFile: /var/log/ascend_seclog/operation.log
[Firmware] [2025-07-17 10:43:18] [WARNING]Do not power off or restart the system during the installation/upgrade
[Firmware] [2025-07-17 10:43:20] [INFO]upgradePercentage: 0%
[Firmware] [2025-07-17 10:43:32] [INFO]upgradePercentage: 90%
[Firmware] [2025-07-17 10:43:42] [INFO]upgradePercentage: 90%
[Firmware] [2025-07-17 10:43:52] [INFO]upgradePercentage: 90%
[Firmware] [2025-07-17 10:43:53] [INFO]upgradePercentage: 100%
[Firmware] [2025-07-17 10:43:53] [INFO]The firmware of [8] chips are successfully upgraded.
[Firmware] [2025-07-17 10:43:54] [INFO]Firmware package installed successfully! Reboot now or after driver installation for the installation/upgrade to take effect.
[Firmware] [2025-07-17 10:43:54] [INFO]End time: 2025-07-17 10:43:54
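Upgrade the MCU
The MCU is the out-of-band management module, providing board monitoring, fault reporting, and related functions. The Atlas 900 A2 PoD cluster basic unit, the Atlas 800T A2 training server, and the Atlas 800I A2 inference server ship with an initial MCU version; to ensure all functions work properly, upgrade the MCU to the latest version.
- Upgrade the MCU with the npu-smi tool. npu-smi upgrades the MCU of one NPU at a time, so on a server with multiple NPUs each one must be upgraded in turn.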
 
Prepare the package
Unzip the archive to get the package Ascend-hdk-xxx-mcu_Y.hpm; here it is Ascend-hdk-910b-mcu_25.50.10.hpm.
unzip Ascend-hdk-910b-mcu_25.50.10.zip 
Archive:  Ascend-hdk-910b-mcu_25.50.10.zip
  inflating: Ascend-hdk-910b-mcu_25.50.10.hpm  
  inflating: Ascend-hdk-910b-mcu_25.50.10.hpm.cms  
  inflating: crldata.crl             
  inflating: version.xml             
  inflating: version.xml.cms         
Show the mapping of all devices
npu-smi info -m
        NPU ID                         Chip ID                        Chip Logic ID                  Chip Name                     
        0                              0                              0                              Ascend 910B4
        0                              1                              -                              Mcu                           
        1                              0                              1                              Ascend 910B4
        1                              1                              -                              Mcu                           
        2                              0                              2                              Ascend 910B4
        2                              1                              -                              Mcu                           
        3                              0                              3                              Ascend 910B4
        3                              1                              -                              Mcu                           
        4                              0                              4                              Ascend 910B4
        4                              1                              -                              Mcu                           
        5                              0                              5                              Ascend 910B4
        5                              1                              -                              Mcu                           
        6                              0                              6                              Ascend 910B4
        6                              1                              -                              Mcu                           
        7                              0                              7                              Ascend 910B4
        7                              1                              -                              Mcu                           
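Rather than hard-coding NPU IDs 0-7, the IDs that carry an Mcu chip can be read from this mapping. A sketch assuming the column layout above, with Chip Name as the last column; the comma-separated output matches the ID-list format accepted by the upgrade_mcu.sh script later in this article. mcu_npu_ids is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: list the NPU IDs whose mapping includes an "Mcu" chip, comma-separated.
# Assumption: `npu-smi info -m` layout as shown, Chip Name in the last column.

# mcu_npu_ids (hypothetical helper): reads npu-smi info -m output on stdin.
mcu_npu_ids() {
    awk '$NF == "Mcu" { printf "%s%s", sep, $1; sep = "," } END { print "" }'
}

# On this server: npu-smi info -m | mcu_npu_ids   (expected: 0,1,2,3,4,5,6,7)
```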
Show the topology of all devices
npu-smi info -l
        Total Count                    : 8
        NPU ID                         : 0
        Chip Count                     : 1
        NPU ID                         : 1
        Chip Count                     : 1
        NPU ID                         : 2
        Chip Count                     : 1
        NPU ID                         : 3
        Chip Count                     : 1
        NPU ID                         : 4
        Chip Count                     : 1
        NPU ID                         : 5
        Chip Count                     : 1
        NPU ID                         : 6
        Chip Count                     : 1
        NPU ID                         : 7
        Chip Count                     : 1
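The Total Count line gives a quick cross-check against the eight NPUs found by lspci. A minimal sketch, assuming the "key : value" layout shown above; npu_total_count is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: print the "Total Count" value from `npu-smi info -l` output.

# npu_total_count (hypothetical helper): reads npu-smi info -l output on stdin.
npu_total_count() {
    awk -F':' '/Total Count/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# On the server:
# n=$(npu-smi info -l | npu_total_count); [ "$n" -eq 8 ] || echo "expected 8 NPUs, got $n"
```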
Query the MCU version
npu-smi upgrade -b mcu -i 0
        Version                        : 23.3.13
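Before upgrading, it can be useful to skip NPUs whose MCU is already at or above the target. A sketch under two assumptions: the query prints a "Version : X" line as above, and `sort -V` orders MCU versions correctly (it does for 23.3.13 versus 25.50.10). mcu_needs_upgrade is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: succeed (exit 0) only if the MCU version read from
# `npu-smi upgrade -b mcu -i <id>` output is older than TARGET.
# Assumptions: a "Version : X" line as shown, and `sort -V` ordering.

# mcu_needs_upgrade [TARGET] (hypothetical helper): reads the query output on stdin.
mcu_needs_upgrade() {
    target=${1:-25.50.10}
    current=$(awk -F':' '/Version/ { gsub(/[ \t]/, "", $2); print $2; exit }')
    [ -n "$current" ] || { echo "no Version line found" >&2; return 2; }
    [ "$current" != "$target" ] &&
        [ "$(printf '%s\n%s\n' "$current" "$target" | sort -V | head -n 1)" = "$current" ]
}

# npu-smi upgrade -b mcu -i 0 | mcu_needs_upgrade 25.50.10 && echo "NPU 0 needs an upgrade"
```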
Upgrade the MCU of a given NPU
npu-smi upgrade -t mcu -i 0 -f Ascend-hdk-910b-mcu_25.50.10.hpm
[WARNING]: Do not power off or restart the system during the upgrade.
        Validity                       : success
file_len(554991)--offset(554991) [100].
        transfile                      : successfully
        Status                         : start to upgrade
Start upgrade [100].
        Status                         : OK
        Message                        : Start device upgrade successfully
        Message                        : need active mcu
Activate the new version
npu-smi upgrade -a mcu -i 0
        Status                         : OK
        Message                        : The upgrade has taken effect after performed reboot successfully.
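The transfer step above ends with "need active mcu", which is the cue to run the activation command. A sketch that activates only when asked, assuming npu-smi exits non-zero when the transfer fails. NPU_SMI is a hypothetical override variable (default npu-smi) so the logic can be exercised with a stub; upgrade_and_activate is likewise a hypothetical name.

```shell
#!/bin/sh
# Sketch: transfer the MCU image, then activate only when npu-smi reports
# "need active mcu".
# Assumptions: npu-smi exits non-zero if the transfer fails; NPU_SMI is a
# hypothetical override hook so the logic can be tried with a stub.
NPU_SMI=${NPU_SMI:-npu-smi}

# upgrade_and_activate NPU_ID FW_FILE (hypothetical helper)
upgrade_and_activate() {
    npu=$1 fw=$2
    out=$($NPU_SMI upgrade -t mcu -i "$npu" -f "$fw") || { printf '%s\n' "$out"; return 1; }
    printf '%s\n' "$out"
    if printf '%s\n' "$out" | grep -q 'need active mcu'; then
        $NPU_SMI upgrade -a mcu -i "$npu"      # make the new version take effect
    fi
}

# upgrade_and_activate 0 Ascend-hdk-910b-mcu_25.50.10.hpm
```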
Query the MCU version again
npu-smi upgrade -b mcu -i 0
        Version                        : 25.50.10
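A Custom MCU Upgrade Script
The script supports three invocation modes:
- A single NPU_ID:
./upgrade_mcu.sh 3
./upgrade_mcu.sh 5 Ascend-hdk-910b-mcu_25.50.10.hpm
- Multiple NPU_IDs:
./upgrade_mcu.sh 2,4,6
./upgrade_mcu.sh 1,3 Ascend-hdk-910b-mcu_25.50.10.hpm
- All NPUs (0-7) in one go:
./upgrade_mcu.sh all
./upgrade_mcu.sh all Ascend-hdk-910b-mcu_25.50.10.hpm
Save the script as upgrade_mcu.sh and make it executable with chmod +x upgrade_mcu.sh. If no firmware file is given, it defaults to Ascend-hdk-910b-mcu_25.50.10.hpm in the current directory.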
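#!/bin/bash
# upgrade_mcu.sh <NPU_ID|ID1,ID2,...|all> [Ascend-hdk-xxx-mcu_Y.hpm]
# Upgrades (transfer + activate) the MCU on the selected NPUs with npu-smi.
usage() {
    echo "Usage:"
    echo "  Single:  $0 <0-7> [fw.hpm]"
    echo "  Multi:   $0 <id1,id2,...> [fw.hpm]"
    echo "  All:     $0 all [fw.hpm]"
    exit 1
}
#---------- Argument checks ----------
[[ $# -eq 0 || $# -gt 2 ]] && usage
# Parse the NPU_ID list
case "$1" in
    all|ALL)
        NPU_LIST=(0 1 2 3 4 5 6 7)
        ;;
    *)
        # Split on commas and validate every ID (0-7)
        IFS=',' read -ra NPU_LIST <<< "$1"
        [[ ${#NPU_LIST[@]} -eq 0 ]] && usage
        for id in "${NPU_LIST[@]}"; do
            [[ "$id" =~ ^[0-7]$ ]] || usage
        done
        ;;
esac
# Firmware file (defaults to the package used in this article)
FW_FILE=${2:-"Ascend-hdk-910b-mcu_25.50.10.hpm"}
[[ -f "$FW_FILE" ]] || { echo "Firmware file not found: $FW_FILE" >&2; exit 1; }
command -v npu-smi >/dev/null 2>&1 || { echo "npu-smi not found in PATH" >&2; exit 1; }
#---------- Upgrade loop ----------
# Assumes npu-smi exits non-zero on failure; failed NPUs are collected and reported.
FAILED=()
for NPU_ID in "${NPU_LIST[@]}"; do
    echo "=== MCU upgrade for NPU_ID $NPU_ID ==="
    echo "Firmware: $FW_FILE"
    npu-smi upgrade -b mcu -i "$NPU_ID"                        # version before
    if npu-smi upgrade -t mcu -i "$NPU_ID" -f "$FW_FILE" &&    # transfer and upgrade
       npu-smi upgrade -a mcu -i "$NPU_ID"; then               # activate the new version
        npu-smi upgrade -b mcu -i "$NPU_ID"                    # version after
        echo "NPU_ID $NPU_ID done."
    else
        echo "NPU_ID $NPU_ID FAILED." >&2
        FAILED+=("$NPU_ID")
    fi
    echo
done
if [[ ${#FAILED[@]} -gt 0 ]]; then
    echo "Upgrade failed on NPU_ID(s): ${FAILED[*]}" >&2
    exit 1
fi
echo "All requested NPUs upgraded."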
Usage examples
# Upgrade NPU 0
./upgrade_mcu.sh 0
# Upgrade three NPUs: 2, 5, and 7
./upgrade_mcu.sh 2,5,7 Ascend-hdk-910b-mcu_25.50.10.hpm
# Upgrade all NPUs in one go
./upgrade_mcu.sh all Ascend-hdk-910b-mcu_25.50.10.hpm