Overview

This article walks through upgrading NVIDIA drivers on a system that already has a previous driver installation. References are provided along the way where they add context.

System Parameters/Versions:

  • Base OS:
    • Amazon Linux 2023: ami-05576a079321f21f8
    • Rocky 8.10

Upgrade NVIDIA Drivers

Step 1: Update base OS

The update commands differ slightly for each base OS.
  • Amazon Linux 2023:
# update OS
dnf check-release-update
sudo dnf update -y
  • Rocky 8:
# update OS
sudo dnf update -y
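
When the same procedure is scripted across both base OSes, the branch in Step 1 can be chosen from the ID field of /etc/os-release ("amzn" on Amazon Linux 2023, "rocky" on Rocky). The following is a minimal sketch under that assumption; detect_os_id and update_commands are hypothetical helper names, and the helper only prints the commands rather than running them.

```shell
# Sketch only: pick the Step 1 update commands from an os-release file.
# detect_os_id and update_commands are hypothetical helpers, not distro tooling.

# Print the ID field of an os-release file (default: /etc/os-release).
# The function body runs in a subshell so sourcing the file does not
# leak variables into the caller.
detect_os_id() (
  release_file="${1:-/etc/os-release}"
  . "$release_file" && printf '%s\n' "$ID"
)

# Print (do not run) the update commands for a given OS ID.
update_commands() {
  case "$1" in
    amzn)  printf '%s\n' 'dnf check-release-update' 'sudo dnf update -y' ;;
    rocky) printf '%s\n' 'sudo dnf update -y' ;;
    *)     printf 'unsupported OS ID: %s\n' "$1" >&2; return 1 ;;
  esac
}

# Example use on a real host:
#   update_commands "$(detect_os_id)"
```

Printing the commands instead of executing them keeps the sketch safe to try on any machine; an unrecognized ID returns a non-zero status so a calling script can stop early.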

Step 2: Update Driver

Each base OS has different package requirements.
  • Amazon Linux 2023:
# install DKMS, kernel headers, and extra kernel modules
sudo dnf install -y dkms kernel-devel kernel-modules-extra

# enable dkms for the drivers
sudo systemctl enable --now dkms

# add nvidia repo
sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/amzn2023/x86_64/cuda-amzn2023.repo

# clean cache
sudo dnf clean expire-cache

# optional, list available driver packages
dnf module list nvidia-driver

# because a previous install exists, reset the module and enable the new driver stream
sudo dnf module reset -y nvidia-driver
sudo dnf module enable -y nvidia-driver:570-dkms

# install driver
sudo dnf module install -y nvidia-driver:570-dkms
  • Rocky 8:
# extra required packages
sudo dnf -y install epel-release

# get rhel/rocky OS current version
export cur_ver="rhel$(rpm -E %rhel)"

# add nvidia repo
sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/$cur_ver/x86_64/cuda-$cur_ver.repo

# optional, list available driver packages
dnf module list nvidia-driver

# because a previous install exists, reset the module and enable the new driver stream
sudo dnf module reset -y nvidia-driver
sudo dnf module enable -y nvidia-driver:570-dkms

# install nvidia driver
sudo dnf -y module install nvidia-driver:570-dkms
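
Both OS branches in Step 2 add a repo file whose URL follows the same pattern and differs only in the distro tag (amzn2023, or rhel plus the major version as produced by rpm -E %rhel). The sketch below shows how that tag and URL could be derived; distro_tag and cuda_repo_url are hypothetical helpers, the URL layout is copied from the commands above, and using it for any other distro or architecture is an assumption.

```shell
# Sketch only: derive the NVIDIA repo URL used in Step 2.
# distro_tag and cuda_repo_url are hypothetical helpers.

# Map an os-release ID and VERSION_ID to the tag used in the repo path.
distro_tag() {
  case "$1" in
    amzn)       printf 'amzn%s\n' "$2" ;;
    rocky|rhel) printf 'rhel%s\n' "${2%%.*}" ;;  # keep only the major version
    *)          return 1 ;;
  esac
}

# Build the repo URL from a distro tag and an optional architecture
# (defaults to x86_64, the only architecture used in this article).
cuda_repo_url() {
  printf 'https://developer.download.nvidia.com/compute/cuda/repos/%s/%s/cuda-%s.repo\n' \
    "$1" "${2:-x86_64}" "$1"
}

# Example use:
#   sudo dnf config-manager --add-repo "$(cuda_repo_url "$(distro_tag rocky 8.10)")"
```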

Step 3: Reboot

A reboot is required for the new driver and settings to take effect.
# reboot to complete install
sudo reboot
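
Before rebooting, it can be useful to confirm that DKMS has actually built the nvidia module for the kernel that will be booted. The following is a rough sketch, not part of the original procedure: nvidia_dkms_installed is a hypothetical helper that reads dkms status text on stdin, and it assumes status lines begin with "nvidia/" or "nvidia," and end with ": installed", which may vary between DKMS versions.

```shell
# Sketch only: check `dkms status` output for an installed nvidia module.
# nvidia_dkms_installed is a hypothetical helper; the line format it
# expects is an assumption and may differ between DKMS versions.

# $1: kernel release to look for; dkms status text is read from stdin.
# Succeeds only if a line for the nvidia module mentions that kernel
# release and is marked as installed.
nvidia_dkms_installed() {
  grep -E '^nvidia[/,]' | grep -F -- "$1" | grep -q ': installed'
}

# Example use (checks the currently running kernel):
#   dkms status | nvidia_dkms_installed "$(uname -r)" && echo "module built"
```

If Step 1 installed a newer kernel, the release to check is that newer kernel rather than the one reported by uname -r.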

Step 4: Verify new Driver

Verify that the NVIDIA drivers were correctly installed.
nvidia-smi
Output should be similar to the following (the number of GPU entries depends on the server). If it is not, repeat the driver installation in Step 2:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.06             Driver Version: 570.124.06     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A10G                    Off |   00000000:00:16.0 Off |                    0 |
|  0%   23C    P8              9W /  300W |       1MiB /  23028MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A10G                    Off |   00000000:00:17.0 Off |                    0 |
|  0%   23C    P8             12W /  300W |       1MiB /  23028MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA A10G                    Off |   00000000:00:18.0 Off |                    0 |
|  0%   23C    P8              9W /  300W |       1MiB /  23028MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA A10G                    Off |   00000000:00:19.0 Off |                    0 |
|  0%   24C    P8             10W /  300W |       1MiB /  23028MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA A10G                    Off |   00000000:00:1A.0 Off |                    0 |
|  0%   23C    P8             16W /  300W |       1MiB /  23028MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA A10G                    Off |   00000000:00:1B.0 Off |                    0 |
|  0%   24C    P8             10W /  300W |       1MiB /  23028MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA A10G                    Off |   00000000:00:1C.0 Off |                    0 |
|  0%   23C    P8             13W /  300W |       1MiB /  23028MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA A10G                    Off |   00000000:00:1D.0 Off |                    0 |
|  0%   24C    P8             13W /  300W |       1MiB /  23028MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
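
The visual check above can also be scripted. The following is a minimal sketch, assuming the goal is simply to confirm that the reported driver version belongs to the 570 branch enabled in Step 2; driver_branch and check_driver_branch are hypothetical helper names.

```shell
# Sketch only: confirm the reported driver version is on the expected branch.
# driver_branch and check_driver_branch are hypothetical helpers.

# Print the branch (leading number) of a driver version,
# e.g. 570.124.06 gives 570.
driver_branch() {
  printf '%s\n' "${1%%.*}"
}

# $1: expected branch, $2: reported driver version.
# Succeeds when the version belongs to the expected branch.
check_driver_branch() {
  [ "$(driver_branch "$2")" = "$1" ]
}

# Example use on a GPU host (one line per GPU is returned, so take the first):
#   ver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   check_driver_branch 570 "$ver" && echo "driver OK: $ver"
```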