It started like this: every time the lab machine was updated and rebooted, running nvidia-smi produced the following:

    NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

In other words, the NVIDIA driver had dropped out again.

Install the driver with the package manager, not the .run file
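A likely cause is that the driver was installed with the installer bundled in the CUDA runfile.
The usual way to run it is:

    sudo sh cuda_<Cuda>_<Driver>_linux.run

rather than

    sudo sh cuda_<Cuda>_<Driver>_linux.run --dkms

Without DKMS, the driver's kernel module is not rebuilt for a new kernel, so the driver stops working after a kernel upgrade. That explains the symptom: apt upgrade installs a new kernel, the reboot makes it take effect, and the driver no longer loads.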
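To confirm this diagnosis on an affected machine, one option is to check whether an nvidia kernel module exists for the kernel that is currently running. The following is only a minimal sketch: `has_nvidia_module` is a hypothetical helper name, and it assumes modules live under `/lib/modules/<release>/` in a file named `nvidia.ko` (possibly with a compression suffix), which may not hold on every distribution.

```shell
# has_nvidia_module RELEASE [MODULES_ROOT]   (hypothetical helper)
# Succeeds if a file named nvidia.ko (optionally compressed, e.g. nvidia.ko.zst)
# exists anywhere under MODULES_ROOT/RELEASE.
# MODULES_ROOT defaults to /lib/modules (an assumption about the layout).
has_nvidia_module() {
  release="$1"
  root="${2:-/lib/modules}"
  [ -d "$root/$release" ] || return 1
  found="$(find "$root/$release" \( -name 'nvidia.ko' -o -name 'nvidia.ko.*' \) 2>/dev/null | head -n 1)"
  [ -n "$found" ]
}

# Check the kernel that is running right now.
running="$(uname -r)"
if has_nvidia_module "$running"; then
  echo "nvidia module present for running kernel $running"
else
  echo "no nvidia module for running kernel $running"
fi
```

If the module is reported missing right after a kernel upgrade, that is consistent with the driver having been built only for the previous kernel.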
So install it with the package manager instead
First add the graphics drivers PPA and install Ubuntu's driver detection tool.

    sudo add-apt-repository ppa:graphics-drivers/ppa
    sudo apt update
    sudo apt install ubuntu-drivers-common
Then use ubuntu-drivers devices (from the ubuntu-drivers-common package) to see which driver should be installed. Taking an RTX 2080 Ti as an example, the output is:

    == /sys/devices/pci0000:64/0000:64:00.0/0000:65:00.0/0000:66:10.0/0000:68:00.0 ==
    modalias : pci:v000010DEd00001E04sv00001458sd000037C4bc03sc00i00
    vendor : NVIDIA Corporation
    driver : nvidia-driver-450-server - distro non-free
    driver : nvidia-driver-450 - distro non-free
    driver : nvidia-driver-460 - distro non-free recommended
    driver : nvidia-driver-440-server - distro non-free
    driver : nvidia-driver-410 - third-party free
    driver : nvidia-driver-415 - third-party free
    driver : nvidia-driver-418-server - distro non-free
    driver : xserver-xorg-video-nouveau - distro free builtin
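When there are many candidates, the recommended entry can be picked out of a listing like this one with a small filter. This is a sketch only: `recommended_driver` is a hypothetical helper name, and it assumes the line format shown in this post (first field `driver`, package name in the third field, the word `recommended` at the end of the line).

```shell
# recommended_driver   (hypothetical helper)
# Reads a driver listing on stdin and prints the package name from the first
# "driver" line that ends with the word "recommended".
recommended_driver() {
  awk '$1 == "driver" && $NF == "recommended" { print $3; exit }'
}

# Sample lines copied from the listing in this post.
sample='driver : nvidia-driver-450 - distro non-free
driver : nvidia-driver-460 - distro non-free recommended
driver : xserver-xorg-video-nouveau - distro free builtin'

printf '%s\n' "$sample" | recommended_driver   # prints nvidia-driver-460
```

On a real machine the same filter could be fed from the detection tool, for example `ubuntu-drivers devices | recommended_driver`.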
It recommends the 460 driver. You can use

    apt-cache search nvidia-driver

to confirm the package name. Once confirmed, install it directly and reboot.

    sudo apt install nvidia-driver-460 -y
    sudo reboot

After the reboot, run nvidia-smi. If it prints its status table, the installation succeeded.
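Since the problem only shows up after a kernel update and reboot, it can help to script this check so it is easy to rerun. The sketch below relies on the exit status of the command rather than reading its output; `check_driver` is a hypothetical helper name, and it assumes nvidia-smi exits with a non-zero status when it cannot reach the driver.

```shell
# check_driver [CMD]   (hypothetical helper)
# Runs CMD (default: nvidia-smi) quietly and reports whether it succeeded.
# Returns 0 on success, 1 if CMD ran but failed, 2 if CMD is not installed.
check_driver() {
  cmd="${1:-nvidia-smi}"
  if ! command -v "$cmd" >/dev/null 2>&1; then
    echo "$cmd not found"
    return 2
  fi
  if "$cmd" >/dev/null 2>&1; then
    echo "$cmd OK: driver is responding"
  else
    echo "$cmd failed: driver is not responding"
    return 1
  fi
}

# On failure, also print the running kernel, since a kernel change is the suspect.
check_driver || echo "kernel: $(uname -r)"
```

The optional argument exists only so the helper can be exercised on a machine without a GPU, for example `check_driver true`.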
In memory of the time I spent reinstalling the driver over and over before finally finding the cause.
Reference
- https://forums.developer.nvidia.com/t/nvidia-driver-not-work-after-reboot-on-ubuntu/70831/2
- https://gitpress.io/@chchang/install-nvidia-driver-cuda-pgstrom-in-ubuntu-1804