Installing CUDA on a Linux Server

1. Installing the driver

Driver download page:

https://www.nvidia.com/Download/index.aspx

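To install, run the downloaded file with sh (<filename> is the downloaded .run file):

sudo sh <filename>

First problem: the installer refuses to run while an X server is active: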

ERROR: You appear to be running an X server; please exit X before            
         installing.  For further details, please see the section INSTALLING   
         THE NVIDIA DRIVER in the README available on the Linux driver         
         download page at www.nvidia.com.

Fix:

Append --no-x-check to the install command above.
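
Skipping the check leaves X running while the driver is replaced. A more conservative route, sketched here on the assumption that the server uses systemd (the original does not say), is to look for a running X server and stop it before installing. The helper name x_server_running is made up for illustration.

```shell
# Hypothetical helper: succeeds if a process named Xorg or X is found.
# Fails if none is found or if pgrep is not available.
x_server_running() {
    command -v pgrep >/dev/null 2>&1 || return 2
    pgrep -x Xorg >/dev/null 2>&1 || pgrep -x X >/dev/null 2>&1
}

if x_server_running; then
    # Assumption: systemd manages the display manager on this host.
    echo "X is running; consider: sudo systemctl isolate multi-user.target"
else
    echo "no X server process detected (or pgrep is unavailable)"
fi
```

After the driver is installed, sudo systemctl isolate graphical.target should bring the graphical session back on such a system.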

The installer then fails with a different error:

ERROR: Unable to find the development tool `cc` in your path; please make sure that you have the package 'gcc' installed.  If gcc is installed on your system, then please check that `cc` is in your PATH. 

gcc is not installed, so install it:

sudo yum install gcc

gcc --version  # check that the installation succeeded

The next error:

ERROR: Unable to find the kernel source tree for the currently running kernel.  Please make sure you have installed the kernel source files for your kernel and that they are properly configured; on Red Hat      
         Linux systems, for example, be sure you have the 'kernel-source' or 'kernel-devel' RPM installed.  If you know the correct kernel source files are installed, you may specify the kernel source path with   
         the '--kernel-source-path' command line option.

Install the kernel development package that matches the running kernel:

sudo yum install kernel-devel-$(uname -r)
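
To confirm that the package really matches the running kernel, one option is to look for its source directory. This is a minimal sketch assuming the usual Red Hat family layout, where kernel-devel installs under /usr/src/kernels/<release>; kernel_src_dir is a hypothetical helper name.

```shell
# Hypothetical helper: print the directory where the kernel-devel package
# for a given kernel release is expected (defaults to the running kernel).
# Assumption: Red Hat style layout under /usr/src/kernels.
kernel_src_dir() {
    echo "/usr/src/kernels/${1:-$(uname -r)}"
}

dir=$(kernel_src_dir)
if [ -d "$dir" ]; then
    echo "kernel sources found: $dir"
else
    echo "kernel sources missing: $dir"
fi
```

If the directory is still missing after the yum command, the installed kernel-devel version probably differs from the output of uname -r.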

Run the installer again:

sudo sh NVIDIA-Linux-x86_64-535.129.03.run --no-x-check

# once the installation finishes, verify it with
nvidia-smi

2. Installing CUDA

2.1 Installation

CUDA download page:

https://developer.nvidia.com/cuda-toolkit-archive

Select the options that match your system and choose the runfile (local) installer type.

Download and run the installer:

wget https://developer.download.nvidia.com/compute/cuda/12.2.2/local_installers/cuda_12.2.2_535.104.05_linux.run
sudo sh cuda_12.2.2_535.104.05_linux.run
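
The runfile name shows that it bundles driver 535.104.05, while the driver installed in section 1 is 535.129.03. The sketch below compares the two with a version sort. Treating the bundled version as the minimum the toolkit needs is an assumption, and version_ge is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds if version $1 is greater than or equal to $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

bundled="535.104.05"   # taken from the runfile name
if command -v nvidia-smi >/dev/null 2>&1; then
    installed=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
    if version_ge "$installed" "$bundled"; then
        echo "installed driver $installed is at least $bundled"
    else
        echo "installed driver $installed is older than $bundled"
    fi
fi
```

When the installed driver is already new enough, there is no need to let the CUDA installer replace it.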

The installation failed:

Installation failed. See log at /var/log/cuda-installer.log for details.

[INFO]: Driver not installed.
[INFO]: Checking compiler version...
[INFO]: gcc location: /bin/gcc

[INFO]: gcc version: gcc 版本 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC) 

[INFO]: Initializing menu
[INFO]: nvidia-fs.setKOVersion(2.17.5)
[INFO]: Setup complete
[INFO]: Installing: Driver
[INFO]: Installing: 535.104.05
[INFO]: Executing NVIDIA-Linux-x86_64-535.104.05.run --ui=none --no-questions --accept-license --disable-nouveau --no-cc-version-check --install-libglvnd  2>&1
[INFO]: Finished with code: 256
[ERROR]: Install of driver component failed. Consult the driver log at /var/log/nvidia-installer.log for more details.
[ERROR]: Install of 535.104.05 failed, quitting

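The cause is that the GPU driver was already installed in section 1, so the installer's attempt to install its bundled driver fails. On the component selection screen, move to the Driver entry and press Enter to clear its X; otherwise the installation fails.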
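
The same failure can be recognized from the log, and the toolkit can be installed without the interactive menu. The sketch below greps the installer log for the driver error shown above and prints a retry command that uses the runfile's --silent and --toolkit options. The helper name driver_step_failed is hypothetical, and note that a silent install implies accepting the EULA.

```shell
# Hypothetical helper: succeeds if the given installer log records that the
# bundled driver component failed to install.
driver_step_failed() {
    grep -q "Install of driver component failed" "$1"
}

log=/var/log/cuda-installer.log
if [ -r "$log" ] && driver_step_failed "$log"; then
    echo "driver step failed; retry with the toolkit only:"
    echo "  sudo sh cuda_12.2.2_535.104.05_linux.run --silent --toolkit"
fi
```

The script only prints the command, so nothing is installed until you run that line yourself.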

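The installation has succeeded when the installer prints its summary, which includes the toolkit install location (/usr/local/cuda-12.2 here).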

2.2 Setting environment variables

Add the CUDA paths to ~/.bashrc, reload it, and check the compiler:

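vim ~/.bashrc
# add the following three lines to ~/.bashrc
export CUDA_HOME=/usr/local/cuda-12.2  # location shown in the installer summary
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export PATH=${CUDA_HOME}/bin:${PATH}
# then reload the file and check the compiler version
source ~/.bashrc
nvcc -V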
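
If nvcc is still not found, it helps to check whether the two variables actually contain the CUDA directories. A minimal sketch follows; path_contains is a hypothetical helper, and /usr/local/cuda-12.2 is used as a fallback only because it is the location in this walkthrough.

```shell
# Hypothetical helper: succeeds if directory $1 is an entry of the
# colon-separated list $2.
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

cuda_home=${CUDA_HOME:-/usr/local/cuda-12.2}
path_contains "$cuda_home/bin" "$PATH" \
    || echo "PATH is missing $cuda_home/bin"
path_contains "$cuda_home/lib64" "${LD_LIBRARY_PATH:-}" \
    || echo "LD_LIBRARY_PATH is missing $cuda_home/lib64"
```

The script prints nothing when both entries are present.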

3. Installing cuDNN

Download page:

https://developer.nvidia.com/rdp/cudnn-archive

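Select the build that matches your CUDA version. The download can be very slow, so you may need a workaround. Once the archive is downloaded, unpack it: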

xz -d  cudnn-linux-x86_64-8.9.5.30_cuda12-archive.tar.xz
tar xvf cudnn-linux-x86_64-8.9.5.30_cuda12-archive.tar

# move the headers and libraries into the CUDA installation
mv /root/cu122/cudnn-linux-x86_64-8.9.5.30_cuda12-archive/include/* /usr/local/cuda-12.2/include/
mv /root/cu122/cudnn-linux-x86_64-8.9.5.30_cuda12-archive/lib/* /usr/local/cuda-12.2/lib64/
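
A copy-based variant of the same step leaves the unpacked archive intact, which makes a retry easier. The sketch below uses cp -P so that the library symlinks stay symlinks, then makes the files world-readable. install_cudnn is a hypothetical helper name, and both directory arguments are whatever paths apply on your machine.

```shell
# Hypothetical helper: copy cuDNN headers and libraries from an unpacked
# archive ($1) into a CUDA installation ($2), keeping symlinks as symlinks.
install_cudnn() {
    src=$1
    dest=$2
    cp "$src"/include/cudnn*.h "$dest/include/" &&
    cp -P "$src"/lib/libcudnn* "$dest/lib64/" &&
    chmod a+r "$dest"/include/cudnn*.h "$dest"/lib64/libcudnn*
}

# Example call with the paths from this walkthrough (run as root):
# install_cudnn /root/cu122/cudnn-linux-x86_64-8.9.5.30_cuda12-archive /usr/local/cuda-12.2
```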

# check the cuDNN version:
cat /usr/local/cuda/include/cudnn_version.h
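
The header is long, and the version is spread over three macros: CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL. A small sketch that prints them as one string is shown below; cudnn_version is a hypothetical helper name.

```shell
# Hypothetical helper: print MAJOR.MINOR.PATCHLEVEL from a cudnn_version.h file.
cudnn_version() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { print major "." minor "." patch }
    ' "$1"
}

header=/usr/local/cuda/include/cudnn_version.h
if [ -r "$header" ]; then
    cudnn_version "$header"
fi
```

For the archive used here, the printed version should correspond to the 8.9.5 in the file name.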