CUDA Installation and the Differences Between the System and conda Versions¶

  • Run the following command to check the highest CUDA version supported by the installed GPU driver:
nvidia-smi
  • Install specific versions of PyTorch, TorchVision, TorchAudio, and the CUDA Toolkit with conda:
conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.6 -c pytorch -c conda-forge
  • Install Numba with conda:
conda install numba
  • Install PyCUDA with conda:
conda install pycuda
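The nvidia-smi check above can also be scripted. The following is a minimal sketch, assuming the header printed by nvidia-smi contains a field of the form "CUDA Version: X.Y"; the helper names `parse_cuda_version` and `driver_cuda_version` are hypothetical and not part of any library.

```python
import re
import shutil
import subprocess


def parse_cuda_version(smi_output):
    """Return the 'CUDA Version' field from nvidia-smi's header text, or None."""
    match = re.search(r"CUDA Version:\s*(\d+\.\d+)", smi_output)
    return match.group(1) if match else None


def driver_cuda_version():
    """Run nvidia-smi if it is on PATH and extract the reported CUDA version."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None  # no NVIDIA driver tools found on this machine
    completed = subprocess.run([exe], capture_output=True, text=True, check=False)
    return parse_cuda_version(completed.stdout)


print("Driver supports CUDA up to:", driver_cuda_version())
```

The value returned describes what the driver can support; it says nothing about which CUDA Toolkit, if any, is actually installed.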

All of the following steps are performed in an environment without a system-wide CUDA Toolkit installation.¶

  • Using the conda CUDA Toolkit
In [1]:
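import torch

# Check whether CUDA is available
if torch.cuda.is_available():
    print("CUDA is available")
    # Print the CUDA version this PyTorch build was compiled against
    print(f"CUDA version: {torch.version.cuda}")
else:
    print("CUDA is not available; check that a CUDA-enabled PyTorch build is installed.")
CUDA is available
CUDA version: 11.6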
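The version printed by torch.version.cuda is the runtime bundled with the conda package, while nvidia-smi reports what the driver supports. As a simplified rule of thumb, the bundled runtime should not be newer than the driver's version. The sketch below only compares the two version strings; the function names are hypothetical, the driver value "12.5" is an assumed example, and the check is deliberately conservative because it ignores CUDA's minor-version compatibility.

```python
def parse_version(text):
    """Turn a 'major.minor' string such as '11.6' into the tuple (11, 6)."""
    return tuple(int(part) for part in text.split(".")[:2])


def runtime_fits_driver(runtime_version, driver_version):
    """True if the bundled CUDA runtime is not newer than what the driver reports."""
    return parse_version(runtime_version) <= parse_version(driver_version)


# 11.6 is the runtime reported by PyTorch above; 12.5 is an assumed driver value.
print(runtime_fits_driver("11.6", "12.5"))
```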
In [5]:
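import torch

# Check whether CUDA is available
if torch.cuda.is_available():
    device = torch.device("cuda")  # Use the CUDA device when available
    print("Using the CUDA device for computation")
else:
    device = torch.device("cpu")  # Otherwise fall back to the CPU
    print("CUDA is not available; using the CPU for computation")

# Create two random matrices directly on the chosen device
matrix_a = torch.randn(500, 500, device=device)
matrix_b = torch.randn(500, 500, device=device)

# Multiply the matrices
result = torch.matmul(matrix_a, matrix_b)

# Print the shape of the result
print("Shape of the result matrix:", result.shape)

# If CUDA was used, move the result to the CPU (optional, convenient for later processing)
if device.type == "cuda":
    result = result.cpu()

# Print part of the result (the top-left 5x5 block)
print("Top-left 5x5 block of the result matrix:\n", result[:5, :5])
Using the CUDA device for computation
Shape of the result matrix: torch.Size([500, 500])
Top-left 5x5 block of the result matrix: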
 tensor([[ 13.7241,   9.4655,  -9.6982, -27.5854, -24.5696],
        [ -0.0620,   3.6682,  -1.3772, -21.8573,  23.0339],
        [-30.8551, -27.5281, -15.6172, -16.3898,  25.8177],
        [-20.3297,  35.3533,   7.8745, -45.0541,   0.0785],
        [-19.8129,  26.4169,   9.7432, -14.2702, -25.9542]])
In [2]:
from numba import cuda

# Check whether Numba's CUDA support is available
print("Numba CUDA available:", cuda.is_available())

# Inspect the current GPU device
if cuda.is_available():
    print("Current device:", cuda.gpus.current)
    print("Device name:", cuda.gpus[0].name)
else:
    print("Numba CUDA is not available.")
Numba CUDA available: True
Current device: None
Device name: b'NVIDIA GeForce RTX 2070'
In [ ]:
import pycuda.driver as cuda
import pycuda.autoinit

print("PyCUDA installed successfully!")

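# The UserWarning below means PyCUDA could not find the CUDA installation directory while adding it to Python's DLL path. Set the CUDA_PATH environment variable, or make sure nvcc.exe is on the system path.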
PyCUDA installed successfully!
c:\Users\ASUS\.conda\envs\cuda116\lib\site-packages\pycuda\driver.py:45: UserWarning: Unable to discover CUDA installation directory while attempting to add it to Python's DLL path. Either set the 'CUDA_PATH' environment variable or ensure that 'nvcc.exe' is on the path.
  warn(
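The warning names two places where the CUDA installation can be discovered: the PATH and the CUDA_PATH environment variable. Below is a minimal diagnostic sketch that checks both using only the standard library. The function `locate_nvcc` is a hypothetical helper written for this note, and the assumption that nvcc lives in a bin subfolder of CUDA_PATH follows the usual toolkit layout.

```python
import os
import shutil
from pathlib import Path


def locate_nvcc(env=None):
    """Return the path to nvcc found via PATH or CUDA_PATH, or None if absent."""
    env = os.environ if env is None else env

    # 1) Look for nvcc on the PATH, as a shell would.
    found = shutil.which("nvcc", path=env.get("PATH", ""))
    if found:
        return found

    # 2) Fall back to CUDA_PATH (assumed layout: <CUDA_PATH>/bin/nvcc[.exe]).
    cuda_path = env.get("CUDA_PATH")
    if cuda_path:
        for name in ("nvcc.exe", "nvcc"):
            candidate = Path(cuda_path) / "bin" / name
            if candidate.is_file():
                return str(candidate)
    return None


print("nvcc:", locate_nvcc())
```

If this prints None in the conda environment, compiling kernels with SourceModule is likely to fail, even though importing pycuda succeeds.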
In [4]:
import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule
import numpy as np

# Define the CUDA kernel
mod = SourceModule("""
__global__ void vector_add(float *a, float *b, float *c, int n) {
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    if (idx < n) {
        c[idx] = a[idx] + b[idx];
    }
}
""")

# Get the kernel function
vector_add = mod.get_function("vector_add")

# Prepare the data
n = 10
a = np.random.randn(n).astype(np.float32)
b = np.random.randn(n).astype(np.float32)
c = np.zeros_like(a)

# Allocate GPU memory and copy the inputs to the device
a_gpu = cuda.mem_alloc(a.nbytes)
b_gpu = cuda.mem_alloc(b.nbytes)
c_gpu = cuda.mem_alloc(c.nbytes)
cuda.memcpy_htod(a_gpu, a)
cuda.memcpy_htod(b_gpu, b)

# Launch the kernel
block_size = 4
grid_size = (n + block_size - 1) // block_size
vector_add(a_gpu, b_gpu, c_gpu, np.int32(n), block=(block_size, 1, 1), grid=(grid_size, 1))

# Copy the result from the GPU back to the CPU
cuda.memcpy_dtoh(c, c_gpu)

print("a:", a)
print("b:", b)
print("c:", c)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
File c:\Users\ASUS\.conda\envs\cuda116\lib\site-packages\pytools\__init__.py:724, in memoize.<locals>._decorator.<locals>.wrapper(*args)
    723 try:
--> 724     return func._memoize_dic[args]
    725 except AttributeError:
    726     # _memoize_dic doesn't exist yet.

AttributeError: 'function' object has no attribute '_memoize_dic'

During handling of the above exception, another exception occurred:

FileNotFoundError                         Traceback (most recent call last)
File c:\Users\ASUS\.conda\envs\cuda116\lib\site-packages\pytools\prefork.py:101, in DirectForker.call_capture_output(self, cmdline, cwd, error_on_nonzero)
    100 try:
--> 101     popen = Popen(cmdline, cwd=cwd, stdin=PIPE, stdout=PIPE,
    102                   stderr=PIPE)
    103     stdout_data, stderr_data = popen.communicate()

File c:\Users\ASUS\.conda\envs\cuda116\lib\subprocess.py:971, in Popen.__init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, user, group, extra_groups, encoding, errors, text, umask, pipesize)
    968             self.stderr = io.TextIOWrapper(self.stderr,
    969                     encoding=encoding, errors=errors)
--> 971     self._execute_child(args, executable, preexec_fn, close_fds,
    972                         pass_fds, cwd, env,
    973                         startupinfo, creationflags, shell,
    974                         p2cread, p2cwrite,
    975                         c2pread, c2pwrite,
    976                         errread, errwrite,
    977                         restore_signals,
    978                         gid, gids, uid, umask,
    979                         start_new_session)
    980 except:
    981     # Cleanup if the child failed starting.

File c:\Users\ASUS\.conda\envs\cuda116\lib\subprocess.py:1456, in Popen._execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, unused_restore_signals, unused_gid, unused_gids, unused_uid, unused_umask, unused_start_new_session)
   1455 try:
-> 1456     hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
   1457                              # no special security
   1458                              None, None,
   1459                              int(not close_fds),
   1460                              creationflags,
   1461                              env,
   1462                              cwd,
   1463                              startupinfo)
   1464 finally:
   1465     # Child is launched. Close the parent's copy of those pipe
   1466     # handles that only the child should have open.  You need
   (...)
   1469     # pipe will not close when the child process exits and the
   1470     # ReadFile will hang.

FileNotFoundError: [WinError 2] The system cannot find the file specified.

The above exception was the direct cause of the following exception:

ExecError                                 Traceback (most recent call last)
Cell In[4], line 7
      4 import numpy as np
      6 # Define the CUDA kernel
----> 7 mod = SourceModule("""
      8 __global__ void vector_add(float *a, float *b, float *c, int n) {
      9     int idx = threadIdx.x + blockIdx.x * blockDim.x;
     10     if (idx < n) {
     11         c[idx] = a[idx] + b[idx];
     12     }
     13 }
     14 """)
     16 # Get the kernel function
     17 vector_add = mod.get_function("vector_add")

File c:\Users\ASUS\.conda\envs\cuda116\lib\site-packages\pycuda\compiler.py:355, in SourceModule.__init__(self, source, nvcc, options, keep, no_extern_c, arch, code, cache_dir, include_dirs)
    341 def __init__(
    342     self,
    343     source,
   (...)
    351     include_dirs=[],
    352 ):
    353     self._check_arch(arch)
--> 355     cubin = compile(
    356         source,
    357         nvcc,
    358         options,
    359         keep,
    360         no_extern_c,
    361         arch,
    362         code,
    363         cache_dir,
    364         include_dirs,
    365     )
    367     from pycuda.driver import module_from_buffer
    369     self.module = module_from_buffer(cubin)

File c:\Users\ASUS\.conda\envs\cuda116\lib\site-packages\pycuda\compiler.py:304, in compile(source, nvcc, options, keep, no_extern_c, arch, code, cache_dir, include_dirs, target)
    301 for i in include_dirs:
    302     options.append("-I" + i)
--> 304 return compile_plain(source, options, keep, nvcc, cache_dir, target)

File c:\Users\ASUS\.conda\envs\cuda116\lib\site-packages\pycuda\compiler.py:96, in compile_plain(source, options, keep, nvcc, cache_dir, target)
     94 for option in options:
     95     checksum.update(option.encode("utf-8"))
---> 96 checksum.update(get_nvcc_version(nvcc).encode("utf-8"))
     97 from pycuda.characterize import platform_bits
     99 checksum.update(str(platform_bits()).encode("utf-8"))

File c:\Users\ASUS\.conda\envs\cuda116\lib\site-packages\pytools\__init__.py:727, in memoize.<locals>._decorator.<locals>.wrapper(*args)
    724     return func._memoize_dic[args]
    725 except AttributeError:
    726     # _memoize_dic doesn't exist yet.
--> 727     result = func(*args)
    728     func._memoize_dic = {args: result}
    729     return result

File c:\Users\ASUS\.conda\envs\cuda116\lib\site-packages\pycuda\compiler.py:16, in get_nvcc_version(nvcc)
     13 @memoize
     14 def get_nvcc_version(nvcc):
     15     cmdline = [nvcc, "--version"]
---> 16     result, stdout, stderr = call_capture_output(cmdline)
     18     if result != 0 or not stdout:
     19         from warnings import warn

File c:\Users\ASUS\.conda\envs\cuda116\lib\site-packages\pytools\prefork.py:308, in call_capture_output(cmdline, cwd, error_on_nonzero)
    305 def call_capture_output(cmdline: Sequence[str],
    306                         cwd: str | None = None,
    307                         error_on_nonzero: bool = True) -> tuple[int, bytes, bytes]:
--> 308     return forker.call_capture_output(cmdline, cwd, error_on_nonzero)

File c:\Users\ASUS\.conda\envs\cuda116\lib\site-packages\pytools\prefork.py:113, in DirectForker.call_capture_output(self, cmdline, cwd, error_on_nonzero)
    111     return popen.returncode, stdout_data, stderr_data
    112 except OSError as e:
--> 113     raise ExecError(
    114             "error invoking '{}': {}".format(" ".join(cmdline), e)) from e

ExecError: error invoking 'nvcc --version': [WinError 2] The system cannot find the file specified.

Installing the System-Wide CUDA Toolkit and cuDNN¶

  • With the CUDA Toolkit and cuDNN installed only in the conda environment, tools such as PyTorch and Numba run normally, but PyCUDA cannot execute custom CUDA kernels: as the traceback above shows, SourceModule fails because it cannot invoke nvcc. The steps below resolve this:

    • Download the CUDA Toolkit:
      • https://developer.nvidia.com/cuda-12-5-0-download-archive?target_os=Windows&target_arch=x86_64&target_version=11&target_type=exe_local
    • Install the system-wide CUDA Toolkit
      • After installation, run nvcc --version to confirm that CUDA was installed successfully.
    • System cuDNN configuration (not required for PyCUDA)
      • Matching CUDA and cuDNN versions: https://developer.nvidia.com/rdp/cudnn-archive?referer=https%3A%2F%2Fcloud.tencent.com%2Fdeveloper%2Farticle%2F2158333
      • Extract the archive, then move the bin, include, and lib folders from the extracted cuda folder into the CUDA installation directory.
    • PyCUDA configuration after installing the system-wide CUDA Toolkit
      • CUDA-related paths
        • C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\bin
        • C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include
        • C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\lib\x64
      • Path to cl.exe
        • C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.28.29333\bin\Hostx64\x64
  • With the system-wide CUDA Toolkit installed, PyCUDA can run custom CUDA kernels.
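If you would rather not edit the system environment variables, the nvcc and cl.exe directories listed above can be put on the PATH for the current Python session before PyCUDA compiles anything. This is a minimal sketch under that assumption. `prepend_dirs` is a hypothetical helper, the two directories are copied from the list above and will differ on other machines, and setting CUDA_PATH to the parent of the bin folder is an assumption based on the earlier warning.

```python
import os

# Directories taken from the list above; adjust to the local installation.
CUDA_BIN = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\bin"
MSVC_BIN = (
    r"C:\Program Files (x86)\Microsoft Visual Studio\2019\Community"
    r"\VC\Tools\MSVC\14.28.29333\bin\Hostx64\x64"
)


def prepend_dirs(path_value, dirs, sep=os.pathsep):
    """Return path_value with dirs placed in front, skipping ones already present."""
    current = [entry for entry in path_value.split(sep) if entry]
    new = [d for d in dirs if d not in current]
    return sep.join(new + current)


if os.name == "nt":  # only meaningful on Windows, where these paths exist
    os.environ["PATH"] = prepend_dirs(os.environ.get("PATH", ""), [CUDA_BIN, MSVC_BIN])
    # Assumed: CUDA_PATH should point at the toolkit root (the parent of bin).
    os.environ.setdefault("CUDA_PATH", os.path.dirname(CUDA_BIN))
```

The change affects only the running process, so it has to be executed in each new kernel session before the first SourceModule call.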

In [1]:
import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule
import numpy as np

# Define the CUDA kernel
mod = SourceModule("""
__global__ void vector_add(float *a, float *b, float *c, int n) {
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    if (idx < n) {
        c[idx] = a[idx] + b[idx];
    }
}
""")

# Get the kernel function
vector_add = mod.get_function("vector_add")

# Prepare the data
n = 10
a = np.random.randn(n).astype(np.float32)
b = np.random.randn(n).astype(np.float32)
c = np.zeros_like(a)

# Allocate GPU memory and copy the inputs to the device
a_gpu = cuda.mem_alloc(a.nbytes)
b_gpu = cuda.mem_alloc(b.nbytes)
c_gpu = cuda.mem_alloc(c.nbytes)
cuda.memcpy_htod(a_gpu, a)
cuda.memcpy_htod(b_gpu, b)

# Launch the kernel
block_size = 4
grid_size = (n + block_size - 1) // block_size
vector_add(a_gpu, b_gpu, c_gpu, np.int32(n), block=(block_size, 1, 1), grid=(grid_size, 1))

# Copy the result from the GPU back to the CPU
cuda.memcpy_dtoh(c, c_gpu)

print("a:", a)
print("b:", b)
print("c:", c)
a: [ 0.01387011  1.9071813   0.9551705   0.605184   -0.83006436  0.8486385
  0.23033296 -0.47039524  1.2744122  -0.964242  ]
b: [-1.4487596  -1.7577629   1.6819235  -1.443184   -0.03467776 -1.7946774
 -0.7409208   0.52808577 -1.9727508   1.9872469 ]
c: [-1.4348894   0.14941835  2.637094   -0.838      -0.8647421  -0.9460389
 -0.5105878   0.05769053 -0.6983386   1.0230049 ]
C:\Users\ASUS\AppData\Local\Temp\ipykernel_26592\3657882237.py:7: UserWarning: The CUDA compiler succeeded, but said the following:
kernel.cu

  mod = SourceModule("""
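The launch configuration used in the vector_add cells (n = 10, block_size = 4) rounds the grid size up so that every element gets a thread, and the `if (idx < n)` guard in the kernel silences the surplus threads. The pure-Python sketch below emulates that index arithmetic on the CPU to make the numbers concrete; `launch_config` and `covered_indices` are hypothetical helpers, not PyCUDA functions.

```python
def launch_config(n, block_size):
    """Return (grid_size, total_threads) using ceiling division, as in the cell."""
    grid_size = (n + block_size - 1) // block_size
    return grid_size, grid_size * block_size


def covered_indices(n, block_size):
    """Emulate idx = threadIdx.x + blockIdx.x * blockDim.x with the idx < n guard."""
    grid_size, _ = launch_config(n, block_size)
    indices = []
    for block_idx in range(grid_size):
        for thread_idx in range(block_size):
            idx = thread_idx + block_idx * block_size
            if idx < n:  # same bounds check as in the kernel
                indices.append(idx)
    return indices


grid_size, total_threads = launch_config(10, 4)
print("grid_size:", grid_size, "threads launched:", total_threads)
print("indices written:", covered_indices(10, 4))
```

With these values, 3 blocks of 4 threads are launched (12 threads in total), and the guard leaves the last 2 threads idle instead of letting them write past the end of the arrays.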