CUDA Installation and the Differences Between the System and conda Versions
- Run the following command to see the highest CUDA version supported by the installed NVIDIA driver:
nvidia-smi
- Use conda to install specific versions of PyTorch, TorchVision, TorchAudio, and the CUDA Toolkit:
conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.6 -c pytorch -c conda-forge
- Use conda to install Numba:
conda install numba
- Use conda to install PyCUDA:
conda install pycuda
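The version that nvidia-smi prints in its header is the highest CUDA version the installed driver supports, so the cudatoolkit version requested from conda should not exceed it. A minimal sketch of that check follows, assuming the header contains the text "CUDA Version: X.Y"; the helper names are illustrative and not part of any NVIDIA tool.

```python
import re
import shutil
import subprocess


def parse_driver_cuda_version(smi_output):
    """Return (major, minor) from nvidia-smi's header, or None if absent."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def driver_supports(smi_output, wanted):
    """True if the driver's maximum CUDA version is at least `wanted`."""
    found = parse_driver_cuda_version(smi_output)
    return found is not None and found >= wanted


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found on PATH")
    else:
        output = subprocess.run(
            ["nvidia-smi"], capture_output=True, text=True, check=False
        ).stdout
        print("Driver CUDA version:", parse_driver_cuda_version(output))
        print("Supports cudatoolkit 11.6:", driver_supports(output, (11, 6)))
```

Comparing (major, minor) tuples avoids the pitfalls of comparing version strings or floats directly.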
All of the steps below were carried out in an environment without a system-wide CUDA Toolkit installation.
- Using the conda CUDA toolkit
In [1]:
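import torch

# Check whether CUDA is available
if torch.cuda.is_available():
    print("CUDA is available")
    # Print the CUDA version this PyTorch build uses
    print(f"CUDA version: {torch.version.cuda}")
else:
    print("CUDA is not available; check that a CUDA build of PyTorch is installed correctly.")

CUDA is available
CUDA version: 11.6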
In [5]:
import torch

# Check whether CUDA is available
if torch.cuda.is_available():
    device = torch.device("cuda")  # use the CUDA device when it is available
    print("Using the CUDA device for computation")
else:
    device = torch.device("cpu")  # otherwise fall back to the CPU
    print("CUDA is not available; using the CPU for computation")

# Create two random matrices directly on the chosen device
matrix_a = torch.randn(500, 500, device=device)
matrix_b = torch.randn(500, 500, device=device)

# Matrix multiplication
result = torch.matmul(matrix_a, matrix_b)

# Print the shape of the result
print("Shape of the result matrix:", result.shape)

# When running on CUDA, move the result to the CPU (optional, convenient for later processing)
if device.type == "cuda":
    result = result.cpu()

# Print part of the result (the top-left 5x5 submatrix)
print("First 5x5 block of the result matrix:\n", result[:5, :5])

Using the CUDA device for computation
Shape of the result matrix: torch.Size([500, 500])
First 5x5 block of the result matrix:
 tensor([[ 13.7241,   9.4655,  -9.6982, -27.5854, -24.5696],
        [ -0.0620,   3.6682,  -1.3772, -21.8573,  23.0339],
        [-30.8551, -27.5281, -15.6172, -16.3898,  25.8177],
        [-20.3297,  35.3533,   7.8745, -45.0541,   0.0785],
        [-19.8129,  26.4169,   9.7432, -14.2702, -25.9542]])
In [2]:
from numba import cuda

# Check whether Numba can use CUDA
print("Numba CUDA available:", cuda.is_available())

# Inspect the GPU devices
if cuda.is_available():
    # cuda.gpus.current prints None here because no device context is active yet
    print("Current device:", cuda.gpus.current)
    print("Device name:", cuda.gpus[0].name)
else:
    print("Numba CUDA is not available.")

Numba CUDA available: True
Current device: None
Device name: b'NVIDIA GeForce RTX 2070'
In [ ]:
import pycuda.driver as cuda
import pycuda.autoinit

print("PyCUDA installed successfully!")
# The import emits a UserWarning: PyCUDA could not find the CUDA installation directory
# to add to Python's DLL path. Set the CUDA_PATH environment variable, or make sure
# nvcc.exe is on the system path.

PyCUDA installed successfully!
c:\Users\ASUS\.conda\envs\cuda116\lib\site-packages\pycuda\driver.py:45: UserWarning: Unable to discover CUDA installation directory while attempting to add it to Python's DLL path. Either set the 'CUDA_PATH' environment variable or ensure that 'nvcc.exe' is on the path.
  warn(
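The warning names two things PyCUDA looks for: a CUDA_PATH environment variable and nvcc.exe on the path. The sketch below performs a comparable lookup with the standard library only, to show which of the two is missing. The function name and the keys of the returned dictionary are illustrative and are not part of PyCUDA.

```python
import os
import shutil


def cuda_discovery_report(env=None, which=shutil.which):
    """Report the two things the PyCUDA warning mentions."""
    env = os.environ if env is None else env
    cuda_path = env.get("CUDA_PATH")
    return {
        "CUDA_PATH": cuda_path,
        "CUDA_PATH is a directory": bool(cuda_path) and os.path.isdir(cuda_path),
        # On Windows, shutil.which also resolves nvcc.exe through PATHEXT.
        "nvcc on PATH": which("nvcc"),
    }


for key, value in cuda_discovery_report().items():
    print(f"{key}: {value}")
```

In the conda-only environment described here, all three entries would be expected to come back empty, which matches the warning.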
In [4]:
import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule
import numpy as np

# Define the CUDA kernel
mod = SourceModule("""
__global__ void vector_add(float *a, float *b, float *c, int n) {
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    if (idx < n) {
        c[idx] = a[idx] + b[idx];
    }
}
""")

# Get the kernel function
vector_add = mod.get_function("vector_add")

# Prepare the data
n = 10
a = np.random.randn(n).astype(np.float32)
b = np.random.randn(n).astype(np.float32)
c = np.zeros_like(a)

# Allocate GPU memory and copy the inputs to the device
a_gpu = cuda.mem_alloc(a.nbytes)
b_gpu = cuda.mem_alloc(b.nbytes)
c_gpu = cuda.mem_alloc(c.nbytes)
cuda.memcpy_htod(a_gpu, a)
cuda.memcpy_htod(b_gpu, b)

# Launch the kernel
block_size = 4
grid_size = (n + block_size - 1) // block_size
vector_add(a_gpu, b_gpu, c_gpu, np.int32(n), block=(block_size, 1, 1), grid=(grid_size, 1))

# Copy the result from the GPU back to the CPU
cuda.memcpy_dtoh(c, c_gpu)
print("a:", a)
print("b:", b)
print("c:", c)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
File c:\Users\ASUS\.conda\envs\cuda116\lib\site-packages\pytools\__init__.py:724, in memoize.<locals>._decorator.<locals>.wrapper(*args)
    723 try:
--> 724     return func._memoize_dic[args]
    725 except AttributeError:
    726     # _memoize_dic doesn't exist yet.

AttributeError: 'function' object has no attribute '_memoize_dic'

During handling of the above exception, another exception occurred:

FileNotFoundError                         Traceback (most recent call last)
File c:\Users\ASUS\.conda\envs\cuda116\lib\site-packages\pytools\prefork.py:101, in DirectForker.call_capture_output(self, cmdline, cwd, error_on_nonzero)
    100 try:
--> 101     popen = Popen(cmdline, cwd=cwd, stdin=PIPE, stdout=PIPE,
    102                   stderr=PIPE)
    103     stdout_data, stderr_data = popen.communicate()

File c:\Users\ASUS\.conda\envs\cuda116\lib\subprocess.py:971, in Popen.__init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, user, group, extra_groups, encoding, errors, text, umask, pipesize)
    968         self.stderr = io.TextIOWrapper(self.stderr,
    969                 encoding=encoding, errors=errors)
--> 971     self._execute_child(args, executable, preexec_fn, close_fds,
    972                         pass_fds, cwd, env,
    973                         startupinfo, creationflags, shell,
    974                         p2cread, p2cwrite,
    975                         c2pread, c2pwrite,
    976                         errread, errwrite,
    977                         restore_signals,
    978                         gid, gids, uid, umask,
    979                         start_new_session)
    980 except:
    981     # Cleanup if the child failed starting.

File c:\Users\ASUS\.conda\envs\cuda116\lib\subprocess.py:1456, in Popen._execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, unused_restore_signals, unused_gid, unused_gids, unused_uid, unused_umask, unused_start_new_session)
   1455 try:
-> 1456     hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
   1457                              # no special security
   1458                              None, None,
   1459                              int(not close_fds),
   1460                              creationflags,
   1461                              env,
   1462                              cwd,
   1463                              startupinfo)
   1464 finally:
   1465     # Child is launched. Close the parent's copy of those pipe
   1466     # handles that only the child should have open. You need
   (...)
   1469     # pipe will not close when the child process exits and the
   1470     # ReadFile will hang.

FileNotFoundError: [WinError 2] The system cannot find the file specified.

The above exception was the direct cause of the following exception:

ExecError                                 Traceback (most recent call last)
Cell In[4], line 7
      4 import numpy as np
      6 # Define the CUDA kernel
----> 7 mod = SourceModule("""
      8 __global__ void vector_add(float *a, float *b, float *c, int n) {
      9     int idx = threadIdx.x + blockIdx.x * blockDim.x;
     10     if (idx < n) {
     11         c[idx] = a[idx] + b[idx];
     12     }
     13 }
     14 """)
     16 # Get the kernel function
     17 vector_add = mod.get_function("vector_add")

File c:\Users\ASUS\.conda\envs\cuda116\lib\site-packages\pycuda\compiler.py:355, in SourceModule.__init__(self, source, nvcc, options, keep, no_extern_c, arch, code, cache_dir, include_dirs)
    341 def __init__(
    342     self,
    343     source,
   (...)
    351     include_dirs=[],
    352 ):
    353     self._check_arch(arch)
--> 355     cubin = compile(
    356         source,
    357         nvcc,
    358         options,
    359         keep,
    360         no_extern_c,
    361         arch,
    362         code,
    363         cache_dir,
    364         include_dirs,
    365     )
    367     from pycuda.driver import module_from_buffer
    369     self.module = module_from_buffer(cubin)

File c:\Users\ASUS\.conda\envs\cuda116\lib\site-packages\pycuda\compiler.py:304, in compile(source, nvcc, options, keep, no_extern_c, arch, code, cache_dir, include_dirs, target)
    301 for i in include_dirs:
    302     options.append("-I" + i)
--> 304 return compile_plain(source, options, keep, nvcc, cache_dir, target)

File c:\Users\ASUS\.conda\envs\cuda116\lib\site-packages\pycuda\compiler.py:96, in compile_plain(source, options, keep, nvcc, cache_dir, target)
     94 for option in options:
     95     checksum.update(option.encode("utf-8"))
---> 96 checksum.update(get_nvcc_version(nvcc).encode("utf-8"))
     97 from pycuda.characterize import platform_bits
     99 checksum.update(str(platform_bits()).encode("utf-8"))

File c:\Users\ASUS\.conda\envs\cuda116\lib\site-packages\pytools\__init__.py:727, in memoize.<locals>._decorator.<locals>.wrapper(*args)
    724     return func._memoize_dic[args]
    725 except AttributeError:
    726     # _memoize_dic doesn't exist yet.
--> 727     result = func(*args)
    728     func._memoize_dic = {args: result}
    729     return result

File c:\Users\ASUS\.conda\envs\cuda116\lib\site-packages\pycuda\compiler.py:16, in get_nvcc_version(nvcc)
     13 @memoize
     14 def get_nvcc_version(nvcc):
     15     cmdline = [nvcc, "--version"]
---> 16     result, stdout, stderr = call_capture_output(cmdline)
     18     if result != 0 or not stdout:
     19         from warnings import warn

File c:\Users\ASUS\.conda\envs\cuda116\lib\site-packages\pytools\prefork.py:308, in call_capture_output(cmdline, cwd, error_on_nonzero)
    305 def call_capture_output(cmdline: Sequence[str],
    306                         cwd: str | None = None,
    307                         error_on_nonzero: bool = True) -> tuple[int, bytes, bytes]:
--> 308     return forker.call_capture_output(cmdline, cwd, error_on_nonzero)

File c:\Users\ASUS\.conda\envs\cuda116\lib\site-packages\pytools\prefork.py:113, in DirectForker.call_capture_output(self, cmdline, cwd, error_on_nonzero)
    111     return popen.returncode, stdout_data, stderr_data
    112 except OSError as e:
--> 113     raise ExecError(
    114         "error invoking '{}': {}".format(" ".join(cmdline), e)) from e

ExecError: error invoking 'nvcc --version': [WinError 2] The system cannot find the file specified.
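The last frames of the traceback show the cause: PyCUDA runs nvcc --version as a subprocess, and Windows cannot find that executable in this conda-only environment. A hedged sketch of a pre-flight check that makes the same call but fails softly is shown below; probe_nvcc is a hypothetical helper, not a PyCUDA function.

```python
import subprocess


def probe_nvcc(nvcc="nvcc"):
    """Run `<nvcc> --version`; return its output, or None if it cannot start."""
    try:
        completed = subprocess.run(
            [nvcc, "--version"], capture_output=True, text=True, check=False
        )
    except OSError:
        # FileNotFoundError (a subclass of OSError) is what the traceback shows.
        return None
    return completed.stdout.strip() or None


version_text = probe_nvcc()
if version_text is None:
    print("nvcc did not run; SourceModule would be expected to fail as above")
else:
    print(version_text.splitlines()[-1])
```

Running this before constructing a SourceModule gives a one-line diagnosis instead of the long chained traceback.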
Installing the System Versions of the CUDA Toolkit and cuDNN
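After the CUDA Toolkit and cuDNN are installed in a conda environment, tools such as PyTorch and Numba work correctly, but PyCUDA cannot run custom CUDA kernels: compiling them requires nvcc, which could not be found in that environment (see the ExecError above). The steps below resolve the problem: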
- Download the CUDA Toolkit.
- Install the system version of the CUDA Toolkit
  - After installation, run nvcc --version to confirm that CUDA was installed successfully.
- Configure the system cuDNN (not required for PyCUDA)
  - Matching CUDA and cuDNN versions: https://developer.nvidia.com/rdp/cudnn-archive?referer=https%3A%2F%2Fcloud.tencent.com%2Fdeveloper%2Farticle%2F2158333
  - Extract the archive, then cut the bin, include, and lib folders from the extracted cuda folder and paste them into the CUDA installation directory.
- Configure PyCUDA after installing the system CUDA Toolkit
  - CUDA-related paths
    - C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\bin
    - C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include
    - C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\lib\x64
  - cl.exe path
    - C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.28.29333\bin\Hostx64\x64
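If the directories that hold nvcc.exe and cl.exe are not on PATH when the notebook kernel starts, one option is to prepend them for the current process before importing pycuda.compiler. This is a sketch under that assumption: it uses the CUDA bin directory and the cl.exe directory listed above, and prepend_to_path is an illustrative helper rather than a PyCUDA function.

```python
import os

# Directories taken from the list above; adjust to the local installation.
CUDA_BIN = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\bin"
MSVC_BIN = (
    r"C:\Program Files (x86)\Microsoft Visual Studio\2019\Community"
    r"\VC\Tools\MSVC\14.28.29333\bin\Hostx64\x64"
)


def prepend_to_path(directories, env=None):
    """Put each directory at the front of PATH unless it is already listed."""
    env = os.environ if env is None else env
    current = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    missing = [d for d in directories if d not in current]
    env["PATH"] = os.pathsep.join(missing + current)
    return missing


if os.name == "nt":  # these paths only make sense on Windows
    added = prepend_to_path([CUDA_BIN, MSVC_BIN])
    print("Added to PATH for this process:", added)
```

The change only affects the running Python process. Setting the variables in the Windows system settings makes them persistent.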
Once the system version of the CUDA Toolkit is installed, PyCUDA can run custom CUDA kernels.
In [1]:
import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule
import numpy as np

# Define the CUDA kernel
mod = SourceModule("""
__global__ void vector_add(float *a, float *b, float *c, int n) {
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    if (idx < n) {
        c[idx] = a[idx] + b[idx];
    }
}
""")

# Get the kernel function
vector_add = mod.get_function("vector_add")

# Prepare the data
n = 10
a = np.random.randn(n).astype(np.float32)
b = np.random.randn(n).astype(np.float32)
c = np.zeros_like(a)

# Allocate GPU memory and copy the inputs to the device
a_gpu = cuda.mem_alloc(a.nbytes)
b_gpu = cuda.mem_alloc(b.nbytes)
c_gpu = cuda.mem_alloc(c.nbytes)
cuda.memcpy_htod(a_gpu, a)
cuda.memcpy_htod(b_gpu, b)

# Launch the kernel
block_size = 4
grid_size = (n + block_size - 1) // block_size
vector_add(a_gpu, b_gpu, c_gpu, np.int32(n), block=(block_size, 1, 1), grid=(grid_size, 1))

# Copy the result from the GPU back to the CPU
cuda.memcpy_dtoh(c, c_gpu)
print("a:", a)
print("b:", b)
print("c:", c)

a: [ 0.01387011  1.9071813   0.9551705   0.605184   -0.83006436  0.8486385   0.23033296 -0.47039524  1.2744122  -0.964242  ]
b: [-1.4487596  -1.7577629   1.6819235  -1.443184   -0.03467776 -1.7946774  -0.7409208   0.52808577 -1.9727508   1.9872469 ]
c: [-1.4348894   0.14941835  2.637094   -0.838      -0.8647421  -0.9460389  -0.5105878   0.05769053 -0.6983386   1.0230049 ]
C:\Users\ASUS\AppData\Local\Temp\ipykernel_26592\3657882237.py:7: UserWarning: The CUDA compiler succeeded, but said the following:
kernel.cu
  mod = SourceModule("""
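The launch configuration rounds the number of blocks up so that every element gets a thread, which is why the kernel needs the idx < n guard: with n = 10 and 4 threads per block, 3 blocks start 12 threads, and the last two must do nothing. The following plain-Python emulation of that indexing involves no GPU and is purely illustrative.

```python
def emulate_vector_add(a, b, block_size):
    """Walk the same (block, thread) grid the kernel launch would create."""
    n = len(a)
    grid_size = (n + block_size - 1) // block_size  # ceiling division
    c = [0.0] * n
    idle_threads = 0
    for block_idx in range(grid_size):
        for thread_idx in range(block_size):
            idx = thread_idx + block_idx * block_size
            if idx < n:  # same bounds check as the CUDA kernel
                c[idx] = a[idx] + b[idx]
            else:
                idle_threads += 1
    return c, grid_size, idle_threads


a = [float(i) for i in range(10)]
b = [10.0 * i for i in range(10)]
c, grid_size, idle = emulate_vector_add(a, b, block_size=4)
print("grid_size:", grid_size)
print("idle threads:", idle)
print("c:", c)
```

Without the guard, the two surplus threads would read and write past the end of the 10-element buffers on a real device.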