PyTorch¶
- An open-source deep learning framework
- Released by Facebook in 2016
- https://pytorch.org/
- Tutorials
Installing PyTorch¶
- CUDA version
- https://cloud.tencent.com/developer/article/2158333
- cuDNN (the CUDA Deep Neural Network library) is a GPU-accelerated library developed by NVIDIA, designed specifically for the computational needs of deep learning. A quick way to check which CUDA/cuDNN build your installation uses is sketched below.
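A minimal sketch for verifying the installation (the printed values depend entirely on your local setup; on a CPU-only build the CUDA/cuDNN queries return False/None):
import torch

print(torch.__version__)                 # PyTorch version
print(torch.cuda.is_available())         # True if a CUDA-capable GPU and driver are visible
print(torch.version.cuda)                # CUDA version this build was compiled against (or None)
print(torch.backends.cudnn.version())    # bundled cuDNN version (or None)
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0)) # name of the first GPU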
Tensors¶
In [151]:
import torch
import numpy as np
In [152]:
# Initializing tensors
# Directly from data
data = [[1, 2],[3, 4]]
x_data = torch.tensor(data)
# From a NumPy array
np_array = np.array(data)
# The resulting tensor and the original NumPy array share the same memory.
x_np = torch.from_numpy(np_array)
# With random or constant values
shape = (2,3,)
rand_tensor = torch.rand(shape)
ones_tensor = torch.ones(shape)
zeros_tensor = torch.zeros(shape)
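A small check of the shared-memory behaviour noted above (sketch; the value 99 is arbitrary):
np_array[0, 0] = 99   # modify the NumPy array in place
print(x_np)           # the change is visible in the tensor created by torch.from_numpy
print(x_data)         # torch.tensor(data) made a copy, so it is unaffected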
In [153]:
# Tensor attributes
tensor = torch.rand(3, 4)
print(f"Shape of tensor: {tensor.shape}")
print(f"Datatype of tensor: {tensor.dtype}")
print(f"Device tensor is stored on: {tensor.device}")
Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu
Operations on tensors¶
In [154]:
# We move our tensor to the GPU if available
if torch.cuda.is_available():
tensor = tensor.to('cuda')
# Standard NumPy-style indexing and slicing:
tensor = torch.ones(4, 4)
print(f"First row: {tensor[0]}")
print(f"First column: {tensor[:, 0]}")
print(f"Last column: {tensor[..., -1]}")
tensor[:,1] = 0
print(tensor)
First row: tensor([1., 1., 1., 1.])
First column: tensor([1., 1., 1., 1.])
Last column: tensor([1., 1., 1., 1.])
tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])
In [155]:
# Joining tensors: torch.cat concatenates a sequence of tensors along a given dimension
t1 = torch.cat([tensor, tensor, tensor], dim=1)
print(t1)
tensor([[1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.]])
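For comparison, torch.stack joins the same tensors along a new dimension instead of an existing one (a short sketch):
t2 = torch.stack([tensor, tensor, tensor], dim=0)
print(t2.shape)                                           # torch.Size([3, 4, 4]) -- a new leading dimension is created
print(torch.cat([tensor, tensor, tensor], dim=0).shape)   # torch.Size([12, 4]) -- an existing dimension grows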
In [156]:
# Arithmetic operations
# This computes the matrix multiplication between two tensors. y1, y2, y3 will have the same value
# ``tensor.T`` returns the transpose of a tensor
y1 = tensor @ tensor.T
y2 = tensor.matmul(tensor.T)
y3 = torch.rand_like(y1)
torch.matmul(tensor, tensor.T, out=y3)
# This computes the element-wise product. z1, z2, z3 will have the same value
z1 = tensor * tensor
z2 = tensor.mul(tensor)
z3 = torch.rand_like(tensor)
torch.mul(tensor, tensor, out=z3)
Out[156]:
tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])
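Two related operations worth knowing (a sketch, not part of the original cell): single-element tensors can be converted to a Python number with item(), and in-place operations carry a trailing underscore.
agg = tensor.sum()                # aggregate all elements into a single-element tensor
agg_item = agg.item()             # convert it to a plain Python number
print(agg_item, type(agg_item))   # 12.0 <class 'float'>

tensor.add_(5)                    # in-place: adds 5 to every element of `tensor` itself
print(tensor)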
Bridge with NumPy¶
In [157]:
# Tensor to NumPy array
t = torch.ones(5)
print(f"t: {t}")
n = t.numpy()
print(f"n: {n}")
t: tensor([1., 1., 1., 1., 1.])
n: [1. 1. 1. 1. 1.]
In [158]:
# NumPy array to tensor
n = np.ones(5)
t = torch.from_numpy(n)
Datasets and DataLoaders¶
- We load the FashionMNIST dataset with the following parameters: root is the path where the training/test data are stored, train selects the training or the test split, download=True downloads the data from the internet if it is not present under root, and transform / target_transform specify the feature and label transformations.
In [159]:
import torch
from torch.utils.data import Dataset
from torchvision import datasets
from torchvision.transforms import ToTensor
import matplotlib.pyplot as plt
import os
os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE"
training_data = datasets.FashionMNIST(
root="data",
train=True,
download=True,
transform=ToTensor()
)
test_data = datasets.FashionMNIST(
root="data",
train=False,
download=True,
transform=ToTensor()
)
print(type(training_data.data), len(training_data), len(test_data))
<class 'torch.Tensor'> 60000 10000
In [160]:
## Iterate over and visualize the dataset
labels_map = {
0: "T-Shirt", 1: "Trouser", 2: "Pullover", 3: "Dress", 4: "Coat",
5: "Sandal", 6: "Shirt", 7: "Sneaker", 8: "Bag", 9: "Ankle Boot",
}
figure = plt.figure(figsize=(8, 8))
cols, rows = 3, 3
for i in range(1, cols * rows + 1):
# .item() extracts the value of a single-element tensor as a Python scalar
sample_idx = torch.randint(len(training_data), size=(1,)).item()
img, label = training_data[sample_idx]
figure.add_subplot(rows, cols, i)
plt.title(labels_map[label])
plt.axis("off")
# squeeze() removes all axes of size 1 from the tensor (here the channel dimension)
plt.imshow(img.squeeze(), cmap="gray")
plt.show()
A custom Dataset class must implement three functions: __init__, __len__, and __getitem__.¶
- __init__: run once when the Dataset object is instantiated.
- __len__: returns the number of samples in the dataset.
- __getitem__: loads and returns the sample of the dataset at the given index idx.
In [161]:
import torch
from torch.utils.data import Dataset, DataLoader
import pandas as pd
class CustomDataset(Dataset):
def __init__(self, file_path):
# Read the CSV file
self.data = pd.read_csv(file_path)
# Split into features and labels
self.features = self.data.iloc[:, :-1].values # all columns except the last
self.labels = self.data.iloc[:, -1].values # the last column is used as the label
# Convert to PyTorch tensors
self.features = torch.tensor(self.features, dtype=torch.float32)
self.labels = torch.tensor(self.labels, dtype=torch.long)
def __len__(self):
return len(self.features)
def __getitem__(self, idx):
# Given an index, return one sample and its label
return self.features[idx], self.labels[idx]
In [162]:
# Example: load the data
dataset = CustomDataset('dataset_test.csv')
# Wrap the dataset in a DataLoader
dataloader = DataLoader(dataset, batch_size=4, shuffle=True)
# Access the data through the dataloader
for features, labels in dataloader:
print(features)
print(labels)
break
print(dataset[0])
tensor([[2., 3., 4.],
        [1., 2., 3.],
        [3., 4., 5.]])
tensor([1, 0, 0])
(tensor([1., 2., 3.]), tensor(0))
Preparing data for training with DataLoaders¶
- We typically want to pass samples in "minibatches" and reshuffle the data at every epoch to reduce model overfitting.
- A DataLoader is an iterable.
In [163]:
from torch.utils.data import DataLoader
train_dataloader = DataLoader(training_data, batch_size=64, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=64, shuffle=True)
In [ ]:
# Iterating through the DataLoader
# We have loaded the dataset into a DataLoader and can iterate through it as needed.
# Each iteration below returns a batch of train_features and train_labels (containing batch_size=64 features and labels respectively).
# Because we specified shuffle=True, the data is reshuffled after we iterate over all batches.
train_features, train_labels = next(iter(train_dataloader))
print(f"Feature batch shape: {train_features.size()}")
print(f"Labels batch shape: {train_labels.size()}")
img = train_features[0].squeeze()
label = train_labels[0]
plt.imshow(img, cmap="gray")
plt.title(str(label.item()))
plt.show()
Feature batch shape: torch.Size([64, 1, 28, 28])
Labels batch shape: torch.Size([64])
Transforms¶
- All TorchVision datasets have two parameters: transform to modify the features and target_transform to modify the labels.
In [ ]:
import torch
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda
ds = datasets.FashionMNIST(
root="data",
train=True,
download=True,
# ToTensor converts a PIL image or NumPy ndarray into a FloatTensor
# and scales the image's pixel intensity values into the range [0., 1.]
transform=ToTensor(),
# A Lambda transform applies any user-defined lambda function.
# Here we define a function that turns the integer label into a one-hot encoded tensor.
# It first creates a zero tensor of size 10 (the number of labels in our dataset),
# then calls scatter_, which assigns value=1 at the index given by y.
target_transform=Lambda(lambda y: torch.zeros(10, dtype=torch.float).scatter_(dim=0, index=torch.tensor(y), value=1))
)
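What the target_transform above does to a single label, shown in isolation (sketch; y = 3 is just an example value):
y = 3  # example class label
one_hot = torch.zeros(10, dtype=torch.float).scatter_(dim=0, index=torch.tensor(y), value=1)
print(one_hot)   # tensor([0., 0., 0., 1., 0., 0., 0., 0., 0., 0.])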
In [166]:
dsloader = torch.utils.data.DataLoader(ds, batch_size=1, shuffle=True)
data_iter = iter(dsloader)
images, labels = next(data_iter)
print(labels, images)
tensor([[0., 0., 0., 0., 0., 0., 1., 0., 0., 0.]])
[the corresponding 1x1x28x28 image tensor print is omitted here]
Build the neural network¶
- Neural networks are composed of layers/modules that perform operations on data. The torch.nn namespace provides all the building blocks you need to build your own neural network.
In [167]:
import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
# Get the training device
device = (
"cuda"
if torch.cuda.is_available()
else "mps"
if torch.backends.mps.is_available()
else "cpu"
)
print(f"Using {device} device")
Using cpu device
In [168]:
# Define the class: we define our neural network by subclassing nn.Module
class NeuralNetwork(nn.Module):
# Initialize the network layers in __init__
def __init__(self):
super().__init__()
# nn.Flatten: converts each 2D 28x28 image into a contiguous array of 784 pixel values
self.flatten = nn.Flatten()
self.linear_relu_stack = nn.Sequential(
nn.Linear(28*28, 512),
nn.ReLU(),
nn.Linear(512, 512),
nn.ReLU(),
nn.Linear(512, 10),
)
# Every nn.Module subclass implements the operations on input data in the forward method.
def forward(self, x):
x = self.flatten(x)
logits = self.linear_relu_stack(x)
return logits
In [169]:
model = NeuralNetwork().to(device)
print(model)
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
In [170]:
X = torch.rand(1, 28, 28, device=device)
logits = model(X)
pred_probab = nn.Softmax(dim=1)(logits)
y_pred = pred_probab.argmax()
print(f"Predicted class: {y_pred}")
Predicted class: 1
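nn.Softmax(dim=1) rescales the logits so that the values along dim=1 lie in [0, 1] and sum to 1; a quick check on the tensors from the cell above (sketch):
print(pred_probab.shape)        # torch.Size([1, 10]) -- one probability per class
print(pred_probab.sum(dim=1))   # sums to 1 along the class dimension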
In [171]:
# nn layers
# nn.Flatten: converts each 2D 28x28 image into a contiguous array of 784 pixel values
input_image = torch.rand(3,28,28)
print(input_image.size())
flatten = nn.Flatten()
flat_image = flatten(input_image)
print(flat_image.size())
torch.Size([3, 28, 28])
torch.Size([3, 784])
In [172]:
# nn.Linear
layer1 = nn.Linear(in_features=28*28, out_features=20)
hidden1 = layer1(flat_image)
print(hidden1.size())
torch.Size([3, 20])
In [173]:
# Activation function: nn.ReLU()
print(f"Before ReLU: {hidden1}\n\n")
hidden1 = nn.ReLU()(hidden1)
print(f"After ReLU: {hidden1}")
Before ReLU: tensor([[-0.0352, 0.3687, -0.0205, -0.8129, -0.1142, -0.1906, 0.1555, -0.1771, 0.1998, -0.8133, -0.4845, -0.1326, -0.2223, -0.0382, 0.1887, 0.3767, -0.1302, 0.1470, -0.3160, 0.5849], [ 0.0620, 0.1493, 0.0075, -0.7920, -0.5060, 0.2113, -0.1387, -0.2204, -0.1517, -0.6394, 0.0995, -0.0873, 0.0126, 0.0259, -0.1149, 0.3250, 0.0067, 0.3314, -0.1100, 0.3132], [-0.1156, 0.1825, 0.3135, -0.3895, -0.5712, -0.0657, 0.3998, -0.0390, 0.0115, -0.7960, -0.3482, -0.1874, -0.0615, 0.0939, 0.0515, 0.3261, 0.0203, 0.0515, -0.1541, 0.2378]], grad_fn=<AddmmBackward0>) After ReLU: tensor([[0.0000, 0.3687, 0.0000, 0.0000, 0.0000, 0.0000, 0.1555, 0.0000, 0.1998, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.1887, 0.3767, 0.0000, 0.1470, 0.0000, 0.5849], [0.0620, 0.1493, 0.0075, 0.0000, 0.0000, 0.2113, 0.0000, 0.0000, 0.0000, 0.0000, 0.0995, 0.0000, 0.0126, 0.0259, 0.0000, 0.3250, 0.0067, 0.3314, 0.0000, 0.3132], [0.0000, 0.1825, 0.3135, 0.0000, 0.0000, 0.0000, 0.3998, 0.0000, 0.0115, 0.0000, 0.0000, 0.0000, 0.0000, 0.0939, 0.0515, 0.3261, 0.0203, 0.0515, 0.0000, 0.2378]], grad_fn=<ReluBackward0>)
In [174]:
# nn.Sequential()
seq_modules = nn.Sequential(
flatten,
layer1,
nn.ReLU(),
nn.Linear(20, 10)
)
input_image = torch.rand(3,28,28)
logits = seq_modules(input_image)
print(logits.shape)
torch.Size([3, 10])
In [175]:
# Model parameters
# Many layers inside a neural network are parameterized, i.e. they have associated weights and biases that are optimized during training.
# Subclassing nn.Module automatically tracks all fields defined inside your model object, and makes all parameters accessible via the model's parameters() or named_parameters() methods.
print(f"Model structure: {model}\n\n")
for name, param in model.named_parameters():
print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n")
Model structure: NeuralNetwork( (flatten): Flatten(start_dim=1, end_dim=-1) (linear_relu_stack): Sequential( (0): Linear(in_features=784, out_features=512, bias=True) (1): ReLU() (2): Linear(in_features=512, out_features=512, bias=True) (3): ReLU() (4): Linear(in_features=512, out_features=10, bias=True) ) ) Layer: linear_relu_stack.0.weight | Size: torch.Size([512, 784]) | Values : tensor([[ 0.0108, 0.0289, 0.0215, ..., -0.0038, 0.0029, 0.0062], [-0.0105, -0.0167, -0.0166, ..., -0.0103, -0.0068, -0.0346]], grad_fn=<SliceBackward0>) Layer: linear_relu_stack.0.bias | Size: torch.Size([512]) | Values : tensor([-0.0354, -0.0114], grad_fn=<SliceBackward0>) Layer: linear_relu_stack.2.weight | Size: torch.Size([512, 512]) | Values : tensor([[ 0.0196, -0.0271, 0.0337, ..., -0.0114, -0.0305, 0.0041], [-0.0239, 0.0228, 0.0216, ..., 0.0061, 0.0115, 0.0421]], grad_fn=<SliceBackward0>) Layer: linear_relu_stack.2.bias | Size: torch.Size([512]) | Values : tensor([-0.0142, -0.0071], grad_fn=<SliceBackward0>) Layer: linear_relu_stack.4.weight | Size: torch.Size([10, 512]) | Values : tensor([[-0.0285, 0.0152, 0.0152, ..., -0.0055, 0.0204, 0.0295], [ 0.0299, -0.0214, -0.0198, ..., -0.0019, -0.0048, -0.0259]], grad_fn=<SliceBackward0>) Layer: linear_relu_stack.4.bias | Size: torch.Size([10]) | Values : tensor([-0.0437, 0.0105], grad_fn=<SliceBackward0>)
Automatic differentiation with torch.autograd¶
- When training neural networks, the most frequently used algorithm is back propagation. In this algorithm, the parameters (model weights) are adjusted according to the gradient of the loss function with respect to the given parameter.
- To compute those gradients, PyTorch has a built-in differentiation engine called torch.autograd.
In [176]:
import torch
x = torch.ones(5) # input tensor
y = torch.zeros(3) # expected output
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w)+b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
In [177]:
## Computing gradients
# To compute these derivatives, we call loss.backward()
# and then read the values from w.grad and b.grad
loss.backward()
print(w.grad)
print(b.grad)
tensor([[0.1218, 0.2512, 0.0263],
        [0.1218, 0.2512, 0.0263],
        [0.1218, 0.2512, 0.0263],
        [0.1218, 0.2512, 0.0263],
        [0.1218, 0.2512, 0.0263]])
tensor([0.1218, 0.2512, 0.0263])
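For this particular loss the gradient also has a simple closed form: the gradient with respect to z is (sigmoid(z) - y) / 3 (the mean over the three outputs), and w.grad is the outer product of x with that vector. A quick sanity check reusing x, y, z, w, b from above (sketch):
with torch.no_grad():
    manual_grad_z = (torch.sigmoid(z) - y) / 3      # d loss / d z, shape (3,)
    manual_grad_w = torch.outer(x, manual_grad_z)   # d loss / d w, shape (5, 3)
print(torch.allclose(manual_grad_w, w.grad))        # True
print(torch.allclose(manual_grad_z, b.grad))        # True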
In [178]:
# Disabling gradient tracking
# There are cases when we do not need gradient tracking,
# for example when the model has already been trained and we only want to apply it to some input data,
# i.e. we only want to do forward computations through the network.
# We can stop tracking computations by wrapping the computation code in a torch.no_grad() block:
z = torch.matmul(x, w)+b
print(z.requires_grad)
with torch.no_grad():
z = torch.matmul(x, w)+b
print(z.requires_grad)
True
False
In [179]:
# Another way to achieve the same result is to use the detach() method on the tensor:
z = torch.matmul(x, w)+b
z = z.detach()
print(z.requires_grad)
False
Optimizing model parameters¶
In [180]:
# Prerequisite code
# We reuse the code from the earlier sections on Datasets & DataLoaders and on building the model.
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
training_data = datasets.FashionMNIST(
root="data",
train=True,
download=True,
transform=ToTensor()
)
test_data = datasets.FashionMNIST(
root="data",
train=False,
download=True,
transform=ToTensor()
)
In [181]:
train_dataloader = DataLoader(training_data, batch_size=64)
test_dataloader = DataLoader(test_data, batch_size=64)
class NeuralNetwork(nn.Module):
def __init__(self):
super().__init__()
self.flatten = nn.Flatten()
self.linear_relu_stack = nn.Sequential(
nn.Linear(28*28, 512),
nn.ReLU(),
nn.Linear(512, 512),
nn.ReLU(),
nn.Linear(512, 10),
)
def forward(self, x):
x = self.flatten(x)
logits = self.linear_relu_stack(x)
return logits
model = NeuralNetwork()
Hyperparameters¶
- Hyperparameters are adjustable parameters that let you control the model optimization process. Different hyperparameter values can impact model training and convergence rates.
- We define the following hyperparameters for training:
- Number of epochs: the number of times to iterate over the dataset.
- Batch size: the number of data samples propagated through the network before the parameters are updated.
- Learning rate: how much to update model parameters at each batch/epoch. Smaller values yield slow learning speed, while large values may result in unpredictable behavior during training.
In [182]:
learning_rate = 1e-3
batch_size = 64
epochs = 5
Loss function¶
- When presented with some training data, an untrained network is likely not to give the correct answer.
- The loss function measures the degree of dissimilarity between the obtained result and the target value, and it is the loss function that we want to minimize during training.
- To calculate the loss we make a prediction using the inputs of a given data sample and compare it against the true data label value.
In [183]:
# Initialize the loss function
loss_fn = nn.CrossEntropyLoss()
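nn.CrossEntropyLoss takes raw logits and internally combines nn.LogSoftmax and nn.NLLLoss; a small equivalence check with made-up logits and labels (sketch):
logits = torch.randn(4, 10)            # a batch of 4 samples, 10 classes
targets = torch.randint(0, 10, (4,))   # integer class labels
loss_a = nn.CrossEntropyLoss()(logits, targets)
loss_b = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)
print(torch.allclose(loss_a, loss_b))  # True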
Optimizer¶
- Optimization is the process of adjusting model parameters to reduce model error in each training step.
- Optimization algorithms define how this process is performed (in this example we use Stochastic Gradient Descent).
- All optimization logic is encapsulated in the optimizer object. Here we use the SGD optimizer;
- in addition, PyTorch provides many other optimizers, such as Adam and RMSProp, that work better for different kinds of models and data (see the sketch after the next cell).
In [184]:
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
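Switching optimizers only changes this one line; for example Adam (sketch; lr=1e-3 is just a common default, and the SGD optimizer is re-created before training below):
# Alternative optimizer: Adam, which adapts the learning rate per parameter
adam_optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)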
- We define a train_loop that loops over our optimization code,
In [185]:
def train_loop(dataloader, model, loss_fn, optimizer):
size = len(dataloader.dataset)
# Set the model to training mode - important for batch normalization and dropout layers
# Unnecessary in this situation but added for best practices
model.train()
for batch, (X, y) in enumerate(dataloader):
# Compute prediction and loss
pred = model(X)
loss = loss_fn(pred, y)
# Backpropagation
loss.backward() # compute gradients
optimizer.step() # update parameters
optimizer.zero_grad() # reset the gradients to zero
if batch % 100 == 0:
loss, current = loss.item(), batch * batch_size + len(X)
print(f"loss: {loss:>7f} [{current:>5d}/{size:>5d}]")
- and a test_loop that evaluates the model's performance against our test data.
In [186]:
def test_loop(dataloader, model, loss_fn):
# Set the model to evaluation mode - important for batch normalization and dropout layers
# Unnecessary in this situation but added for best practices
model.eval()
size = len(dataloader.dataset)
num_batches = len(dataloader)
test_loss, correct = 0, 0
# Evaluating the model with torch.no_grad() ensures that no gradients are computed during test mode
# also serves to reduce unnecessary gradient computations and memory usage for tensors with requires_grad=True
with torch.no_grad():
for X, y in dataloader:
pred = model(X)
test_loss += loss_fn(pred, y).item()
correct += (pred.argmax(1) == y).type(torch.float).sum().item()
test_loss /= num_batches
correct /= size
print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
In [187]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
epochs = 10
for t in range(epochs):
print(f"Epoch {t+1}\n-------------------------------")
train_loop(train_dataloader, model, loss_fn, optimizer)
test_loop(test_dataloader, model, loss_fn)
print("Done!")
Epoch 1 ------------------------------- loss: 2.308522 [ 64/60000] loss: 2.295923 [ 6464/60000] loss: 2.268579 [12864/60000] loss: 2.270369 [19264/60000] loss: 2.253008 [25664/60000] loss: 2.225763 [32064/60000] loss: 2.237298 [38464/60000] loss: 2.199639 [44864/60000] loss: 2.203797 [51264/60000] loss: 2.180187 [57664/60000] Test Error: Accuracy: 44.2%, Avg loss: 2.164722 Epoch 2 ------------------------------- loss: 2.179605 [ 64/60000] loss: 2.172289 [ 6464/60000] loss: 2.107276 [12864/60000] loss: 2.126005 [19264/60000] loss: 2.080749 [25664/60000] loss: 2.023450 [32064/60000] loss: 2.055069 [38464/60000] loss: 1.974781 [44864/60000] loss: 1.986132 [51264/60000] loss: 1.924118 [57664/60000] Test Error: Accuracy: 48.6%, Avg loss: 1.912762 Epoch 3 ------------------------------- loss: 1.950278 [ 64/60000] loss: 1.927295 [ 6464/60000] loss: 1.799702 [12864/60000] loss: 1.842861 [19264/60000] loss: 1.737014 [25664/60000] loss: 1.688945 [32064/60000] loss: 1.715796 [38464/60000] loss: 1.612278 [44864/60000] loss: 1.647766 [51264/60000] loss: 1.544659 [57664/60000] Test Error: Accuracy: 57.3%, Avg loss: 1.554464 Epoch 4 ------------------------------- loss: 1.622964 [ 64/60000] loss: 1.594595 [ 6464/60000] loss: 1.431038 [12864/60000] loss: 1.506117 [19264/60000] loss: 1.386950 [25664/60000] loss: 1.381334 [32064/60000] loss: 1.393124 [38464/60000] loss: 1.318907 [44864/60000] loss: 1.363219 [51264/60000] loss: 1.254118 [57664/60000] Test Error: Accuracy: 61.9%, Avg loss: 1.284583 Epoch 5 ------------------------------- loss: 1.362799 [ 64/60000] loss: 1.350789 [ 6464/60000] loss: 1.173308 [12864/60000] loss: 1.279025 [19264/60000] loss: 1.154825 [25664/60000] loss: 1.181451 [32064/60000] loss: 1.192610 [38464/60000] loss: 1.137865 [44864/60000] loss: 1.185223 [51264/60000] loss: 1.086521 [57664/60000] Test Error: Accuracy: 63.6%, Avg loss: 1.116825 Epoch 6 ------------------------------- loss: 1.188814 [ 64/60000] loss: 1.195889 [ 6464/60000] loss: 1.002661 [12864/60000] loss: 1.137915 [19264/60000] loss: 1.010257 [25664/60000] loss: 1.046437 [32064/60000] loss: 1.069733 [38464/60000] loss: 1.022061 [44864/60000] loss: 1.069942 [51264/60000] loss: 0.982505 [57664/60000] Test Error: Accuracy: 65.0%, Avg loss: 1.008274 Epoch 7 ------------------------------- loss: 1.067001 [ 64/60000] loss: 1.094617 [ 6464/60000] loss: 0.884778 [12864/60000] loss: 1.043996 [19264/60000] loss: 0.918539 [25664/60000] loss: 0.949302 [32064/60000] loss: 0.989025 [38464/60000] loss: 0.944871 [44864/60000] loss: 0.988496 [51264/60000] loss: 0.912483 [57664/60000] Test Error: Accuracy: 66.2%, Avg loss: 0.933203 Epoch 8 ------------------------------- loss: 0.976123 [ 64/60000] loss: 1.022986 [ 6464/60000] loss: 0.799399 [12864/60000] loss: 0.976859 [19264/60000] loss: 0.856569 [25664/60000] loss: 0.876091 [32064/60000] loss: 0.931678 [38464/60000] loss: 0.891402 [44864/60000] loss: 0.927947 [51264/60000] loss: 0.861693 [57664/60000] Test Error: Accuracy: 67.3%, Avg loss: 0.878109 Epoch 9 ------------------------------- loss: 0.905446 [ 64/60000] loss: 0.968479 [ 6464/60000] loss: 0.734980 [12864/60000] loss: 0.926013 [19264/60000] loss: 0.812019 [25664/60000] loss: 0.819447 [32064/60000] loss: 0.888065 [38464/60000] loss: 0.853141 [44864/60000] loss: 0.881495 [51264/60000] loss: 0.822642 [57664/60000] Test Error: Accuracy: 68.5%, Avg loss: 0.835817 Epoch 10 ------------------------------- loss: 0.848743 [ 64/60000] loss: 0.924670 [ 6464/60000] loss: 0.684697 [12864/60000] loss: 0.886120 [19264/60000] loss: 0.778283 
[25664/60000] loss: 0.775283 [32064/60000] loss: 0.853050 [38464/60000] loss: 0.824834 [44864/60000] loss: 0.844829 [51264/60000] loss: 0.791272 [57664/60000] Test Error: Accuracy: 70.0%, Avg loss: 0.802104 Done!
Saving and loading the model¶
- The model class must be defined (importable) when loading the model this way, because the saved object ties the weights to the model's structure. A state_dict-based alternative is sketched after the cells below.
In [188]:
torch.save(model, 'model.pth')
In [189]:
model = torch.load('model.pth', weights_only=False)
model
Out[189]:
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
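An often-preferred alternative is to save only the learned parameters (the state_dict) instead of pickling the whole model object; the architecture is then re-created before loading (sketch; the file name model_weights.pth is arbitrary):
torch.save(model.state_dict(), 'model_weights.pth')       # save parameters only
model2 = NeuralNetwork()                                   # re-create the architecture
model2.load_state_dict(torch.load('model_weights.pth'))    # load the weights into it
model2.eval()                                              # evaluation mode for inference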
Python Data Processing, Lecture 14: PyTorch. Lecturer: 丁平尖