Python Data Processing
¶

14. PyTorch
¶

Lecturer: 丁平尖

PyTorch¶

  • An open-source deep learning framework
    • Released by Facebook in 2016
  • https://pytorch.org/
  • Tutorials
    • https://pytorch.org/tutorials/beginner/basics/intro.html

Installing PyTorch¶

  • CPU version (a typical install command is shown below)

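  • A typical CPU-only install with pip looks like the line below; this mirrors the command generated by the selector on pytorch.org, so use the selector there to get the exact command for your OS and Python version:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu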

Installing PyTorch¶

  • CUDA version
  • https://cloud.tencent.com/developer/article/2158333
  • cuDNN (CUDA Deep Neural Network Library) is a GPU-accelerated library developed by NVIDIA, designed specifically for the computational needs of deep learning.
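  • After installing a CUDA build, a quick way to verify that PyTorch can see the GPU and cuDNN is the small sketch below (all standard torch calls):

import torch
print(torch.__version__)                # PyTorch version
print(torch.cuda.is_available())        # True if a CUDA-capable GPU and driver are visible
print(torch.version.cuda)               # CUDA version the binary was built with (None for CPU-only builds)
print(torch.backends.cudnn.version())   # cuDNN version bundled with this build
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of the first GPU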

Tensors¶

  • Tensors are a specialized data structure that are very similar to arrays and matrices.


Tensors¶

In [151]:
import torch
import numpy as np
In [152]:
# Initializing a tensor
# Directly from data
data = [[1, 2],[3, 4]]
x_data = torch.tensor(data)
# From a NumPy array
np_array = np.array(data)
# The resulting tensor and the original NumPy array share the same memory.
x_np = torch.from_numpy(np_array)
# With random or constant values
shape = (2,3,)
rand_tensor = torch.rand(shape)
ones_tensor = torch.ones(shape)
zeros_tensor = torch.zeros(shape)
In [153]:
# Tensor attributes
tensor = torch.rand(3, 4)
print(f"Shape of tensor: {tensor.shape}")
print(f"Datatype of tensor: {tensor.dtype}")
print(f"Device tensor is stored on: {tensor.device}")
Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu

Operations on Tensors¶

In [154]:
# We move our tensor to the GPU if available
if torch.cuda.is_available():
    tensor = tensor.to('cuda')
# Standard NumPy-style indexing and slicing:
tensor = torch.ones(4, 4)
print(f"First row: {tensor[0]}")
print(f"First column: {tensor[:, 0]}")
print(f"Last column: {tensor[..., -1]}")
tensor[:,1] = 0
print(tensor)
First row: tensor([1., 1., 1., 1.])
First column: tensor([1., 1., 1., 1.])
Last column: tensor([1., 1., 1., 1.])
tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])
In [155]:
# Joining tensors: you can use torch.cat to concatenate a sequence of tensors along a given dimension
t1 = torch.cat([tensor, tensor, tensor], dim=1)
print(t1)
tensor([[1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.]])
In [156]:
# Arithmetic operations
# This computes the matrix multiplication between two tensors. y1, y2, y3 will have the same value
# ``tensor.T`` returns the transpose of a tensor
y1 = tensor @ tensor.T
y2 = tensor.matmul(tensor.T)

y3 = torch.rand_like(y1)
torch.matmul(tensor, tensor.T, out=y3)

# This computes the element-wise product. z1, z2, z3 will have the same value
z1 = tensor * tensor
z2 = tensor.mul(tensor)

z3 = torch.rand_like(tensor)
torch.mul(tensor, tensor, out=z3)
Out[156]:
tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])

Bridge with NumPy¶

In [157]:
# Tensor to NumPy array
t = torch.ones(5)
print(f"t: {t}")
n = t.numpy()
print(f"n: {n}")
t: tensor([1., 1., 1., 1., 1.])
n: [1. 1. 1. 1. 1.]
In [158]:
# NumPy array to tensor
n = np.ones(5)
t = torch.from_numpy(n)
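  • Because from_numpy (and .numpy()) share memory with the source array, a change on one side shows up on the other. A minimal sketch, continuing from the cell above:

np.add(n, 1, out=n)   # modify the NumPy array in place
print(f"t: {t}")      # the tensor reflects the change
print(f"n: {n}")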

Datasets and DataLoaders¶

  • Load the FashionMNIST dataset with the following parameters:
    • root is the path where the training/test data is stored,
    • train specifies the training or test dataset,
    • download=True downloads the data from the internet if it is not available at root,
    • transform and target_transform specify the feature and label transformations.
In [159]:
import torch
from torch.utils.data import Dataset
from torchvision import datasets
from torchvision.transforms import ToTensor
import matplotlib.pyplot as plt
import os
# Work around the "duplicate OpenMP runtime" (libiomp) error that can occur when PyTorch and matplotlib are loaded together
os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE"

training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)
print(type(training_data.data), len(training_data), len(test_data))
<class 'torch.Tensor'> 60000 10000
In [160]:
## Iterating and visualizing the dataset
labels_map = {
    0: "T-Shirt", 1: "Trouser", 2: "Pullover", 3: "Dress", 4: "Coat", 
    5: "Sandal", 6: "Shirt", 7: "Sneaker", 8: "Bag", 9: "Ankle Boot",
}
figure = plt.figure(figsize=(8, 8))
cols, rows = 3, 3
for i in range(1, cols * rows + 1):
    # .item() extracts the value of a single-element tensor as a Python scalar
    sample_idx = torch.randint(len(training_data), size=(1,)).item()
    img, label = training_data[sample_idx]
    figure.add_subplot(rows, cols, i)
    plt.title(labels_map[label])
    plt.axis("off")
    # squeeze() removes all axes of size 1 (singleton dimensions) from the tensor
    plt.imshow(img.squeeze(), cmap="gray")
plt.show()
(Output: a 3×3 grid of random FashionMNIST training images, each titled with its class label.)

A custom Dataset class must implement three functions: __init__, __len__, and __getitem__.¶

  • __init__: runs once when the Dataset object is instantiated
  • __len__: returns the number of samples in the dataset
  • __getitem__: loads and returns the sample from the dataset at the given index idx
In [161]:
import torch
from torch.utils.data import Dataset, DataLoader
import pandas as pd

class CustomDataset(Dataset):
    def __init__(self, file_path):
        # Read the CSV file
        self.data = pd.read_csv(file_path)
        # Split features and labels
        self.features = self.data.iloc[:, :-1].values  # all columns except the last
        self.labels = self.data.iloc[:, -1].values  # the last column is the label
        # Convert to PyTorch tensors
        self.features = torch.tensor(self.features, dtype=torch.float32)
        self.labels = torch.tensor(self.labels, dtype=torch.long)
    
    def __len__(self):
        return len(self.features)
    
    def __getitem__(self, idx):
        # Given an index, return one sample and its label
        return self.features[idx], self.labels[idx]
In [162]:
# Example: load the data
dataset = CustomDataset('dataset_test.csv')
# Wrap the dataset in a DataLoader
dataloader = DataLoader(dataset, batch_size=4, shuffle=True)

# Access the data through the dataloader
for features, labels in dataloader:
    print(features)
    print(labels)
    break

print(dataset[0])
tensor([[2., 3., 4.],
        [1., 2., 3.],
        [3., 4., 5.]])
tensor([1, 0, 0])
(tensor([1., 2., 3.]), tensor(0))

Preparing training data with DataLoaders¶

  • We usually want to pass samples in "minibatches" and reshuffle the data at every epoch to reduce model overfitting.
  • DataLoader is an iterable object.
In [163]:
from torch.utils.data import DataLoader

train_dataloader = DataLoader(training_data, batch_size=64, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=64, shuffle=True)
In [ ]:
# Iterating through the DataLoader
# We have loaded the dataset into the DataLoader and can iterate through it as needed.
# Each iteration below returns a batch of train_features and train_labels (containing batch_size=64 features and labels, respectively).
# Because we specified shuffle=True, the data is shuffled after we iterate over all batches.
train_features, train_labels = next(iter(train_dataloader))
print(f"Feature batch shape: {train_features.size()}")
print(f"Labels batch shape: {train_labels.size()}")
img = train_features[0].squeeze()
label = train_labels[0]
plt.imshow(img, cmap="gray")
plt.title(str(label.item()))
plt.show()
Feature batch shape: torch.Size([64, 1, 28, 28])
Labels batch shape: torch.Size([64])
(Output: a single FashionMNIST image from the batch, shown in grayscale with its label as the title.)

Transforms¶

  • All TorchVision datasets have two parameters: transform to modify the features and target_transform to modify the labels.
In [ ]:
import torch
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda

ds = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    # ToTensor converts a PIL image or NumPy ndarray into a FloatTensor
    # and scales the image's pixel intensity values into the range [0., 1.]
    transform=ToTensor(),
    # A Lambda transform applies any user-defined lambda function.
    # Here we define a function that turns an integer into a one-hot encoded tensor.
    # It first creates a zero tensor of size 10 (the number of labels in our dataset),
    # then calls scatter_, which assigns value=1 at the position given by index y.
    target_transform=Lambda(lambda y: torch.zeros(10, dtype=torch.float).scatter_(dim=0, index=torch.tensor(y), value=1))
)
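  • To see what the Lambda target transform above produces, you can run the same scatter_ expression on a single, hypothetical label by hand:

y = 3  # example label
one_hot = torch.zeros(10, dtype=torch.float).scatter_(dim=0, index=torch.tensor(y), value=1)
print(one_hot)  # a length-10 vector with 1. at position 3 and 0. elsewhere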
In [166]:
dsloader =torch.utils.data.DataLoader(ds, batch_size=1, shuffle=True)
data_iter = iter(dsloader)
images, labels = next(data_iter)
print(labels, images)
tensor([[0., 0., 0., 0., 0., 0., 1., 0., 0., 0.]]) tensor([[[[0.0000, 0.0000, 0.0000, 0.0078, 0.0000, 0.0000, 0.0000, 0.0000,
           0.0000, 0.0000, 0.0000, 0.0000, 0.0471, 0.1137, 0.2078, 0.1804,
           0.5882, 0.4824, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
           0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
           0.0000, 0.0235, 0.0824, 0.4549, 0.3882, 0.2824, 0.6078, 0.6314,
           0.4980, 0.0000, 0.0549, 0.0235, 0.0000, 0.0000, 0.0000, 0.0000,
           0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0980,
           0.2000, 0.2078, 0.2235, 0.2471, 0.4824, 0.0000, 1.0000, 0.4314,
           0.0000, 0.0314, 0.1216, 0.1137, 0.0902, 0.0902, 0.0471, 0.0000,
           0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.2000, 0.1882,
           0.1333, 0.0980, 0.1490, 0.0824, 0.1490, 0.0000, 0.9647, 0.2549,
           0.0000, 0.0549, 0.0314, 0.0667, 0.0471, 0.1137, 0.1137, 0.0392,
           0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.1216, 0.1412, 0.1137,
           0.1059, 0.1216, 0.1059, 0.1333, 0.0745, 0.0549, 0.0824, 0.0078,
           0.0392, 0.0392, 0.0745, 0.0078, 0.0549, 0.0667, 0.0392, 0.0392,
           0.0157, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.2314, 0.1059, 0.0902,
           0.1216, 0.1137, 0.0980, 0.1059, 0.0980, 0.0392, 0.2980, 0.0157,
           0.0392, 0.0000, 0.0392, 0.0745, 0.0471, 0.0157, 0.0314, 0.0392,
           0.0157, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000, 0.0157, 0.1725, 0.2314, 0.0980,
           0.1333, 0.0980, 0.1137, 0.1059, 0.0667, 0.0667, 0.3059, 0.0000,
           0.0471, 0.0392, 0.0824, 0.1490, 0.0000, 0.0549, 0.0392, 0.0471,
           0.0235, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000, 0.0902, 0.0824, 0.2078, 0.2980,
           0.0824, 0.1216, 0.1059, 0.1137, 0.0314, 0.0980, 0.1490, 0.0078,
           0.0902, 0.0314, 0.0314, 0.0314, 0.0235, 0.0157, 0.0392, 0.0471,
           0.0314, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000, 0.1137, 0.0745, 0.1333, 0.5137,
           0.0471, 0.1412, 0.0745, 0.0667, 0.0392, 0.0824, 0.1490, 0.0392,
           0.0549, 0.0392, 0.0471, 0.0078, 0.0078, 0.0392, 0.1412, 0.0314,
           0.0549, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000, 0.1059, 0.0667, 0.0471, 0.5137,
           0.0824, 0.1725, 0.0745, 0.0745, 0.0549, 0.0902, 0.1216, 0.0157,
           0.0392, 0.0314, 0.0157, 0.0157, 0.0157, 0.0902, 0.1490, 0.0157,
           0.0235, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000, 0.1333, 0.0745, 0.0000, 0.6157,
           0.1804, 0.1490, 0.0745, 0.0745, 0.0667, 0.1137, 0.1059, 0.0078,
           0.0392, 0.0392, 0.0157, 0.0235, 0.0000, 0.1216, 0.1804, 0.0157,
           0.0235, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0235, 0.1647, 0.0902, 0.0000, 0.7333,
           0.2980, 0.1137, 0.0667, 0.0667, 0.0745, 0.1137, 0.1137, 0.0157,
           0.0471, 0.0549, 0.0314, 0.0314, 0.0000, 0.1216, 0.2157, 0.0157,
           0.0314, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0235, 0.1412, 0.0824, 0.0000, 0.5725,
           0.2549, 0.1412, 0.0667, 0.0667, 0.0824, 0.1216, 0.1490, 0.0157,
           0.0314, 0.0314, 0.0235, 0.0235, 0.0000, 0.1216, 0.2314, 0.0157,
           0.0314, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0392, 0.1216, 0.0549, 0.0667, 0.2824,
           0.0902, 0.2078, 0.0549, 0.0980, 0.0667, 0.1137, 0.1490, 0.0157,
           0.0314, 0.0392, 0.0314, 0.0235, 0.0000, 0.0745, 0.2000, 0.0392,
           0.0392, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0471, 0.1137, 0.0667, 0.1216, 0.1490,
           0.1412, 0.2000, 0.0471, 0.1137, 0.0392, 0.0902, 0.1490, 0.0157,
           0.0314, 0.0392, 0.0471, 0.0157, 0.0000, 0.0235, 0.1804, 0.0824,
           0.0471, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0824, 0.1137, 0.0549, 0.1569, 0.0667,
           0.1490, 0.1725, 0.0471, 0.1490, 0.0471, 0.1059, 0.1490, 0.0157,
           0.0392, 0.0314, 0.0549, 0.0235, 0.0078, 0.0000, 0.1569, 0.1059,
           0.0667, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0980, 0.1059, 0.0745, 0.1216, 0.0000,
           0.0980, 0.1490, 0.0824, 0.1725, 0.0078, 0.1804, 0.1725, 0.0078,
           0.0549, 0.0078, 0.0549, 0.0471, 0.0157, 0.0000, 0.1059, 0.1569,
           0.0902, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0902, 0.0902, 0.0745, 0.0902, 0.0000,
           0.0980, 0.1412, 0.1333, 0.1569, 0.0078, 0.1216, 0.1490, 0.0078,
           0.0549, 0.0157, 0.0471, 0.0392, 0.0235, 0.0000, 0.0745, 0.1725,
           0.0824, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.1137, 0.0980, 0.0824, 0.0980, 0.0000,
           0.0902, 0.1647, 0.1412, 0.1216, 0.0157, 0.1137, 0.1333, 0.0078,
           0.0549, 0.0235, 0.0392, 0.0314, 0.0235, 0.0000, 0.0549, 0.1804,
           0.0824, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.1333, 0.1137, 0.0902, 0.0667, 0.0000,
           0.0980, 0.1490, 0.1137, 0.1569, 0.0000, 0.2902, 0.1647, 0.0000,
           0.0549, 0.0471, 0.0314, 0.0314, 0.0314, 0.0000, 0.0314, 0.2078,
           0.0824, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.1412, 0.1059, 0.0980, 0.0235, 0.0000,
           0.1333, 0.1490, 0.1216, 0.1569, 0.0000, 0.2471, 0.2314, 0.0000,
           0.0235, 0.0667, 0.0471, 0.0314, 0.0392, 0.0000, 0.0000, 0.2392,
           0.0667, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.1333, 0.1137, 0.0902, 0.0000, 0.0000,
           0.1490, 0.1804, 0.1412, 0.1137, 0.0000, 0.2000, 0.2549, 0.0157,
           0.0157, 0.0392, 0.0392, 0.0392, 0.0549, 0.0000, 0.0000, 0.2314,
           0.0980, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.1333, 0.1216, 0.0824, 0.0000, 0.0000,
           0.1804, 0.2000, 0.1647, 0.1137, 0.0078, 0.3137, 0.1804, 0.0314,
           0.0392, 0.0471, 0.0471, 0.0314, 0.0667, 0.0235, 0.0000, 0.1725,
           0.1569, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.1333, 0.1333, 0.0667, 0.0000, 0.0157,
           0.2549, 0.1569, 0.1569, 0.1333, 0.0078, 0.3882, 0.1137, 0.0235,
           0.0471, 0.0392, 0.0314, 0.0314, 0.0549, 0.0314, 0.0000, 0.1137,
           0.1647, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.1137, 0.1216, 0.0667, 0.0000, 0.0549,
           0.2471, 0.1647, 0.1569, 0.1333, 0.0157, 0.3216, 0.2157, 0.0471,
           0.0392, 0.0549, 0.0314, 0.0392, 0.0549, 0.0667, 0.0000, 0.1059,
           0.2078, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.1725, 0.1647, 0.0471, 0.0000, 0.0824,
           0.3216, 0.1647, 0.1490, 0.1333, 0.0235, 0.2392, 0.2902, 0.0745,
           0.0157, 0.0471, 0.0549, 0.0549, 0.0667, 0.0902, 0.0000, 0.0824,
           0.2392, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.1216, 0.1216, 0.0471, 0.0000, 0.0000,
           0.0824, 0.2392, 0.2235, 0.1569, 0.0667, 0.1569, 0.3333, 0.1216,
           0.0471, 0.0745, 0.0745, 0.0667, 0.0314, 0.0235, 0.0000, 0.0667,
           0.2902, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
           0.0000, 0.0078, 0.0980, 0.0745, 0.0745, 0.1216, 0.2745, 0.1333,
           0.0471, 0.0549, 0.0235, 0.0078, 0.0000, 0.0000, 0.0000, 0.0000,
           0.0078, 0.0000, 0.0000, 0.0000]]]])

Build the Neural Network¶

  • A neural network is made up of layers/modules that perform operations on data. The torch.nn namespace provides all the building blocks you need to build your own neural network.
In [167]:
import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
# Get the device for training
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Using {device} device")
Using cpu device
In [168]:
# Define the class: we define our neural network by subclassing nn.Module
class NeuralNetwork(nn.Module):
    # The network layers are initialized in __init__
    def __init__(self):
        super().__init__()
        # nn.Flatten: converts each 2D 28x28 image into a contiguous array of 784 pixel values
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )
    # Every nn.Module subclass implements the operations on the input data in its forward method.
    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits
In [169]:
model = NeuralNetwork().to(device)
print(model)
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
In [170]:
X = torch.rand(1, 28, 28, device=device)
logits = model(X)
pred_probab = nn.Softmax(dim=1)(logits)
y_pred = pred_probab.argmax()
print(f"Predicted class: {y_pred}")
Predicted class: 1
In [171]:
# nn.layer
# nn.Flatten: converts each 2D 28x28 image into a contiguous array of 784 pixel values
input_image = torch.rand(3,28,28)
print(input_image.size())

flatten = nn.Flatten()
flat_image = flatten(input_image)
print(flat_image.size())
torch.Size([3, 28, 28])
torch.Size([3, 784])
In [172]:
# nn.Linear
layer1 = nn.Linear(in_features=28*28, out_features=20)
hidden1 = layer1(flat_image)
print(hidden1.size())
torch.Size([3, 20])
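  • nn.Linear stores its learnable parameters as weight, with shape [out_features, in_features], and bias, with shape [out_features]; you can inspect them directly:

print(layer1.weight.shape)  # torch.Size([20, 784])
print(layer1.bias.shape)    # torch.Size([20])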
In [173]:
# Activation function: nn.ReLU()
print(f"Before ReLU: {hidden1}\n\n")
hidden1 = nn.ReLU()(hidden1)
print(f"After ReLU: {hidden1}")
Before ReLU: tensor([[-0.0352,  0.3687, -0.0205, -0.8129, -0.1142, -0.1906,  0.1555, -0.1771,
          0.1998, -0.8133, -0.4845, -0.1326, -0.2223, -0.0382,  0.1887,  0.3767,
         -0.1302,  0.1470, -0.3160,  0.5849],
        [ 0.0620,  0.1493,  0.0075, -0.7920, -0.5060,  0.2113, -0.1387, -0.2204,
         -0.1517, -0.6394,  0.0995, -0.0873,  0.0126,  0.0259, -0.1149,  0.3250,
          0.0067,  0.3314, -0.1100,  0.3132],
        [-0.1156,  0.1825,  0.3135, -0.3895, -0.5712, -0.0657,  0.3998, -0.0390,
          0.0115, -0.7960, -0.3482, -0.1874, -0.0615,  0.0939,  0.0515,  0.3261,
          0.0203,  0.0515, -0.1541,  0.2378]], grad_fn=<AddmmBackward0>)


After ReLU: tensor([[0.0000, 0.3687, 0.0000, 0.0000, 0.0000, 0.0000, 0.1555, 0.0000, 0.1998,
         0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.1887, 0.3767, 0.0000, 0.1470,
         0.0000, 0.5849],
        [0.0620, 0.1493, 0.0075, 0.0000, 0.0000, 0.2113, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0995, 0.0000, 0.0126, 0.0259, 0.0000, 0.3250, 0.0067, 0.3314,
         0.0000, 0.3132],
        [0.0000, 0.1825, 0.3135, 0.0000, 0.0000, 0.0000, 0.3998, 0.0000, 0.0115,
         0.0000, 0.0000, 0.0000, 0.0000, 0.0939, 0.0515, 0.3261, 0.0203, 0.0515,
         0.0000, 0.2378]], grad_fn=<ReluBackward0>)
In [174]:
# nn.Sequential()
seq_modules = nn.Sequential(
    flatten,
    layer1,
    nn.ReLU(),
    nn.Linear(20, 10)
)
input_image = torch.rand(3,28,28)
logits = seq_modules(input_image)
print(logits.shape)
torch.Size([3, 10])
In [175]:
# Model parameters
# Many layers inside a neural network are parameterized, i.e. they have associated weights and biases that are optimized during training.
# Subclassing nn.Module automatically tracks all fields defined inside your model object, and makes all parameters accessible via the model's parameters() or named_parameters() methods.
print(f"Model structure: {model}\n\n")

for name, param in model.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n")
Model structure: NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)


Layer: linear_relu_stack.0.weight | Size: torch.Size([512, 784]) | Values : tensor([[ 0.0108,  0.0289,  0.0215,  ..., -0.0038,  0.0029,  0.0062],
        [-0.0105, -0.0167, -0.0166,  ..., -0.0103, -0.0068, -0.0346]],
       grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.0.bias | Size: torch.Size([512]) | Values : tensor([-0.0354, -0.0114], grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.2.weight | Size: torch.Size([512, 512]) | Values : tensor([[ 0.0196, -0.0271,  0.0337,  ..., -0.0114, -0.0305,  0.0041],
        [-0.0239,  0.0228,  0.0216,  ...,  0.0061,  0.0115,  0.0421]],
       grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.2.bias | Size: torch.Size([512]) | Values : tensor([-0.0142, -0.0071], grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.4.weight | Size: torch.Size([10, 512]) | Values : tensor([[-0.0285,  0.0152,  0.0152,  ..., -0.0055,  0.0204,  0.0295],
        [ 0.0299, -0.0214, -0.0198,  ..., -0.0019, -0.0048, -0.0259]],
       grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.4.bias | Size: torch.Size([10]) | Values : tensor([-0.0437,  0.0105], grad_fn=<SliceBackward0>) 

Automatic Differentiation with torch.autograd¶

  • When training neural networks, the most frequently used algorithm is back propagation. In this algorithm, parameters (model weights) are adjusted according to the gradient of the loss function with respect to the given parameter.
  • To compute those gradients, PyTorch has a built-in differentiation engine called torch.autograd.
In [176]:
import torch

x = torch.ones(5)  # input tensor
y = torch.zeros(3)  # expected output
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w)+b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
In [177]:
## Computing gradients
# To compute these derivatives, we call loss.backward()
# and then read the values from w.grad and b.grad
loss.backward()
print(w.grad)
print(b.grad)
tensor([[0.1218, 0.2512, 0.0263],
        [0.1218, 0.2512, 0.0263],
        [0.1218, 0.2512, 0.0263],
        [0.1218, 0.2512, 0.0263],
        [0.1218, 0.2512, 0.0263]])
tensor([0.1218, 0.2512, 0.0263])
In [178]:
# Disabling gradient tracking
# However, there are cases when we do not need to track gradients,
# for example, when we have trained the model and just want to apply it to some input data, i.e. we only want to do forward computations through the network.
# We can stop tracking computations by surrounding our computation code with a torch.no_grad() block:
z = torch.matmul(x, w)+b
print(z.requires_grad)

with torch.no_grad():
    z = torch.matmul(x, w)+b
print(z.requires_grad)
True
False
In [179]:
# Another way to achieve the same result is to use the detach() method on the tensor:
z = torch.matmul(x, w)+b
z = z.detach()
print(z.requires_grad)
False

Optimizing Model Parameters¶

In [180]:
# Prerequisite code
# We load the code from the previous sections on Datasets & DataLoaders and Build the Neural Network.
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)
In [181]:
train_dataloader = DataLoader(training_data, batch_size=64)
test_dataloader = DataLoader(test_data, batch_size=64)

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork()

Hyperparameters¶

  • Hyperparameters are adjustable parameters that let you control the model optimization process. Different hyperparameter values can affect model training and convergence speed.
  • We define the following hyperparameters for training:
    • Number of epochs: the number of times to iterate over the dataset
    • Batch size: the number of data samples propagated through the network before the parameters are updated
    • Learning rate: how much to update the model parameters at each batch/epoch. Smaller values yield slow learning speed, while large values may result in unpredictable behavior during training.
In [182]:
learning_rate = 1e-3
batch_size = 64
epochs = 5

Loss Function¶

  • When presented with some training data, our untrained network is likely not to give the correct answer.
  • The loss function measures the degree of dissimilarity between the obtained result and the target value, and it is the loss function that we want to minimize during training.
  • To calculate the loss, we make a prediction using the inputs of a given data sample and compare it against the true data label value.
In [183]:
# Initialize the loss function
loss_fn = nn.CrossEntropyLoss()
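  • nn.CrossEntropyLoss combines nn.LogSoftmax and nn.NLLLoss: it takes the raw logits and the integer class indices and returns the mean loss over the batch. A toy call with made-up values, just to show the signature:

dummy_logits = torch.tensor([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])  # batch of 2 samples, 3 classes
dummy_targets = torch.tensor([0, 2])                              # true class indices
print(loss_fn(dummy_logits, dummy_targets))                       # single scalar loss averaged over the batch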

Optimizer¶

  • Optimization is the process of adjusting model parameters to reduce model error at each training step.
  • Optimization algorithms define how this process is performed (in this example we use Stochastic Gradient Descent).
  • All optimization logic is encapsulated in the optimizer object. Here, we use the SGD optimizer;
  • in addition, PyTorch offers many different optimizers, such as ADAM and RMSProp, that work better for different kinds of models and data.
In [184]:
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
  • We define train_loop, which loops over our optimization code,
In [185]:
def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    # Set the model to training mode - important for batch normalization and dropout layers
    # Unnecessary in this situation but added for best practices
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        loss.backward() # compute gradients
        optimizer.step() # update the parameters
        optimizer.zero_grad() # reset the gradients to zero

        if batch % 100 == 0:
            loss, current = loss.item(), batch * batch_size + len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")
  • and test_loop, which evaluates the model's performance against our test data.
In [186]:
def test_loop(dataloader, model, loss_fn):
    # Set the model to evaluation mode - important for batch normalization and dropout layers
    # Unnecessary in this situation but added for best practices
    model.eval()
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0

    # Evaluating the model with torch.no_grad() ensures that no gradients are computed during test mode
    # also serves to reduce unnecessary gradient computations and memory usage for tensors with requires_grad=True
    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
In [187]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

epochs = 10
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
print("Done!")
Epoch 1
-------------------------------
loss: 2.308522  [   64/60000]
loss: 2.295923  [ 6464/60000]
loss: 2.268579  [12864/60000]
loss: 2.270369  [19264/60000]
loss: 2.253008  [25664/60000]
loss: 2.225763  [32064/60000]
loss: 2.237298  [38464/60000]
loss: 2.199639  [44864/60000]
loss: 2.203797  [51264/60000]
loss: 2.180187  [57664/60000]
Test Error: 
 Accuracy: 44.2%, Avg loss: 2.164722 

Epoch 2
-------------------------------
loss: 2.179605  [   64/60000]
loss: 2.172289  [ 6464/60000]
loss: 2.107276  [12864/60000]
loss: 2.126005  [19264/60000]
loss: 2.080749  [25664/60000]
loss: 2.023450  [32064/60000]
loss: 2.055069  [38464/60000]
loss: 1.974781  [44864/60000]
loss: 1.986132  [51264/60000]
loss: 1.924118  [57664/60000]
Test Error: 
 Accuracy: 48.6%, Avg loss: 1.912762 

Epoch 3
-------------------------------
loss: 1.950278  [   64/60000]
loss: 1.927295  [ 6464/60000]
loss: 1.799702  [12864/60000]
loss: 1.842861  [19264/60000]
loss: 1.737014  [25664/60000]
loss: 1.688945  [32064/60000]
loss: 1.715796  [38464/60000]
loss: 1.612278  [44864/60000]
loss: 1.647766  [51264/60000]
loss: 1.544659  [57664/60000]
Test Error: 
 Accuracy: 57.3%, Avg loss: 1.554464 

Epoch 4
-------------------------------
loss: 1.622964  [   64/60000]
loss: 1.594595  [ 6464/60000]
loss: 1.431038  [12864/60000]
loss: 1.506117  [19264/60000]
loss: 1.386950  [25664/60000]
loss: 1.381334  [32064/60000]
loss: 1.393124  [38464/60000]
loss: 1.318907  [44864/60000]
loss: 1.363219  [51264/60000]
loss: 1.254118  [57664/60000]
Test Error: 
 Accuracy: 61.9%, Avg loss: 1.284583 

Epoch 5
-------------------------------
loss: 1.362799  [   64/60000]
loss: 1.350789  [ 6464/60000]
loss: 1.173308  [12864/60000]
loss: 1.279025  [19264/60000]
loss: 1.154825  [25664/60000]
loss: 1.181451  [32064/60000]
loss: 1.192610  [38464/60000]
loss: 1.137865  [44864/60000]
loss: 1.185223  [51264/60000]
loss: 1.086521  [57664/60000]
Test Error: 
 Accuracy: 63.6%, Avg loss: 1.116825 

Epoch 6
-------------------------------
loss: 1.188814  [   64/60000]
loss: 1.195889  [ 6464/60000]
loss: 1.002661  [12864/60000]
loss: 1.137915  [19264/60000]
loss: 1.010257  [25664/60000]
loss: 1.046437  [32064/60000]
loss: 1.069733  [38464/60000]
loss: 1.022061  [44864/60000]
loss: 1.069942  [51264/60000]
loss: 0.982505  [57664/60000]
Test Error: 
 Accuracy: 65.0%, Avg loss: 1.008274 

Epoch 7
-------------------------------
loss: 1.067001  [   64/60000]
loss: 1.094617  [ 6464/60000]
loss: 0.884778  [12864/60000]
loss: 1.043996  [19264/60000]
loss: 0.918539  [25664/60000]
loss: 0.949302  [32064/60000]
loss: 0.989025  [38464/60000]
loss: 0.944871  [44864/60000]
loss: 0.988496  [51264/60000]
loss: 0.912483  [57664/60000]
Test Error: 
 Accuracy: 66.2%, Avg loss: 0.933203 

Epoch 8
-------------------------------
loss: 0.976123  [   64/60000]
loss: 1.022986  [ 6464/60000]
loss: 0.799399  [12864/60000]
loss: 0.976859  [19264/60000]
loss: 0.856569  [25664/60000]
loss: 0.876091  [32064/60000]
loss: 0.931678  [38464/60000]
loss: 0.891402  [44864/60000]
loss: 0.927947  [51264/60000]
loss: 0.861693  [57664/60000]
Test Error: 
 Accuracy: 67.3%, Avg loss: 0.878109 

Epoch 9
-------------------------------
loss: 0.905446  [   64/60000]
loss: 0.968479  [ 6464/60000]
loss: 0.734980  [12864/60000]
loss: 0.926013  [19264/60000]
loss: 0.812019  [25664/60000]
loss: 0.819447  [32064/60000]
loss: 0.888065  [38464/60000]
loss: 0.853141  [44864/60000]
loss: 0.881495  [51264/60000]
loss: 0.822642  [57664/60000]
Test Error: 
 Accuracy: 68.5%, Avg loss: 0.835817 

Epoch 10
-------------------------------
loss: 0.848743  [   64/60000]
loss: 0.924670  [ 6464/60000]
loss: 0.684697  [12864/60000]
loss: 0.886120  [19264/60000]
loss: 0.778283  [25664/60000]
loss: 0.775283  [32064/60000]
loss: 0.853050  [38464/60000]
loss: 0.824834  [44864/60000]
loss: 0.844829  [51264/60000]
loss: 0.791272  [57664/60000]
Test Error: 
 Accuracy: 70.0%, Avg loss: 0.802104 

Done!

Saving and Loading Models¶

  • When loading a model saved this way, the model class must still be defined, because the saved weights only make sense together with the model's structure.
In [188]:
torch.save(model, 'model.pth')
In [189]:
model = torch.load('model.pth', weights_only=False)
model
Out[189]:
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
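  • An alternative, and the approach recommended in the PyTorch docs, is to save only the learned parameters via state_dict; the model class is then instantiated first and the weights are loaded into it. A minimal sketch (the file name model_weights.pth is just an example):

# Save only the parameters
torch.save(model.state_dict(), 'model_weights.pth')

# To load, create the architecture first, then restore the weights
model = NeuralNetwork()
model.load_state_dict(torch.load('model_weights.pth', weights_only=True))
model.eval()  # put dropout/batch-norm layers in evaluation mode before inference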