PyTorch Study Notes 13: Training Models on a GPU

Because of the time elapsed, some of the code and techniques in this post may be out of date. Please take note! Last updated: 3 years ago

As the title says.

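Training time comes mainly from two parts: data preparation and parameter iteration.

When data preparation is the main bottleneck in training time, we can use more processes to prepare the data.

When parameter iteration becomes the main bottleneck, the usual approach is to accelerate it with a GPU.

Using a GPU to accelerate a model in PyTorch is very simple: just move the model and the data onto the GPU. The core code is only the few lines below.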
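As a minimal sketch of "using more processes to prepare the data", the snippet below builds a `DataLoader` with worker subprocesses. The synthetic dataset, the helper name `make_loader`, and the batch size are assumptions made for illustration; they do not come from the original post.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_loader(num_workers=2, batch_size=32):
    """Build a DataLoader over a synthetic dataset (hypothetical helper)."""
    # Synthetic data for illustration only: 256 samples with 10 features each.
    features = torch.randn(256, 10)
    labels = torch.randint(0, 2, (256,))
    dataset = TensorDataset(features, labels)
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        # Number of worker subprocesses that load batches; 0 loads in the main process.
        num_workers=num_workers,
        # Page-locked host memory can speed up host-to-GPU copies when a GPU is present.
        pin_memory=torch.cuda.is_available(),
    )


if __name__ == "__main__":
    loader = make_loader()
    for batch_features, batch_labels in loader:
        print(batch_features.shape, batch_labels.shape)
        break
```

The best `num_workers` value depends on the machine and the dataset, so it is usually worth timing a few settings rather than assuming more is better.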

# Define the model
...

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device)  # move the model to the GPU (it stays on the CPU if no GPU is available)

# Train the model
...

features = features.to(device)  # move the data to the same device as the model
labels = labels.to(device)  # or: labels = labels.cuda() if torch.cuda.is_available() else labels
...
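To show the lines above in context, here is a small self-contained sketch of a complete training loop that runs on the GPU when one is available and on the CPU otherwise. The toy regression data, the function name `train_on_device`, and the hyperparameters are assumptions chosen only for illustration.

```python
import torch
from torch import nn


def train_on_device(steps=20):
    """Fit a tiny linear model and return the first and last loss values."""
    torch.manual_seed(0)
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

    # Toy regression data (illustrative): y = 2*x1 - 3*x2 + 1
    features = torch.randn(128, 2)
    labels = features @ torch.tensor([[2.0], [-3.0]]) + 1.0

    model = nn.Linear(2, 1).to(device)  # move the model to the device
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.MSELoss()

    # Model and data must live on the same device.
    features, labels = features.to(device), labels.to(device)

    losses = []
    for _ in range(steps):
        optimizer.zero_grad()
        loss = loss_fn(model(features), labels)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses[0], losses[-1]


if __name__ == "__main__":
    first, last = train_on_device()
    print("first loss:", first, "last loss:", last)
```

In a real training loop the batches usually come from a `DataLoader`, and each batch is moved to `device` inside the loop rather than once up front.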

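Training on multiple GPUs is also very simple: just wrap the model as a data-parallel model.
Once the model has been moved to the GPU, a replica is copied to every GPU and each batch of data is split evenly across the GPUs for training. The core code is as follows.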

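# Define the model
...

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)  # wrap as a data-parallel model
model.to(device)  # the parameters must be on the first GPU before the forward pass

# Train the model
...
features = features.to(device)  # move the data to the GPU
labels = labels.to(device)  # or: labels = labels.cuda() if torch.cuda.is_available() else labels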
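The sketch below adds a forward pass to the multi-GPU pattern, so that the splitting and gathering of a batch can be observed. The layer sizes and batch size are arbitrary assumptions. On a machine with zero or one GPU the wrapper is skipped and the code runs as an ordinary single-device model.

```python
import torch
from torch import nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2)
if torch.cuda.device_count() > 1:
    # Each forward pass replicates the module and splits the batch along dim 0.
    model = nn.DataParallel(model)
model.to(device)

batch = torch.randn(8, 4).to(device)  # 8 samples, split across the available GPUs
output = model(batch)  # results are gathered back onto the first device
print(output.shape, output.device)
```

For larger jobs the PyTorch documentation recommends `DistributedDataParallel` over `DataParallel`, but that needs more setup than these notes cover.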

A summary of some basic GPU-related operations

import torch 
from torch import nn 

# 1. Inspect GPU information
if_cuda = torch.cuda.is_available()
print("if_cuda=",if_cuda)

gpu_count = torch.cuda.device_count()
print("gpu_count=",gpu_count)


# 2. Move tensors between the GPU and the CPU
tensor = torch.rand((100,100))
tensor_gpu = tensor.to("cuda:0") # or: tensor_gpu = tensor.cuda()
print(tensor_gpu.device)
print(tensor_gpu.is_cuda)

tensor_cpu = tensor_gpu.to("cpu") # or: tensor_cpu = tensor_gpu.cpu()
print(tensor_cpu.device)

# 3. Move all tensors in a model to the GPU
net = nn.Linear(2,1)
print(next(net.parameters()).is_cuda)
net.to("cuda:0") # moves every parameter tensor of the model to the GPU in place; unlike a tensor, no reassignment (net = net.to("cuda:0")) is needed
# check whether the model is now on the GPU
print("if on cuda:",next(net.parameters()).is_cuda)
print(next(net.parameters()).device)

# 4. Create a model that supports data parallelism across multiple GPUs
linear = nn.Linear(2,1)
print(next(linear.parameters()).device)

model = nn.DataParallel(linear)
print(model.device_ids)
print(next(model.module.parameters()).device) 

# Note: when saving parameters, save those of model.module
torch.save(model.module.state_dict(), "model_parameter.pkl") 

linear = nn.Linear(2,1)
linear.load_state_dict(torch.load("model_parameter.pkl")) 
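The reason for saving `model.module.state_dict()` can be seen by comparing the keys of the two state dicts: keys taken from the wrapper carry a `module.` prefix, which a plain `nn.Linear` would not recognize on load. The sketch below illustrates this, using an in-memory buffer in place of a file and `map_location="cpu"` so that it also runs without a GPU; both choices are assumptions made to keep the example self-contained.

```python
import io

import torch
from torch import nn

linear = nn.Linear(2, 1)
wrapped = nn.DataParallel(linear)

# Keys from the wrapper are prefixed with "module."; keys from wrapped.module are not.
print(list(wrapped.state_dict().keys()))
print(list(wrapped.module.state_dict().keys()))

# Save the unwrapped parameters (an in-memory buffer stands in for a file here).
buffer = io.BytesIO()
torch.save(wrapped.module.state_dict(), buffer)
buffer.seek(0)

# map_location="cpu" lets the checkpoint load even if it was saved from a GPU.
state = torch.load(buffer, map_location="cpu")
restored = nn.Linear(2, 1)
restored.load_state_dict(state)
```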


# 5. Clear the CUDA cache

# Releases unused cached memory held by PyTorch's allocator; very useful when CUDA runs out of memory
torch.cuda.empty_cache()
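To see what clearing the cache actually does, it can help to print the allocator's counters before and after. The sketch below is one possible way to do that; the helper name `cuda_memory_report` is hypothetical, and on a machine without a GPU it simply reports zeros.

```python
import torch


def cuda_memory_report():
    """Return bytes allocated to tensors and bytes reserved by the caching allocator."""
    if not torch.cuda.is_available():
        return {"allocated": 0, "reserved": 0}
    return {
        "allocated": torch.cuda.memory_allocated(),  # memory occupied by live tensors
        "reserved": torch.cuda.memory_reserved(),    # memory held by the caching allocator
    }


if torch.cuda.is_available():
    x = torch.rand(1024, 1024, device="cuda")
    print("after allocation:", cuda_memory_report())
    del x                      # drop the only reference to the tensor
    torch.cuda.empty_cache()   # hand unused cached blocks back to the driver
print("final:", cuda_memory_report())
```

Note that `torch.cuda.empty_cache()` only releases cached blocks that are no longer in use. Memory held by tensors that are still referenced is not freed.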

Reposted from:

https://www.heywhale.com/home/competition/61bff9a84b63a700179b7f8d/content/1


Unless otherwise stated, all posts on this blog are licensed under CC BY-SA 4.0. Please credit the source when reposting!
