Best way to save a trained model in PyTorch

Question

I was looking for alternative ways to save a trained model in PyTorch. So far, I have found two alternatives:

1. torch.save() to save a model and torch.load() to load a model.
2. model.state_dict() to save a trained model and model.load_state_dict() to load the saved model.

I have come across this discussion where approach 2 is recommended over approach 1.

My question is, why is the second approach preferred? Is it only because torch.nn modules have those two functions and we are encouraged to use them?

User · Answer

The pickle Python library implements binary protocols for serializing and de-serializing a Python object.

PyTorch uses pickle internally, so you don't need to call pickle.dump() and pickle.load() (the functions that save and load an object) yourself.

In fact, torch.save() and torch.load() will wrap pickle.dump() and pickle.load() for you.
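As a rough illustration of that round trip, here is a minimal sketch that serializes a small dictionary to an in-memory buffer instead of a file. The `payload` contents and the use of `io.BytesIO` are assumptions made for the example, not part of the original answer.

```python
import io
import torch

# Any picklable object that contains tensors can go through torch.save / torch.load.
payload = {"step": 3, "weights": torch.arange(4, dtype=torch.float32)}

buffer = io.BytesIO()          # in-memory stand-in for a file on disk
torch.save(payload, buffer)    # serialize the object
buffer.seek(0)                 # rewind before reading it back
restored = torch.load(buffer)  # deserialize the object

print(restored["step"])
print(torch.equal(restored["weights"], payload["weights"]))
```

The same two calls work with a file path in place of the buffer.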

The state_dict that the other answer mentioned deserves a few more notes.

What state_dict do we have inside PyTorch? There are actually two state_dicts.

A PyTorch model is a torch.nn.Module, and calling model.parameters() returns its learnable parameters (w and b). These learnable parameters, once randomly initialized, are updated over time as the model learns. The learnable parameters make up the first state_dict.

The second state_dict is the optimizer's state dict. Recall that the optimizer is used to improve our learnable parameters. The optimizer's state_dict itself is not learned: it holds the optimizer's hyperparameters and its internal bookkeeping (such as momentum buffers), not trainable weights.
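To see what the optimizer's state_dict holds in practice, the sketch below takes a single SGD step on random data. The input shape and the toy loss are assumptions chosen only for illustration. With momentum enabled, the `state` entry starts out empty and receives one per-parameter entry after the first step, while the hyperparameters in `param_groups` stay as they were set.

```python
import torch
import torch.optim as optim

model = torch.nn.Linear(5, 2)
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# No optimization step has been taken yet, so there is no per-parameter state.
state_before = len(optimizer.state_dict()["state"])

# One training step on random data (hypothetical inputs and loss).
loss = model(torch.randn(4, 5)).sum()
loss.backward()
optimizer.step()

# With momentum, SGD now tracks a buffer for the weight and for the bias.
state_after = len(optimizer.state_dict()["state"])
lr_after = optimizer.state_dict()["param_groups"][0]["lr"]

print(state_before, state_after, lr_after)
```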

Because state_dict objects are Python dictionaries, they can be easily saved, updated, altered, and restored, adding a great deal of modularity to PyTorch models and optimizers.
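A small sketch of that modularity, assuming a throwaway `Linear(5, 2)` model: the state_dict is snapshotted, one entry is altered and loaded back, and the snapshot is then restored. `copy.deepcopy` is used here because the tensors in a state_dict share storage with the live parameters.

```python
import copy
import torch

model = torch.nn.Linear(5, 2)

# state_dict() returns an ordered dictionary mapping names to tensors.
sd = model.state_dict()
print(isinstance(sd, dict), list(sd.keys()))

# Independent snapshot of the current values.
snapshot = copy.deepcopy(sd)

# Alter one entry and load the modified dictionary back into the model.
altered = copy.deepcopy(sd)
altered["bias"] = torch.zeros(2)
model.load_state_dict(altered)
bias_was_zeroed = bool((model.bias == 0).all())

# Restore the original values from the snapshot.
model.load_state_dict(snapshot)
restored_ok = torch.equal(model.bias, snapshot["bias"])

print(bias_was_zeroed, restored_ok)
```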

Let's create a super simple model to explain this:

import torch
import torch.optim as optim

model = torch.nn.Linear(5, 2)

# Initialize optimizer
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

print("Model's state_dict:")
for param_tensor in model.state_dict():
    print(param_tensor, "\t", model.state_dict()[param_tensor].size())

print("Model weight:")    
print(model.weight)

print("Model bias:")    
print(model.bias)

print("---")
print("Optimizer's state_dict:")
for var_name in optimizer.state_dict():
    print(var_name, "\t", optimizer.state_dict()[var_name])

This code will output the following:

Model's state_dict:
weight   torch.Size([2, 5])
bias     torch.Size([2])
Model weight:
Parameter containing:
tensor([[ 0.1328,  0.1360,  0.1553, -0.1838, -0.0316],
        [ 0.0479,  0.1760,  0.1712,  0.2244,  0.1408]], requires_grad=True)
Model bias:
Parameter containing:
tensor([ 0.4112, -0.0733], requires_grad=True)
---
Optimizer's state_dict:
state    {}
param_groups     [{'lr': 0.001, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0, 'nesterov': False, 'params': [140695321443856, 140695321443928]}]

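Note that this is a minimal model. You may try adding a stack of layers with torch.nn.Sequential:

# D_in, H, D_out, A, B and C are placeholder sizes that you must define yourself
model = torch.nn.Sequential(
          torch.nn.Linear(D_in, H),
          torch.nn.Conv2d(A, B, C),
          torch.nn.Linear(H, D_out),
        )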

Note that only layers with learnable parameters (convolutional layers, linear layers, etc.) and registered buffers (batchnorm layers) have entries in the model's state_dict.
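The following sketch illustrates that point with a small, made-up stack of layers. The layer sizes are arbitrary. It compares the names returned by `named_parameters()` with the keys of `state_dict()`, so the entries that are registered buffers rather than learnable parameters stand out; the ReLU layer contributes nothing to either list.

```python
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(4, 3),
    torch.nn.BatchNorm1d(3),
    torch.nn.ReLU(),
)

# Learnable parameters only.
param_names = [name for name, _ in model.named_parameters()]

# Everything stored in the state_dict: parameters plus registered buffers.
state_names = list(model.state_dict().keys())

# Entries that are in the state_dict but are not learnable parameters.
buffer_only = [name for name in state_names if name not in param_names]

print(param_names)
print(buffer_only)
```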

Non-learnable things belong to the optimizer object's state_dict, which contains information about the optimizer's state as well as the hyperparameters used.

The rest of the story is the same: in the inference phase (when we use the model after training), we predict based on the parameters we learned. So for inference, we just need to save the parameters, model.state_dict().

torch.save(model.state_dict(), filepath)

And to use it later:

model.load_state_dict(torch.load(filepath))
model.eval()

Note: Don't forget the final call, model.eval(); this is crucial after loading the model.
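To make the effect of model.eval() concrete, here is a minimal sketch using a hypothetical two-layer model with Dropout. Modules start in training mode, where Dropout randomly zeroes activations; after eval() the layer becomes a no-op, so repeated forward passes on the same input agree.

```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.Dropout(p=0.5))
x = torch.ones(1, 4)

started_in_training = model.training  # freshly constructed modules are in train mode

model.eval()  # switch Dropout (and BatchNorm, if present) to inference behavior
with torch.no_grad():
    out1 = model(x)
    out2 = model(x)

print(started_in_training, model.training, torch.equal(out1, out2))
```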

Also, don't try torch.save(model.parameters(), filepath): model.parameters() is just a generator object.

On the other hand, torch.save(model, filepath) saves the model object itself, but keep in mind that the model doesn't contain the optimizer's state_dict. Check the other excellent answer by @Jadiel de Armas to save the optimizer's state dict.
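A quick way to check the generator claim, sketched on a throwaway `Linear(5, 2)` model: the object returned by `model.parameters()` can be iterated only once, which is one reason it is not a useful thing to serialize.

```python
import types
import torch

model = torch.nn.Linear(5, 2)
params = model.parameters()

is_generator = isinstance(params, types.GeneratorType)
first_pass = len(list(params))   # consumes the generator (weight and bias)
second_pass = len(list(params))  # nothing left to iterate over

print(is_generator, first_pass, second_pass)
```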

User · Answer

A common PyTorch convention is to save models using either a .pt or .pth file extension.

Save/Load Entire Model

Save:

path = "username/directory/lstmmodelgpu.pth"
torch.save(trainer, path)

Load:

# Model class must be defined somewhere
model = torch.load(PATH)
model.eval()

User · Answer

I've found this page on their GitHub repo. I'll just paste the content here.

Recommended approach for saving a model

There are two main approaches for serializing and restoring a model.

The first (recommended) saves and loads only the model parameters:

torch.save(the_model.state_dict(), PATH)

Then later:

the_model = TheModelClass(*args, **kwargs)
the_model.load_state_dict(torch.load(PATH))

The second saves and loads the entire model:

torch.save(the_model, PATH)

Then later:

the_model = torch.load(PATH)

However, in this case the serialized data is bound to the specific classes and the exact directory structure used, so it can break in various ways when used in other projects, or after some serious refactors.
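Below is a minimal, self-contained sketch of the first (recommended) approach. `TinyNet`, its layer sizes, and the temporary file name are hypothetical stand-ins for `TheModelClass` and `PATH`. The key point is that the class is re-instantiated from code and only the parameters come from disk.

```python
import os
import tempfile
import torch

class TinyNet(torch.nn.Module):
    """Hypothetical model class standing in for TheModelClass."""
    def __init__(self, n_in=5, n_out=2):
        super().__init__()
        self.fc = torch.nn.Linear(n_in, n_out)

    def forward(self, x):
        return self.fc(x)

trained = TinyNet()  # pretend this model has been trained

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tiny_net.pt")
    torch.save(trained.state_dict(), path)   # save parameters only

    fresh = TinyNet()                        # rebuild the architecture from code
    fresh.load_state_dict(torch.load(path))  # then fill in the saved parameters

fresh.eval()
x = torch.randn(3, 5)
with torch.no_grad():
    same_outputs = torch.equal(trained(x), fresh(x))
print(same_outputs)
```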

User · Answer

If you want to save the model and resume training later:

Single GPU

Save:

state = {
    'epoch': epoch,
    'state_dict': model.state_dict(),
    'optimizer': optimizer.state_dict(),
}
savepath = 'checkpoint.t7'
torch.save(state, savepath)

Load:

checkpoint = torch.load('checkpoint.t7')
model.load_state_dict(checkpoint['state_dict'])
optimizer.load_state_dict(checkpoint['optimizer'])
epoch = checkpoint['epoch']

Multiple GPU

Save:

state = {
    'epoch': epoch,
    'state_dict': model.module.state_dict(),
    'optimizer': optimizer.state_dict(),
}
savepath = 'checkpoint.t7'
torch.save(state, savepath)

Load:

checkpoint = torch.load('checkpoint.t7')
model.load_state_dict(checkpoint['state_dict'])
optimizer.load_state_dict(checkpoint['optimizer'])
epoch = checkpoint['epoch']

# Don't call DataParallel before loading the model, otherwise you will get an error
model = nn.DataParallel(model)  # ignore this line if you want to load on a single GPU
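The reason for saving `model.module.state_dict()` in the multi-GPU case can be seen from the key names. In this sketch (a toy `Linear(5, 2)` model, chosen only for illustration), wrapping a model in `nn.DataParallel` prefixes every key with `module.`, whereas the inner module keeps the plain names that an unwrapped model expects.

```python
import torch
from torch import nn

model = nn.Linear(5, 2)
wrapped = nn.DataParallel(model)

wrapped_keys = list(wrapped.state_dict().keys())        # prefixed with "module."
inner_keys = list(wrapped.module.state_dict().keys())   # plain names

# A plain (unwrapped) model accepts the inner module's state_dict directly.
plain = nn.Linear(5, 2)
plain.load_state_dict(wrapped.module.state_dict())

print(wrapped_keys)
print(inner_keys)
```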

User · Answer

It depends on what you want to do.

Case # 1: Save the model to use it yourself for inference: You save the model, you restore it, and then you change the model to evaluation mode. This is done because you usually have BatchNorm and Dropout layers that by default are in train mode on construction:

torch.save(model.state_dict(), filepath)

# Later to restore:
model.load_state_dict(torch.load(filepath))
model.eval()

Case # 2: Save model to resume training later: If you need to keep training the model that you are about to save, you need to save more than just the model. You also need to save the state of the optimizer, epochs, score, etc. You would do it like this:

state = {
    'epoch': epoch,
    'state_dict': model.state_dict(),
    'optimizer': optimizer.state_dict(),
}
torch.save(state, filepath)

To resume training you would do things like state = torch.load(filepath), and then, to restore the state of each individual object, something like this:

model.load_state_dict(state['state_dict'])
optimizer.load_state_dict(state['optimizer'])

Since you are resuming training, DO NOT call model.eval() once you restore the states when loading.

Case # 3: Model to be used by someone else with no access to your code: In Tensorflow you can create a .pb file that defines both the architecture and the weights of the model. This is very handy, specially when using Tensorflow serve. The equivalent way to do this in Pytorch would be:

torch.save(model, filepath)

# Then later:
model = torch.load(filepath)

This way is still not bullet proof and since pytorch is still undergoing a lot of changes, I wouldn't recommend it.
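Case 2 can be sketched end to end as follows. The helper `make_model_and_optimizer`, the toy loss, and the temporary checkpoint path are assumptions made for the example. One training step is taken so that the optimizer has momentum buffers to save, and the restored objects are left in training mode.

```python
import os
import tempfile
import torch
import torch.optim as optim

def make_model_and_optimizer():
    # Hypothetical helper: builds the same architecture and optimizer each time.
    model = torch.nn.Linear(5, 2)
    optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
    return model, optimizer

model, optimizer = make_model_and_optimizer()

# One training step on random data so the optimizer has state worth saving.
loss = model(torch.randn(8, 5)).pow(2).mean()
loss.backward()
optimizer.step()
epoch = 1

with tempfile.TemporaryDirectory() as tmp:
    filepath = os.path.join(tmp, "checkpoint.pt")
    state = {
        "epoch": epoch,
        "state_dict": model.state_dict(),
        "optimizer": optimizer.state_dict(),
    }
    torch.save(state, filepath)

    # Later: rebuild the objects, then restore each one from the checkpoint.
    new_model, new_optimizer = make_model_and_optimizer()
    state = torch.load(filepath)
    new_model.load_state_dict(state["state_dict"])
    new_optimizer.load_state_dict(state["optimizer"])
    start_epoch = state["epoch"] + 1

new_model.train()  # resuming training, so stay in train mode
restored_buffers = len(new_optimizer.state_dict()["state"])
print(start_epoch, restored_buffers)
```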
