Transformers provides APIs and tools to easily download and train state-of-the-art pretrained models for PyTorch, TensorFlow, and JAX. Using pretrained models can reduce your compute costs and carbon footprint, and save you the time and resources required to train a model from scratch.

In this tutorial, we will install the Transformers library, set up the cache, download a pretrained Transformers model so it can run offline, and load the model from a local directory.

You can install Transformers with the following command:

!pip install transformers

Default Cache Directory

Pretrained models are downloaded and locally cached at ~/.cache/huggingface/hub. This is the default directory, given by the shell environment variable TRANSFORMERS_CACHE. On Windows, the default directory is C:\Users\username\.cache\huggingface\hub.
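As a quick sanity check, you can compute the default cache location yourself. This is a minimal sketch using only the standard library; it assumes no cache-related environment variables are set, and on Windows the home directory expands to C:\Users\username instead:

```python
import os

# Default Hugging Face hub cache location (assumption: no cache-related
# environment variables override it).
default_cache = os.path.join(os.path.expanduser("~"), ".cache", "huggingface", "hub")
print(default_cache)
```

Printing this path is a handy way to find out where downloaded models will land on your machine.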

Change the Hugging Face Transformers Default Cache Directory

If set, Transformers will also use the older shell environment variables PYTORCH_TRANSFORMERS_CACHE or PYTORCH_PRETRAINED_BERT_CACHE. You can set the shell environment variables shown below – in order of priority – to specify a different cache directory. Example in Python:

import os

# Set these before importing transformers
os.environ['HF_HOME'] = '/content/bert_base/misc'
os.environ['HF_DATASETS_CACHE'] = '/content/bert_base/datasets'
os.environ['TRANSFORMERS_CACHE'] = '/content/bert_base/models'
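The resolution order can be sketched as follows. This is an illustrative simplification, not the library's actual code: TRANSFORMERS_CACHE takes priority over the legacy PyTorch variables, with ~/.cache/huggingface/hub as the final fallback.

```python
import os

# Must be set before `import transformers` -- the cache path is read at import time.
os.environ["TRANSFORMERS_CACHE"] = "/content/bert_base/models"

# Simplified sketch of the resolution order (assumption: mirrors the
# documented priority, not copied from the library source).
cache_dir = (
    os.environ.get("TRANSFORMERS_CACHE")
    or os.environ.get("PYTORCH_TRANSFORMERS_CACHE")
    or os.environ.get("PYTORCH_PRETRAINED_BERT_CACHE")
    or os.path.join(os.path.expanduser("~"), ".cache", "huggingface", "hub")
)
print(cache_dir)
```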

You can also specify the cache directory each time you load a model with from_pretrained() by setting the cache_dir parameter:

from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("roberta-base", cache_dir="/content/bert_base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base", cache_dir="/content/bert_base")

Download a Model from Hugging Face

You can use Transformers models offline. You need to download the files ahead of time and then point to their local path when you use them offline. There are three ways to do this:

1. Git clone a model repository from the Model Hub to your local disk and point from_pretrained() at it:

!git clone https://huggingface.co/bert-base-uncased

from transformers import AutoModel

model = AutoModel.from_pretrained('/content/bert-base-uncased')

2. Use the PreTrainedModel.from_pretrained() and PreTrainedModel.save_pretrained() workflow:

Models are automatically cached locally when you first use them. So, to download a model, all you have to do is run the code provided in the model card (here, the model card for bert-base-uncased).

Use a Transformers Model Locally

At the top right of the model page, there is a button called “Use in Transformers”, which gives you sample code showing how to load the model in Python. For bert-base-uncased, this gives you the following snippet:

# Load model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM
from transformers import AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

When you run this code for the first time, a download progress bar appears. Save the files to a directory of your choice with PreTrainedModel.save_pretrained():

tokenizer.save_pretrained("/content/bert_base_uncased")
model.save_pretrained("/content/bert_base_uncased")

Now, when you are offline, reload your files with PreTrainedModel.from_pretrained(), pointing at that directory:

tokenizer = AutoTokenizer.from_pretrained("/content/bert_base_uncased")
model = AutoModel.from_pretrained("/content/bert_base_uncased")
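If you want to guarantee that no network requests are made at all, recent Transformers versions also support a fully offline mode controlled by environment variables. A minimal sketch, assuming a recent library version; the variables must be set before the library is imported, and HF_DATASETS_OFFLINE applies to the separate datasets library:

```python
import os

# Set before importing transformers / datasets (assumption: supported by
# recent versions of both libraries).
os.environ["TRANSFORMERS_OFFLINE"] = "1"
os.environ["HF_DATASETS_OFFLINE"] = "1"
print(os.environ["TRANSFORMERS_OFFLINE"])
```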

3. Programmatically download files with the huggingface_hub library. First, install the huggingface_hub library in your virtual environment:

python -m pip install huggingface_hub

Use the hf_hub_download() function to download a file to a specific path. For example, the following command downloads the config.json file of the bigscience/T0_3B model to your desired path:

from huggingface_hub import hf_hub_download

hf_hub_download(repo_id="bigscience/T0_3B", filename="config.json", cache_dir="./your/path/bigscience_t0")

Once your file is downloaded and locally cached, specify its local path to load and use it:

from transformers import AutoConfig

config = AutoConfig.from_pretrained("./your/path/bigscience_t0/config.json")

Related Post

Save and Load fine-tuned Huggingface Transformers model from local disk