text-generation-inference


Safetensors

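Safetensors is a serialization format for deep learning model weights. It is safer than pickle-based formats (pickle is used under the hood by many deep learning libraries) because loading a safetensors file does not execute arbitrary code. It is also faster to load, since tensors are stored as raw bytes described by a small header and can be memory-mapped instead of deserialized object by object.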
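To make the layout concrete, here is a minimal sketch that uses only the Python standard library. It assumes the published safetensors layout: an 8-byte little-endian header size, a JSON header, then raw tensor bytes. The helpers `write_safetensors` and `read_header` are hypothetical names for illustration, not part of the safetensors library, and the writer skips details the real library handles, such as header padding and metadata.

```python
import json
import struct
import tempfile
from pathlib import Path


def write_safetensors(path, tensors):
    """Write {name: (dtype, shape, raw_bytes)} using the safetensors layout.

    Hypothetical helper for illustration only; use the safetensors library
    for real checkpoints.
    """
    header, offset = {}, 0
    for name, (dtype, shape, raw) in tensors.items():
        # Offsets are relative to the start of the data section.
        header[name] = {
            "dtype": dtype,
            "shape": shape,
            "data_offsets": [offset, offset + len(raw)],
        }
        offset += len(raw)
    header_bytes = json.dumps(header, separators=(",", ":")).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))  # 8-byte header size
        f.write(header_bytes)                          # JSON description
        for _, _, raw in tensors.values():
            f.write(raw)                               # raw tensor bytes


def read_header(path):
    """Return the JSON header without reading any tensor data."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))


path = Path(tempfile.mkdtemp()) / "demo.safetensors"
weights = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)  # a 2x2 float32 tensor
write_safetensors(path, {"w": ("F32", [2, 2], weights)})

header = read_header(path)
print(header)
```

Reading the file only involves parsing JSON and copying bytes, so no code stored in the file is ever executed. This is the property that separates the format from pickle.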

TGI relies on the safetensors format mainly to enable tensor-parallel sharding. When serving a model repository, TGI first looks for safetensors weights. If the repository has none, TGI converts the PyTorch weights to the safetensors format.
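The link between the format and sharding can be illustrated with a small sketch. Because the header records each tensor's shape and byte offsets, a process can seek directly to its own slice of a weight matrix without loading the rest. The code below is not TGI's loading code. It assumes a row-wise split of a float32 matrix across `WORLD_SIZE` ranks, and the names `build_file` and `read_row_shard` are hypothetical.

```python
import json
import struct
import tempfile
from pathlib import Path

ROWS, COLS, WORLD_SIZE = 4, 2, 2  # hypothetical sizes for the demo


def build_file(path):
    """Write one float32 tensor "w" of shape [ROWS, COLS] holding 0.0..7.0."""
    values = [float(i) for i in range(ROWS * COLS)]
    raw = struct.pack(f"<{len(values)}f", *values)
    header = json.dumps(
        {"w": {"dtype": "F32", "shape": [ROWS, COLS], "data_offsets": [0, len(raw)]}}
    ).encode("utf-8")
    path.write_bytes(struct.pack("<Q", len(header)) + header + raw)


def read_row_shard(path, name, rank, world_size):
    """Read only the rows owned by `rank`, assuming rows divide evenly."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        info = json.loads(f.read(header_len))[name]
        rows, cols = info["shape"]
        rows_per_rank = rows // world_size
        row_bytes = cols * 4  # float32 takes 4 bytes per element
        # The data section starts right after the size prefix and the header.
        data_start = 8 + header_len + info["data_offsets"][0]
        f.seek(data_start + rank * rows_per_rank * row_bytes)
        raw = f.read(rows_per_rank * row_bytes)
    flat = struct.unpack(f"<{rows_per_rank * cols}f", raw)
    return [list(flat[r * cols:(r + 1) * cols]) for r in range(rows_per_rank)]


path = Path(tempfile.mkdtemp()) / "shard_demo.safetensors"
build_file(path)
shards = [read_row_shard(path, "w", rank, WORLD_SIZE) for rank in range(WORLD_SIZE)]
for rank, shard in enumerate(shards):
    print(rank, shard)
```

Each rank reads only the bytes it needs. With pickle-based checkpoints, by contrast, the whole object generally has to be deserialized before any slice can be taken.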

You can learn more about safetensors by reading the safetensors documentation.
