Model Memory Estimator
Estimate inference memory requirements for Hugging Face models. Enter a model ID to see total parameters and estimated memory (GB/MB) without downloading the full model.
Suggested models from Hugging Face, or enter any model ID
Default: main
Required for gated or private models
Required — see whether the model fits and how many GPUs are needed
What is this?
The Model Memory Estimator helps you understand how much memory (RAM or VRAM) a Hugging Face model will need for inference. It works with Transformers, Diffusers, Sentence Transformers, and any model that uses Safetensors weights.
How does it work?
This tool uses the hf-mem package to fetch only the Safetensors metadata from Hugging Face via HTTP Range requests—typically the first ~100KB of each weight file. From that metadata (tensor shapes and data types), it computes total parameter count and estimated memory. No full model download is required.
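The estimate works because a Safetensors file starts with an 8-byte little-endian header length followed by a JSON table mapping each tensor name to its dtype and shape, so fetching only that prefix is enough to size the whole model. The sketch below (an illustration of the mechanism, not the hf-mem package's actual API) parses a synthetic prefix standing in for the bytes a Range request would return, and sums parameters and bytes per dtype:

```python
import json
import struct

# Bytes per element for common Safetensors dtypes (assumption: subset shown).
DTYPE_BYTES = {"F32": 4, "F16": 2, "BF16": 2, "I8": 1, "U8": 1, "I64": 8}

def estimate_from_header(prefix: bytes):
    """Parse a Safetensors file prefix: an 8-byte little-endian header
    length, then a JSON table of {name: {dtype, shape, data_offsets}}.
    Returns (total_parameters, total_bytes)."""
    (header_len,) = struct.unpack("<Q", prefix[:8])
    header = json.loads(prefix[8:8 + header_len])
    params = 0
    total_bytes = 0
    for name, info in header.items():
        if name == "__metadata__":  # optional metadata entry, no tensor data
            continue
        n = 1
        for dim in info["shape"]:
            n *= dim
        params += n
        total_bytes += n * DTYPE_BYTES[info["dtype"]]
    return params, total_bytes

# Synthetic header standing in for the prefix fetched via an HTTP Range request
# (hypothetical tensor names, for illustration only).
table = {
    "model.embed": {"dtype": "F16", "shape": [1000, 64], "data_offsets": [0, 128000]},
    "model.proj":  {"dtype": "F16", "shape": [64, 64],   "data_offsets": [128000, 136192]},
}
payload = json.dumps(table).encode()
prefix = struct.pack("<Q", len(payload)) + payload

params, nbytes = estimate_from_header(prefix)
print(params, nbytes)  # 68096 parameters, 136192 bytes (2 bytes/param at F16)
```

For a sharded model, the same computation runs over the header of each shard and the results are summed; converting bytes to GB/MB is then a division, with no weight data ever downloaded.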
Browser-only & Privacy
This tool runs entirely in your browser. Nothing is sent to our servers. Your Hugging Face token (if you enter one) is used only between your browser and Hugging Face—it never goes to any other server and is not stored. Only metadata is fetched from Hugging Face; model weights are not downloaded. The model ID you enter is sent to Hugging Face to list and read file metadata.