Tokenizer Playground
Explore how different language models tokenize text. Test GPT-2, BERT, T5, and other models to see how they break down your input into tokens.
What is Tokenization?
Tokenization is the process of breaking text into smaller units called tokens. Different models use different strategies: some split on whitespace, most modern language models use learned subword vocabularies (such as GPT-2's byte-pair encoding or BERT's WordPiece), and a few encode at the character or byte level. This tool lets you explore how various language models tokenize your text.
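To see the difference between strategies, you can run two tokenizers over the same word and compare the splits. The sketch below is a minimal, hedged example using Transformers.js: it assumes the "@huggingface/transformers" package and the Xenova/gpt2 and Xenova/bert-base-uncased checkpoints on the Hugging Face Hub; the exact splits depend on each model's learned vocabulary.

```ts
// Sketch: compare how a BPE tokenizer (GPT-2) and a WordPiece
// tokenizer (BERT) split the same input. Checkpoint names are
// assumptions; any compatible Hub checkpoint would work.
import { AutoTokenizer } from "@huggingface/transformers";

async function compare(text: string): Promise<void> {
  for (const model of ["Xenova/gpt2", "Xenova/bert-base-uncased"]) {
    const tokenizer = await AutoTokenizer.from_pretrained(model);
    // encode() maps text to token IDs; decoding each ID on its own
    // recovers the surface string of that individual token.
    const ids = tokenizer.encode(text);
    const tokens = ids.map((id) => tokenizer.decode([id]));
    console.log(model, ids, tokens);
  }
}

await compare("tokenization");
```

Note that BERT's tokenizer also inserts special tokens such as [CLS] and [SEP] around the input by default, so its ID list is longer than the visible word pieces alone.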
How does it work?
This playground uses Transformers.js to run tokenizers directly in your browser. Your text is never sent to any server; tokenization happens entirely on your device. Select a model, enter your text, and see how it gets broken down into tokens with their corresponding IDs.
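The core loop of such a page is small. The following is a minimal sketch, not the playground's actual source: it assumes a page with hypothetical <select id="model">, <textarea id="input">, and <pre id="output"> elements, and again uses the "@huggingface/transformers" package.

```ts
// Sketch of the playground's core loop. Element IDs are hypothetical;
// the real playground's markup and code will differ.
import { AutoTokenizer } from "@huggingface/transformers";

const modelEl = document.getElementById("model") as HTMLSelectElement;
const inputEl = document.getElementById("input") as HTMLTextAreaElement;
const outputEl = document.getElementById("output") as HTMLPreElement;

async function render(): Promise<void> {
  // Tokenizer files are fetched from the Hugging Face Hub on first
  // use; the text being tokenized never leaves the browser.
  const tokenizer = await AutoTokenizer.from_pretrained(modelEl.value);
  const ids = tokenizer.encode(inputEl.value);
  // Pair every token ID with its decoded surface string.
  const rows = ids.map(
    (id) => `${id}\t${JSON.stringify(tokenizer.decode([id]))}`
  );
  outputEl.textContent = rows.join("\n");
}

inputEl.addEventListener("input", render);
modelEl.addEventListener("change", render);
```

In practice you would cache one tokenizer instance per model rather than reloading it on every keystroke; the reload here only keeps the sketch short.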
Privacy & Security
All tokenization happens locally in your browser. Your text never leaves your device and is never sent to a server, which makes the playground safe to use with sensitive input such as API keys, personal information, or proprietary business data.