# Tokenizer
The public tokenizer surface is built around `TokenizerModel` and `TokenCounter`. Default mode uses `cl100k_base`; custom mode falls back to a chars-per-token estimate.
## Tokenizer models
| Variant | Meaning | When to use it |
|---|---|---|
| TokenizerModel::Default | Uses `cl100k_base` through `tiktoken-rs`. | Best default for modern chat models. |
| TokenizerModel::Custom { chars_per_token_x100 } | Uses a lightweight approximation. | Good when you need cheap counting without BPE. |
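The custom variant's `chars_per_token_x100` field encodes the average characters-per-token ratio scaled by 100 (so `400` means roughly four characters per token). A minimal sketch of such an approximation, assuming round-up integer division (the helper name and exact rounding are assumptions, not the crate's internals):

```rust
// Hypothetical sketch of a chars-per-token estimate; forgetless's actual
// rounding behavior may differ.
fn estimate_tokens(text: &str, chars_per_token_x100: u32) -> usize {
    // Scale the character count by 100 to match the fixed-point ratio.
    let scaled = text.chars().count() as u32 * 100;
    // Round up so short non-empty strings never count as zero tokens.
    ((scaled + chars_per_token_x100 - 1) / chars_per_token_x100) as usize
}

fn main() {
    // chars_per_token_x100 = 400 => ~4 chars per token.
    // "hello world" is 11 chars -> ceil(1100 / 400) = 3 tokens.
    println!("{}", estimate_tokens("hello world", 400));
}
```

This avoids any BPE work, which is why the custom mode is cheap: counting is a single pass over the string.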
## Token counter
```rust
use forgetless::{TokenCounter, TokenizerModel};

let exact = TokenCounter::new(TokenizerModel::Default)?;
let rough = TokenCounter::new(TokenizerModel::Custom {
    chars_per_token_x100: 400,
})?;

let fits = exact.fits_budget("hello world", 32);
let tokens = exact.count("release notes and code review summary");
let clipped = exact.truncate_to_budget("very long text ...", 128);
```
## Image estimation
`TokenCounter::count_image` mirrors the vision-token heuristic in the crate: the image is scaled down so its longest side is at most 2048, then scaled again so its shortest side is at most 768, and the result is divided into 512x512 tiles, which are counted.
| Detail | Cost model |
|---|---|
| Low | Always 85 tokens. |
| High | `170 * number_of_tiles + 85` after the crate's resize pipeline. |
| Auto | Uses the same branch as `High`. |
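The resize-then-tile pipeline for the `High` (and `Auto`) branch can be sketched as standalone code. This is an illustration of the cost model in the table above, assuming round-up tile counts and proportional integer scaling; it is not the crate's implementation:

```rust
// Hypothetical sketch of the high-detail cost model: cap the longest side
// at 2048, cap the shortest side at 768, then count 512x512 tiles.
fn high_detail_tokens(mut width: u64, mut height: u64) -> u64 {
    // 1. Scale so the longest side is at most 2048.
    let longest = width.max(height);
    if longest > 2048 {
        width = width * 2048 / longest;
        height = height * 2048 / longest;
    }
    // 2. Scale so the shortest side is at most 768.
    let shortest = width.min(height);
    if shortest > 768 {
        width = width * 768 / shortest;
        height = height * 768 / shortest;
    }
    // 3. Count 512x512 tiles (rounding up) and apply 170 * tiles + 85.
    let tiles = ((width + 511) / 512) * ((height + 511) / 512);
    170 * tiles + 85
}

fn main() {
    // 1024x1024 scales to 768x768 -> 2x2 tiles -> 170 * 4 + 85 = 765.
    println!("{}", high_detail_tokens(1024, 1024));
}
```

Under this model the `Low` detail setting skips the pipeline entirely and returns the flat 85-token base cost.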