# Tokenizer

The public tokenizer surface is built around `TokenizerModel` and `TokenCounter`. The default model counts tokens with `cl100k_base`; the custom model falls back to a cheap chars-per-token estimate.

## Tokenizer models

| Variant | Meaning | When to use it |
| --- | --- | --- |
| `TokenizerModel::Default` | Uses `cl100k_base` through `tiktoken-rs`. | Best default for modern chat models. |
| `TokenizerModel::Custom { chars_per_token_x100 }` | Uses a lightweight approximation. | Good when you need cheap counting without BPE. |
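The custom variant encodes the ratio as a fixed-point integer: `chars_per_token_x100: 400` means roughly 4 characters per token. A minimal sketch of that approximation, assuming ceiling rounding (the crate's exact rounding may differ; `estimate_tokens` is a hypothetical helper, not part of the public API):

```rust
// Hypothetical sketch: estimate token count from character count.
// chars_per_token_x100 is a fixed-point ratio (400 => 4 chars/token).
fn estimate_tokens(text: &str, chars_per_token_x100: u32) -> u32 {
    let chars = text.chars().count() as u32;
    // Ceiling division: tokens = ceil(chars / (chars_per_token_x100 / 100)).
    (chars * 100).div_ceil(chars_per_token_x100)
}

fn main() {
    // "hello world" is 11 chars; at 4 chars/token that rounds up to 3.
    assert_eq!(estimate_tokens("hello world", 400), 3);
    println!("ok");
}
```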

## Token counter

```rust
use forgetless::{TokenCounter, TokenizerModel};

let exact = TokenCounter::new(TokenizerModel::Default)?;
let rough = TokenCounter::new(TokenizerModel::Custom {
    chars_per_token_x100: 400,
})?;

let fits = exact.fits_budget("hello world", 32);
let tokens = exact.count("release notes and code review summary");
let clipped = exact.truncate_to_budget("very long text ...", 128);
```

## Image estimation

`TokenCounter::count_image` mirrors the vision-token heuristic in the crate: images are first scaled down so the longest side is at most 2048, then the shortest side is scaled down to at most 768, and finally 512x512 tiles are counted.

| Detail | Cost model |
| --- | --- |
| `Low` | Always 85 tokens. |
| `High` | `170 * number_of_tiles + 85` after the crate's resize pipeline. |
| `Auto` | Uses the same branch as `High`. |
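The two-stage resize plus tile count can be sketched as follows. This is a standalone illustration of the heuristic described above, not the crate's implementation; `image_tokens_high` is a hypothetical helper, and the exact rounding inside the real resize pipeline is an assumption:

```rust
// Hedged sketch of the high-detail cost: resize, count 512x512 tiles,
// then apply 170 * tiles + 85.
fn image_tokens_high(mut w: f64, mut h: f64) -> u32 {
    // Stage 1: scale down so the longest side is at most 2048.
    let max_side = w.max(h);
    if max_side > 2048.0 {
        let s = 2048.0 / max_side;
        w *= s;
        h *= s;
    }
    // Stage 2: scale down so the shortest side is at most 768.
    let min_side = w.min(h);
    if min_side > 768.0 {
        let s = 768.0 / min_side;
        w *= s;
        h *= s;
    }
    // Stage 3: count covering 512x512 tiles and price them.
    let tiles = (w / 512.0).ceil() as u32 * (h / 512.0).ceil() as u32;
    170 * tiles + 85
}

fn main() {
    // 1024x1024: shortest side scales to 768, giving a 768x768 image,
    // which covers 2x2 tiles => 170 * 4 + 85 = 765 tokens.
    assert_eq!(image_tokens_high(1024.0, 1024.0), 765);
    println!("ok");
}
```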