Forgetless context management for production LLM workflows

Forgetless compresses long prompts, files, screenshots, PDFs, and raw byte inputs into ranked context that fits a strict token budget while preserving the signal production LLM systems need. It is designed for developers who need predictable context preparation before model calls: ranking, deduplication, chunk boundaries, token counting, file extraction, and response payloads that can be tested in automation. The result is a final prompt that is smaller, clearer, and safer to pass through production systems. The documentation covers the builder, API usage, inputs, chunking, tokenizer behavior, server deployment, error handling, and result interpretation, so teams can evaluate the full context pipeline from one entry point.


Forgetless

As invested in context as you are.

If you're building a production LLM workflow, you need a context layer just as deliberate as the rest of your system.

Forgetless turns oversized prompts, files, screenshots, PDFs, and raw bytes into a budget-aware payload that still preserves the signal.

That means fewer manual cutdowns, fewer broken prompts, and more reliable inputs before every model call.


Questions, answered.

Forgetless is a Rust context optimizer for LLM workflows. It previews, ranks, chunks, and compresses oversized prompts, files, screenshots, PDFs, and raw byte inputs into a budget-aware context block so production systems can stay within strict token budgets while keeping the important signal.

Forgetless is published as a Rust crate. You add it with cargo add forgetless, build a pipeline through the Forgetless builder by chaining add, add_file, and query, then call run to produce an optimized context block. An optional HTTP server, enabled with the server feature, exposes the same optimization over a multipart API for use from other languages.
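The chained builder flow described above can be pictured with a small stand-in. This sketch does not use the forgetless crate: SketchPipeline, its argument types, and its return value are hypothetical, chosen only to mirror the add, add_file, query, run chain named in the text. The real crate's signatures may differ (for example, run may be fallible or asynchronous), so treat this as an illustration of the call shape rather than the actual API.

```rust
use std::path::PathBuf;

/// Hypothetical stand-in for a context pipeline. NOT the forgetless API.
#[derive(Debug, Default)]
struct SketchPipeline {
    texts: Vec<String>,
    files: Vec<PathBuf>,
    query: Option<String>,
}

impl SketchPipeline {
    fn new() -> Self {
        Self::default()
    }

    /// Queue an in-memory text input.
    fn add(mut self, text: &str) -> Self {
        self.texts.push(text.to_string());
        self
    }

    /// Queue a file path. This sketch only records the path and never reads it.
    fn add_file(mut self, path: &str) -> Self {
        self.files.push(PathBuf::from(path));
        self
    }

    /// Set the query that selection would be ranked against.
    fn query(mut self, q: &str) -> Self {
        self.query = Some(q.to_string());
        self
    }

    /// Consume the builder. A real optimizer would return a context block;
    /// this sketch only summarizes what was queued.
    fn run(self) -> String {
        format!(
            "{} text input(s), {} file(s), query: {}",
            self.texts.len(),
            self.files.len(),
            self.query.as_deref().unwrap_or("<none>")
        )
    }
}

fn main() {
    let summary = SketchPipeline::new()
        .add("System prompt: answer concisely.")
        .add_file("docs/design.pdf") // hypothetical path
        .query("How does the retry logic work?")
        .run();
    println!("{summary}");
}
```

Each method takes self by value and returns it, which is what lets the calls chain and makes run the single point where the pipeline is consumed.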

You can add text content, files on disk, and in-memory byte payloads. Inputs can be tagged with priorities such as critical, high, and low using WithPriority and FileWithPriority so the optimizer knows which content to preserve first when the budget is tight.
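To make the idea of priority-first preservation concrete, here is a minimal sketch of ordering inputs by priority before selection. It is not the crate's implementation: the Priority enum, the Input struct, and order_by_priority are hypothetical names, and the crate's own WithPriority and FileWithPriority types are not used here because their signatures are not shown in this text.

```rust
/// Hypothetical priority levels mirroring the critical / high / low tags.
/// Variants are declared lowest first so the derived ordering ranks Critical highest.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Priority {
    Low,
    High,
    Critical,
}

/// Hypothetical tagged input; a label stands in for the real content.
#[derive(Debug, Clone, PartialEq)]
struct Input {
    label: &'static str,
    priority: Priority,
}

/// Sort inputs most important first. `sort_by` is stable, so inputs that
/// share a priority keep their original relative order.
fn order_by_priority(mut inputs: Vec<Input>) -> Vec<Input> {
    inputs.sort_by(|a, b| b.priority.cmp(&a.priority));
    inputs
}

fn main() {
    let ordered = order_by_priority(vec![
        Input { label: "changelog", priority: Priority::Low },
        Input { label: "system prompt", priority: Priority::Critical },
        Input { label: "api spec", priority: Priority::High },
    ]);
    for input in &ordered {
        println!("{:?}: {}", input.priority, input.label);
    }
}
```

An optimizer that walks this ordered list and stops when the budget is exhausted would drop low-priority material first, which is the behavior the priority tags are meant to encourage.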

You set a token budget through the context_limit configuration option, which defaults to 128,000. Forgetless counts tokens, scores and selects chunks to fit within that budget, and returns statistics including input and output tokens and the compression ratio for the run.
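The budget step can be sketched as a greedy selection over scored chunks. Everything here is an assumption made for illustration: the whitespace token counter stands in for a real tokenizer, the greedy highest-score-first strategy is one simple way to fit a budget and may not be what Forgetless does, and the compression ratio is taken to be input tokens divided by output tokens, which the crate may define differently. Chunk, Stats, and select_within_budget are hypothetical names.

```rust
/// Hypothetical scored chunk of source text.
#[derive(Debug, Clone)]
struct Chunk {
    text: String,
    score: f64,
}

/// Hypothetical run statistics, modeled on the fields named in the text.
#[derive(Debug, PartialEq)]
struct Stats {
    input_tokens: usize,
    output_tokens: usize,
    compression_ratio: f64,
}

/// Stand-in tokenizer: counts whitespace-separated words, not real model tokens.
fn count_tokens(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Greedily keep the highest-scoring chunks that still fit within `context_limit`.
fn select_within_budget(chunks: &[Chunk], context_limit: usize) -> (Vec<Chunk>, Stats) {
    let input_tokens: usize = chunks.iter().map(|c| count_tokens(&c.text)).sum();

    // Rank by score, highest first.
    let mut ranked: Vec<&Chunk> = chunks.iter().collect();
    ranked.sort_by(|a, b| b.score.total_cmp(&a.score));

    let mut selected = Vec::new();
    let mut output_tokens = 0;
    for chunk in ranked {
        let cost = count_tokens(&chunk.text);
        // Skip chunks that would overflow the budget; smaller ones may still fit.
        if output_tokens + cost <= context_limit {
            output_tokens += cost;
            selected.push(chunk.clone());
        }
    }

    // Assumed definition: input tokens per output token (0.0 if nothing was kept).
    let compression_ratio = if output_tokens == 0 {
        0.0
    } else {
        input_tokens as f64 / output_tokens as f64
    };

    (selected, Stats { input_tokens, output_tokens, compression_ratio })
}

fn main() {
    let chunks = vec![
        Chunk { text: "retry logic uses exponential backoff".into(), score: 0.9 },
        Chunk { text: "release notes for an older version".into(), score: 0.2 },
        Chunk { text: "timeouts are configurable".into(), score: 0.5 },
    ];
    // A tiny limit for demonstration; the text states the real default is 128,000.
    let (selected, stats) = select_within_budget(&chunks, 8);
    for chunk in &selected {
        println!("kept: {}", chunk.text);
    }
    println!("{stats:?}");
}
```

Skipping an oversized chunk instead of stopping at it lets smaller, lower-ranked chunks fill the remaining budget, at the cost of not always honoring strict rank order.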

Forgetless includes optional local AI helpers in Rust: embeddings compared by cosine similarity, a local LLM for post-selection polishing, and vision helpers for describing images. These are gated behind opt-in configuration and Cargo features rather than enabled by default.
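For readers unfamiliar with the similarity measure mentioned here, cosine similarity between two embedding vectors a and b is their dot product divided by the product of their lengths. The function below is a generic, self-contained sketch of that formula; the name cosine_similarity and the choice to return None for mismatched or zero-length vectors are assumptions for this example, not the crate's helper.

```rust
/// Cosine similarity of two equal-length vectors: dot(a, b) / (|a| * |b|).
/// Returns None when the vectors are empty, differ in length, or either has
/// zero magnitude, since the measure is undefined in those cases.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.is_empty() || a.len() != b.len() {
        return None;
    }

    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|y| y * y).sum::<f32>().sqrt();

    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

fn main() {
    // Toy three-dimensional "embeddings"; real embeddings have far more dimensions.
    let query = [0.2_f32, 0.8, 0.1];
    let chunk = [0.25_f32, 0.7, 0.0];
    match cosine_similarity(&query, &chunk) {
        Some(score) => println!("similarity: {score:.3}"),
        None => println!("similarity undefined for these vectors"),
    }
}
```

Scores near 1 indicate vectors pointing in nearly the same direction, which is why the measure is a common way to rank chunks against a query embedding.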

Forgetless is open source, and the code is available on GitHub at github.com/pzzaworks/forgetless.

Read the full documentation