Pipeline

The runtime pipeline is built around lazy loading, content-aware chunking, embedding-based relevance and strict token-budget selection.

1. Collect eager inputs and lazy files

Strings and byte inputs are stored immediately, while file paths stay lazy until the optimizer decides to read them.
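The eager/lazy split can be pictured with a small enum. This is a minimal sketch, not Forgetless's actual types: `Input`, `is_lazy`, and `resolve` are hypothetical names chosen for illustration.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical input type: eager variants own their data,
/// the lazy variant holds only a path.
#[derive(Debug, Clone, PartialEq)]
pub enum Input {
    Text(String),
    Bytes(Vec<u8>),
    LazyFile(PathBuf),
}

impl Input {
    /// True when the content has not been read from disk yet.
    pub fn is_lazy(&self) -> bool {
        matches!(self, Input::LazyFile(_))
    }

    /// Produce the content on demand. Eager inputs return their stored
    /// data; a lazy file touches the filesystem only at this point.
    pub fn resolve(&self) -> io::Result<Vec<u8>> {
        match self {
            Input::Text(s) => Ok(s.as_bytes().to_vec()),
            Input::Bytes(b) => Ok(b.clone()),
            Input::LazyFile(p) => fs::read(p),
        }
    }
}
```

Constructing `Input::LazyFile` never fails, even for a path that does not exist; any I/O error surfaces only when `resolve` is called, which is what lets later steps discard paths without paying for a read.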

2. Preview files for query filtering

If a query exists and there are more than five lazy files, Forgetless reads fast previews and embeds them to keep only the most relevant paths.
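The filtering rule can be sketched as follows. The embedding here is a toy hashed bag-of-words vector standing in for the real model, and `filter_paths`, `toy_embed`, and the `keep` parameter are assumptions for illustration; the source does not say how many paths survive the filter.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const DIM: usize = 64;
/// Filtering only applies above this many lazy files.
const LAZY_FILE_THRESHOLD: usize = 5;

/// Toy stand-in for an embedding model: hashed bag-of-words counts.
fn toy_embed(text: &str) -> [f32; DIM] {
    let mut v = [0.0f32; DIM];
    for word in text.split(|c: char| !c.is_alphanumeric()).filter(|w| !w.is_empty()) {
        let mut h = DefaultHasher::new();
        word.to_lowercase().hash(&mut h);
        v[(h.finish() % DIM as u64) as usize] += 1.0;
    }
    v
}

/// Cosine similarity; defined as 0.0 when either vector is all zeros.
fn cosine(a: &[f32; DIM], b: &[f32; DIM]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// `previews` pairs each path with a short preview of its content.
/// Without a query, or with five or fewer files, every path is kept.
pub fn filter_paths(
    query: Option<&str>,
    previews: &[(String, String)],
    keep: usize, // hypothetical cut-off, not taken from the source
) -> Vec<String> {
    let query = match query {
        Some(q) if previews.len() > LAZY_FILE_THRESHOLD => q,
        _ => return previews.iter().map(|(p, _)| p.clone()).collect(),
    };
    let q = toy_embed(query);
    let mut scored: Vec<(f32, &String)> = previews
        .iter()
        .map(|(path, text)| (cosine(&q, &toy_embed(text)), path))
        .collect();
    // Highest similarity first; the sort is stable, so ties keep input order.
    scored.sort_by(|a, b| b.0.total_cmp(&a.0));
    scored.into_iter().take(keep).map(|(_, p)| p.clone()).collect()
}
```

Because only previews are embedded, the cost of this step scales with the number of files rather than their full size.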

3. Read only the selected files

PDFs, images, text and code are loaded in parallel. PDFs are extracted with `pdftotext` when it is available, and images can optionally be passed to the vision model.
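One way to picture the parallel read is with scoped threads from the standard library. This is a sketch under assumptions: `load_all`, `read_one`, and `read_pdf` are hypothetical names, one thread per file is a simplification, image handling is omitted, and the source does not describe what happens when `pdftotext` is missing (here the error is simply returned).

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;
use std::thread;

/// Extract PDF text by running `pdftotext <file> -`, which writes to stdout.
/// Fails if the binary is missing or exits unsuccessfully.
fn read_pdf(path: &Path) -> io::Result<String> {
    let out = Command::new("pdftotext").arg(path).arg("-").output()?;
    if out.status.success() {
        Ok(String::from_utf8_lossy(&out.stdout).into_owned())
    } else {
        Err(io::Error::new(io::ErrorKind::Other, "pdftotext failed"))
    }
}

/// Pick a reader from the file extension; everything else is read as UTF-8 text.
fn read_one(path: &Path) -> io::Result<String> {
    match path.extension().and_then(|e| e.to_str()) {
        Some("pdf") => read_pdf(path),
        _ => fs::read_to_string(path),
    }
}

/// Read every selected path on its own scoped thread.
/// Results come back in the same order as `paths`.
pub fn load_all(paths: &[PathBuf]) -> Vec<io::Result<String>> {
    thread::scope(|s| {
        let handles: Vec<_> = paths
            .iter()
            .map(|p| s.spawn(move || read_one(p)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("reader thread panicked"))
            .collect()
    })
}
```

Returning one `io::Result` per path keeps a single unreadable file from failing the whole batch.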

4. Chunk by content type

The chunker adapts to text, code, markdown and structured data, then labels every chunk with priority and source metadata.
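A content-aware chunker can be sketched by changing the boundary rule per content type. Everything below is illustrative: the type names, the extension-based detection, and the boundary rules (headings for markdown, function starts for code, blank lines for text) are assumptions, and structured data is left out.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ContentType { Text, Markdown, Code }

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Priority { Normal, Critical }

/// A chunk carries its text plus priority and source metadata.
#[derive(Debug, Clone, PartialEq)]
pub struct Chunk {
    pub source: String,
    pub content: String,
    pub priority: Priority,
}

/// Guess the content type from the file extension (hypothetical rule).
pub fn detect(path: &str) -> ContentType {
    match path.rsplit('.').next() {
        Some("md") => ContentType::Markdown,
        Some("rs") | Some("py") | Some("ts") => ContentType::Code,
        _ => ContentType::Text,
    }
}

/// Decide whether `line` opens a new chunk for the given content type.
fn starts_chunk(kind: ContentType, line: &str, prev_blank: bool) -> bool {
    match kind {
        ContentType::Markdown => line.starts_with('#'),
        ContentType::Code => line.starts_with("fn ") || line.starts_with("pub fn "),
        ContentType::Text => prev_blank && !line.trim().is_empty(),
    }
}

/// Split `body` into chunks and label each with source and priority.
pub fn chunk(source: &str, body: &str, priority: Priority) -> Vec<Chunk> {
    let kind = detect(source);
    let mut pieces: Vec<String> = Vec::new();
    let mut current = String::new();
    let mut prev_blank = false;
    for line in body.lines() {
        if starts_chunk(kind, line, prev_blank) && !current.trim().is_empty() {
            pieces.push(current.trim_end().to_string());
            current.clear();
        }
        current.push_str(line);
        current.push('\n');
        prev_blank = line.trim().is_empty();
    }
    if !current.trim().is_empty() {
        pieces.push(current.trim_end().to_string());
    }
    pieces
        .into_iter()
        .map(|content| Chunk { source: source.to_string(), content, priority })
        .collect()
}
```

The line-prefix rules are deliberately naive (a `#` comment inside a fenced block would split a markdown chunk, for example); a production chunker would need real parsing for each format.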

5. Score and rank chunks

Embeddings, priority boosts, position heuristics, and conversation-style recency combine into the final ordering.
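To make the combination concrete, here is a sketch of one possible scoring function. The weights, the `+10.0` critical boost, and the names `Signals`, `final_score`, and `rank` are all assumptions; the source names the signals but not how they are weighted.

```rust
/// Per-chunk signals, each assumed to be normalized to at most 1.0.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Signals {
    /// Cosine similarity to the query; 0.0 when there is no query.
    pub semantic: f32,
    /// Position-based heuristic in [0, 1].
    pub position: f32,
    /// Recency factor in [0, 1]; 0.0 for non-conversation chunks.
    pub recency: f32,
    /// Whether the user marked the content as critical.
    pub critical: bool,
}

/// Weighted sum of the signals plus a large boost for critical content.
/// The weights sum to 1.0, so the base score never exceeds 1.0 and the
/// boost guarantees critical chunks sort ahead of all others.
pub fn final_score(s: &Signals) -> f32 {
    let base = 0.6 * s.semantic + 0.25 * s.position + 0.15 * s.recency;
    if s.critical { base + 10.0 } else { base }
}

/// Return chunk indices ordered from highest to lowest score.
pub fn rank(signals: &[Signals]) -> Vec<usize> {
    let mut order: Vec<usize> = (0..signals.len()).collect();
    order.sort_by(|&a, &b| final_score(&signals[b]).total_cmp(&final_score(&signals[a])));
    order
}
```

Returning indices instead of reordering the chunks keeps the per-signal breakdown available for inspection after ranking.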

6. Select within budget and assemble output

Top chunks are kept until the token limit is met, then the final context is grouped by source and optionally polished with the local text model.
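Budget selection and grouping can be sketched as a greedy pass followed by a group-by-source join. The names below are hypothetical, token counts are taken as given, and skipping a chunk that does not fit (instead of stopping at the first overflow) is an assumption, since the source does not specify that detail. The optional LLM polish is not shown.

```rust
/// A ranked chunk with a precomputed token count.
pub struct Ranked {
    pub source: String,
    pub content: String,
    pub tokens: usize,
}

/// Walk chunks in rank order and keep each one that still fits the budget.
pub fn select_within_budget<'a>(ranked: &'a [Ranked], budget: usize) -> Vec<&'a Ranked> {
    let mut used = 0;
    let mut kept = Vec::new();
    for c in ranked {
        if used + c.tokens <= budget {
            used += c.tokens;
            kept.push(c);
        }
    }
    kept
}

/// Group kept chunks by source (in first-seen order) and render each group
/// under a `## source` heading, with `---` between groups.
pub fn assemble(kept: &[&Ranked]) -> String {
    let mut groups: Vec<(&str, Vec<&str>)> = Vec::new();
    for c in kept {
        match groups.iter().position(|(src, _)| *src == c.source.as_str()) {
            Some(i) => groups[i].1.push(c.content.as_str()),
            None => groups.push((c.source.as_str(), vec![c.content.as_str()])),
        }
    }
    groups
        .iter()
        .map(|(src, parts)| format!("## {}\n{}", src, parts.join("\n\n")))
        .collect::<Vec<_>>()
        .join("\n\n---\n\n")
}
```

Grouping after selection means chunks from the same file end up adjacent in the final context even when their ranks were far apart.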

Scoring signals

| Signal | Source | Notes |
| --- | --- | --- |
| Priority | user input | Critical content outranks everything else and survives budget pressure first. |
| Algorithmic relevance | chunk scoring | The exported `algorithmic` breakdown field currently tracks the position-based heuristic. |
| Semantic similarity | embeddings | Preview text and chunk embeddings are compared to the query when one is present. |
| Recency factor | scoring breakdown | Conversation-like chunks receive a recency-biased factor during ranking. |
| Optional LLM polish | `context_llm` | A local SmolLM2 pass can reorganize selected chunks after budget selection. |

Models used locally

| Purpose | Model | When it loads |
| --- | --- | --- |
| Embeddings | all-MiniLM-L6-v2 | Used for semantic similarity and preview filtering. |
| Text polish | HuggingFaceTB/SmolLM2-135M-Instruct | Only when `context_llm(true)` is enabled. |
| Vision | HuggingFaceTB/SmolVLM-256M-Instruct | Only when `vision_llm(true)` is enabled. |

Output shape

    ## system.md
    You are preparing a release summary.

    ---

    ## design-review.pdf
    Important design decisions and open questions...

    ---

    ## src/lib.rs
    Public exports and library entry points...