How It Works

The optimization pipeline that powers forgetless.

Pipeline

1. Content Ingestion: Your content (text, files, conversations) is collected and prepared. Files are read lazily, only when needed.

2. Smart Chunking: Content is split into semantic chunks that respect natural boundaries such as paragraphs, functions, headers, or messages.
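A minimal sketch of boundary-aware chunking, assuming the simplest case (paragraph boundaries only; the real chunker also understands functions, headers, and messages, and the size limit here is illustrative):

```rust
/// Split text into chunks at paragraph boundaries (blank lines),
/// merging consecutive paragraphs until a rough size limit is reached.
fn chunk_paragraphs(text: &str, max_chars: usize) -> Vec<String> {
    let mut chunks: Vec<String> = Vec::new();
    let mut current = String::new();
    for para in text.split("\n\n").map(str::trim).filter(|p| !p.is_empty()) {
        // Start a new chunk if adding this paragraph would exceed the limit.
        if !current.is_empty() && current.len() + para.len() + 2 > max_chars {
            chunks.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push_str("\n\n");
        }
        current.push_str(para);
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}

fn main() {
    let text = "First paragraph.\n\nSecond paragraph.\n\nThird.";
    // With a 20-character limit, each paragraph ends up in its own chunk.
    for chunk in chunk_paragraphs(text, 20) {
        println!("--\n{chunk}");
    }
}
```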

3. Local Embedding: Each chunk is embedded locally with the all-MiniLM-L6-v2 model. No external API calls are required.
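The model invocation itself is out of scope here, but once chunks are embedded (all-MiniLM-L6-v2 produces 384-dimensional vectors), semantic comparison reduces to cosine similarity. A self-contained sketch of that comparison:

```rust
/// Cosine similarity between two embedding vectors of equal length.
/// Returns 1.0 for identical directions, 0.0 for orthogonal ones.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0 // a zero vector has no direction to compare
    } else {
        dot / (norm_a * norm_b)
    }
}

fn main() {
    // Tiny 3-dimensional stand-ins for real 384-dimensional embeddings.
    let query = [1.0, 0.0, 1.0];
    let close = [0.9, 0.1, 0.8];
    let far = [0.0, 1.0, 0.0];
    println!("close: {}", cosine_similarity(&query, &close));
    println!("far:   {}", cosine_similarity(&query, &far));
}
```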

4. Hybrid Scoring: Chunks are scored using priority, recency, semantic similarity, and position signals.
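One way to combine the four signals is a weighted sum over values normalized to [0, 1]; the weights below are illustrative assumptions, not the values forgetless actually uses:

```rust
/// The four scoring signals for a chunk, each normalized to [0, 1].
struct Signals {
    priority: f32, // user-defined importance
    recency: f32,  // newer content scores higher
    semantic: f32, // query similarity from embeddings
    position: f32, // preserves context flow
}

/// Weighted sum of the signals. Example weights only; they sum to 1.0
/// so the hybrid score stays in [0, 1].
fn hybrid_score(s: &Signals) -> f32 {
    0.35 * s.priority + 0.25 * s.recency + 0.25 * s.semantic + 0.15 * s.position
}

fn main() {
    let s = Signals { priority: 1.0, recency: 0.5, semantic: 0.8, position: 0.2 };
    println!("score = {:.3}", hybrid_score(&s));
}
```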

5. Budget Selection: Top-scoring chunks are selected to fit within your token budget. Critical content is always preserved.
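Selection under a budget can be sketched as a greedy pass: critical chunks are taken unconditionally, then the highest-scoring remainder that still fits. This is an assumed strategy consistent with the description above, not necessarily the exact algorithm:

```rust
struct Chunk {
    text: String,
    tokens: usize,
    score: f32,
    critical: bool, // critical chunks are always kept
}

/// Greedy budget selection: critical chunks first, then remaining chunks
/// in descending score order while they fit the token budget.
fn select_within_budget(mut chunks: Vec<Chunk>, budget: usize) -> Vec<Chunk> {
    chunks.sort_by(|a, b| {
        b.critical
            .cmp(&a.critical)
            .then(b.score.total_cmp(&a.score))
    });
    let mut used = 0;
    let mut selected = Vec::new();
    for c in chunks {
        if c.critical || used + c.tokens <= budget {
            used += c.tokens;
            selected.push(c);
        }
    }
    selected
}

fn main() {
    let chunks = vec![
        Chunk { text: "a".into(), tokens: 50, score: 0.9, critical: false },
        Chunk { text: "b".into(), tokens: 60, score: 0.8, critical: false },
        Chunk { text: "c".into(), tokens: 10, score: 0.1, critical: true },
    ];
    // Budget of 70: "c" is critical, "a" fits, "b" no longer does.
    for c in select_within_budget(chunks, 70) {
        println!("kept: {}", c.text);
    }
}
```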

6. Context Assembly: Selected chunks are assembled into coherent, optimized context, ready for your LLM.
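Keeping the assembled context coherent mostly means restoring document order: selection ranks chunks by score, so assembly re-sorts them by original position before joining. A minimal sketch, assuming chunks carry their original index:

```rust
/// Reassemble selected chunks in original document order so the final
/// context reads coherently, then join them for the prompt.
/// Each entry is (original_index, chunk_text).
fn assemble(mut selected: Vec<(usize, String)>) -> String {
    // Selection may have reordered chunks by score; undo that here.
    selected.sort_by_key(|(idx, _)| *idx);
    selected
        .into_iter()
        .map(|(_, text)| text)
        .collect::<Vec<_>>()
        .join("\n\n")
}

fn main() {
    let ctx = assemble(vec![(3, "later section".into()), (1, "earlier section".into())]);
    println!("{ctx}");
}
```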

Scoring Signals

```rust
Priority  // User-defined importance levels
Recency   // Recent content scored higher
Semantic  // Query similarity via embeddings
Position  // Context flow preservation
```

Result

```rust
Input:       1,847,291 tokens
Output:        127,843 tokens
Compression:      14.5x
```