Chunking

Chunking is content-aware. `forgetless` picks a strategy from `ContentType`, then applies token targets, overlap, and deduplication before scoring begins.
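
The overlap and deduplication steps can be pictured with a small standalone sketch. It is not `forgetless` code. `chunk_with_overlap` and `dedup` are hypothetical helpers, whitespace-separated words stand in for real tokens, and deduplication here removes only exact repeats.

```rust
use std::collections::HashSet;

/// Hypothetical helper: split `text` into windows of `target` "tokens"
/// (whitespace-separated words here), where consecutive windows share
/// `overlap` tokens so context is not lost at a boundary.
fn chunk_with_overlap(text: &str, target: usize, overlap: usize) -> Vec<String> {
    assert!(overlap < target, "overlap must be smaller than the target size");
    let words: Vec<&str> = text.split_whitespace().collect();
    let step = target - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < words.len() {
        let end = (start + target).min(words.len());
        chunks.push(words[start..end].join(" "));
        if end == words.len() {
            break; // the last window already reaches the end of the text
        }
        start += step;
    }
    chunks
}

/// Hypothetical helper: drop exact duplicate chunks, keeping first occurrences in order.
fn dedup(chunks: Vec<String>) -> Vec<String> {
    let mut seen = HashSet::new();
    chunks.into_iter().filter(|c| seen.insert(c.clone())).collect()
}

fn main() {
    // Windows of 4 words that share 1 word: ["a b c d", "d e f g"]
    println!("{:?}", dedup(chunk_with_overlap("a b c d e f g", 4, 1)));
    // Repeated content collapses to a single chunk: ["x y"]
    println!("{:?}", dedup(chunk_with_overlap("x y x y x y", 2, 0)));
}
```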

Chunk presets

| Preset | Target tokens | Max tokens | Good for |
|---|---|---|---|
| `ChunkConfig::default()` | 512 | 1024 | General text and mixed project context. |
| `ChunkConfig::for_code()` | 256 | 512 | Codebases where smaller functions need independent ranking. |
| `ChunkConfig::for_conversation()` | 200 | 400 | Chat transcripts and message histories. |
| `ChunkConfig::for_speed()` | 1000 | 2000 | Fast coarse compression with larger chunks. |
| `ChunkConfig::for_quality()` | 256 | 512 | More selective ranking with finer chunk boundaries. |
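
The presets differ mainly in their target and max sizes. One plausible reading, assumed here and not taken from the `forgetless` source, is that the target is a soft goal and the max is a hard cap. The hypothetical `pack` function below packs blank-line-separated paragraphs until the next one would overshoot `target`, and cuts any single paragraph longer than `max` into `max`-sized pieces. Words again stand in for tokens.

```rust
/// Count whitespace-separated words, used here as a stand-in for tokens.
fn word_count(s: &str) -> usize {
    s.split_whitespace().count()
}

/// Hypothetical packer: `target` is a soft goal, `max` a hard cap.
fn pack(text: &str, target: usize, max: usize) -> Vec<String> {
    assert!(target > 0 && target <= max);
    let mut chunks: Vec<String> = Vec::new();
    let mut current: Vec<&str> = Vec::new();

    for para in text.split("\n\n") {
        let words: Vec<&str> = para.split_whitespace().collect();
        if words.is_empty() {
            continue;
        }
        // Hard cap: flush the open chunk, then cut the oversized paragraph.
        if words.len() > max {
            if !current.is_empty() {
                chunks.push(current.join(" "));
                current.clear();
            }
            for piece in words.chunks(max) {
                chunks.push(piece.join(" "));
            }
            continue;
        }
        // Soft target: close the open chunk before it would overshoot `target`.
        if !current.is_empty() && current.len() + words.len() > target {
            chunks.push(current.join(" "));
            current.clear();
        }
        current.extend(words);
    }
    if !current.is_empty() {
        chunks.push(current.join(" "));
    }
    chunks
}

fn main() {
    let text = "a b c\n\nd e\n\nf g h i j k l m n o";
    // Tiny sizes for illustration; for_code() and for_quality() use 256 / 512.
    for chunk in pack(text, 4, 6) {
        println!("{} words: {}", word_count(&chunk), chunk);
    }
}
```

Under this reading, smaller presets such as `for_code()` produce more, finer-grained chunks that can be ranked independently, while `for_speed()` produces fewer, coarser ones.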

Content types

| Type | Detected from | Notes |
|---|---|---|
| Text | Fallback default | Used for plain text and unknown extensions. |
| Code | `.rs`, `.py`, `.ts`, `.tsx`, `.go`, `.java`, and similar | Optimized for source-oriented chunking. |
| Markdown | `.md`, `.markdown` | Keeps markdown documents out of the plain-text path. |
| Conversation | Explicit config path | Best for message-oriented histories. |
| Structured | `.json`, `.yaml`, `.toml`, `.xml` | Useful for config and data files. |
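
Extension-based detection amounts to a lookup like the following. The `ContentType` enum and `detect` function are a hypothetical mirror of the table, not the crate's own types. The match covers only the extensions named above (not the "and similar" ones), and extensions are matched case-sensitively as an assumption.

```rust
use std::path::Path;

/// Hypothetical mirror of the content types in the table above.
#[allow(dead_code)] // Conversation is never produced by `detect`
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ContentType {
    Text,
    Code,
    Markdown,
    Conversation,
    Structured,
}

/// Map a file path to a content type by its extension.
fn detect(path: &str) -> ContentType {
    let ext = Path::new(path).extension().and_then(|e| e.to_str());
    match ext {
        Some("rs" | "py" | "ts" | "tsx" | "go" | "java") => ContentType::Code,
        Some("md" | "markdown") => ContentType::Markdown,
        Some("json" | "yaml" | "toml" | "xml") => ContentType::Structured,
        // Conversation is chosen through explicit configuration, never by extension,
        // so everything else falls back to plain text.
        _ => ContentType::Text,
    }
}

fn main() {
    for path in ["src/lib.rs", "README.md", "Cargo.toml", "notes.txt", "Makefile"] {
        println!("{path} -> {:?}", detect(path));
    }
}
```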

Customization

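Start from a preset, override individual chunking fields with its builder methods, and set scoring weights separately through `ScoringConfig`:

```rust
use forgetless::{ChunkConfig, Config, ForgetlessConfig, ScoringConfig};

fn main() {
    let advanced = ForgetlessConfig::new(
        // Overall context limit, in tokens.
        Config::default().context_limit(64_000),
    )
    .with_chunk(
        // These two calls restate the for_quality() preset values
        // (256 target / 512 max); change them to override the preset.
        ChunkConfig::for_quality()
            .with_target_tokens(256)
            .with_max_tokens(512)
            .with_min_tokens(10)
            // Remove duplicate chunks before scoring.
            .with_deduplication(true),
    )
    .with_scoring(ScoringConfig {
        // The three weights sum to 1.0.
        semantic_weight: 0.6,
        keyword_weight: 0.25,
        priority_weight: 0.15,
    });
}
```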
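
The example sets three scoring weights but does not say how they are combined. Assuming a linear weighted sum of per-chunk signals normalized to [0, 1], ranking with those weights would look like the sketch below. `Signals`, `Weights`, and `score` are hypothetical names, and the signal values are made up for illustration.

```rust
/// Hypothetical per-chunk signals, each assumed to lie in [0, 1].
struct Signals {
    semantic: f64,
    keyword: f64,
    priority: f64,
}

/// Hypothetical mirror of the three weights in `ScoringConfig`.
struct Weights {
    semantic: f64,
    keyword: f64,
    priority: f64,
}

/// Assumed combination rule: a linear weighted sum of the signals.
fn score(s: &Signals, w: &Weights) -> f64 {
    w.semantic * s.semantic + w.keyword * s.keyword + w.priority * s.priority
}

fn main() {
    let w = Weights { semantic: 0.6, keyword: 0.25, priority: 0.15 };
    let mut chunks = vec![
        ("fn parse", Signals { semantic: 0.9, keyword: 0.2, priority: 0.0 }),
        ("README intro", Signals { semantic: 0.3, keyword: 0.9, priority: 1.0 }),
    ];
    // Highest score first.
    chunks.sort_by(|a, b| score(&b.1, &w).partial_cmp(&score(&a.1, &w)).unwrap());
    for (name, s) in &chunks {
        println!("{}: {:.3}", name, score(s, &w));
    }
}
```

With these weights the semantically strong chunk (0.590) edges out the keyword-heavy, high-priority one (0.555). Because the weights sum to 1.0, a chunk that maxes out every signal would score 1.0 under this assumption.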