Chunking

Name: Forgetless
Author: Berke (pzzaworks)

Chunking is content-aware. `forgetless` picks a strategy from `ContentType`, then applies token targets, overlap, and deduplication before scoring begins.
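
To make "token targets, overlap, and deduplication" concrete, here is a minimal, self-contained sketch of those three steps. It is not `forgetless`'s implementation: the function names `chunk_with_overlap` and `dedup_chunks` are hypothetical, whitespace-separated words stand in for real tokens, and deduplication is assumed to mean dropping exact duplicates.

```rust
use std::collections::HashSet;

/// Split `text` into chunks of at most `target` words, where each chunk
/// repeats the last `overlap` words of the previous chunk.
/// Assumption: words approximate tokens; a real tokenizer would differ.
fn chunk_with_overlap(text: &str, target: usize, overlap: usize) -> Vec<String> {
    assert!(target > overlap, "target must be larger than overlap");
    let words: Vec<&str> = text.split_whitespace().collect();
    let step = target - overlap; // how far the window advances each time
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < words.len() {
        let end = (start + target).min(words.len());
        chunks.push(words[start..end].join(" "));
        if end == words.len() {
            break; // the last window reached the end of the input
        }
        start += step;
    }
    chunks
}

/// Drop chunks whose text exactly matches an earlier chunk,
/// keeping first occurrences in their original order.
fn dedup_chunks(chunks: Vec<String>) -> Vec<String> {
    let mut seen = HashSet::new();
    chunks.into_iter().filter(|c| seen.insert(c.clone())).collect()
}
```

With a target of 4 and an overlap of 1, the input `"a b c d e f g"` yields the two chunks `"a b c d"` and `"d e f g"`; the shared `d` is the overlap that preserves context across the boundary.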

Chunk presets

Preset	Target tokens	Max tokens	Good for
ChunkConfig::default()	512	1024	General text and mixed project context.
ChunkConfig::for_code()	256	512	Codebases where smaller functions need independent ranking.
ChunkConfig::for_conversation()	200	400	Chat transcripts and message histories.
ChunkConfig::for_speed()	1000	2000	Fast coarse compression with larger chunks.
ChunkConfig::for_quality()	256	512	More selective ranking with finer chunk boundaries.
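
As a rough way to compare the presets, the sketch below estimates how many chunks a document produces if every chunk lands exactly on its target size. The helper `estimated_chunks` and the `PRESET_TARGETS` table are hypothetical and only copy the target values from the table above; the estimate ignores overlap, content boundaries, and deduplication, so real counts will differ.

```rust
/// Target token counts copied from the preset table (illustrative mirror only).
const PRESET_TARGETS: [(&str, usize); 5] = [
    ("default", 512),
    ("for_code", 256),
    ("for_conversation", 200),
    ("for_speed", 1000),
    ("for_quality", 256),
];

/// Ceiling division: number of target-sized chunks needed to cover `total_tokens`.
fn estimated_chunks(total_tokens: usize, target_tokens: usize) -> usize {
    assert!(target_tokens > 0, "target must be positive");
    (total_tokens + target_tokens - 1) / target_tokens
}

/// Estimate the chunk count for every preset at a given document size.
fn estimates_for(total_tokens: usize) -> Vec<(&'static str, usize)> {
    PRESET_TARGETS
        .iter()
        .map(|&(name, target)| (name, estimated_chunks(total_tokens, target)))
        .collect()
}
```

For a 10,000-token input this gives about 20 chunks under the default preset, 40 under `for_code` or `for_quality`, 50 under `for_conversation`, and 10 under `for_speed`, which illustrates the trade-off between ranking granularity and the number of chunks to score.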

Content types

Type	Detected from	Notes
Text	Fallback default	Used for plain text and unknown extensions.
Code	`.rs`, `.py`, `.ts`, `.tsx`, `.go`, `.java`, and similar	Optimized for source-oriented chunking.
Markdown	`.md`, `.markdown`	Keeps markdown documents out of the plain-text path.
Conversation	Explicit config path	Best for message-oriented histories.
Structured	`.json`, `.yaml`, `.toml`, `.xml`	Useful for config and data files.
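
The extension-based detection in the table can be sketched as a simple match on the file extension. This is an illustration under assumptions, not the crate's code: the enum `ContentKind` and the function `detect_kind` are hypothetical names, only the extensions listed above are covered (the "and similar" cases are left out), and `Conversation` is omitted because the table says it is set explicitly rather than detected.

```rust
use std::path::Path;

/// Hypothetical stand-in for the detectable content types in the table.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum ContentKind {
    Text,
    Code,
    Markdown,
    Structured,
}

/// Map a file path to a content kind using its extension (case-insensitive).
/// Anything unrecognized, including files with no extension, falls back to Text.
fn detect_kind(path: &str) -> ContentKind {
    let ext = Path::new(path)
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());
    match ext.as_deref() {
        Some("rs" | "py" | "ts" | "tsx" | "go" | "java") => ContentKind::Code,
        Some("md" | "markdown") => ContentKind::Markdown,
        Some("json" | "yaml" | "toml" | "xml") => ContentKind::Structured,
        _ => ContentKind::Text,
    }
}
```

Under this sketch, `detect_kind("src/main.rs")` returns `ContentKind::Code`, `detect_kind("Cargo.toml")` returns `ContentKind::Structured`, and a file such as `LICENSE` falls through to `ContentKind::Text`.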

Customization

use forgetless::{ChunkConfig, Config, ForgetlessConfig, ScoringConfig};

let advanced = ForgetlessConfig::new(
    Config::default().context_limit(64_000),
)
.with_chunk(
    // Start from the quality preset (256 target / 512 max in the preset
    // table), then set each limit explicitly.
    ChunkConfig::for_quality()
        .with_target_tokens(256)
        .with_max_tokens(512)
        .with_min_tokens(10)
        .with_deduplication(true),
)
.with_scoring(ScoringConfig {
    // These three weights sum to 1.0.
    semantic_weight: 0.6,
    keyword_weight: 0.25,
    priority_weight: 0.15,
});
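
One plausible reading of the three scoring weights is a linear blend of per-chunk signals. The source does not state how `forgetless` combines them, so the sketch below is an assumption: `Weights` and `combined_score` are hypothetical names, and each signal is assumed to be normalized to the range 0.0 to 1.0.

```rust
/// Hypothetical mirror of the three weights set in the snippet above.
struct Weights {
    semantic: f64,
    keyword: f64,
    priority: f64,
}

/// Weighted sum of three signals, each assumed to lie in [0.0, 1.0].
/// With weights that sum to 1.0, the result also stays in [0.0, 1.0].
fn combined_score(w: &Weights, semantic: f64, keyword: f64, priority: f64) -> f64 {
    w.semantic * semantic + w.keyword * keyword + w.priority * priority
}
```

Under this assumption, with weights 0.6, 0.25, and 0.15, a chunk with a semantic signal of 0.5, a full keyword match (1.0), and no priority boost (0.0) would score 0.6 × 0.5 + 0.25 × 1.0 + 0.15 × 0.0 = 0.55.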
