Architecture
Reflex is a trigram-based full-text code search engine written in Rust, with runtime symbol detection via Tree-sitter. This page describes the internal architecture for contributors and users who want to understand how it works.
System overview
Core components
Section titled “Core components”Trigram Indexer
Extracts every three-character sequence (trigram) from source files and builds an inverted index that maps each trigram to the IDs of the files containing it: trigram → [file_ids]. At query time, the posting lists for the search term's trigrams are intersected, narrowing the candidate file set by 100–1000x.
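A minimal sketch of the indexing side in Rust. It assumes byte-level trigrams (matching the HashMap<[u8; 3], Vec<u32>> postings type listed under Data formats) and file IDs equal to each file's position in the input. The names trigrams and build_index are hypothetical; this is an illustration of the technique, not Reflex's code.

```rust
use std::collections::{BTreeSet, HashMap};

/// Returns the distinct byte trigrams in `text`, in sorted order.
fn trigrams(text: &[u8]) -> BTreeSet<[u8; 3]> {
    text.windows(3).map(|w| [w[0], w[1], w[2]]).collect()
}

/// Builds an inverted index: trigram -> sorted list of IDs of files containing it.
/// A file's ID is its position in `files`.
fn build_index(files: &[&str]) -> HashMap<[u8; 3], Vec<u32>> {
    let mut index: HashMap<[u8; 3], Vec<u32>> = HashMap::new();
    for (file_id, content) in files.iter().enumerate() {
        // Files are visited in ID order and each trigram is seen once per file,
        // so every posting list stays sorted and duplicate-free.
        for t in trigrams(content.as_bytes()) {
            index.entry(t).or_default().push(file_id as u32);
        }
    }
    index
}
```

Keeping posting lists sorted is what makes the query-time intersection a cheap linear merge.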
Content Store
A binary format read through memory-mapped I/O. File contents are concatenated, and a file index of (offset, length) pairs gives O(1) lookup by file. Because the file is memory-mapped, the OS handles paging: only the regions that are actually accessed are loaded into RAM.
Symbol Parser (runtime)
Tree-sitter parsers run at query time, not during indexing. When a query includes --symbols, Reflex parses only the candidate files (already narrowed by trigrams) to extract symbol definitions.
This is the key architectural decision: indexing stays fast because it skips parsing entirely, and symbol search stays fast because trigrams eliminate most files before any parsing begins.
Cache Manager
Manages the .reflex/ directory:
- meta.db: SQLite database with file metadata, statistics, and configuration
- trigrams.bin: rkyv-serialized trigram inverted index
- content.bin: memory-mapped file contents
- hashes.json: blake3 hashes for incremental indexing
Indexing pipeline
- Walk: traverse the file tree, respecting .gitignore and language filters
- Hash: compute the blake3 hash of each file and skip unchanged files (incremental)
- Extract: generate trigrams from each file’s content
- Build: construct the inverted index from all trigrams
- Write: serialize to trigrams.bin (rkyv) and content.bin, and update meta.db
Incremental performance: a full index of 1,000 files takes ~2s, while reindexing after 10 files change takes ~200ms.
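A sketch of the Hash step's change detection, assuming the previous run's hashes are available as a path → hash map (the role hashes.json plays). To stay dependency-free it uses std's DefaultHasher as a stand-in for blake3; DefaultHasher's algorithm is unspecified and may change between Rust releases, so unlike blake3 its values should not be persisted. The names content_hash and changed_files are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in for blake3 (assumption): fine within one process, not for on-disk hashes.
fn content_hash(content: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    hasher.finish()
}

/// Returns the paths whose content is new or changed since `previous`,
/// plus the hash table to store for the next run.
fn changed_files<'a>(
    previous: &HashMap<String, u64>,
    current: &[(&'a str, &[u8])],
) -> (Vec<&'a str>, HashMap<String, u64>) {
    let mut changed = Vec::new();
    let mut next = HashMap::new();
    for &(path, content) in current {
        let h = content_hash(content);
        // Unchanged files are skipped; only these paths go through Extract/Build.
        if previous.get(path) != Some(&h) {
            changed.push(path);
        }
        next.insert(path.to_string(), h);
    }
    (changed, next)
}
```

Files deleted since the previous run simply drop out of the returned table; a full implementation would also need to remove them from the index.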
Query pipeline
Full-text search
- Extract trigrams from the query string
- Look up each trigram’s posting list in the inverted index
- Intersect posting lists to get candidate files
- Scan candidates with memory-mapped content to verify matches
- Apply filters (language, path) and sort results
Complexity: O(n + k log k + m), where n is the total length of the posting lists, k the number of candidate files, and m the number of matches.
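A sketch of the first four steps over an in-memory index, assuming the HashMap<[u8; 3], Vec<u32>> postings described under Data formats with sorted posting lists. build_index, intersect, and search are hypothetical names. Intersecting from the shortest list first and treating queries under three bytes as "every file is a candidate" are assumptions, not documented Reflex behavior. Filtering and sorting are omitted, so results come back in file-ID order.

```rust
use std::collections::HashMap;

/// Trigram -> sorted, duplicate-free list of file IDs.
type Index = HashMap<[u8; 3], Vec<u32>>;

/// Helper for the example: index each file's byte trigrams.
fn build_index(contents: &[&str]) -> Index {
    let mut index = Index::new();
    for (id, text) in contents.iter().enumerate() {
        for w in text.as_bytes().windows(3) {
            let list = index.entry([w[0], w[1], w[2]]).or_default();
            // IDs arrive in ascending order, so checking the last entry dedups.
            if list.last() != Some(&(id as u32)) {
                list.push(id as u32);
            }
        }
    }
    index
}

/// Linear merge of two sorted posting lists.
fn intersect(a: &[u32], b: &[u32]) -> Vec<u32> {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::new();
    while i < a.len() && j < b.len() {
        if a[i] < b[j] {
            i += 1;
        } else if a[i] > b[j] {
            j += 1;
        } else {
            out.push(a[i]);
            i += 1;
            j += 1;
        }
    }
    out
}

fn search(index: &Index, contents: &[&str], query: &str) -> Vec<u32> {
    let q = query.as_bytes();
    let candidates: Vec<u32> = if q.len() < 3 {
        // No trigrams to narrow with: every file is a candidate.
        (0..contents.len() as u32).collect()
    } else {
        let mut lists: Vec<&[u32]> = Vec::new();
        for w in q.windows(3) {
            match index.get(&[w[0], w[1], w[2]]) {
                Some(list) => lists.push(list),
                // A trigram no file contains means nothing can match.
                None => return Vec::new(),
            }
        }
        // Start from the shortest list so intermediate results stay small.
        lists.sort_by_key(|l| l.len());
        let mut acc = lists[0].to_vec();
        for list in &lists[1..] {
            acc = intersect(&acc, list);
        }
        acc
    };
    // Verify: sharing all trigrams is necessary but not sufficient for a match.
    candidates
        .into_iter()
        .filter(|&id| contents[id as usize].contains(query))
        .collect()
}
```

The verification scan is what keeps results exact: a file can contain every trigram of a query without containing the query itself.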
Symbol search
- Run full-text search to get candidate files
- Parse each candidate with the appropriate Tree-sitter grammar
- Walk the AST to find symbol definitions matching the query
- Filter by --kind if specified
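The shape of this pipeline, sketched in Rust. Tree-sitter is an external dependency, so toy_parse is a hypothetical stand-in that only recognizes lines beginning with fn or struct, where a real implementation would walk the syntax tree produced by the file's grammar. The Kind enum, the substring match on symbol names, and all function names are assumptions for illustration.

```rust
/// Symbol kinds (hypothetical subset for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Function,
    Struct,
}

#[derive(Debug, PartialEq)]
struct Symbol {
    name: String,
    kind: Kind,
    line: usize,
}

/// Stand-in for "parse with Tree-sitter, then walk the AST": recognizes only
/// lines that start with `fn NAME` or `struct NAME`.
fn toy_parse(content: &str) -> Vec<Symbol> {
    let mut out = Vec::new();
    for (i, line) in content.lines().enumerate() {
        let trimmed = line.trim_start();
        let (kind, rest) = if let Some(r) = trimmed.strip_prefix("fn ") {
            (Kind::Function, r)
        } else if let Some(r) = trimmed.strip_prefix("struct ") {
            (Kind::Struct, r)
        } else {
            continue;
        };
        let name: String = rest
            .chars()
            .take_while(|c| c.is_alphanumeric() || *c == '_')
            .collect();
        if !name.is_empty() {
            out.push(Symbol { name, kind, line: i + 1 });
        }
    }
    out
}

/// Parses only the candidate files, keeps definitions whose name contains
/// `query`, then applies the optional kind filter (the --kind step).
fn symbol_search(candidates: &[&str], query: &str, kind: Option<Kind>) -> Vec<(usize, Symbol)> {
    let mut results = Vec::new();
    for (file_idx, content) in candidates.iter().enumerate() {
        for sym in toy_parse(content) {
            let kind_ok = kind.map_or(true, |k| k == sym.kind);
            if sym.name.contains(query) && kind_ok {
                results.push((file_idx, sym));
            }
        }
    }
    results
}
```

Because matches come from the parse step rather than the raw text, a line that merely calls a function is never reported as a definition.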
Regex search
- Extract literal substrings from the regex pattern
- Use literals for trigram narrowing (if available)
- Fall back to scanning all files if no literals can be extracted
- Apply the full regex to candidate content
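A conservative sketch of the first three steps. The hypothetical trigram_literals returns substrings that every match must contain, keeping only those of at least three bytes because shorter ones yield no trigrams. It understands only a small regex subset and returns an empty list, meaning "fall back to scanning all files", whenever a construct could make a literal optional or hide its text (alternation, groups, nested classes, hex and Unicode escapes). Production engines analyze the parsed regex instead; this is not Reflex's extractor.

```rust
/// Keeps `run` if it is long enough to contain a trigram, then clears it.
fn flush(run: &mut String, out: &mut Vec<String>) {
    if run.len() >= 3 {
        out.push(run.clone());
    }
    run.clear();
}

/// Literal substrings every match of `pattern` must contain.
/// An empty result means "no narrowing possible: scan all files".
fn trigram_literals(pattern: &str) -> Vec<String> {
    // Alternation and groups can make any literal optional: give up.
    if pattern.contains('|') || pattern.contains('(') {
        return Vec::new();
    }
    let chars: Vec<char> = pattern.chars().collect();
    let (mut out, mut run, mut i) = (Vec::new(), String::new(), 0);
    while i < chars.len() {
        match chars[i] {
            // `*` and `?` make the previous character optional: drop it.
            '*' | '?' => {
                run.pop();
                flush(&mut run, &mut out);
            }
            // `{m,n}` may allow zero repetitions: drop the char, skip the braces.
            '{' => {
                run.pop();
                flush(&mut run, &mut out);
                while i < chars.len() && chars[i] != '}' {
                    i += 1;
                }
            }
            // A character class matches one unknown char: end the run and skip it.
            '[' => {
                flush(&mut run, &mut out);
                i += 1;
                if chars.get(i) == Some(&'^') {
                    i += 1;
                }
                if chars.get(i) == Some(&']') {
                    i += 1; // a leading `]` is literal inside the class
                }
                while i < chars.len() && chars[i] != ']' {
                    if chars[i] == '[' {
                        return Vec::new(); // nested or POSIX classes: give up
                    }
                    i += if chars[i] == '\\' { 2 } else { 1 };
                }
            }
            // Wildcards and anchors end the run; `+` keeps its char but may repeat it.
            '.' | '^' | '$' | '+' => flush(&mut run, &mut out),
            '\\' => {
                i += 1;
                match chars.get(i).copied() {
                    // `\.`, `\*`, ... are literal punctuation.
                    Some(c) if !c.is_alphanumeric() => run.push(c),
                    // Class and boundary escapes match unknown text.
                    Some('d' | 'D' | 'w' | 'W' | 's' | 'S' | 'b' | 'B') => flush(&mut run, &mut out),
                    // Hex, Unicode, and other escapes: too complex for this sketch.
                    _ => return Vec::new(),
                }
            }
            c => run.push(c),
        }
        i += 1;
    }
    flush(&mut run, &mut out);
    out
}
```

For example, foo.*bar yields foo and bar, while colou?r yields only colo because the u is optional.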
Data formats
Section titled “Data formats”trigrams.bin
Serialized with rkyv (zero-copy deserialization):
- Header: RFTG magic bytes
- Trigram postings: HashMap<[u8; 3], Vec<u32>> mapping trigrams to file IDs
- File list: ordered list of indexed file paths
Zero-copy means the index is usable directly from the memory-mapped file without deserialization.
content.bin
Custom binary format:
- Header: RFCT magic bytes (32 bytes total)
- Concatenated file contents
- File index at the end: (offset, length) pairs for O(1) file lookup
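A sketch of this layout, with a Vec<u8> standing in for the memory-mapped file. The page does not say how a reader locates the trailing index or what the rest of the 32-byte header holds, so this sketch hypothetically stores the index's start offset as a little-endian u64 in header bytes 4..12, leaves the remaining header bytes zeroed, and encodes each (offset, length) pair as two little-endian u64s. write_content and read_file are hypothetical names.

```rust
const MAGIC: &[u8; 4] = b"RFCT";
const HEADER_LEN: usize = 32;
const ENTRY_LEN: usize = 16; // (offset: u64, length: u64), little-endian (assumption)

fn read_u64(bytes: &[u8]) -> u64 {
    let mut arr = [0u8; 8];
    arr.copy_from_slice(&bytes[..8]);
    u64::from_le_bytes(arr)
}

/// Header, then concatenated contents, then the (offset, length) index.
fn write_content(files: &[&[u8]]) -> Vec<u8> {
    let mut buf = vec![0u8; HEADER_LEN];
    buf[..4].copy_from_slice(MAGIC);
    let mut entries = Vec::with_capacity(files.len());
    for file in files {
        entries.push((buf.len() as u64, file.len() as u64));
        buf.extend_from_slice(file);
    }
    let index_offset = buf.len() as u64;
    for (offset, len) in entries {
        buf.extend_from_slice(&offset.to_le_bytes());
        buf.extend_from_slice(&len.to_le_bytes());
    }
    // Hypothetical: record where the index starts in header bytes 4..12.
    buf[4..12].copy_from_slice(&index_offset.to_le_bytes());
    buf
}

/// O(1) lookup: jump straight to the file's index entry, then slice its bytes.
fn read_file(buf: &[u8], file_id: usize) -> Option<&[u8]> {
    if buf.len() < HEADER_LEN || buf[..4] != MAGIC[..] {
        return None;
    }
    let index_offset = read_u64(&buf[4..12]) as usize;
    let entry_start = index_offset.checked_add(file_id.checked_mul(ENTRY_LEN)?)?;
    let entry = buf.get(entry_start..entry_start.checked_add(ENTRY_LEN)?)?;
    let offset = read_u64(&entry[..8]) as usize;
    let len = read_u64(&entry[8..]) as usize;
    buf.get(offset..offset.checked_add(len)?)
}
```

Because read_file only slices the buffer, running it over a memory-mapped region touches just the header, one index entry, and the requested file's pages.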
meta.db
SQLite database with three tables:
| Table | Purpose |
|---|---|
| files | File paths, sizes, last-modified times |
| statistics | Aggregate stats (file count, language breakdown) |
| config | Index configuration snapshot |
Performance optimizations
Memory-mapped I/O
Both trigrams.bin and content.bin are accessed via memmap2. Benefits:
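- No explicit loading: the OS pages data in on demand
- Shared across processes: processes that map the same index file share the OS page cache
- Efficient for large indexes: only the pages a query actually touches are read into memory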
rkyv serialization
Zero-copy deserialization for the trigram index. Unlike typical serde deserialization, which copies data into owned Rust structs, rkyv’s archived format can be used directly from the underlying bytes.
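rkyv is an external crate, so as a std-only illustration of the zero-copy idea, here is a hypothetical PostingsView that reads u32 file IDs straight out of a borrowed byte slice (for example, a memory-mapped region) instead of first decoding them into a Vec. The layout (little-endian u32s) and all names are assumptions; this is not rkyv's API or archived format, which handles the same concern generically for whole data structures.

```rust
/// A borrowed view of a posting list stored as little-endian u32 file IDs
/// inside a larger byte buffer. Hypothetical layout for illustration.
#[derive(Clone, Copy)]
struct PostingsView<'a> {
    bytes: &'a [u8],
}

impl<'a> PostingsView<'a> {
    /// Wraps the bytes without copying; rejects lengths that are not a multiple of 4.
    fn new(bytes: &'a [u8]) -> Option<Self> {
        if bytes.len() % 4 == 0 {
            Some(PostingsView { bytes })
        } else {
            None
        }
    }

    fn len(&self) -> usize {
        self.bytes.len() / 4
    }

    /// Decodes entry `i` on demand, straight from the underlying bytes.
    fn get(&self, i: usize) -> Option<u32> {
        let start = i.checked_mul(4)?;
        let b = self.bytes.get(start..start.checked_add(4)?)?;
        Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
    }

    /// Iterates over the file IDs, decoding each one lazily.
    fn iter(&self) -> impl Iterator<Item = u32> + 'a {
        self.bytes
            .chunks_exact(4)
            .map(|b| u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
    }
}
```

Opening the view costs nothing beyond a length check, which is why a zero-copy index is usable as soon as the file is mapped.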
blake3 hashing
blake3 is used for incremental indexing. It’s ~10x faster than SHA-256 for file hashing, making the “what changed?” check negligible.
Parallel indexing
Rayon parallelizes indexing across ~80% of available CPU cores.
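A std-only sketch of the "~80% of cores" policy: it sizes the worker count from std::thread::available_parallelism and spreads per-item work over scoped threads. Reflex uses Rayon for this; the function names, the round-down 4/5 calculation, and the contiguous chunking are assumptions for illustration.

```rust
use std::thread;

/// Roughly 80% of the available cores (rounded down), but at least one worker.
fn worker_count() -> usize {
    let cores = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    (cores * 4 / 5).max(1)
}

/// Applies `work` (e.g. hashing and trigram extraction) to every item using up
/// to `workers` scoped threads. Items are split into contiguous chunks, so the
/// output order matches the input order.
fn parallel_map<T, R, F>(items: &[T], workers: usize, work: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    let workers = workers.max(1);
    let chunk_size = ((items.len() + workers - 1) / workers).max(1);
    let work = &work;
    thread::scope(|s| {
        let handles: Vec<_> = items
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || chunk.iter().map(work).collect::<Vec<R>>()))
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Leaving a fraction of the cores free keeps the machine responsive while a large index builds; a work-stealing pool like Rayon's also balances uneven file sizes better than fixed chunks.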
Technology stack
| Crate | Purpose |
|---|---|
| tree-sitter | Runtime parsing for symbol extraction |
| rkyv | Zero-copy serialization for the trigram index |
| memmap2 | Memory-mapped file I/O |
| rusqlite | SQLite for metadata |
| blake3 | Fast content hashing |
| ignore | .gitignore-aware file walking |
| rayon | Parallel indexing |
| clap | CLI argument parsing |
| axum | HTTP API server |
| tokio | Async runtime |
| serde_json | JSON output |
Design principles
- Performance first: every design choice prioritizes query speed
- Completeness over precision: find every occurrence, let the user filter
- Simplicity over features: do fewer things well
- Determinism: same query, same results, every time
- Extensibility: adding a language means adding one parser file
References
- Russ Cox: Regular Expression Matching with a Trigram Index
- Zoekt: trigram code search
- Sourcegraph: code intelligence platform
- ripgrep: regex search tool
Next steps
- Contributing: development setup and code organization
- CLI Commands: full command reference
- Supported Languages: parser details per language