Hybrid Search: Smart Search Architecture with FTS5 + Vector + RRF

Q: What is BM25?

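Best Matching 25 is a text search algorithm. It calculates a score based on word frequency in the document, document length, and rarity across the collection. It was developed in the 1990s for the Okapi retrieval system at City University London and remains a standard ranking function in search engines such as Lucene and Elasticsearch.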

Q: What is embedding search?

A search method that converts text into a high-dimensional vector and measures how close two vectors are, usually with cosine similarity (based on the angle between them). It works on meaning: it can find the same concept expressed with different words.

Q: What is RRF (Reciprocal Rank Fusion)?

A method for combining two different search result lists. It scores each document based on its rank in each list: 1/(k+rank). k is typically 60. Documents that rank high in both lists get the highest score.

Q: Why don't we just use vector search?

Vector search captures meaning but is weak at exact matching. When searching for 'GTM', the embedding might not distinguish between 'Google Tag Manager' and 'tag management'. BM25 directly matches the abbreviation 'GTM'.

Q: Can you do vector search with SQLite?

Yes. The sqlite-vec extension adds vector search capability to SQLite, and FTS5 ships with most SQLite builds. Both run in the same database without requiring separate servers.

TL;DR

Keyword search (BM25) finds words but misses meaning. Vector search (embedding) captures meaning but misses abbreviations. When you combine them with Reciprocal Rank Fusion (RRF), results that both methods rank highly rise to the top. dnomia-knowledge implements this hybrid approach with SQLite FTS5 + sqlite-vec + RRF in a single database file. A focused 340-character chunk scored 2.3x higher similarity for the same query than a 1.5KB multi-topic document.

You search for “GTM datalayer debug”. Keyword search finds documents containing the words “GTM” and “datalayer”. But it can’t find the document titled “troubleshooting event tracking issues”, which is exactly what you’re looking for, because those words don’t appear in it.

You try vector search. The embedding model captures the semantic similarity between “GTM datalayer debug” and “troubleshooting event tracking issues”. But this time it misses the configuration file containing the abbreviation “GTM”, because an embedding rarely represents a 3-letter abbreviation precisely enough to match it.

When you run both together, results where both methods are strong rise to the top. This is hybrid search.

Keyword Search: Word Matching with BM25

BM25 (Best Matching 25) is a text search algorithm. Its core logic is simple but effective:

Term frequency (TF): The more often a search term appears in a document, the higher the score, with diminishing returns (each additional occurrence adds less)
Document length normalization: A correction is applied so long documents don’t score high simply for being long
Inverse document frequency (IDF): If a word appears in every document in the collection (e.g., “the”, “a”), its value drops. Rare words are more valuable
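To make the three factors concrete, here is a minimal Okapi BM25 scorer in plain Python. It is a sketch for illustration, not FTS5's implementation: k1 = 1.2 and b = 0.75 are the usual default parameters, the IDF variant shown is the one Lucene uses (FTS5's differs slightly), and the regular-expression tokenizer is a simplification.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercased word tokens; a simple stand-in for a real tokenizer
    return re.findall(r"\w+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: how many documents contain each term
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Length normalization: longer-than-average documents get a larger denominator
        length_norm = 1 - b + b * len(tokens) / avg_len
        score = 0.0
        for term in set(tokenize(query)):
            if term not in tf:
                continue
            # IDF: rare terms weigh more; the "1 +" keeps it positive for very common terms
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # TF saturation: each extra occurrence adds less than the one before
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    "GTM container setup",
    "consent mode v2 advanced",
    "GTM preview and GTM debug mode",
]
print(bm25_scores("GTM debug", docs))
```

In this toy collection the third document scores highest because it contains both query terms, including the rarer “debug”; the second contains neither and scores 0.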

SQLite’s built-in FTS5 (Full-Text Search 5) module uses BM25 scoring. No separate server required; it runs inside the database file.
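A minimal sketch of the same ranking through FTS5, using Python's built-in sqlite3 module and assuming the linked SQLite library was compiled with FTS5 (true of most current builds). The table name, columns, and rows are hypothetical. FTS5's bm25() returns negated scores, so better matches are more negative and results are sorted in ascending order.

```python
from __future__ import annotations

import sqlite3


def keyword_search(db: sqlite3.Connection, query: str, limit: int = 5) -> list[tuple[str, float]]:
    # bm25() is negated: more negative = better match, so ascending order puts the best first
    rows = db.execute(
        "SELECT path, bm25(docs) AS score FROM docs WHERE docs MATCH ? "
        "ORDER BY score LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()


db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE docs USING fts5(path, body)")  # requires an FTS5-enabled SQLite
db.executemany(
    "INSERT INTO docs (path, body) VALUES (?, ?)",
    [
        ("setup.md", "Install the GTM container and publish a version"),
        ("consent.md", "Consent mode v2 advanced sends cookieless pings"),
        ("debug.md", "GTM preview mode: debug dataLayer pushes in GTM"),
        ("fts5.md", "SQLite FTS5 ranks full-text matches with bm25"),
        ("vec.md", "sqlite-vec adds KNN vector search to SQLite"),
    ],
)
for path, score in keyword_search(db, "GTM"):
    print(path, round(score, 3))
```

debug.md ranks above setup.md because “GTM” appears twice in it; the other rows do not contain the term and are not returned.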

When is it strong?

Abbreviations: “GTM”, “sGTM”, “CAPI”, “RRF”
Variable/function names: handleSubmit, datalayer.push
Error codes: “ERR_CONNECTION_REFUSED”, “404”
Exact phrase: “consent mode v2 advanced”

When is it weak?

Synonyms: searching for “performance” can’t return “speed” or “hız” (Turkish for “speed”)
Cross-language: searching for “sepet terk” (Turkish for “cart abandonment”) can’t find “cart abandonment”
Conceptual queries: abstract questions like “why is the user leaving the page?”

Vector Search: Semantic Matching with Embeddings

An embedding model converts text into a high-dimensional vector (an array of numbers). “Sepet terk oranı” (Turkish) and “cart abandonment rate” use different words for the same idea, so they occupy nearby points in embedding space.

Similarity is measured using cosine similarity: the cosine of the angle between two vectors. 1.0 = identical direction, 0.0 = orthogonal (unrelated), -1.0 = opposite.

dnomia-knowledge uses the intfloat/multilingual-e5-base model (768 dimensions, multilingual). The sqlite-vec extension adds KNN (K-Nearest Neighbors) vector search to SQLite.
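The cosine formula and a KNN lookup fit in a few lines of plain Python. This is an exact, brute-force scan over every stored vector, meant only to show what a KNN query computes; it is not sqlite-vec's API, and the 3-dimensional vectors and their names are made up (multilingual-e5-base vectors have 768 dimensions).

```python
from __future__ import annotations

import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn(query: list[float], vectors: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    # Exact K-nearest neighbours: score every stored vector, keep the k most similar
    scored = ((cosine(query, vec), key) for key, vec in vectors.items())
    return [(key, score) for score, key in heapq.nlargest(k, scored)]


print(cosine([1.0, 0.0], [2.0, 0.0]))   # same direction -> 1.0
print(cosine([1.0, 0.0], [0.0, 3.0]))   # orthogonal -> 0.0
print(cosine([1.0, 0.0], [-1.0, 0.0]))  # opposite -> -1.0

vectors = {
    "cart-abandonment": [0.9, 0.1, 0.0],
    "consent-mode": [0.0, 0.2, 0.9],
    "checkout-speed": [0.6, 0.5, 0.1],
}
print(knn([1.0, 0.2, 0.0], vectors, k=2))
```

A vector index such as sqlite-vec answers the same question inside SQLite, so the stored vectors never have to be loaded into application code.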

When is it strong?

Semantic queries: searching “why isn’t tracking working?” finds the debug guide
Cross-language: Turkish question, English document (or vice versa)
Conceptual similarity: searching “data loss” returns “data loss prevention”

When is it weak?

Abbreviations: the embedding for “GTM” is too generic, so precise matching is weak
New/rare terms: words not in the model’s training data
Exact match: keyword is more accurate for technical terms like “consent_mode_v2”

Hybrid Search: Combining Both

Two search engines with different strengths. Hybrid search runs both together and merges the results.

Reciprocal Rank Fusion (RRF)

RRF is a method for merging two ranked lists into a single ranked list. The formula:

RRF_score(d) = Σ 1 / (k + rank_i(d))

The constant k is typically 60 (the value from the original paper). rank_i(d) is the document’s 1-based rank in the i-th list; a list that doesn’t contain d contributes 0.

Concrete example:

BM25 results:     1. file-A, 2. file-C, 3. file-B
Vector results:   1. file-B, 2. file-A, 3. file-D

RRF merge (k=60):
  file-A: 1/(60+1) + 1/(60+2) = 0.0164 + 0.0161 = 0.0325
  file-B: 1/(60+3) + 1/(60+1) = 0.0159 + 0.0164 = 0.0323
  file-C: 1/(60+2) + 0         = 0.0161
  file-D: 0         + 1/(60+3) = 0.0159

Final ranking: file-A, file-B, file-C, file-D

Think of it as two search engines “voting”. If both rank a document highly, it’s very likely relevant. If only one does, it still makes the list, but lower down.
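The formula translates directly into code. A minimal sketch in Python (the function name `rrf` is illustrative, not taken from dnomia-knowledge) that reproduces the worked example above:

```python
from __future__ import annotations

from collections import defaultdict


def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based; a document absent from a list gets nothing from that list
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


bm25_results = ["file-A", "file-C", "file-B"]
vector_results = ["file-B", "file-A", "file-D"]
for doc, score in rrf([bm25_results, vector_results]):
    print(doc, round(score, 4))
```

It prints file-A 0.0325, file-B 0.0323, file-C 0.0161 and file-D 0.0159, matching the hand calculation. Only ranks enter the formula, so the BM25 and cosine scores themselves never have to be compared.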

Why not a simple score average?

BM25 scores and cosine similarity are on different scales. BM25 scores are unbounded and depend on the query and the collection (the “25” in the name is not a maximum), while cosine similarity lies between -1 and 1. A direct average lets the larger-scaled BM25 score dominate. Since RRF works on rankings rather than raw scores, it’s unaffected by scale differences.
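A quick illustration of the scale problem, reusing the illustrative scores from the search-flow example in the dnomia-knowledge section below (BM25 12.3 / 8.1 / 5.2, cosine 0.82 / 0.74 / 0.68). Counting a missing score as 0 is an assumption of this sketch.

```python
from __future__ import annotations

bm25 = {"chunk-A": 12.3, "chunk-C": 8.1, "chunk-E": 5.2}
cosine = {"chunk-B": 0.82, "chunk-A": 0.74, "chunk-D": 0.68}
docs = sorted(set(bm25) | set(cosine))

# Naive fusion: average the raw scores (a missing score counts as 0)
naive = {d: (bm25.get(d, 0.0) + cosine.get(d, 0.0)) / 2 for d in docs}


def ranks(scores: dict[str, float]) -> dict[str, int]:
    # 1-based rank within one list, best score first
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {doc: i for i, doc in enumerate(ordered, start=1)}


# RRF: only ranks are used, so the two score scales never meet
bm25_rank, cos_rank = ranks(bm25), ranks(cosine)
fused = {d: sum(1 / (60 + r[d]) for r in (bm25_rank, cos_rank) if d in r) for d in docs}

print(sorted(docs, key=naive.get, reverse=True))
print(sorted(docs, key=fused.get, reverse=True))
```

The naive average ranks chunk-A, chunk-C, chunk-E, chunk-B, chunk-D: chunk-B, the best vector match, falls below chunk-E, the weakest keyword match, because every BM25 score outweighs every cosine score. RRF puts chunk-A, chunk-B, chunk-C first.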

Chunk Size: Small and Focused Wins

Hybrid search effectiveness depends directly on chunk quality. There’s a critical finding here:

A focused 340-character rule scored 0.57 similarity for its target query. A 1.5KB multi-topic document containing the same information scored 0.25 for the same query¹.

A 2.3x difference. Same information, different packaging. Why?

The embedding model compresses the entire chunk into a single vector. As the chunk grows, each concept gets less representation. When a 340-character chunk says “in consent mode v2 advanced mode, cookies aren’t written but cookieless pings are sent”, the embedding focuses entirely on that meaning. Even if a 1.5KB document contains the same sentence, the other 20 sentences dilute the embedding.

dnomia-knowledge uses heading-based chunking: it splits at ## and ### headings and merges sections shorter than 200 characters. This keeps each chunk focused on a single topic.
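A minimal sketch of heading-based chunking in Python. It assumes Markdown input and interprets the 200-character rule as: a section shorter than 200 characters is merged with the sections that follow it, and a short final section is appended to the previous chunk. The function name and these merge details are assumptions, not dnomia-knowledge's code.

```python
from __future__ import annotations

import re

# A split point is a line starting with "## " or "### " (not "# " or "#### ")
HEADING = re.compile(r"^#{2,3} ", re.MULTILINE)


def chunk_by_headings(markdown: str, min_chars: int = 200) -> list[str]:
    starts = [m.start() for m in HEADING.finditer(markdown)]
    bounds = [0, *starts, len(markdown)]
    # Each section runs from one heading to the next; text before the first heading is its own section
    sections = [markdown[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    sections = [s for s in sections if s]

    chunks: list[str] = []
    buffer = ""
    for section in sections:
        buffer = f"{buffer}\n\n{section}" if buffer else section
        if len(buffer) >= min_chars:  # long enough to stand alone
            chunks.append(buffer)
            buffer = ""
    if buffer:  # short leftover at the end of the document
        if chunks:
            chunks[-1] += "\n\n" + buffer
        else:
            chunks.append(buffer)
    return chunks


doc = "## A\n" + "x" * 250 + "\n## B\nshort\n### C\n" + "y" * 250
chunks = chunk_by_headings(doc)
print(len(chunks))  # 2: "## B" is too short on its own and is merged with "### C"
```

Merging forward keeps a tiny section next to the content it introduces instead of producing a chunk too small to embed meaningfully.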

Structured Matching: TF-IDF Is Sometimes Better

Not everything needs embeddings. For structured matching (skill routing, tag matching, exact category matching), TF-IDF can be 250x more efficient than embeddings².

Why? Loading, running, and performing inference with an embedding model is computationally expensive. For a routing decision like “which skill should this query go to?”, keyword matching (TF-IDF) responds in milliseconds while an embedding model takes hundreds of milliseconds.

Rule of thumb: embeddings for complex, semantic queries. Keyword/TF-IDF for structural, categorical matching.
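A minimal TF-IDF router in plain Python shows why the routing decision described above is cheap: there is no model to load, only token counts, a precomputed IDF table, and a sparse dot product. The skill names and descriptions are hypothetical, and the smoothed IDF is one common variant, not the implementation benchmarked in the footnoted study.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9_]+", text.lower())


class TfidfRouter:
    """Routes a query to the skill whose description is most similar in TF-IDF space."""

    def __init__(self, skills: dict[str, str]):
        self.names = list(skills)
        docs = [Counter(tokenize(text)) for text in skills.values()]
        df = Counter(term for counts in docs for term in counts)
        n = len(docs)
        # Smoothed IDF: terms found in fewer skill descriptions get more weight
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
        self.vectors = [self._weigh(counts) for counts in docs]

    def _weigh(self, counts: Counter) -> dict[str, float]:
        # TF x IDF, then L2-normalize so the dot product in route() is cosine similarity
        vec = {t: tf * self.idf[t] for t, tf in counts.items() if t in self.idf}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        return {t: w / norm for t, w in vec.items()}

    def route(self, query: str) -> str | None:
        q = self._weigh(Counter(tokenize(query)))
        scores = [sum(w * vec.get(t, 0.0) for t, w in q.items()) for vec in self.vectors]
        best = max(range(len(scores)), key=scores.__getitem__)
        return self.names[best] if scores[best] > 0 else None  # None: no shared terms


router = TfidfRouter({
    "gtm-debug": "gtm tag manager datalayer preview debug",
    "consent": "consent mode v2 cookies cookieless pings",
    "sql": "sqlite fts5 query index",
})
print(router.route("debug the GTM datalayer"))  # gtm-debug
```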

Practical Implementation: dnomia-knowledge

dnomia-knowledge implements this hybrid approach inside a single SQLite database file:

SQLite database (single .db file)
  ├── FTS5 table (BM25 keyword search)
  ├── sqlite-vec table (vector KNN search)
  └── chunk table (metadata, file path, project)

Search flow:

Query arrives: "GTM consent mode debug"
    ↓
1. FTS5 search (BM25): search for "GTM", "consent", "mode", "debug"
   → Results: [chunk-A (score 12.3), chunk-C (score 8.1), chunk-E (score 5.2)]
    ↓
2. Vector search (cosine): embed the query, find nearest chunks
   → Results: [chunk-B (score 0.82), chunk-A (score 0.74), chunk-D (score 0.68)]
    ↓
3. RRF merge (k=60): combine two lists by rank
   → Final: [chunk-A (0.0325), chunk-B (0.0164), chunk-C (0.0161), ...]

chunk-A ranks high in both lists, making it the most relevant result. chunk-B is strong only in vector (semantic similarity), chunk-C only in keyword (word match). Both make the list but below chunk-A.
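The three steps can be sketched end to end with Python's sqlite3 module. This is a hypothetical, minimal version, not dnomia-knowledge's code: FTS5 produces the BM25 list as described, but the vector side uses a toy hashed bag-of-words embedding (`toy_embed`) and a brute-force scan as stand-ins for multilingual-e5-base and sqlite-vec, so it has no real semantic ability. Joining query terms with OR, and the table and function names, are also assumptions.

```python
from __future__ import annotations

import math
import re
import sqlite3
import zlib
from collections import defaultdict


def toy_embed(text: str, dims: int = 256) -> list[float]:
    # Hypothetical stand-in for multilingual-e5-base: hashed bag of words, L2-normalized.
    # It only makes the sketch runnable; it has no notion of meaning.
    vec = [0.0] * dims
    for token in re.findall(r"\w+", text.lower()):
        vec[zlib.crc32(token.encode("utf-8")) % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_index(chunks: list[tuple[str, str]]) -> tuple[sqlite3.Connection, dict[int, list[float]]]:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE chunk (id INTEGER PRIMARY KEY, path TEXT, body TEXT)")
    db.execute("CREATE VIRTUAL TABLE chunk_fts USING fts5(body)")  # BM25 keyword index
    vectors: dict[int, list[float]] = {}  # sqlite-vec would keep these in a vec0 table
    for chunk_id, (path, body) in enumerate(chunks, start=1):
        db.execute("INSERT INTO chunk VALUES (?, ?, ?)", (chunk_id, path, body))
        db.execute("INSERT INTO chunk_fts (rowid, body) VALUES (?, ?)", (chunk_id, body))
        vectors[chunk_id] = toy_embed(body)
    return db, vectors


def keyword_ranking(db: sqlite3.Connection, query: str, limit: int = 10) -> list[int]:
    # Step 1: quote each term so '-' or ':' are not parsed as FTS5 operators; OR them together
    terms = re.findall(r"\w+", query)
    if not terms:
        return []
    expr = " OR ".join(f'"{t}"' for t in terms)
    rows = db.execute(
        "SELECT rowid FROM chunk_fts WHERE chunk_fts MATCH ? ORDER BY bm25(chunk_fts) LIMIT ?",
        (expr, limit),
    )
    return [row[0] for row in rows]


def vector_ranking(vectors: dict[int, list[float]], query: str, limit: int = 10) -> list[int]:
    # Step 2: vectors are unit length, so the dot product is the cosine similarity
    q = toy_embed(query)
    sims = {cid: sum(a * b for a, b in zip(q, v)) for cid, v in vectors.items()}
    ranked = sorted((cid for cid, s in sims.items() if s > 0), key=sims.get, reverse=True)
    return ranked[:limit]


def hybrid_search(db, vectors, query: str, k: int = 60, limit: int = 5) -> list[tuple[str, float]]:
    # Step 3: RRF over the two rank lists
    fused: dict[int, float] = defaultdict(float)
    for ranking in (keyword_ranking(db, query), vector_ranking(vectors, query)):
        for rank, cid in enumerate(ranking, start=1):
            fused[cid] += 1.0 / (k + rank)
    best = sorted(fused.items(), key=lambda item: item[1], reverse=True)[:limit]
    paths = dict(db.execute("SELECT id, path FROM chunk"))
    return [(paths[cid], round(score, 4)) for cid, score in best]


db, vectors = build_index([
    ("gtm/debug.md", "GTM consent mode debug checklist for the dataLayer"),
    ("tracking.md", "Troubleshooting event tracking issues"),
    ("sqlite.md", "SQLite FTS5 virtual table setup"),
])
print(hybrid_search(db, vectors, "GTM consent mode debug"))
```

Replacing `toy_embed` with a real e5 model and the Python dictionary with a sqlite-vec table would leave the RRF step unchanged.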

Performance

FTS5 search: < 1ms (SQLite native, indexed)
Vector search: < 10ms (sqlite-vec, 768 dimensions, a few thousand chunks)
Embedding generation: ~50ms/query (multilingual-e5-base, CPU)
Total: ~60ms/query

No separate Elasticsearch + Pinecone setup required. Single SQLite file, single Python process.

Prefix Rule

The intfloat/e5 model requires prefixes:

Queries: "query: GTM consent mode debug"
Documents: "passage: This guide covers consent mode v2 configuration..."

Without prefixes, similarity scores drop. The model was trained with these prefixes, which mark a text as either a query or a passage, so omitting them degrades retrieval quality.
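A small sketch of applying the prefixes consistently on both sides. The helper names are hypothetical; the commented-out lines show how the resulting strings could be passed to the model through the sentence-transformers library, which the snippet itself does not need.

```python
def as_query(text: str) -> str:
    # e5 convention for the text the user searches with
    return f"query: {text}"


def as_passage(text: str) -> str:
    # e5 convention for every stored chunk, applied at indexing time
    return f"passage: {text}"


queries = [as_query("GTM consent mode debug")]
passages = [as_passage("This guide covers consent mode v2 configuration...")]

# With sentence-transformers installed, both lists go through the same model:
# from sentence_transformers import SentenceTransformer
# model = SentenceTransformer("intfloat/multilingual-e5-base")
# q_vecs = model.encode(queries, normalize_embeddings=True)
# p_vecs = model.encode(passages, normalize_embeddings=True)
print(queries[0])
print(passages[0])
```

Routing every string through these two helpers makes it hard to forget a prefix at indexing time or query time.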

Fallback Strategy

Sometimes even hybrid search returns no results. dnomia-knowledge uses a three-tier fallback:

1. Hybrid search (RRF): first attempt, best results
2. Prefix fallback: if there are no results, search by prefix matching of query terms (GTM → “GTM server”, “GTM preview”)
3. FTS5 only: if vector search returns nothing, return keyword results alone

This minimizes the “no results found” scenario.
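The cascade listed above can be sketched as a loop over tiers, assuming each tier is a function that returns a possibly empty list. The prefix tier uses FTS5's prefix-query syntax (`"term"*`); the table, the rows, and the placeholder hybrid tier (which always returns nothing here so the fallback is visible) are hypothetical.

```python
from __future__ import annotations

import re
import sqlite3
from typing import Callable


def prefix_expr(query: str) -> str:
    # "consen" -> "consen"*  (matches any token starting with "consen")
    return " OR ".join(f'"{t}"*' for t in re.findall(r"\w+", query))


def exact_expr(query: str) -> str:
    return " OR ".join(f'"{t}"' for t in re.findall(r"\w+", query))


def search_with_fallback(
    query: str, tiers: list[tuple[str, Callable[[str], list[str]]]]
) -> tuple[str, list[str]]:
    # Try each tier in order; the first non-empty result list wins
    for name, search in tiers:
        results = search(query)
        if results:
            return name, results
    return "none", []


db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE notes USING fts5(body)")
db.executemany(
    "INSERT INTO notes (body) VALUES (?)",
    [("consent mode v2 advanced",), ("GTM server container",)],
)


def fts(expr: str) -> list[str]:
    if not expr:  # an empty MATCH expression would be a syntax error
        return []
    rows = db.execute("SELECT body FROM notes WHERE notes MATCH ? ORDER BY bm25(notes)", (expr,))
    return [row[0] for row in rows]


tiers = [
    ("hybrid", lambda q: []),  # placeholder: pretend hybrid search found nothing
    ("prefix", lambda q: fts(prefix_expr(q))),
    ("fts5", lambda q: fts(exact_expr(q))),
]
print(search_with_fallback("consen", tiers))  # ('prefix', ['consent mode v2 advanced'])
```

“consen” has no exact token match, so the prefix tier answers with the consent-mode note.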

When to Use Which Approach?

Scenario	Best Approach	Why
Abbreviation/code search	BM25 (keyword)	Exact match is critical
“Why isn’t this working?”	Vector (embedding)	Semantic similarity needed
General search	Hybrid (RRF)	Combines strengths of both
Skill routing	TF-IDF	Speed and efficiency
Cross-language search	Vector (embedding)	Multilingual model crosses language barrier

Conclusion

Hybrid search is the way out of the “keyword or vector” dilemma. RRF merging transforms two different strengths into a single ranking.

But chunk quality matters as much as technology choice. A focused 340-character piece of information scored 2.3x higher similarity than a sprawling 1.5KB document containing the same information. Before improving the search engine, improving the data being searched usually has a bigger impact.

The next level beyond chunk quality is eliminating chunks entirely. jCodeMunch uses Tree-sitter AST parsing to index code symbols (functions, classes, imports) directly and provides O(1) access via stored byte offsets. While chunk-based RAG risks splitting a function mid-body, symbol-based retrieval returns a complete logical unit. On the FastAPI codebase, chunk RAG yields 330K tokens at 74% precision, while symbol retrieval yields 480 tokens at 96% precision³. For code search, extracting symbols from the AST instead of chunking takes what hybrid search does for text one step further.
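jCodeMunch itself uses Tree-sitter. As a stand-in for illustration only, Python's standard `ast` module can show the same idea for Python source: record each top-level function and class with its UTF-8 byte span once, then return a symbol by slicing. The class name `SymbolIndex` and the sample source are hypothetical, and this is not jCodeMunch's API.

```python
from __future__ import annotations

import ast


class SymbolIndex:
    def __init__(self, source: str):
        self.data = source.encode("utf-8")
        # Byte offset where each line starts (ast line numbers are 1-based)
        line_starts = [0]
        for line in self.data.splitlines(keepends=True):
            line_starts.append(line_starts[-1] + len(line))
        self.spans: dict[str, tuple[int, int]] = {}
        for node in ast.parse(source).body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                # col_offset / end_col_offset are UTF-8 byte offsets within their lines
                start = line_starts[node.lineno - 1] + node.col_offset
                end = line_starts[node.end_lineno - 1] + node.end_col_offset
                self.spans[node.name] = (start, end)

    def get(self, name: str) -> str:
        # One dict lookup and one slice: the whole symbol, never a partial chunk
        start, end = self.spans[name]
        return self.data[start:end].decode("utf-8")


SOURCE = '''import os


def greet(name):
    """Say hello."""
    return f"héllo {name}"


class Cart:
    def total(self):
        return 0
'''
idx = SymbolIndex(SOURCE)
print(idx.get("Cart"))
```

`idx.get("Cart")` returns the complete class, however long, rather than whatever a fixed chunk boundary happens to cut out.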

Source code: dnomia-knowledge (MIT license, SQLite + FTS5 + sqlite-vec + RRF)

Footnotes

¹ Memory Vault shared brain study. 340-character focused rule vs 1.5KB multi-topic document similarity comparison. Source: dev.to/tars_mistaike
² Skill resolver token economics. TF-IDF vs embeddings structured matching benchmark. Source: dev.to/comeonoliver
³ Gravelle, J. (2026). Bringing The Receipts: 95% AI LLM Token Savings. Dev.to. jCodeMunch: Tree-sitter AST parsing for symbol-level code retrieval. FastAPI benchmark: chunk RAG 330K tokens / 74% precision vs symbol retrieval 480 tokens / 96% precision. 95% average token reduction across 15 tasks and 3 repositories.

Key Takeaways

01 BM25 (keyword) matches words using TF-IDF weighting. Strong for abbreviations, variable names, and error codes.
02 Vector search (embedding) captures semantic similarity. Searching for 'sepet terk oranı' (Turkish) returns 'cart abandonment rate'.
03 When both work together, they cover each other's weaknesses. RRF merges result lists using rank-based scoring.
04 Chunk size is critical: a focused 340-character rule gets 0.57 similarity, a 1.5KB multi-topic document gets 0.25. Small and focused wins.
05 For structured matching (skill routing, tag matching), TF-IDF can be 250x more efficient than embeddings.
