@@ -8,14 +8,18 @@ products: [cloud, self_hosted]

 import EA1125 from "versionContent/_partials/_early_access_11_25.mdx";
 import SINCE010 from "versionContent/_partials/_since_0_1_0.mdx";
+import SINCE040 from "versionContent/_partials/_since_0_4_0.mdx";
 import IntegrationPrereqs from "versionContent/_partials/_integration-prereqs.mdx";

 # Optimize full text search with BM25

-$PG full-text search at scale consistently hits a wall where performance degrades catastrophically.
+$PG full-text search at scale consistently hits a wall where performance degrades catastrophically.
 $COMPANY's [pg_textsearch][pg_textsearch-github-repo] brings modern [BM25][bm25-wiki]-based full-text search directly into $PG,
-with a memtable architecture for efficient indexing and ranking. `pg_textsearch` integrates seamlessly with SQL and
-provides better search quality and performance than the $PG built-in full-text search.
+with a memtable architecture for efficient indexing and ranking. `pg_textsearch` integrates seamlessly with SQL and
+provides better search quality and performance than the $PG built-in full-text search. With Block-Max WAND optimization,
+`pg_textsearch` delivers up to **4x faster top-k queries** compared to native BM25 implementations. Advanced compression
+using delta encoding and bitpacking reduces index sizes by **41%** while improving query performance by 10-20% for
+shorter queries.

 BM25 scores in `pg_textsearch` are returned as negative values, where lower (more negative) numbers indicate better
 matches. `pg_textsearch` implements the following:
@@ -73,7 +77,7 @@ You have installed `pg_textsearch` on $CLOUD_LONG.

 ## Create BM25 indexes on your data

-BM25 indexes provide modern relevance ranking that outperforms $PG's built-in ts_rank functions by using corpus
+BM25 indexes provide modern relevance ranking that outperforms $PG's built-in ts_rank functions by using corpus
 statistics and better algorithmic design.

 To create a BM25 index with pg_textsearch:
@@ -109,21 +113,31 @@ To create a BM25 index with pg_textsearch:
    WITH (text_config='english');
    ```

-   BM25 supports single-column indexes only.
+   BM25 supports single-column indexes only. For optimal performance, load your data first, then create the index.

 </Procedure>

 You have created a BM25 index for full-text search.

 ## Optimize search queries for performance

-Use efficient query patterns to leverage BM25 ranking and optimize search performance.
+Use efficient query patterns to leverage BM25 ranking and optimize search performance. The `<@>` operator provides
+BM25-based ranking scores as negative values, where lower (more negative) scores indicate better matches. In `ORDER BY`
+clauses, the index is automatically detected from the column. For `WHERE` clause filtering, use `to_bm25query()` with
+an explicit index name.

 <Procedure>

 1. **Perform ranked searches using the distance operator**

    ```sql
+   -- Simplified syntax: index is automatically detected in ORDER BY
+   SELECT name, description, description <@> 'ergonomic work' as score
+   FROM products
+   ORDER BY score
+   LIMIT 3;
+
+   -- Alternative explicit syntax (works in all contexts)
    SELECT name, description, description <@> to_bm25query('ergonomic work', 'products_search_idx') as score
    FROM products
    ORDER BY score
@@ -142,6 +156,8 @@ Use efficient query patterns to leverage BM25 ranking and optimize search perfor

 1. **Filter results by score threshold**

+   For filtering with WHERE clauses, use explicit index specification with `to_bm25query()`:
+
    ```sql
    SELECT name, description <@> to_bm25query('wireless', 'products_search_idx') as score
    FROM products
@@ -163,7 +179,7 @@ Use efficient query patterns to leverage BM25 ranking and optimize search perfor
    FROM products
    WHERE price < 500
    AND description <@> to_bm25query('ergonomic', 'products_search_idx') < -0.5
-   ORDER BY description <@> to_bm25query('ergonomic', 'products_search_idx')
+   ORDER BY score
    LIMIT 5;
    ```

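Because scores are negative and lower is better, a threshold such as `< -0.5` keeps the stronger matches, and an ascending `ORDER BY` lists the best match first. The following minimal sketch shows that behaviour in plain SQL, with no extension required. The `results` rows, product names, and score values are made up for illustration; real scores would come from the `<@>` operator.

```sql
-- Hypothetical scores for illustration only; real values come from the <@> operator.
WITH results(name, score) AS (
    VALUES ('Ergonomic keyboard', -2.31),
           ('Standing desk',      -1.05),
           ('Desk lamp',          -0.42)
)
SELECT name, score
FROM results
WHERE score < -0.5   -- keep only rows that score better (more negative) than the threshold
ORDER BY score       -- ascending order puts the most negative, best match first
LIMIT 5;
```

With these sample values, 'Desk lamp' (-0.42) fails the threshold, and 'Ergonomic keyboard' (-2.31) sorts ahead of 'Standing desk' (-1.05).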
@@ -342,17 +358,30 @@ Customize `pg_textsearch` behavior for your specific use case and data character
    threshold, it automatically flushes to a segment at transaction commit.

    ```sql
-   -- Set memtable spill threshold (default 800000 posting entries, ~8MB segments)
-   SET pg_textsearch.memtable_spill_threshold = 1000000;
+   -- Set memtable spill threshold (default 32000000 posting entries, ~1M docs/segment)
+   SET pg_textsearch.memtable_spill_threshold = 32000000;

    -- Set bulk load spill threshold (default 100000 terms per transaction)
    SET pg_textsearch.bulk_load_threshold = 150000;

    -- Set default query limit when no LIMIT clause is present (default 1000)
    SET pg_textsearch.default_limit = 5000;
+
+   -- Enable Block-Max WAND optimization for faster top-k queries (enabled by default)
+   SET pg_textsearch.enable_bmw = true;
+
+   -- Log block skip statistics for debugging query performance (disabled by default)
+   SET pg_textsearch.log_bmw_stats = false;
    ```
    <SINCE010 />

+   ```sql
+   -- Enable segment compression using delta encoding and bitpacking (enabled by default)
+   -- Reduces index size by ~41% with 10-20% query performance improvement for shorter queries
+   SET pg_textsearch.compress_segments = on;
+   ```
+   <SINCE040 />
+
 1. **Configure language-specific text processing**

    You can create multiple BM25 indexes on the same column with different language configurations:
@@ -387,11 +416,26 @@ Customize `pg_textsearch` behavior for your specific use case and data character
      WHERE indexrelid::regclass::text ~ 'bm25';
      ```

-   - View detailed index information
+   - View index summary with corpus statistics and memory usage
+     ```sql
+     SELECT bm25_summarize_index('products_search_idx');
+     ```
+
+   - View detailed index structure (output is truncated for display)
      ```sql
      SELECT bm25_dump_index('products_search_idx');
      ```

+   - Export full index dump to a file for detailed analysis
+     ```sql
+     SELECT bm25_dump_index('products_search_idx', '/tmp/index_dump.txt');
+     ```
+
+   - Force memtable spill to disk (useful for testing or memory management)
+     ```sql
+     SELECT bm25_spill_index('products_search_idx');
+     ```
+
 </Procedure>

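As a rough illustration of why delta encoding and bitpacking shrink an index, the sketch below works on a hypothetical posting list of sorted document IDs. It is generic SQL that runs on stock PostgreSQL, not a description of the `pg_textsearch` segment format, which the text above does not specify. The `postings` values and the `raw_bits` and `delta_bits` column names are assumptions made for this example.

```sql
-- Hypothetical posting list: sorted document IDs for a single term (made-up values).
WITH postings(doc_id) AS (
    VALUES (1000), (1003), (1004), (1010), (1025)
),
deltas AS (
    -- Delta encoding: store the gap to the previous ID instead of the ID itself.
    -- The first row has no predecessor, so lag() falls back to 0 and the gap is the full ID.
    SELECT doc_id,
           doc_id - lag(doc_id, 1, 0) OVER (ORDER BY doc_id) AS delta
    FROM postings
)
SELECT doc_id,
       delta,
       -- Bits a bitpacker would need: length of the binary form without leading zeros.
       length(ltrim(doc_id::bit(32)::text, '0')) AS raw_bits,
       length(ltrim(delta::bit(32)::text, '0'))  AS delta_bits
FROM deltas
ORDER BY doc_id;
```

Apart from the first entry, which carries the full starting ID, each gap fits in far fewer bits than the raw ID it replaces, which is the property bitpacking exploits.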
 You have configured `pg_textsearch` for optimal performance. For production applications, consider implementing result