Postgres Full Text Search
Full-text search in PostgreSQL using tsvector, tsquery, ranking, and GIN indexes
You are an expert in PostgreSQL full-text search capabilities including tsvector, tsquery, ranking, and index strategies. ## Key Points - Use generated stored columns for tsvector to keep the search index automatically synchronized with source data. - Apply weights (A, B, C, D) to differentiate importance of fields — title matches should rank higher than body matches. - Always create a GIN index on tsvector columns; without it, every query performs a sequential scan. - Use `websearch_to_tsquery` for user-facing search input as it handles natural query syntax gracefully. - Use `ts_headline` for result snippets but be aware it re-parses the original text at query time, so limit its use to the final displayed results. - Choose the correct text search configuration for your language (`'english'`, `'spanish'`, etc.) to get proper stemming and stop words. - Combine FTS with trigram similarity (`pg_trgm`) for fuzzy matching on misspellings. - **Forgetting the GIN index**: Full-text search without a GIN index is extremely slow on tables with more than a few thousand rows. - **Using `plainto_tsquery` for boolean queries**: `plainto_tsquery` implicitly ANDs all terms and does not support operators. Use `to_tsquery` or `websearch_to_tsquery` for boolean logic. - **Language mismatch**: Using `'simple'` configuration skips stemming, so "running" will not match "run". Always specify the correct language. - **tsvector column drift**: If using a trigger-maintained tsvector column (instead of a generated column), bulk updates that bypass triggers will leave the search vector stale. - **Over-ranking short documents**: `ts_rank` without normalization flags favors longer documents. Use normalization options (bitmask argument) to control this.
skilldb get postgres-skills/Postgres Full Text SearchFull skill: 195 linesFull-Text Search — PostgreSQL
You are an expert in PostgreSQL full-text search capabilities including tsvector, tsquery, ranking, and index strategies.
Core Philosophy
Overview
PostgreSQL provides built-in full-text search (FTS) that converts natural language text into searchable tokens, supports stemming, stop words, multiple languages, ranking, and highlighting. It eliminates the need for external search engines for many use cases while keeping data and search logic in one system.
Core Concepts
tsvector and tsquery
-- tsvector: a sorted list of normalized lexemes (stemmed words)
SELECT to_tsvector('english', 'The quick brown foxes jumped over lazy dogs');
-- Result: 'brown':3 'dog':9 'fox':4 'jump':5 'lazi':8 'quick':2
-- tsquery: a search query with boolean operators
SELECT to_tsquery('english', 'quick & brown & !cat');
-- Result: 'quick' & 'brown' & !'cat'
-- Match operator @@
SELECT to_tsvector('english', 'The quick brown fox')
@@ to_tsquery('english', 'quick & fox');
-- Result: true
Storing and Indexing tsvector
CREATE TABLE articles (
id serial PRIMARY KEY,
title text NOT NULL,
body text NOT NULL,
search_vector tsvector GENERATED ALWAYS AS (
setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
setweight(to_tsvector('english', coalesce(body, '')), 'B')
) STORED
);
-- GIN index for fast full-text search
CREATE INDEX idx_articles_search ON articles USING GIN (search_vector);
Basic Search Queries
-- Simple keyword search
SELECT id, title
FROM articles
WHERE search_vector @@ to_tsquery('english', 'database & performance');
-- Phrase search (words adjacent and in order)
SELECT id, title
FROM articles
WHERE search_vector @@ phraseto_tsquery('english', 'connection pooling');
-- Websearch-style query (PostgreSQL 11+)
-- Supports quotes, OR, minus for exclusion
SELECT id, title
FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', '"full text search" -mysql');
Implementation Patterns
Weighted Search with Ranking
-- ts_rank scores results by relevance
SELECT
id,
title,
ts_rank(search_vector, query) AS rank
FROM
articles,
websearch_to_tsquery('english', 'postgres replication') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;
-- ts_rank_cd uses cover density ranking (considers proximity)
SELECT
id,
title,
ts_rank_cd(search_vector, query, 32) AS rank -- 32 = normalize by rank / (rank + 1)
FROM
articles,
to_tsquery('english', 'backup & recovery') AS query
WHERE search_vector @@ query
ORDER BY rank DESC;
Highlighting Matches
SELECT
id,
ts_headline('english', body, query,
'StartSel=<mark>, StopSel=</mark>, MaxWords=50, MinWords=20, MaxFragments=3'
) AS snippet
FROM
articles,
websearch_to_tsquery('english', 'index optimization') AS query
WHERE search_vector @@ query
ORDER BY ts_rank(search_vector, query) DESC
LIMIT 10;
Custom Text Search Configuration
-- Create a custom configuration based on English
CREATE TEXT SEARCH CONFIGURATION custom_english (COPY = english);
-- Add synonym dictionary
CREATE TEXT SEARCH DICTIONARY synonyms (
TEMPLATE = synonym,
SYNONYMS = my_synonyms -- references a file in $SHAREDIR/tsearch_data/
);
-- Alter the configuration to use the synonym dictionary
ALTER TEXT SEARCH CONFIGURATION custom_english
ALTER MAPPING FOR asciiword WITH synonyms, english_stem;
-- Use the custom config
SELECT to_tsvector('custom_english', 'PostgreSQL database management');
Combining FTS with Other Filters
-- Full-text search combined with relational filters
SELECT a.id, a.title, ts_rank(a.search_vector, query) AS rank
FROM articles a,
websearch_to_tsquery('english', 'partitioning') AS query
WHERE a.search_vector @@ query
AND a.published_at >= now() - interval '1 year'
AND a.category = 'database'
ORDER BY rank DESC
LIMIT 20;
-- Multi-column GIN index for combined filtering
CREATE INDEX idx_articles_search_category
ON articles USING GIN (search_vector, category gin_trgm_ops);
Autocomplete with Prefix Matching
-- Use :* for prefix matching
SELECT title
FROM articles
WHERE search_vector @@ to_tsquery('english', 'optim:*')
ORDER BY ts_rank(search_vector, to_tsquery('english', 'optim:*')) DESC
LIMIT 10;
Best Practices
- Use generated stored columns for tsvector to keep the search index automatically synchronized with source data.
- Apply weights (A, B, C, D) to differentiate importance of fields — title matches should rank higher than body matches.
- Always create a GIN index on tsvector columns; without it, every query performs a sequential scan.
- Use
websearch_to_tsqueryfor user-facing search input as it handles natural query syntax gracefully. - Use
ts_headlinefor result snippets but be aware it re-parses the original text at query time, so limit its use to the final displayed results. - Choose the correct text search configuration for your language (
'english','spanish', etc.) to get proper stemming and stop words. - Combine FTS with trigram similarity (
pg_trgm) for fuzzy matching on misspellings.
Common Pitfalls
- Forgetting the GIN index: Full-text search without a GIN index is extremely slow on tables with more than a few thousand rows.
- Using
plainto_tsqueryfor boolean queries:plainto_tsqueryimplicitly ANDs all terms and does not support operators. Useto_tsqueryorwebsearch_to_tsqueryfor boolean logic. - Language mismatch: Using
'simple'configuration skips stemming, so "running" will not match "run". Always specify the correct language. - tsvector column drift: If using a trigger-maintained tsvector column (instead of a generated column), bulk updates that bypass triggers will leave the search vector stale.
- Over-ranking short documents:
ts_rankwithout normalization flags favors longer documents. Use normalization options (bitmask argument) to control this. - Ignoring NULL handling:
to_tsvector(NULL)returns NULL. Always wrap columns incoalesce()when building composite tsvectors.
Anti-Patterns
Over-engineering for hypothetical scale. Building for millions of users when you have hundreds adds complexity without value. Solve today's problems first.
Ignoring the existing ecosystem. Reinventing functionality that mature libraries already provide well wastes time and introduces unnecessary risk.
Premature abstraction. Creating elaborate frameworks and utilities before you have enough concrete cases to know what the abstraction should look like produces the wrong abstraction.
Neglecting error handling at boundaries. Internal code can trust its inputs, but system boundaries (user input, APIs, file I/O) require defensive validation.
Skipping documentation for obvious code. What is obvious to you today will not be obvious to your colleague next month or to you next year.
Install this skill directly: skilldb add postgres-skills
Related Skills
Postgres Extensions
Key PostgreSQL extensions including pgvector, PostGIS, pg_cron, and other essential add-ons
Postgres Partitioning
Table partitioning strategies in PostgreSQL including range, list, and hash partitioning
Postgres Replication
Logical and streaming replication in PostgreSQL for high availability and data distribution
Postgres Row Level Security
Row-level security policies in PostgreSQL for fine-grained access control on table rows
Postgres Triggers
Triggers and PL/pgSQL functions in PostgreSQL for automated data processing and integrity enforcement
Adversarial Code Review
Adversarial implementation review methodology that validates code completeness against requirements with fresh objectivity. Uses a coach-player dialectical loop to catch real gaps in security, logic, and data flow.