Technology & EngineeringPostgres195 lines

Postgres Full Text Search

Full-text search in PostgreSQL using tsvector, tsquery, ranking, and GIN indexes

Quick Summary18 lines

You are an expert in PostgreSQL full-text search capabilities including tsvector, tsquery, ranking, and index strategies.

## Key Points

- Use generated stored columns for tsvector to keep the search index automatically synchronized with source data.
- Apply weights (A, B, C, D) to differentiate importance of fields — title matches should rank higher than body matches.
- Always create a GIN index on tsvector columns; without it, every query performs a sequential scan.
- Use `websearch_to_tsquery` for user-facing search input as it handles natural query syntax gracefully.
- Use `ts_headline` for result snippets but be aware it re-parses the original text at query time, so limit its use to the final displayed results.
- Choose the correct text search configuration for your language (`'english'`, `'spanish'`, etc.) to get proper stemming and stop words.
- Combine FTS with trigram similarity (`pg_trgm`) for fuzzy matching on misspellings.
- **Forgetting the GIN index**: Full-text search without a GIN index is extremely slow on tables with more than a few thousand rows.
- **Using `plainto_tsquery` for boolean queries**: `plainto_tsquery` implicitly ANDs all terms and does not support operators. Use `to_tsquery` or `websearch_to_tsquery` for boolean logic.
- **Language mismatch**: Using `'simple'` configuration skips stemming, so "running" will not match "run". Always specify the correct language.
- **tsvector column drift**: If using a trigger-maintained tsvector column (instead of a generated column), bulk updates that bypass triggers will leave the search vector stale.
- **Over-ranking short documents**: `ts_rank` without normalization flags favors longer documents. Use normalization options (bitmask argument) to control this.

skilldb get postgres-skills/Postgres Full Text SearchFull skill: 195 lines

Paste into your CLAUDE.md or agent config

Full-Text Search — PostgreSQL

You are an expert in PostgreSQL full-text search capabilities including tsvector, tsquery, ranking, and index strategies.

Core Philosophy

Overview

PostgreSQL provides built-in full-text search (FTS) that converts natural language text into searchable tokens, supports stemming, stop words, multiple languages, ranking, and highlighting. It eliminates the need for external search engines for many use cases while keeping data and search logic in one system.

Core Concepts

tsvector and tsquery

-- tsvector: a sorted list of normalized lexemes (stemmed words)
SELECT to_tsvector('english', 'The quick brown foxes jumped over lazy dogs');
-- Result: 'brown':3 'dog':9 'fox':4 'jump':5 'lazi':8 'quick':2

-- tsquery: a search query with boolean operators
SELECT to_tsquery('english', 'quick & brown & !cat');
-- Result: 'quick' & 'brown' & !'cat'

-- Match operator @@
SELECT to_tsvector('english', 'The quick brown fox')
    @@ to_tsquery('english', 'quick & fox');
-- Result: true

Storing and Indexing tsvector

CREATE TABLE articles (
    id serial PRIMARY KEY,
    title text NOT NULL,
    body text NOT NULL,
    search_vector tsvector GENERATED ALWAYS AS (
        setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(body, '')), 'B')
    ) STORED
);

-- GIN index for fast full-text search
CREATE INDEX idx_articles_search ON articles USING GIN (search_vector);

Basic Search Queries

-- Simple keyword search
SELECT id, title
FROM articles
WHERE search_vector @@ to_tsquery('english', 'database & performance');

-- Phrase search (words adjacent and in order)
SELECT id, title
FROM articles
WHERE search_vector @@ phraseto_tsquery('english', 'connection pooling');

-- Websearch-style query (PostgreSQL 11+)
-- Supports quotes, OR, minus for exclusion
SELECT id, title
FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', '"full text search" -mysql');

Implementation Patterns

Weighted Search with Ranking

-- ts_rank scores results by relevance
SELECT
    id,
    title,
    ts_rank(search_vector, query) AS rank
FROM
    articles,
    websearch_to_tsquery('english', 'postgres replication') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;

-- ts_rank_cd uses cover density ranking (considers proximity)
SELECT
    id,
    title,
    ts_rank_cd(search_vector, query, 32) AS rank  -- 32 = normalize by rank / (rank + 1)
FROM
    articles,
    to_tsquery('english', 'backup & recovery') AS query
WHERE search_vector @@ query
ORDER BY rank DESC;

Highlighting Matches

SELECT
    id,
    ts_headline('english', body, query,
        'StartSel=<mark>, StopSel=</mark>, MaxWords=50, MinWords=20, MaxFragments=3'
    ) AS snippet
FROM
    articles,
    websearch_to_tsquery('english', 'index optimization') AS query
WHERE search_vector @@ query
ORDER BY ts_rank(search_vector, query) DESC
LIMIT 10;

Custom Text Search Configuration

-- Create a custom configuration based on English
CREATE TEXT SEARCH CONFIGURATION custom_english (COPY = english);

-- Add synonym dictionary
CREATE TEXT SEARCH DICTIONARY synonyms (
    TEMPLATE = synonym,
    SYNONYMS = my_synonyms  -- references a file in $SHAREDIR/tsearch_data/
);

-- Alter the configuration to use the synonym dictionary
ALTER TEXT SEARCH CONFIGURATION custom_english
    ALTER MAPPING FOR asciiword WITH synonyms, english_stem;

-- Use the custom config
SELECT to_tsvector('custom_english', 'PostgreSQL database management');

Combining FTS with Other Filters

-- Full-text search combined with relational filters
SELECT a.id, a.title, ts_rank(a.search_vector, query) AS rank
FROM articles a,
     websearch_to_tsquery('english', 'partitioning') AS query
WHERE a.search_vector @@ query
  AND a.published_at >= now() - interval '1 year'
  AND a.category = 'database'
ORDER BY rank DESC
LIMIT 20;

-- Multi-column GIN index for combined filtering
CREATE INDEX idx_articles_search_category
    ON articles USING GIN (search_vector, category gin_trgm_ops);

Autocomplete with Prefix Matching

-- Use :* for prefix matching
SELECT title
FROM articles
WHERE search_vector @@ to_tsquery('english', 'optim:*')
ORDER BY ts_rank(search_vector, to_tsquery('english', 'optim:*')) DESC
LIMIT 10;

Best Practices

Use generated stored columns for tsvector to keep the search index automatically synchronized with source data.
Apply weights (A, B, C, D) to differentiate importance of fields — title matches should rank higher than body matches.
Always create a GIN index on tsvector columns; without it, every query performs a sequential scan.
Use websearch_to_tsquery for user-facing search input as it handles natural query syntax gracefully.
Use ts_headline for result snippets but be aware it re-parses the original text at query time, so limit its use to the final displayed results.
Choose the correct text search configuration for your language ('english', 'spanish', etc.) to get proper stemming and stop words.
Combine FTS with trigram similarity (pg_trgm) for fuzzy matching on misspellings.

Common Pitfalls

Forgetting the GIN index: Full-text search without a GIN index is extremely slow on tables with more than a few thousand rows.
Using plainto_tsquery for boolean queries: plainto_tsquery implicitly ANDs all terms and does not support operators. Use to_tsquery or websearch_to_tsquery for boolean logic.
Language mismatch: Using 'simple' configuration skips stemming, so "running" will not match "run". Always specify the correct language.
tsvector column drift: If using a trigger-maintained tsvector column (instead of a generated column), bulk updates that bypass triggers will leave the search vector stale.
Over-ranking short documents: ts_rank without normalization flags favors longer documents. Use normalization options (bitmask argument) to control this.
Ignoring NULL handling: to_tsvector(NULL) returns NULL. Always wrap columns in coalesce() when building composite tsvectors.

Anti-Patterns

Over-engineering for hypothetical scale. Building for millions of users when you have hundreds adds complexity without value. Solve today's problems first.

Ignoring the existing ecosystem. Reinventing functionality that mature libraries already provide well wastes time and introduces unnecessary risk.

Premature abstraction. Creating elaborate frameworks and utilities before you have enough concrete cases to know what the abstraction should look like produces the wrong abstraction.

Neglecting error handling at boundaries. Internal code can trust its inputs, but system boundaries (user input, APIs, file I/O) require defensive validation.

Skipping documentation for obvious code. What is obvious to you today will not be obvious to your colleague next month or to you next year.

Install this skill directly: skilldb add postgres-skills

Get CLI access →