Prompt-to-PR:添加 PostgreSQL 全文搜索
SOP 用于添加原生 PostgreSQL 全文搜索,使用 tsvector、GIN 索引、ts_rank 和 Next.js 搜索 API — 无需第三方搜索服务。
CursorClaude CodeCodexWindsurf Next.jsPostgreSQLTypeScript
PostgreSQL 内置的全文搜索可以满足大多数产品搜索需求,无需 Elasticsearch 或 Algolia。本操作指南将添加一个 tsvector 列、一个 GIN 索引,以及一个位于 Next.js API 路由后方的排序搜索查询。
1. 需求
在 posts 表(列:title、body、tags)上添加全文搜索。搜索结果应按相关性排序并显示片段高亮。实现必须使用生成的 tsvector 列(而非查询时计算),以便使用 GIN 索引。
2. 首个提示
Add native PostgreSQL full-text search to the posts table in this project.
Database: PostgreSQL. ORM/query builder: [Drizzle / postgres.js — use whicheveris already in src/db/].
Step 1 — migration: Create a migration file (or Drizzle schema change) that: a. Adds a generated column: search_vector tsvector GENERATED ALWAYS AS ( setweight(to_tsvector('english', coalesce(title, '')), 'A') || setweight(to_tsvector('english', coalesce(body, '')), 'B') || setweight(to_tsvector('english', coalesce(tags, '')), 'C') ) STORED; b. Creates a GIN index on search_vector: CREATE INDEX posts_search_idx ON posts USING gin(search_vector);
Step 2 — query function: Create `src/lib/queries/search.ts` exporting `searchPosts(q: string, limit = 10)`. The query must: - Convert the user query to a tsquery: `websearch_to_tsquery('english', $1)`. - Filter: `search_vector @@ query`. - Rank: `ts_rank(search_vector, query) DESC`. - Return: id, title, ts_headline('english', body, query, 'MaxWords=30, MinWords=15, ShortWord=3, HighlightAll=false') AS snippet. - Use parameterized query (no string interpolation of the user input).
Step 3 — API route: Create `src/app/api/search/route.ts` (GET). Read `q` from URL search params. Return 400 if q is empty or shorter than 2 chars. Return JSON array of results. Add cache-control: public, max-age=60.
Step 4 — do not touch any UI components.3. 预期文件更改
src/db/migrations/<timestamp>_add_search_vector.sql (new — or Drizzle migration)src/db/schema.ts (search_vector column if Drizzle)src/lib/queries/search.ts (new — searchPosts function)src/app/api/search/route.ts (new — GET endpoint)4. 审查清单
GENERATED ALWAYS AS ... STORED— 列是存储的(而非虚拟的),因此可以使用 GIN 索引。确认存在STORED关键字。- 使用
websearch_to_tsquery(而非to_tsquery)— 处理来自不可信输入的多词和短语查询,不会出现语法错误。 - 用户提供的查询作为 SQL 参数传递,绝不进行字符串插值。
ts_rank对结果排序 — 而非按字母顺序或插入顺序。ts_headline长度受限(MaxWords=30),避免返回过大的片段。- GIN 索引名称明确 — 便于在需要时删除/重建。
- API 路由验证最小查询长度(2个字符),避免类似
'a':*的全表扫描 tsquery。 - 在响应中设置
Cache-Control: public, max-age=60— 搜索结果可进行短时间缓存。
5. 测试命令
# Run the migrationnpx drizzle-kit migrate# or: psql $DATABASE_URL < migration.sql
# Verify the generated column and index existpsql $DATABASE_URL -c "\d posts" | grep search_vectorpsql $DATABASE_URL -c "\di posts_search_idx"
# Query performance — confirm index scan, not seq scanpsql $DATABASE_URL -c " EXPLAIN ANALYZE SELECT id FROM posts WHERE search_vector @@ websearch_to_tsquery('english', 'your test query') LIMIT 10;" | grep "Index Scan"
# Test the API endpointbun dev &curl "http://localhost:3000/api/search?q=your+test+query" | jq .
# Confirm empty query returns 400curl -o /dev/null -w "%{http_code}" "http://localhost:3000/api/search?q="# Expect: 4006. 常见失败原因
- 顺序扫描而非索引扫描 —
VIRTUAL列(而非STORED)。在 PostgreSQL 中,只有STORED生成的列可以被索引。确认迁移使用STORED。 to_tsquery对多词输入抛出异常 —to_tsquery('english', 'quick brown')是语法错误。对于用户提供的字符串,始终使用websearch_to_tsquery。ts_headline返回整个正文 — 意外设置了HighlightAll=true,或未设置MaxWords。检查选项字符串。- GIN 索引未用于
LIKE查询 — 确保 WHERE 子句使用@@运算符,而非LIKE或ILIKE。全文搜索和LIKE是分开的。 - 迁移在现有数据上失败 —
GENERATED列在创建时填充;对于大表,这可能很慢。在事务中运行并检查进度。
7. 修复提示
EXPLAIN ANALYZE shows a sequential scan instead of an index scan on theposts table. The search_vector column was added as VIRTUAL (or withoutthe STORED keyword) so PostgreSQL cannot create a GIN index on it.
Fix the migration: ALTER TABLE posts DROP COLUMN search_vector; ALTER TABLE posts ADD COLUMN search_vector tsvector GENERATED ALWAYS AS ( setweight(to_tsvector('english', coalesce(title, '')), 'A') || setweight(to_tsvector('english', coalesce(body, '')), 'B') ) STORED; CREATE INDEX posts_search_idx ON posts USING gin(search_vector);
Confirm the word STORED appears in the column definition.8. PR 描述
## Feature: PostgreSQL native full-text search on posts
- Generated `tsvector` column (`STORED`) with weighted fields: title (A), body (B), tags (C)- GIN index `posts_search_idx` — index scans confirmed via `EXPLAIN ANALYZE`- `searchPosts(q, limit)` uses `websearch_to_tsquery` (safe for user input) and `ts_rank` for relevance ordering- `ts_headline` snippets (max 30 words) with matched terms highlighted- GET `/api/search?q=` — validates query length, returns ranked JSON, `Cache-Control: public, max-age=60`
No third-party search dependency added.