PostgreSQL 搜索应用 — 上下文包

可复制的上下文包，用于基于 PostgreSQL 和 pgvector 的全文与向量搜索应用，让您的 AI 代理从一开始就能编写正确的 SQL 和迁移代码。

将此内容粘贴到任务开头，以便代理在接触任何查询代码之前理解 Postgres 模式、搜索策略和迁移工作流。

项目背景

A content search application that combines PostgreSQL full-text search (tsvector
+ tsquery) with vector similarity search via the pgvector extension. The backend
is a Next.js 15 App Router API. Embeddings are generated by OpenAI's
text-embedding-3-small model and stored as vector(1536) columns. The database
runs on Supabase (hosted Postgres 16) and is accessed via the Postgres.js driver.

技术栈

Next.js 15 (App Router, Route Handlers for search API)
PostgreSQL 16 with extensions: pgvector, pg_trgm, unaccent
postgres (Postgres.js) driver — NOT pg/node-postgres
OpenAI SDK (text-embedding-3-small, 1536 dimensions)
TypeScript (strict)
Tailwind CSS v4
db-migrate for schema migrations (SQL files, not ORM)

目录结构

src/
  app/
    api/
      search/route.ts        # Unified search endpoint (FTS + vector)
      embed/route.ts         # Embedding generation + upsert
  lib/
    db.ts                    # Postgres.js client singleton
    search.ts                # ftsSearch(), vectorSearch(), hybridSearch()
    embed.ts                 # OpenAI embedding helper
    schema/                  # TypeScript types mirroring DB tables
  components/
    SearchBar.tsx
    SearchResults.tsx
migrations/
  001_initial.sql
  002_add_pgvector.sql
  003_add_trgm_index.sql
  ...
scripts/
  seed.ts                    # Bulk embed + insert content

编码规范

- All SQL lives in src/lib/search.ts or in migrations/ — never inline SQL in
  route handlers or components.
- Use tagged template literals with Postgres.js (sql`...`) for all queries.
  Never concatenate user input into SQL strings.
- Full-text search uses a generated tsvector column with a GIN index.
  Update triggers maintain the column automatically — do not compute tsvector
  in application code.
- Vector columns are vector(1536) for text-embedding-3-small. If the model
  changes, the dimension must change too — this requires a migration.
- Hybrid search combines FTS rank and cosine similarity with a weighted sum.
  The weights live in src/lib/search.ts as named constants, not magic numbers.
- Schema changes require a new numbered SQL file in migrations/ and must be
  applied with: `npm run db:migrate` (which calls db-migrate up).
- All migration files are append-only — never edit a migration that has been
  applied to production.
- Index creation uses CONCURRENTLY in production migrations to avoid table locks.

AI 任务边界

- Do not switch to an ORM (Prisma, Drizzle) without explicit instruction.
  Raw SQL with Postgres.js is intentional for fine-grained query control.
- Do not call the OpenAI API in a request that also queries the DB within a
  transaction — embeddings are generated before the transaction opens.
- Do not use cosine similarity (<->) without a vector index (ivfflat or hnsw).
  Always add the index in the same migration that adds the column.
- Do not store user-supplied text as embeddings without sanitizing PII first.
- Do not change the embedding model without a re-indexing migration that
  updates all existing vector values and the column dimension.
- pg_trgm GIN indexes are used for ILIKE fallback — do not drop them.
- Supabase Row Level Security (RLS) is enabled on all tables. Any new table
  must have RLS enabled and at least one policy added in its migration.

llms.txt

# PostgreSQL Search App
DB: Supabase (Postgres 16) — postgres.js driver (src/lib/db.ts)
Search: FTS (tsvector/tsquery) + vector (pgvector) hybrid (src/lib/search.ts)
Embeddings: OpenAI text-embedding-3-small, 1536 dims (src/lib/embed.ts)
Migrations: db-migrate, SQL files in migrations/ — append-only
RLS: enabled on all tables — new tables need policies
Key indexes: GIN on tsvector column, hnsw on vector column
Do NOT: inline SQL in routes, concatenate user input into SQL, skip RLS

PostgreSQL 搜索应用 — 上下文包

项目背景

技术栈

目录结构

编码规范

AI 任务边界

llms.txt

相关