` (50–60 chars) and a unique\n `<meta name=\"description\">` (120–160 chars). Duplicate titles and descriptions\n can cause Google to de-rank or rewrite them. Check both before marking a task done.\n- Every page MUST render a `<link rel=\"canonical\" href=\"...\">` pointing to its own\n canonical URL. This is required even on pages that are not duplicated; it is a\n signal, not just a deduplication tool.\n- Never generate two pages with the same slug or URL path. Before adding a new content\n file or route, confirm the path does not already exist in `src/pages/` or the\n content collection.\n- Every blog post or article MUST include JSON-LD structured data: at minimum\n `Article` with `headline`, `datePublished`, `dateModified`, and `author`.\n Product pages need `Product` schema. FAQ pages need `FAQPage` schema.\n- Open Graph tags (`og:title`, `og:description`, `og:image`, `og:url`) must be\n present on every page. The `og:image` must be an absolute URL (not a relative path).\n- Do NOT use `noindex` on pages that should rank. Do NOT remove `noindex` from pages\n in `src/pages/api/`, `src/pages/admin/`, or any route that should not be crawled.\n\n## URL and routing conventions\n- URLs are lowercase and hyphen-separated, with no trailing slash (or always with a\n trailing slash if that is the framework default; pick one and enforce it with a redirect).\n- Never rename a published URL without adding a 301 redirect from the old path.\n Inbound links that land on a 404 stop passing ranking signals.\n- Paginated series use `/page/2/` style paths, not query strings (`?page=2`).\n Google can index query-string URLs, but path-based pages are easier to keep distinct\n from filter and tracking parameters.\n\n## Content and performance\n- All images must have descriptive `alt` text that includes the target keyword where\n natural. Empty `alt=\"\"` is only correct for decorative images.\n- Images must be served in WebP or AVIF format. 
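No JPEG or PNG without a `<picture>`\n element that lists a WebP or AVIF source ahead of the fallback.\n- Every page must load without render-blocking scripts. No `<script>` without `defer`\n or `async` in the `<head>` unless it is a critical inline script.\n- Internal links must use the full path and must not 404. Before adding a link, verify\n the target page exists.\n\n## Definition of done\n- `astro check` or `tsc --noEmit` passes.\n- `astro build` completes without warnings.\n- Spot-check: `curl -s <page-url> | grep -c 'canonical'` returns 1.\n- No duplicate `<title>` values across built HTML (run `grep -r '<title>'` on `dist/`).\n- JSON-LD is present and valid (use the Schema.org validator).\n```\n\n## Why these rules\n\n- **A unique title and description per page** is the highest-impact SEO rule for content sites. Agents that generate content pages in bulk often reuse the same metadata template, producing dozens of pages that are technically distinct but look identical to a crawler, which triggers soft deduplication penalties.\n- **A canonical URL on every page, not just duplicated ones** is frequently misunderstood. Agents that read SEO documentation tend to add canonicals only where content is obviously duplicated (for example, pagination). In practice, every page should self-reference its canonical URL so that parameter-injected crawl variants do not split link equity.\n\n## When to use\n\n- Blogs, documentation sites, SEO content hubs, and marketing sites where organic search is the primary acquisition channel.\n\n## When not to use\n\n- Internal tools, dashboards, or applications where SEO does not matter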

{ "id": "ai-rules-for-seo-content-sites", "type": "rules", "category": "rules", "locale": "zh", "url": "/zh/rules/ai-rules-for-seo-content-sites", "title": "AI Coding Rules for SEO Content Sites", "description": "AGENTS.md rules for SEO-focused content sites that prevent duplicate metadata, enforce structured data, and stop agents from breaking crawlability.", "tools": [ "Cursor", "Claude Code", "Codex", "Windsurf" ], "stack": [ "Astro", "Next.js", "TypeScript" ], "tags": [ "agents-md", "seo", "astro", "nextjs", "typescript", "conventions" ], "difficulty": null, "updated": "2026-06-08", "markdown": "Place this file at the repository root as `AGENTS.md`. It applies to any content-heavy site where organic search traffic is the main growth lever: blogs, documentation sites, marketing sites, and resource libraries.\n\n## AGENTS.md\n\n```md title=\"AGENTS.md\"\n# Project Rules: SEO Content Site\n\n## Stack\n- Astro (static) or Next.js (App Router, static export or ISR).\n- TypeScript strict. Content schema enforced via Zod (content collections or manual).\n- Tailwind CSS for styling.\n\n## Hard rules: SEO correctness\n- Every page MUST have a unique `<title>` (50–60 chars) and a unique\n `<meta name=\"description\">` (120–160 chars). Duplicate titles and descriptions\n can cause Google to de-rank or rewrite them. Check both before marking a task done.\n- Every page MUST render a `<link rel=\"canonical\" href=\"...\">` pointing to its own\n canonical URL. This is required even on pages that are not duplicated; it is a\n signal, not just a deduplication tool.\n- Never generate two pages with the same slug or URL path. Before adding a new content\n file or route, confirm the path does not already exist in `src/pages/` or the\n content collection.\n- Every blog post or article MUST include JSON-LD structured data: at minimum\n `Article` with `headline`, `datePublished`, `dateModified`, and `author`.\n Product pages need `Product` schema. FAQ pages need `FAQPage` schema.\n- Open Graph tags (`og:title`, `og:description`, `og:image`, `og:url`) must be\n present on every page. The `og:image` must be an absolute URL (not a relative path).\n- Do NOT use `noindex` on pages that should rank. 
Do NOT remove `noindex` from pages\n in `src/pages/api/`, `src/pages/admin/`, or any route that should not be crawled.\n\n## URL and routing conventions\n- URLs are lowercase and hyphen-separated, with no trailing slash (or always with a\n trailing slash if that is the framework default; pick one and enforce it with a redirect).\n- Never rename a published URL without adding a 301 redirect from the old path.\n Inbound links that land on a 404 stop passing ranking signals.\n- Paginated series use `/page/2/` style paths, not query strings (`?page=2`).\n Google can index query-string URLs, but path-based pages are easier to keep distinct\n from filter and tracking parameters.\n\n## Content and performance\n- All images must have descriptive `alt` text that includes the target keyword where\n natural. Empty `alt=\"\"` is only correct for decorative images.\n- Images must be served in WebP or AVIF format. No JPEG or PNG without a `<picture>`\n element that lists a WebP or AVIF source ahead of the fallback.\n- Every page must load without render-blocking scripts. No `<script>` without `defer`\n or `async` in the `<head>` unless it is a critical inline script.\n- Internal links must use the full path and must not 404. Before adding a link, verify\n the target page exists.\n\n## Definition of done\n- `astro check` or `tsc --noEmit` passes.\n- `astro build` completes without warnings.\n- Spot-check: `curl -s <page-url> | grep -c 'canonical'` returns 1.\n- No duplicate `<title>` values across built HTML (run `grep -r '<title>'` on `dist/`).\n- JSON-LD is present and valid (use the Schema.org validator).\n```\n\n## Why these rules\n\n- **A unique title and description per page** is the highest-impact SEO rule for content sites. Agents that generate content pages in bulk often reuse the same metadata template, producing dozens of pages that are technically distinct but look identical to a crawler, which triggers soft deduplication penalties.\n- **A canonical URL on every page, not just duplicated ones** is frequently misunderstood. Agents that read SEO documentation tend to add canonicals only where content is obviously duplicated (for example, pagination). In practice, every page should self-reference its canonical URL so that parameter-injected crawl variants do not split link equity.\n\n## When to use\n\n- Blogs, documentation sites, SEO content hubs, and marketing sites where organic search is the primary acquisition channel.\n\n## When not to use\n\n- Internal tools, dashboards, or applications where SEO is irrelevant; the canonical URL and structured data requirements add overhead with no benefit." }