feat: improve docs search #1277
Phase 1 of Algolia search improvements:

- Add EnhancedDocsSearchItem type with new fields:
  - pageTitle: Always the parent page title
  - description: From frontmatter (page-level only)
  - content: Text content (truncated ~2000 chars)
  - headingLevel: 0 for page, 2 for H2, 3 for H3
  - isPageLevel: True if page-level record (not a heading)
- Create scripts/indexDocsForSearch.ts:
  - Parses all MDX/MD content files
  - Extracts frontmatter using remark
  - Creates page-level records with intro content
  - Extracts H2/H3 headings with surrounding content
  - Creates heading-level records with anchor links
  - Batches uploads to Algolia (1000 per batch)
  - Gracefully handles missing Algolia credentials
- Update package.json:
  - Add 'index-docs' script
  - Run new indexer in prebuild before index-apis

This enables:

- Deep linking to specific sections via #anchor URLs
- Better relevance for specific queries
- Smaller, more focused search records
- Content-based search (not just titles)

Co-authored-by: chris <chris@knock.app>
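For reviewers, a rough sketch of the record shape and batching described in the commit message above. The field names come from the commit message; the field types, the `objectID` and `path` fields, and the helper names `truncateContent` and `toBatches` are assumptions for illustration, not necessarily what scripts/indexDocsForSearch.ts contains.

```typescript
// Field names follow the commit message; types, `objectID`, and `path`
// are assumptions made for this sketch.
interface EnhancedDocsSearchItem {
  objectID: string; // assumed: unique record id
  path: string; // assumed: page URL, with "#anchor" for heading records
  pageTitle: string; // always the parent page title
  description?: string; // from frontmatter, page-level records only
  content: string; // text content, truncated to roughly 2000 characters
  headingLevel: 0 | 2 | 3; // 0 for page, 2 for H2, 3 for H3
  isPageLevel: boolean; // true for page-level records, false for headings
}

const MAX_CONTENT_LENGTH = 2000;
const BATCH_SIZE = 1000;

// Hypothetical helper: cap record content so records stay small.
function truncateContent(text: string, max: number = MAX_CONTENT_LENGTH): string {
  return text.length <= max ? text : text.slice(0, max);
}

// Hypothetical helper: split records into upload batches of at most `size`.
function toBatches<T>(records: T[], size: number = BATCH_SIZE): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < records.length; i += size) {
    batches.push(records.slice(i, i + size));
  }
  return batches;
}
```

Each batch would then be handed to whatever Algolia client call the script uses; that call is left out here because the PR text does not name it.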
Co-authored-by: chris <chris@knock.app>
@cjbell should I be able to test this on the preview link? I'm trying it out but not getting results when querying for headers or content.
Resolve conflicts:

- package.json: Combined split-specs from main with index-docs from this branch
- Autocomplete.tsx: Used main's refactored content variable pattern, added EnhancedDocsSearchItem import, and added pageTitle display for heading-level search results only (not page-level to avoid redundant title display)

Co-authored-by: Chris Bell <chris@cjbell.co>
Move image removal before link removal to prevent images from being partially processed. Since images use ![alt](url), which contains the link pattern [alt](url), the link regex was matching first and leaving behind '!alt text' in the indexed content.

Co-authored-by: Chris Bell <chris@cjbell.co>
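To illustrate the ordering issue this commit describes, here is a small sketch. The function names and the regexes are assumptions chosen for illustration; the script's actual patterns may differ.

```typescript
// Hypothetical helpers; the regexes are illustrative, not the script's own.

// Fixed order: strip images first, then unwrap links.
function stripImagesThenLinks(markdown: string): string {
  return markdown
    // Images: ![alt](url) is dropped entirely.
    .replace(/!\[[^\]]*\]\([^)]*\)/g, "")
    // Links: [text](url) keeps only the link text.
    .replace(/\[([^\]]*)\]\([^)]*\)/g, "$1");
}

// Buggy order, kept for comparison: the link pattern matches the
// "[alt](url)" part of an image first and leaves "!alt" behind.
function stripLinksThenImages(markdown: string): string {
  return markdown
    .replace(/\[([^\]]*)\]\([^)]*\)/g, "$1")
    .replace(/!\[[^\]]*\]\([^)]*\)/g, "");
}

const sample = "See ![diagram](/img/a.png) and [docs](/docs).";
console.log(stripImagesThenLinks(sample)); // "See  and docs."
console.log(stripLinksThenImages(sample)); // "See !diagram and docs."
```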
- Remove generateAlgoliaIndex function from lib/content.server.ts
- Remove call to generateAlgoliaIndex from pages/[...slug].tsx
- Fix /index replacement to use anchored regex (/\/index$/) to prevent corrupting paths containing 'index' as a substring

The new indexDocsForSearch.ts script now handles all docs indexing during prebuild, eliminating the duplicate records that were being created by the old per-page indexing during getStaticProps.

Co-authored-by: Chris Bell <chris@cjbell.co>
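A minimal sketch of why the anchored regex matters. The helper names and the example slug are hypothetical, and the unanchored version is an assumption about what the earlier replacement looked like.

```typescript
// Hypothetical helpers for comparing the two replacements.

// Assumed earlier behavior: removes the first "/index" anywhere in the slug.
function stripIndexUnanchored(slug: string): string {
  return slug.replace("/index", "");
}

// Anchored version from the commit: only a trailing "/index" is removed.
function stripIndexAnchored(slug: string): string {
  return slug.replace(/\/index$/, "");
}

console.log(stripIndexAnchored("/guides/index")); // "/guides"
console.log(stripIndexUnanchored("/integrations/indexing/overview")); // "/integrationsing/overview"
console.log(stripIndexAnchored("/integrations/indexing/overview")); // "/integrations/indexing/overview"
```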
Cursor Bugbot has reviewed your changes and found 1 potential issue.
@cjbell u still wanna do this or no
@cursor rebase this against main, then make updates such that we start indexing the api / mapi pages so things like rate limits, authentication etc all appear in search results. those are missing right now.
Co-authored-by: Chris Bell <chris@cjbell.co>
Description
This PR introduces a new indexing strategy for the docs, which will now include headings and content within a page in addition to the page title/tags that we previously indexed.
Note: right now we're not indexing API content within this result set, but I can easily change that if we'd like!