docs2context

docs2context is a CLI tool that automates documentation scraping, cleaning, and aggregation. It intelligently searches for project documentation, crawls through related pages, and compiles everything into a single comprehensive markdown file optimized for context integration.

In short: run docs2context on a package, select the correct documentation link from the list of options, and get back a single large document ready to feed to an AI assistant for pair programming.

Installation

# Install locally
npm install

# Link the CLI for development
npm link

# Set up required API key for GPT-4o-mini content cleaning
export OPENAI_API_KEY=your_api_key

Usage

(The examples use express; replace it with any package or library.)

# Search for documentation by project name
docs2context express

# Provide a direct URL to documentation
docs2context express -u https://expressjs.com/
# or
docs2context express --url https://expressjs.com/

# Disable AI cleaning (faster but less refined output)
docs2context express --disable-ai

# Control concurrent processing (default: 10)
docs2context express --concurrency 20

# Get help
docs2context --help

Future Plans

Implement chunking, embeddings, and keyword/similarity search over documents as an agent tool.
Implement MCP (Model Context Protocol) support so that any agent can query and understand documentation in seconds.
Support model providers other than OpenAI.
Allow Cursor to load docs with this tool through function calling / agentic behavior.

Advanced Options

| Option | Description |
| --- | --- |
| `--url <url>` | Provide a direct URL to documentation source |
| `--disable-ai` | Skip AI cleaning of content (faster but less refined) |
| `--concurrency <number>` | Set the number of concurrent operations (default: 10) |
| `--verbose` | Enable verbose logging for debugging |
| `--output <path>` | Specify custom output file path |
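
To make the `--concurrency` option more concrete, here is a minimal sketch of what "a bounded number of concurrent operations" can look like in Node.js. It is not docs2context's actual code: the project lists p-queue for this job, while the sketch uses plain promises so it runs without dependencies. The helper name `mapWithConcurrency` and its parameters are hypothetical.

```javascript
// Hypothetical helper, not part of docs2context: run `worker` over `items`
// with at most `limit` calls in flight at any moment.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item that has not been claimed yet

  // Each runner claims one item at a time until none are left.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  // Start `limit` runners (at least one, and never more than there are items).
  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results; // same order as `items`
}
```

For example, `mapWithConcurrency(urls, 10, fetchPage)` would mirror the documented default of 10, where `fetchPage` stands for whatever per-page function you supply. A higher limit finishes sooner on large sites but puts more load on the documentation server.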

Development

This project is built with Node.js and leverages:

Core Web Technologies:
- axios - For efficient HTTP requests
- cheerio - For powerful HTML parsing
- node-html-markdown - For HTML to markdown conversion
AI Integration:
- OpenAI API - For content cleaning and enhancement
Concurrency & Performance:
- p-queue - For controlled parallel processing
User Experience:
- commander - For elegant CLI commands
- inquirer - For interactive prompts
- ora - For informative terminal spinners

How It Works

Discovery Phase:
- Searches for documentation or uses provided URL
- Confirms with user to ensure correct source
Crawling Phase:
- Builds a graph of documentation pages through recursive crawling
- Filters to focus only on relevant documentation pages
Processing Phase:
- Scrapes content in batches with controlled concurrency
- Intelligently extracts the main content from each page
- Optionally cleans and enhances content with GPT-4o-mini
Compilation Phase:
- Sorts pages in logical reading order
- Generates comprehensive table of contents
- Compiles everything into a single markdown file
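
The crawling phase described above can be sketched as a breadth-first walk that records each page's links and keeps only those that look like documentation. This is an illustration under stated assumptions, not the tool's real implementation: the names `crawlDocs`, `fetchLinks`, `maxPages`, `isInScope`, and `normalize` are hypothetical, and the relevance filter (same origin, same directory prefix as the start URL) is an assumed heuristic.

```javascript
// Hypothetical sketch, not docs2context's actual code.
// `fetchLinks(url)` is an assumed callback that resolves to the raw hrefs
// found on a page; in a real run it would download and parse the HTML.
async function crawlDocs(startUrl, fetchLinks, maxPages = 200) {
  const start = new URL(startUrl);
  const root = normalize(start);
  const graph = new Map(); // page URL -> in-scope URLs it links to
  const queue = [root];    // pages waiting to be visited (FIFO, so breadth-first)
  const seen = new Set(queue);

  while (queue.length > 0 && graph.size < maxPages) {
    const page = queue.shift();
    const links = [];
    for (const href of await fetchLinks(page)) {
      let target;
      try {
        target = new URL(href, page); // resolve relative links against the page
      } catch {
        continue; // skip malformed hrefs
      }
      if (!isInScope(target, start)) continue;
      const url = normalize(target);
      if (!links.includes(url)) links.push(url);
      if (!seen.has(url)) {
        seen.add(url);
        queue.push(url);
      }
    }
    graph.set(page, links);
  }
  return graph;
}

// Assumed relevance filter: same origin, and under the start URL's directory.
function isInScope(target, start) {
  const prefix = start.pathname.slice(0, start.pathname.lastIndexOf('/') + 1);
  return target.origin === start.origin && target.pathname.startsWith(prefix);
}

// Drop the #fragment so anchors on the same page are not treated as new pages.
function normalize(url) {
  const copy = new URL(url.href);
  copy.hash = '';
  return copy.href;
}
```

Because a `Map` preserves insertion order, iterating over the returned graph yields pages in the order they were discovered, which is one simple candidate for a reading order. The `maxPages` cap is there only to keep the sketch from walking an unexpectedly large site.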

License

MIT
