Source Types
WikiNow ingests 7 source types. Each is saved to raw/ as immutable markdown, then returned to the AI for compilation into wiki pages.
$ wn ingest https://en.wikipedia.org/wiki/Transformer_(deep_learning_model)
╭─ Ingested ───────────────────────────────────────────╮
│ Transformer (deep learning model) │
│ raw/transformer-deep-learning-model.md · 18,230 chars│
│ Start MCP mode to compile into wiki │
╰──────────────────────────────────────────────────────╯
Web pages are fetched through Jina Reader -- a free service that renders web pages (including JavaScript-heavy sites) and returns clean markdown. No API key is required for up to 20 requests per minute; set JINA_API_KEY to raise the limit to 500 RPM.
Jina also handles PDF URLs natively -- if you pass a link to a PDF hosted online, it extracts the text without needing pymupdf installed locally.
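For illustration, a minimal sketch of how a Jina Reader request could be built. It assumes the public `https://r.jina.ai/<url>` endpoint and Bearer-token auth; the function names are hypothetical and this is not necessarily how WikiNow makes the call internally.

```python
import os
import urllib.request

# Public Jina Reader endpoint: the target URL is appended to this prefix.
JINA_READER_PREFIX = "https://r.jina.ai/"


def build_reader_request(page_url: str) -> urllib.request.Request:
    """Build a request asking Jina Reader for a markdown rendering of page_url."""
    headers = {}
    api_key = os.environ.get("JINA_API_KEY")
    if api_key:
        # Assumption: the key is sent as a Bearer token to get the higher rate limit.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(JINA_READER_PREFIX + page_url, headers=headers)


def fetch_markdown(page_url: str, timeout: float = 30.0) -> str:
    """Fetch the page through Jina Reader and return the response body as text."""
    request = build_reader_request(page_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Because a PDF URL is passed the same way as any other URL, the same sketch would cover hosted PDFs.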
$ wn ingest https://www.youtube.com/watch?v=xuCn8ux2gbs
╭─ Ingested ───────────────────────────────────────────╮
│ How do solar panels work? │
│ raw/youtube-how-do-solar-panels-work.md · 3,450 chars│
│ Start MCP mode to compile into wiki │
╰──────────────────────────────────────────────────────╯
YouTube URLs are auto-detected (youtube.com/watch, youtu.be/, youtube.com/shorts/).
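The detection rule above can be sketched roughly as follows. The function name `is_youtube_url` is hypothetical, and the real check may accept more URL shapes than the three listed.

```python
from urllib.parse import urlparse


def is_youtube_url(url: str) -> bool:
    """Return True for youtube.com/watch, youtu.be/<id>, and youtube.com/shorts/<id>."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]

    if host == "youtu.be":
        # Short links carry the video ID as the path, so the path must be non-empty.
        return len(parsed.path) > 1
    if host == "youtube.com":
        return parsed.path == "/watch" or parsed.path.startswith("/shorts/")
    return False
```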
Two-stage extraction:
- Subtitles first -- yt-dlp extracts English subtitles (manual or auto-generated) in json3 format, parsed into plain text
- Whisper fallback -- if no English subtitles exist, downloads the audio track and transcribes locally using Whisper's turbo model
The saved file includes title, channel, URL, duration, description, and the full transcript.
Requires: pip install wikinow[youtube] and optionally pip install wikinow[whisper] + ffmpeg for the audio fallback.
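As a rough illustration of the subtitle stage, the sketch below flattens a json3 subtitle payload into plain text. It assumes the commonly seen json3 layout (a top-level `events` list whose entries hold `segs` with `utf8` text); that layout and the function name `json3_to_text` are assumptions, not taken from WikiNow's code.

```python
import json


def json3_to_text(raw: str) -> str:
    """Flatten a json3 subtitle document into a single plain-text transcript."""
    data = json.loads(raw)
    lines = []
    for event in data.get("events", []):
        # Assumed layout: each event may carry a list of text segments under "segs".
        text = "".join(seg.get("utf8", "") for seg in event.get("segs", []))
        text = " ".join(text.split())  # collapse newlines and repeated spaces
        if text:
            lines.append(text)
    return " ".join(lines)
```

Events with no segments, or with only whitespace, are dropped so the transcript reads as continuous text.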
$ wn ingest research-paper.pdf
╭─ Ingested ──────────────────────────────────────────╮
│ Attention Is All You Need │
│ raw/attention-is-all-you-need.md · 25,100 chars │
│ Start MCP mode to compile into wiki │
╰─────────────────────────────────────────────────────╯
Extracts text page-by-page using pymupdf. Title comes from PDF metadata if available, otherwise derived from the filename (research-paper.pdf becomes "Research Paper").
Requires: pip install wikinow[pdf]
$ wn ingest thinking-fast-and-slow.epub
╭─ Ingested ──────────────────────────────────────────╮
│ Thinking, Fast and Slow │
│ raw/thinking-fast-and-slow.md · 142,500 chars │
│ Start MCP mode to compile into wiki │
╰─────────────────────────────────────────────────────╯
Extracts text chapter-by-chapter using ebooklib + BeautifulSoup. Title and author from Dublin Core metadata. Chapters separated by --- dividers.
For large books, consider ingesting one chapter at a time for better wiki compilation -- the AI can focus on each chapter's concepts individually.
Requires: pip install wikinow[epub]
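To show the shape of the chapter-by-chapter output, here is a small sketch that turns a list of chapter HTML strings into text joined by --- dividers. It deliberately swaps in the standard library's `html.parser` for BeautifulSoup and takes the chapters as plain strings instead of reading them with ebooklib, so it runs without extra dependencies; all names are hypothetical.

```python
from html.parser import HTMLParser

# Tags treated as paragraph boundaries in this sketch.
_BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6"}


class _TextExtractor(HTMLParser):
    """Collect visible text, inserting line breaks at block-level tags."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag in _BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag in _BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def chapter_text(html: str) -> str:
    """Strip markup from one chapter and return its paragraphs."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.chunks).splitlines())
    return "\n\n".join(line for line in lines if line)


def join_chapters(chapters_html: list[str]) -> str:
    """Join non-empty chapters with a --- divider between them."""
    texts = [chapter_text(html) for html in chapters_html]
    return "\n\n---\n\n".join(text for text in texts if text)
```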
$ wn ingest podcast-episode.mp3
╭─ Ingested ──────────────────────────────────────────╮
│ Podcast Episode │
│ raw/podcast-episode.md · 8,200 chars │
│ Start MCP mode to compile into wiki │
╰─────────────────────────────────────────────────────╯
Transcribes locally using OpenAI's Whisper turbo model. Supports .mp3, .wav, .m4a, .ogg, .flac, .webm.
The saved file includes title, detected language, duration, and the full transcript.
English only -- Whisper detects the language automatically. If the audio is not English, the ingest is rejected with a clear error. This is intentional -- mixed-language wikis create inconsistent cross-references and search results.
Requires: pip install wikinow[whisper] + ffmpeg
$ wn ingest meeting-notes.md
╭─ Ingested ──────────────────────────────────────────╮
│ Meeting Notes │
│ raw/meeting-notes.md · 1,200 chars │
│ Start MCP mode to compile into wiki │
╰─────────────────────────────────────────────────────╯
Direct file read. No processing, no dependencies. Title derived from filename: my-research_notes.md becomes "My Research Notes".
Works with .txt, .md, or any plain text file.
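The filename-to-title rule can be sketched like this. The helper name `title_from_filename` is hypothetical, and the exact separators and capitalization WikiNow applies are assumed from the examples in this document.

```python
import re
from pathlib import Path


def title_from_filename(path: str) -> str:
    """Derive a display title: drop the extension, split on - and _, capitalize words."""
    stem = Path(path).stem
    words = [word for word in re.split(r"[-_\s]+", stem) if word]
    # Upper-case only the first letter so existing capitals (e.g. acronyms) survive.
    return " ".join(word[0].upper() + word[1:] for word in words)


print(title_from_filename("my-research_notes.md"))
print(title_from_filename("research-paper.pdf"))
```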
The AI can also ingest during conversation using the same tools:
AI calls ingest_url("https://...") --> web page or YouTube
AI calls ingest_file("/path/to/file") --> local PDF, epub, audio, text
AI calls ingest_text("notes", "...") --> text pasted in conversation
All three paths go through the same dedup check -- if the SHA-256 hash matches something already in raw/, the ingest is skipped.
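A minimal sketch of such a dedup check is shown below. It assumes the hash is taken over the markdown text that would be written to raw/; the document does not say exactly what is hashed or where hashes are stored, and all function names here are hypothetical.

```python
import hashlib
from pathlib import Path


def content_hash(text: str) -> str:
    """Return the SHA-256 hex digest of the UTF-8 encoded text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def hashes_in_raw(raw_dir: Path) -> set[str]:
    """Hash every markdown file currently in the raw directory."""
    return {content_hash(path.read_text(encoding="utf-8")) for path in raw_dir.glob("*.md")}


def should_ingest(text: str, known: set[str]) -> bool:
    """Return False for content already seen; otherwise record it and return True."""
    digest = content_hash(text)
    if digest in known:
        return False
    known.add(digest)
    return True
```

In this sketch, `known` would be seeded once with `hashes_in_raw(Path("raw"))` and then shared by the URL, file, and text ingest paths.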