
Source Types

ramacharanreddy-k edited this page Apr 27, 2026 · 2 revisions

WikiNow ingests 7 source types. Each is saved to raw/ as immutable markdown, then returned to the AI for compilation into wiki pages.


Web URLs

$ wn ingest https://en.wikipedia.org/wiki/Transformer_(deep_learning_model)
╭─ Ingested ───────────────────────────────────────────╮
│ Transformer (deep learning model)                    │
│ raw/transformer-deep-learning-model.md · 18,230 chars│
│ Start MCP mode to compile into wiki                  │
╰──────────────────────────────────────────────────────╯

Uses Jina Reader -- a free service that renders web pages (including JavaScript-heavy sites) and returns clean markdown. No API key is required for up to 20 requests per minute; set JINA_API_KEY to raise the limit to 500 RPM.

Jina also handles PDF URLs natively -- if you pass a link to a PDF hosted online, it extracts the text without needing pymupdf installed locally.
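As a rough illustration of the keyless versus keyed modes, the sketch below builds a Jina Reader request with the standard library. It assumes the public `https://r.jina.ai/` URL-prefix endpoint and a bearer-token header; `build_reader_request` is a hypothetical helper, not WikiNow's actual code.

```python
import os
import urllib.request

# Public Jina Reader endpoint: the target URL is appended to this prefix.
JINA_READER_PREFIX = "https://r.jina.ai/"


def build_reader_request(url, api_key=None):
    """Build (but do not send) a Jina Reader request for `url`.

    Hypothetical helper for illustration only. Without a key the request
    is anonymous; with a key it carries a bearer token.
    """
    headers = {}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(JINA_READER_PREFIX + url, headers=headers)


# JINA_API_KEY is optional; os.environ.get returns None when it is unset.
req = build_reader_request("https://example.com/article",
                           os.environ.get("JINA_API_KEY"))
# To fetch the markdown: urllib.request.urlopen(req).read().decode("utf-8")
```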


YouTube

$ wn ingest https://www.youtube.com/watch?v=xuCn8ux2gbs
╭─ Ingested ───────────────────────────────────────────╮
│ How do solar panels work?                            │
│ raw/youtube-how-do-solar-panels-work.md · 3,450 chars│
│ Start MCP mode to compile into wiki                  │
╰──────────────────────────────────────────────────────╯

YouTube URLs are auto-detected (youtube.com/watch, youtu.be/, youtube.com/shorts/).
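A minimal sketch of how the three URL shapes above could be recognised. `is_youtube_url` is a hypothetical name, and the exact matching rules WikiNow applies may differ.

```python
from urllib.parse import urlparse


def is_youtube_url(url):
    """Return True for youtube.com/watch, youtu.be/<id>, and youtube.com/shorts/<id>."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]

    if host == "youtu.be":
        # Short links carry the video id as the path: youtu.be/<id>
        return len(parsed.path) > 1
    if host == "youtube.com":
        return parsed.path == "/watch" or parsed.path.startswith("/shorts/")
    return False
```

Anything that fails this check would be treated as an ordinary web URL.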

Two-stage extraction:

  1. Subtitles first -- yt-dlp extracts English subtitles (manual or auto-generated) in json3 format, parsed into plain text
  2. Whisper fallback -- if no English subtitles exist, downloads the audio track and transcribes locally using Whisper's turbo model
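The two stages can be sketched as follows. The parser assumes the commonly seen json3 layout (an `events` list whose entries hold `segs` with `utf8` text); `json3_to_text` and `get_transcript` are hypothetical names, and the Whisper step is passed in as a callable so the sketch stays dependency-free.

```python
import json


def json3_to_text(raw):
    """Flatten a json3 subtitle payload into plain text.

    Assumes the layout {"events": [{"segs": [{"utf8": "..."}]}]}.
    """
    data = json.loads(raw)
    lines = []
    for event in data.get("events", []):
        text = "".join(seg.get("utf8", "") for seg in event.get("segs", []))
        text = " ".join(text.split())  # collapse newlines and repeated spaces
        if text:
            lines.append(text)
    return " ".join(lines)


def get_transcript(subtitle_json, transcribe_audio):
    """Stage 1: use subtitles if present. Stage 2: fall back to transcription."""
    if subtitle_json:
        text = json3_to_text(subtitle_json)
        if text:
            return text
    # No usable English subtitles: call the (injected) audio transcriber.
    return transcribe_audio()
```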

The saved file includes title, channel, URL, duration, description, and the full transcript.

Requires: pip install wikinow[youtube] and optionally pip install wikinow[whisper] + ffmpeg for the audio fallback.


Local PDFs

$ wn ingest research-paper.pdf
╭─ Ingested ──────────────────────────────────────────╮
│ Attention Is All You Need                           │
│ raw/attention-is-all-you-need.md · 25,100 chars     │
│ Start MCP mode to compile into wiki                 │
╰─────────────────────────────────────────────────────╯

Extracts text page-by-page using pymupdf. Title comes from PDF metadata if available, otherwise derived from the filename (research-paper.pdf becomes "Research Paper").

Requires: pip install wikinow[pdf]


Epub Books

$ wn ingest thinking-fast-and-slow.epub
╭─ Ingested ──────────────────────────────────────────╮
│ Thinking, Fast and Slow                             │
│ raw/thinking-fast-and-slow.md · 142,500 chars       │
│ Start MCP mode to compile into wiki                 │
╰─────────────────────────────────────────────────────╯

Extracts text chapter-by-chapter using ebooklib + BeautifulSoup. Title and author from Dublin Core metadata. Chapters separated by --- dividers.
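To show the resulting layout, here is a small sketch that turns chapter HTML into text and joins the chapters with --- dividers. It uses the standard library's `html.parser` as a stand-in for ebooklib + BeautifulSoup, and `chapter_text` / `join_chapters` are hypothetical names.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document, ignoring tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def chapter_text(html):
    """Return the chapter's text with whitespace collapsed to single spaces."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def join_chapters(chapters_html):
    """Join non-empty chapters, separated by a markdown --- divider."""
    texts = [chapter_text(h) for h in chapters_html]
    return "\n\n---\n\n".join(t for t in texts if t)
```

This sketch flattens each chapter into one paragraph; a real extractor would likely preserve paragraph breaks.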

For large books, consider ingesting one chapter at a time for better wiki compilation -- the AI can focus on each chapter's concepts individually.

Requires: pip install wikinow[epub]


Audio / Video

$ wn ingest podcast-episode.mp3
╭─ Ingested ──────────────────────────────────────────╮
│ Podcast Episode                                     │
│ raw/podcast-episode.md · 8,200 chars                │
│ Start MCP mode to compile into wiki                 │
╰─────────────────────────────────────────────────────╯

Transcribes locally using OpenAI's Whisper turbo model. Supports .mp3, .wav, .m4a, .ogg, .flac, .webm.

The saved file includes title, detected language, duration, and the full transcript.

English only -- Whisper detects the language automatically. If the audio is not English, the ingest is rejected with a clear error. This is intentional -- mixed-language wikis create inconsistent cross-references and search results.
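The gate described above might look like the sketch below. It assumes the detected language arrives as a short code such as "en" (the form Whisper reports); `IngestRejected` and `check_audio` are hypothetical names.

```python
from pathlib import Path

# Extensions listed in this section.
SUPPORTED_AUDIO = {".mp3", ".wav", ".m4a", ".ogg", ".flac", ".webm"}


class IngestRejected(ValueError):
    """Raised when a file cannot be ingested (hypothetical error type)."""


def check_audio(path, detected_language):
    """Validate the file extension and enforce the English-only rule."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_AUDIO:
        raise IngestRejected(f"unsupported audio format: {suffix or '(none)'}")
    if detected_language != "en":
        raise IngestRejected(
            f"detected language {detected_language!r}; only English audio is ingested"
        )
```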

Requires: pip install wikinow[whisper] + ffmpeg


Text / Markdown

$ wn ingest meeting-notes.md
╭─ Ingested ──────────────────────────────────────────╮
│ Meeting Notes                                       │
│ raw/meeting-notes.md · 1,200 chars                  │
│ Start MCP mode to compile into wiki                 │
╰─────────────────────────────────────────────────────╯

Direct file read. No processing, no dependencies. Title derived from filename: my-research_notes.md becomes "My Research Notes".

Works with .txt, .md, or any plain text file.
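The filename-to-title rule can be approximated as below. This is a sketch that matches the examples on this page (my-research_notes.md, research-paper.pdf); `title_from_filename` is a hypothetical name, and the real rule may treat edge cases differently.

```python
import re
from pathlib import Path


def title_from_filename(path):
    """Drop the extension, split on hyphens/underscores/spaces, capitalise words."""
    stem = Path(path).stem
    words = [w for w in re.split(r"[-_\s]+", stem) if w]
    # Upper-case only the first letter so acronyms such as "GPT" stay intact.
    return " ".join(w[:1].upper() + w[1:] for w in words)
```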


Via MCP

The AI can also ingest during a conversation using the same tools, exposed over MCP:

AI calls ingest_url("https://...")     --> web page or YouTube
AI calls ingest_file("/path/to/file") --> local PDF, epub, audio, text
AI calls ingest_text("notes", "...")   --> text pasted in conversation

All paths go through the same dedup check -- if the SHA-256 hash matches something already in raw/, the ingest is skipped.
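A minimal sketch of such a dedup check, assuming the hash is taken over the bytes of the markdown that would be saved and compared against every .md file in raw/. What WikiNow actually hashes, and whether it caches hashes, is not stated here; `content_hash` and `is_duplicate` are hypothetical names.

```python
import hashlib
from pathlib import Path


def content_hash(data):
    """SHA-256 hex digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()


def is_duplicate(text, raw_dir):
    """Return True if some file in raw_dir has the same SHA-256 as `text`."""
    new_hash = content_hash(text.encode("utf-8"))
    for path in Path(raw_dir).glob("*.md"):
        if content_hash(path.read_bytes()) == new_hash:
            return True
    return False
```

Rehashing every file on each ingest is linear in the size of raw/; storing the digest alongside each file would avoid that, at the cost of extra bookkeeping.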
