Personal blog and landing page of a software engineer. Built with Hugo + Hextra, deployed via GitHub Pages. Includes a Scrapy-based crawler that fetches RSS feeds and scrapes sites, saving content as markdown with Obsidian-compatible [[wikilinks]].
```text
├── blog/                # Hugo + Hextra site
│   ├── hugo.toml
│   ├── content/
│   ├── archetypes/
│   └── static/
├── crawler/             # Scrapy RSS crawler
│   ├── src/             # Spiders, pipelines, linker
│   ├── tests/
│   ├── sources.json     # RSS sources configuration
│   └── pyproject.toml
├── data/                # Crawled markdown files (gitignored)
├── docs/                # Project documentation
│   ├── ARCHITECTURE.md
│   ├── GUIDELINES.md
│   ├── CONTRIBUTING.md
│   ├── SECURITY.md
│   ├── SOURCES.md
│   └── PUBLISHING.md
├── AGENTS.md            # Instructions for AI agents
└── .github/workflows/   # CI/CD
```
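The schema of `crawler/sources.json` is not documented here. The sketch below only illustrates how such a file could be loaded and validated, and it assumes a hypothetical shape: a JSON array of objects like `{"name": ..., "feed_url": ..., "domain": ...}`. The `Source` class, its field names, and the `parse_sources`/`load_sources` helpers are all illustrative, not the crawler's actual code.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from urllib.parse import urlparse


@dataclass(frozen=True)
class Source:
    """One RSS source. Field names are hypothetical, not the real schema."""
    name: str
    feed_url: str
    domain: str = ""  # assumed grouping key, cf. "Source catalog by domain"


def parse_sources(text: str) -> list[Source]:
    """Parse a JSON array of source objects, rejecting non-HTTP(S) feed URLs."""
    sources = []
    for entry in json.loads(text):
        url = entry["feed_url"]
        if urlparse(url).scheme not in ("http", "https"):
            raise ValueError(f"unsupported feed URL: {url!r}")
        sources.append(Source(entry["name"], url, entry.get("domain", "")))
    return sources


def load_sources(path: Path = Path("sources.json")) -> list[Source]:
    """Read the config from disk (default path assumes the cwd is crawler/)."""
    return parse_sources(path.read_text(encoding="utf-8"))
```

Keeping parsing separate from file I/O makes the validation easy to unit-test without touching the filesystem.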
```bash
# Start blog dev server (hot reload at http://localhost:1313)
docker compose up blog

# Run crawler (on demand)
docker compose run crawler
```

The crawler writes markdown files to `data/`, which is shared with the blog container via a mounted volume.
```bash
cd blog
hugo server -D
```

The site is served at http://localhost:1313/ (`-D` also builds content marked as draft). Requires Hugo 0.159.1 or later.
```bash
cd crawler
uv sync
uv run scrapy crawl feeds
```

Requires Python 3.14+ and uv.
The `feeds` spider fetches all configured RSS sources, crawls the sites for older posts, saves `.md` files to `data/`, and creates [[wikilinks]] between related documents. See `crawler/README.md` for details.
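As a rough illustration of the last two steps, here is a stdlib-only sketch that turns RSS 2.0 items into markdown notes with frontmatter and links related notes with Obsidian-style `[[wikilinks]]`. It is not the crawler's code: the real project uses Scrapy spiders and pipelines in `crawler/src/`, while this sketch parses an XML string with `xml.etree.ElementTree` and does no networking. The frontmatter fields, the `slugify` file-naming rule, and the "shares at least one tag" relatedness rule are all hypothetical.

```python
import json
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Note:
    title: str
    url: str
    date: str
    tags: list[str] = field(default_factory=list)
    body: str = ""


def parse_rss(xml_text: str) -> list[Note]:
    """Turn each RSS 2.0 <item> into a Note (missing fields become "")."""
    root = ET.fromstring(xml_text)
    return [
        Note(
            title=(item.findtext("title") or "").strip(),
            url=(item.findtext("link") or "").strip(),
            date=(item.findtext("pubDate") or "").strip(),
            tags=[c.text.strip() for c in item.findall("category") if c.text],
            body=(item.findtext("description") or "").strip(),
        )
        for item in root.iter("item")
    ]


def slugify(title: str) -> str:
    """Hypothetical file-naming rule: lowercase, non-alphanumerics -> '-'."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-") or "untitled"


def related(note: Note, notes: list[Note]) -> list[Note]:
    """Assumed relatedness rule: other notes sharing at least one tag."""
    tags = set(note.tags)
    return [n for n in notes if n is not note and tags & set(n.tags)]


def to_markdown(note: Note, notes: list[Note]) -> str:
    """Render YAML frontmatter, the body, and a 'Related' list of wikilinks."""
    # json.dumps yields double-quoted strings/arrays that are valid YAML.
    lines = [
        "---",
        f"title: {json.dumps(note.title)}",
        f"url: {json.dumps(note.url)}",
        f"date: {json.dumps(note.date)}",
        f"tags: {json.dumps(note.tags)}",
        "---",
        "",
        note.body,
    ]
    # Obsidian resolves [[name]] to name.md, so links use the file slug.
    links = [f"- [[{slugify(n.title)}]]" for n in related(note, notes)]
    if links:
        lines += ["", "## Related", *links]
    return "\n".join(lines) + "\n"


def save_all(notes: list[Note], out_dir: Path) -> list[Path]:
    """Write one <slug>.md per note; duplicate titles would overwrite."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for note in notes:
        path = out_dir / f"{slugify(note.title)}.md"
        path.write_text(to_markdown(note, notes), encoding="utf-8")
        paths.append(path)
    return paths
```

Deriving link targets from the same slug used for file names keeps every `[[wikilink]]` resolvable in Obsidian; a real implementation would also need a policy for title collisions.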
| Document | Contents |
|---|---|
| `AGENTS.md` | Instructions for AI agents |
| `docs/ARCHITECTURE.md` | Architecture and pipeline |
| `docs/GUIDELINES.md` | Content conventions and frontmatter |
| `docs/CONTRIBUTING.md` | How to contribute |
| `docs/SECURITY.md` | Security and sensitive data |
| `docs/SOURCES.md` | Source catalog by domain |
| `docs/PUBLISHING.md` | Publishing pipeline |
Personal project — private use.