MWoffliner

MWoffliner is a tool for creating a local offline HTML snapshot of any online MediaWiki instance. It scrapes all articles (or a selection, if specified) and creates the corresponding ZIM file. Although primarily aimed at Wikimedia projects such as Wikipedia and Wiktionary, MWoffliner also supports any recent MediaWiki instance (version 1.27+), though instances with custom skins or highly unusual configurations may have limitations.

Read CONTRIBUTING.md to learn more about MWoffliner development.

User help is available in the FAQ.

Features

  • Scrape with or without image thumbnails
  • Scrape with or without audio/video multimedia content
  • S3 cache (optional)
  • Image size optimization and WebP conversion
  • Scrape all articles in the selected namespaces, or only those in a title list
  • Specify additional/non-main namespaces to scrape

Run mwoffliner --help to see all available options.

Prerequisites

  • Docker (or Docker-based engine)
  • amd64 architecture

Installation

The recommended way to install and run mwoffliner is using the pre-built Docker container:

    docker pull ghcr.io/openzim/mwoffliner

Run software locally / Build from source

Prerequisites for local execution

  • *NIX Operating System (GNU/Linux, macOS, etc.)

  • Redis — in-memory data store

  • Node.js version 24 (only this single Node.js version is supported; other versions may or may not work)

  • Libzim — C++ library for creating ZIM files (automatically downloaded on GNU/Linux & macOS)

  • Various build tools which are probably already installed on your machine:

    • libjpeg-dev — JPEG image processing
    • libglu1 — OpenGL utility library
    • autoconf — automatic configuration system
    • automake — Makefile generator
    • gcc — C compiler

    (These packages are for Debian/Ubuntu systems)

  • An online MediaWiki instance with its API available

Installation methods

Build your own container

  1. Clone the repository locally:

    git clone https://github.com/openzim/mwoffliner.git && cd mwoffliner
  2. Build the image:

    docker build . -f docker/Dockerfile -t ghcr.io/openzim/mwoffliner

Run the software locally using NPM

[!WARNING] Local installation requires several system dependencies (see above). Using the Docker image is strongly recommended to avoid setup issues.

Setting up MWoffliner locally for development can be tricky due to several dependencies and version requirements. Follow these steps carefully to avoid common errors.

1. Node.js Version

MWoffliner requires Node.js 24 (other versions may fail).

Compatible Node 24 ranges: >=24 <24.6 or >=24.7 <25.

Check your version:

node -v

If your version does not match, use nvm to install the correct Node.js version.
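
The compatible ranges above can also be checked from a script. The following is a minimal sketch, not part of MWoffliner: the helper names `parseVersion` and `isSupportedNode` are hypothetical, and the check simply encodes the ranges stated above (`>=24 <24.6` or `>=24.7 <25`) for plain release versions.

```javascript
// Parses "v24.3.0" or "24.3.0" into [major, minor, patch] numbers.
function parseVersion(version) {
  const match = /^v?(\d+)\.(\d+)\.(\d+)/.exec(version);
  if (!match) throw new Error(`Unrecognized version: ${version}`);
  return match.slice(1).map(Number);
}

// True when the version falls in >=24 <24.6 or >=24.7 <25.
function isSupportedNode(version) {
  const [major, minor] = parseVersion(version);
  return major === 24 && (minor < 6 || minor >= 7);
}

// Report on the Node.js runtime executing this script.
if (!isSupportedNode(process.version)) {
  console.warn(`Node.js ${process.version} is outside the ranges listed above`);
}
```

Running this with the same `node` binary you plan to use for MWoffliner gives a quick answer before attempting `npm install`.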

2. libzim Dependency

MWoffliner depends on @openzim/libzim, which requires the C++ libzim library.

  • On Linux/macOS, MWoffliner can download libzim automatically.
  • On Windows, you must install libzim manually because there are no prebuilt binaries. See the libzim installation guide for details.

3. Compiler Requirements (Windows)

Node 24 on Windows officially supports Visual Studio 2019 (v16) or Visual Studio 2022 (v17).

Ensure C++ build tools are installed and environment variables are set correctly. See Windows Setup for node-gyp for detailed instructions.

4. Node-gyp

MWoffliner uses node-gyp, which enforces strict checks on the Node.js and compiler versions. Make sure you have a supported Node.js version, a working C++ compiler, and Python 3.x installed.

Additional troubleshooting steps if errors persist:

  1. Clear npm cache: a corrupted cache can cause cryptic install failures.

    npm cache clean --force

  2. Delete node_modules and reinstall: stale or partially installed dependencies are a common source of errors.

    rm -rf node_modules package-lock.json
    npm install

  3. Check that all environment variables are set: especially on Windows, PATH, INCLUDE, and LIB must point to the correct Visual Studio and libzim directories. Reopen your terminal after installing new tools.

  4. Verify Redis is running before starting MWoffliner: MWoffliner will fail immediately if it cannot connect to Redis.

    redis-cli ping   # expected output: PONG

  5. Run npm install with verbose logging to see exactly where it fails:

    npm install --verbose
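
If `redis-cli` is not installed, the Redis check from the steps above can be approximated from Node.js. This is only a sketch under stated assumptions: the helper names `isPongReply` and `redisIsUp` are hypothetical, and the host `127.0.0.1`, port `6379`, and the two-second timeout are assumed defaults rather than MWoffliner settings. It relies on Redis accepting an inline `PING` command and answering `+PONG`.

```javascript
// Returns true when a raw Redis reply is the simple string "+PONG".
function isPongReply(reply) {
  return reply.toString('utf8').trim() === '+PONG';
}

// Sends an inline PING to Redis and resolves to true or false.
// Host, port, and timeout are assumed defaults, not MWoffliner settings.
async function redisIsUp(host = '127.0.0.1', port = 6379, timeoutMs = 2000) {
  const net = await import('node:net');
  return new Promise((resolve) => {
    const socket = net.createConnection({ host, port });
    const finish = (ok) => {
      socket.destroy();
      resolve(ok);
    };
    socket.setTimeout(timeoutMs, () => finish(false));
    socket.once('error', () => finish(false));
    socket.once('connect', () => socket.write('PING\r\n'));
    socket.once('data', (chunk) => finish(isPongReply(chunk)));
  });
}
```

Calling `redisIsUp().then(console.log)` reports whether a server answered. A password-protected Redis replies with an error instead of `+PONG`, so this sketch reports `false` in that case.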

5. Common Errors & Troubleshooting

| Error | Cause | Solution |
| --- | --- | --- |
| Node.js version error | Node.js version incompatible | Install Node 24 with nvm |
| Cannot find module @openzim/libzim | libzim not installed | Follow libzim installation guide; Windows users must install manually |
| node-gyp rebuild failed | Wrong Node or compiler version | Check Node.js version, Visual Studio version, Python 3.x |
| zim/archive.h not found | C++ headers missing | Install libzim system-wide, verify include paths |

[!NOTE] Even with these steps, other setup errors may occur. Using Docker is strongly recommended for a smoother experience.

Installation via NPM

    npm i -g mwoffliner

[!WARNING] You might need to run this command with sudo, depending on how npm and your OS are configured. npm permission checks can trip up newcomers; read the npm script documentation if you encounter issues.

Usage

Using Docker (Recommended)

    # Get help
    docker run -v $(pwd)/out:/out -ti ghcr.io/openzim/mwoffliner mwoffliner --help

    # Create a ZIM for https://bm.wikipedia.org
    docker run -v $(pwd)/out:/out -ti ghcr.io/openzim/mwoffliner \
           mwoffliner --mwUrl=https://bm.wikipedia.org --adminEmail=foo@bar.net

Using NPM / Local Install

    # Get help
    mwoffliner --help

    # Create a ZIM for https://bm.wikipedia.org
    mwoffliner --mwUrl=https://bm.wikipedia.org --adminEmail=foo@bar.net

To use MWoffliner with an S3 cache, provide an S3 URL:

--optimisationCacheUrl="https://wasabisys.com/?bucketName=my-bucket&keyId=my-key-id&secretAccessKey=my-sac"
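
Secret keys often contain characters such as `+`, `/`, or `=` that are not safe to paste into a query string by hand. The sketch below builds the cache URL with the standard `URL` API so each value is percent-encoded. The helper name `buildS3CacheUrl` is hypothetical, and it assumes MWoffliner decodes the query string in the standard way; verify this against your provider before relying on it.

```javascript
// Builds an S3 cache URL with the three query parameters shown above.
// Values are percent-encoded by URLSearchParams.
function buildS3CacheUrl({ endpoint, bucketName, keyId, secretAccessKey }) {
  const url = new URL(endpoint);
  url.searchParams.set('bucketName', bucketName);
  url.searchParams.set('keyId', keyId);
  url.searchParams.set('secretAccessKey', secretAccessKey);
  return url.toString();
}

// Example using the placeholder values from this README.
const cacheUrl = buildS3CacheUrl({
  endpoint: 'https://wasabisys.com',
  bucketName: 'my-bucket',
  keyId: 'my-key-id',
  secretAccessKey: 'my-sac',
});
console.log(`--optimisationCacheUrl="${cacheUrl}"`);
```

Keeping the secret in an environment variable and passing it to this helper avoids storing it in shell history or scripts.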

Contribute

If you've retrieved the MWoffliner source code (e.g., via a git clone), you can install and run it locally with your modifications:

npm i
npm run mwoffliner -- --help

Detailed contribution documentation and guidelines are available.

API

MWoffliner provides an API and can be used as a Node.js library. Here's a stub example for your index.mjs file:

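import * as mwoffliner from 'mwoffliner';

// Scraper options for a Spanish Wikipedia ZIM without pictures
const parameters = {
  mwUrl: "https://es.wikipedia.org",
  adminEmail: "foo@bar.net",
  verbose: true,
  format: "nopic",
  articleList: "./articleList"
};

// execute() returns a Promise; await it and report any failure
try {
  await mwoffliner.execute(parameters);
} catch (error) {
  console.error(error);
  process.exitCode = 1;
}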
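
The keys in the parameters object appear to use the same names as the command-line options shown in the Usage section (for example `mwUrl` and `adminEmail`). Assuming that correspondence holds for the options you use, the following sketch converts a parameters object into equivalent CLI arguments, which can help when moving a configuration between the library and Docker. The helper `toCliArgs` is hypothetical and not part of MWoffliner.

```javascript
// Converts a parameters object into "--name=value" CLI arguments.
// Boolean true becomes a bare flag; false, null, and undefined are skipped.
function toCliArgs(parameters) {
  const args = [];
  for (const [name, value] of Object.entries(parameters)) {
    if (value === false || value === undefined || value === null) continue;
    args.push(value === true ? `--${name}` : `--${name}=${value}`);
  }
  return args;
}

const cliArgs = toCliArgs({
  mwUrl: 'https://es.wikipedia.org',
  adminEmail: 'foo@bar.net',
  verbose: true,
  format: 'nopic',
});
console.log(['mwoffliner', ...cliArgs].join(' '));
```

Values containing spaces or shell metacharacters still need quoting when the resulting line is pasted into a shell.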

Background

Complementary information about MWoffliner:

  • MediaWiki software is used by thousands of wikis, the most famous being the Wikimedia projects, including Wikipedia.
  • MediaWiki is a wiki engine written in PHP.
  • Wikitext is the markup language that MediaWiki uses.
  • The MediaWiki parser converts Wikitext to HTML, which your browser then displays.
  • Read the scraper functional architecture for more details.

License

GPLv3 or later, see LICENSE for more details.

Acknowledgements

This project received funding through NGI Zero Core, a fund established by NLnet with financial support from the European Commission's Next Generation Internet program. Learn more at the NLnet project page.

About

MediaWiki scraper: all your wiki articles in one highly compressed ZIM file
