MQuery is an HTTP API server for mining language corpora using the Manatee-open engine. Unlike other Manatee-based solutions, MQuery uses its own fine-tuned C bindings instead of relying on SWIG, and it is built around a worker queue architecture for efficient query processing and scalability.
The simplest way to run MQuery is using Docker Compose, which automatically sets up the server, worker, and Redis:
- Clone the repository:
      git clone https://github.com/czcorpus/mquery.git
      cd mquery
- Create a Docker configuration file conf-docker.json based on conf.sample.json:
      cp conf.sample.json conf-docker.json
- Edit conf-docker.json to match your setup:
  - Set listenAddress to 0.0.0.0 (to accept connections from outside the container)
  - Set listenPort to 8989
  - Set Redis host to redis (the service name in docker-compose.yml)
  - Configure your corpora paths:
    - registryDir: /var/lib/manatee/registry
    - splitCorporaDir: /var/lib/manatee/split
- Place your corpus data and registry files in directories that will be mounted:
  - The docker-compose setup creates volumes for corpus data at /var/lib/manatee
  - You can modify the volume mounts in docker-compose.yml to point to your existing corpus directories
- Start the services:
      docker-compose up -d
- Access the API at http://localhost:8989
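After `docker-compose up -d` returns, the server container may still need a moment before it accepts connections. The following is a minimal sketch of how one might wait for the port from a script. It assumes only the host and port named above (localhost, 8989); the helper name `wait_for_port` is hypothetical, and the check is a plain TCP connect, so it says nothing about whether the worker or Redis are healthy.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Not listening yet (or unreachable); retry after a short pause.
            time.sleep(0.5)
    return False


# Example usage (assumes the Docker Compose setup described above):
#   if not wait_for_port("localhost", 8989):
#       raise SystemExit("MQuery port 8989 did not open in time")
```

A check like this can be useful in deployment scripts that need to call the API right after starting the containers.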
The Docker Compose setup includes:
- mquery-server: HTTP API server (port 8989)
- mquery-worker: Background worker for processing corpus queries
- redis: Redis database for job queuing and results caching
Useful commands:
- View logs: docker-compose logs -f
- Stop services: docker-compose down
- Rebuild after code changes: docker-compose up -d --build
If you prefer to install MQuery manually without Docker, you will need:
- a working Linux server with installed Manatee-open library
- Redis database
- Go language compiler and tools
- (optional) an HTTP proxy server (Nginx, Apache, ...)
Installation steps:
- Install the Go language environment, either via a package manager or manually from the Go download page
  - make sure /usr/local/go/bin and ~/go/bin are in your $PATH so you can run any installed Go tools without specifying a full path
- Install Manatee-open from the download page. No specific language bindings are required.
      ./configure --with-pcre --disable-python && make && sudo make install && sudo ldconfig
- Get MQuery sources (git clone --depth 1 https://github.com/czcorpus/mquery.git)
- Run ./configure
- Run make
- Run make install
  - the application will be installed in /opt/mquery
  - for data and registry, the /var/opt/corpora/data and /var/opt/corpora/registry directories will be created
  - systemd units mquery-server.service and mquery-worker-all.target will be created
- Copy at least one corpus and its configuration (registry) into the respective directories (/var/opt/corpora/data, /var/opt/corpora/registry)
- Update the corpora entries in the /opt/mquery/conf.json file to match your installed corpora
- Start the services:
      systemctl start mquery-server
      systemctl start mquery-worker-all.target
MQuery supports optional token-based authentication via a configurable HTTP header. When enabled, every request must include the header with a valid token.
Relevant configuration fields in conf.json:
    {
      "authHeaderName": "X-API-Key",
      "authTokens": [
        "sha256:a3f1c8d2...",
        "sha256:9e107d9d..."
      ],
      "localNetworks": [
        "127.0.0.0/8",
        "192.168.1.0/24"
      ],
      "knownProxies": [
        "192.168.1.10"
      ]
    }

Tokens in authTokens can be stored either as plaintext (not recommended) or as SHA-256 hashes prefixed with sha256: (recommended).
- Choose a secret token (use a long random string):
      openssl rand -hex 32
      # example output: 4a7b9c2e1f3d8a6b...
- Hash it for storage in conf.json:
      echo -n "your-secret-token" | sha256sum | awk '{print "sha256:" $1}'
      # output: sha256:a3f1c8d2...
- Paste the sha256:... value into authTokens in your config.
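Where `openssl` or `sha256sum` are not available, the same two steps can be sketched in Python. This is an illustrative equivalent of the shell commands above, not a tool shipped with MQuery; the function names `generate_token` and `hash_token` are hypothetical.

```python
import hashlib
import secrets


def generate_token(num_bytes: int = 32) -> str:
    """Return a random hex string, comparable to `openssl rand -hex 32`."""
    return secrets.token_hex(num_bytes)


def hash_token(token: str) -> str:
    """Return the SHA-256 hex digest of the token with the sha256: prefix."""
    # Hash the raw token bytes only (no trailing newline), like `echo -n`.
    digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
    return "sha256:" + digest


if __name__ == "__main__":
    token = generate_token()
    print("token for the client:  ", token)
    print("value for authTokens:  ", hash_token(token))
```

Note that the hash covers the token exactly as the client will send it; a stray newline or space in either place would produce a different digest.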
Clients send the original (unhashed) token in the configured header:
    X-API-Key: your-secret-token
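To make the relationship between the client's raw token and the stored entries concrete, here is a minimal sketch of how such a check could work. It is an assumption about the general technique, not MQuery's actual implementation; `token_matches` and `is_authorized` are hypothetical names.

```python
import hashlib
import hmac
from typing import Iterable


def token_matches(presented: str, stored: str) -> bool:
    """Check a client-supplied token against a single authTokens entry."""
    if stored.startswith("sha256:"):
        # Hashed entry: compare the digest of the presented token.
        expected = stored[len("sha256:"):]
        actual = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    else:
        # Plaintext entry (not recommended): compare the token directly.
        expected, actual = stored, presented
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(actual.encode("utf-8"), expected.encode("utf-8"))


def is_authorized(presented: str, auth_tokens: Iterable[str]) -> bool:
    """Return True if the presented token matches any configured entry."""
    return any(token_matches(presented, stored) for stored in auth_tokens)
```

With hashed entries, a leaked conf.json does not directly reveal the tokens clients use, which is the reason the hashed form is recommended.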
Requests from IPs within any localNetworks CIDR range are exempt from auth token checks, provided the source IP is not also listed in knownProxies. If localNetworks is not set, only the exact listenAddress is treated as local.
If a reverse proxy shares an IP with a local network (e.g. runs on the same host), add its IP to knownProxies to ensure its forwarded requests still require auth.
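The exemption rule can be illustrated with Python's standard `ipaddress` module. This is a sketch of the logic as described above, not code taken from MQuery; the function name and parameters are hypothetical, and applying the knownProxies check to the listenAddress fallback as well is an assumption.

```python
import ipaddress
from typing import Sequence


def is_exempt_from_auth(
    client_ip: str,
    local_networks: Sequence[str],
    known_proxies: Sequence[str],
    listen_address: str = "127.0.0.1",
) -> bool:
    """Return True if a request from client_ip may skip the token check."""
    ip = ipaddress.ip_address(client_ip)
    # A known proxy is never exempt, even if it sits inside a local network.
    # (Assumption: this also applies to the listenAddress fallback below.)
    if any(ip == ipaddress.ip_address(proxy) for proxy in known_proxies):
        return False
    if not local_networks:
        # Without localNetworks, only the exact listenAddress counts as local.
        return ip == ipaddress.ip_address(listen_address)
    return any(ip in ipaddress.ip_network(cidr) for cidr in local_networks)
```

With the sample configuration shown earlier, 192.168.1.20 would be exempt, while 192.168.1.10 (the proxy) and any address outside both CIDR ranges would still need a valid token.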
For the most recent API Docs, please see https://korpus.cz/mquery-test/docs/