4 changes: 3 additions & 1 deletion .dockerignore
@@ -2,6 +2,8 @@
*

# Allow only what the runtime needs
!docker/
!docker/**
!explainshell/
!explainshell/**
!requirements.txt
!start.sh
6 changes: 4 additions & 2 deletions AGENTS.md
@@ -153,7 +153,7 @@ python -m explainshell.manager extract --mode source /path/to/manpage.1.gz
- `manpage.py` - Man page reading and HTML conversion
- `help_constants.py` - Shell constant definitions for help text
- `util.py` - Shared utilities (group_continuous, Peekable, name_section)
- `config.py` - Configuration (DB_PATH, HOST_IP, DEBUG, MANPAGE_URLS)
- `config.py` - Configuration defaults (DB_PATH, HOST_IP, DEBUG, MANPAGE_URLS)
- `extraction/` - Man page option extraction pipeline
- `__init__.py` - Public API: `make_extractor(mode)` factory
- `types.py` - Shared types (ExtractionResult, ExtractionStats, BatchResult, ExtractorConfig, Extractor protocol)
@@ -174,6 +174,8 @@ python -m explainshell.manager extract --mode source /path/to/manpage.1.gz
- `llm_bench.py` - LLM extractor benchmark tool (run/compare metrics reports)
- `fetch_manned.py` - Fetch man pages from manned.org weekly dump
- `mandoc-md` - Custom mandoc binary with markdown output support
- `docker/` - Container runtime assets
- `docker-entrypoint.sh` - Gunicorn entrypoint used by the Docker image
- `tests/` - Unit tests (`test_*.py`), fixtures
- `tests/e2e/` - Playwright e2e tests, snapshots, and dedicated `e2e.db`
- `tests/regression/` - Parsing regression tests and manpage .gz fixtures
@@ -227,7 +229,7 @@ Hermetic setup: uses a dedicated `tests/e2e/e2e.db` and random port selection. S

### Deployment

The app is deployed to [Fly.io](https://fly.io) with two machines in the `iad` (Virginia) region. The SQLite database is baked into the Docker image at build time (downloaded as `.zst` from the GitHub release, decompressed during `docker build`).
The app is deployed to [Fly.io](https://fly.io) with two machines in the `iad` (Virginia) region. The SQLite database is baked into the Docker image at build time (downloaded as `.zst` from the GitHub release, decompressed during `docker build`). The Dockerfile uses a multi-stage build so that the Python dependencies and the live DB are cached independently; to refresh the baked DB, change the `DB_CACHE_BUST` build arg. In the container, Gunicorn loads `explainshell.web:create_app()` directly; runtime environment variables such as `DB_PATH`, `DEBUG`, and `LOG_LEVEL` are resolved inside the app factory instead of being rendered into Python arguments by the shell in `docker/docker-entrypoint.sh`. App logs default to `INFO`. Gunicorn access logs are disabled by default; set `GUNICORN_ACCESS_LOG` to enable them, and use `GUNICORN_ACCESS_LOG_FILE` and `GUNICORN_ACCESS_LOG_FORMAT` to override the destination and format.
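
A minimal sketch of the lookup order the app factory appears to use for `DB_PATH`, based on the `create_app()` change in this diff: explicit argument first, then the environment variable, then the `config` default. The names `resolve_db_path` and `environ` are hypothetical and exist only for illustration; they are not part of the codebase.

```python
import os
from typing import Mapping, Optional


def resolve_db_path(
    explicit: Optional[str],
    default: Optional[str],
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Hypothetical helper: argument > DB_PATH env var > config default."""
    # Empty strings are falsy, so an empty argument or env var falls
    # through to the next source, as `a or b or c` does in create_app().
    return explicit or environ.get("DB_PATH") or default


if __name__ == "__main__":
    # An explicit argument wins over the environment.
    print(resolve_db_path("/a.db", "/d.db", {"DB_PATH": "/e.db"}))
    # Without an argument, the environment wins over the default.
    print(resolve_db_path(None, "/d.db", {"DB_PATH": "/e.db"}))
    # With neither, the default is used.
    print(resolve_db_path(None, "/d.db", {}))
```

Because the environment is read when the function is called rather than at import time, a value set just before the factory runs still takes effect, which is what `test_create_app_reads_db_path_from_env_at_call_time` checks.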

**Production infrastructure:**

42 changes: 31 additions & 11 deletions Dockerfile
@@ -1,20 +1,40 @@
FROM python:3.12-slim
FROM python:3.12-slim AS python-deps

ENV VENV_PATH=/opt/venv
ENV PATH="${VENV_PATH}/bin:${PATH}"

WORKDIR /opt/build
COPY requirements.txt .
RUN python -m venv "${VENV_PATH}" \
&& pip install --no-cache-dir --upgrade pip \
&& pip install --no-cache-dir -r requirements.txt

FROM alpine:3.18 AS db

ARG DB_URL=https://github.com/idank/explainshell/releases/download/db-latest/explainshell.db.zst
ARG DB_CACHE_BUST=0

RUN apt-get update \
&& apt-get install -y --no-install-recommends wget zstd \
&& rm -rf /var/lib/apt/lists/*
RUN apk add --no-cache curl zstd

WORKDIR /opt/webapp
COPY requirements.txt .
RUN pip3 install --no-cache-dir --no-warn-script-location -r requirements.txt
WORKDIR /opt/db
RUN printf '%s\n' "${DB_CACHE_BUST}" > .cache-bust \
&& curl -fsSL -o explainshell.db.zst "$DB_URL" \
&& zstd -d --rm explainshell.db.zst

RUN wget -q -O explainshell.db.zst "$DB_URL" && zstd -d --rm explainshell.db.zst
FROM python:3.12-slim AS runtime

COPY start.sh .
ENV VENV_PATH=/opt/venv
ENV PATH="${VENV_PATH}/bin:${PATH}"
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
ENV DB_PATH=/opt/webapp/explainshell.db

WORKDIR /opt/webapp
COPY --from=python-deps "${VENV_PATH}" "${VENV_PATH}"
COPY --from=db /opt/db/explainshell.db ./explainshell.db
COPY explainshell/ explainshell/
COPY --chmod=755 docker/docker-entrypoint.sh /usr/local/bin/docker-entrypoint.sh

EXPOSE 8080
EXPOSE 5000

CMD ["./start.sh"]
CMD ["docker-entrypoint.sh"]
12 changes: 12 additions & 0 deletions README.md
@@ -46,6 +46,18 @@ $ make serve
# open http://localhost:5000
```

Runtime env vars for the web app:

- `HOST_IP` - bind address for the local dev server, default `127.0.0.1`
- `DB_PATH` - SQLite database path
- `DEBUG` - enables Flask debug behavior and debug-only web routes/templates
- `LOG_LEVEL` - log level for `explainshell.*` application logs
- `GUNICORN_WORKERS` - Gunicorn worker count for the container entrypoint
- `GUNICORN_THREADS` - Gunicorn thread count per worker
- `GUNICORN_ACCESS_LOG` - enables Gunicorn access logs when set to `1` or `true` (disabled by default)
- `GUNICORN_ACCESS_LOG_FILE` - Gunicorn access log destination when enabled, default `-`
- `GUNICORN_ACCESS_LOG_FORMAT` - Gunicorn access log format string when enabled
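
The two boolean flags in this list are not parsed the same way, judging by the code in this change: `DEBUG` goes through `_parse_debug`, which treats anything other than `0`, `false`, or `no` as enabled, while `GUNICORN_ACCESS_LOG` is compared in the shell entrypoint against exactly `1` or `true`. The sketch below mirrors both rules in Python for comparison; `parse_debug` and `access_log_enabled` are illustrative names, and the second function is a Python rendering of the shell test, not code from the repository.

```python
from typing import Optional


def parse_debug(value: Optional[str], default: bool) -> bool:
    # Same logic as `_parse_debug` in explainshell/web/__init__.py (per this diff).
    if value is None:
        return default
    return value.lower() not in ("0", "false", "no")


def access_log_enabled(value: Optional[str]) -> bool:
    # Python rendering of the entrypoint's shell test (an illustration only):
    # unset or empty falls back to "0", and the match is exact and case-sensitive.
    return (value or "0") in ("1", "true")


if __name__ == "__main__":
    for raw in (None, "", "0", "False", "no", "yes", "1"):
        print(repr(raw), parse_debug(raw, default=False), access_log_enabled(raw))
```

One consequence worth noting: under the `DEBUG` rule an empty string counts as enabled, whereas an empty `GUNICORN_ACCESS_LOG` leaves access logs off.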

## Storage

Processed manpages live in a single SQLite database (`explainshell.db`) with three tables:
23 changes: 23 additions & 0 deletions docker/docker-entrypoint.sh
@@ -0,0 +1,23 @@
#!/bin/sh
set -e

HOST_IP="${HOST_IP:-0.0.0.0}"
PORT="${PORT:-5000}"
GUNICORN_WORKERS="${GUNICORN_WORKERS:-2}"
GUNICORN_THREADS="${GUNICORN_THREADS:-4}"
GUNICORN_ACCESS_LOG="${GUNICORN_ACCESS_LOG:-0}"
GUNICORN_ACCESS_LOG_FILE="${GUNICORN_ACCESS_LOG_FILE:--}"
GUNICORN_ACCESS_LOG_FORMAT="${GUNICORN_ACCESS_LOG_FORMAT:-%(t)s \"%(r)s\" %(s)s %(b)s %(D)sμs}"

export DB_PATH="${DB_PATH:-/opt/webapp/explainshell.db}"
export LOG_LEVEL="${LOG_LEVEL:-INFO}"

set -- gunicorn -w "$GUNICORN_WORKERS" --threads "$GUNICORN_THREADS" -b "$HOST_IP:$PORT"

if [ "$GUNICORN_ACCESS_LOG" = "1" ] || [ "$GUNICORN_ACCESS_LOG" = "true" ]; then
    set -- "$@" \
        --access-logfile "$GUNICORN_ACCESS_LOG_FILE" \
        --access-logformat "$GUNICORN_ACCESS_LOG_FORMAT"
fi

exec "$@" "explainshell.web:create_app()"
41 changes: 36 additions & 5 deletions explainshell/web/__init__.py
@@ -1,14 +1,35 @@
import logging
import os
import time

from flask import Flask, current_app, g
from explainshell import config, store
from explainshell.logger.logging_interceptor import InterceptHandler

# Cache distros() result; refreshed at most every 5 minutes.
_distros_cache = None
_distros_cache_time = 0
_DISTROS_TTL = 300


def _parse_debug(value: str | None, default: bool) -> bool:
    if value is None:
        return default
    return value.lower() not in ("0", "false", "no")


def _configure_web_logging(log_level: str) -> None:
    app_logger = logging.getLogger("explainshell")
    if not any(
        isinstance(handler, InterceptHandler) for handler in app_logger.handlers
    ):
        app_logger.addHandler(InterceptHandler())

    level = getattr(logging, log_level.upper(), logging.INFO)
    app_logger.setLevel(level)
    app_logger.propagate = False


def get_store() -> store.Store:
"""Return a per-request read-only Store, creating one if needed."""
if "store" not in g:
@@ -25,19 +46,29 @@ def get_cached_distros():
return _distros_cache


def create_app(db_path=None):
def create_app(
db_path: str | None = None,
debug: bool | None = None,
log_level: str | None = None,
):
"""Application factory."""
_configure_web_logging(log_level or os.getenv("LOG_LEVEL", "INFO"))

app = Flask(__name__)
app.config.from_object(config)

db = db_path or config.DB_PATH
if db:
app.config["DB_PATH"] = db
app.debug = (
debug if debug is not None else _parse_debug(os.getenv("DEBUG"), config.DEBUG)
)

db_path = db_path or os.getenv("DB_PATH") or config.DB_PATH
if db_path:
app.config["DB_PATH"] = db_path

from explainshell.web.views import bp, debug_bp

app.register_blueprint(bp)
if config.DEBUG:
if app.debug:
app.register_blueprint(debug_bp)

@app.teardown_appcontext
19 changes: 16 additions & 3 deletions explainshell/web/views.py
@@ -6,7 +6,14 @@
import markupsafe

import cmarkgfm
from flask import Blueprint, render_template, request, redirect
from flask import (
Blueprint,
current_app,
has_app_context,
render_template,
request,
redirect,
)

import bashlex.errors

@@ -18,6 +25,12 @@
bp = Blueprint("main", __name__)


def _debug_enabled() -> bool:
    if has_app_context():
        return bool(current_app.config.get("DEBUG", False))
    return config.DEBUG


def _is_known_distro(name):
"""Return True if *name* matches a distro in the cached distros list."""
for distro, _release in get_cached_distros():
@@ -396,7 +409,7 @@ def explain_program(program, store, distro=None, release=None):
}

debug_info = {}
if config.DEBUG:
if _debug_enabled():
for i, o in enumerate(raw_mp.options):
debug_info[f"option-{i}"] = {
"kind": "option",
@@ -536,7 +549,7 @@ def explain_cmd(
helptext = sorted(text_ids.items(), key=lambda kv: id_start_pos[kv[1]])

debug_info = {}
if config.DEBUG:
if _debug_enabled():
for group in groups:
for m in group.results:
if m.debug_info and m.text in text_ids:
4 changes: 2 additions & 2 deletions fly.toml
@@ -15,10 +15,10 @@ primary_region = 'iad'
DB_PATH = '/opt/webapp/explainshell.db'
DEBUG = 'false'
HOST_IP = '0.0.0.0'
PORT = '8080'
PORT = '5000'

[http_service]
internal_port = 8080
internal_port = 5000
force_https = true
auto_stop_machines = 'stop'
auto_start_machines = true
18 changes: 7 additions & 11 deletions runserver.py
@@ -1,19 +1,15 @@
import logging
import os

from explainshell import config
from explainshell.web import create_app
from explainshell.logger.logging_interceptor import InterceptHandler


if __name__ == "__main__":
# activate logging and redirect all logs to loguru
logging.basicConfig(handlers=[InterceptHandler()], level=logging.DEBUG, force=True)

def main() -> None:
app = create_app()
port = int(os.environ.get("PORT", 5000))
host = os.environ.get("HOST_IP", "127.0.0.1")

app.run(debug=app.config["DEBUG"], host=host, port=port)

if config.HOST_IP:
app.run(debug=config.DEBUG, host=config.HOST_IP, port=port)
else:
app.run(debug=config.DEBUG, port=port)

if __name__ == "__main__":
main()
7 changes: 0 additions & 7 deletions start.sh

This file was deleted.

22 changes: 22 additions & 0 deletions tests/test_runserver.py
@@ -0,0 +1,22 @@
import runserver


class FakeApp:
    def __init__(self) -> None:
        self.config = {"DEBUG": False}
        self.run_kwargs: dict[str, object] | None = None

    def run(self, **kwargs: object) -> None:
        self.run_kwargs = kwargs


def test_main_binds_to_loopback_by_default(monkeypatch) -> None:
    app = FakeApp()

    monkeypatch.delenv("HOST_IP", raising=False)
    monkeypatch.delenv("PORT", raising=False)
    monkeypatch.setattr(runserver, "create_app", lambda: app)

    runserver.main()

    assert app.run_kwargs == {"debug": False, "host": "127.0.0.1", "port": 5000}
47 changes: 47 additions & 0 deletions tests/test_web_app.py
@@ -0,0 +1,47 @@
import logging
import os
import unittest
from unittest.mock import patch

from explainshell.logger.logging_interceptor import InterceptHandler
from explainshell.web import create_app


class TestCreateAppConfig(unittest.TestCase):
    def setUp(self) -> None:
        self.logger = logging.getLogger("explainshell")
        self.original_handlers = list(self.logger.handlers)
        self.original_level = self.logger.level
        self.original_propagate = self.logger.propagate

    def tearDown(self) -> None:
        self.logger.handlers = self.original_handlers
        self.logger.setLevel(self.original_level)
        self.logger.propagate = self.original_propagate

    def test_create_app_reads_db_path_from_env_at_call_time(self) -> None:
        with patch.dict(os.environ, {"DB_PATH": "/tmp/from-env.db"}, clear=False):
            app = create_app()

        self.assertEqual(app.config["DB_PATH"], "/tmp/from-env.db")

    def test_create_app_reads_debug_from_env_at_call_time(self) -> None:
        with patch.dict(os.environ, {"DEBUG": "false"}, clear=False):
            app = create_app()

        self.assertNotIn("manpage.show_manpage", app.view_functions)
        self.assertFalse(app.config["DEBUG"])

    def test_create_app_configures_explainshell_logging_once(self) -> None:
        with patch.dict(os.environ, {"LOG_LEVEL": "ERROR"}, clear=False):
            create_app()
            create_app(log_level="WARNING")

        handlers = [
            handler
            for handler in self.logger.handlers
            if isinstance(handler, InterceptHandler)
        ]
        self.assertEqual(len(handlers), 1)
        self.assertEqual(self.logger.level, logging.WARNING)
        self.assertFalse(self.logger.propagate)