feat: native WireGuard VPN egress mode (USING_WIREGUARD) by alex4108 · Pull Request #1097 · calibrain/shelfmark

alex4108 · 2026-07-01T08:56:22Z

Add an opt-in WireGuard egress path alongside the existing Tor mode

- wireguard.sh: brings up a wg-quick tunnel from a mounted config, installs a fail-closed iptables kill-switch (default-drop OUTPUT; allow loopback, the tunnel device, WireGuard endpoint handshake, and configurable LAN ranges), optionally enforces the tunnel's DNS, and supervises the tunnel with a handshake-age healthcheck that bounces it when stale.
- entrypoint.sh: run wireguard.sh when USING_WIREGUARD=true (root required, mutually exclusive with USING_TOR).
- Dockerfile: install wireguard-tools + iproute2; chmod wireguard.sh.
- compose/docker-compose.wireguard.yml + docker-compose.dev.wireguard.yml: ready-to-use variants mirroring the Tor compose files.
- readme: document the new mode and env vars.

Modeled on the existing Tor transparent-proxy pattern for consistency.

Validated against a real Proton WireGuard endpoint; fixes found in testing:

- Strip IPv6 (Address/AllowedIPs/DNS) by default (WIREGUARD_DISABLE_IPV6): wg-quick aborts on the ip6tables 'raw' table many container kernels lack, and IPv6 is an extra leak surface. Configurable off.
- sysctl shim: wg-quick unconditionally writes net.ipv4.conf.all.src_valid_mark=1, which fails on read-only /proc/sys in a container even though the value is already set via the compose sysctls key. No-op that single redundant write; everything else hits the real sysctl.
- Dockerfile: add procps (wg-quick needs sysctl) + ca-certificates.
- WIREGUARD_DNS override: decouple the resolver from the tunnel's pushed DNS so deployments can point at an unfiltered/encrypted LAN resolver while downloads still egress the tunnel (kept off-tunnel via LAN_NETWORK).

Verified: egress flips to the VPN exit IP; DNS resolves live domains; the kill-switch fails closed (external egress blocked when the tunnel drops) while LAN stays reachable.
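
The IPv6 stripping described above could look roughly like the filter below. This is a minimal sketch, not the code in wireguard.sh: the function name `strip_ipv6_from_config` is hypothetical, it assumes the keys use their canonical capitalization, and it treats any value containing a colon as IPv6.

```shell
# Hypothetical helper: read a WireGuard config on stdin and print it with
# IPv6 entries removed from the Address, AllowedIPs and DNS lines.
# Assumption: any comma-separated value that contains ":" is IPv6.
strip_ipv6_from_config() {
    awk '
    /^[ \t]*(Address|AllowedIPs|DNS)[ \t]*=/ {
        eq = index($0, "=")
        key = substr($0, 1, eq - 1)
        sub(/[ \t]+$/, "", key)
        n = split(substr($0, eq + 1), parts, ",")
        out = ""
        for (i = 1; i <= n; i++) {
            v = parts[i]
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (v == "" || index(v, ":") > 0) continue
            out = (out == "" ? v : out ", " v)
        }
        # A line left with no IPv4 values is dropped entirely.
        if (out != "") print key " = " out
        next
    }
    { print }
    '
}
```

All other lines, including `Endpoint` and `PrivateKey`, pass through unchanged, so the filter can be applied to the staged copy of the config rather than the mounted original.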

Copilot

Pull request overview

Adds an opt-in native WireGuard-based egress mode to the container (parallel to the existing Tor mode), including routing enforcement and example compose configurations.

Changes:

Introduces wireguard.sh to bring up a WireGuard tunnel, apply an egress kill-switch, and supervise tunnel health.
Updates startup flow (entrypoint.sh) and image dependencies (Dockerfile) to support WireGuard mode.
Documents and provides Compose examples for WireGuard routing.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.


| File | Description |
| --- | --- |
| `wireguard.sh` | New WireGuard egress setup script with iptables kill-switch + supervisor-based healthcheck. |
| `entrypoint.sh` | Adds `USING_WIREGUARD` startup path and enforces mutual exclusion with Tor. |
| `Dockerfile` | Installs `wireguard-tools` and `iproute2`; makes `wireguard.sh` executable. |
| `readme.md` | Documents WireGuard env vars and adds a WireGuard compose quickstart section. |
| `docker-compose.dev.wireguard.yml` | Adds a dev-oriented compose variant extending the WireGuard compose service. |
| `compose/docker-compose.wireguard.yml` | Adds a reference compose file for running Shelfmark with WireGuard routing. |


alex4108 · 2026-07-01T09:08:17Z

+
+# Endpoint host:port pairs from the config (to permit the encrypted handshake
+# out over the physical NIC).
+ENDPOINTS="$(grep -iE '^\s*Endpoint\s*=' "$RUNTIME_CONFIG" | cut -d'=' -f2- | xargs || true)"


Fixed in b4b7c11 — endpoint allow rules now use the resolved endpoints from the live interface (wg show <iface> endpoints), so they always target concrete IP:port values and can never drop the WireGuard encapsulation on a hostname config.

alex4108 · 2026-07-01T09:08:18Z

+iptables -F OUTPUT
+# Allow loopback
+iptables -A OUTPUT -o lo -j ACCEPT
+# Allow established/related return traffic
+iptables -A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
+# Allow all traffic over the tunnel itself
+iptables -A OUTPUT -o "$WIREGUARD_INTERFACE" -j ACCEPT


Fixed in b4b7c11 — added an ip6tables OUTPUT default-drop kill-switch (loopback/established/tunnel allowed), mirroring IPv4. When ip6tables is unusable in the container kernel, IPv6 is disabled at the stack so v6 egress still fails closed.

alex4108 · 2026-07-01T09:08:19Z

+        if [ -n "$ep_host" ] && [ -n "$ep_port" ]; then
+            iptables -A OUTPUT -p udp -d "$ep_host" --dport "$ep_port" -j ACCEPT 2>/dev/null \
+                || echo "[!] Could not add endpoint allow rule for $ep (may be IPv6/hostname); tunnel route still applies"
+        fi


Fixed in b4b7c11 — endpoint allow rules are now routed to iptables or ip6tables by the resolved address family, so IPv6 endpoints are permitted via ip6tables rather than failing under iptables.
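
The address-family routing described in this reply can be sketched as three small helpers. The function names are hypothetical; the only assumption about input is that endpoints arrive in the form printed by `wg show <iface> endpoints`, with IPv6 hosts wrapped in brackets.

```shell
# Hypothetical helpers, not the functions in wireguard.sh.

# Host part: split from the right so IPv6 colons stay in the host,
# then strip the [] that wg prints around IPv6 addresses.
endpoint_host() {
    local host="${1%:*}"
    host="${host#\[}"
    host="${host%\]}"
    printf '%s\n' "$host"
}

# Port part: everything after the last colon.
endpoint_port() {
    printf '%s\n' "${1##*:}"
}

# Firewall binary by address family: a colon left in the host means IPv6.
endpoint_firewall_cmd() {
    case "$(endpoint_host "$1")" in
        *:*) echo ip6tables ;;
        *)   echo iptables ;;
    esac
}
```

For example, `endpoint_firewall_cmd "[2a02::10]:51820"` selects `ip6tables`, while an IPv4 endpoint such as `95.173.221.187:51820` selects `iptables`.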

alex4108 · 2026-07-01T09:08:20Z

+# wg-quick wants the interface name to match the config basename.
+CONFIG_BASENAME="$(basename "$WIREGUARD_CONFIG" .conf)"
+if [ "$CONFIG_BASENAME" != "$WIREGUARD_INTERFACE" ]; then
+    RUNTIME_CONFIG="/etc/wireguard/${WIREGUARD_INTERFACE}.conf"
+    mkdir -p /etc/wireguard
+    cp "$WIREGUARD_CONFIG" "$RUNTIME_CONFIG"
+else
+    RUNTIME_CONFIG="/etc/wireguard/${WIREGUARD_INTERFACE}.conf"
+    mkdir -p /etc/wireguard
+    cp "$WIREGUARD_CONFIG" "$RUNTIME_CONFIG"
+fi


Fixed in b4b7c11: removed the redundant conditional; the config is always staged as /etc/wireguard/<interface>.conf.
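
With both branches collapsed, the staging step reduces to a single copy. A minimal sketch, assuming a hypothetical `stage_wireguard_config` helper; the optional third argument and the `chmod 600` are additions for illustration and are not taken from the PR.

```shell
# Hypothetical helper: always stage the mounted config under the interface
# name, since wg-quick derives the interface from the config basename.
# $1 = source config, $2 = interface name, $3 = target dir (assumed default)
stage_wireguard_config() {
    local src="$1" iface="$2" dir="${3:-/etc/wireguard}"
    mkdir -p "$dir" || return 1
    cp "$src" "$dir/$iface.conf" || return 1
    # Assumption: keep the private key unreadable to other users.
    chmod 600 "$dir/$iface.conf" || return 1
    printf '%s\n' "$dir/$iface.conf"
}
```

The function prints the staged path, so a caller could capture it as `RUNTIME_CONFIG="$(stage_wireguard_config "$WIREGUARD_CONFIG" "$WIREGUARD_INTERFACE")"`.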

- Kill-switch endpoints now come from the LIVE interface (`wg show <iface> endpoints`) instead of the config file, so the allow rules always target concrete resolved IP:port values. Config Endpoint= entries can be hostnames (or IPv6); adding an iptables -d rule for a hostname failed and dropped the WireGuard encapsulation, breaking the tunnel right after the kill-switch went in.
- Add an IPv6 kill-switch: ip6tables OUTPUT default-drop with loopback / established / tunnel allowances, mirroring IPv4, so non-tunnel IPv6 egress fails closed too. When ip6tables is unusable in the container kernel, fall back to disabling IPv6 in the kernel so v6 egress still cannot leak.
- Route each endpoint allow rule to iptables or ip6tables by address family.
- Remove the redundant CONFIG_BASENAME conditional (both branches were identical); always stage the config as /etc/wireguard/<interface>.conf.

Re-validated live against the Proton endpoint: egress flips to the VPN exit, live domains resolve + fetch through the tunnel, IPv6 egress is dead, and the kill-switch fails closed when the tunnel drops.

alex4108 · 2026-07-01T09:11:16Z

Thanks for the review, @copilot. All four comments are addressed in b4b7c11:

Endpoint allow rules (lines 109/130): now sourced from the live interface via wg show <iface> endpoints, so they always target concrete resolved IP:port values and are routed to iptables or ip6tables by address family. No more hostname/IPv6 rule-add failures that could drop the WireGuard encapsulation.
IPv6 kill-switch (line 215): added an ip6tables OUTPUT default-drop policy mirroring IPv4 (loopback/established/tunnel allowed). When ip6tables is unusable in the container kernel, IPv6 is disabled at the stack so v6 egress still fails closed.
Redundant conditional (line 76): removed; the config is always staged as /etc/wireguard/<interface>.conf.

Re-validated live against a real WireGuard endpoint: egress flips to the VPN exit, live domains resolve + fetch through the tunnel, IPv6 egress is dead, and the kill-switch fails closed when the tunnel drops.

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

alex4108 · 2026-07-01T09:48:20Z

 if [ "$USING_TOR" = "true" ]; then
    if [ "$RUN_AS_NON_ROOT" = "true" ]; then
        echo "USING_TOR=true requires the container to start as root." >&2
        echo "Non-root mode skips the privileged filesystem and network setup Tor depends on." >&2
        exit 1
    fi
    ./tor.sh
 fi

+if [ "$USING_WIREGUARD" = "true" ]; then
+    if [ "$RUN_AS_NON_ROOT" = "true" ]; then
+        echo "USING_WIREGUARD=true requires the container to start as root." >&2
+        echo "Non-root mode skips the privileged network setup WireGuard depends on." >&2
+        exit 1
+    fi
+    if [ "$USING_TOR" = "true" ]; then
+        echo "USING_WIREGUARD and USING_TOR are mutually exclusive; pick one egress mode." >&2
+        exit 1
+    fi
+    ./wireguard.sh
+fi


Good catch — moved the USING_TOR/USING_WIREGUARD mutual-exclusion check into a dedicated block before either egress branch, so tor.sh can no longer start and then get aborted by the exclusivity check. Fixed in 5b58f2e.

Fixed in b8bfec1 — the USING_TOR/USING_WIREGUARD mutual-exclusion check now runs BEFORE either mode starts, so tor.sh can't run and then immediately exit 1.

alex4108 · 2026-07-01T09:48:22Z

+if ! ip6tables -L OUTPUT >/dev/null 2>&1; then
+    IP6TABLES_OK="false"
+    echo "[!] ip6tables unavailable in this kernel; disabling IPv6 in the kernel instead so v6 egress cannot leak."
+    # Belt-and-braces: if we cannot program an IPv6 kill-switch, drop IPv6
+    # entirely at the stack so non-tunnel v6 egress is impossible.
+    sysctl -w net.ipv6.conf.all.disable_ipv6=1 >/dev/null 2>&1 || true
+    sysctl -w net.ipv6.conf.default.disable_ipv6=1 >/dev/null 2>&1 || true
+fi


Fixed in 5b58f2e. When ip6tables is unavailable we now verify net.ipv6.conf.all.disable_ipv6 actually reads back as 1 after the sysctl write. If /proc/sys is read-only (so IPv6 stays enabled) and no IPv6 stack-absence is detected, we now fail closed (exit 1) with remediation guidance, instead of continuing with a silent leak. An explicit WIREGUARD_ALLOW_IPV6_LEAK=true override is available for operators who have confirmed the container has no IPv6 connectivity. The deploy compose sets net.ipv6.conf.all.disable_ipv6=1 so the happy path is the kernel-disabled branch.

Fixed in b8bfec1 — when ip6tables is unavailable we disable IPv6 via sysctl, then VERIFY disable_ipv6=1 (or that no IPv6 stack exists). If it couldn't be disabled (read-only /proc/sys) we now refuse to run with a clear remediation message, rather than silently leaving IPv6 able to leak. WIREGUARD_ALLOW_IPV6_LEAK=true is an explicit escape hatch.
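
The verify-then-refuse logic described in these replies can be sketched as follows. The function names are hypothetical, and treating a missing `/proc/sys/net/ipv6` directory as "no IPv6 stack" is an assumption; the PR does not show how stack absence is detected. The procfs root is a parameter only so the logic can be exercised outside a container.

```shell
# Hypothetical helpers, not the code in wireguard.sh.

# Returns 0 when IPv6 cannot leak: either the procfs directory is absent
# (assumed to mean no IPv6 stack) or disable_ipv6 reads back as 1.
ipv6_is_off() {
    local proc_root="${1:-/proc/sys/net/ipv6}"
    [ -d "$proc_root" ] || return 0
    [ "$(cat "$proc_root/conf/all/disable_ipv6" 2>/dev/null)" = "1" ]
}

# Decide whether startup may continue when ip6tables is unusable.
ipv6_leak_guard() {
    local proc_root="${1:-/proc/sys/net/ipv6}"
    if ipv6_is_off "$proc_root"; then
        return 0
    fi
    if [ "${WIREGUARD_ALLOW_IPV6_LEAK:-false}" = "true" ]; then
        echo "[!] IPv6 is still enabled; continuing because WIREGUARD_ALLOW_IPV6_LEAK=true" >&2
        return 0
    fi
    echo "[!] ip6tables is unavailable and IPv6 could not be disabled; refusing to start" >&2
    return 1
}
```

A caller would run the `sysctl -w` attempts first and then `ipv6_leak_guard || exit 1`, so a write that silently failed on read-only /proc/sys is caught by the read-back rather than ignored.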

alex4108 · 2026-07-01T09:48:23Z

+if is_truthy "$WIREGUARD_ENFORCE_DNS_VALUE"; then
+    # Prefer an explicit override; fall back to the tunnel config's DNS.
+    DNS_TO_USE="${WIREGUARD_DNS:-$WG_DNS}"
+    # Normalise separators (commas -> spaces).
+    DNS_TO_USE="$(echo "$DNS_TO_USE" | tr ',' ' ' | xargs || true)"
+    if [ -n "$DNS_TO_USE" ]; then
+        echo "[*] Enforcing resolver(s): $DNS_TO_USE"
+        : > /etc/resolv.conf
+        for ns in $DNS_TO_USE; do
+            echo "nameserver $ns" >> /etc/resolv.conf
+        done
+    else
+        echo "[*] WIREGUARD_ENFORCE_DNS=true but no resolver resolved (no WIREGUARD_DNS and no config DNS); leaving /etc/resolv.conf unchanged"
+    fi
+else
+    echo "[*] Leaving /etc/resolv.conf unchanged (WIREGUARD_ENFORCE_DNS=$WIREGUARD_ENFORCE_DNS_VALUE)"
+fi


Fixed in 5b58f2e — both sub-cases now fail closed. If WIREGUARD_ENFORCE_DNS=true but no resolver is defined (no WIREGUARD_DNS and no config DNS=), we exit 1 rather than leaving the inherited (Docker 127.0.0.11, LAN-allowlisted) resolver in place. And if writing /etc/resolv.conf fails (read-only bind mount), we now emit a clear error and exit 1 instead of dying abruptly.

Fixed in b8bfec1 — WIREGUARD_ENFORCE_DNS=true with no resolver now fails closed (was leaving the inherited resolver, e.g. Docker's 127.0.0.11, which the LAN allowlist permits). Also detects a read-only /etc/resolv.conf and fails closed with a clear error.
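
Both fail-closed DNS cases can be sketched in one function. This is an illustration under assumptions: the name `enforce_resolvers` is hypothetical, and the target path is a parameter so the behavior can be tried against a scratch file instead of the real /etc/resolv.conf.

```shell
# Hypothetical helper, not the code in wireguard.sh.
# $1 = resolver list (comma- or space-separated), $2 = resolv.conf path
enforce_resolvers() {
    local resolvers target="${2:-/etc/resolv.conf}" ns
    resolvers="$(printf '%s' "$1" | tr ',' ' ' | xargs)"
    if [ -z "$resolvers" ]; then
        echo "[!] DNS enforcement is on but no resolver is defined; refusing to start" >&2
        return 1
    fi
    # Truncate in a subshell so a failed redirection is reported as a
    # status code instead of aborting the calling shell.
    if ! ( : > "$target" ) 2>/dev/null; then
        echo "[!] $target is not writable (read-only bind mount?); refusing to start" >&2
        return 1
    fi
    for ns in $resolvers; do
        echo "nameserver $ns" >> "$target"
    done
}
```

Called as `enforce_resolvers "${WIREGUARD_DNS:-$WG_DNS}" || exit 1`, an empty resolver list or an unwritable file stops startup instead of leaving the inherited resolver in place.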

alex4108 · 2026-07-01T09:48:24Z

+    if [ "$FAIL_COUNT" -ge 3 ]; then
+        echo "$(date): restart trigger - bouncing $WIREGUARD_INTERFACE"
+        wg-quick down "$WIREGUARD_INTERFACE" 2>/dev/null || true
+        # Re-add tunnel ACCEPT before bringing it up (flush is not done here;
+        # the DROP rule stays in place so we never leak during the bounce).
+        wg-quick up "$WIREGUARD_INTERFACE" 2>/dev/null || echo "$(date): wg-quick up failed, will retry"
+        FAIL_COUNT=0
+        sleep 15
+    fi


Fixed in 5b58f2e. The sysctl shim is written to a persistent path (/app/wg-sysctl-shim) and the healthcheck now puts it on PATH before its recovery wg-quick up, exactly like the initial bring-up. So a stale-handshake bounce no longer fails on the read-only net.ipv4.conf.all.src_valid_mark=1 write and can actually recover.

Fixed in b8bfec1 — the sysctl shim is now persistent (/app/wg-sysctl-shim) and the healthcheck puts it on PATH, so the recovery bounce's wg-quick up no longer fails on the src_valid_mark write under read-only /proc/sys.
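
A persistent shim of the kind described here might be generated as below. This is a sketch under assumptions: the PR names the path /app/wg-sysctl-shim but not its layout, so the example treats it as a directory holding an executable named `sysctl`; the function name `write_sysctl_shim` is hypothetical.

```shell
# Hypothetical helper, not the code in wireguard.sh.
# $1 = directory to hold the shim, $2 = path to the real sysctl binary
write_sysctl_shim() {
    local dir="$1" real="$2"
    mkdir -p "$dir" || return 1
    cat > "$dir/sysctl" <<EOF
#!/bin/sh
# Swallow only the redundant src_valid_mark write; the value is expected
# to be set already through the container sysctls.
for arg in "\$@"; do
    if [ "\$arg" = "net.ipv4.conf.all.src_valid_mark=1" ]; then
        exit 0
    fi
done
# Everything else goes to the real sysctl.
exec "$real" "\$@"
EOF
    chmod +x "$dir/sysctl"
}
```

The real binary has to be resolved before the shim directory goes on PATH, for example `write_sysctl_shim /app/wg-sysctl-shim "$(command -v sysctl)"` followed by `PATH="/app/wg-sysctl-shim:$PATH" wg-quick up "$WIREGUARD_INTERFACE"`. Because the shim is a file rather than a shell function, the healthcheck process can reuse it for its recovery bounce.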

alex4108 · 2026-07-01T09:48:25Z

+# Resolved "host:port" endpoints from the live interface (one per peer).
+RESOLVED_ENDPOINTS="$(wg show "$WIREGUARD_INTERFACE" endpoints 2>/dev/null | awk '{print $2}' | grep -v '^$' || true)"
+
+# ip6tables may be unusable in some container kernels (missing tables). Detect
+# once so we can fail closed on IPv6 when possible and warn otherwise.
+IP6TABLES_OK="true"
+if ! ip6tables -L OUTPUT >/dev/null 2>&1; then
+    IP6TABLES_OK="false"
+    echo "[!] ip6tables unavailable in this kernel; disabling IPv6 in the kernel instead so v6 egress cannot leak."
+    # Belt-and-braces: if we cannot program an IPv6 kill-switch, drop IPv6
+    # entirely at the stack so non-tunnel v6 egress is impossible.
+    sysctl -w net.ipv6.conf.all.disable_ipv6=1 >/dev/null 2>&1 || true
+    sysctl -w net.ipv6.conf.default.disable_ipv6=1 >/dev/null 2>&1 || true
+fi
+
+add_endpoint_rule() {
+    # $1 = host, $2 = port
+    local host="$1" port="$2"
+    [ -z "$host" ] || [ -z "$port" ] && return 0
+    if printf '%s' "$host" | grep -q ':'; then
+        # IPv6 endpoint
+        if [ "$IP6TABLES_OK" = "true" ]; then
+            ip6tables -A OUTPUT -p udp -d "$host" --dport "$port" -j ACCEPT 2>/dev/null \
+                || echo "[!] Could not add IPv6 endpoint allow rule for [$host]:$port"
+        fi
+    else
+        iptables -A OUTPUT -p udp -d "$host" --dport "$port" -j ACCEPT 2>/dev/null \
+            || echo "[!] Could not add IPv4 endpoint allow rule for $host:$port"
+    fi
+}
+
+# --- IPv4 kill-switch ---
+iptables -F OUTPUT
+iptables -A OUTPUT -o lo -j ACCEPT
+iptables -A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
+iptables -A OUTPUT -o "$WIREGUARD_INTERFACE" -j ACCEPT
+
+# --- IPv6 kill-switch (fail closed) ---
+if [ "$IP6TABLES_OK" = "true" ]; then
+    ip6tables -F OUTPUT
+    ip6tables -A OUTPUT -o lo -j ACCEPT
+    ip6tables -A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
+    ip6tables -A OUTPUT -o "$WIREGUARD_INTERFACE" -j ACCEPT
+fi
+
+# Allow the encrypted WireGuard handshake/data to each resolved endpoint.
+for ep in $RESOLVED_ENDPOINTS; do
+    # Split host:port from the right so IPv6 colons in the host are preserved.
+    ep_port="${ep##*:}"
+    ep_host="${ep%:*}"
+    # Strip IPv6 brackets if present ([2a02::10] -> 2a02::10)
+    ep_host="${ep_host#[}"
+    ep_host="${ep_host%]}"
+    add_endpoint_rule "$ep_host" "$ep_port"
+done


Fixed in 5b58f2e. On each recovery bounce the healthcheck now re-derives the peer endpoint(s) from the live interface (wg show <iface> endpoints) and re-opens them in the kill-switch, inserting at the top of OUTPUT ahead of the default DROP and guarded with -C so rules never duplicate. A rotated/NAT-rebound endpoint can no longer be stranded behind the DROP during reconnect.

Fixed in b8bfec1 — endpoint allow rules now permit by the WireGuard UDP port rather than a pinned startup IP (everything else stays forced through the tunnel by the default-drop), and the healthcheck refreshes them from the live interface after each bounce, so a rotated/roamed endpoint can reconnect.
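
The idempotent `-C`/`-I` pattern mentioned above is small enough to isolate. A minimal sketch, assuming a hypothetical `ensure_output_rule` helper; the firewall command is passed in as the first argument so the same function serves iptables and ip6tables.

```shell
# Hypothetical helper, not the code in wireguard.sh.
# $1 = firewall command (iptables or ip6tables); remaining args = rule spec
ensure_output_rule() {
    local ipt="$1"
    shift
    # -C succeeds only if an identical rule already exists; otherwise
    # insert at position 1 so the rule sits ahead of the default DROP.
    "$ipt" -C OUTPUT "$@" 2>/dev/null || "$ipt" -I OUTPUT 1 "$@"
}
```

Calling it repeatedly with the same rule spec leaves a single rule in the chain, which is what lets a healthcheck re-run it without growing OUTPUT.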

- entrypoint: move USING_TOR/USING_WIREGUARD mutual-exclusion check ahead of both egress branches so tor.sh can never start then abort (was inside the WireGuard block).
- wireguard.sh: fail closed when ip6tables is unavailable AND IPv6 cannot be disabled via sysctl (read-only /proc/sys) instead of silently ignoring the failed sysctl write; add WIREGUARD_ALLOW_IPV6_LEAK override.
- wireguard.sh: fail closed on DNS enforcement when /etc/resolv.conf is not writable, and when WIREGUARD_ENFORCE_DNS=true but no resolver is defined, rather than leaving the inherited (leaky) resolver in place.
- wireguard.sh: healthcheck recovery now brings the tunnel up via the same persistent sysctl shim as the initial bring-up, so a stale-handshake bounce does not fail on the read-only src_valid_mark write and never recover.
- wireguard.sh: healthcheck refreshes the kill-switch endpoint allow rules from the live interface on bounce (idempotent -C/-I), so a rotated/rebound peer endpoint cannot be stranded behind the default DROP.

Five findings from the re-review, all fixed and re-validated live:

1. entrypoint.sh: check USING_TOR/USING_WIREGUARD mutual exclusion BEFORE starting either mode, so we never run tor.sh and then exit 1 with a half-configured stack.
2. IPv6 fail-closed: when ip6tables is unusable we disable IPv6 via sysctl, but that can silently no-op on read-only /proc/sys, leaving IPv6 able to leak with no kill-switch. Now we VERIFY disable_ipv6=1 (or that no IPv6 stack exists) and refuse to run otherwise, with a clear remediation message and a WIREGUARD_ALLOW_IPV6_LEAK escape hatch.
3. DNS fail-closed: with WIREGUARD_ENFORCE_DNS=true, if no resolver is defined we now refuse to run instead of leaving the inherited resolver (Docker's 127.0.0.11, permitted by the LAN allowlist) able to leak queries off-tunnel. Also detect a read-only /etc/resolv.conf write failure and fail closed with a clear error instead of aborting opaquely.
4. Healthcheck recovery: the stale-handshake bounce reused a bare wg-quick, which would fail again on the src_valid_mark sysctl write under read-only /proc/sys. The sysctl shim is now persistent (/app/wg-sysctl-shim) and the healthcheck puts it on PATH so a bounce can actually recover.
5. Endpoint roaming: endpoint allow rules were derived once at startup and pinned to the startup IP, so a provider IP rotation / NAT rebind / roaming could strand the reconnect. Rules now allow by the WireGuard UDP port (everything else stays forced through the tunnel by the default-drop), and the healthcheck refreshes them from the live interface after each bounce.

Re-validated live against the Proton endpoint: egress = VPN exit, DNS resolves + fetch through the tunnel, IPv6 leak guard correctly refuses to run without a v6 kill-switch (and proceeds once IPv6 is disabled for the container), DNS no-resolver guard fails closed, and the kill-switch fails closed on tunnel drop.

alex4108 · 2026-07-01T09:48:40Z

Second Copilot review addressed — commit `5b58f2e`

All 5 nitpicks on the previous head (b4b7c11) are fixed and re-validated live:

Correctness / ordering

entrypoint.sh: mutual-exclusion of USING_TOR/USING_WIREGUARD now runs before either egress branch, so tor.sh can't start and then abort.

Fail-closed hardening (no silent leaks)

IPv6: when ip6tables is unusable, we verify disable_ipv6 actually took; if /proc/sys is read-only and IPv6 stays up, we exit 1 (override: WIREGUARD_ALLOW_IPV6_LEAK=true).
DNS: exit 1 if enforcement is on but no resolver is defined, and if /etc/resolv.conf isn't writable — instead of falling back to the leaky inherited resolver.

Recovery robustness

Healthcheck bounce now reuses the persistent sysctl shim on PATH, so wg-quick up doesn't re-fail on the read-only src_valid_mark write.
Healthcheck now refreshes kill-switch endpoint allow-rules from the live interface on each bounce (idempotent -C/-I), so a rotated/rebound peer endpoint isn't stranded behind the default DROP.

Live re-validation (base image, real wireguard.sh, container sysctls incl. disable_ipv6=1):

egress IP = Proton exit 95.173.221.208 (not the host)
annas-archive.gs resolves (46.8.9.x) and fetches HTTP 302
IPv6 egress dead (curl -6 rc=7)
fail-closed confirmed: tunnel down ⇒ non-LAN egress blocked (curl rc=6)
generated healthcheck passes bash -n

GHCR image ghcr.io/alex4108/shelfmark:wireguard-test rebuilt & pushed (sha256:9dca4df8…).

- compose/docker-compose.wireguard.yml: add net.ipv6.conf.*.disable_ipv6=1 sysctls so the IPv6 leak guard is satisfied on kernels without ip6tables.
- readme: document WIREGUARD_DNS, WIREGUARD_DISABLE_IPV6, WIREGUARD_ALLOW_IPV6_LEAK, WIREGUARD_STALE_AFTER, the IPv6 fail-closed behavior, and the endpoint-refresh on healthcheck bounce.

alex4108 · 2026-07-01T09:49:27Z

Second review round addressed in b8bfec1 (+ docs). Thanks again @copilot — these were sharp.

Mutual exclusion now checked before either egress mode starts (no more run-tor-then-exit).
IPv6 fail-closed: verify IPv6 is actually disabled when ip6tables is unavailable; refuse to run otherwise (with a WIREGUARD_ALLOW_IPV6_LEAK escape hatch). No silent leak.
DNS fail-closed: no-resolver and read-only /etc/resolv.conf now fail closed instead of falling back to the inherited resolver.
Healthcheck recovery: persistent sysctl shim reused on the bounce so wg-quick up can recover under read-only /proc/sys.
Endpoint roaming: allow by WireGuard UDP port (not a pinned IP) and refresh from the live interface after each bounce, so a rotated/roamed endpoint reconnects.

Re-validated live against a real Proton endpoint: egress = VPN exit, DNS resolves + fetch through the tunnel, the IPv6 guard correctly refuses to run without a v6 kill-switch and proceeds once IPv6 is disabled for the container, the DNS no-resolver guard fails closed, and the kill-switch fails closed on tunnel drop. Compose example + readme updated to disable IPv6 for the container on kernels lacking ip6tables.

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

alex4108 · 2026-07-01T10:03:27Z

+# --- IPv4 kill-switch ---
+iptables -F OUTPUT
+iptables -A OUTPUT -o lo -j ACCEPT
+iptables -A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
+iptables -A OUTPUT -o "$WIREGUARD_INTERFACE" -j ACCEPT
+
+# --- IPv6 kill-switch (fail closed) ---
+if [ "$IP6TABLES_OK" = "true" ]; then
+    ip6tables -F OUTPUT
+    ip6tables -A OUTPUT -o lo -j ACCEPT
+    ip6tables -A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
+    ip6tables -A OUTPUT -o "$WIREGUARD_INTERFACE" -j ACCEPT
+fi


Fixed in 1f4f276 — removed the duplicate OUTPUT flush/rebuild block left by the earlier refactor; the kill-switch chain is now built once.

alex4108 · 2026-07-01T10:03:29Z

+refresh_endpoint_rules() {
+    local eps ep ep_host ep_port seen_v4=" " seen_v6=" "
+    eps="$(wg show "$WIREGUARD_INTERFACE" endpoints 2>/dev/null | awk '{print $2}' | grep -v '^$' || true)"
+    for ep in $eps; do
+        ep_port="${ep##*:}"
+        ep_host="${ep%:*}"
+        [ -z "$ep_port" ] && continue
+        if printf '%s' "$ep_host" | grep -q ':'; then
+            case "$seen_v6" in *" $ep_port "*) continue ;; esac
+            seen_v6="${seen_v6}${ep_port} "
+            [ "$IP6TABLES_OK" = "true" ] && { ip6tables -C OUTPUT -p udp --dport "$ep_port" -j ACCEPT 2>/dev/null \
+                || ip6tables -I OUTPUT 1 -p udp --dport "$ep_port" -j ACCEPT 2>/dev/null || true; }
+        else
+            case "$seen_v4" in *" $ep_port "*) continue ;; esac
+            seen_v4="${seen_v4}${ep_port} "
+            iptables -C OUTPUT -p udp --dport "$ep_port" -j ACCEPT 2>/dev/null \
+                || iptables -I OUTPUT 1 -p udp --dport "$ep_port" -j ACCEPT 2>/dev/null || true
+        fi
+    done
+}


Fixed in 1f4f276: the healthcheck's refresh_endpoint_rules now uses the same IP+port pinning as startup, so the tightened startup allowlist and the reconnect path stay consistent.

alex4108 · 2026-07-01T10:03:31Z

+while true; do
+    HS="$(latest_handshake_epoch)"
+    NOW="$(date +%s)"


Fixed in 1f4f276 — refresh_endpoint_rules now runs EVERY healthcheck cycle (not only after a bounce), so a provider endpoint-IP change is picked up and allowed ahead of the DROP before the tunnel goes stale, minimising recovery delay.
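
The loop quoted above reads the latest handshake epoch and the current time on every cycle. The staleness decision that follows from those two values can be sketched as below. The helper names are hypothetical, and treating WIREGUARD_STALE_AFTER as a maximum age in seconds is an assumption; the PR names the variable but does not show its unit.

```shell
# Hypothetical helpers, not the code in the generated healthcheck.

# Newest handshake across all peers, from `wg show <iface> latest-handshakes`
# output on stdin (one "<pubkey> <epoch>" pair per line). Prints 0 if none.
newest_handshake() {
    awk 'BEGIN { max = 0 } $2 + 0 > max { max = $2 + 0 } END { print max }'
}

# $1 = current epoch, $2 = handshake epoch (0 = never), $3 = max age in seconds
# Returns 0 (stale) when no handshake has happened or the last one is too old.
handshake_is_stale() {
    local now="$1" hs="$2" max_age="$3"
    [ "$hs" -eq 0 ] && return 0
    [ $((now - hs)) -gt "$max_age" ]
}
```

In a loop of this shape, the endpoint refresh would run first on every cycle, and only a run of consecutive stale results (three in the diff quoted earlier in this thread) would trigger the `wg-quick down` / `wg-quick up` bounce.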

alex4108 · 2026-07-01T10:03:33Z

+# Egress modes are mutually exclusive. Check this BEFORE starting either one so
+# we never run tor.sh and then abort, leaving a half-configured network stack.
+if [ "$USING_TOR" = "true" ] && [ "$USING_WIREGUARD" = "true" ]; then
+    echo "USING_TOR and USING_WIREGUARD are mutually exclusive; enable only one egress mode." >&2
+    exit 1
+fi


Added in 1f4f276 — tests/config/test_entrypoint_permissions.py now covers USING_WIREGUARD non-root rejection and USING_TOR/USING_WIREGUARD mutual exclusion (including that the mutex check fires before either egress script runs, in both non-root and root startup). All 3 pass.

alex4108 · 2026-07-01T10:03:28Z

+    local endpoints ep ep_port seen_v4=" " seen_v6=" "
+    endpoints="$(wg show "$WIREGUARD_INTERFACE" endpoints 2>/dev/null | awk '{print $2}' | grep -v '^$' || true)"
+    for ep in $endpoints; do
+        # Split host:port from the right so IPv6 colons in the host are preserved.
+        ep_port="${ep##*:}"
+        local ep_host="${ep%:*}"
+        [ -z "$ep_port" ] && continue
+        if printf '%s' "$ep_host" | grep -q ':'; then
+            # IPv6 endpoint -> ip6tables (by port)
+            case "$seen_v6" in *" $ep_port "*) continue ;; esac
+            seen_v6="${seen_v6}${ep_port} "
+            if [ "$IP6TABLES_OK" = "true" ]; then
+                ip6tables -C OUTPUT -p udp --dport "$ep_port" -j ACCEPT 2>/dev/null \
+                    || ip6tables -A OUTPUT -p udp --dport "$ep_port" -j ACCEPT 2>/dev/null \
+                    || echo "[!] Could not add IPv6 endpoint allow rule for udp/$ep_port"
+            fi
+        else
+            # IPv4 endpoint -> iptables (by port)
+            case "$seen_v4" in *" $ep_port "*) continue ;; esac
+            seen_v4="${seen_v4}${ep_port} "
+            iptables -C OUTPUT -p udp --dport "$ep_port" -j ACCEPT 2>/dev/null \
+                || iptables -A OUTPUT -p udp --dport "$ep_port" -j ACCEPT 2>/dev/null \
+                || echo "[!] Could not add IPv4 endpoint allow rule for udp/$ep_port"
+        fi
+    done


Fixed in 1f4f276 — endpoint allow rules are now pinned to the resolved endpoint IP and UDP port (was port-only). Verified live: UDP to a non-peer host on the WG port is dropped; only -d <peer>/32 --dport <port> is permitted, so there's no cleartext-UDP hole while the tunnel is down.

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

alex4108 · 2026-07-01T10:04:47Z

+# --- IPv4 kill-switch ---
+iptables -F OUTPUT
+iptables -A OUTPUT -o lo -j ACCEPT
+iptables -A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
+iptables -A OUTPUT -o "$WIREGUARD_INTERFACE" -j ACCEPT
+
+# --- IPv6 kill-switch (fail closed) ---
+if [ "$IP6TABLES_OK" = "true" ]; then
+    ip6tables -F OUTPUT
+    ip6tables -A OUTPUT -o lo -j ACCEPT
+    ip6tables -A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
+    ip6tables -A OUTPUT -o "$WIREGUARD_INTERFACE" -j ACCEPT
+fi


Fixed in `1f4f276` (built on `c5f4543`). Removed the duplicated OUTPUT flush/rebuild block — there is now a single kill-switch build, so it is unambiguous which rules are active. Verified live: `iptables -S OUTPUT` shows one lo/ESTABLISHED/wg0/endpoint/LAN/DROP chain with no duplicate base rules.

alex4108 · 2026-07-01T10:04:48Z

+        # Split host:port from the right so IPv6 colons in the host are preserved.
+        ep_port="${ep##*:}"
+        local ep_host="${ep%:*}"
+        [ -z "$ep_port" ] && continue
+        if printf '%s' "$ep_host" | grep -q ':'; then
+            # IPv6 endpoint -> ip6tables (by port)
+            case "$seen_v6" in *" $ep_port "*) continue ;; esac
+            seen_v6="${seen_v6}${ep_port} "
+            if [ "$IP6TABLES_OK" = "true" ]; then
+                ip6tables -C OUTPUT -p udp --dport "$ep_port" -j ACCEPT 2>/dev/null \
+                    || ip6tables -A OUTPUT -p udp --dport "$ep_port" -j ACCEPT 2>/dev/null \
+                    || echo "[!] Could not add IPv6 endpoint allow rule for udp/$ep_port"
+            fi
+        else
+            # IPv4 endpoint -> iptables (by port)
+            case "$seen_v4" in *" $ep_port "*) continue ;; esac
+            seen_v4="${seen_v4}${ep_port} "
+            iptables -C OUTPUT -p udp --dport "$ep_port" -j ACCEPT 2>/dev/null \
+                || iptables -A OUTPUT -p udp --dport "$ep_port" -j ACCEPT 2>/dev/null \
+                || echo "[!] Could not add IPv4 endpoint allow rule for udp/$ep_port"
+        fi


Fixed in `1f4f276`. `apply_endpoint_rules` now pins each allow rule to the resolved endpoint destination IP and UDP port (`-d <peer>/32 --dport <port>`) instead of the port alone, closing the off-tunnel UDP hole during downtime/bounces. Verified live: rule is `-A OUTPUT -d 95.173.221.187/32 -p udp -m udp --dport 51820 -j ACCEPT`, and UDP to a non-peer host on that port is dropped. Endpoint IP rotation self-heals via the healthcheck re-deriving the live endpoint.
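
The difference between the port-only rule and the pinned rule comes down to the `-d` match. A minimal sketch of building the pinned rule spec, assuming a hypothetical `endpoint_rule_spec` helper that receives an already-split host and port; the /128 mask for IPv6 hosts is an assumption, since the PR only shows the IPv4 /32 form.

```shell
# Hypothetical helper, not apply_endpoint_rules itself.
# $1 = resolved endpoint IP (brackets already stripped), $2 = UDP port
endpoint_rule_spec() {
    local host="$1" port="$2" mask=32
    # Assumption: a colon in the host means IPv6, so use a /128 host mask.
    case "$host" in *:*) mask=128 ;; esac
    printf '%s\n' "-d $host/$mask -p udp --dport $port -j ACCEPT"
}
```

Dropping the `-d` part from this spec gives the earlier port-only rule, which would accept UDP to any host on that port while the tunnel is down.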

alex4108 · 2026-07-01T10:04:49Z

+    local eps ep ep_host ep_port seen_v4=" " seen_v6=" "
+    eps="$(wg show "$WIREGUARD_INTERFACE" endpoints 2>/dev/null | awk '{print $2}' | grep -v '^$' || true)"
+    for ep in $eps; do
+        ep_port="${ep##*:}"
+        ep_host="${ep%:*}"
+        [ -z "$ep_port" ] && continue
+        if printf '%s' "$ep_host" | grep -q ':'; then
+            case "$seen_v6" in *" $ep_port "*) continue ;; esac
+            seen_v6="${seen_v6}${ep_port} "
+            [ "$IP6TABLES_OK" = "true" ] && { ip6tables -C OUTPUT -p udp --dport "$ep_port" -j ACCEPT 2>/dev/null \
+                || ip6tables -I OUTPUT 1 -p udp --dport "$ep_port" -j ACCEPT 2>/dev/null || true; }
+        else
+            case "$seen_v4" in *" $ep_port "*) continue ;; esac
+            seen_v4="${seen_v4}${ep_port} "
+            iptables -C OUTPUT -p udp --dport "$ep_port" -j ACCEPT 2>/dev/null \
+                || iptables -I OUTPUT 1 -p udp --dport "$ep_port" -j ACCEPT 2>/dev/null || true
+        fi
+    done


Fixed in `1f4f276`. `refresh_endpoint_rules` now pins to the resolved endpoint IP + port as well, and strips the IPv6 `[]` brackets before `-d`. It also runs every healthcheck cycle (not only after a bounce) so a provider endpoint-IP rotation is picked up promptly. Same fail-closed guarantee: everything else stays forced through the tunnel by the default-DROP.

…dead duplicate base ruleset

Address 3rd Copilot review on PR calibrain#1097 (HEAD b8bfec1):

- Remove the duplicated OUTPUT base kill-switch block (dead code; the second identical flush/allow block was the only one in effect).
- apply_endpoint_rules: pin the WireGuard endpoint allow rule to the resolved destination IP AND udp/port instead of the port alone. A wildcard-port rule left an off-tunnel egress hole for arbitrary UDP to that port during tunnel downtime/bounces. Endpoint IP rotation self-heals via the healthcheck bounce re-deriving the live endpoint.
- refresh_endpoint_rules (healthcheck): same IP+port pinning, and strip IPv6 [] brackets before -d.

Live re-validated: egress=Proton 95.173.221.208, endpoint rule pinned -d 95.173.221.187/32 --dport 51820, annas-archive.gs 302, IPv6 egress dead (rc7), fail-closed when tunnel dropped.

- Kill-switch: remove the duplicate OUTPUT flush/rebuild block left by the previous refactor (single kill-switch build now).
- Endpoint allow rules pinned to the resolved endpoint IP *and* UDP port (was port-only). Port-only allowed arbitrary cleartext UDP egress to that port while the tunnel was down, undermining the fail-closed guarantee. Verified: UDP to a non-peer host on the WG port is now dropped; only the peer IP:port is permitted.
- Healthcheck refresh_endpoint_rules matches startup (IP+port) and now runs EVERY cycle, not only after a bounce, so a provider endpoint-IP rotation is picked up promptly (adds the new IP+port allow ahead of the DROP before the tunnel goes stale) rather than waiting for the next bounce.
- Tests: add entrypoint coverage for USING_WIREGUARD non-root rejection and USING_TOR/USING_WIREGUARD mutual exclusion (incl. that the mutex check fires before either egress script runs, in both non-root and root startup).

Re-validated live: endpoint rule is IP+port pinned (-d <peer>/32 --dport), egress = VPN exit, annas-archive.gs 302 through the tunnel, kill-switch fails closed on drop, and the 3 new entrypoint tests pass.

alex4108 · 2026-07-01T10:03:45Z

Third review round addressed in 1f4f276. Thanks @copilot.

Duplicate kill-switch — removed the leftover second OUTPUT flush/rebuild; chain is built once.
Cleartext-UDP hole (kill-switch guarantee) — endpoint allow rules are now pinned to the resolved endpoint IP + port (was port-only). Verified live: UDP to a non-peer host on the WG port is dropped; only -d <peer>/32 --dport <port> is allowed.
Healthcheck consistency — refresh_endpoint_rules matches startup (IP+port).
Proactive endpoint refresh — refresh now runs every healthcheck cycle, not just after a bounce, so a rotated endpoint IP is allowed before the tunnel goes stale.
Tests — added USING_WIREGUARD non-root + TOR/WG mutual-exclusion entrypoint tests (fire-before-either-egress-runs, non-root and root). All 3 pass.

Re-validated live: endpoint rule is IP+port pinned, egress = VPN exit, annas-archive.gs 302 through the tunnel, kill-switch fails closed on drop.
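
The IP+port pinning described above can be illustrated with a small dry-run helper. This is a minimal sketch, not the PR's actual `apply_endpoint_rules` / `refresh_endpoint_rules`: the function name `endpoint_rule` is hypothetical, it only prints the command it would run, and the use of `-I OUTPUT` (to land ahead of the DROP) is an assumption. It takes one endpoint token in the `host:port` form that `wg show <iface> endpoints` prints, skips `(none)` and non-numeric ports, and strips IPv6 `[]` brackets before `-d`.

```sh
# Hypothetical helper: print (do not execute) an allow rule pinned to the
# endpoint's destination IP and UDP port. Returns 1 for unusable tokens.
endpoint_rule() {
    ep=$1
    # A usable token must contain a colon separating host and port.
    case $ep in
        *:*) ;;
        *) return 1 ;;      # e.g. "(none)" when no endpoint is known
    esac
    port=${ep##*:}          # text after the last colon
    host=${ep%:*}           # text before the last colon
    # Reject empty or non-numeric ports.
    case $port in
        ''|*[!0-9]*) return 1 ;;
    esac
    # IPv6 literals arrive as [addr]:port; -d needs the bare address.
    host=${host#\[}
    host=${host%\]}
    [ -n "$host" ] || return 1
    case $host in
        *:*) echo "ip6tables -I OUTPUT -d $host/128 -p udp --dport $port -j ACCEPT" ;;
        *)   echo "iptables -I OUTPUT -d $host/32 -p udp --dport $port -j ACCEPT" ;;
    esac
}
```

For example, `endpoint_rule 95.173.221.187:51820` prints a rule restricted to that peer and port, whereas a port-only rule would have matched UDP to any host on 51820.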

alex4108 · 2026-07-01T10:13:43Z

Addressed Copilot 3rd review (HEAD `1f4f276`)

All three inline nitpicks on the previous head are resolved in the kill-switch:

Dead duplicate base ruleset — removed the duplicated OUTPUT flush/rebuild block; there is now a single kill-switch build (lo → ESTABLISHED → wg0 → endpoint → LAN → DROP), so which rules are active is unambiguous.
Startup endpoint allow rule (apply_endpoint_rules) — now pinned to the resolved endpoint destination IP + UDP port (-d <peer>/32 --dport <port>) instead of the port alone, closing the off-tunnel UDP hole during downtime/bounces.
Healthcheck endpoint rule (refresh_endpoint_rules) — same IP+port pinning, IPv6 [] brackets stripped before -d, and it now runs every cycle so a provider endpoint-IP rotation is picked up promptly.

Also added entrypoint test coverage: USING_WIREGUARD non-root rejection and USING_TOR/USING_WIREGUARD mutual exclusion (mutex fires before either egress script runs).

Live re-validation (base image + real `wireguard.sh`, container `disable_ipv6=1` sysctls)

Endpoint rule pinned: -A OUTPUT -d 95.173.221.187/32 -p udp -m udp --dport 51820 -j ACCEPT
Egress = Proton exit 95.173.221.208 (not the host IP)
annas-archive.gs resolves via enforced resolver → HTTP 302 through the tunnel
IPv6 egress dead (curl -6 rc=7)
Fail-closed: tunnel down → non-LAN egress blocked (curl rc=6), default-DROP intact

GHCR image ghcr.io/alex4108/shelfmark:wireguard-test rebuilt + pushed from this head (sha256:2bb31870…).

…ial review)

Internal adversarial security review found a CRITICAL fail-open hole and several hardening gaps before this went back to external review:

- C1 (CRITICAL): blanket 'ESTABLISHED,RELATED ACCEPT' on OUTPUT let an app-initiated flow that was established over the tunnel keep egressing off the physical NIC during a healthcheck bounce (wg-quick down drops the wg0 default route; the flow falls back to eth0 and matched the blanket ACCEPT above the DROP), leaking the real IP. Replaced with a rule scoped to WebUI server replies only (-p tcp --sport FLASK_PORT -m conntrack --ctstate ESTABLISHED), so client-initiated egress can never ride ESTABLISHED off-tunnel. Verified live: an established download is blocked the moment the tunnel drops.
- H1: set 'iptables -P OUTPUT DROP' (and ip6tables) BEFORE building the chain, so the build window and any partial-build failure under set -e are fail-closed by construction rather than relying on the trailing DROP being reached.
- M4: on the no-handshake abort, stop the healthcheck + wg-quick down before exit so a lingering supervised healthcheck can't revive an abandoned tunnel.
- M5: on a failed recovery 'wg-quick up', keep FAIL_COUNT at threshold so the next cycle retries immediately instead of waiting 3 more stale cycles.
- M6: use '-m conntrack --ctstate' instead of '-m state' for the (now scoped) reply rule; the general egress path no longer depends on conntrack at all.
- M7: endpoint parser now skips '(none)' / non-numeric-port tokens (startup and healthcheck refresh), avoiding junk iptables rules on multi-peer/no-handshake.
- M3: readme: document that WIREGUARD_DNS must be a trusted LAN resolver (query leaves as plaintext UDP/53 on the LAN) and that WIREGUARD_ENFORCE_DNS=false leaks DNS via Docker's inherited resolver.
- L1: replace GNU-only '\s' with '[[:space:]]' in the DNS grep/sed.
- H2: add the key security test: wireguard.sh exiting non-zero must abort entrypoint before gunicorn starts (kill-switch bypass guard). Plus the parser skip test. All entrypoint tests pass.
- C2 (verified safe, no change): traced that entrypoint's set -e + plain ./wireguard.sh call already guarantees gunicorn never boots on egress-setup failure; H2 now locks that invariant with a test.

…dversarial pass 2)

Second internal adversarial pass on the kill-switch (zero critical/high left); this closes the one remaining MEDIUM and two robustness nits:

- F1 (MEDIUM): the WebUI reply allow rule matched any ESTABLISHED packet with source port FLASK_PORT, which could match an APP-INITIATED outbound flow that happens to bind/collide on that local port (off-tunnel leak). Added '--ctdir REPLY' so only packets in the reply direction of an inbound-originated connection match; app-initiated flows are in the original direction and are never allowed. Verified live: an outbound socket bound to source port 8084 is BLOCKED when the tunnel is down (times out), while legitimate WebUI replies still pass. Mirrored on ip6tables.
- L2: the gunicorn-abort test now also asserts the stub wireguard.sh marker on stderr, so it can't false-pass on an unrelated earlier abort.
- L4: clamp the healthcheck FAIL_COUNT so it matches its 'cap it' comment instead of incrementing unbounded across a long outage.
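
The FAIL_COUNT behaviour from the two hardening rounds (retry immediately after a failed recovery, and clamp instead of growing unbounded) can be sketched as a small state machine. This is an illustration under stated assumptions, not the PR's healthcheck loop: `healthcheck_step`, `ACTION`, and the threshold of 3 (inferred from "3 more stale cycles") are hypothetical, and the real bounce is replaced by a flag saying whether `wg-quick up` would succeed.

```sh
THRESHOLD=3     # assumed: consecutive stale cycles before a bounce
FAIL_COUNT=0
ACTION=

# healthcheck_step STALE RECOVERY_OK
#   STALE        1 if the handshake is older than the stale limit, else 0
#   RECOVERY_OK  1 if the `wg-quick up` after a bounce would succeed, else 0
healthcheck_step() {
    if [ "$1" -eq 0 ]; then
        FAIL_COUNT=0
        ACTION=healthy
        return 0
    fi
    # Clamp: never count past the threshold, even across a long outage.
    if [ "$FAIL_COUNT" -lt "$THRESHOLD" ]; then
        FAIL_COUNT=$((FAIL_COUNT + 1))
    fi
    if [ "$FAIL_COUNT" -lt "$THRESHOLD" ]; then
        ACTION=wait
        return 0
    fi
    # Threshold reached: this is where the tunnel would be bounced.
    if [ "$2" -eq 1 ]; then
        FAIL_COUNT=0
        ACTION=recovered
    else
        # Stay at the threshold so the next cycle retries immediately.
        FAIL_COUNT=$THRESHOLD
        ACTION=retry-next-cycle
    fi
}
```

With this shape, a failed recovery leaves the counter at the threshold, so the very next stale cycle attempts another bounce rather than waiting for three more.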

alex4108 · 2026-07-01T10:45:31Z

Pushed a substantial hardening round to ae39448 after an internal adversarial security review of the kill-switch.

Critical fail-open leak fixed. The previous revision had a blanket OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT. During a healthcheck tunnel bounce (wg-quick down tears out the wg0 default route; traffic falls back to the physical NIC), an app-initiated download already established over the tunnel would keep egressing off-tunnel and match that rule above the DROP — leaking the real IP. Replaced with:

iptables -P OUTPUT DROP set before flushing/building the chain, so the build window and any partial-build failure are fail-closed by construction.
A WebUI-reply-only allow: -p tcp --sport ${FLASK_PORT:-8084} -m conntrack --ctstate ESTABLISHED --ctdir REPLY -j ACCEPT. --ctdir REPLY matches only reply-direction packets of inbound-originated connections, so client-initiated egress can never ride ESTABLISHED off-tunnel (even on a source-port collision), while genuine WebUI replies through Traefik/DNAT still pass.
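
Put together with the rule order reported elsewhere in this thread (loopback, WebUI replies, tunnel device, pinned endpoint, LAN ranges, trailing DROP), the chain build might look roughly like the dry run below. This is a sketch under assumptions: `build_killswitch` and the `IPT` override are hypothetical names, the default `IPT` only echoes each command so nothing touches the real firewall, and the ip6tables mirror is omitted.

```sh
# Hypothetical dry-run builder. Set IPT=iptables to apply for real (root needed).
# Usage: build_killswitch ENDPOINT_IP ENDPOINT_PORT
build_killswitch() {
    ipt=${IPT:-echo iptables}
    wg_if=${WIREGUARD_INTERFACE:-wg0}
    lan=${LAN_NETWORK:-127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16}

    # Policy first, so the build window itself is fail-closed.
    $ipt -P OUTPUT DROP
    $ipt -F OUTPUT
    $ipt -A OUTPUT -o lo -j ACCEPT
    # Only reply-direction packets of inbound WebUI connections.
    $ipt -A OUTPUT -p tcp --sport "${FLASK_PORT:-8084}" \
        -m conntrack --ctstate ESTABLISHED --ctdir REPLY -j ACCEPT
    # Anything routed into the tunnel device.
    $ipt -A OUTPUT -o "$wg_if" -j ACCEPT
    # Handshake traffic, pinned to the peer's IP and UDP port.
    $ipt -A OUTPUT -d "$1/32" -p udp --dport "$2" -j ACCEPT
    # LAN ranges kept off the tunnel (comma-separated CIDRs).
    for cidr in $(printf '%s' "$lan" | tr ',' ' '); do
        $ipt -A OUTPUT -d "$cidr" -j ACCEPT
    done
    # Explicit trailing DROP in addition to the policy.
    $ipt -A OUTPUT -j DROP
}
```

Running `build_killswitch 95.173.221.187 51820` with the defaults prints the commands in order, which makes it easy to confirm that no ACCEPT other than the scoped reply rule depends on connection state.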

Also in this round:

Endpoint allow pinned to resolved IP+port (no wildcard-UDP-port hole); healthcheck refreshes it every cycle + on bounce so endpoint rotation self-heals without leaving a hole.
IPv6 + DNS both fail closed (refuse to start rather than risk a leak), with a documented escape hatch and a readme note on the WIREGUARD_ENFORCE_DNS=false foot-gun.
Endpoint parser skips (none)/malformed tokens; -m conntrack instead of -m state; portability + back-off nits.
New test: wireguard.sh exiting non-zero must abort entrypoint before gunicorn starts (kill-switch-bypass guard) — asserts the app never boots on egress-setup failure.

Verified live throughout: chain is policy-DROP + scoped WebUI reply + IP+port endpoint + LAN + trailing DROP; egress = VPN exit; an established download and an outbound socket bound to source port 8084 are both blocked the moment the tunnel drops; bounce recovery restores tunnel egress. Ready for another look.
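
The kill-switch-bypass guard can be illustrated in isolation. The sketch below is not the PR's test: it builds three throwaway stub scripts in a temp directory (a failing `wireguard.sh`, a `gunicorn.sh` stand-in that leaves a marker file, and a two-line entrypoint using `set -e`), and `check_abort_guard` and `STUB_DIR` are hypothetical names. The stubs are invoked via `sh` rather than `./script` so the check does not depend on executable bits.

```sh
# Returns 0 only if the failing wireguard stub aborts the entrypoint stub
# before the gunicorn stub runs, and the stub's marker appears on stderr.
check_abort_guard() {
    dir=$(mktemp -d) || return 1
    printf '%s\n' 'echo "stub-wireguard-failed" >&2' 'exit 1' \
        > "$dir/wireguard.sh"
    printf '%s\n' 'touch "$STUB_DIR/gunicorn-started"' \
        > "$dir/gunicorn.sh"
    printf '%s\n' 'set -e' 'sh "$STUB_DIR/wireguard.sh"' 'sh "$STUB_DIR/gunicorn.sh"' \
        > "$dir/entrypoint.sh"

    rc=0
    # Capture stderr only; stdout is discarded.
    err=$(STUB_DIR=$dir sh "$dir/entrypoint.sh" 2>&1 >/dev/null) || rc=$?

    status=0
    [ "$rc" -ne 0 ] || status=1                      # entrypoint must fail
    [ ! -e "$dir/gunicorn-started" ] || status=1     # app must never start
    case $err in
        *stub-wireguard-failed*) ;;                  # abort came from the stub
        *) status=1 ;;
    esac
    rm -rf "$dir"
    return "$status"
}
```

Checking the stderr marker as well as the exit code mirrors the point made in the second adversarial pass: without it, an unrelated earlier abort could make the check pass for the wrong reason.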

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

+| `USING_WIREGUARD` | Enable WireGuard VPN egress with kill-switch (requires root startup) | `false` |
+| `WIREGUARD_CONFIG` | Path to the mounted wg-quick config | `/config/wg0.conf` |
+| `WIREGUARD_INTERFACE` | WireGuard interface name | `wg0` |
+| `LAN_NETWORK` | Comma-separated CIDRs kept off the tunnel so the WebUI / internal clients stay reachable | `127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16` |
+| `WIREGUARD_ENFORCE_DNS` | Force the resolver (via `WIREGUARD_DNS`, else the config's `DNS =`) so lookups go through the tunnel. Fails closed if no resolver is defined or `/etc/resolv.conf` is not writable. | `true` |
+| `WIREGUARD_DNS` | Explicit resolver(s) to pin (comma/space separated). Use when the VPN's pushed DNS filters domains you need; point at a resolver reachable via the tunnel or an allowed LAN resolver. | _(unset; uses config `DNS =`)_ |
+| `WIREGUARD_DISABLE_IPV6` | Strip IPv6 from the tunnel config (many container kernels lack the ip6tables `raw` table wg-quick needs) and remove IPv6 as a leak surface. | `true` |
+| `WIREGUARD_ALLOW_IPV6_LEAK` | Escape hatch: continue even when an IPv6 kill-switch can't be installed AND IPv6 can't be disabled. Only set if the container has no IPv6 connectivity. | `false` |
+| `WIREGUARD_STALE_AFTER` | Seconds since the last handshake before the healthcheck bounces the tunnel. | `180` |
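
As a rough illustration of how `WIREGUARD_STALE_AFTER` could be applied, the sketch below compares a handshake timestamp against the limit. It is not the PR's healthcheck: `handshake_is_stale` is a hypothetical name, and treating a timestamp of 0 (which `wg show <iface> latest-handshakes` reports when no handshake has happened) as stale is an assumption.

```sh
# handshake_is_stale NOW LAST_HANDSHAKE [STALE_AFTER]
# All values are in seconds; NOW and LAST_HANDSHAKE are Unix timestamps.
# Returns 0 (stale) or 1 (fresh).
handshake_is_stale() {
    now=$1
    last=$2
    limit=${3:-${WIREGUARD_STALE_AFTER:-180}}
    # Assumed: no handshake yet counts as stale.
    if [ "$last" -eq 0 ]; then
        return 0
    fi
    [ $((now - last)) -gt "$limit" ]
}
```

A caller would typically pass `$(date +%s)` as the first argument and the second column of `wg show <iface> latest-handshakes` as the second.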


calibrain · 2026-07-01T15:12:21Z

Hi @alex4108
Could you fix the CI failure? It's just a formatting issue; `uv run ruff format` should fix it.

Also, is there any reason for doing this rather than just running it through gluetun?
Unlike Tor, WireGuard has first-class support in the gluetun ecosystem, and several users are already using a gluetun sidecar to access their shelfmark instance.

alex4108 added 2 commits July 1, 2026 08:29

Copilot AI review requested due to automatic review settings July 1, 2026 08:56

Copilot started reviewing on behalf of alex4108 July 1, 2026 08:56