
feat(dind): preload + default-on host-image passthrough for the nested daemon (issue #94) #95

Merged: konard merged 6 commits into main from issue-94-c1c4a75d3b0d on Jun 10, 2026


konard commented Jun 9, 2026 (Member)

Summary

Fixes #94: the box-dind
nested daemon boots with an empty image store, so a fresh container re-pulls
images the host already has (the classic Docker-in-Docker cache pitfall).

This PR seeds the inner daemon at startup via two complementary paths. Both are
daemon-gated, idempotent (images already present are skipped), and non-fatal
(a bad path or failed copy logs a warning and the shell still starts):

1. Explicit preload

  • DIND_PRELOAD_TARBALL: space-separated docker save tarballs and/or directories of *.tar, loaded into the inner daemon with docker load. Zero-network host-image reuse.
  • DIND_PRELOAD_IMAGES: space-separated references to docker pull, skipping any already present.
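A minimal sketch of the two explicit preload paths, written in bash like the entrypoint. The function names (log_warn, preload_tarballs, preload_pulls) are hypothetical and the real dind-entrypoint.sh may be structured differently; the sketch assumes only the standard docker load -i, docker image inspect and docker pull commands.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the explicit preload paths (not the actual entrypoint).

log_warn() { echo "dind: warning: $*" >&2; }

# DIND_PRELOAD_TARBALL: space-separated tarballs and/or directories of *.tar.
preload_tarballs() {
  local -a entries
  local entry tarball
  read -r -a entries <<< "${DIND_PRELOAD_TARBALL:-}"
  for entry in "${entries[@]}"; do
    if [ -d "$entry" ]; then
      for tarball in "$entry"/*.tar; do
        [ -e "$tarball" ] || continue          # no *.tar in this directory
        docker load -i "$tarball" >/dev/null || log_warn "failed to load $tarball"
      done
    elif [ -f "$entry" ]; then
      docker load -i "$entry" >/dev/null || log_warn "failed to load $entry"
    else
      log_warn "preload path not found: $entry"   # non-fatal
    fi
  done
}

# DIND_PRELOAD_IMAGES: space-separated references, pulled only if missing.
preload_pulls() {
  local -a refs
  local ref
  read -r -a refs <<< "${DIND_PRELOAD_IMAGES:-}"
  for ref in "${refs[@]}"; do
    if docker image inspect "$ref" >/dev/null 2>&1; then
      continue                                 # already present: skip the pull
    fi
    docker pull "$ref" >/dev/null || log_warn "failed to pull $ref"
  done
}
```

Checking docker image inspect before docker pull is what makes the pull path idempotent, and a failed load or pull only logs a warning, in line with the non-fatal behavior described above.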

2. Host-image passthrough — on by default, opt-out-able (issue follow-up)

Addresses the owner's follow-up: "by default add host-image passthrough … possible
to turn it off … do passthrough only for images that are available in docker hub
and so on, so these are safe from tokens and baked in configuration … also have an
option to pass through them all."

  • DIND_HOST_PASSTHROUGH (default: public): public copies only host images that carry a RepoDigest from an allowlisted public registry (freely re-pullable, so no local secrets or credentials leak); all copies every tagged image; off, 0, false, or no disables passthrough.
  • DIND_HOST_DOCKER_SOCK (default: /var/run/host-docker.sock): path to the mounted host Docker socket. It is a non-default path, mounted read-only and used only to seed images; the inner daemon keeps its own isolated /var/run/docker.sock.
  • DIND_HOST_PASSTHROUGH_REGISTRIES (default: common public registries): the allowlist treated as "public" in public mode.
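The public filter hinges on deciding which registry a RepoDigest belongs to. The sketch below assumes Docker's usual reference-normalization rule: the first path component names a registry host only if it contains "." or ":" or is "localhost"; otherwise the image lives on docker.io. The helper names and the three-entry default allowlist are illustrative assumptions; the PR only says the default covers "common public registries".

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the "public" filter. The default allowlist below is
# illustrative only.
DIND_HOST_PASSTHROUGH_REGISTRIES=${DIND_HOST_PASSTHROUGH_REGISTRIES:-"docker.io ghcr.io quay.io"}

# Print the registry host of a reference or RepoDigest, e.g. ubuntu@sha256:...
registry_of() {
  local ref=${1%%@*} first                   # drop any "@sha256:..." suffix
  case "$ref" in
    */*) first=${ref%%/*} ;;
    *)   echo docker.io; return ;;           # bare name such as "ubuntu"
  esac
  case "$first" in
    *.*|*:*|localhost) echo "$first" ;;      # looks like a registry host
    *)                 echo docker.io ;;     # Docker Hub namespace, e.g. "library"
  esac
}

# True if one RepoDigest comes from an allowlisted registry.
is_public_digest() {
  local reg allowed
  reg=$(registry_of "$1")
  for allowed in $DIND_HOST_PASSTHROUGH_REGISTRIES; do
    [ "$reg" = "$allowed" ] && return 0
  done
  return 1
}

# True if any of an image's RepoDigests qualifies. With no arguments (a local
# build has no RepoDigest) it returns false, so local images are never copied.
any_public_digest() {
  local digest
  for digest in "$@"; do
    is_public_digest "$digest" && return 0
  done
  return 1
}
```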

Enable it by mounting the host socket; otherwise it is a quiet no-op, so the
normal --privileged run is unchanged:

docker run -d --privileged \
  -v /var/run/docker.sock:/var/run/host-docker.sock:ro \
  --name box-dind konard/box-dind sleep infinity

Why this preserves isolation (issue #80): the host socket is used only at
startup, to seed images, and is deliberately not mounted at
/var/run/docker.sock. The inner daemon stays the container's runtime, so the
per-container docker ps scoping is intact and we never adopt the
Docker-outside-of-Docker security model the project rejects.

Why public is the safe default: a RepoDigest from a public registry proves the
image is freely re-pullable, so copying it leaks no build secrets and needs no
credentials. Local builds (no RepoDigest) and private-registry images are
excluded unless you opt into all.
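How an image actually travels from the host daemon into the inner one is not spelled out here. One plausible sketch, assuming the copy streams docker image save from the host socket into docker image load on the inner daemon, is below; host_docker and copy_from_host are hypothetical names. Only the single docker -H invocation talks to the host socket; every other docker call uses the inner daemon's default socket.

```shell
#!/usr/bin/env bash
# Hypothetical passthrough copy. Only host_docker talks to the host socket;
# plain "docker" keeps using the inner daemon's own default socket.
HOST_SOCK=${DIND_HOST_DOCKER_SOCK:-/var/run/host-docker.sock}

host_docker() { docker -H "unix://$HOST_SOCK" "$@"; }

copy_from_host() {
  local ref=$1
  local -a rc
  if docker image inspect "$ref" >/dev/null 2>&1; then
    return 0                                   # inner daemon already has it
  fi
  # Stream the image from the host daemon straight into the inner daemon.
  host_docker image save "$ref" | docker image load >/dev/null
  rc=("${PIPESTATUS[@]}")
  if [ "${rc[0]}" -ne 0 ] || [ "${rc[1]}" -ne 0 ]; then
    echo "dind: warning: passthrough copy of $ref failed" >&2
  fi
  return 0                                     # never fatal
}
```

Streaming through a pipe avoids writing a temporary tarball to the container's disk. Checking PIPESTATUS catches a failure on either side, because without pipefail the pipeline's own status reflects only docker image load.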

How to reproduce / verify

  • Isolated unit test — runs anywhere (the CI sandbox is vfs-only and cannot
    build the overlay-backed dind image): bash experiments/preload-unit-test.sh
    sources the real entrypoint and drives load/pull/skip/daemon-down/no-op plus the
    passthrough branches (public filter, all, skip-present, off, registry
    detection) with a mock docker and a real AF_UNIX socket → 26 assertions, all pass.
  • Integration example (CI pr-test-dind, privileged):
    tests/dind/example-preload-images.sh asserts the single-file and directory
    tarball forms seed the inner daemon, DIND_PRELOAD_IMAGES skips the redundant
    pull, and — using a throwaway second dind-box as a controlled host daemon —
    all mode copies a local fixture while public mode correctly refuses it
    (no RepoDigest). Wired into CI alongside the other documented dind examples.
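The unit test can source the real entrypoint only if sourcing it does not boot dockerd. A common bash idiom for that is a BASH_SOURCE guard around the main logic; it is shown here as an assumption about how such a script can be made sourceable, not necessarily what dind-entrypoint.sh does. The demo writes a throwaway script (with hypothetical preload_images and main functions) to a temporary directory and runs it both ways.

```shell
#!/usr/bin/env bash
# Demo of a sourceable script: functions are always defined, but main only runs
# when the file is executed directly, not when a test sources it.
tmp=$(mktemp -d)
cat > "$tmp/entrypoint.sh" <<'EOF'
preload_images() { echo "preload called"; }
main() { echo "main ran (would start dockerd)"; }
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  main "$@"
fi
EOF

# Sourced: the guard is false, so only the requested function runs.
sourced_out=$(source "$tmp/entrypoint.sh"; preload_images)
# Executed: BASH_SOURCE[0] equals $0, so main runs.
executed_out=$(bash "$tmp/entrypoint.sh")

echo "sourced:  $sourced_out"
echo "executed: $executed_out"
```

With the guard in place, a test can source the file, replace docker with a shell function, and call individual functions in isolation.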

CI/CD review (PR #95 follow-up request)

  • "Skip CI/CD when a new commit arrives, for both main and PRs" — already
    satisfied: both release.yml and measure-disk-space.yml use
    concurrency: { group: ${{ github.workflow }}-${{ github.ref }}, cancel-in-progress: true }.
    The per-github.ref group cancels superseded runs on PR branches (each PR has
    its own group) and on main.
    • Deliberate divergence from the templates: the js/rust/python/csharp
      pipeline templates use cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
      to protect main-branch releases from mid-flight cancellation. Box uses the
      unconditional form per the explicit "both main and PRs" request. Trade-off:
      a new push to main can cancel an in-progress release on main. If protecting
      releases is preferred, switch to the conditional form — flag if you'd like that.
  • Template comparison (full workflow file tree across all four templates): the
    only shared CI gap found is that the js template's links.yml lacks the
    concurrency: block its sibling workflows have. Reported upstream, as authorized,
    in js-ai-driven-development-pipeline-template#73 ("ci: links.yml is missing a
    concurrency block (redundant runs not cancelled on new commits)").

Files changed

  • ubuntu/24.04/dind/dind-entrypoint.sh — preload hook + host passthrough + env docs.
  • tests/dind/example-preload-images.sh — executable example (CI), incl. passthrough.
  • experiments/preload-unit-test.sh — isolated branch coverage (preload + passthrough).
  • docs/dind/USAGE.md, README.md, docs/case-studies/issue-94/CASE-STUDY.md — docs.
  • .changeset/dind-preload-images.md: minor bump (release trigger; VERSION is
    bumped by CI, not manually).

Fixes #94

Adding .gitkeep for PR creation (default mode).
This file will be removed when the task is complete.

Issue: #94
konard self-assigned this Jun 9, 2026
The nested dockerd starts with an empty image store, so a fresh
box-dind container re-downloads images the host already has on the
first inner 'docker run'. Add a documented startup preload hook to
dind-entrypoint.sh:

- DIND_PRELOAD_TARBALL: docker load 'docker save' tarballs and/or
  directories of *.tar into the inner daemon once it is ready.
- DIND_PRELOAD_IMAGES: docker pull registry/mirror references,
  skipping any image already present.

Preload is non-fatal, daemon-gated, and skipped when
DIND_SKIP_DAEMON=1. Documented in docs/dind/USAGE.md and README,
covered by tests/dind/example-preload-images.sh (wired into
pr-test-dind CI) and experiments/preload-unit-test.sh.

Fixes #94
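The gating described in this commit (skip under DIND_SKIP_DAEMON=1, proceed only when docker info succeeds, never fail the container) fits in one small wrapper. This is a sketch: seed_if_ready is a hypothetical name, and the seeding step is passed in as a command so the wrapper stays independent of any particular preload function.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a seeding step only when an inner daemon is
# expected and reachable, and never let a failure stop the container.
seed_if_ready() {
  if [ "${DIND_SKIP_DAEMON:-0}" = 1 ]; then
    return 0                                   # no nested daemon to seed
  fi
  if ! docker info >/dev/null 2>&1; then
    echo "dind: warning: inner daemon not reachable; skipping image preload" >&2
    return 0
  fi
  "$@" || echo "dind: warning: image preload step failed: $*" >&2
  return 0
}
```

A caller would run something like seed_if_ready preload_tarballs after starting dockerd; whatever the step does, the entrypoint continues to the shell.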
konard changed the title from "[WIP] box-dind: nested daemon starts with empty image store, forcing re-download of host images" to "feat(dind): preload host images into the nested daemon (issue #94)" on Jun 9, 2026
konard marked this pull request as ready for review on June 9, 2026 at 21:40
konard commented Jun 9, 2026 (Member, Author)

Working session summary

I've implemented the fix and pushed it. Let me summarize the work while CI runs in the background.

Summary

I implemented option 1 from issue #94 — a documented startup preload hook for the box-dind family, so the nested daemon no longer re-downloads images the host already has.

Root cause

Each box-dind runs its own isolated nested dockerd with its own --data-root. That isolation (the deliberate per-container docker ps scoping from issue #80) is exactly why the inner daemon can't see host images. The fix must therefore be opt-in seeding, not automatic socket/store sharing (which would reintroduce the Docker-outside-of-Docker security problems the project already rejects).

Changes

  • ubuntu/24.04/dind/dind-entrypoint.sh — after dockerd is ready, seeds the nested daemon via two env vars:
    • DIND_PRELOAD_TARBALL: runs docker load on docker save tarball files and/or directories of *.tar (zero-network host-image reuse).
    • DIND_PRELOAD_IMAGES: runs docker pull on registry/mirror refs, skipping any already present.
    • Non-fatal (warns and continues), daemon-gated (only when docker info succeeds), skipped under DIND_SKIP_DAEMON=1.
  • docs/dind/USAGE.md — new "Reusing Host Images (Preload)" section + env-table rows.
  • README.md — security-model note linking to the preload section.
  • tests/dind/example-preload-images.sh — executable integration example (offline docker import fixture → docker save → mount → asserts presence in the inner daemon; covers single-file + directory forms + the skip-pull branch). Wired into the pr-test-dind CI job.
  • experiments/preload-unit-test.sh — isolated unit test driving the extracted entrypoint functions with a mock docker (11/11 branches passing locally). Needed because this sandbox only has the vfs storage driver and can't build the full overlay-backed dind image.
  • docs/case-studies/issue-94/ — case study + preserved issue text.
  • .changeset/dind-preload-images.md: minor bump.

Verification

  • Isolated unit test: 11/11 passing locally (load / pull / skip-when-present / daemon-down / no-op / missing-path).
  • Changeset & version checks already passed on the first CI run; the integration example is validated by the pr-test-dind job, which I'm monitoring in the background.

PR #95 is updated and marked ready for review.

I'll report the CI result for the dind-js job (which runs the new integration example) once it completes.


This summary was automatically extracted from the AI working session output.

konard commented Jun 9, 2026 (Member, Author)

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost: $4.544200

📊 Context and tokens usage:

  • 97.6K / 1M (10%) input tokens, 36.8K / 128K (29%) output tokens

Total: (1.8K new + 169.6K cache writes + 5.1M cache reads) input tokens, 36.8K output tokens, $4.544200 cost

🤖 Models used:

  • Tool: Anthropic Claude Code
  • Requested: opus
  • Model: Claude Opus 4.8 (claude-opus-4-8)

📎 Log file uploaded as Gist (2017KB)


The working session has ended; feel free to review the solution draft and add any feedback.

konard commented Jun 9, 2026 (Member, Author)

We need to configure our CI/CD so that in-progress CI/CD runs are always skipped (cancelled) when a new commit arrives. This will reduce wasted resources and let us iterate faster, and it should work for both the main branch and pull requests.

Also double-check #94 and confirm that we fully implemented everything I wrote in my comment on the issue.

Use all the best practices from the CI/CD templates (compare the full file tree, including every GitHub workflow and CI/CD script file). If the same issue is found in a template, report it in that template as well:

We should compare all files so that we avoid further CI/CD errors and reuse all the best practices from these templates.

konard marked this pull request as draft on June 9, 2026 at 21:59
@konard

konard commented Jun 9, 2026

Copy link
Copy Markdown
Member Author

🤖 AI Work Session Started

Starting automated work session at 2026-06-09T21:59:58.436Z

The PR has been converted to draft mode while work is in progress.

This comment marks the beginning of an AI work session. Please wait for the session to finish, and provide your feedback.

konard added 2 commits June 9, 2026 22:17
…ty filter

Implements the issue #94 follow-up: copy images the host daemon already
has into the nested daemon at startup so they are not re-pulled, on by
default but safe and opt-out-able.

- DIND_HOST_PASSTHROUGH (default 'public'): 'public' copies only host
  images carrying a RepoDigest from an allowlisted public registry (freely
  re-pullable, so no local build secrets or private credentials leak);
  'all' copies every tagged image; 'off' disables it.
- DIND_HOST_DOCKER_SOCK (default /var/run/host-docker.sock): a NON-default
  path, mounted read-only, used only to seed images at startup so the inner
  daemon keeps its own isolated socket (preserves issue #80 isolation).
- DIND_HOST_PASSTHROUGH_REGISTRIES: the public-registry allowlist.
- Quiet no-op when no host socket is mounted, so the normal --privileged
  run is unchanged.

Tests: unit test sources the entrypoint and drives the passthrough
branches with a mock docker + real AF_UNIX socket (public filter, all
mode, skip-present, off, registry detection); the integration example
stands up a throwaway host daemon and asserts 'all' copies a local
fixture while 'public' refuses it. Docs updated in USAGE.md, README.md,
the issue-94 case study, and the changeset.
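The option handling in this commit can be sketched as two small helpers. passthrough_mode and passthrough_enabled are hypothetical names, and treating an unrecognized DIND_HOST_PASSTHROUGH value as public (with a warning) is an assumption; the commit only defines public, all, and off. The [ -S ] socket test is what makes the feature a quiet no-op when nothing is mounted, which would also explain why the unit test needs a real AF_UNIX socket rather than a plain file.

```shell
#!/usr/bin/env bash
# Hypothetical option handling for host-image passthrough.
passthrough_mode() {
  case "${DIND_HOST_PASSTHROUGH:-public}" in
    off|0|false|no) echo off ;;
    all)            echo all ;;
    public)         echo public ;;
    *)  # assumption: unknown values fall back to the safe default
        echo "dind: warning: unknown DIND_HOST_PASSTHROUGH value, using public" >&2
        echo public ;;
  esac
}

# Passthrough runs only if it is not disabled AND a socket is really mounted;
# otherwise it is a quiet no-op and the normal --privileged run is unchanged.
passthrough_enabled() {
  [ "$(passthrough_mode)" != off ] &&
    [ -S "${DIND_HOST_DOCKER_SOCK:-/var/run/host-docker.sock}" ]
}
```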
konard changed the title from "feat(dind): preload host images into the nested daemon (issue #94)" to "feat(dind): preload + default-on host-image passthrough for the nested daemon (issue #94)" on Jun 9, 2026
konard commented Jun 9, 2026 (Member, Author)

Working session summary

Everything is verified and finalized on my end:

The only thing outstanding is CI, which is stuck in the GitHub runner queue (run 27239631956, 0 jobs scheduled after 40+ min — pure infrastructure backlog, nothing in my control). The prior commit's run already proved the pipeline healthy (sole failure was a transient Docker Hub/buildx timeout on an unrelated swift variant).

A background monitor (bum0bb0lo) is waiting for the run's jobs to schedule or the run to complete; it will re-invoke me. Once the gating checks (Check for Changesets, Check for Manual Version Changes, pr-test / dind-js) pass, I'll run gh pr ready 95 to mark the PR ready. I'll continue when the monitor reports back.



konard marked this pull request as ready for review on June 9, 2026 at 22:33
konard commented Jun 9, 2026 (Member, Author)

🤖 Solution Draft Log


💰 Cost: $9.726514

📊 Context and tokens usage:

Claude Opus 4.8: (3 sub-sessions)

  1. 114.8K / 1M (11%) input tokens, 46.8K / 128K (37%) output tokens
  2. 116.8K / 1M (12%) input tokens, 48.1K / 128K (38%) output tokens
  3. 47.7K / 1M (5%) input tokens, 5.5K / 128K (4%) output tokens

Total: (33.6K new + 345.4K cache writes + 8.9M cache reads) input tokens, 118.5K output tokens, $9.726514 cost

🤖 Models used:

  • Tool: Anthropic Claude Code
  • Requested: opus
  • Model: Claude Opus 4.8 (claude-opus-4-8)

📎 Log file uploaded as Gist (3819KB)



konard commented Jun 9, 2026 (Member, Author)

🔄 Auto-restart triggered (iteration 1)

Reason: CI failures detected

Starting new session to address the issues.


Auto-restart-until-mergeable mode is active. This run will stop after 5 restart iterations.

konard commented Jun 9, 2026 (Member, Author)

Working session summary

The run is still in progress. I'll wait for the background monitor (bjw7qa0mc) to notify me when the workflow completes. The analysis so far is solid:

  • Root cause of CI failure: A transient docker.io registry timeout (i/o timeout resolving ubuntu:24.04) at the very first FROM line of the JS box build — before any of my dind code executed. The docker-build-test failure was just its aggregator reacting.
  • My PR changes are unrelated to the JS Dockerfile — they're confined to dind entrypoint, docs, and tests. Local verification: all 26 preload unit tests pass, the example script syntax-checks clean, and it follows the same proven --privileged pattern as the existing sibling dind example tests.
  • Fix applied: Re-triggered the workflow run, since a transient infrastructure timeout is resolved by re-running.

I'll report back once the re-run finishes.



konard commented Jun 9, 2026 (Member, Author)

🔄 Auto-restart-until-mergeable Log (iteration 1)


💰 Cost: $1.530905

📊 Context and tokens usage:

  • 78.1K / 1M (8%) input tokens, 13.7K / 128K (11%) output tokens

Total: (3.0K new + 65.2K cache writes + 1.5M cache reads) input tokens, 13.7K output tokens, $1.530905 cost

🤖 Models used:

  • Tool: Anthropic Claude Code
  • Requested: opus
  • Model: Claude Opus 4.8 (claude-opus-4-8)

📎 Log file uploaded as Gist (4788KB)



konard commented Jun 9, 2026 (Member, Author)

🔄 Auto-restart triggered (iteration 2)

Reason: CI failures detected

Starting new session to address the issues.



The example-preload-images.sh test waited only for the inner dockerd to
answer 'docker info' (wait_for_inner_docker), then immediately asserted the
seeded images were present. But the entrypoint loads images *after* dockerd
reports ready, so the assertion raced the asynchronous 'docker load' and
intermittently failed with the image showing up in the diagnostic dump a
fraction of a second later.

Emit an 'image preload/passthrough complete' marker once every preload path
finishes, and have the test wait for that marker before asserting (positive
and negative cases alike). Deterministic sync instead of a readiness race.
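The synchronization fix can be sketched as a polling helper: instead of returning once docker info answers, the test waits until the container's log contains the completion marker. The marker text comes from the commit above; the function name, one-second poll interval, and default timeout are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical test helper: wait for the entrypoint's completion marker rather
# than for "docker info", so assertions cannot race the asynchronous preload.
PRELOAD_MARKER='image preload/passthrough complete'

wait_for_preload_marker() {
  local container=$1 timeout=${2:-120} waited=0
  until docker logs "$container" 2>&1 | grep -qF "$PRELOAD_MARKER"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for preload marker in $container" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

The negative case needs the same wait: public mode "refusing" a local fixture can only be asserted after the marker appears, otherwise absence might just mean the copy has not run yet.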
konard commented Jun 9, 2026 (Member, Author)

Working session summary

Both monitors are watching the run. I'll wait for the completion notification rather than polling further.



konard commented Jun 9, 2026 (Member, Author)

🔄 Auto-restart-until-mergeable Log (iteration 2)


💰 Cost: $2.079068

📊 Context and tokens usage:

  • 84.8K / 1M (8%) input tokens, 17.2K / 128K (13%) output tokens

Total: (3.9K new + 71.9K cache writes + 2.4M cache reads) input tokens, 17.2K output tokens, $2.079069 cost

🤖 Models used:

  • Tool: Anthropic Claude Code
  • Requested: opus
  • Model: Claude Opus 4.8 (claude-opus-4-8)

📎 Log file uploaded as Gist (6679KB)



konard commented Jun 10, 2026 (Member, Author)

✅ Ready to merge

This pull request is now ready to be merged:

  • All CI checks have passed
  • No merge conflicts
  • No pending changes

Monitored by hive-mind with --auto-restart-until-mergeable flag


Development: successfully merging this pull request may close the issue "box-dind: nested daemon starts with empty image store, forcing re-download of host images".

1 participant