Skip to content

feat(lucebox): docker stack + CLI + bench/profile + harness + luce-bench in-tree#285

Closed
easel wants to merge 14 commits into
Luce-Org:mainfrom
easel:feat/lucebox-docker
Closed

feat(lucebox): docker stack + CLI + bench/profile + harness + luce-bench in-tree#285
easel wants to merge 14 commits into
Luce-Org:mainfrom
easel:feat/lucebox-docker

Conversation

@easel
Copy link
Copy Markdown
Collaborator

@easel easel commented May 27, 2026

This PR turns Lucebox into a one-command local inference deployment and ships the two tools that operate it: lucebox (the host CLI that runs and tunes the server) and luce-bench (the benchmark + grading framework that measures it). All three ship together so a fresh box goes from nothing to a tuned, benchmarked server with a single install.

The three pieces, what each is, and how to use it:


1. Docker — the server image

A CUDA 12.8 image (ghcr.io/luce-org/lucebox-hub:cuda12) that builds the dflash server and bundles server/, lucebox/, harness/, and luce-bench/. The entrypoint dispatches serve (default), benchmark, any lucebox subcommand, or shell. An in-container autotune fallback picks VRAM-tiered defaults and resolves the draft GGUF by target architecture (gemma4 → gemma drafter, qwen3.6 → dflash-draft-3.6).

Use it directly:

docker run --rm --gpus all -p 8080:8080 \
  -v ~/.local/share/lucebox/models:/opt/lucebox-hub/server/models \
  ghcr.io/luce-org/lucebox-hub:cuda12
# OpenAI + Anthropic-compatible API on :8080
curl -s http://localhost:8080/v1/models

Image tags: :cuda12, :vX.Y.Z-cuda12, :X.Y-cuda12, :sha-<short>-cuda12. Built and pushed by .github/workflows/docker.yml; docker-bake.hcl has a cuda13 slot ready.

2. lucebox — the host CLI

lucebox.sh is the host-side wrapper (deps: docker + nvidia-smi only). It probes the host, writes a tuned config.toml, runs the container as a user-systemd service, and delegates provisioning/workloads to the in-container Python CLI (models, autotune, profile, smoke, config, the client drivers).

Stand a server up:

lucebox check            # driver / docker / NVIDIA Container Toolkit / VRAM / systemd / WSL2 probe
lucebox pull             # docker pull the cuda12 image
lucebox models download  # pull target + DFlash draft GGUFs  (verbs: list, download)
lucebox autotune         # VRAM-tiered DFLASH_* defaults → ~/.lucebox/config.toml  (autotune --sweep picks a winner empirically)
lucebox install          # install the user-systemd unit
lucebox start            # bring it up   (enable = start at every login)
lucebox status           # unit state + the server's startup banner
lucebox logs             # follow the journal
lucebox smoke            # props/tools/http/1-token health check

Tune it to the GPU:

lucebox profile          # level1/2/3 sweep over DFLASH_MAX_CTX × DFLASH_BUDGET ×
                         # {KV type, pFlash mode, lazy-draft, prefix-cache slots},
                         # gated on capability + ds4-eval/agentic validation before
                         # the winner merges into config.toml

The running config is observable at GET /props (schema 4), which now reports a host block — kernel, OS, WSL vs native, driver, CPU, RAM, GPU — so a server self-describes its real config and host.

3. luce-bench — the benchmark + grading framework

In-tree workspace member (luce-bench/, 0.2.7.dev0) that scores any OpenAI/Anthropic-compatible endpoint and writes versioned, comparable result files. Areas: smoke, ds4-eval (92 reasoning items), gsm8k, truthfulqa-mc1, hellaswag, code, longctx, agent, agent_recorded, forge. Every result stamps a per-area grader_version and a host block (from /props.host, or a clearly-marked client-side fallback for servers without /props).

Run it:

uvx --from 'git+https://github.com/easel/lucebox-hub@feat/lucebox-docker#subdirectory=luce-bench' \
  luce-bench --base-url http://localhost:8080 --model dflash --areas all --no-think

Thinking control is portable. Each request carries three control shapes (chat_template_kwargs.enable_thinking, Anthropic thinking:{type}, reasoning_effort). For servers that ignore the API flags (e.g. OpenRouter), --prompt-thinking-control {auto,on,off} (default auto) injects the model family's in-band token (/no_think, /think); auto fires only when /props shows no server-side enforcement. A post-run verifier records thinking_control_honored so a nothink run that secretly reasoned is flagged, not silently mislabeled.

Comparing results: runs from one grader version are comparable as written. For older snapshots graded by a different version, luce-bench regrade <dirs> re-scores stored outputs at the current pinned grader and refuses to place mismatched-version (or mismatched-host) runs in the same row. report / snapshot / submit-baseline round out the reporting surface.


Also in this PR

  • harness/ — drives real clients (claude_code, codex, opencode, hermes, pi, openclaw) against a running server; lucebox profile delegates bench runs here.
  • Model-card sidecarsshare/model_cards/{qwen3.6-27b,gemma-4-26b-a4b-it,gemma-4-31b-it,laguna-xs.2}.json + _schema.json, so the server resolves sampler defaults, thinking budgets, and the force-close hint per model.
  • Workspacepyproject.toml declares all members (server, lucebox, luce-bench, harness, optimizations/{megakernel,pflash}); [tool.uv.sources] luce-bench = { workspace = true } replaces the prior git-tag pin. release-luce-bench.yml publishes to PyPI on luce-bench-v* tags.
  • Docs — README quick start + hardware/env reference; server/docs/ benchmark-snapshot spec and experiment write-ups.
  • Removes the obsolete server/scripts/bench_*.py (their work now lives in luce-bench).

Out of scope / follow-ups

  • Gemma 4 31B backend wiring beyond what its model card ships (validated empirically @ 24 GB, AR-only).
  • gemma4 MoE expert split.
  • Multi-Token Prediction (upstream, draft).

Validation

  • uv sync clean on the workspace; luce-bench test suite passes.
  • Full --areas all sweeps run end-to-end against bragi (RTX 5090 Laptop), sindri (RTX 3090 Ti), vidar (M2 Ultra / MLX), and OpenRouter, think and nothink, all on one grader version.
  • /props.host confirmed populated on lucebox servers (bragi + sindri report WSL2); OpenRouter nothink confirmed honored via client-side /no_think injection.

@easel easel force-pushed the feat/lucebox-docker branch 3 times, most recently from b5d4cc5 to 3642703 Compare May 27, 2026 18:15
@easel easel force-pushed the feat/lucebox-docker branch 5 times, most recently from f2ddfc4 to 2be3eef Compare May 27, 2026 18:56
@easel
Copy link
Copy Markdown
Collaborator Author

easel commented May 27, 2026

Some commands to test this... copied from the readme.

Install the lucebox wrapper:

curl -fsSL https://raw.githubusercontent.com/easel/lucebox-hub/feat/lucebox-docker/lucebox.sh \
       -o ~/.local/bin/lucebox.sh && chmod +x ~/.local/bin/lucebox.sh

Run lucebox using the docker image

# Override the container image to the temporary build:
export LUCEBOX_IMAGE=ghcr.io/easel/lucebox-hub

# Check your machine for lucebox compatibility
lucebox check

# Start the lucebox server
lucebox serve

Run benchmarks against a local server:

uvx --refresh --from "git+https://github.com/easel/lucebox-hub@feat/lucebox-docker#subdirectory=luce-bench" lucebench --url http://localhost:1236

Run benchmarks against open router

uvx --refresh --from "git+https://github.com/easel/lucebox-hub@feat/lucebox-docker#subdirectory=luce-bench" lucebench --base-url https://openrouter.ai/api --model qwen/qwen3.6-27b --auth-env OPENROUTER_API_KEY

@easel easel force-pushed the feat/lucebox-docker branch from 244257c to f4db35b Compare May 29, 2026 05:16
easel added a commit to easel/lucebox-hub that referenced this pull request May 29, 2026
…g#285 CI

CI's "Lint Python surfaces touched by lucebox tooling" job ran
`ruff check .` and found 11 errors across surfaces this branch touches.
Ruff --fix handled 6 (import sorting, unused imports); 5 needed
hand-edits:

  luce-bench/src/lucebench/report.py:172  E741  rename `for l in` → `for lineup in`
  lucebox/tests/test_check.py:39, 95      E731  lambda → def stub() for the two HostFacts stubs
  lucebox/tests/test_cli.py:95            E501  wrap the LUCEBOX_HOST_GPU_LIST_CSV setenv
  lucebox/tests/test_sweep.py:174, 177    E501  wrap two CellResult constructors

22 lucebox tests touched still pass; ruff is clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@easel easel marked this pull request as ready for review May 29, 2026 05:23
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 29, 2026
Merge PR Luce-Org#285 after it changed from draft to open during the cron run. Resolve refreshed Docker/lucebox/luce-bench conflicts by taking the PR head for feature files while preserving the server include required by the existing integration stack.\n\nValidation:\n- git diff --check\n- python3 -m compileall -q lucebox/src lucebox/tests luce-bench/src luce-bench/tests harness/src\n- uv run --with pytest python -m pytest lucebox/tests luce-bench/tests/test_report.py luce-bench/tests/test_smoke_area.py luce-bench/tests/test_runner.py -q
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 29, 2026
Keep the primary checkout clean after integrating PR Luce-Org#285 by ignoring the generated .docker-build/ CMake scratch directory. Update the auto-integration manifest with the final PR Luce-Org#285 merge and validation details.
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

17 issues found across 188 files

Note: This PR contains a large number of files. cubic only reviews up to 100 files per PR, so some files may not have been reviewed. cubic prioritizes the most important files to review.
On a pro plan you can use ultrareview for larger PRs.

Re-trigger cubic

Comment thread .github/workflows/release-luce-bench.yml
Comment thread Dockerfile
Comment thread harness/clients/run_lucebench.sh Outdated
Comment thread .github/workflows/docker.yml Outdated
Comment thread harness/src/harness/clients/_common.py Outdated
Comment thread lefthook.yml
Comment thread harness/clients/run_lucebench.sh Outdated
Comment thread lucebox/README.md Outdated
Comment thread luce-bench/NOTICE Outdated
Comment thread luce-bench/src/lucebench/areas/__init__.py Outdated
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 29, 2026
Fix the initial PR Luce-Org#285 CI failures by sorting the ruff-reported import block and installing libcurl development headers before the native dflash server CMake build. Refresh the auto-integration manifest with the CI diagnosis.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 29, 2026
Integrate the refreshed lucebox Docker stack PR head (09dc0be), which switches Hugging Face metadata lookup to model_info in lucebox download handling.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 29, 2026
Record the clean Luce-Org#285 follow-up merge at 09dc0be and the validation performed after updating the stack.
easel added a commit to easel/lucebox-hub that referenced this pull request May 29, 2026
The job-level `permissions` block replaces the workflow-level default
entirely, so `actions/checkout` was running without `contents: read`
and would fail on protected refs. Add `contents: read` back alongside
the existing `id-token: write`.

Addresses cubic #1 on PR Luce-Org#285.
easel added a commit to easel/lucebox-hub that referenced this pull request May 29, 2026
- Dockerfile: keep --frozen on the uv sync fallback so the layer can't
  silently resolve outside the lockfile.
- harness/clients/run_lucebench.sh: default LUCEBENCH_THINK empty
  (per-area card defaults govern; --no-think only when explicitly set)
  and default LUCEBENCH_AREA to the level1 capability gate
  (smoke,code,gsm8k,agent,longctx) instead of `all`, which was too broad
  for routine harness runs.

Addresses cubic #2, Luce-Org#3 (P1) and Luce-Org#14 (P2) on PR Luce-Org#285.
easel added a commit to easel/lucebox-hub that referenced this pull request May 29, 2026
…appers

- .github/workflows/{ci,docker,release-luce-bench}.yml: pin
  actions/checkout, docker/{setup-buildx,login,metadata,bake}-action,
  and astral-sh/setup-uv to immutable commit SHAs with `# vN` comments
  so the supply chain is reproducible (Luce-Org#4).
- harness/src/harness/clients/_common.py: replace the external `timeout`
  shell-out with `subprocess.run(..., timeout=N)`, return 124 on
  TimeoutExpired to match GNU timeout's exit code (Luce-Org#5).
- scripts/build_image.sh: normalize REGISTRY to end in `/` instead of
  silently producing `ghcr.io/luce-orglucebox-hub` when the trailing
  slash is missing (Luce-Org#6).
- harness/src/harness/clients/pi.py: non-interactive launch now mirrors
  run_pi.sh's validated invocation (--provider, --print, --mode json,
  --tools, --no-session, --offline) and sets PI_CODING_AGENT_DIR /
  PI_CODING_AGENT_SESSION_DIR / PI_OFFLINE (Luce-Org#7).
- docker-bake.hcl: sanitize `+` → `-` in VERSION before composing tags,
  since `+` is not a valid Docker tag character (Luce-Org#8).
- harness/src/harness/clients/hermes.py: set HERMES_HOME + the rest of
  run_hermes.sh's env wiring and call `chat --provider --model
  --accept-hooks --yolo --max-turns --source --query` instead of a bare
  positional prompt (Luce-Org#9, Luce-Org#10).
- harness/src/harness/clients/openclaw.py: apply the OpenClaw config
  patch via `openclaw config patch --file` before the run, and call
  `agent --local --json --model lucebox/<model> --session-id --timeout
  --message` instead of a bare positional prompt (Luce-Org#11).
- pyproject.toml: drop the dead dflash/scripts/{prefix_cache,test_server,
  tool_memory}.py ruff include pins (those paths were renamed during
  the dflash→server rename and then deleted upstream) (Luce-Org#12).
- lefthook.yml: widen the shellcheck/bash-parse glob from `*.sh` to
  `**/*.sh` so scripts under nested dirs (harness/clients/*.sh,
  scripts/*.sh, server/scripts/*.sh) are linted on commit (Luce-Org#13).

Addresses cubic Luce-Org#4Luce-Org#13 (P2) on PR Luce-Org#285. Luce-Org#14 was already addressed in
the previous commit alongside the LUCEBENCH_THINK default fix.
easel added a commit to easel/lucebox-hub that referenced this pull request May 29, 2026
- lucebox/README.md: fix the relative link to `cli.py`; resolves to
  `src/lucebox/cli.py` (the actual location), not the nonexistent
  `lucebox/cli.py` (Luce-Org#15).
- luce-bench/NOTICE: the bundled forge_eval LICENSE says
  "Copyright (c) 2025-2026 Antoine Zambelli", not 2024 — sync NOTICE
  with the actual upstream LICENSE (Luce-Org#16).
- luce-bench/src/lucebench/areas/__init__.py: `__all__` was missing
  agent / agent_recorded / forge / longctx / smoke. Add the imports +
  list entries so `from lucebench.areas import *` matches the actual
  area surface (Luce-Org#17).

Addresses cubic Luce-Org#15Luce-Org#17 (P3) on PR Luce-Org#285.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 29, 2026
Merge the advanced Luce-Org#285 docker/harness follow-up head into the integration stack and record the refreshed PR classification, conflict probes, retained worktrees, and validation results.
easel added a commit to easel/lucebox-hub that referenced this pull request May 29, 2026
…nch in-tree

Squashes 8 commits from feat/lucebox-docker (PR Luce-Org#285) into a single
commit on top of origin/main (8782d07). Net: 189 files changed.

Workstreams folded in:

* Docker prebuild stack: ghcr.io/easel/lucebox-hub:cuda12 image,
  multi-stage Dockerfile with reproducible `uv sync --frozen`,
  docker-bake.hcl with VERSION sanitization for Docker tag charset,
  .github/workflows/docker.yml with SHA-pinned external actions and
  GHA cache, build identity baked into /opt/lucebox-hub/IMAGE_INFO +
  HOST_INFO.

* Host wrapper (lucebox.sh): probe_host, smart cmd_serve (INVOCATION_ID
  guard against systemd self-defeat, container-state preflight),
  cmd_systemctl_passthrough (already-active short-circuit, restart-loop
  detection), cmd_update (bootstrap-installer pattern), cmd_completion
  (bash/zsh/fish), config.toml reader (env > toml > default), all
  shellcheck-clean.

* Bootstrap installer (install.sh): bakes LUCEBOX_INSTALLED_FROM into
  the installed copy so `lucebox update` keeps tracking the channel;
  refuses SHA-pinned URLs without LUCEBOX_INSTALL_CHANNEL.

* In-container Python CLI (lucebox/): sparse config.toml persistence,
  config get/set/unset sub-app, models list/download sub-app
  (replaces download-models), autotune with --apply / --json / --sweep,
  profile collapsed onto luce-bench snapshot (1701 → ~150 lines).
  _load_or_build now respects env > toml > default precedence.

* luce-bench: snapshot subcommand + canonical HostInfo schema v2
  (multi-GPU lineup, WSL detection, source/collector trust metadata) +
  levels (level0/1/2/3) + report subcommand (host column + cross-host
  confounder warnings) + submit-baseline (level3-gated) + regrade.

* Server (C++): /props.host block + props_schema=4 + host_info loader,
  /props.build identity, GGUF metadata + sha256 sidecars, model card
  sidecars. Deleted server/scripts/bench_{agent,he,llm}.py — bench
  machinery moved into luce-bench.

* Harness: client implementations for claude/codex/opencode/hermes/pi
  pointed at the running lucebox server, matched against the validated
  run_*.sh shell wrappers.

Cubic AI code review (17 findings) addressed in full:
  P0: contents: read on luce-bench release job permissions.
  P1: Dockerfile `--frozen` reinstated; LUCEBENCH_THINK default empty
      so per-area defaults apply.
  P2: 6 external actions pinned to immutable SHAs; non-interactive
      timeout via subprocess.run; REGISTRY trailing-slash normalize;
      VERSION + Docker tag charset sanitize; harness pi/hermes/openclaw
      mirrored against run_*.sh wrappers; ruff scan paths corrected to
      server/scripts/; lefthook glob `**/*.sh`; LUCEBENCH_AREA default
      level1.
  P3: lucebox/README.md cli.py link fixed; NOTICE copyright year
      2025-2026; areas/__init__.py __all__ exposes all 10 areas.

CI on PR Luce-Org#285: all 4 checks green (uv workspace, cmake build, cuda12
prebuild, cubic reviewer).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@easel easel force-pushed the feat/lucebox-docker branch from ccef455 to 37b3fbd Compare May 29, 2026 15:06
easel added a commit to easel/lucebox-hub that referenced this pull request May 29, 2026
…box-docker)

Brings the Qwen3.6/Laguna think-mode reasoning fix (route reasoning into
reasoning_content channel instead of content) into the lucebox-docker stack.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 29, 2026
Merge the advanced Luce-Org#285 head, carrying Qwen3.6/Laguna started_in_thinking propagation through render, parsing, and SSE emission. Resolve the overlap with the existing Qwen3 Jinja closed-think fallback and refresh the auto-integration manifest.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 29, 2026
Document the post-push discovery of PR Luce-Org#285's 9a6db60 head, its clean merge into auto-integration, targeted luce-bench validation, and retained worktree paths.
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 issues found across 12 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="luce-bench/src/lucebench/cli.py">

<violation number="1" location="luce-bench/src/lucebench/cli.py:692">
P2: `--prompt-thinking-control=on` can be silently disabled by the new card-capability gate, so explicit user override no longer works on unresolved/non-capable cards.</violation>
</file>

<file name="luce-bench/src/lucebench/runner.py">

<violation number="1" location="luce-bench/src/lucebench/runner.py:682">
P2: Continuation request can unintentionally re-enable thinking because `extra_body` overwrites forced disable flags.</violation>
</file>

Tip: Review your code locally with the cubic CLI to iterate faster.

Re-trigger cubic

Comment thread luce-bench/src/lucebench/report.py
# tokens for a thinking-capable card; otherwise force the flag off so
# neither the card nor the family-map fallback injects.
effective_thinking_control = (
prompt_thinking_control if card_is_thinking_capable(model_card) else "off"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: --prompt-thinking-control=on can be silently disabled by the new card-capability gate, so explicit user override no longer works on unresolved/non-capable cards.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At luce-bench/src/lucebench/cli.py, line 692:

<comment>`--prompt-thinking-control=on` can be silently disabled by the new card-capability gate, so explicit user override no longer works on unresolved/non-capable cards.</comment>

<file context>
@@ -665,6 +685,13 @@ def _run_standard_area_to_dir(
+    # tokens for a thinking-capable card; otherwise force the flag off so
+    # neither the card nor the family-map fallback injects.
+    effective_thinking_control = (
+        prompt_thinking_control if card_is_thinking_capable(model_card) else "off"
+    )
+
</file context>

if top_k is not None and top_k > 0:
body["top_k"] = int(top_k)
if extra_body:
body.update(extra_body)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Continuation request can unintentionally re-enable thinking because extra_body overwrites forced disable flags.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At luce-bench/src/lucebench/runner.py, line 682:

<comment>Continuation request can unintentionally re-enable thinking because `extra_body` overwrites forced disable flags.</comment>

<file context>
@@ -384,7 +618,88 @@ def run_case(
+    if top_k is not None and top_k > 0:
+        body["top_k"] = int(top_k)
+    if extra_body:
+        body.update(extra_body)
+
+    headers = {"Content-Type": "application/json"}
</file context>
Suggested change
body.update(extra_body)
body.update(
{
k: v
for k, v in extra_body.items()
if k
not in {
"messages",
"max_tokens",
"stream",
"chat_template_kwargs",
"thinking",
"reasoning_effort",
}
}
)

easel added a commit to easel/lucebox-hub that referenced this pull request Jun 3, 2026
Replaces the single-shot recorded harness with a multi-turn replay,
adds the grading/ subpackage with the llm_judge, multi_turn_cases
fixture, agent_recorded test + extract-agentic-fixture test, and
touches normalize/regrade/cli to support new turn-level metrics.

Also carries the two qwen3.6 and two gemma4 coding-agent-loop sweep
writeups that consume this area.

Depends on: lucebench-harness (edits files introduced there).

Part of the PR Luce-Org#285 split (Luce-Org#285) into
tightly-scoped PRs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
easel added a commit to easel/lucebox-hub that referenced this pull request Jun 4, 2026
Containerization stack for lucebox-hub:

- Dockerfile and docker-bake.hcl define the lucebox-hub container image
  (build-env and runtime stages); scripts/build_image.sh drives local
  builds; server/scripts/entrypoint.sh emits IMAGE_INFO / HOST_INFO
  sidecars that /props schema-3/4 consumes.
- GitHub Actions: .github/workflows/docker.yml builds and publishes
  the image; ci.yml runs the standard checks; release-luce-bench.yml
  handles luce-bench release tagging.
- Workspace-root files (pyproject.toml, uv.lock, Makefile, lefthook.yml,
  .gitignore, README.md) live here because the Dockerfile uv-syncs the
  workspace at build time.
- Catch-all dev/CI scripts ship with this stack: card-bundle drift check,
  lucebox-wrapper sandbox check, pflash session bench, ds4 2-case sweep,
  agentic fixture extractor, lucebox.sh smoke test, and the dflash
  eval_quality_compare helper.

Part of the PR Luce-Org#285 split (Luce-Org#285) into tightly-scoped PRs.

Generated with [Claude Code](https://claude.com/claude-code)
easel added a commit to easel/lucebox-hub that referenced this pull request Jun 4, 2026
…shell wrapper

Lucebox hub CLI (autotune, sweep, profile, smoke, models, config, download,
host-check, docker_run) + lucebox.sh wrapper + install.sh + the harness/ adapter
package (claude_code/codex/hermes/openclaw/opencode/pi adapters used by
autotune sweeps). Carries autotune profile-sweep protocol + bragi tuning
summary docs.

Depends on:
- docker-stack PR (CLI launches the lucebox-hub image and reads IMAGE_INFO)
- lucebench-harness PR (lucebox bench delegates to lucebench)
- server-layer-split PR (autotune presumes layer-split build flags)

Part of the PR Luce-Org#285 split (Luce-Org#285) into tightly-scoped PRs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
easel added a commit to easel/lucebox-hub that referenced this pull request Jun 4, 2026
…gime router

User-facing pflash feature that lands four tightly-related pieces:

- ee7 early-exit drafter with score-range gate and tail-capture guard
  (server/src/qwen3/qwen3_drafter.{h,cpp}, server/src/common/score_range.h)
- Anchor-transitive cascade default-on, fixing the 64K NIAH cliff
  (server/src/qwen3/anchor_scan.{h,cpp})
- Regime router scaffolding for swapping pflash composition by regime
  (server/src/common/regime_router.h)
- Adaptive composition via per-request fa_window
  (server/src/server/adaptive_keep_ratio.h)

Ships five accompanying C++ unit tests covering the early-exit score-range
gate, tail-capture guard, warm-path regression, anchor-transitive cascade,
and regime router. Includes the pflash composition / anchor / drafter
design docs and the qwen3.6 pflash A/B and prefix-cache regression
writeups from bragi 2026-05-31.

Part of the PR Luce-Org#285 split (Luce-Org#285) into tightly-scoped PRs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

🤖 Generated with [Claude Code](https://claude.com/claude-code)
easel added a commit to easel/lucebox-hub that referenced this pull request Jun 4, 2026
Backend feature implementing soft-close thinking termination via a
logit-ratio peek, split probe/inject ids, a min_tokens floor, and
max_tokens treated as a response-only budget while thinking is active.

- qwen35 originates the mechanism in qwen35_backend.{cpp,h}.
- gemma4 ports the same mechanism in gemma4_backend.cpp.
- gemma4_internal.h carries the probe/inject hooks and is owned here.
- server_main CLI flags ship with the separate thinking-control API PR.

Backend-only, plus the soft-close design plan doc under
docs/experiments/soft-close-thinking-termination-plan.md.

Part of the PR Luce-Org#285 split (Luce-Org#285) into
tightly-scoped PRs.

Generated with Claude Code
easel added a commit to easel/lucebox-hub that referenced this pull request Jun 4, 2026
Parses Gemma's plain-text call:<verb>{} emissions (also accepts the
``_call:`` tokenizer artifact prefix) and renders them as Anthropic
tool_use + tool_result blocks. Isolated to tool_parser.{cpp,h}; the
streaming detection in sse_emitter is owned by the thinking-control PR.

Includes the call-verb tool parser plan and the Gemma4-26B call-verb
parser fix writeup from bragi (2026-05-31).

Part of the PR Luce-Org#285 split (Luce-Org#285) into tightly-scoped PRs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
easel added a commit to easel/lucebox-hub that referenced this pull request Jun 4, 2026
…l + /props schema-4

- chat_template prefills closed <think> when thinking off (Qwen3-gated) so the
  model skips the reasoning preamble without losing the assistant turn.
- http_server bumps /props schema from 2 to 4, adding build/model.target/
  model.draft/host blocks so clients can introspect the runtime.
- server_main adds --debug-thinking-logits and --think-soft-close-* flags
  plus image/host-info loaders for richer card-driven boot.
- sse_emitter routes Qwen3.6/Laguna think-mode output to the reasoning_content
  channel so reasoning never leaks into the user-visible content stream.
- Ships the model-card _schema.json, qwen3.6-27b and laguna-xs.2 cards, the
  /props OpenAPI doc, thinking-budget spec, and experiments capturing the
  thinking-control protocol/mechanism work.
- test_server_unit gets the matching coverage spanning chat_template prefill,
  /props schema 4, and the reasoning_content routing.

Part of the PR Luce-Org#285 split (Luce-Org#285) into tightly-scoped PRs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
🤖 Generated with [Claude Code](https://claude.com/claude-code)
easel added a commit to easel/lucebox-hub that referenced this pull request Jun 4, 2026
…shell wrapper

Lucebox hub CLI (autotune, sweep, profile, smoke, models, config, download,
host-check, docker_run) + lucebox.sh wrapper + install.sh + the harness/ adapter
package (claude_code/codex/hermes/openclaw/opencode/pi adapters used by
autotune sweeps). Carries autotune profile-sweep protocol + bragi tuning
summary docs.

Depends on:
- docker-stack PR (CLI launches the lucebox-hub image and reads IMAGE_INFO)
- lucebench-harness PR (lucebox bench delegates to lucebench)
- server-layer-split PR (autotune presumes layer-split build flags)

Part of the PR Luce-Org#285 split (Luce-Org#285) into tightly-scoped PRs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
easel added a commit to easel/lucebox-hub that referenced this pull request Jun 4, 2026
Backend feature implementing soft-close thinking termination via a
logit-ratio peek, split probe/inject ids, a min_tokens floor, and
max_tokens treated as a response-only budget while thinking is active.

- qwen35 originates the mechanism in qwen35_backend.{cpp,h}.
- gemma4 ports the same mechanism in gemma4_backend.cpp.
- gemma4_internal.h carries the probe/inject hooks and is owned here.
- server_main CLI flags ship with the separate thinking-control API PR.

Backend-only, plus the soft-close design plan doc under
docs/experiments/soft-close-thinking-termination-plan.md.

Part of the PR Luce-Org#285 split (Luce-Org#285) into
tightly-scoped PRs.

Generated with Claude Code
easel added a commit to easel/lucebox-hub that referenced this pull request Jun 4, 2026
Parses Gemma's plain-text call:<verb>{} emissions (also accepts the
``_call:`` tokenizer artifact prefix) and renders them as Anthropic
tool_use + tool_result blocks. Isolated to tool_parser.{cpp,h}; the
streaming detection in sse_emitter is owned by the thinking-control PR.

Includes the call-verb tool parser plan and the Gemma4-26B call-verb
parser fix writeup from bragi (2026-05-31).

Part of the PR Luce-Org#285 split (Luce-Org#285) into tightly-scoped PRs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
easel added a commit to easel/lucebox-hub that referenced this pull request Jun 4, 2026
…gime router

User-facing pflash feature that lands four tightly-related pieces:

- ee7 early-exit drafter with score-range gate and tail-capture guard
  (server/src/qwen3/qwen3_drafter.{h,cpp}, server/src/common/score_range.h)
- Anchor-transitive cascade default-on, fixing the 64K NIAH cliff
  (server/src/qwen3/anchor_scan.{h,cpp})
- Regime router scaffolding for swapping pflash composition by regime
  (server/src/common/regime_router.h)
- Adaptive composition via per-request fa_window
  (server/src/server/adaptive_keep_ratio.h)

Ships five accompanying C++ unit tests covering the early-exit score-range
gate, tail-capture guard, warm-path regression, anchor-transitive cascade,
and regime router. Includes the pflash composition / anchor / drafter
design docs and the qwen3.6 pflash A/B and prefix-cache regression
writeups from bragi 2026-05-31.

Part of the PR Luce-Org#285 split (Luce-Org#285) into tightly-scoped PRs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

🤖 Generated with [Claude Code](https://claude.com/claude-code)
easel added a commit to easel/lucebox-hub that referenced this pull request Jun 4, 2026
…ate plumbing

Cross-backend C++ refactor that extracts a shared layer_split_backend, a
GGUF inspection helper, the c2_spec_decode_permitted predicate, dim-aware
draft loading, and the PFLASH_COMPRESS_* rename. Touches all backends
(qwen3, qwen35, gemma4), CMake, and bench scripts. c2_gate.h is a backend
predicate (not a user switch) so it lives here.

Independent base for the four feature PRs that follow.

Part of the PR Luce-Org#285 split (Luce-Org#285) into tightly-scoped PRs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

🤖 Generated with [Claude Code](https://claude.com/claude-code)
easel added a commit to easel/lucebox-hub that referenced this pull request Jun 4, 2026
Backend feature implementing soft-close thinking termination via a
logit-ratio peek, split probe/inject ids, a min_tokens floor, and
max_tokens treated as a response-only budget while thinking is active.

- qwen35 originates the mechanism in qwen35_backend.{cpp,h}.
- gemma4 ports the same mechanism in gemma4_backend.cpp.
- gemma4_internal.h carries the probe/inject hooks and is owned here.
- server_main CLI flags ship with the separate thinking-control API PR.

Backend-only, plus the soft-close design plan doc under
docs/experiments/soft-close-thinking-termination-plan.md.

Part of the PR Luce-Org#285 split (Luce-Org#285) into
tightly-scoped PRs.

Generated with Claude Code
easel added a commit to easel/lucebox-hub that referenced this pull request Jun 4, 2026
…l + /props schema-4

- chat_template prefills closed <think> when thinking off (Qwen3-gated) so the
  model skips the reasoning preamble without losing the assistant turn.
- http_server bumps /props schema from 2 to 4, adding build/model.target/
  model.draft/host blocks so clients can introspect the runtime.
- server_main adds --debug-thinking-logits and --think-soft-close-* flags
  plus image/host-info loaders for richer card-driven boot.
- sse_emitter routes Qwen3.6/Laguna think-mode output to the reasoning_content
  channel so reasoning never leaks into the user-visible content stream.
- Ships the model-card _schema.json, qwen3.6-27b and laguna-xs.2 cards, the
  /props OpenAPI doc, thinking-budget spec, and experiments capturing the
  thinking-control protocol/mechanism work.
- test_server_unit gets the matching coverage spanning chat_template prefill,
  /props schema 4, and the reasoning_content routing.

Part of the PR Luce-Org#285 split (Luce-Org#285) into tightly-scoped PRs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
🤖 Generated with [Claude Code](https://claude.com/claude-code)
easel added a commit to easel/lucebox-hub that referenced this pull request Jun 4, 2026
…gime router

User-facing pflash feature that lands four tightly-related pieces:

- ee7 early-exit drafter with score-range gate and tail-capture guard
  (server/src/qwen3/qwen3_drafter.{h,cpp}, server/src/common/score_range.h)
- Anchor-transitive cascade default-on, fixing the 64K NIAH cliff
  (server/src/qwen3/anchor_scan.{h,cpp})
- Regime router scaffolding for swapping pflash composition by regime
  (server/src/common/regime_router.h)
- Adaptive composition via per-request fa_window
  (server/src/server/adaptive_keep_ratio.h)

Ships five accompanying C++ unit tests covering the early-exit score-range
gate, tail-capture guard, warm-path regression, anchor-transitive cascade,
and regime router. Includes the pflash composition / anchor / drafter
design docs and the qwen3.6 pflash A/B and prefix-cache regression
writeups from bragi 2026-05-31.

Part of the PR Luce-Org#285 split (Luce-Org#285) into tightly-scoped PRs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

🤖 Generated with [Claude Code](https://claude.com/claude-code)
easel added a commit to easel/lucebox-hub that referenced this pull request Jun 4, 2026
Backend feature implementing soft-close thinking termination via a
logit-ratio peek, split probe/inject ids, a min_tokens floor, and
max_tokens treated as a response-only budget while thinking is active.

- qwen35 originates the mechanism in qwen35_backend.{cpp,h}.
- gemma4 ports the same mechanism in gemma4_backend.cpp.
- gemma4_internal.h carries the probe/inject hooks and is owned here.
- server_main CLI flags ship with the separate thinking-control API PR.

Backend-only, plus the soft-close design plan doc under
docs/experiments/soft-close-thinking-termination-plan.md.

Part of the PR Luce-Org#285 split (Luce-Org#285) into
tightly-scoped PRs.

Generated with Claude Code
easel added a commit to easel/lucebox-hub that referenced this pull request Jun 4, 2026
…ate plumbing

Cross-backend C++ refactor that extracts a shared layer_split_backend, a
GGUF inspection helper, the c2_spec_decode_permitted predicate, dim-aware
draft loading, and the PFLASH_COMPRESS_* rename. Touches all backends
(qwen3, qwen35, gemma4), CMake, and bench scripts. c2_gate.h is a backend
predicate (not a user switch) so it lives here.

Independent base for the four feature PRs that follow.

Part of the PR Luce-Org#285 split (Luce-Org#285) into tightly-scoped PRs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

🤖 Generated with [Claude Code](https://claude.com/claude-code)
easel added a commit to easel/lucebox-hub that referenced this pull request Jun 4, 2026
Containerization stack for lucebox-hub:

- Dockerfile and docker-bake.hcl define the lucebox-hub container image
  (build-env and runtime stages); scripts/build_image.sh drives local
  builds; server/scripts/entrypoint.sh emits IMAGE_INFO / HOST_INFO
  sidecars that /props schema-3/4 consumes.
- GitHub Actions: .github/workflows/docker.yml builds and publishes
  the image; ci.yml runs the standard checks; release-luce-bench.yml
  handles luce-bench release tagging.
- Workspace-root files (pyproject.toml, uv.lock, Makefile, lefthook.yml,
  .gitignore, README.md) live here because the Dockerfile uv-syncs the
  workspace at build time.
- Catch-all dev/CI scripts ship with this stack: card-bundle drift check,
  lucebox-wrapper sandbox check, pflash session bench, ds4 2-case sweep,
  agentic fixture extractor, lucebox.sh smoke test, and the dflash
  eval_quality_compare helper.

Part of the PR Luce-Org#285 split (Luce-Org#285) into tightly-scoped PRs.

Generated with [Claude Code](https://claude.com/claude-code)
easel added a commit to easel/lucebox-hub that referenced this pull request Jun 4, 2026
Parses Gemma's plain-text call:<verb>{} emissions (also accepts the
``_call:`` tokenizer artifact prefix) and renders them as Anthropic
tool_use + tool_result blocks. Isolated to tool_parser.{cpp,h}; the
streaming detection in sse_emitter is owned by the thinking-control PR.

Includes the call-verb tool parser plan and the Gemma4-26B call-verb
parser fix writeup from bragi (2026-05-31).

Part of the PR Luce-Org#285 split (Luce-Org#285) into tightly-scoped PRs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
easel added a commit to easel/lucebox-hub that referenced this pull request Jun 4, 2026
…l + /props schema-4

- chat_template prefills closed <think> when thinking off (Qwen3-gated) so the
  model skips the reasoning preamble without losing the assistant turn.
- http_server bumps /props schema from 2 to 4, adding build/model.target/
  model.draft/host blocks so clients can introspect the runtime.
- server_main adds --debug-thinking-logits and --think-soft-close-* flags
  plus image/host-info loaders for richer card-driven boot.
- sse_emitter routes Qwen3.6/Laguna think-mode output to the reasoning_content
  channel so reasoning never leaks into the user-visible content stream.
- Ships the model-card _schema.json, qwen3.6-27b and laguna-xs.2 cards, the
  /props OpenAPI doc, thinking-budget spec, and experiments capturing the
  thinking-control protocol/mechanism work.
- test_server_unit gets the matching coverage spanning chat_template prefill,
  /props schema 4, and the reasoning_content routing.

Part of the PR Luce-Org#285 split (Luce-Org#285) into tightly-scoped PRs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
🤖 Generated with [Claude Code](https://claude.com/claude-code)
easel added a commit to easel/lucebox-hub that referenced this pull request Jun 4, 2026
Containerization stack for lucebox-hub:

- Dockerfile and docker-bake.hcl define the lucebox-hub container image
  (build-env and runtime stages); scripts/build_image.sh drives local
  builds; server/scripts/entrypoint.sh emits IMAGE_INFO / HOST_INFO
  sidecars that /props schema-3/4 consumes.
- GitHub Actions: .github/workflows/docker.yml builds and publishes
  the image; ci.yml runs the standard checks; release-luce-bench.yml
  handles luce-bench release tagging.
- Workspace-root files (pyproject.toml, uv.lock, Makefile, lefthook.yml,
  .gitignore, README.md) live here because the Dockerfile uv-syncs the
  workspace at build time.
- Catch-all dev/CI scripts ship with this stack: card-bundle drift check,
  lucebox-wrapper sandbox check, pflash session bench, ds4 2-case sweep,
  agentic fixture extractor, lucebox.sh smoke test, and the dflash
  eval_quality_compare helper.

Part of the PR Luce-Org#285 split (Luce-Org#285) into tightly-scoped PRs.

Generated with [Claude Code](https://claude.com/claude-code)
@easel
Copy link
Copy Markdown
Collaborator Author

easel commented Jun 4, 2026

Superseded by the 8-PR split (#334, #335, #336, #337, #338, #339, #340, #341). All core feature content is captured across those 8 PRs; the umbrella diff additionally contained docs/experiments and dev-script baggage that was intentionally dropped during the polish pass. Closing in favor of reviewing the splits.

🤖 Generated with Claude Code

@easel easel closed this Jun 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant