Skip to content

feat: add agent-friendly Modly CLI#144

Open
jokrasno wants to merge 6 commits into
lightningpixel:devfrom
jokrasno:cli-tool-dev-pr
Open

feat: add agent-friendly Modly CLI#144
jokrasno wants to merge 6 commits into
lightningpixel:devfrom
jokrasno:cli-tool-dev-pr

Conversation

@jokrasno
Copy link
Copy Markdown

Adds a simple CLI tool so AI agents can have an easier time interacting with modly. Includes a SKILL.md file as well.

@DrHepa
Copy link
Copy Markdown
Contributor

DrHepa commented May 22, 2026

I think the idea is useful: Modly does need an agent-friendly, JSON-first way to automate common operations without driving the UI manually.

After reviewing this against the current dev branch and the existing external MCP/CLI direction, my main concern is not the usefulness of the tool, but the contract it creates.

Right now this PR adds a new Python CLI surface that talks mostly to existing REST endpoints such as:

  • /health
  • /model/*
  • /generate/from-image
  • /generate/status/{job_id}
  • /generate/cancel/{job_id}
  • /export/{format}

That is practical, but it also risks becoming a third parallel automation path next to:

  1. the in-app Agent Chat (/agent/chat + Ollama tool-calling),
  2. the existing MCP surfaces / modly-cli-mcp integration shown in Settings,
  3. the newer workflow-runs / process/capability direction.

I would suggest keeping the useful CLI ergonomics, but aligning the contract before treating this as an official automation interface.

Concretely, I think the CLI should be shaped around canonical primitives:

health
model
workflow-run
process-run
capability

And keep the current /generate/* flow as legacy compatibility rather than the primary path.

1. Prefer workflow runs for generation

Instead of making generate primarily wrap:

POST /generate/from-image
GET /generate/status/{job_id}

prefer:

POST /workflow-runs/from-image
GET /workflow-runs/{run_id}
POST /workflow-runs/{run_id}/cancel

The JSON output should include recovery metadata so agents can resume/poll safely:

{
  "ok": true,
  "run": {
    "kind": "workflowRun",
    "id": "..."
  },
  "meta": {
    "status_command": "workflow-run status ...",
    "legacy": false
  }
}

generate can still exist as a friendly alias, but it should be clearly documented as either canonical-through-workflow-run or legacy if it falls back to /generate/*.

2. Use process/capability discovery instead of heuristics

The current approach appears to infer some behavior from model names / labels, especially around texture/refine-style flows.

For agents, that is fragile. The CLI should avoid guessing hidden capabilities from strings and instead use explicit discovery when available, e.g.:

GET /automation/capabilities
POST /process-runs
GET /process-runs/{run_id}

If a process is not available through a canonical contract, it should fail closed:

{
  "ok": false,
  "code": "UNSUPPORTED_PROCESS",
  "message": "This process is not available through the canonical process-run contract."
}

3. Validate model IDs canonically

Any model_id should be validated against:

GET /model/all

If --model auto remains, it should use verified defaults/capability metadata, not string matching or first available model behavior.

4. Clarify readiness boundaries

GET /health should happen before business operations.

Also, starting FastAPI through serve / ensure-server should be clearly marked as developer-only. A FastAPI-only process does not imply Electron/Desktop bridge readiness, scene operations, extension process execution, or full workflow support.

I would suggest moving these commands under something like:

dev serve-api
dev ensure-server

or documenting them explicitly as non-canonical dev helpers.

5. Keep ComfyUI out of the canonical Modly agent contract

The ComfyUI integration may be useful, but it feels like an external orchestration helper rather than part of the core Modly automation contract.

I would suggest moving it under:

experimental comfy-image

or splitting it from the main supported CLI surface.

6. Align with existing MCP direction

Settings > Agent already points users toward modly-cli-mcp / modly-mcp for external agents. To avoid confusing users and future maintainers, this CLI should either:

  • explicitly stay as a repo-local/dev helper, or
  • align its naming and JSON semantics with the existing MCP concepts: workflowRun, processRun, capability, recovery metadata, and fail-closed unsupported operations.

The important part is avoiding a second canonical-looking automation contract with different semantics.

7. PR hygiene

Since the feature is Python-only, the package-lock.json changes look unrelated unless there is a specific reason for them. I would recommend removing that noise from this PR.

Overall: I like the goal and I think the CLI can be valuable. I would just prefer to land it with a clearer contract boundary:

  • canonical: workflow-run, process-run, capability
  • legacy: /generate/* and old job polling
  • dev-only: server startup helpers
  • experimental: ComfyUI orchestration

That would preserve the useful agent ergonomics while keeping Modly automation aligned with Agent Chat, MCP, and future workflow/process execution.

@jokrasno
Copy link
Copy Markdown
Author

Sounds good! I’m going to keep the CLI ergonomics but narrow the contract so it aligns with Modly’s canonical automation direction. I’ll make workflow runs the primary generation path, include recovery metadata in JSON output, validate explicit models through /model/all, remove texture/refine string heuristics, make process/capability commands fail closed when the canonical server contract is unavailable, move headless API startup under dev, move ComfyUI helpers under experimental, and keep /generate/* only as explicit legacy compatibility. I’ll also remove the unrelated package-lock.json changes and ensure generated E2E artifacts stay ignored and out of history.

@jokrasno
Copy link
Copy Markdown
Author

jokrasno commented Jun 2, 2026

Fixes applied @DrHepa! Let me know what you think :)

@lightningpixel
Copy link
Copy Markdown
Owner

lightningpixel commented Jun 2, 2026

Hey ! Good work, but I still have a few points to ask you about before merging.

Required before approval

  1. Strict workspace-path validation (security).
    Since this tool is designed to be driven by autonomous agents, any path the CLI accepts or derives must be rejected if
    it's absolute, contains .. traversal, or is otherwise not workspace-relative. Right now nothing is validated:
  • cmd_export forwards --path as-is into /export/{fmt}?path=... — --path ../../etc/passwd or an absolute path goes
    straight to the server.
  • _workspace_relative_path derives the path from the server-returned output_url (incl. unquote), so a malicious/buggy
    server response would be re-requested verbatim.

A single choke point in _export_workspace_path (called by cmd_export, _generate_one, and cmd_legacy_generate) covers
all cases; adding the same guard in _workspace_relative_path gives defense-in-depth. Please reject absolute paths,
leading / or , Windows drive letters (C:), and any .. component, with a clear INVALID_WORKSPACE_PATH JSON error —
and add a couple of tests for those cases.

  1. Guarantee the "always parsable JSON" contract on unexpected errors.
    main() only catches ModlyCliError and KeyboardInterrupt. A few paths raise un-wrapped exceptions that produce a
    traceback instead of JSON, which breaks the promise made to agents:
  • _multipart_form → file_path.read_bytes(): OSError/PermissionError on the input image.
  • _parse_params → Path(params_file).read_text(): OSError on an unreadable params file.
  • Module load: int(os.environ.get("MODLY_CLI_TIMEOUT", "1800")) raises ValueError at import if the env var is
    malformed.

Suggest a final except Exception as exc: in main() emitting {"ok": false, "code": "UNEXPECTED_ERROR", "message":
str(exc)} and returning 1.

Small fix

  1. process-run recovery metadata is inconsistent. _recovery_meta injects --base-url into the status/cancel
    commands when the URL differs from the default, but cmd_process_run_start builds those commands hardcoded without
    --base-url, so an agent on a custom base URL gets non-replayable recovery commands. Please route it through
    _recovery_meta.

Please confirm

  1. The README rename launcher.bat → launch.bat is bundled into this feature PR — is it intentional and does it match
    the actual filename? If unrelated, it'd be cleaner as its own commit.
  • Hidden aliases (comfy-image, generate-from-workflow, serve, ensure-server) re-declare their arguments manually — a
    shared registration helper would prevent the two copies from drifting.
  • sub._choices_actions = [...] relies on an argparse private attribute; a short # relies on argparse internals comment
    would help future readers.
  • cmd_batch restores args.format without try/finally (dirty namespace if a non-ModlyCliError is raised).

@jokrasno
Copy link
Copy Markdown
Author

jokrasno commented Jun 2, 2026

Addressed these in 9c92bca:

  • Added strict workspace-relative path validation with INVALID_WORKSPACE_PATH for export paths and server-derived output paths
  • Added JSON fallback for unexpected exceptions via UNEXPECTED_ERROR.
  • Hardened malformed timeout/poll env var handling at import
  • Fixed process-run recovery metadata to preserve custom --base-url.
  • Added tests for the security/error-handling cases
  • Confirmed launch.bat is the actual filename in the repo
  • Also added the argparse internals comment and cmd_batch try/finally cleanup

Tests pass: python tools/modly-cli/test_agent.py now runs 30 tests successfully

Let me know if there's anything else :)

@lightningpixel
Copy link
Copy Markdown
Owner

Almost there !

One last consistency gap before I approve. In _workflow_workspace_path(), the output_url branch is validated but the
scene_candidate.workspace_path branch is returned raw:

 def _workflow_workspace_path(status: dict[str, Any]) -> str:
      scene_candidate = status.get("scene_candidate")
      if isinstance(scene_candidate, dict) and scene_candidate.get("workspace_path"):
          return str(scene_candidate["workspace_path"])        # not validated
      output_url = str(status.get("output_url") or "")
      if output_url:
          return _workspace_relative_path(output_url)          # validated

Why it matters: _export_workspace_path re-validates downstream, so a real export still rejects a malicious value. But
when export is skipped (--no-export, or workflow-run start --wait), this raw path is surfaced in the output JSON
workspace_path field as if trusted, and could be reused by an agent in a later call.

Suggested fix — validate at the source so both branches behave consistently:

return _validate_workspace_path(str(scene_candidate["workspace_path"]))

A small test covering a scene_candidate.workspace_path containing .. (expecting INVALID_WORKSPACE_PATH) would round it
out

@jokrasno
Copy link
Copy Markdown
Author

jokrasno commented Jun 2, 2026

Alright, covered it!

I appreciate your patience with me!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants