fix(ci,scripts): fix publish-models.yml export/staging + download_dependencies.sh sidecar gap #19
Merged
…writing publish-models.yml runs `python scripts/export_layout.py models/layout_heron.onnx` against a fresh checkout, where models/ doesn't exist (it's gitignored). torch.onnx.export doesn't create missing parent directories, so the export failed with FileNotFoundError. export_tableformer.py already did this (os.makedirs(OUT, exist_ok=True)); export_layout.py was missing it. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01DofkqhMuAJbL9arnnuVisL
docling_ibm_models's tf_predictor imports cv2, but publish-models.yml only installed docling_ibm_models/onnxscript/onnxruntime/huggingface_hub, so the export step failed with ModuleNotFoundError: No module named 'cv2'. scripts/pdf_setup.sh never hit this because it installs the full `docling` package instead, which pulls opencv in transitively. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01DofkqhMuAJbL9arnnuVisL
… -e safety GitHub Actions runs each `run:` step under `bash -e`. A `[ -f "$1" ] && cp ...` line does not get the protection an if-condition gets under -e: when the test is false, the list's exit status is 1, and if that line is the last command in stage() (as the failure suggests), the function returns 1 and the failing call aborts the whole step immediately. That is exactly what happened on the very first stage() call, since pdfium/encoder/bbox have no .data sidecar to copy. Rewriting as explicit if/fi blocks (an if whose condition is false and which has no else returns 0) fixes it. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01DofkqhMuAJbL9arnnuVisL
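The `bash -e` behaviour described in this commit can be reproduced in isolation. One detail worth spelling out: bash does not exit at a failed test inside a `&&` list itself; the abort appears when the chain is the last command of a function, so the function returns 1 and the call site fails. The sketch below assumes that is how `stage()` was laid out. `stage_chain` and `stage_if` are hypothetical stand-ins, not the workflow's real function body.

```shell
#!/usr/bin/env bash
# Illustrative reproduction only: stage_chain and stage_if are hypothetical
# stand-ins, not the workflow's real stage() body.

workdir="$(mktemp -d)"
touch "$workdir/bbox.onnx"   # the main file exists, its .data sidecar does not

# Variant 1: &&-chain as the function's last command.
chain_script='
stage_chain() {
  [ -f "$1.data" ] && cp "$1.data" "$2"
}
stage_chain "$1" "$2"
echo "reached end"
'

# Variant 2: explicit if/fi.
if_script='
stage_if() {
  if [ -f "$1.data" ]; then
    cp "$1.data" "$2"
  fi
}
stage_if "$1" "$2"
echo "reached end"
'

# Run each variant in a child `bash -e`, so an abort there cannot kill this script.
chain_status=0
chain_out="$(bash -e -c "$chain_script" _ "$workdir/bbox.onnx" "$workdir")" || chain_status=$?

if_status=0
if_out="$(bash -e -c "$if_script" _ "$workdir/bbox.onnx" "$workdir")" || if_status=$?

echo "&&-chain: status=$chain_status output='$chain_out'"   # status=1 output=''
echo "if/fi:    status=$if_status output='$if_out'"          # status=0 output='reached end'
rm -rf "$workdir"
```

The first variant exits with status 1 before printing anything, while the second runs to completion, which matches the staging step dying on its first call and surviving once rewritten.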
…t just decoder This export produced a bbox.onnx.data sidecar too (39MB), not just decoder.onnx.data as earlier local testing assumed — ONNX Runtime failed to load bbox.onnx with "cannot get file size: ... bbox.onnx.data" because download_dependencies.sh never fetched it. publish-models.yml already stages whichever files' .data sidecar exists (comment: "checked for every file since that's export-size dependent, not fixed") — the download script just wasn't matching that. Now attempts all three optimistically, same as the workflow. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01DofkqhMuAJbL9arnnuVisL
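A minimal sketch of the "attempt every sidecar optimistically" idea follows. It is not the actual `download_dependencies.sh`: `fetch_asset`, the `encoder.onnx`/`decoder.onnx` file names and the local directory standing in for the release are all assumptions made so the example runs without network access. The real script downloads over HTTP.

```shell
#!/usr/bin/env bash
# Sketch only: fetch_asset, the asset names and the local "release" directory
# are assumptions for illustration; the real script downloads over HTTP.

release_dir="$(mktemp -d)"   # stands in for the published release
models_dir="$(mktemp -d)"    # stands in for the local models/ directory

# Simulate an export where only decoder and bbox spilled weights into a sidecar.
for name in encoder decoder bbox; do
  echo "graph" > "$release_dir/$name.onnx"
done
echo "weights" > "$release_dir/decoder.onnx.data"
echo "weights" > "$release_dir/bbox.onnx.data"

# Copy one asset; returns non-zero when the release does not contain it.
fetch_asset() {
  if [ -f "$release_dir/$1" ]; then
    cp "$release_dir/$1" "$models_dir/$1"
  else
    return 1
  fi
}

fetched_sidecars=""
for name in encoder decoder bbox; do
  # The .onnx file is required: a miss is a hard error.
  if ! fetch_asset "$name.onnx"; then
    echo "missing required asset: $name.onnx" >&2
    exit 1
  fi
  # The sidecar is optional: try it for every asset and tolerate a miss.
  if fetch_asset "$name.onnx.data"; then
    fetched_sidecars="$fetched_sidecars $name"
  fi
done

echo "sidecars fetched:$fetched_sidecars"   # sidecars fetched: decoder bbox
```

Because which files get a `.data` sidecar depends on export size, the loop treats all three assets the same way rather than hard-coding the decoder, and both optional checks sit inside `if` so a miss cannot trip `-e`.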
…ning actions/checkout v5->v7, actions/setup-python v5->v6, actions/setup-node v5->v6, actions/upload-artifact v5->v7, actions/download-artifact v5->v8. Checked each major's release notes for breaking changes against our actual usage: none apply (no pull_request_target/workflow_run checkout, no packageManager field so setup-node's cache-scoping change is a no-op, we always use the default zip upload/download path so download-artifact v8's direct-download/hash-mismatch changes don't affect us). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01DofkqhMuAJbL9arnnuVisL
…pendencies.sh Consolidates what was split across the Node.js bindings blurb and a CLI inline comment into one "Getting the ML models" section: what it fetches, where each asset lands, --force/$FLEISCHWOLF_MODELS_URL, and both the cargo-run and npm flows end to end. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01DofkqhMuAJbL9arnnuVisL
Summary
- `export_layout.py`: create the output directory before writing. `torch.onnx.export` doesn't create missing parent dirs, and `models/` is gitignored so a fresh Actions checkout doesn't have it.
- `publish-models.yml`: install `opencv-python-headless` for the TableFormer export step (`docling_ibm_models`'s `tf_predictor` imports `cv2`, which wasn't otherwise installed).
- `publish-models.yml`: rewrite `stage()`'s `[ -f ... ] && cp ...` as explicit `if`/`fi`. Under `bash -e` (GitHub Actions' default), a bare `&&`-chain whose test is false gives `stage()` a non-zero return status, which aborts the whole step immediately; a false `if` condition does not. This was killing the staging step on the very first call, before any release asset was copied.
- `scripts/download_dependencies.sh`: fetch the optional `.data` sidecar for all three TableFormer assets (encoder/decoder/bbox), not just the decoder. This export produced a `bbox.onnx.data` too, and ONNX Runtime failed to load `bbox.onnx` without it.
- `.github/workflows/*.yml`: bump pinned actions (`checkout` v5→v7, `setup-python`/`setup-node` v5→v6, `upload-artifact` v5→v7, `download-artifact` v5→v8) to silence the Node 20 deprecation warning. Each major's release notes were checked against our actual usage; none of the breaking changes apply.
- `README.md`: add a dedicated "Getting the ML models" section documenting `scripts/download_dependencies.sh` end to end (what it fetches, where, `--force`/`$FLEISCHWOLF_MODELS_URL`, both the `cargo run` and `npm` flows).

All fixes were verified against real `publish-models.yml` runs: the `models-v1` release now publishes successfully with all 9 assets, and a live `download_dependencies.sh` + `npm i fleischwolf` + `convertFile()` end-to-end test (once npm is republished) works, including TableFormer.

Test plan
- `python3 -m py_compile scripts/export_layout.py`
- `publish-models.yml` run against this branch: `models-v1` release published with all 9 assets
- `sh -n` / `bash -n` on `download_dependencies.sh`
- `download_dependencies.sh` + `npm i fleischwolf` (once republished) + `convertFile()` on a real PDF, including TableFormer

Generated by Claude Code