CowCow is a Rust-based priority data plane for physical AI. It helps edge devices capture, verify, rank, and synchronize high-volume data when connectivity is unreliable.
In plain terms: CowCow decides what edge AI data deserves bandwidth first.
Robots, vehicles, drones, cameras, phones, and field devices can collect more data than teams can upload right away. CowCow keeps the raw data local, breaks it into verifiable chunks, explains why some samples should move first, and resumes safely when Wi-Fi drops or a process crashes.
Physical AI teams do not just need more data. They need to know which data is urgent, trustworthy, private, corrupted, repetitive, or worth reviewing.
CowCow is built for the boring but important part of that workflow:
- ingest files without moving the originals
- compute fast BLAKE3 hashes
- split files into content-addressed chunks
- keep an append-only event journal
- run cheap local quality checks
- assign explainable sync priority
- hold data blocked by policy or privacy
- sync important chunks first
- resume after interrupted transfers
- prove local and remote hashes match
- generate dataset and integrity reports
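The chunking and hashing steps can be pictured with a minimal sketch: split a byte buffer into fixed-size chunks and key each chunk by a hash of its contents, so identical bytes always get the same address. CowCow uses BLAKE3. To stay dependency-free, this sketch substitutes the standard library's `DefaultHasher`, which is not cryptographic. `Chunk` and `chunk_bytes` are hypothetical names, not CowCow's API.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// One fixed-size slice of a file, addressed by a hash of its bytes.
/// Hypothetical record; CowCow's real chunk manifest entry may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct Chunk {
    pub index: usize,  // position of the chunk within the file
    pub offset: usize, // byte offset where the chunk starts
    pub len: usize,    // chunk length; only the last chunk may be shorter
    pub id: String,    // content address (hex digest of the bytes)
}

/// Stand-in content hash. CowCow uses BLAKE3; `DefaultHasher` is NOT
/// cryptographic and its output is not guaranteed stable across Rust releases.
fn content_id(bytes: &[u8]) -> String {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Split `data` into fixed-size chunks of `chunk_size` bytes (e.g. 64 KiB).
pub fn chunk_bytes(data: &[u8], chunk_size: usize) -> Vec<Chunk> {
    // `slice::chunks` panics on a zero size, so reject it explicitly.
    assert!(chunk_size > 0, "chunk size must be positive");
    data.chunks(chunk_size)
        .enumerate()
        .map(|(index, bytes)| Chunk {
            index,
            offset: index * chunk_size,
            len: bytes.len(),
            id: content_id(bytes),
        })
        .collect()
}
```

Because the id depends only on the bytes, a receiver can re-hash whatever it holds and compare, which is what makes verification and resume cheap.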
CowCow does not delete raw data automatically.
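Since QC only labels samples, a check can be written as a pure function that reads bytes and returns a pass / warn / fail verdict without touching the file. A minimal sketch follows; the specific checks (empty file, undersized file, all-zero payload), the `min_bytes` threshold, and the names `Verdict`, `QcReport`, and `qc` are illustrative assumptions, not CowCow's actual rules.

```rust
/// QC outcome; the derived ordering lets the worst finding win (Pass < Warn < Fail).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Verdict {
    Pass,
    Warn,
    Fail,
}

#[derive(Debug)]
pub struct QcReport {
    pub verdict: Verdict,
    pub notes: Vec<String>,
}

/// Cheap local checks on a sample's bytes. The function only reads its input:
/// a failing sample is labeled, never deleted or modified.
/// The checks and the `min_bytes` threshold are illustrative assumptions.
pub fn qc(name: &str, bytes: &[u8], min_bytes: usize) -> QcReport {
    let mut verdict = Verdict::Pass;
    let mut notes = Vec::new();
    if bytes.is_empty() {
        verdict = Verdict::Fail;
        notes.push(format!("{}: empty file", name));
    } else {
        if bytes.len() < min_bytes {
            verdict = verdict.max(Verdict::Warn);
            notes.push(format!("{}: only {} bytes", name, bytes.len()));
        }
        if bytes.iter().all(|&b| b == 0) {
            verdict = verdict.max(Verdict::Warn);
            notes.push(format!("{}: payload is all zeros", name));
        }
    }
    QcReport { verdict, notes }
}
```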
CowCow is not a dashboard. It is not a data labeling tool. It is not another dataset manager.
It also does not replace MCAP, Zenoh, ReductStore, rclone, restic, lakeFS, or NVIDIA Holoscan. CowCow sits around the edge data flow as the local-first priority, provenance, and integrity layer.
Requirements:

- Rust 1.70 or newer
- Git
Clone the repository and run the test suite:

```bash
git clone https://github.com/deepubuntu/cowcow.git
cd cowcow
cargo test
```

The integration tests cover interrupted sync, deleted-remote resume, and the fixture pack pipeline.
CowCow ships a small multimodal fixture pack for local testing (synthetic payloads, not real sensor recordings):
```bash
./scripts/generate-fixtures.sh
```

See `examples/fixtures/README.md` for the layout and the role of each file (video, LiDAR-like `.bin`, images, telemetry, and a held/private sample).
Then run the full pipeline against the fixtures:

```bash
rm -rf fleet-demo remote
cargo run -p cowcow-cli -- init fleet-demo
cargo run -p cowcow-cli -- ingest ./examples/fixtures --project fleet-demo
cargo run -p cowcow-cli -- chunk --project fleet-demo --chunk-size 64kb
cargo run -p cowcow-cli -- qc --project fleet-demo
cargo run -p cowcow-cli -- score --project fleet-demo
cargo run -p cowcow-cli -- doctor --project fleet-demo
cargo run -p cowcow-cli -- sync ./remote --project fleet-demo --resume --priority urgent,high,normal
cargo run -p cowcow-cli -- verify --project fleet-demo
cargo run -p cowcow-cli -- report --project fleet-demo
```

Re-running `ingest` on the same files reports duplicates instead of silently doing nothing:
```text
Ingested 0 new sample(s), skipped 9 duplicate(s), scanned 9 file(s)
```
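The duplicate report above can be sketched as hash-based deduplication: hash each scanned file, skip any hash already registered, and count the rest as new. This is a minimal sketch under stated assumptions: it works on in-memory byte buffers instead of files, `IngestSummary` and `ingest` are hypothetical names, and `DefaultHasher` stands in for BLAKE3. Only the message wording is taken from the output shown above.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Counts reported at the end of an ingest run.
#[derive(Debug, Default, PartialEq)]
pub struct IngestSummary {
    pub new_samples: usize,
    pub duplicates: usize,
    pub scanned: usize,
}

impl IngestSummary {
    /// Same wording as the CLI output shown above.
    pub fn message(&self) -> String {
        format!(
            "Ingested {} new sample(s), skipped {} duplicate(s), scanned {} file(s)",
            self.new_samples, self.duplicates, self.scanned
        )
    }
}

/// Stand-in for BLAKE3 (not cryptographic; illustration only).
fn file_hash(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Register every file whose content hash is not already known.
/// `known` plays the role of the project's persisted sample index.
pub fn ingest(files: &[&[u8]], known: &mut HashSet<u64>) -> IngestSummary {
    let mut summary = IngestSummary::default();
    for &bytes in files {
        summary.scanned += 1;
        // `insert` returns false when the hash was already registered.
        if known.insert(file_hash(bytes)) {
            summary.new_samples += 1;
        } else {
            summary.duplicates += 1;
        }
    }
    summary
}
```

Because `known` outlives a single run, a second pass over the same files counts every file as a duplicate rather than doing nothing silently.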
To exercise recovery, interrupt a sync after the first chunk and then resume it:

```bash
rm -rf remote
cargo run -p cowcow-cli -- simulate network-failure ./remote --project fleet-demo --after-chunks 1
```

The goal:
100GB in.
Wi-Fi dies.
Process dies.
CowCow resumes.
Important clips moved first.
Hashes match.
Manifest proves chain of custody.
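The resume behavior can be sketched under a few assumptions. The sync state records which chunk ids were copied. On resume, each recorded chunk is re-checked against the destination, so a chunk deleted remotely (the "deleted-remote resume" case the tests cover) or one that no longer matches is sent again. The `Destination` map stands in for the remote directory, `budget` simulates the dropped connection, all names are hypothetical, and the hash is a non-cryptographic stand-in for BLAKE3.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{HashMap, HashSet};
use std::hash::{Hash, Hasher};

/// Stand-in for BLAKE3 (not cryptographic; illustration only).
fn hash_hex(bytes: &[u8]) -> String {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Remote side keyed by chunk id; stands in for files under `./remote`.
pub type Destination = HashMap<String, Vec<u8>>;

/// True if the destination holds this chunk and it re-hashes to `id`.
fn remote_ok(dest: &Destination, id: &str) -> bool {
    dest.get(id).map_or(false, |remote| hash_hex(remote) == id)
}

/// Copy chunks in queue order, sending at most `budget` chunks to simulate a
/// dropped connection. `done` is the persisted sync state. Returns chunks sent.
pub fn sync_chunks(
    chunks: &[Vec<u8>],
    dest: &mut Destination,
    done: &mut HashSet<String>,
    budget: usize,
) -> usize {
    let mut sent = 0;
    for bytes in chunks {
        let id = hash_hex(bytes);
        // Destination-aware resume: local state is trusted only if the
        // remote copy still exists and still matches.
        if done.contains(&id) && remote_ok(dest, &id) {
            continue;
        }
        if sent == budget {
            break; // connection lost: the rest waits for the next run
        }
        dest.insert(id.clone(), bytes.clone());
        done.insert(id);
        sent += 1;
    }
    sent
}

/// Verification pass: every chunk is present remotely with a matching hash.
pub fn verify(chunks: &[Vec<u8>], dest: &Destination) -> bool {
    chunks.iter().all(|bytes| remote_ok(dest, &hash_hex(bytes)))
}
```

Re-checking the destination costs a re-hash per chunk, but it means a stale sync state can never claim a chunk that is missing or corrupted remotely.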
| Command | Purpose |
|---|---|
| `init <project>` | Create project layout and `cowcow.yml` |
| `ingest <path> --project <dir>` | Register files; reads `manifest.jsonl` in the ingest folder when present |
| `chunk --project <dir> --chunk-size 64mb` | Fixed-size chunks + BLAKE3 hashes |
| `qc --project <dir>` | Local quality checks (pass / warn / fail, never auto-delete) |
| `score --project <dir>` | Rule-based priority + sync class |
| `doctor --project <dir>` | Project health: samples, chunks, missing files |
| `sync <dest> --project <dir> --resume --priority urgent,high,normal` | Priority filesystem sync |
| `verify --project <dir>` | Local chunk + manifest integrity |
| `report --project <dir>` | Dataset, sync, and integrity reports |
| `simulate network-failure ...` | Interrupt sync, then resume |
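The `--priority urgent,high,normal` flag implies an ordering: send every urgent chunk before any high one, and so on. A minimal sketch of that ordering follows. It assumes a hypothetical `QueuedChunk` record, keeps queue order within a class, and treats classes not named in the list (such as held data) as not sent in this run; CowCow's real queue may break ties or handle unlisted classes differently.

```rust
/// A queued chunk tagged with the sync class assigned by `score`.
/// Hypothetical record for illustration.
#[derive(Debug, Clone)]
pub struct QueuedChunk {
    pub id: String,
    pub class: String,
}

/// Parse a comma-separated list such as "urgent,high,normal".
pub fn parse_priority(arg: &str) -> Vec<String> {
    arg.split(',')
        .map(|s| s.trim().to_lowercase())
        .filter(|s| !s.is_empty())
        .collect()
}

/// Select and order chunks for this run: classes earlier in `order` go first,
/// and classes that are not listed (for example held data) are not sent.
pub fn sync_order<'a>(queue: &'a [QueuedChunk], order: &[String]) -> Vec<&'a QueuedChunk> {
    // Position of a chunk's class in the priority list, if listed at all.
    let rank = |c: &QueuedChunk| order.iter().position(|p| *p == c.class);
    let mut selected: Vec<&QueuedChunk> = queue.iter().filter(|&c| rank(c).is_some()).collect();
    // `sort_by_key` is stable, so queue order is preserved within a class.
    selected.sort_by_key(|&c| rank(c));
    selected
}
```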
Project layout:
```text
project/
  cowcow.yml
  .cowcow/
    journal.jsonl
    chunks/
    manifests/
    queue/
  data/ metadata/ qc/ reports/ exports/
```
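`journal.jsonl` holds the append-only event journal. A minimal sketch of such a writer, assuming one JSON object per line with hypothetical `seq`, `event`, and `sample` fields (the real schema is not documented here). It uses only the standard library, so JSON string escaping is done by hand.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Escape the characters that must not appear raw inside a JSON string.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for ch in s.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Append one event as a single JSONL line (hypothetical schema).
/// `append(true)` means existing lines are never rewritten; `sync_all` asks
/// the OS to flush to storage so the event also survives a power loss.
pub fn append_event(journal: &Path, seq: u64, event: &str, sample: &str) -> io::Result<()> {
    let line = format!(
        "{{\"seq\":{},\"event\":\"{}\",\"sample\":\"{}\"}}\n",
        seq,
        json_escape(event),
        json_escape(sample)
    );
    let mut file = OpenOptions::new().create(true).append(true).open(journal)?;
    file.write_all(line.as_bytes())?;
    file.sync_all()
}
```

Writing whole lines in append mode keeps recovery simple: after a crash, a reader can replay every complete line and ignore a truncated last one.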
Phase 1 is the active release line: prove correctness under failure before optimizing for speed or AI.
Shipped:
- Rust workspace (`crates/cowcow-*`)
- Multimodal ingest with hash deduplication
- Fixture pack + `manifest.jsonl` metadata overlay
- Fixed chunking, BLAKE3 verification
- JSONL manifests, journal, sync state
- Rule-based priority scoring and policy hold
- Local filesystem sync with destination-aware resume
- `doctor`, `verify`, reports
- Integration tests
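Rule-based priority scoring with a policy hold can be sketched as a small pure function. The hold is checked first and overrides any score. Additive rules then produce a number that maps to a sync class, and each rule records a reason so the result stays explainable. The sample fields, weights, thresholds, and the `low` class below are invented for illustration; CowCow's actual rules and `cowcow.yml` settings are not described in this section.

```rust
/// QC outcome as produced by `qc` (pass / warn / fail).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Qc {
    Pass,
    Warn,
    Fail,
}

/// Hypothetical per-sample facts the rules look at.
pub struct Sample {
    pub qc: Qc,
    pub private: bool,      // assumption: set by a privacy policy
    pub tagged_event: bool, // assumption: e.g. an operator-marked incident
    pub repetitive: bool,   // assumption: near-duplicate of recent data
}

/// Result of scoring: a sync class plus the reasons behind it.
#[derive(Debug)]
pub struct Decision {
    pub class: &'static str,
    pub score: i32,
    pub reasons: Vec<&'static str>,
}

/// Rule-based scoring. Weights, thresholds and the "low" class are invented.
pub fn score(s: &Sample) -> Decision {
    // Policy hold is checked first; no score can override it.
    if s.private {
        return Decision { class: "hold", score: 0, reasons: vec!["blocked by privacy policy"] };
    }
    let mut points = 50;
    let mut reasons = Vec::new();
    if s.tagged_event {
        points += 40;
        reasons.push("tagged event");
    }
    match s.qc {
        Qc::Pass => {}
        Qc::Warn => {
            points -= 10;
            reasons.push("qc warning");
        }
        Qc::Fail => {
            points -= 30;
            reasons.push("qc failure (sample kept locally)");
        }
    }
    if s.repetitive {
        points -= 20;
        reasons.push("repetitive sample");
    }
    let class = if points >= 80 {
        "urgent"
    } else if points >= 60 {
        "high"
    } else if points >= 30 {
        "normal"
    } else {
        "low"
    };
    Decision { class, score: points, reasons }
}
```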
Good for: internal testing, demos, pilot scripts on edge laptops.
Not yet production-ready for fleets. Still missing: cloud object storage (S3), compression at scale, real sensor decoding (ffprobe / MCAP), and ops dashboards.
Roadmap:

| Priority | Item |
|---|---|
| High | S3-compatible sync for real uploads |
| Medium | zstd compression, FastCDC chunking |
| Medium | MCAP import/export, Parquet metadata |
| Later | Local AI adapters (Ollama, ONNX) for triage |
| Later | P2P / store-and-forward sync |
| Later | Factory-scale curation exports |
Development:

```bash
cargo check
cargo test
cargo fmt
cargo test -p cowcow-cli --test offline_resume
cargo test -p cowcow-cli --test fixtures_pipeline
```