WIP: Rough draft for updated generic OCI sealing #226
    composefs_oci::signing::FsVeritySigningKey::from_pem(&cert_pem, &key_pem)?;

    // Build subject descriptor from the source image's manifest
    let manifest_json = img.manifest().to_string()?;
Hmm, we actually need to operate on the raw original representation; we can't rely on to_string() always giving us the same thing.
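To illustrate the concern: an OCI descriptor records the digest and size of the exact manifest bytes, so any re-serialization that changes even whitespace yields a different subject. The sketch below uses a hypothetical `reserialize_compact` helper as a stand-in for a parse-then-`to_string()` round trip; it is not the crate's serializer, just a way to show the byte mismatch.

```rust
/// Hypothetical stand-in for a parse-then-`to_string()` round trip:
/// re-emits the JSON with all whitespace outside string literals removed.
fn reserialize_compact(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    let mut in_string = false;
    let mut escaped = false;
    for c in raw.chars() {
        if in_string {
            out.push(c);
            if escaped {
                escaped = false;
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            }
        } else if c == '"' {
            in_string = true;
            out.push(c);
        } else if !c.is_whitespace() {
            out.push(c);
        }
    }
    out
}

/// True only if the round trip leaves the bytes (and therefore the
/// descriptor's size and digest) unchanged.
fn roundtrip_preserves_bytes(raw: &str) -> bool {
    reserialize_compact(raw).as_bytes() == raw.as_bytes()
}
```

A pretty-printed manifest fails `roundtrip_preserves_bytes`, which is why the subject descriptor would need to be built from the stored bytes rather than from a re-serialized value.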
    /// Image reference (tag name)
    image: String,
    /// Path to the OCI layout directory (must already exist)
    oci_layout_path: PathBuf,
I think we can use clap(value_parser) to parse this into an ocidir directly, or something like that.
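A rough sketch of what such a value parser could look like, using only the standard library. The function name `parse_oci_layout_dir` is hypothetical, and it returns a validated `PathBuf` rather than an open ocidir handle; with clap's derive API a function of this shape could presumably be attached as `#[arg(value_parser = parse_oci_layout_dir)]`.

```rust
use std::path::PathBuf;

/// Hypothetical value parser: accepts a path only if it looks like an OCI
/// image layout, i.e. a directory containing an `oci-layout` marker file.
fn parse_oci_layout_dir(arg: &str) -> Result<PathBuf, String> {
    let path = PathBuf::from(arg);
    if !path.is_dir() {
        return Err(format!("{arg}: not a directory"));
    }
    if !path.join("oci-layout").is_file() {
        return Err(format!("{arg}: missing oci-layout marker file"));
    }
    Ok(path)
}
```

Doing the check at argument-parsing time means the "must already exist" requirement is reported as a usage error before any work starts.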
    /// the container to be mounted with integrity protection.
    ///
    /// Returns a tuple of (sha256 content hash, fs-verity hash value) for the updated configuration.
    pub fn seal<ObjectID: FsVerityHashValue>(
Might be cleaner if we do a prep commit that removes the old sealing as we know we're not going to do it anymore.
    /// # Returns
    ///
    /// The number of referrer artifacts exported.
    pub fn export_referrers_to_oci_layout<ObjectID: FsVerityHashValue>(
Something like this could land as a prep commit
    use std::fs;
    use std::io::Write;

    let blobs_dir = oci_layout_path.join("blobs").join("sha256");

    format!("{seed:02x}").repeat(32)
    }

    fn sample_subject() -> Descriptor {
Let's unify this stuff with shared infra to generate an ocidir with known content
This one will need to logically depend on #225 because that one has a lot of hardening for the EROFS parser.
The biggest goal here is support for Linux kernel-native fsverity signatures to be attached to layers, which enables integration with IPE.
Add support for a fully separate OCI "composefs signature" artifact which can be attached to an image.
Now, in discussion it came up that we could re-do an incremental fetch design on top of this, and digging into that I think that really wants a "canonical tar" format. Add RFC/plans for all of that.
Assisted-by: OpenCode (Claude Opus 4.5)
Signed-off-by: Colin Walters <walters@verbum.org>
Extract sub-second nanoseconds from PAX extension mtime headers. The tar-core crate only keeps the integer seconds; we now read the fractional part ourselves and populate st_mtim_nsec accordingly.
Prep for V1 EROFS compatibility where nanosecond timestamps matter for bit-for-bit reproducibility with the C composefs implementation.
Assisted-by: OpenCode (Claude Opus 4)
Signed-off-by: Colin Walters <walters@verbum.org>
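The fractional-part extraction described in this commit could look roughly like the following. This is a sketch, not the code from the commit: `parse_pax_mtime` is a hypothetical name, it assumes the PAX `mtime` value is a decimal number of seconds with an optional fraction, and it does not treat negative timestamps specially.

```rust
/// Split a PAX `mtime` value such as "1700000000.5" into whole seconds and
/// nanoseconds. Digits beyond nanosecond precision are truncated.
/// Assumption: negative timestamps are not given any special handling.
fn parse_pax_mtime(value: &str) -> Option<(i64, u32)> {
    let (secs, frac) = match value.split_once('.') {
        Some((s, f)) => (s, f),
        None => (value, ""),
    };
    let secs: i64 = secs.parse().ok()?;
    if !frac.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let mut nanos: u32 = 0;
    let mut digits: u32 = 0;
    for b in frac.bytes().take(9) {
        nanos = nanos * 10 + u32::from(b - b'0');
        digits += 1;
    }
    // Right-pad to nine digits: ".5" means 500_000_000 ns.
    nanos *= 10u32.pow(9 - digits);
    Some((secs, nanos))
}
```

The second tuple element is the value that would end up in `st_mtim_nsec`; a header without a fraction yields zero, matching the previous seconds-only behaviour.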
Add expected_v1_id and expected_v1_bootable_id fields to the ContainerImage test struct and pin values for all four fixture images. This extends the digest stability test to cover the V1 EROFS writer alongside the existing V2 coverage.
Prep for the V1 EROFS OCI integration.
Assisted-by: OpenCode (Claude Opus 4)
Signed-off-by: Colin Walters <walters@verbum.org>
Migrate OCI crate callers to the new RepositoryConfig API and add dual-format (V1+V2) EROFS image generation during OCI pull.
The V1 kernel cmdline karg uses a new self-describing format:
  composefs.digest=v1-sha256-12:<hex>
  composefs.digest=v1-sha512-12:<hex>
The value encodes the EROFS format version, hash algorithm, and block size before the digest, mirroring how meta.json uses fsverity-sha256-12. The stable key name composefs.digest works naturally with ConditionKernelCommandLine= and allows multiple entries on the same cmdline for different algorithm/format combinations.
The initramfs (composefs-setup-root) parses all composefs kargs from the kernel cmdline in order, then tries to mount each image in sequence; the first image that actually exists in the repository wins. mount_composefs_image_if_exists() maps ImageNotFound to Ok(None), letting the mount loop skip missing images without swallowing real errors (verity mismatch, permissions, etc.).
The legacy composefs=<hex> karg continues to work for V2 EROFS.
Assisted-by: OpenCode (Claude Opus 4)
Signed-off-by: Colin Walters <walters@verbum.org>
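A minimal sketch of the karg handling described in this commit: parse every composefs karg in order, then take the first image that exists while still propagating real errors. All type and function names here (`Karg`, `MountError`, `if_exists`, `mount_first`) are illustrative assumptions rather than the crate's API, and kernel cmdline quoting is ignored.

```rust
/// One composefs image request parsed from the kernel command line.
#[derive(Debug, PartialEq)]
enum Karg {
    /// `composefs.digest=v1-<algo>-<lg_blocksize>:<hex>`
    V1 { algorithm: String, lg_blocksize: u8, digest: String },
    /// Legacy `composefs=<hex>` (V2 EROFS).
    LegacyV2 { digest: String },
}

fn parse_one(arg: &str) -> Option<Karg> {
    if let Some(value) = arg.strip_prefix("composefs.digest=") {
        let (format, digest) = value.split_once(':')?;
        let mut parts = format.split('-');
        if parts.next()? != "v1" {
            return None; // unknown format versions are skipped in this sketch
        }
        let algorithm = parts.next()?.to_string();
        let lg_blocksize: u8 = parts.next()?.parse().ok()?;
        if parts.next().is_some() {
            return None;
        }
        return Some(Karg::V1 { algorithm, lg_blocksize, digest: digest.to_string() });
    }
    let digest = arg.strip_prefix("composefs=")?;
    Some(Karg::LegacyV2 { digest: digest.to_string() })
}

/// Collect all composefs kargs, preserving cmdline order.
fn parse_kargs(cmdline: &str) -> Vec<Karg> {
    cmdline.split_ascii_whitespace().filter_map(parse_one).collect()
}

#[derive(Debug, PartialEq)]
enum MountError {
    ImageNotFound,
    Other(String),
}

/// Map "not found" to Ok(None) and let every other error through.
fn if_exists<T>(result: Result<T, MountError>) -> Result<Option<T>, MountError> {
    match result {
        Ok(value) => Ok(Some(value)),
        Err(MountError::ImageNotFound) => Ok(None),
        Err(other) => Err(other),
    }
}

/// Try each karg in order; the first image that exists wins.
fn mount_first<T>(
    kargs: &[Karg],
    mut try_mount: impl FnMut(&Karg) -> Result<T, MountError>,
) -> Result<Option<T>, MountError> {
    for karg in kargs {
        if let Some(mounted) = if_exists(try_mount(karg))? {
            return Ok(Some(mounted));
        }
    }
    Ok(None)
}
```

Keeping the not-found mapping in one small function is what lets the loop stay a plain `?` chain: a verity mismatch aborts immediately instead of silently falling through to the next karg.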
fuser 0.17 is needed to support multithreaded FUSE sessions: the new API requires `Filesystem: Send + Sync + 'static`, which forces proper Arc-based ownership of the filesystem state and makes it possible to safely hand the implementation to multiple worker threads.
The breaking API changes and how they are addressed:
- `&self` instead of `&mut self` on all trait methods: the only mutable state (open file handles) is now protected by a Mutex.
- New newtypes (INodeNo, FileHandle, LockOwner, Generation) and bitflags (OpenFlags, FopenFlags), updated at call sites.
- readdir/read offsets changed from i64 to u64.
- Session::from_fd now takes SessionACL + Config separately.
- Session::run() is no longer public; replaced by spawn().join().
- reply.error() takes fuser::Errno instead of raw i32.
To satisfy the `'static` bound, serve_tree_fuse() now takes `Arc<FileSystem>` and `Arc<Repository>`. A pre-built flat Vec<InodeData> (indexed by ino-1) replaces the old HashMap<Ino, InodeRef<'a>>, removing the lifetime that was incompatible with `'static`. An InodeLookup index (path→ino for dirs, LeafId→ino for leaves) handles child ino resolution without raw pointers.
Assisted-by: OpenCode (claude-sonnet-4-6)
Signed-off-by: Colin Walters <walters@verbum.org>
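The ownership shape described above (an owned, flat inode table indexed by `ino - 1`, plus a Mutex around the only mutable state so methods can take `&self`) can be sketched without fuser at all. The types below are simplified stand-ins, not the composefs-fuse definitions.

```rust
use std::collections::HashMap;
use std::convert::TryFrom;
use std::sync::Mutex;

/// Owned per-inode data; no borrowed lifetimes, so the table is `'static`.
struct InodeData {
    name: String,
    size: u64,
}

/// Open-handle bookkeeping: the only state that changes after setup.
struct Handles {
    next_fh: u64,
    by_fh: HashMap<u64, u64>, // file handle -> inode number
}

/// Illustrative filesystem state that is Send + Sync + 'static.
struct FsState {
    /// Inode numbers start at 1, so inode `n` lives at index `n - 1`.
    inodes: Vec<InodeData>,
    handles: Mutex<Handles>,
}

impl FsState {
    fn new(inodes: Vec<InodeData>) -> Self {
        FsState {
            inodes,
            handles: Mutex::new(Handles { next_fh: 1, by_fh: HashMap::new() }),
        }
    }

    /// Look up an inode; ino 0 and out-of-range numbers yield None.
    fn inode(&self, ino: u64) -> Option<&InodeData> {
        let index = usize::try_from(ino.checked_sub(1)?).ok()?;
        self.inodes.get(index)
    }

    /// Allocate a file handle for an existing inode, through `&self`.
    fn open(&self, ino: u64) -> Option<u64> {
        self.inode(ino)?;
        let mut handles = self.handles.lock().unwrap();
        let fh = handles.next_fh;
        handles.next_fh += 1;
        handles.by_fh.insert(fh, ino);
        Some(fh)
    }

    /// Drop a file handle; returns false if it was not open.
    fn release(&self, fh: u64) -> bool {
        self.handles.lock().unwrap().by_fh.remove(&fh).is_some()
    }
}
```

Because nothing in `FsState` borrows from elsewhere, it can be wrapped in an `Arc` and shared with worker threads, which is the property the `'static` bound demands.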
Wire the composefs-fuse crate into cfsctl behind a new `fuse` cargo
feature (on by default) and expose it through both the command line and
the varlink RPC API, with an integration test exercising the FUSE mount
end to end.
CLI surface:
- `cfsctl fuse-serve <image> <mountpoint>` serves an EROFS composefs
image over FUSE from a file on disk.
- `cfsctl oci mount --fuse[=<opts>]` FUSE-serves an OCI image's EROFS
instead of doing a kernel composefs mount, so it works without
fs-verity on the backing store. `--fuse=passthrough` opts into
kernel-bypass reads (Linux 6.9+). Options are parsed via a small
FuseOptions FromStr so the surface can grow without new flags.
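As an illustration of the `FromStr` approach mentioned for `--fuse[=<opts>]`, here is a small sketch. Only the `passthrough` option comes from the commit message; the struct layout, the comma separator, and the error type are assumptions made for the example.

```rust
use std::str::FromStr;

/// Hypothetical options type; only `passthrough` is named in the commit message.
#[derive(Debug, Default, PartialEq)]
struct FuseOptions {
    passthrough: bool,
}

impl FromStr for FuseOptions {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let mut opts = FuseOptions::default();
        // An empty value (plain `--fuse`) keeps the defaults.
        for opt in s.split(',').filter(|o| !o.is_empty()) {
            match opt {
                "passthrough" => opts.passthrough = true,
                other => return Err(format!("unknown FUSE option: {other}")),
            }
        }
        Ok(opts)
    }
}
```

New options then only need a new field and a new match arm, which is how the surface can grow without adding flags.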
Varlink surface:
- `org.composefs.Repository.FuseServe` and `org.composefs.Oci.OciFuseMount`
let a client drive FUSE mounts over the RPC socket. Both take a
`wait` parameter: with `wait=true` the call blocks for the session;
with `wait=false` the FUSE session is detached into a background task
and the call returns once the mount is registered, so a caller can
mount and then go on to use the filesystem.
The privileged_fuse_dumpfile_roundtrip integration test spawns
`cfsctl fuse-serve` as a subprocess, polls for mount readiness via st_dev
change, reads external files directly, and compares the dumpfile produced
by `cfsctl create-dumpfile` over the FUSE mount against the expected
output from write_dumpfile, asserting the FUSE implementation reports
every piece of metadata the dumpfile format captures. Uses
similar_asserts for readable diffs on mismatch.
Assisted-by: OpenCode (claude-sonnet-4-6)
Signed-off-by: Colin Walters <walters@verbum.org>
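The mount-readiness polling used by the integration test can be approximated as follows. This is a sketch under the assumption that the test records the mountpoint's `st_dev` before spawning the server and waits for it to change; `wait_for_dev_change` is a hypothetical helper name, and any stat error is simply propagated.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Poll `path` until its device id differs from `before`, which indicates
/// that a new filesystem has been mounted on top of it.
/// Returns Ok(false) if the timeout expires first.
fn wait_for_dev_change(path: &Path, before: u64, timeout: Duration) -> io::Result<bool> {
    let deadline = Instant::now() + timeout;
    loop {
        if fs::metadata(path)?.dev() != before {
            return Ok(true);
        }
        if Instant::now() >= deadline {
            return Ok(false);
        }
        sleep(Duration::from_millis(20));
    }
}
```

A caller would read `fs::metadata(mountpoint)?.dev()` first, start the FUSE server, and then call this helper with a generous timeout before touching files under the mountpoint.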
Implement readdirplus (combined readdir + lookup in one round-trip), no-op forget (inode table is static for session lifetime), and FOPEN_KEEP_CACHE on open replies.
Serve with one thread per logical CPU using FUSE_DEV_IOC_CLONE (clone_fd=true) so each worker gets its own /dev/fuse fd, eliminating per-request channel lock contention. Arc<OwnedFd> allows read() to clone the handle and drop the mutex before calling pread, so concurrent reads on the same file don't serialise.
Add FUSE passthrough support (Linux 6.9+): when FuseConfig::passthrough is true and the kernel advertises FUSE_PASSTHROUGH, external file reads are routed directly in-kernel to the repository object fds. Opt-in via FuseConfig because passthrough requires root and a non-tmpfs backing filesystem.
Assisted-by: OpenCode (claude-sonnet-4-6)
Signed-off-by: Colin Walters <walters@verbum.org>
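The "clone the handle, drop the mutex, then pread" pattern from this commit can be sketched with the standard library alone. Here `Arc<File>` stands in for the `Arc<OwnedFd>` mentioned above, and `HandleTable` is an illustrative type rather than the actual implementation.

```rust
use std::collections::HashMap;
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt;
use std::sync::{Arc, Mutex};

/// Illustrative table of open files keyed by file handle.
struct HandleTable {
    open: Mutex<HashMap<u64, Arc<File>>>,
}

impl HandleTable {
    fn read(&self, fh: u64, offset: u64, len: usize) -> io::Result<Vec<u8>> {
        // The guard is a temporary: the lock is held only while the Arc is
        // cloned and is released at the end of this statement.
        let file = self.open.lock().unwrap().get(&fh).cloned();
        let file = file
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, "bad file handle"))?;

        // No lock is held here, so concurrent readers of the same file
        // are not serialised behind this positional read.
        let mut buf = vec![0u8; len];
        let n = file.read_at(&mut buf, offset)?;
        buf.truncate(n);
        Ok(buf)
    }
}
```

`read_at` takes an explicit offset and does not move a shared file cursor, which is what makes sharing one descriptor between threads safe in this arrangement.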
Prep for OCI sealing, which needs the byte block size and hash digest size when validating composefs.* artifact annotations. These mirror the helpers on the (soon-to-be-removed) ComposeFsAlgorithm so signature.rs can use the canonical Algorithm type directly.
Assisted-by: OpenCode (Claude Opus 4)
Signed-off-by: Colin Walters <walters@verbum.org>
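For orientation, helpers of this kind might look like the sketch below. The `Algorithm` type here is a simplified stand-in, not `composefs::fsverity::Algorithm`; the string form follows the `fsverity-sha256-12` convention mentioned earlier in this PR, where the trailing number is taken to be the log2 of the block size.

```rust
/// Hash functions used in this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum HashKind {
    Sha256,
    Sha512,
}

/// Simplified stand-in for an fs-verity algorithm descriptor.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Algorithm {
    hash: HashKind,
    lg_blocksize: u8,
}

impl Algorithm {
    /// Parse a string such as "fsverity-sha256-12".
    fn parse(s: &str) -> Option<Self> {
        let rest = s.strip_prefix("fsverity-")?;
        let (hash, lg) = rest.split_once('-')?;
        let hash = match hash {
            "sha256" => HashKind::Sha256,
            "sha512" => HashKind::Sha512,
            _ => return None,
        };
        let lg_blocksize: u8 = lg.parse().ok()?;
        Some(Algorithm { hash, lg_blocksize })
    }

    /// Block size in bytes, or None if the exponent does not fit in a u32.
    fn block_size(&self) -> Option<u32> {
        1u32.checked_shl(u32::from(self.lg_blocksize))
    }

    /// Digest length in bytes for the hash function.
    fn digest_size(&self) -> usize {
        match self.hash {
            HashKind::Sha256 => 32,
            HashKind::Sha512 => 64,
        }
    }
}
```

With these two numbers a validator can check that an annotated digest has the expected hex length and that the declared block size matches the image.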
Add a `keyring` feature that exposes `inject_fsverity_cert` and `KeyringError`, backed by `keyutils 0.4` (which provides `Keyring::new` and `keytypes::Asymmetric` needed to add X.509 certificates to the kernel's .fs-verity keyring).
The implementation uses `keyutils::Keyring::new` to locate the `.fs-verity` special keyring and `add_key` to inject PEM-decoded DER certificates.
Assisted-by: OpenCode (claude-sonnet-4-6)
Signed-off-by: Colin Walters <walters@verbum.org>
…est, ioctl)
Add `algorithm.rs` (ComposeFsAlgorithm enum for EROFS/signature types), `formatted_digest.rs` (hex-encoded digest with known format), and extend `ioctl.rs` with `fs_ioc_enable_verity_with_sig` to pass a PKCS#7 signature blob when enabling verity.
The `fsverity::mod` re-exports `inject_fsverity_cert` from composefs-ioctls under the `keyring` feature, and exposes `enable_verity_raw_with_sig` for callers that have pre-computed the fs-verity descriptor and signature.
Adapted for PR#297/306: removed duplicate ComposeFsAlgorithm type; canonical type is composefs::fsverity::Algorithm.
Assisted-by: OpenCode (claude-sonnet-4-6)
Signed-off-by: Colin Walters <walters@verbum.org>
Add three new modules:
- `signing.rs`: PKCS#7/openssl-backed `FsVeritySigningKey` for signing fs-verity digests, with PEM cert/key parsing.
- `signature.rs`: `SignatureArtifactBuilder` that constructs an OCI artifact manifest containing EROFS layers, PKCS#7 signature blobs, and a config descriptor, implementing the composefs signing spec. Also exposes `sign_image`/`verify_image_signatures` for CLI and varlink consumers, and `parse_signature_artifact` for the verify path.
- `referrers.rs`: `find_composefs_artifacts` to fetch OCI referrer manifests locally from repo, used by the verify command.
Update `image.rs` / `oci_image.rs` / `boot.rs` / `lib.rs` to adapt to upstream API changes (FormatVersion-aware helpers, OciRefNotFound on ENOENT, containers_image_proxy oci-spec import paths).
Add `openssl` as a dep (signing.rs) and optional `oci-client` feature.
Adapted for PR#297/306: FormatVersion threading through all digest/image helpers, Algorithm type is composefs::fsverity::Algorithm throughout.
Assisted-by: OpenCode (claude-sonnet-4-6)
Signed-off-by: Colin Walters <walters@verbum.org>
Add OCI sealing and signing workflows to cfsctl:
- `oci seal`: Commits EROFS images for all layers and the merged rootfs into the repository.
- `oci sign`: Creates a composefs PKCS#7 signature OCI artifact.
- `oci verify`: Fetches referrer artifacts and validates EROFS layer digests against embedded signatures.
- `oci run` / `oci stop`: Runs a container from a pulled OCI image by generating an OCI runtime spec, mounting a composefs overlay, and invoking crun/runc.
- `keyring add-cert`: Injects an X.509 PEM certificate into the kernel's .fs-verity keyring (requires CAP_SYS_ADMIN).
- `oci export`: Exports an image to an OCI layout directory.
- `oci composefs-digest-karg`: Print composefs kernel cmdline arg.
Adapted for PR#297/306: use composefs_oci::sign_image/verify_image_signatures (library fns), ComposefsCmdline API for karg generation, version threading.
Assisted-by: OpenCode (claude-sonnet-4-6)
Signed-off-by: Colin Walters <walters@verbum.org>
Mirror the OCI sealing CLI surface onto the org.composefs.Oci varlink interface so external callers (and a future cfsctl-as-client) can drive sealing through structured RPC instead of scraping CLI output.
Seal and Verify are the primary RPC surface; Sign is included for completeness. Certificate and key material is passed as PEM *content* rather than file paths: the daemon may run in a different user or mount namespace than the client and cannot reliably read client-side files. The private key therefore transits the (local, typically root-owned) Unix socket, as noted in the method docs.
Mounting and `keyring add-cert` are intentionally not exposed: both need CAP_SYS_ADMIN and operate on the daemon's own filesystem/host view, so the verification gate (Verify, returning a count) is the useful RPC primitive and the mount syscall stays with the caller.
Two new OciError variants, InvalidCertificate and SignatureVerificationFailed, let clients distinguish a genuine verification failure from an internal error. The wrong-cert integration test asserts the typed error rather than a bare failure, so a no-op check would not pass.
Assisted-by: OpenCode (Claude Opus 4)
Signed-off-by: Colin Walters <walters@verbum.org>
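To make the value of the typed errors concrete, here is a small sketch of how a client might branch on them. Only the two variant names come from the commit message; the `String` payloads, the `Internal` catch-all, and the `Outcome`/`triage` names are assumptions for illustration.

```rust
use std::fmt;

/// Sketch of the two new variants plus a hypothetical catch-all.
#[derive(Debug, PartialEq)]
enum OciError {
    InvalidCertificate(String),
    SignatureVerificationFailed(String),
    Internal(String),
}

impl fmt::Display for OciError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            OciError::InvalidCertificate(m) => write!(f, "invalid certificate: {m}"),
            OciError::SignatureVerificationFailed(m) => write!(f, "signature verification failed: {m}"),
            OciError::Internal(m) => write!(f, "internal error: {m}"),
        }
    }
}

impl std::error::Error for OciError {}

/// How a caller might classify a failed Verify call.
#[derive(Debug, PartialEq)]
enum Outcome {
    BadInput,
    Untrusted,
    Fault,
}

fn triage(err: &OciError) -> Outcome {
    match err {
        OciError::InvalidCertificate(_) => Outcome::BadInput,
        OciError::SignatureVerificationFailed(_) => Outcome::Untrusted,
        OciError::Internal(_) => Outcome::Fault,
    }
}
```

A test that asserts `Outcome::Untrusted` for a wrong certificate would fail if verification were a no-op, which is the property the commit message calls out.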
I've been doing some experimental work on something similar with the current composefs support in podman at podman-container-tools/podman#28658. That code encodes the expected composefs digests in the manifests and assumes it will be reproducibly generated from the tar layers locally. To get trust in that I then have to validate the manifest signature at runtime. I think long term I prefer the approach proposed here of distributing the erofs images directly, particularly because it allows kernel side signatures that can integrate with IMA, etc. However... I wonder if not just signing the erofs image is missing something. Don't we actually want to sign the entire container specification, not just the file content? What about other parts of the image, like exposed ports, volumes, cmdline, etc? With the signature on the erofs blob we leave all the stuff in the manifest and config "unprotected".
In the current design, the manifest and config are always stored as external objects, and we can then apply fsverity signatures to those as well. It's a bit buried in the spec, but see https://github.com/composefs/composefs-rs/pull/224/changes#diff-def2a4eef2075f93a81da71d729f633c5f748feff036fd3789252ee37cf37dfdR281
A simple way to say it is, I think with this proposal we don't actually need to sign the manifest via cosign at all - runtime integrity >= transport integrity. But of course in practice most use cases would do so, because you still want the transport integrity for deployments that aren't using runtime integrity yet.
This is just some rough draft raw material that builds on: