RFC: Fil One Object Encryption#9
hannahhoward
left a comment
LGTM -- rarely do I respond "no notes" but this is an excellent RFC, well specified, clear -- the only thing to consider (and we can layer this once everything is up and running) is what it would look like to use Dilithium or another quantum-resistant algorithm for the asymmetric key.
pyropy
left a comment
Great RFC, and a very nice design with the double wrapping!
bajtos
left a comment
Great spec!
I have a few concerns to consider; please see my comments below.
| - Ingot requests the tenant public key from Hilt (through a cache). | ||
| 3. Ingot builds a FEE header with the tenant as the recipient (and thus containing the tenant-wrapped CEK). | ||
| 4. Ciphertext (header ‖ payload) is pushed to Forge via Guppy/Sprue. | ||
| 5. Ingot asks its KMS to wrap the CEK with its own region key. |
There was a problem hiding this comment.
Discussion point: Is it good enough to have a single key shared by the entire region? Aurora has either a per-tenant or a per-bucket master encryption key (MEK).
Benefits of having more granular MEKs:
- When we want to delete a tenant or a bucket, it's enough to delete the MEK; we don't need to remove all individual CEKs. (I.e. delete the MEK immediately; remove individual CEKs asynchronously in the background as part of the garbage-collection/space-reclaiming process).
- More granular key rotation. With per-tenant or per-bucket keys, we can perform the key rotation gradually - rotate one MEK at a time, with fewer CEKs to re-encode using the new MEK.
There was a problem hiding this comment.
It's a good question. Luckily, we can delay the final answer at least for a bit, since it doesn't impact the stored blobs, but I wouldn't want to get too deep on the Ingot implementation before resolving it. I'll add it to the open questions.
There was a problem hiding this comment.
Just to clarify, when we say "region key" here we mean the private key for the Ingot service? Or a different key?
I find it terrifying that if the Ingot key is compromised the attacker can decrypt all data in all buckets for all tenants. I would suggest that we have a per-tenant KEK at minimum but I appreciate that's way more complicated since now Ingot has to have a notion of tenants and be able to map from request -> tenant every time it receives an object read/write request.
It might actually be simpler for Ingot to have a per-bucket KEK, since the bucket (space) DID will be the subject of the delegations we need to read/write into Forge, which we will already have in cache or need to fetch anyway.
There was a problem hiding this comment.
> Just to clarify, when we say "region key" here we mean the private key for the Ingot service?
Yes, the private key used by the Ingot service to encrypt individual CEKs.
I am referring to this key as MEK, but your name KEK works for me too!
There was a problem hiding this comment.
> I find it terrifying that if the Ingot key is compromised the attacker can decrypt all data in all buckets for all tenants.
Yes, I share the same concern!
> It might actually be simpler for Ingot to have a per-bucket KEK
Sounds like we have the answer now :)
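To make the blast-radius trade-off in this thread concrete, here is a minimal sketch that counts how many wrapped CEKs a single compromised wrapping key would expose under each granularity. The rows, the `wrapping_key_id` helper, and the key-id format are invented for illustration and do not reflect Ingot's actual schema.

```python
from collections import Counter

# Hypothetical key rows: one (tenant, bucket) pair per wrapped CEK.
ROWS = [
    ("tenant-a", "bucket-1"),
    ("tenant-a", "bucket-1"),
    ("tenant-a", "bucket-2"),
    ("tenant-b", "bucket-3"),
]


def wrapping_key_id(tenant: str, bucket: str, granularity: str) -> str:
    """Return the id of the key that would wrap a CEK at this granularity."""
    if granularity == "region":
        return "region"
    if granularity == "tenant":
        return tenant
    if granularity == "bucket":
        return f"{tenant}/{bucket}"
    raise ValueError(f"unknown granularity: {granularity}")


def exposure(rows, granularity: str) -> Counter:
    """Map each wrapping key id to the number of CEKs it protects."""
    return Counter(wrapping_key_id(t, b, granularity) for t, b in rows)


for granularity in ("region", "tenant", "bucket"):
    counts = exposure(ROWS, granularity)
    worst = max(counts.values())
    print(f"{granularity}: {len(counts)} key(s), worst case exposes {worst} of {len(ROWS)} CEKs")
```

The same counts also describe the deletion cost: removing one bucket key makes every CEK it protects unusable at once, while a tenant shutdown needs one key deletion per bucket.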
| Notes: | ||
| - No data (plaintext or ciphertext) is buffered to disk, for security and for performance. Everything is streamed. |
There was a problem hiding this comment.
I am wondering, would it make sense to relax this aspect and allow the S3 gateway to cache the frequently accessed content in a local disk cache to get better GetObject performance?
Such a feature is beyond v1. I just want to flag it as something we may want to implement in the future - it would be great if the design did not prevent us from doing so.
There was a problem hiding this comment.
Yeah, that's fair. I'll make a note that it's not strictly disallowed.
| DELETE destroys every segment's **region-wrap row** in Ingot's DB. Subsequent GET/HEAD return `NoSuchKey`, and the blobs are queued for the future true-deletion mechanism. That's the cryptoshred: the region can no longer read the object, immediately. | ||
| **What survives:** the **tenant recipient in the envelope** still exists wherever the blob still exists on the network, and Hilt's tenant key still exists (it's shared across the tenant's objects, so it can't be destroyed per-object). So a deleted object remains recoverable _by Fil One_ until the blob itself is truly deleted. Because this is not a normal read path, this should be acceptable. |
There was a problem hiding this comment.
> So a deleted object remains recoverable by Fil One until the blob itself is truly deleted. Because this is not a normal read path, this should be acceptable.
Please check with Product and Legal if this is truly acceptable - we need to be compliant with various regulations like GDPR.
See also my previous comment.
There was a problem hiding this comment.
Alternatives to consider:
- A per-tenant key.
- A per-bucket key.
- When cryptoshredding a bucket containing files, we can short-circuit the process and simply remove the per-bucket key in Hilt.
- The downside is that when closing down the entire account (tenant), we need to delete all per-bucket keys of that tenant.
- A per-object key to get true cryptoshredding at object level
There was a problem hiding this comment.
Yes, I still have to raise this with Product to be sure. The delay is only ~24 hours, though, so my hope is that that's fine—immediately inaccessible and truly unrecoverable after 24 hours. Will need to confirm.
There was a problem hiding this comment.
I think a bucket key makes some sense. I really want to avoid a per-object key, because currently Hilt knows nothing about individual objects, and making it aware of them would put it in the middle of a whole lot of process that it's currently able to stay out of.
There was a problem hiding this comment.
I didn't realise we can delete data within 24 hours in the PDP world. I was mistakenly thinking about the PoRep world, where sectors live for 6 months.
I propose adding another open question: if we decide to use PoRep for disaster recovery/cold archival of data stored in Forge, how do we enable timely cryptoshredding of those archival copies? I think there is so much uncertainty about how PoRep integration would work, and whether it will even happen, that we don't need to worry about the encryption-related details now.
Having said that, I agree with what you wrote. We should avoid per-object keys if feasible; per-bucket keys seem to be a nice middle ground.
| ## Open Questions | ||
| 1. **Plaintext CID in the envelope's `app_metadata`.** This is an optional field in the FEE proposal. Putting it in exposes what the plaintext is to anyone who's seen that plaintext somewhere before and might know its CID. It also allows someone to correlate two encrypted blobs, confirming that they hold the same data without knowing what it is. Neither of these seem like particularly worrying attacks. On the other hand, there isn't a clear argument in _favor_ of putting it in. |
There was a problem hiding this comment.
> On the other hand, there isn't a clear argument in favor of putting it in.
When in doubt, leave it out?
IMO, the less information about our customers' data we leak to the open, the better.
Also:
> Forge stores opaque blobs it can't read

If the FEE includes the plaintext CID, then the blob stored in Forge is less opaque. Sure, Forge (or a PDP activity observer) cannot see the plaintext, but as you pointed out, the plaintext CID does leak bits of information.
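A small sketch of the leak being discussed: two envelopes with unrelated ciphertext become linkable as soon as both carry a digest of the same plaintext. Everything here is a stand-in, not the FEE format: SHA-256 replaces the real CID computation, random bytes replace the ciphertext, and `toy_envelope` and the `plaintext_digest` field name are hypothetical.

```python
import hashlib
import os


def toy_envelope(plaintext: bytes, include_plaintext_digest: bool) -> dict:
    """Build a stand-in envelope; the 'ciphertext' is just random bytes."""
    envelope = {"ciphertext": os.urandom(len(plaintext))}
    if include_plaintext_digest:
        # Stand-in for putting the plaintext CID into app_metadata.
        digest = hashlib.sha256(plaintext).hexdigest()
        envelope["app_metadata"] = {"plaintext_digest": digest}
    return envelope


document = b"the same customer file, uploaded twice"
first = toy_envelope(document, include_plaintext_digest=True)
second = toy_envelope(document, include_plaintext_digest=True)

# The ciphertexts are unrelated, yet an observer can link the two blobs.
linked = first["app_metadata"] == second["app_metadata"]

# Anyone who already holds the plaintext can confirm what the blob contains.
known_digest = hashlib.sha256(document).hexdigest()
confirmed = first["app_metadata"]["plaintext_digest"] == known_digest

# Leaving the field out removes both signals.
opaque = toy_envelope(document, include_plaintext_digest=False)
```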
| The plaintext CIDs are also in the bucket entries. If those remain in MSTs and we continue to plan not to encrypt the MSTs, we have plaintext CIDs hanging around in plaintext anyway. | ||
| 2. **AES vs X25519 for the region key.** There is _one_ use case for the region key being an asymmetric keypair. If Ingot's wrapped keys were lost, or if a blob were moved or replicated to another region (not currently planned, but in the realm of things we might want), we'd need to restore Ingot's wrapped CEK. Under an asymmetric scheme, Hilt could encrypt the CEK with the region's public key and send it that. Is that enough to change this decision? |
There was a problem hiding this comment.
> Under an asymmetric scheme, Hilt could encrypt the CEK with the region's public key and send it that.
I think there are other ways to ensure secure transmission of the CEK from Hilt to Ingot when moving or replicating a blob. Isn't a regular HTTPS connection secure enough? (I.e. can we transmit the CEK as plaintext over a TLS-secured connection? Does the second layer of encryption - CEK encrypted with the region's public key - move the needle in a meaningful way?)
> Is that enough to change this decision?
IMO: it's not enough.
There was a problem hiding this comment.
At a gut level, I don't love moving around unencrypted keys over a network, but I suppose TLS is supposed to make that okay. And we're talking about blob-specific CEKs, not wrapping keys, which are more dangerous. Also, I'd love a reason to keep it the way it is, so I'm with you. 😄
Yeah, I think we'd be looking at ML-KEM ("Kyber") here, which is (AFAICT) the key-wrapping equivalent to Dilithium's signature scheme (as X25519 is to Ed25519). We should be able to layer it later, yeah. A few things aren't quite mature yet. COSE doesn't have support for it yet, but it's working on it: Migration should look like Tier 2 tenant key rotation, just generating a different kind of key. The |
* Fixed terminology mixup of "cyphertext" and "envelope".
* Made explicit note that regions can be allowed to cache during retrieval, we just need to decide to do it mindfully.
* Added open question on single region key vs per-tenant or per-bucket keys.
* Added open question on per-tenant canonical (at Hilt) keys, vs per-bucket keys.
There was a problem hiding this comment.
Pull request overview
This PR introduces an RFC describing the proposed encryption-at-rest design for Fil One objects stored via Forge, with encryption handled by Ingot using the Filecoin Encrypted Envelope (FEE) format and a dual-wrap key strategy (tenant/Hilt for recovery + region for read-path independence).
Changes:
- Adds an end-to-end design for object encryption using FEE (chunked AES-256-GCM STREAM) with per-object CEKs.
- Specifies key management and storage responsibilities across Ingot (region wrap) and Hilt (tenant wrap / recovery).
- Outlines operational flows for PUT/GET/DELETE, multipart behavior, and rotation tiers.
| | Tenant KEK | X25519 | Tenant | Hilt DB (public key only) | Wraps the Blob CEK for the region-independent decryption. | | ||
| | Tenant-KEK(Blob-CEK) | ECDH-ES+A256KW Result | Blob | FEE Header at start of blob, in Forge | Unwrapped by tenant for region-independent decryption. | | ||
| | Hilt Root KEK | AES-256 | Hilt (i.e., exactly one) | FilOne secure storage[^secure-storage] | Seals the KEKs in the Hilt DB. | | ||
| | Hilt-Root-KEK(Tenant-KEK) | A256KW Result | Tenant | Hilt DB. | Unsealed by Hilt to unwrap keys as tenant. | |
There was a problem hiding this comment.
This suggestion seems relevant to me.
| **Single wrap (region only).** Cheapest, but no recovery: a region that loses its key loses its customers' data permanently, and Fil One has no backstop. | ||
| **Asymmetric X25519 region and Hilt keypairs instead of symmetric AES keys.** These keys are only used to wrap values for storage at rest locally: the CEKs and the tenant keys, respectively. No other party needs to encrypt with these, so we don't need a public key. Using AES gives us better compatibility with key management systems (especially hardware security modules) which can hold a non-exportable key and en/decrypt for us on demand. Conversely, the _tenant_ key is used by the region for encryption, so we want a public key it can use, and to hold the (encrypted) private key in Hilt. (But: see Open Question #2.) |
| 3. **One region key vs many.** This proposal describes a single key for the entire region. We can further limit the blast radius by keeping a key per tenant or per bucket. This complicates the interaction with the KMS slightly, but it's a reasonable tradeoff. We also get to decide this late (or change our minds), because this doesn't impact the envelopes at all. We can migrate by re-wrapping the Ingot DB's contents, just like in region key rotation. | ||
| 4. **One key per tenant vs one key per bucket.** This proposal associates the canonical wrapping key with a _tenant_. We could use a different key per _bucket_ instead. That does a few things: it limits the blast radius of exposing a single key, it makes remediation of an exposure easier by limiting what we need to Tier-2 rotate, and it makes bucket transfer between tenants easier (which has not been planned or discussed). The downside is pretty much that there are more keys to manage. This one is harder to decide later, because it does impact the envelope, but as in Tier 2 rotation, we _do_ get a hand in the bookkeeping from the `kid` in the FEE header, so we'll never lose track of which key is correct. Still, better to pick this up front. |
There was a problem hiding this comment.
Let's clarify what key we are discussing in which paragraph - is it the Ingot-custodied key or Hilt-custodied key?
3. **Ingot: One region key vs many**. (...)
4. **Hilt: One key per tenant vs one key per bucket**. (...)
alanshaw
left a comment
I think in general this sounds good pending some clarifications, and resolutions to the outstanding questions.
| Data we store in Forge is exposed to the Filecoin network, so it must be encrypted at rest. **Encryption is entirely Ingot's job: the S3 client hands Ingot plaintext, Ingot encrypts, and Forge stores opaque blobs it can't read.** Forge itself stays unaware that encryption is happening. It stores blobs, as it already does. | ||
| Each object is encrypted per [FEE](https://github.com/filecoin-project/FIPs/discussions/1253) (chunked AES-256-GCM, STREAM construction) under a fresh per-object content-encryption key (CEK). That CEK is then wrapped twice: |
There was a problem hiding this comment.
Can we define what we mean by "wrapped"? Encrypted by the tenant/Ingot private key - correct?
| 1. S3 PUT arrives at Ingot; authorization validated. | ||
| 2. Ingot generates a fresh CEK. In parallel: | ||
| - Ingot streams the plaintext through FEE encryption, computing the **plaintext CID** inline as the bytes flow. Only the _ciphertext_ is buffered (to disk). | ||
| - Ingot requests the tenant public key from Hilt (through a cache). |
There was a problem hiding this comment.
What is this public key used for? What does it look like to "request" it?
There was a problem hiding this comment.
That's DID resolution. Noted explicitly now.
| 2. Ingot generates a fresh CEK. In parallel: | ||
| - Ingot streams the plaintext through FEE encryption, computing the **plaintext CID** inline as the bytes flow. Only the _ciphertext_ is buffered (to disk). | ||
| - Ingot requests the tenant public key from Hilt (through a cache). | ||
| 3. Ingot builds a FEE header with the tenant as the recipient (and thus containing the tenant-wrapped CEK). |
There was a problem hiding this comment.
I don't understand how this bit works?
How does the key wrapping occur? Ingot does not have the tenant private key.
There was a problem hiding this comment.
We don't need the private key, just the public key. That's why we got it in the previous step. Noted explicitly now.
There was a problem hiding this comment.
Ah, OK, it's perhaps worth mentioning at this point that what is happening is an ECDH-ES + AES Key Wrap.
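For readers with the same question, here is a toy illustration of why the sender needs only the recipient's public key in an ephemeral-static Diffie-Hellman wrap. It is deliberately not the real construction: finite-field DH over a small Mersenne prime stands in for X25519, a SHA-256 digest stands in for the KDF, and XOR stands in for A256KW. None of it is secure, and the function names and the `epk` / `wrapped_cek` field names are hypothetical.

```python
import hashlib
import secrets

# Toy group: integers modulo the Mersenne prime 2**127 - 1.
# NOT secure; it only stands in for X25519 to show the data flow.
P = 2**127 - 1
G = 3


def keypair() -> tuple[int, int]:
    """Return (private, public) for the toy group."""
    private = secrets.randbelow(P - 2) + 1
    return private, pow(G, private, P)


def derive_kek(shared_secret: int) -> bytes:
    """Stand-in KDF: hash the shared secret into a 32-byte wrapping key."""
    return hashlib.sha256(shared_secret.to_bytes(16, "big")).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def wrap_for_recipient(recipient_public: int, cek: bytes) -> dict:
    """Sender side (Ingot in this thread): uses only the recipient PUBLIC key."""
    ephemeral_private, ephemeral_public = keypair()
    kek = derive_kek(pow(recipient_public, ephemeral_private, P))
    # The ephemeral private key is discarded once this function returns.
    return {"epk": ephemeral_public, "wrapped_cek": xor(cek, kek)}


def unwrap_as_recipient(recipient_private: int, header: dict) -> bytes:
    """Recipient side (the tenant key holder): needs the PRIVATE key."""
    kek = derive_kek(pow(header["epk"], recipient_private, P))
    return xor(header["wrapped_cek"], kek)


tenant_private, tenant_public = keypair()
cek = secrets.token_bytes(32)
header = wrap_for_recipient(tenant_public, cek)
recovered = unwrap_as_recipient(tenant_private, header)
```

Both sides arrive at the same shared secret because `pow(tenant_public, ephemeral_private, P)` equals `pow(ephemeral_public, tenant_private, P)`, so the sender can wrap without ever seeing the tenant private key.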
| - **Abort / abandonment** must cryptoshred orphaned parts' key rows and queue their blobs for true deletion. | ||
| - **Superseded parts** (re-uploaded part numbers; last write wins at Complete) must shred the replaced part's key row. | ||
| However, multipart itself is a separate concern, and these operations are already built on a Forge-blob-delete operation. We just need to extend the Forge-blob-delete operation in the same way we do to implement an S3-object-delete. |
There was a problem hiding this comment.
Yeah, @frrist is looking into multipart and we've discussed adding a /blob/abort command to Sprue to cancel a blob that has been allocated, (possibly) uploaded, but not yet accepted (the onward invocation to Piri being something like /blob/reject i.e. "not accept").
| 1. S3 GET arrives at Ingot; authorization validated. | ||
| 2. Ingot looks up the object's parts. For each part, it asks its KMS to unwrap the region-wrapped CEK. | ||
| 3. Ingot streams ciphertext from the local Piri, decrypts segment-by-segment in manifest order, and streams plaintext to the client. An auth-tag (tampered data) failure mid-stream terminates the response as an error. | ||
| 4. **Range GET** maps the object range to segment spans via the manifest's cumulative plaintext sizes, then to chunk spans via the stored envelope byte lengths, fetches only the affected ciphertext chunks (byte-range retrieval from Piri), and decrypts only those. |
There was a problem hiding this comment.
Okay, I need some clarification on how this works 😅. What is this "manifest"?
There was a problem hiding this comment.
Oh, sorry, shorthand that got half-reverted. Just means the DB rows that say what's in it.
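The range mapping in step 4 can be sketched as plain arithmetic over those DB rows. This is a rough illustration under stated assumptions, not the FEE layout: the chunk size, the per-segment header lengths, and the "header, then fixed-size chunks each followed by a 16-byte GCM tag" layout are all hypothetical, and every segment is assumed to be non-empty.

```python
from bisect import bisect_right
from itertools import accumulate

CHUNK = 64 * 1024  # assumed plaintext bytes per chunk (hypothetical)
TAG = 16           # AES-GCM authentication tag bytes, assumed to follow each chunk


def plan_range(segment_sizes, header_lens, start, end):
    """Map the inclusive plaintext range [start, end] to per-segment chunk spans.

    Returns a list of tuples:
    (segment index, first chunk, last chunk, envelope byte start, envelope byte end),
    where the envelope byte end is exclusive.
    """
    ends = list(accumulate(segment_sizes))  # cumulative plaintext sizes
    if not 0 <= start <= end < ends[-1]:
        raise ValueError("range outside object")
    first_seg = bisect_right(ends, start)
    last_seg = bisect_right(ends, end)
    plan = []
    for seg in range(first_seg, last_seg + 1):
        seg_start = ends[seg] - segment_sizes[seg]
        lo = max(start, seg_start) - seg_start      # offsets within this segment
        hi = min(end, ends[seg] - 1) - seg_start
        first_chunk, last_chunk = lo // CHUNK, hi // CHUNK
        byte_start = header_lens[seg] + first_chunk * (CHUNK + TAG)
        # The final chunk of a segment may be shorter than CHUNK.
        last_plain = min(CHUNK, segment_sizes[seg] - last_chunk * CHUNK)
        byte_end = header_lens[seg] + last_chunk * (CHUNK + TAG) + last_plain + TAG
        plan.append((seg, first_chunk, last_chunk, byte_start, byte_end))
    return plan


# A two-segment object, with a range that straddles the segment boundary.
plan = plan_range([200_000, 100_000], [120, 120], 190_000, 210_000)
```

Only the chunks named in `plan` would be fetched and decrypted; the bytes before `start` in the first chunk and after `end` in the last chunk are discarded after decryption.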
| ### Deletion (cryptoshredding) | ||
| DELETE destroys every segment's **region-wrap row** in Ingot's DB. Subsequent GET/HEAD return `NoSuchKey`, and the blobs are queued for the future true-deletion mechanism. That's the cryptoshred: the region can no longer read the object, immediately. |
There was a problem hiding this comment.
DELETE...an object, bucket? Can we please clarify what we are talking about here upfront?
What is a "region-wrap row"? - Ah, this is the mapping of blob digest to CEK wrapped by the region key that Ingot will have to maintain - right?
There was a problem hiding this comment.
Object. Bucket deletes aren't described here explicitly, but the only relevant part for this RFC should be deleting the objects within the bucket. Unless we add bucket-specific keys, which isn't here yet, but is sounding like a good plan.
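As a minimal sketch of the object-level bookkeeping described here, assuming a blob-id-to-wrapped-CEK mapping like the one guessed at above: deleting the key rows makes the object unreadable by the region at once, while the blobs themselves only get queued. `ToyIngot` and its fields are hypothetical and ignore the KMS, Forge, and the real DB schema.

```python
class NoSuchKey(Exception):
    """Mirrors the S3 error code returned for a missing object."""


class ToyIngot:
    def __init__(self):
        self.objects = {}         # object key -> list of blob ids (segments)
        self.key_rows = {}        # blob id -> region-wrapped CEK (opaque bytes)
        self.deletion_queue = []  # blobs awaiting the future true-deletion step

    def put(self, key, segments):
        """Record an object; segments is a list of (blob_id, wrapped_cek)."""
        self.objects[key] = [blob_id for blob_id, _ in segments]
        self.key_rows.update(segments)

    def wrapped_ceks(self, key):
        """Return the wrapped CEKs a GET would hand to the KMS."""
        if key not in self.objects:
            raise NoSuchKey(key)
        return [self.key_rows[blob_id] for blob_id in self.objects[key]]

    def delete(self, key):
        """Cryptoshred: drop every segment's key row, queue the blobs."""
        for blob_id in self.objects.pop(key, []):
            del self.key_rows[blob_id]
            self.deletion_queue.append(blob_id)


ingot = ToyIngot()
ingot.put("photos/cat.jpg", [("blob-1", b"wrapped-1"), ("blob-2", b"wrapped-2")])
ingot.delete("photos/cat.jpg")
```

After `delete`, the region holds no wrapped CEK for either blob, which is the immediate part of the cryptoshred; the tenant-wrapped copy inside each envelope survives until the queued blobs are truly deleted.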
| 3. Ingot builds a FEE header with the tenant as the recipient (and thus containing the tenant-wrapped CEK). | ||
| 4. Envelope blob (header ‖ ciphertext) is pushed to Forge via Guppy/Sprue. | ||
| 5. Ingot asks its KMS to wrap the CEK with its own region key. | ||
| 6. On successful push, a manifest row (including the region-wrapped key) is committed. On any failure, the buffer is discarded and nothing is committed. |
There was a problem hiding this comment.
Maybe add a clarification here that the region-wrapped key is NOT added to the FEE. I did not realise this until much later in the document and things got very confusing.
Open Questions
1. Recommendation: Don't include it. Can't come up with a good reason to. And can come up with some minor reasons not to.
2. Recommendation: Keep AES. Rely on TLS if we need to transmit CEKs. And if we ever need to change this decision, rotation does not impact the blobs.
3. Recommendation: Use many keys. A single key is too much risk at a single point of failure. Remaining Open Question: By tenant, or bucket? Our existing system uses a key per tenant (as the Hilt copy will). But the bucket is potentially easier to identify on the request. And the cardinality cuts both ways: a bucket key has a smaller blast radius, but managing keys in the KMS for every bucket is a lot more work.
4. Recommendation: ??? I'm still not sure. I'm starting to lean towards bucket, though. The object is in the bucket. The bucket currently belongs to exactly one tenant, but that's more likely to flex in the future than an object is to exist in multiple buckets. (Even if two buckets get the same bytes added, they'll be two different objects, with different metadata.) The relationship between the object and the bucket feels stronger to me than between the object and the tenant. But I can't articulate a non-fuzzy argument for it.
Is it really that much more work to manage keys in KMS for every bucket? Where is the complexity/additional work? BTW, how many buckets will each tenant typically have? If it's 1000 buckets per tenant, and 1000 tenants (FilOne customers - I wish we had that many!), then we are talking about 1M keys - that can easily fit into an in-memory cache. And we don't rotate these keys often (if at all), so the cached copy can have long TTL.
One benefit of using per-bucket keys both in Hilt and Ingot: our architecture will be easier to understand and reason about when all services use the same KEK granularity.
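A back-of-the-envelope check of the cache estimate above. The tenant and bucket counts come from the comment; the 200 bytes of per-entry overhead (bucket id, expiry, container bookkeeping) is an assumption picked only to get an order of magnitude.

```python
TENANTS = 1_000
BUCKETS_PER_TENANT = 1_000
KEY_BYTES = 32         # one 256-bit key per bucket
OVERHEAD_BYTES = 200   # assumed per-entry overhead (hypothetical)


def cache_bytes(tenants: int, buckets_per_tenant: int,
                key_bytes: int = KEY_BYTES,
                overhead_bytes: int = OVERHEAD_BYTES) -> int:
    """Estimate the memory needed to cache one key entry per bucket."""
    return tenants * buckets_per_tenant * (key_bytes + overhead_bytes)


total_keys = TENANTS * BUCKETS_PER_TENANT
total_bytes = cache_bytes(TENANTS, BUCKETS_PER_TENANT)
print(f"{total_keys:,} keys -> about {total_bytes / 2**20:.0f} MiB")
```

Under these assumptions the full set stays in the low hundreds of MiB, which supports the point that cache size is unlikely to be the deciding factor.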
Agree, and for the open question - I think it is significantly simpler for Ingot to have a per-bucket KEK (not per tenant), since the bucket (space) DID will be the subject of the delegations we need to read/write into Forge, which we will already have in cache or need to fetch anyway. Per tenant requires Ingot to have some notion of tenants, which I think it doesn't currently.
I don't mind too much; per bucket implies that Hilt needs to create (and store) the bucket key. This is how the proposal in the tenant management RFC stands currently, except Hilt throws away the private key after using it to create delegations, but this could easily just be stored instead. I think this is right, but @hannahhoward has suggested it could be created by Ingot and delegated to Hilt. I think there are a lot of hurdles with this approach, so I'd prefer to keep it as is, and more so if there is support here.
@alanshaw and I talked today and ended up (but not definitively) back at a single key. Remember that we're talking about a key held in a KMS, and as non-exportable as we can comfortably get from the region provider: either a true hardware module, or a dedicated piece of off-the-shelf software in the Compose (or whatever the appliance ends up as) which knows how to handle keys well. That key is only used to wrap and unwrap keys, and the encrypted keys are in the Ingot DB, colocated with the KMS. Rotating the region key is relatively cheap, so I recommend we do it proactively on a regular schedule. Monthly, maybe? The process requires making a new key and re-wrapping all the keys in the DB. We can do it at our leisure, we just need to have access to the old and new keys at any given time and to record which key we need in the DB row. I understand the worry in having the whole thing sit on a single key, but consider:
That all makes me lean toward a single region key (at any given time, with regular rotation). It would also exactly match what Hilt does when it wraps keys: they're wrapped with a single Hilt key, which should also be rotated regularly.
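A sketch of the rotation bookkeeping described above: each row records which key wrapped it, so re-wrapping can proceed at leisure and resume after an interruption. The `toy_wrap` function is an XOR-with-HMAC-pad stand-in and is not A256KW; the `keys` dict stands in for a KMS that would never export key material; the row fields (`kid`, `nonce`, `wrapped`) are hypothetical.

```python
import hashlib
import hmac
import secrets


def toy_wrap(kek: bytes, nonce: bytes, cek: bytes) -> bytes:
    """XOR the 32-byte CEK with an HMAC-derived pad. Stand-in only, NOT A256KW."""
    pad = hmac.new(kek, nonce, hashlib.sha256).digest()
    return bytes(a ^ b for a, b in zip(cek, pad))


toy_unwrap = toy_wrap  # XOR with the same pad is its own inverse


def rotate(rows: list, keys: dict, new_kid: str) -> int:
    """Re-wrap every row not yet under new_kid; return how many rows changed."""
    changed = 0
    for row in rows:
        if row["kid"] == new_kid:
            continue  # already migrated, so a resumed rotation skips it
        cek = toy_unwrap(keys[row["kid"]], row["nonce"], row["wrapped"])
        row["nonce"] = secrets.token_bytes(16)
        row["wrapped"] = toy_wrap(keys[new_kid], row["nonce"], cek)
        row["kid"] = new_kid  # record which key this row now needs
        changed += 1
    return changed


keys = {"region-2025-01": secrets.token_bytes(32)}
cek = secrets.token_bytes(32)
nonce = secrets.token_bytes(16)
rows = [{"kid": "region-2025-01", "nonce": nonce,
         "wrapped": toy_wrap(keys["region-2025-01"], nonce, cek)}]

keys["region-2025-02"] = secrets.token_bytes(32)  # "make a new key"
migrated = rotate(rows, keys, "region-2025-02")
del keys["region-2025-01"]  # retire the old key only once no row references it
```

Nothing in the envelopes stored in Forge changes during this process; only the DB rows are rewritten.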
Another question, especially for @frrist: Do the two DB tables here reduce to just a Parts table, which is really a Blobs table, and just maps blob CIDs to blob CEKs? Everything else should be in the MST. Do we need any of it in Postgres also? Edit: Oh, dang, I hadn't gotten to Appendix C yet. This is just added to |
📖 Preview
UCAN protocol to come next.