Purpose
By the time we ship the rack, we will not have the full-fledged trust quorum available, so we cannot yet derive disk encryption keys from the rack cluster secret as described in [rfd301]. However, we still want to encrypt the entire zpool that owns each U.2 device by enabling encryption on the root dataset.
Threat models
Our threat models with regard to disk encryption escalate in step with our security capabilities. In other words, we can only handle weaker attacks until trust quorum is fully implemented. We describe a few threat models below. In all cases, our security goal is to ensure that an attacker cannot recover data from stolen drives.
L1 - An attacker is capable of stealing an arbitrary number of disk drives.
L2 - An attacker is capable of stealing a single sled containing disk drives and is able to boot that sled.
L3 - An attacker is capable of stealing fewer than K sleds, where K is the trust quorum size, and booting them such that they communicate.
L4 - An attacker is capable of stealing more than K sleds, where K is the trust quorum size, and booting them so that they can communicate.
This document proposes a mechanism to thwart threat model L3 by rack
shipment. Threat models L2 and L3 are prevented by some form of trust
quorum. L4 is outside our current long-term threat model, even with trust
quorum.
Determination
We ended up implementing and shipping the Low Rent Trust Quorum (LRTQ) as
described in the bootstore README.
This meets threat model L3 although it is lacking in other ways that the full
trust quorum will fix.
We are using the storage key derivation described in
RFD 301 section 4,
including the U.2 Drive Info. The input key material placed into HKDF to
generate the output key material (OKM in [rfd301]) comes from LRTQ.
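To make the derivation step concrete, here is a minimal sketch of HKDF (RFC 5869) built on the Python standard library. It is an illustration under stated assumptions, not the shipped implementation: the hash function (SHA-256), the 32-byte key length, the `derive_disk_key` helper, and the idea of passing the U.2 Drive Info as an opaque byte string in the HKDF `info` field are all hypothetical choices made for this example. The actual encoding and parameters are whatever RFD 301 section 4 specifies.

```python
import hashlib
import hmac

HASH_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract: condense the input key material into a pseudorandom key."""
    if not salt:
        # RFC 5869: an absent salt is treated as HashLen zero bytes.
        salt = bytes(HASH_LEN)
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand: stretch the pseudorandom key into `length` bytes of OKM."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested output is too long for HKDF")
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        # T(n) = HMAC(PRK, T(n-1) || info || n)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_disk_key(ikm: bytes, drive_info: bytes, salt: bytes = b"") -> bytes:
    """Hypothetical helper: derive a 32-byte per-disk key.

    `ikm` stands in for the secret obtained from LRTQ and `drive_info`
    for a serialized form of the U.2 Drive Info. Both encodings are
    assumptions made for this sketch.
    """
    prk = hkdf_extract(salt, ikm)
    return hkdf_expand(prk, drive_info, 32)
```

Because the drive-specific data enters through `info`, the same input key material yields an independent key for each disk, so learning one disk's key reveals nothing about another's.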
Alternatives
Leave the disks unencrypted. This is unsatisfying because enabling encryption later would require re-encrypting every disk in full. With ZFS encryption, the actual data is encrypted with a ZFS-specific random key that is wrapped by the key we provide. Rotation therefore lets us strengthen our wrapping key without having to re-encrypt all the data.
Derive keys from cryptographically insecure IKM. This satisfies the L1 threat model, but is underwhelming at best.
Generate and store a large random number in plaintext on the M.2s as the value used as input to HKDF to generate the OKM. We can make this cryptographically secure, but it also opens up the ability for an attacker to steal a U.2 drive and an M.2 device and recover the data.
Generate and store a large random number on the M.2s to use as the salt to HKDF, with the key being derived from non-cryptographically secure VPD information. This is better, but still allows an attacker to recover plaintext data by stealing an M.2 as well as a U.2 drive and discovering the VPD information.
Upgrade into the Trust Quorum model
We will upgrade from LRTQ into more secure trust quorum schemes in an online manner.
Security Considerations
This whole RFD is based around security compromises made to meet our urgency requirements around rack shipment.
LRTQ does not defend against online attacks where the attacker is on the bootstrap network. Other limitations are described in the bootstore README.