TQ: Support for ZFS Key Rotation #9737

Open
plaidfinch wants to merge 2 commits into main from tq-zfs-key-rotation

Conversation

@plaidfinch
Contributor

@plaidfinch plaidfinch commented Jan 28, 2026

When Trust Quorum commits a new epoch, all U.2 crypt datasets must have their encryption keys rotated to use the new epoch's derived key. This change implements the key rotation flow triggered by epoch commits.

Trust Quorum Integration

  • Add watch channel to NodeTaskHandle for epoch change notifications
  • Initialize channel with current committed epoch on startup
  • Notify subscribers via send_if_modified() when epoch changes
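
A minimal sketch of this notification pattern on top of tokio's watch channel; the Epoch newtype, field names, and the subscribe/notify methods below are illustrative stand-ins rather than the exact NodeTaskHandle API in this PR:

use tokio::sync::watch;

/// Illustrative stand-in for the trust quorum epoch type.
#[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord, Debug)]
pub struct Epoch(pub u64);

pub struct NodeTaskHandle {
    // Subscribers watch this channel for newly committed epochs.
    committed_epoch_tx: watch::Sender<Option<Epoch>>,
}

impl NodeTaskHandle {
    /// Initialize the channel with whatever epoch was committed at startup.
    pub fn new(current: Option<Epoch>) -> Self {
        let (committed_epoch_tx, _rx) = watch::channel(current);
        Self { committed_epoch_tx }
    }

    pub fn subscribe_committed_epoch(&self) -> watch::Receiver<Option<Epoch>> {
        self.committed_epoch_tx.subscribe()
    }

    /// Only wakes subscribers when the committed epoch actually changes.
    pub fn notify_committed(&self, epoch: Epoch) {
        self.committed_epoch_tx.send_if_modified(|cur| {
            if *cur == Some(epoch) {
                false
            } else {
                *cur = Some(epoch);
                true
            }
        });
    }
}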

Config Reconciler Integration

  • Accept committed_epoch_rx watch channel from trust quorum
  • Trigger reconciliation when epoch changes
  • Track per-disk encryption epoch in ExternalDisks
  • Add rekey_for_epoch() to coordinate key rotation:
    • Filter disks needing rekey (cached epoch < target OR unknown)
    • Derive keys for each disk via StorageKeyRequester
    • Send batch request to dataset task
    • Update cached epochs on success
    • Retry on failure via normal reconciliation retry logic
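
Roughly, the coordination in rekey_for_epoch looks like the sketch below. DiskState, RekeyItem, and the two closures are illustrative stand-ins for the reconciler's real state and channels (the real function is async), but the filter/derive/batch/update flow follows the bullets above:

type Epoch = u64; // stand-in

struct DiskState {
    dataset_name: String,
    cached_epoch: Option<Epoch>,
}

struct RekeyItem {
    dataset_name: String,
    key: Vec<u8>,
}

/// Returns true if any rekey operation failed, so the caller can lean on the
/// normal reconciliation retry loop.
fn rekey_for_epoch(
    disks: &mut [DiskState],
    target: Epoch,
    derive_key: impl Fn(&str) -> Option<Vec<u8>>,
    send_batch: impl Fn(Vec<RekeyItem>) -> bool,
) -> bool {
    // 1. Filter to disks whose cached epoch is unknown or behind the target
    //    (None compares less than Some, so it covers "unknown" too).
    let needs: Vec<usize> = (0..disks.len())
        .filter(|&i| disks[i].cached_epoch < Some(target))
        .collect();
    if needs.is_empty() {
        return false;
    }

    // 2. Derive a key per disk; bail out on failure and retry later.
    let mut batch = Vec::new();
    for &i in &needs {
        match derive_key(&disks[i].dataset_name) {
            Some(key) => batch.push(RekeyItem {
                dataset_name: disks[i].dataset_name.clone(),
                key,
            }),
            None => return true,
        }
    }

    // 3. Send one batch request to the dataset task, and 4. update the cached
    //    epochs only when the batch reports success.
    if send_batch(batch) {
        for &i in &needs {
            disks[i].cached_epoch = Some(target);
        }
        false
    } else {
        true
    }
}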

Dataset Task Changes

  • Add RekeyRequest/RekeyResult types for batch rekey operations
  • Add datasets_rekey() with idempotency check (skip if already at target)
  • Use Zfs::change_key() for atomic key + epoch property update
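
A sketch of the per-dataset idempotency check; current_epoch_of and change_key here are stand-ins for the ZFS property read and Zfs::change_key(), whose real signatures live in illumos-utils:

type Epoch = u64; // stand-in

fn rekey_one_dataset(
    dataset: &str,
    new_key: &[u8],
    new_epoch: Epoch,
    current_epoch_of: impl Fn(&str) -> Option<Epoch>,
    change_key: impl Fn(&str, &[u8], Epoch) -> Result<(), String>,
) -> Result<(), String> {
    // Idempotency: if the dataset already records the target epoch, a prior
    // attempt (possibly interrupted by a crash) finished the rotation.
    if current_epoch_of(dataset) == Some(new_epoch) {
        return Ok(());
    }
    // Otherwise rotate the wrapping key and stamp the epoch property in one
    // atomic step.
    change_key(dataset, new_key, new_epoch)
}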

ZFS Utilities

  • Add Zfs::change_key() using zfs_atomic_change_key crate
  • Add Zfs::load_key(), unload_key(), dataset_exists()
  • Add epoch field to DatasetProperties
  • Add structured error types for key operations
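
For the simpler helpers, here is a rough CLI-backed sketch in the spirit of the existing "zfs list" helper shown later in this review; the real Zfs::* methods in illumos-utils may differ in shape, and change_key itself goes through the zfs_atomic_change_key crate rather than the CLI:

use tokio::process::Command;

// Path assumed for illustration; illumos-utils has its own ZFS constant.
const ZFS: &str = "/usr/sbin/zfs";

pub async fn dataset_exists(dataset: &str) -> std::io::Result<bool> {
    // `zfs list -H <dataset>` exits non-zero when the dataset does not exist.
    Ok(Command::new(ZFS).args(["list", "-H", dataset]).status().await?.success())
}

pub async fn unload_key(dataset: &str) -> std::io::Result<bool> {
    // `zfs unload-key` fails if no key is currently loaded; callers that only
    // care that the key ends up unloaded can ignore a failure here.
    Ok(Command::new(ZFS).args(["unload-key", dataset]).status().await?.success())
}

pub async fn epoch_property(dataset: &str) -> std::io::Result<Option<u64>> {
    // `zfs get -Hpo value oxide:epoch <dataset>` prints "-" when unset.
    let out = Command::new(ZFS)
        .args(["get", "-Hpo", "value", "oxide:epoch", dataset])
        .output()
        .await?;
    if !out.status.success() {
        return Ok(None);
    }
    Ok(String::from_utf8_lossy(&out.stdout).trim().parse().ok())
}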

Crash Recovery

  • Add trial decryption recovery in sled-storage for datasets with missing epoch property (e.g., crash during initial creation)
  • Unload key before each trial attempt to handle crash-after-load-key
  • Set epoch property after successful recovery
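
A sketch of the trial-decryption loop, with stand-ins for the key derivation and ZFS calls (the real recovery lives in sled-storage and is async); candidate epochs would typically be tried newest-first:

type Epoch = u64; // stand-in

fn recover_missing_epoch(
    dataset: &str,
    candidate_epochs: &[Epoch],
    derive_key: impl Fn(Epoch) -> Vec<u8>,
    unload_key: impl Fn(&str),
    try_load_key: impl Fn(&str, &[u8]) -> bool,
    set_epoch_property: impl Fn(&str, Epoch),
) -> Option<Epoch> {
    for &epoch in candidate_epochs {
        // If we crashed after a successful load-key but before stamping the
        // epoch property, the correct key may already be loaded; unload first
        // so the trial load below actually tells us something.
        unload_key(dataset);
        if try_load_key(dataset, &derive_key(epoch)) {
            // Record which epoch's key this dataset is really using.
            set_epoch_property(dataset, epoch);
            return Some(epoch);
        }
    }
    None
}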

@andrewjstone andrewjstone mentioned this pull request Jan 31, 2026
andrewjstone added a commit that referenced this pull request Jan 31, 2026
This commit adds a 3 phase mechanism for sled expungement.

The first phase is to remove the sled from the latest trust quorum
configuration via omdb. The second phase is to reboot the sled after
polling for the trust quorum removal to commit. The third phase is to
issue the existing omdb expunge command, which changes the sled policy
as before.

The first and second phases remove the need to physically remove the
sled before expungement. They act as a software mechanism that prevents
the sled-agent from restarting on the sled and doing work when the sled
should be treated as "absent". We've discussed this numerous times in
the update huddle and it is finally arriving!

The third phase is what informs reconfigurator that the sled is gone.
It remains the same as before, except for an extra sanity check that
the last committed trust quorum configuration does not contain the sled
that is to be expunged.

The removed sled may be added back to this rack or another after being
clean slated. I tested this by deleting the files in the internal
"cluster" and "config" directories and rebooting the removed sled in
a4x2 and it worked.

This PR is marked draft because it changes the current
sled-expunge pathway to depend on real trust quorum. We
cannot safely merge it in until the key-rotation work from
#9737 is merged in.

This also builds on #9741 and should merge after that PR.

## Safety Properties

- Atomic: Key and epoch property set together via `zfs_atomic_change_key`
- Idempotent: Skip rekey if dataset already at target epoch
- Crash-safe: Epoch read from ZFS on restart rebuilds cache correctly
- Conservative: Unknown epochs (None) trigger rekey attempt
@plaidfinch plaidfinch linked an issue Feb 4, 2026 that may be closed by this pull request
@plaidfinch plaidfinch marked this pull request as ready for review February 4, 2026 03:11
Create a new key-manager-types crate containing the disk encryption key
types (Aes256GcmDiskEncryptionKey and VersionedAes256GcmDiskEncryptionKey)
that were previously defined in key-manager. This breaks the dependency
from illumos-utils to key-manager, allowing illumos-utils to depend only
on the minimal types crate.

The key-manager crate re-exports VersionedAes256GcmDiskEncryptionKey for
backwards compatibility.
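
For reference, the backwards-compatibility piece amounts to a re-export along these lines in key-manager's lib.rs (exact module path assumed):

// key-manager/src/lib.rs (sketch): the type now lives in key-manager-types,
// but existing key_manager::VersionedAes256GcmDiskEncryptionKey imports keep
// working via this re-export.
pub use key_manager_types::VersionedAes256GcmDiskEncryptionKey;
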
/// Returns (exists=true, mounted) if the dataset exists with the expected
/// mountpoint, (exists=false, mounted=false) if it doesn't exist.
/// Returns an error if the dataset exists but has an unexpected mountpoint.
async fn dataset_mount_info(
Contributor Author

I renamed this internal function: it was previously named dataset_exists, but I wanted to use that name elsewhere, and it wasn't a good description of what this function actually returns (DatasetMountInfo).

Contributor

@andrewjstone andrewjstone left a comment


This looks fantastic @plaidfinch.

We'll need to test this and possibly coordinate merging it in case it relies on trust quorum being enabled / lrtq upgrade, etc...

impl DatasetProperties {
const ZFS_GET_PROPS: &'static str =
"oxide:uuid,name,mounted,avail,used,quota,reservation,compression";
const ZFS_GET_PROPS: &'static str = "oxide:uuid,oxide:epoch,name,mounted,avail,used,quota,reservation,compression";
Contributor

This needs to be formatted.

log,
"Rotating encryption key";
"dataset" => &req.dataset_name,
"new_epoch" => new_epoch,
Contributor

Worth logging the current epoch?

Member

could just add the current epoch to the new logging scope that's created above, perhaps...

log,
"Failed to rotate encryption key";
"dataset" => &req.dataset_name,
"error" => %e,
Contributor

Worth logging the old and new epochs here?

/// Returns true if any rekey operations failed.
async fn rekey_for_epoch(&mut self, target_epoch: Epoch) -> bool {
// Filter to disks that need rekeying:
// - Known epoch < target: definitely needs rekey
Contributor

I don't see how this would be possible, but I'm curious if we should log an error in case e > target_epoch, just to cover all conditions.

}

if request.disks.is_empty() {
info!(
Contributor

Should this be warn! ?

format!("{}/{}", info.disk.zpool_name(), CRYPT_DATASET);
match self
.key_requester
.get_key(target_epoch.0, info.disk.identity().clone())
Contributor

This can block indefinitely, while waiting to load the secret. I don't think that's a problem but wanted to mention it. I think other things can block in the reconciler also.


// Get the key for this epoch
let key =
match key_requester.get_key(epoch, disk_identity.clone()).await {
Contributor

It may be worth it at some point to get the set of known committed epochs when loading the latest key. Loading the latest key decrypts and loads all prior epochs, so this should be cheap. Then if we aborted a bunch for some reason we wouldn't need to make those requests. This is just an optimization though and not necessary to do in this PR.

pub compression: String,
/// The encryption key epoch for this dataset.
///
/// Only present on encrypted datasets (e.g., crypt datasets on U.2s).
Contributor

Maybe worth noting here that this is not present on datasets that inherit encryption from a parent (if I understand that correctly)?

#[derive(Debug, thiserror::Error)]
pub enum KeyRotationError {
#[error("failed to rotate encryption key for dataset {dataset}")]
ChangeKeyFailed {
Contributor

This is fine as-is, but if this is the only variant this error will ever have, we could shorten it a bit by making it a struct?

#[derive(Debug, thiserror::Error)]
#[error("failed to rotate encryption key for dataset {dataset}")]
pub struct KeyRotationError {
    dataset: String,
    #[source]
    err: anyhow::Error,
}

#[derive(Debug, Default)]
pub struct RekeyRequest {
/// Datasets to rekey, keyed by physical disk UUID.
pub disks: BTreeMap<PhysicalDiskUuid, DatasetRekeyInfo>,
Contributor

I think this is fine but just making sure I understand - we only expect to ever have one encrypted dataset per disk, and any datasets that need encryption will be children of it (and inherit its encryption), right?

Contributor

Correct.

// Ensure we process the initial epoch on startup. Using mark_unchanged()
// means that changed() will fire on the first value, allowing us to
// catch any missed rekeys from crashes.
self.committed_epoch_rx.mark_unchanged();
Contributor

I don't think we need this now that we moved rekeying into do_reconciliation; we're about to go into the loop which unconditionally starts with a call to do_reconciliation, which means we will immediately try to rekey if necessary. As written this will induce a second do_reconciliation (which is harmless but not useful).

// Check if any disks need rekeying to the current committed epoch.
// We use borrow_and_update() to mark the epoch as seen, so we don't
// trigger another reconciliation for the same epoch change.
// Note: We must copy the epoch out of the Ref before any await points.
Contributor

// Note: We must copy the epoch out of the Ref before any await points.

This is true but I think in general we want the time we hold the watch channel to be as short as possible. If we did something like:

let guard = watch_rx.borrow_and_update();
do_a_synchronous_thing(&guard);
drop(guard);
do_an_async_thing().await;

we still hold the channel's read lock throughout do_a_synchronous_thing, which we'd prefer to avoid. It's wonderful when watch channels hold Copy types like this one and we can just immediately deref as you have here.

Comment on lines +723 to +724
let dataset_name =
format!("{}/{}", info.disk.zpool_name(), CRYPT_DATASET);
Contributor

Turbo-nit - maybe move this down into the Ok(key) => { ... } branch? That'd minimize the scope of the variable and let us skip the allocation if we don't need it.

let name = format!("{}/{}", zpool_name, dataset);
let epoch = if let Ok(epoch_str) =
Zfs::get_oxide_value(dataset, "epoch").await
Zfs::get_oxide_value(&name, "epoch").await
Contributor

Was this wrong before and always failing?

Contributor

Ha yes. Claude found this one yesterday and Finch pointed it out to me. It was always wrong, but innocuous because LRTQ doesn't allow key rotations.

let mut command = tokio::process::Command::new(illumos_utils::zfs::ZFS);
let cmd = command.args(&["list", "-H", dataset]);
Ok(cmd.status().await?.success())
Ok(Zfs::dataset_exists(dataset).await?)
Contributor

At this point should we remove this helper and inline Zfs::dataset_exists wherever it's called?

// loaded keys across process restarts, so if we crashed after a
// successful load-key but before setting the epoch property, the
// correct key would still be loaded and our load-key would fail.
let _ = Zfs::unload_key(dataset_name).await;
Contributor

What happens if this fails?

"epoch" => epoch,
);

// No need to unload here -- we unload at the start of each iteration
Contributor

Do we need to unload on the last iteration?

@andrewjstone
Contributor

We tested this on a4x2 with TQ enabled and keys were created correctly during RSS. We then added a sled and rotation completed correctly. We're going to relaunch with trust quorum disabled and see if we can upgrade out of lrtq successfully and then add a sled in our next experiment.

.await
.map_err(|e| ChangeKeyError {
name: dataset.to_string(),
err: anyhow::anyhow!("{e}"),
Member

I think this should be

Suggested change
err: anyhow::anyhow!("{e}"),
err: anyhow::Error::from(e),

rather than formatting it, as that will eat the structured representation of the error's source chain, while Error::from preserves it.


Comment on lines +707 to +710
.filter(|i| match i.cached_epoch {
Some(e) => e < target_epoch,
None => true,
})
Member

turbo nit: Option<T: PartialOrd> is itself PartialOrd, and None compares less than any Some, so this can just be

Suggested change
.filter(|i| match i.cached_epoch {
Some(e) => e < target_epoch,
None => true,
})
.filter(|i| i.cached_epoch < Some(target_epoch))

@andrewjstone
Contributor

Woohoo. Tested on hardware. LRTQ upgrade succeeded. This required a small patch, which we should pull into this branch: 494d49a

I think it's probably good to go in once the comments are resolved and this patch is pulled in. Will re-assess when fresh tomorrow.
