[rfd418] discusses how to get from the current Mupdate-driven rack update workflow ([rfd326]) to an automated update workflow. There are already a number of high-priority milestones that automated update is working toward. Meanwhile, we presently remain on CockroachDB v22.1, which is now out-of-support upstream.
To perform an upgrade of an existing cluster, CockroachDB only supports
upgrading from one major version to the next (such as v22.1 → v22.2). Each major
version supports its own on-disk format as well as the previous major version’s
on-disk format. By default, once all nodes of a cluster are upgraded to a
new version, the upgrade is auto-finalized, and downgrading to the
original version is no longer possible. To control finalization, the
cluster.preserve_downgrade_option cluster setting must be set before
upgrading the cluster. ([crdb-upgrade])
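As a rough illustration of these constraints, the sketch below encodes the one-major-version-at-a-time rule and builds the SQL statements that set and reset the cluster setting. The helper names (`can_upgrade`, `preserve_downgrade_sql`, `finalize_sql`) and the `MAJOR_VERSIONS` list are hypothetical and exist only for this example; the list covers just the versions named in this RFD.

```python
# Major versions in release order (only the ones named in this RFD).
MAJOR_VERSIONS = ["22.1", "22.2", "23.1", "23.2"]


def can_upgrade(current: str, target: str) -> bool:
    """Return True only when target is the major version immediately after current."""
    i = MAJOR_VERSIONS.index(current)
    return i + 1 < len(MAJOR_VERSIONS) and MAJOR_VERSIONS[i + 1] == target


def preserve_downgrade_sql(current: str) -> str:
    """Statement to run before upgrading so the upgrade is not auto-finalized."""
    return f"SET CLUSTER SETTING cluster.preserve_downgrade_option = '{current}';"


def finalize_sql() -> str:
    """Statement that clears the setting, allowing a pending upgrade to finalize."""
    return "RESET CLUSTER SETTING cluster.preserve_downgrade_option;"
```

For example, `can_upgrade("22.1", "23.1")` is false because it skips v22.2, and `preserve_downgrade_sql("22.1")` yields the statement to run on a v22.1 cluster before moving to v22.2.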
These constraints will eventually need to be supported by the automated update workflow, but we also want to move forward in the short term with upgrading CockroachDB to a supported version. During the April 9 Update Sync, we discussed a procedure to manually roll out new major versions of CockroachDB while retaining the ability to downgrade if needed.
Determination
In version 8 of the control plane, we will introduce a change that sets the
cluster.preserve_downgrade_option cluster setting to the current version of
CockroachDB. We will then use a tick-tock model to perform the upgrade to each
subsequent major version over a series of at least two control plane versions:
Tick: Upgrade the version of CockroachDB in the cockroachdb zone to the next major version. Verify on a long-term testing environment that CockroachDB downgrades work properly.
Tock: Finalize the upgrade by resetting cluster.preserve_downgrade_option, and then set cluster.preserve_downgrade_option to the current version.
This allows deploying a new CockroachDB major version while retaining the ability to downgrade if needed in a future release, until the Tock for that major version is deployed.
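The tick-tock cycle can be pictured as a small state machine. The following is a minimal sketch, not the control plane's implementation: the `Cluster` class, its fields, and its methods are hypothetical names used only to model the binary version, the finalized version, and the value of cluster.preserve_downgrade_option.

```python
from dataclasses import dataclass
from typing import Optional

# Major versions in release order (only the ones named in this RFD).
MAJOR_VERSIONS = ["22.1", "22.2", "23.1", "23.2"]


@dataclass
class Cluster:
    binary: str                     # version of the CockroachDB binary on every node
    active: str                     # finalized cluster version (on-disk format)
    preserve: Optional[str] = None  # cluster.preserve_downgrade_option, if set

    def tick(self) -> None:
        """Move the binary to the next major version without finalizing."""
        if self.binary != self.active:
            raise RuntimeError("previous upgrade has not been finalized")
        if self.preserve != self.active:
            raise RuntimeError("preserve_downgrade_option must be set before a Tick")
        nxt = MAJOR_VERSIONS.index(self.binary) + 1
        if nxt >= len(MAJOR_VERSIONS):
            raise RuntimeError("no later major version known")
        self.binary = MAJOR_VERSIONS[nxt]

    def tock(self) -> None:
        """Finalize any pending upgrade, then pin the setting to the new version."""
        self.preserve = None         # RESET: allows the pending upgrade to finalize
        self.active = self.binary    # after finalization, downgrade is not possible
        self.preserve = self.active  # SET: guards the next Tick

    def can_downgrade(self) -> bool:
        """A downgrade is possible only between a Tick and the following Tock."""
        return self.binary != self.active
```

Starting from `Cluster("22.1", "22.1")`, calling `tock()` models the v8 release (the setting goes from unset to 22.1), `tick()` models v9, and the next `tock()` models v10, after which `can_downgrade()` is false again.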
In the immediate term, assuming all goes well, the schedule would be:
v8: CockroachDB v22.1 Tock (cluster.preserve_downgrade_option is presently unset, so set it)
v9: CockroachDB v22.2 Tick
v10: CockroachDB v22.2 Tock
v11: CockroachDB v23.1 Tick
v12: CockroachDB v23.1 Tock
v13: CockroachDB v23.2 Tick
v14: CockroachDB v23.2 Tock
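The schedule follows mechanically from the tick-tock rule: the first release is a Tock for the version already deployed, and each later major version takes one Tick release followed by one Tock release. A small sketch that reproduces the list above, assuming exactly one control plane release per phase (the `schedule` function is a hypothetical helper for illustration):

```python
from typing import List


def schedule(first_release: int, versions: List[str]) -> List[str]:
    """Build the release plan: a Tock for versions[0], then Tick and Tock for each later version."""
    lines = [f"v{first_release}: CockroachDB v{versions[0]} Tock"]
    release = first_release + 1
    for version in versions[1:]:
        for phase in ("Tick", "Tock"):
            lines.append(f"v{release}: CockroachDB v{version} {phase}")
            release += 1
    return lines


for line in schedule(8, ["22.1", "22.2", "23.1", "23.2"]):
    print(line)
```

Since the determination allows "at least two" control plane versions per major version, a real schedule could stretch if a Tock is deferred; the sketch shows only the shortest case.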
Two considerations to strive for when implementing this:
* We shouldn’t add any extra manual steps to the Mupdate process.
* We shouldn’t write any new code that we’re going to throw away once Nexus is able to drive rolling CockroachDB upgrades on its own.
When preparing to ship a new release, we should audit the Technical Advisories associated with it.
Open Questions
Can we upgrade to a new patch release between a Tick and Tock release? (Almost certainly, but we ought to test this.)
External References
[RFD 326] The Minimum Upgradable Product (MUP)
[RFD 418] Towards automated system update
[crdb-upgrade] Upgrade to CockroachDB v22.2