RFD 357
External DNS in the MVP

The Oxide system must provide DNS names for services like the external API and console so that operators and end users alike can reach these services from web browsers, the official CLI, or other programs that use the API. This RFD documents and fleshes out a long-discussed proposal in which the customer delegates a DNS domain to the Oxide system and the system provides authoritative DNS servers for that domain. These DNS servers, part of the control plane, point customer clients at the services provided by the Oxide system.

See also the DNS-related sections of [RFD 267].

The mechanism of propagating updates to our DNS servers is discussed in [RFD 367].

Assumptions

  • Operators and end users need to be able to access the external API and console from web browsers, the official CLI, and other programs that use the API directly. These services are generally secured using TLS. This pretty much necessitates having DNS names for these services that the Oxide system is at least aware of, if not in direct control of.

  • The operator will want to control at least some part of the DNS name because this is what their end users will see in the browser, etc.

  • Some customers will want to deploy Oxide systems such that they have no direct connectivity to the internet. Relatedly, some customers may consider internal DNS names and IPs to be private information that they would not want exposed on the internet. (But this is not the only reason they wouldn’t want Oxide systems on the internet.)

  • In this context, we (Oxide) do not control the DNS clients. Here, DNS clients are those of arbitrary web browsers and third-party operating systems and clients. This is a major difference from [RFD 248], the system for service discovery within the control plane.

  • During initial setup of the Oxide system, the operator will configure at least one IP pool from which the system may allocate IP addresses for services that the system provides on the customer’s network. (See [RFD 21], [RFD 267].) More concretely, the operator will give at least one block of IP addresses to the Oxide system so that it (the Oxide system) can use these IPs for external DNS servers, external API and console servers, etc.

Determinations

(or: a summary of the proposal)

Operator configuration

As mentioned in [RFD 57], during initial setup, the operator will delegate a DNS domain to the Oxide system. The system will use this domain to provide DNS names for the external API and console and potentially other services. The information provided during initial setup includes:

  • The DNS domain that’s being delegated to the system.

  • At least two IP addresses on the customer network to be used by the Oxide system for running DNS servers. The details of this configuration are to-be-determined. See DNS Server IP Configuration.

In terms of how many addresses to provide: we suggest 3-10. Addresses are sometimes valuable resources, but too few addresses limit our ability to scale DNS horizontally and to survive transient failure.

Ideally, the customer will have set up the glue records for delegation in their own DNS servers before the Oxide rack arrives on-site. Setting this up is part of the critical path for provisioning the first VM, but it’s not blocked on having the rack on-site.

External DNS servers

The Oxide system will provide authoritative DNS servers for the delegated domain. We’ll call these external DNS servers (as in: serving DNS about the Oxide system to external consumers, not servers that are external to the system).

  • These external DNS servers will listen on the addresses mentioned above on the customer network.

  • In implementation, these external DNS servers will look exactly like the internal DNS servers described in [RFD 248]. They’ll be first-class control plane components. They’ll be built, packaged, deployed, and managed similarly to Nexus, CockroachDB, or the internal DNS servers. They’ll run the same DNS server software as the internal servers and be configured similarly (but with a different set of DNS data). Just like with the internal DNS servers, Nexus will be responsible for the DNS data served by these servers. When the control plane needs to change the DNS records served by these servers, Nexus will explicitly update each server using its HTTP API. This would generally happen through a saga action.

  • These external DNS servers will be deployed as a separate (internal) service from the internal DNS servers in order to keep the runtime components separate.

Each of the IP addresses should have a DNS server behind it, but in principle one server could serve multiple addresses. This would allow the customer to give us, say, 10 IP addresses for future horizontal scalability, while we attach those 10 addresses to only three DNS servers if we feel that’s all we need.
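
To make the many-to-one relationship concrete, here is a minimal sketch of distributing the operator-provided addresses over however many server instances we choose to run. The round-robin policy and the list-of-lists model are illustrative assumptions, not a determination of this RFD:

```python
def assign_addresses(addresses: list[str], num_servers: int) -> list[list[str]]:
    """Round-robin the operator-provided DNS addresses over the DNS
    server instances we choose to run.  Shows that the address count
    and server count are independent."""
    servers: list[list[str]] = [[] for _ in range(num_servers)]
    for i, addr in enumerate(addresses):
        servers[i % num_servers].append(addr)
    return servers

# Ten customer-provided addresses served by just three DNS servers:
layout = assign_addresses([f"198.51.100.{i}" for i in range(10)], 3)
assert len(layout) == 3 and sum(len(s) for s in layout) == 10
```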

External DNS names

Through the external DNS servers, the system will provide one DNS name for each Silo: a DNS name to be used for both the console and external API. Following the pattern in [RFD 21], each Silo would get a dns_name that defaults to the Silo’s name. Then we would generate DNS names $silo_dns_name.sys.$delegated_domain. So if the customer delegates oxide.corp.globex.example and creates Silo eng, the console and API would both be served from eng.sys.oxide.corp.globex.example. These are the DNS names that end users would use to reach the console and external API. The "sys" subdomain distinguishes these names from any names that might be provided under [RFD 21] so that cookies would never be accidentally shared across domains.
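
The naming scheme above can be sketched as a one-liner (the function name is ours, for illustration only):

```python
def silo_dns_name(silo_name: str, delegated_domain: str) -> str:
    """Build the external DNS name for a Silo's console and API,
    following the $silo_dns_name.sys.$delegated_domain pattern.
    The "sys" label keeps these names apart from any future
    RFD 21-style names under the delegated domain."""
    return f"{silo_name}.sys.{delegated_domain}"

# The example from the text: Silo "eng" under oxide.corp.globex.example
assert silo_dns_name("eng", "oxide.corp.globex.example") == \
    "eng.sys.oxide.corp.globex.example"
```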

This DNS name would have A/AAAA records containing IP addresses on the customer network that are used for Nexus instances serving the console and API. Allocating and managing these is beyond the scope of this RFD, but the assumption here is that these come from an IP pool designated by the operator for this purpose. The details of exactly how these records are assembled in response to a given query are to-be-determined. The server may select a subset of A/AAAA records and/or may randomize the order of returned results. These are common (if only weakly effective) techniques for load balancing when using typical DNS clients.[1] Eventually we will probably also want the ability to configure which Nexus instances are in the pool of externally-accessible ones so that we can isolate instances for debugging or to scale down with the ability to roll back (scale back up) quickly.
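
One possible shape for the subset-and-randomize behavior described above (the subset size and function are illustrative assumptions, not a determination):

```python
import random

def answer_a_query(addresses: list[str], max_records: int = 4) -> list[str]:
    """Pick a random subset of the Nexus A/AAAA records, in random order.
    A coarse load-balancing technique: clients that take only the first
    answer get spread across Nexus instances."""
    return random.sample(addresses, k=min(max_records, len(addresses)))

nexus_ips = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]
answer = answer_a_query(nexus_ips)
assert set(answer) <= set(nexus_ips) and len(answer) == 3
```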

Note that the only DNS names provided here are per-Silo. There is no domain proposed for the Oxide system as a whole. That’s partly because a goal of Silos is to virtualize the Oxide system.

To support reverse DNS, we should also serve PTR records for the customer-facing IPv4 and IPv6 addresses used for Nexus and the console. This can likely be punted to post-MVP.

Updating external DNS servers

The set of external IP addresses for the external DNS servers is fixed by the operator configuration described above. If the operator gives us 5 specific addresses for DNS, then those are the only 5 addresses we can use, and we should use all of them (lest clients see failures because they picked an address we weren’t using). But this does not mean there can be only a fixed number of DNS servers or that they have to be updated in-place. We can still use standard blue-green-style deployments to update these servers. To do this, we’d deploy a new external DNS server, get it up-to-date, and verify it’s working; then we’d move one of the fixed external IPs from an existing DNS server that we want to decommission over to the new one. This still allows a low-risk rollback by moving the IP back to the old server.
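
The IP-move step can be sketched as follows. The dict-of-IP-sets model and server names are illustrative; in the real control plane this would be a saga driving boundary services:

```python
def blue_green_update(servers: dict[str, set[str]], old: str, new: str,
                      ip: str) -> None:
    """Move one fixed external IP from a DNS server being decommissioned
    to its freshly deployed, already-verified replacement."""
    assert ip in servers[old], "IP must currently be attached to the old server"
    servers[old].remove(ip)   # detach from the old server...
    servers[new].add(ip)      # ...and immediately attach to the new one

servers = {"dns-old": {"203.0.113.5"}, "dns-new": set()}
blue_green_update(servers, "dns-old", "dns-new", "203.0.113.5")
assert servers["dns-new"] == {"203.0.113.5"} and not servers["dns-old"]
```

Rollback is the same operation with `old` and `new` swapped.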

External DNS server security

Confidentiality of DNS requests and responses can be provided by things like DNS over HTTPS and DNS over TLS. We don’t believe this is a particular goal for our customer base in the MVP timeframe and we do not intend to implement it now.

Authentication of DNS data can potentially be provided by the same mechanisms or DNSSEC. In the MVP, we’re only providing external DNS for services on HTTP/TLS. TLS already provides end-to-end authentication of the service. We don’t intend to do anything else to authenticate DNS in the MVP timeframe.

TLS keys and certificates

The system will provide APIs to manage the private keys and certificates used for customer-facing services (i.e., the public API and the console). This is deliberately a bit vague. See Certificate management for discussion.

Out of scope for MVP

This proposal does not cover:

  • General-purpose DNS service for use by customer instances. The DNS servers here will not be recursive and are not intended to be any client’s primary DNS servers.

  • Providing DNS names for instances or other networking resources as described in [RFD 21]. They could be extended to do that, and we may well do that, but it’s not the focus here.

Discussion

Background on DNS delegation

In DNS, delegation is the mechanism by which the owner of some DNS domain (who runs DNS servers for that domain) delegates to someone else (generally running separate servers) authority over some subdomain. Suppose the Globex company wants to deploy an Oxide system for their developer teams. Let’s say they already use corp.globex.example for internal infrastructure and they run authoritative DNS servers for corp.globex.example. They can delegate the subdomain oxide.corp.globex.example to the Oxide system. This means:

  • The Oxide system is responsible for running authoritative DNS servers for oxide.corp.globex.example.

  • The Oxide system can create whatever records it wants with that name. For example, oxide.corp.globex.example could have A/AAAA records specifying IP addresses on the Oxide system that serve some generic landing page.

  • The Oxide system can create whatever records it wants under any subdomains, too. For example, it could create A/AAAA records specifying IP addresses for api.oxide.corp.globex.example.

  • The Oxide system could even delegate some subdomain to some other DNS servers (though we do not envision doing this).

A delegated domain is often called a zone. But zones are not synonymous with levels of the hierarchy. For example, globex.example might be one zone containing names corp.globex.example, www.globex.example, and even cypress-creek.corp.globex.example. Or these might all be separate zones. This is up to the administrator of globex.example and the people to whom they delegate these zones.

To set up delegation like this, an administrator for the parent zone corp.globex.example creates two kinds of records:

  • NS records for oxide.corp.globex.example that provide names for the authoritative nameservers for oxide.corp.globex.example. These are often ns1.oxide.corp.globex.example, ns2.oxide.corp.globex.example, etc.

  • glue records that provide the IP addresses for these names, since otherwise there would be a circular dependency trying to find the nameservers (e.g., ns1.oxide.corp.globex.example).
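
Concretely, the delegation records in the parent zone might look like this (zone-file syntax; all names and addresses are illustrative):

```
; In the parent zone (corp.globex.example), delegating to the Oxide system.
oxide.corp.globex.example.      NS  ns1.oxide.corp.globex.example.
oxide.corp.globex.example.      NS  ns2.oxide.corp.globex.example.
; Glue records: without these, resolving the NS names would be circular.
ns1.oxide.corp.globex.example.  A   198.51.100.1
ns2.oxide.corp.globex.example.  A   198.51.100.2
```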

This is important for us because it means that if we want the Oxide system to have control over some domain, and if we expect the customer to want to pick the parent domain, then they need to delegate a domain to us. And that means some work on their side to configure their own nameserver in order to set up the rack.

DNS names and Silos

We’ve generally been assuming that separate Silos will have separate API/console DNS names. This is probably not strictly necessary. Here are the (arguably vague) reasons why this is preferred:

  • Silos are intended (in part) to look like virtual Oxide systems. End users with access to two Silos ought to see them as two different Oxide systems. They ought to have different DNS names.

  • Silos can have different identity providers. We cannot even show the user a login page without knowing which Silo they’re trying to log into. We could provide a landing page showing all Silos, but that doesn’t work for use cases where Silos are not supposed to be publicly discoverable. For that use case, the first page that a user lands on must have the Silo somehow in the URL. There are a few options here:

    • Put the Silo into a path component (e.g., https://oxide.globex.example/my-silo) for all requests. In various discussions people found this distasteful since most users will only ever see one Silo. Plus, it seems to invite poking around (what can I put instead of my-silo?).

    • Put the Silo into a path component only for login requests. For other requests, use the Silo associated with the session from the session cookie. This is marginally better because most requests don’t need the Silo in them. But it doesn’t work to limit this to login requests alone: the Silo would need to be present for any other endpoint that’s publicly accessible (e.g., maybe listing global images?). This would be kind of a bizarre criterion that seems likely to lead to confusing routes.

    • Put the Silo into the DNS name (server hostname). This is potentially more concise. It’s also natural from the perspective that every Silo gets its own console URL served at /. It’s not clear that this is any more secure, but it seems to invite less poking around of the form "what if I make this same request to a different Silo?" Also, the default cookie behavior will avoid sharing cookies between Silos' console applications. (Though we could always configure the cookies to be restricted by path if we wanted.)

Note that just because Silos have different DNS names does not mean they’d have different IP addresses or Nexus instances.

It seems likely that any of these could be made to work fine.

TLS certificates

Whatever scheme we use for assigning DNS names to our API and console service must account for how we will obtain trusted TLS certificates (and the corresponding private keys) for these services.

Background

TLS is used to secure our HTTP API and console. It’s closely related to DNS because a key function of TLS is cryptographically proving to HTTP clients that they’re talking to an authentic server for a given DNS name. That is, a customer accessing https://oxide.corp.globex.example/ needs to know that they’re really talking to the Oxide system. A server does this by presenting a certificate chain that starts with a certificate from a trusted certificate authority (CA) and ends with the server’s own certificate. Each level of the chain is responsible for verifying the identity of the next one before adding its signature. For the top level, clients ship with a fixed set of trusted public CAs. Some organizations run their own internal CAs, in which case they likely have infrastructure for augmenting the built-in trusted set of CAs with their own CAs' certificates on all the clients that they care about.

To obtain a signed certificate for a service S, typically:

  1. S generates a private/public key pair.

  2. S generates a certificate signing request (CSR) that includes the public key and other metadata (like the domain name) and is signed using the private key.

  3. S sends the CSR to a certificate authority (CA).

  4. The CA verifies the identity of S.

  5. The CA sends back to S a certificate that contains most of the contents of the CSR plus a signature from the CA’s private key.

Traditionally, steps 3-5 above were done manually. ACME is a system for automating this entire process. It supports multiple types of challenges that can be used to prove the requester’s identity. These necessarily require connectivity between the ACME server and the service for which the certificate is being generated. LetsEncrypt is a widely-used public implementation of ACME. Many systems use ACME with LetsEncrypt to obtain trusted certificates and renew them automatically before they expire.

An alarming number of organizations report having experienced "disruptive outages caused by expired certificates". It would seem like a valuable feature if the Oxide system could be fully responsible for managing its own TLS certificates. Unfortunately, this is complicated by our assumptions that (1) customers will want the console and API to be hosted under domains that they control and (2) that a common deployment model will be that none of the rack’s public-facing endpoints (the public API, console, and external DNS) will be accessible from the public internet. This makes it hard for the Oxide system to prove to any public service like LetsEncrypt that it owns the domain using any of the available challenge types. From the perspective of the public internet, it doesn’t.

Separate from all of this, wildcard certificates can be used to authenticate not just one DNS name, but any DNS name immediately under it. A wildcard certificate for *.example.com works for foo.example.com and bar.example.com, but not baz.foo.example.com. These are potentially interesting for us because they can enable us to manage a lot fewer TLS certificates. To the extent that managing each certificate involves the customer (see below), this might be pretty useful. Wildcard certificates are often considered less secure than using separate certificates for separate domains because the assumption is that the certificate is being copied to multiple places and compromise of any of these places winds up compromising all of them. It’s not clear that this risk applies to the Oxide system. These DNS names are already virtualized over a single infrastructure. If we had several certificates (e.g., for Silos or separate services), we’d still be storing them the same way. Maybe one could argue that there could be different administrative privileges for separate certificates so that if someone used their privileges to compromise a certificate, they wouldn’t necessarily get the certificates for other Silos or services?
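
The one-label rule for wildcards can be made precise with a small sketch (a simplified model; real certificate name matching per RFC 6125 has more cases, such as certificates with multiple subject alternative names):

```python
def wildcard_matches(cert_name: str, host: str) -> bool:
    """Check whether a wildcard certificate name covers a hostname.
    The "*" covers exactly one DNS label, so *.example.com matches
    foo.example.com but not baz.foo.example.com or example.com itself."""
    if not cert_name.startswith("*."):
        return cert_name == host
    first_label, sep, rest = host.partition(".")
    return sep == "." and first_label != "" and rest == cert_name[2:]

assert wildcard_matches("*.example.com", "foo.example.com")
assert wildcard_matches("*.example.com", "bar.example.com")
assert not wildcard_matches("*.example.com", "baz.foo.example.com")
```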

Certificate management

Regardless of which DNS names we choose and whether we use wildcard certificates, we will need to manage some private keys and TLS certificates. Several approaches have been discussed:

  1. Expose an API (and console flow) for explicitly providing the private key and certificate.

  2. Expose APIs (and console flows) for generating a private/public key pair, generating a CSR to use that pair for a particular DNS name, and uploading a signed certificate for the DNS name.

  3. Build in an ACME client to be used with a public ACME server like LetsEncrypt.

  4. Build in an ACME client to be used with a private ACME server. (This likely is the same as option 3 from our perspective but has very different implications for the customer.)

  5. Build a more complicated flow that involves phone home to an Oxide-run service that can answer ACME challenges from a public ACME server.

These are not mutually exclusive. We could support any of these.

These have various tradeoffs:

  • With options 1 and 2 (APIs to manage certificates directly), the customer is responsible for the certificates. They need to obtain them, keep track of expiration, and update them. That’s a bunch of work. They may already do this for other certificates in their system. But they still need to build automation (or process) that’s specific to the Oxide system. These options are also the most flexible. Whatever policies the customer may want can be implemented through the certificates they generate or automation (or process) that they build around it. These options also have no runtime dependency on any service outside the customer’s control.

  • Option 1 is notably less secure than option 2 because the private key is moved from the system that generated it, going against best practices and risking leakage. However, it’s less work for both Oxide and the customer. We’d pick this if we’re prioritizing urgency or if we think that the Oxide-specific work here for the customer would be too much friction for early customers. Option 1 is almost a strict subset of option 2, so it could be a reasonable stepping stone.

  • Option 3 (use LetsEncrypt) is a great choice for any [hypothetical] customers willing to expose Oxide-system-provided services like external DNS on the public internet. We can completely own the certificate management problem, both creating them and renewing them. The engineering work for Oxide would be moderate. The problem is that it’s fairly risky to expose the Oxide system on the public internet. We know many customers will not want to do this, so this cannot be our only supported option.

  • Option 4 (use an internal ACME server) is a great choice for customers that already run their own CA and ACME server. Again, we can own the certificate management problem (aside from running the ACME server). Little Oxide-specific work would be required from the customer. But it only works if they do already run their own CA and ACME server.

  • Option 5 is only a vague idea at this point, inspired somewhat by [How Plex Is Doing HTTPS For All Its Users]. In this option, Oxide could run something like an ACME proxy, ferrying requests from the Oxide system to LetsEncrypt and answering the resulting challenges. This can only work if the system can be configured to phone home (which does not require the same kind of external connectivity as option 3, but does still require some connectivity) and if Oxide’s phone home service can actually answer challenges for the Oxide system’s DNS names. This likely means that the DNS names would need to be public and under Oxide’s control. The easiest way to do this would be to host the console and API under a domain we own like oxide-customer.com. There are many unanswered questions with this idea (including whether our customers would be okay with using a DNS domain they don’t control). If it could be made to work, then again Oxide could own the certificate management problem without requiring that the customer expose the Oxide system on the public internet.

We propose implementing option 1 or option 2 (both being API-driven management of the private keys and certificates) for the MVP because they’re moderate in both our engineering investment and customer integration work and they can support whatever policies a customer might want. Follow-on work to support these other options can be prioritized based on customer feedback.

Which certificates need to be managed?

We’ve proposed DNS names $silo_dns_name.sys.$delegated_domain. Thus, a customer could provide us with one certificate for each $silo_dns_name or a wildcard for *.sys.$delegated_domain. At this time, we don’t have a specific determination here — let’s do what’s easiest to implement. We expect most customers, especially early customers, won’t have a large number of Silos, so using non-wildcard certificates hopefully wouldn’t be a big burden.

Availability

Our external DNS servers must have high availability, which is to say that they must survive at least one and probably multiple failures. The current design uses self-contained servers, so it’s pretty straightforward to deploy quite a few of them.

DNS is a distributed A-P system (in the CAP sense). It’s explicitly eventually consistent, meaning that two clients might see different views of the system at the same time, even if an individual change can eventually be expected to be observed by all clients. This is extremely helpful for implementing the server because it means we don’t need to ensure that Nexus propagates changes to external DNS atomically. We just need to ensure that we keep attempting to update them until we succeed.

Unfortunately, this doesn’t mean DNS is highly available. In an ideal world, any DNS domain would have N + M identical but independent nameservers where N are required to serve peak load, and M of them could fail without any impact to the availability of either DNS itself or any of the services using it (e.g., an HTTP service under this domain). This can be made to work and we intend to deploy things this way for internal service discovery ([RFD 248]). However, it relies on specific behaviors from DNS clients that are not widely implemented (e.g., that they query more than one server or retry requests on failure). Unfortunately, the safer assumption is that a client will do the least useful thing (e.g., if there are five NS records, it will choose only one; that one will be the one that’s not up right now; it will wait for several seconds in some synchronous context; and it will not retry the request on failure).

The upshot for us is that we should generally run a working DNS server at every IP address for which there’s an NS record and we should try to minimize the time during which service at those IP addresses is unavailable. To this end, we’ll want the ability to dynamically move these IPs between DNS servers, and we should never remove one of these IPs from a working server unless we’re about to add it to another working server. Instead, the update process should deploy a new server, get it working, then move an IP address from an old server. With luck, this will not be a significant cause of unavailability in the MVP. Future enhancements could use short TTLs and maybe quickly stop advertising nameservers that we believe are offline, though it’s not clear how that will work given that we don’t control the customer’s glue records.

Scalability

In terms of scale, we might consider:

  • volume of DNS data

  • volume of read load

  • volume of update load

The volume of DNS data is likely to be quite small — on the order of a few records per Silo, of which we expect most customers to deploy only a handful in the MVP. We could easily reach thousands of Silos before this was remotely a consideration.

Thanks to DNS’s eventual consistency (see above), read load is often easy to scale horizontally by providing more nameservers or extending TTLs (with some cost in the latency of updating DNS data). We haven’t rigorously quantified the expected load, but we expect it to be pretty modest. Even hundreds of queries per second would reflect quite a lot of concurrent end users. Experience suggests that that level is easily handled by a single-threaded server storing hundreds of hostnames. If we need to scale out, we can deploy more servers.

The update load is very light. Database updates only happen when an externally-visible service changes IP addresses, which we’d expect only to happen during updates and maybe in response to unexpected failures. The expected volume would be much less than 1 per second.

These numbers may change dramatically post-MVP when we support the [RFD 21] instance DNS features. (See Out of scope for MVP.) The total DNS data is not likely to be huge and the read load remains horizontally scalable. The mechanism of propagating updates is discussed in [RFD 367].

Importance of control over the DNS domain

We assume that where possible, the control plane will want to use standard cloud deployment practices like immutable infrastructure and blue-green deploys. This means that when updating Nexus, we’d deploy new Nexus nodes, verify that they’re functioning as expected, then bring them into service by giving them an IP on the external network and putting them into external DNS, then remove old nodes by doing the reverse. To achieve this, we need direct control over the DNS data (it’s modified multiple times in this example).

It also seems pretty likely that we’ll want to expose more services in the future (e.g., an object store). By having the customer delegate a zone to the Oxide system, we preserve the flexibility to add more DNS names or change the DNS names we use for existing services (provided we can do so in a compatible way).

Importance of dynamically-assignable IPs

Similarly, we’ll want to be able to update the DNS servers themselves using blue-green deploys. This can be done in the same way as above for Nexus, provided that we have the ability to attach (and detach) the specific externally-facing nameserver IPs to (and from) the DNS servers.

Note that the number of addresses in-service doesn’t have to match the number of servers. We could conceivably support 12 addresses (for future horizontal scale) that are attached to only 4 servers (3 addresses on each, if that’s all we need for scalability and availability).

DNS names and Cookies

Per [RFD 169], the console uses session cookies to authenticate requests that originate in the user’s web browser. All requests made by the browser on behalf of the console should have the session cookie so that the request can be authenticated. But whether a given cookie is included in a browser request depends on the DNS domain being requested, the DNS domain of the page that issued the request, and the attributes of the cookie. See "define where cookies are sent" in this Mozilla guide for a more detailed description.

As an example, if we were to serve the console and API from separate DNS names like console.my-silo.example and api.my-silo.example, the default behavior would be that cookies set by the server at one of these domains would not be visible by the other. We could use the domain attribute to cause cookies to be sent to all subdomains under my-silo.example, which might be okay because we control those, but it’s a broader scope than needed.
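
A rough model of the browser defaults described above (simplified; real browser behavior per RFC 6265 has more rules around Path, Secure, and public suffixes):

```python
from typing import Optional

def cookie_sent_to(cookie_host: str, cookie_domain: Optional[str],
                   request_host: str) -> bool:
    """With no Domain attribute, a cookie is host-only: it is sent back
    only to the exact host that set it.  With Domain=my-silo.example,
    it is sent to that domain and all of its subdomains."""
    if cookie_domain is None:
        return request_host == cookie_host          # host-only cookie
    return (request_host == cookie_domain or
            request_host.endswith("." + cookie_domain))

# Default (host-only): a console cookie is not sent to the API host.
assert not cookie_sent_to("console.my-silo.example", None,
                          "api.my-silo.example")
# With Domain=my-silo.example, both subdomains receive it.
assert cookie_sent_to("console.my-silo.example", "my-silo.example",
                      "api.my-silo.example")
```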

For now, we plan to put the API and console at the same DNS name, so it’s okay if our cookies are strictly available only to the exact same hostname.

DNS Server and External API Networking

While not part of this RFD’s determinations, it seems useful to document our expectations around networking for the external DNS servers and API servers.

Recall that [RFD 21] talks about a few kinds of external IP addresses. These external IPs all work broadly the same way: they’re reserved from an underlying IP pool when allocated and assigned to an Instance by having Nexus reach out to boundary services (Dendrite) to point that IP at the corresponding component (by updating the Tofino state table). The difference is around when allocation, assignment, unassignment, and deallocation happen. Of interest here are Ephemeral IPs (which are allocated and assigned when an Instance starts and unassigned and deallocated when an Instance stops) and Floating IPs (which are allocated, assigned, unassigned, and deallocated by explicit API operations, which means the allocation is independent of the lifetime of any particular Instance).

The external DNS servers must be hosted on the specific external IP addresses that have been designated for that purpose, since any change needs to be coordinated with changes to the customer’s DNS configuration. As a result, these addresses will function like Floating IPs: reserved when they’re configured (i.e., during initial setup and again only if an operator changes them) and assigned dynamically to whichever external DNS server(s) are in-service right now.

In principle, we have more flexibility with the external API servers, in that the set of IPs on which we host the external API can change as long as DNS reflects the current set of IPs. As a result, we could use something more like an Ephemeral IP, where we allocate and assign it whenever we want to bring a Nexus instance into service for the external API, and then unassign and deallocate it when we want to remove that Nexus instance from service. This may well be fine. However, the poor behavior of many DNS clients (see Assumptions) means we may prefer to reuse addresses when possible, in which case we may prefer to treat these like Floating IPs.

DNS Server IP Configuration

As mentioned in Determinations, operators will need to configure the rack with a set of IPs on which to run the external DNS servers. These would generally come from either a dedicated IP pool or an IP pool provided by the customer for general use by the Oxide control plane.

A few options have been considered here:

  1. The operator literally provides a list of IP addresses. The system finds the IP pool containing each address and verifies that this is the IP pool that was previously designated for general use by the Oxide control plane. At runtime, the system dynamically assigns these IPs to individual DNS servers (see DNS Server and External API Networking above).

  2. The operator provides a list of Floating IPs (see [RFD 21]). This is logically the same as option 1. But to be actual Floating IPs, they’d have to be associated with a particular VPC and Project, which implies that the external DNS servers are also in some VPC and Project. This may be useful in the long run, but it’s not immediately necessary, and it’s not work we want to take on right now.

  3. The operator creates a special IP pool just for the DNS servers and specifies that here, rather than the list of IPs. On the one hand, this extends the idea that IP pools are the way we group IP addresses intended for a specific purpose. However, this set of IP addresses carries additional semantics: the addresses must have glue records in the parent DNS server, and we’ll run a DNS server on each one. That adding and removing addresses in this pool has side effects may violate the principle of least surprise for both end users and people working on Omicron.

  4. The operator specifies the IP pool intended for general control-plane use and a number N of addresses that should be used for DNS servers. This is just a simpler, more restrictive form of option 1, and the restriction doesn’t seem to add particular value.

This RFD leaves the specific choice to-be-determined. As of this writing, we intend to proceed with option 1, as it’s expedient and has no major downside. This can be evolved in the future.
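Option 1's validation step can be sketched in a few lines, assuming a hypothetical representation of the designated control-plane pool as a list of CIDR ranges (names here are illustrative, not Omicron's actual API):

```python
import ipaddress

def validate_dns_server_ips(dns_ips, control_plane_pool_cidrs):
    """Verify that each operator-provided external DNS server IP falls
    within the IP pool previously designated for control-plane services.
    Hypothetical sketch; not the actual Omicron implementation."""
    nets = [ipaddress.ip_network(c) for c in control_plane_pool_cidrs]
    validated = []
    for s in dns_ips:
        ip = ipaddress.ip_address(s)
        if not any(ip in net for net in nets):
            raise ValueError(f"{ip} is not in the designated IP pool")
        validated.append(ip)
    return validated
```

At runtime, the addresses that pass this check would then be assigned dynamically to individual DNS servers, as described above.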

Alternatives considered

Customer operates DNS, synchronizing with the Oxide system

Another idea is that we still have a DNS zone as described above, but the customer is responsible for operating the authoritative DNS servers. The Oxide system could still provide external DNS servers. But the intent would be that they’re only used as the source for AXFR/IXFR-style zone transfers to customer-operated DNS servers. For better and worse, this approach shifts responsibility for operating the DNS servers to the customer. (Better because it’s potentially less work for us to scale and operationalize this; worse because it may be a worse product experience.)

Customer is wholly responsible for external DNS

It’s conceivable that we avoid any responsibility for operating external DNS and instead require that customers own this completely. This would considerably complicate the customer experience since they’d have to apply DNS updates whenever the Oxide system wants to modify DNS (e.g., during Nexus software update or to mitigate unavailability (planned or otherwise)). It would also complicate the process of upgrading Nexus because we’d either have to rely on operator action in the middle of the update or give up the benefits of controlling DNS. Finally, we’d also have to give up on providing the [RFD 21] features around providing DNS names for things like instances. This seems untenable.

The upsides here are that it could involve minimal development work and that the customer could have more control over DNS (e.g., TTLs or the specific DNS names used). The major downsides are a potentially much worse customer experience, more development work for software update, and potentially riskier software updates.

Separate DNS names for the console and API

There are logically two externally-facing services:

  • the externally-facing ("public") API

  • the web console (which uses the external API)

These are logically pretty separate components. For the console, Nexus mostly just serves static assets. [RFD 223] discusses alternative architectures where the web console could even use a different runtime environment.

Above we propose one DNS name $silo pointing to a service that serves both the console and API. We could instead separate this into two separate DNS names: console.$silo and api.$silo. For now, they could resolve to the same set of IP addresses — the Nexus external IP addresses — and we wouldn’t have to change much about Nexus. The main advantage is that it would be easier in the future to separate these components, since outside of Nexus they’d be treated as separate components anyway. This in turn would allow us to scale them independently, for example.
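To make the comparison concrete, the record sets for the two schemes might look like this (addresses are RFC 5737 documentation addresses; `$silo` stands for a silo's delegated DNS name as above):

```python
# Hypothetical record sets for the two naming schemes; both initially
# resolve to the same Nexus external IP addresses.
NEXUS_EXTERNAL_IPS = ["192.0.2.10", "192.0.2.11"]

# Proposed scheme: one name serving both the console and the API.
single_name = {"$silo": NEXUS_EXTERNAL_IPS}

# Alternative scheme: separate names, initially resolving identically.
split_names = {
    "console.$silo": NEXUS_EXTERNAL_IPS,
    "api.$silo": NEXUS_EXTERNAL_IPS,
}
```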

The disadvantage is that there’s some complexity to deal with: which service do the login endpoints come from? For which one are the login cookies set? In other words: if the advantage of this approach is that we could more easily separate these services later, that only works because we’re essentially doing the work to separate them now. Viewed as taking on more work in the MVP timeframe so that we have less to do later, in the particular future where we might want to separate these services, it doesn’t sound so appealing.

Use the same servers for internal and external DNS

The proposal above suggests using identical components for internal DNS (see [RFD 248]) and external DNS (this RFD). But they’d be deployed as separate fleets. We could instead deploy a single fleet of servers that operate different domains on different addresses. (That is, we operate the internal domain for queries arriving on the internal network and we operate the external domain for queries arriving on the external network.)

Given that we’re already sharing the software component, there’s not much advantage to this. Instead, by separating the two sets of servers, we get the ability to scale these services independently, plus separate security and fault domains. This way, a denial of service or compromise of the external DNS servers will not necessarily affect the internal ones. This is important, since the external servers are at higher risk (being exposed directly to the customer) and the internal ones are absolutely critical to the availability of the control plane.

Security considerations

To-be-fleshed-out:

  • The networks on which our external DNS servers will be listening are completely uncontrolled by Oxide. These servers may be at particular risk of exposure to attackers.

  • There are many ways of operating DNS that unintentionally expose information. We should review best practices for operating DNS servers to make sure we avoid this.

Terminology notes

This RFD distinguishes between "public" and "externally-facing". Here, "public" means "directly accessible via the Internet", while "external" and "externally-facing" refer only to something that’s exposed beyond the Oxide system, which usually means on the customer’s network. Externally-facing things are not necessarily public. In practice, we don’t usually care about the distinction because both are highly untrusted. An important case where it matters is that using Let’s Encrypt requires that the system have some public (not just externally-facing) service.

This RFD uses the term "Oxide system" to mean what we often call the Oxide rack. Everything here applies to a multi-rack system, though it’s to-be-determined how this will change for the multi-cluster, multi-AZ, or multi-region cases (see [RFD 24]).

Footnotes
  • 1

    More sophisticated DNS clients can do much better. See RFD 248. But we don’t control the clients here. If we need more sophisticated load balancing, we may need an actual load balancer component sitting between Nexus and external clients. We do not expect to need this for the MVP.
