RFD 21
User Networking API

The networking APIs that we expose to users are at the heart of the Oxide Rack. This RFD is a part of the broader user API in RFD 4 and serves to focus on the networking part of it. First, this document introduces the terminology that we will use. Then it goes through the high-level concepts and the way that different features interact together. Next, it goes through the specific API endpoints and how they operate. Finally, it talks a bit about future direction.

The following RFDs serve as useful background for this document:

RFD 9 compares the various cloud providers and the trade-offs that we need to think about with respect to the networking design. RFD 24 covers discussions about the coherence domains and scope of various resources and introduces terminology that covers the different groupings of infrastructure. Finally, RFD 4 introduces all of the basic information about the User API, including ideas like projects.

Terminology

This section introduces the terminology that we will use throughout the user API. How the different pieces fit together will be described in the Concepts section.

  • VPC: A VPC is a virtual private cloud. It represents an isolated network fabric.

  • IP subnet: An IP subnetwork is a specific sub-division of your broader network. It represents a concrete block of either IPv4 or IPv6 addresses and is summarized by a CIDR block.

  • VPC Subnet: A VPC Subnet is the fundamental building block of a VPC. It represents a subnet that you can allocate IP addresses from for networks. A VPC Subnet consists of both an IPv4 and an IPv6 IP subnet.

  • VPC Routing Table: A table of entries that determine the next destination of a given IP packet.

  • VPC Firewall: A network firewall is a tool which accepts or denies packets based on a series of rules.

  • Virtual NIC (VNIC): A network interface card that appears inside of a virtual machine instance. A Virtual NIC has a primary IPv4 and IPv6 address associated with it.

  • Floating IP: An IP address that can be dynamically assigned from one instance to another through an API call, though the IP address does not appear inside the instance.

  • Ephemeral IP: A public IP address that is assigned to an instance while the instance is running and is removed when it is not.

  • Internet Gateway: A special entity on the network that provides a means for instances on a VPC to reach the Internet.

  • IP Address Pool: A collection of IP addresses maintained by operators of the environment.

Concepts

Resources in an Oxide rack are organized into projects (see RFD 4) and the vast majority of the networking resources are as well. Each project can be thought of as having its own independent physical network fabrics. Just like in a data center, these network fabrics have their own subnetworks, routers, firewall rules, and are isolated from other networks. We call this a virtual private cloud or VPC.

When a project is created, a default VPC is created as well, though a project can contain multiple independent VPCs. Each VPC has full access to all of the standard IPv4 private address space ranges: 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16. It also has access to IPv6 private address space through a randomly generated /48 prefix taken from the IPv6 Unique Local Address range fd00::/8.

A VPC network isn’t exactly the same as a traditional data center network. Here are some of the things which may be different:

  • The underlying network is virtualized. This means that things like routers and firewalls are taken care of automatically and don’t require a dedicated hardware device to implement. This allows your network’s capabilities to scale with your deployment.

  • By default, each VPC network is independent and cannot communicate with another one.

  • All IP addresses on a VPC are private. They are not routable outside of the VPC. The IP addresses are assigned from private address space.

  • Like a project, a VPC exists beyond a single Oxide rack and exists across an entire region.

  • Some traditional networking concepts like VLANs do not apply. While ARP and NDP work, other broadcast and multicast traffic does not work.

  • By default, an instance is restricted to only using the IP and MAC addresses that are assigned to it.

  • By default, while instances on a VPC can reach the Internet, hosts on the Internet cannot reach instances on a VPC except as part of a flow that originated from that instance within the VPC. This can be changed by allocating a Floating IP to an instance.

VPC Example

Let’s look at an example of a VPC that’s used to implement a blog that’s logically made up of three layers: a load balancer, HTTP/application servers, and a database tier:

Figure 1. An example application in a VPC

Let’s look at the different components. The items listed below correspond to the values in the image.

  1. This represents the entire VPC. A VPC is associated with a project. All of the instances within the VPC are isolated from other VPCs on the system and have their own private set of IP addresses.

  2. This is the first of two VPC Subnets. It has both an IPv4 and IPv6 CIDR block assigned to it. This VPC Subnet contains all of the application traffic and has its own name.

  3. This is the second of two VPC Subnets. It has both an IPv4 and IPv6 CIDR block assigned to it. This VPC Subnet contains all of the databases.

  4. These are all of the individual instances in the 'App' Subnet (2). Each instance has an IPv4 and IPv6 address, though one could design the system such that only the load balancer had both. Each instance has an address from the corresponding IP CIDR blocks that are assigned to the VPC Subnet.

  5. This represents all of the instances that are assigned to the 'DB' subnet. Like in (4), they have addresses that come from their VPC Subnet.

  6. This is the VPC Router. It is a scalable router that is a part of the underlying network fabric. It maintains a set of routes for the entire VPC that ensures the different VPC Subnets can talk to one another and provides a route to the Internet.

  7. The Internet Gateway is a scalable NAT that allows all of the instances in the 'App' and 'DB' VPC Subnets to be able to make outgoing connections to the Internet.

  8. This is an IPv6 Floating IP address. It provides an external IP address, which allows network communication to be initiated from outside of the project (for example, from the Internet). The floating IP maps a single external IPv6 address 2600:3c00::f03c:91ff:fe96:a264 to an IPv6 address inside of the VPC. Here, it maps to the load balancer’s internal IPv6 address fd12:3456:789a::32.

  9. This is an IPv4 Floating IP address. It provides similar connectivity as in (8). In this case it maps an IPv4 external address 72.14.186.115 to the load balancer’s internal IPv4 address 10.169.10.30.

To make this clearer, let’s work through some sample flows here. Assume that a user is trying to reach the blog over IPv4 and is doing an HTTP GET request. The HTTP request would first target the IPv4 floating IP 72.14.186.115. When that traffic reaches the Oxide network, the network will translate that into a request that is directed to the 'Load Balancer' instance, 10.169.10.30.

From here, the 'load balancer' would forward it to one of the two HTTP instances, 'HTTP 1' or 'HTTP 2' based on its internal policies. Assuming it chose 'HTTP 2', then it would send that HTTP request to the 'HTTP 2' instance, using either its IPv4 or IPv6 address. The application server will then break down the request and fulfill it. If as a part of it, it needs to access the database cluster, it’ll send a message to one of the databases, which will be facilitated by the 'VPC Router' (6). There may be firewalls in place that may further restrict communication between the instances and the VPC Subnets.

When the database replies, it will reply back to the 'HTTP 2' instance, whose traffic will be routed through the Router. The 'HTTP 2' instance will then reply to the 'Load Balancer'. When the 'Load Balancer' wants to reply to the traffic, it will go through the 'Internet Gateway' (7). The 'Internet Gateway' (7) is responsible for making sure that the reply has the right outgoing IP address (that of the floating IP) and making sure it goes out to the Internet. All traffic leaving the VPC must go through an 'Internet Gateway' (7).

Let’s consider another example. Say that 'Postgres 2' wants to make a connection to the Internet just to ping 8.8.8.8. In this case, the traffic from 'Postgres 2' would first go to the Router (6). The rules in the router would direct it to the 'Internet Gateway' (7). A NAT session will be established for this communication and the rack will pick an IP address and port to use for the NAT translation from a rack-wide shared pool. The 'Internet Gateway' (7) rewrites the outgoing packet accordingly. When 8.8.8.8 replies, the 'Internet Gateway' (7) rewrites the response and directs it towards the IPv4 address of 'Postgres 2', 10.169.20.12. That will traverse the 'Router' (6) and then 'Postgres 2' will receive the response.

In addition to these examples described above there are a couple of other things to point out about the VPC:

  • All of the instances on the 'App' and 'DB' VPC Subnets can look each other up using internal DNS.

  • The floating IP address will appear in external DNS. This can be looked up by applications outside of the VPC or even the Oxide fleet.

  • You can use the Firewall to restrict what can communicate with what in the VPC.

Subnets

A VPC is broken into a series of VPC Subnets. A VPC Subnet spans more than one Oxide rack and can operate in an entire availability zone, which is selected when the subnet is created. Each VPC Subnet has an associated IPv4 and IPv6 CIDR block. The IPv4 block must be allocated from one of the IPv4 private address ranges. The largest IPv4 subnet that can be created is a /8 and the smallest is a /26, which allows for approximately 64 addresses. A VPC may have multiple VPC Subnets, each with distinct IPv4 address ranges; the only constraint is that VPC Subnets may not overlap. VPC Subnets also have IPv6 address ranges associated with them. These must be Unique Local Addresses in the range fd00::/8.
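The IPv4 constraints above can be sketched with Python's standard ipaddress module. This is only an illustration: check_ipv4_block is a hypothetical helper, not part of any Oxide API, and it returns a list of human-readable problems rather than whatever errors the real API would report.

```python
import ipaddress

# The three IPv4 private ranges named in the text.
PRIVATE_V4 = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def check_ipv4_block(cidr, existing=()):
    """Return a list of problems with a proposed IPv4 block (empty if none).

    Hypothetical helper: the checks mirror the prose, not a real API.
    """
    net = ipaddress.ip_network(cidr)
    problems = []
    if not any(net.subnet_of(private) for private in PRIVATE_V4):
        problems.append("not inside an IPv4 private range")
    if not 8 <= net.prefixlen <= 26:
        problems.append("prefix length must be between /8 and /26")
    for other in existing:
        if net.overlaps(ipaddress.ip_network(other)):
            problems.append(f"overlaps existing VPC Subnet {other}")
    return problems
```

For example, check_ipv4_block("10.169.10.0/24") returns an empty list, while proposing 10.169.10.0/25 alongside an existing 10.169.10.0/24 reports an overlap.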

When a VPC is created, a default VPC Subnet is created for you. This uses the IPv4 address range 172.30.0.0/22 and a random /64 prefix of IPv6 Unique Local Addresses. This /64 is allocated out of the /48 prefix allocated to the VPC, which may either be chosen when the VPC is created or assigned randomly at that time.

When an instance is created, it is associated with a VPC Subnet and receives an address from that VPC Subnet. By default, the following communication rules are set up on a VPC Subnet:

  • All instances on the VPC Subnet can talk to one another due to the default firewall rules.

  • A default gateway is created in the VPC Subnet which can be used to route to other subnets or the Internet.

  • An IP address is reserved on the network to act as a private DNS server.

  • All instances can reach the Internet through the default gateway.

When creating an instance, a user may specify an IP address from the subnet to use or allow the system to pick one. In addition, IP addresses in the network may be reserved through the API. A reserved address will not be allocated by the system automatically. It can only be used by explicitly requesting it when provisioning an instance. An instance may use both IPv4 and IPv6 addresses from the VPC Subnet or only a single one.
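A rough sketch of the allocation behavior described above follows. The function name, its parameters, and the choice to hand out the lowest free address are all assumptions made for illustration; the text does not say how the system picks an address.

```python
import ipaddress


def allocate_address(cidr, in_use, reserved, requested=None):
    """Pick an address for a new instance, or raise ValueError.

    Hypothetical helper. `in_use` holds addresses that are already taken
    (including those used for rack services); `reserved` holds addresses
    reserved through the API.
    """
    net = ipaddress.ip_network(cidr)
    in_use = {ipaddress.ip_address(a) for a in in_use}
    reserved = {ipaddress.ip_address(a) for a in reserved}
    if requested is not None:
        addr = ipaddress.ip_address(requested)
        if addr not in net or addr in in_use:
            raise ValueError(f"{addr} is not available in {net}")
        # A reserved address may be used when it is requested explicitly.
        return str(addr)
    for addr in net.hosts():
        # Automatic allocation never hands out a reserved address.
        if addr not in in_use and addr not in reserved:
            return str(addr)
    raise ValueError(f"no free addresses in {net}")
```

With 192.168.1.1 through 192.168.1.4 in use and 192.168.1.5 reserved, an automatic allocation from 192.168.1.0/24 skips the reserved address, while an explicit request for 192.168.1.5 succeeds.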

A number of addresses in a network are used by Oxide to provide rack services. For example, if the subnet were 192.168.1.0/24 and in IPv6 fd12:3456::/64, the following addresses would be reserved:

Table 1. Network IP Address Usage

| Use | Logical Address | IPv4 Address | IPv6 Address |
|---|---|---|---|
| Network Address | First address in the network | 192.168.1.0 | fd12:3456::0 |
| Network Gateway | Second address in the network | 192.168.1.1 | fd12:3456::1 |
| DNS Services | Third address in the network | 192.168.1.2 | fd12:3456::2 |
| Future Use | Fourth address in the network | 192.168.1.3 | fd12:3456::3 |
| Future Use | Fifth address in the network | 192.168.1.4 | fd12:3456::4 |
| Broadcast Address | Last address in the network | 192.168.1.255 | Not Applicable |
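The positions in Table 1 can be computed for any subnet with Python's ipaddress module. The sketch below is illustrative only; reserved_addresses and its role labels are hypothetical names, not part of the API.

```python
import ipaddress


def reserved_addresses(cidr):
    """Map each reserved role from Table 1 to its address in `cidr`."""
    net = ipaddress.ip_network(cidr)
    roles = {
        "Network Address": net[0],
        "Network Gateway": net[1],
        "DNS Services": net[2],
        "Future Use (fourth)": net[3],
        "Future Use (fifth)": net[4],
    }
    # Only IPv4 has a broadcast address; Table 1 marks it as
    # not applicable for IPv6.
    if net.version == 4:
        roles["Broadcast Address"] = net.broadcast_address
    return {role: str(addr) for role, addr in roles.items()}
```

Note that Python prints the first IPv6 address of fd12:3456::/64 in its compressed form, fd12:3456::, which is the same address that the table writes as fd12:3456::0.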

VPC Routers

A VPC Router defines a series of rules that indicate where network traffic should be sent depending on its destination. A VPC Router is part of the underlying fabric of the VPC and is different from the routing table found inside of a guest (e.g. the information you get when one runs netstat -rn).

Every rule in a VPC Router contains the following:

  • Name: A name for the rule.

  • Description: A textual description for the rule.

  • Destination: The IP CIDR block that this rule covers. For example, 10.169.10.0/24. However, see also the open questions on Routing Table Destinations.

  • Target: The place that traffic that matches the destination should be sent.

There are a number of different targets. These targets include:

  • An Internet Gateway.

  • A specific VPC Subnet.

  • A particular instance or IP address.

  • A VPC that belongs to another project, for which peering has been enabled.

  • No destination, which says that the traffic should be dropped.

There are two types of routing tables:

  • The VPC-wide System VPC Router

  • Custom VPC Routers which apply to specific VPC Subnets

In each VPC, there is a VPC-wide VPC System Router that is created when the VPC is created. Routes are automatically added to and removed from the System VPC Router. Routes cannot be added or removed directly from this table; however a few entries can be modified. The VPC System Router contains the following types of entries:

Table 2. VPC Route Types

| Route Type | Purpose | Destination Type | Modifiable |
|---|---|---|---|
| Default Route | Determines the default destination of traffic, such as whether it goes to the Internet or not. | An Internet Gateway | Yes |
| VPC Subnet Routes | Routes that are automatically added for each VPC Subnet in the VPC. | A VPC Subnet | No, they are added and destroyed with VPC Subnets. |
| VPC Peering Routes | Routes that are automatically added when VPC peering is established. | A different VPC | No, they are added and destroyed when VPC peering is established or torn down. |

In addition to the VPC System Router, a VPC may contain a number of VPC Custom Routers. The VPC Custom Router is used to provide additional routes and override the behavior of the VPC System Router. Each VPC Subnet may have a single, optional VPC Custom Router associated with it. The same VPC Custom Router can be associated with multiple VPC Subnets.

Rule Ordering

The rules for picking routes are:

  1. Take the route with the most-specific prefix that matches in either the VPC System Router or a VPC Subnet’s VPC Custom Router.

  2. If two rules have the same most-specific prefix, then the one in a VPC Custom Router applied to a VPC Subnet has priority over the VPC System Router.

When searching for a rule to apply, the VPC routing engine will use the rule with the most specific destination. Consider the following VPC System Router:

Table 3. Example VPC System Router Rules

| Destination | Target | Description |
|---|---|---|
| 0.0.0.0/0 | Internet Gateway | Catch all rule to access the Internet over IPv4 |
| ::/0 | Internet Gateway | Catch all rule to access the Internet over IPv6 |
| 10.169.10.0/24 | VPC Subnet A | Rule to route IPv4 traffic to VPC Subnet A. |
| 10.169.20.0/24 | VPC Subnet B | Rule to route IPv4 traffic to VPC Subnet B. |
| 10.169.30.0/24 | VPC Subnet C | Rule to route IPv4 traffic to VPC Subnet C. |
| fd12:3456:789a::/64 | VPC Subnet A | Rule to route IPv6 traffic to VPC Subnet A. |
| fd12:3456:789b::/64 | VPC Subnet B | Rule to route IPv6 traffic to VPC Subnet B. |
| fd12:3456:789c::/64 | VPC Subnet C | Rule to route IPv6 traffic to VPC Subnet C. |

If a packet was sent to 10.169.10.5, it would get routed to VPC Subnet A. While VPC Subnet A and the Internet Gateway both match the packet, because the VPC Subnet A route is more specific, it will be taken. If someone were to send a packet to the IPv6 address 2607:f8b0:4005:80b::200e, because it only matches the Internet Gateway rule of ::/0, that is where it will be sent.

Let’s consider a subnet that has a VPC Custom Router with the following rules:

Table 4. Example VPC Custom Router

| Destination | Target | Description |
|---|---|---|
| 10.169.20.0/24 | Drop | Rule to make sure IPv4 traffic to VPC Subnet B is dropped. |
| fd12:3456:789b::/64 | Drop | Rule to make sure IPv6 traffic to VPC Subnet B is dropped. |
| 172.16.0.0/12 | 10.169.30.33 | A rule to forward all traffic for a VPN to a specific entry point. |

Let’s consider that this VPC Custom Router is attached to VPC Subnet C and we’re trying to send a packet from it. First, let’s say we’re trying to send a packet to 10.169.10.5. This is on VPC Subnet A and we take the route from the VPC System Router.

If we look at what happens if we send a packet to 172.16.23.95, the most specific rule is the one in the VPC Custom Router. While there is a rule that applies in the VPC System Router, the one in the VPC Custom Router is more specific. Note, which Router it’s in wouldn’t matter. All that matters is which rule has the most-specific prefix. Because of this rule, the packet will be forwarded to 10.169.30.33 which we presume is running some kind of VPN software and will send the packet over the VPN.

Finally, let’s say we try to send a packet to 10.169.20.5, which is an IP address in VPC Subnet B. There are two rules that already exist for this: one is in the VPC System Router and the other is in the VPC Custom Router. Both of them have the same prefix, so there is no winner via the prefix match. This leads us to apply the second rule, that the VPC Custom Router has priority. This means that the packet will be dropped.
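The two selection rules can be sketched in a few lines of Python. This is a minimal illustration rather than how the routing engine is implemented: pick_route is a hypothetical function, and the rule lists contain only the IPv4 rows of Tables 3 and 4 that the three worked examples use.

```python
import ipaddress

# (destination, target) pairs from the VPC System Router example.
SYSTEM_ROUTER = [
    ("0.0.0.0/0", "Internet Gateway"),
    ("10.169.10.0/24", "VPC Subnet A"),
    ("10.169.20.0/24", "VPC Subnet B"),
]

# (destination, target) pairs from the VPC Custom Router example.
CUSTOM_ROUTER = [
    ("10.169.20.0/24", "Drop"),
    ("172.16.0.0/12", "10.169.30.33"),
]


def pick_route(dest, system_rules, custom_rules=()):
    """Return the target for `dest`, or None if no rule matches."""
    addr = ipaddress.ip_address(dest)
    candidates = []
    for is_custom, rules in ((False, system_rules), (True, custom_rules)):
        for cidr, target in rules:
            net = ipaddress.ip_network(cidr)
            if addr.version == net.version and addr in net:
                candidates.append((net.prefixlen, is_custom, target))
    if not candidates:
        return None
    # Rule 1: longest prefix wins. Rule 2: on a tie, the custom router wins
    # (True sorts above False).
    return max(candidates, key=lambda c: (c[0], c[1]))[2]
```

Calling pick_route with both rule lists gives "VPC Subnet A" for 10.169.10.5, "10.169.30.33" for 172.16.23.95, and "Drop" for 10.169.20.5, matching the three cases walked through above.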

Internet Gateway

An Internet Gateway provides instances in a VPC access to the Internet and acts in a similar fashion to a traditional source NAT that one would find in a home network. When a VPC is created, an Internet Gateway is created by default as well. The default Internet Gateway shows up in the VPC System Router by default. A subnet without an Internet Gateway cannot route to the Internet on its own.

It is possible to create additional Internet Gateways and to associate them with VPC Custom Routers. This allows you to be able to have no outbound access by default, but to allow certain subnets to have access.

Each Internet Gateway is associated with a pool of external addresses that may be shared with other VPCs and projects.

However, there are many cases where applications need to deal with remote services that allow and deny access based on the IP address. To deal with this, a number of Floating IP addresses may be allocated and associated with an Internet Gateway for traffic within a single availability zone. When using this mode, the system will not dynamically scale up the number of IP addresses associated with the NAT as that would otherwise defeat any IP address based filtering.

An additional use of this mode is to have no IPs associated with the Internet Gateway, causing all outbound traffic to be dropped. An instance with a floating IP address will use that address when making outbound connections. See the next section for more information.

External IPs

By default, all networking for instances on a VPC is private to that VPC. This means that while instances can make requests to the Internet, they cannot receive inbound connections.

The system provides two different ways to allocate an external IP address:

  1. Ephemeral IPs: An external IP address that is temporarily assigned to an instance while it is running. Its lifetime is tied to the state of the instance.

  2. Floating IPs: A floating IP is a permanent object that can be attached to and detached from an instance. Its lifetime is separate from any instance it is attached to.

When an instance has an Ephemeral or Floating IP address, it will not be visible inside of the guest. This means that it will not show up in tools like ifconfig or ip addr. Instead, the Ephemeral or Floating IP address will act as a 1:1 NAT. All traffic to and from that IP address will be forwarded to that particular instance and its primary interface. When the instance replies, its traffic will be translated back so that it comes from the external Ephemeral or Floating IP address.
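The 1:1 NAT behavior can be pictured as a pair of lookup tables. The class below is a toy model under simplifying assumptions: OneToOneNat is a hypothetical name, packets are reduced to (source, destination) pairs, and each private address has at most one external address.

```python
class OneToOneNat:
    """Toy 1:1 NAT table mapping external IPs to private instance IPs."""

    def __init__(self):
        self._to_private = {}
        self._to_external = {}

    def attach(self, external_ip, private_ip):
        """Associate an Ephemeral or Floating IP with a private address."""
        self._to_private[external_ip] = private_ip
        self._to_external[private_ip] = external_ip

    def inbound(self, src, dst):
        """Rewrite the destination of a packet arriving from outside."""
        return src, self._to_private.get(dst, dst)

    def outbound(self, src, dst):
        """Rewrite the source of a packet leaving the instance."""
        return self._to_external.get(src, src), dst
```

Using the addresses from the example VPC, attaching 72.14.186.115 to 10.169.10.30 means inbound packets for the external address are delivered to the load balancer, and its replies leave with 72.14.186.115 as their source. The guest only ever sees 10.169.10.30.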

To support an Ephemeral or Floating IPv4 address, the instance must have a corresponding private IPv4 address on its primary interface. Similarly, to support an Ephemeral or Floating IPv6 address, the instance must have a corresponding IPv6 address.

To use an Ephemeral or Floating IP address, an instance must be in a subnet with an Internet Gateway. This helps maintain the simple rule that if an instance does not have an Internet Gateway, it cannot reach the Internet at all. The Internet Gateway will ensure that all traffic from that instance uses its Ephemeral or Floating IP when making new outbound connections and when replying to traffic. If an instance has more than one Floating IP address, then the Internet Gateway will use all of the associated addresses when making outgoing connections.

Ephemeral and Floating IPs can be assigned and removed while the instance is running. An Ephemeral IP may be changed to a Floating IP through the API and a Floating IP can be changed back to an Ephemeral IP as well. Neither of these operations interrupts the connectivity of the IP address.

Ephemeral IPs

An Ephemeral IP is an IP address that is temporarily assigned to an instance while it’s running and released when the instance is stopped or destroyed. An Ephemeral IP is perfect for instances that need external connectivity, but the actual IP address isn’t important.

Instances with Ephemeral IPs assigned to them are always advertised in external DNS using the instance scheme. The name will not change regardless of whether or not the underlying Ephemeral IP address associated with the instance changes.

An instance can either have a single external IPv4 address, a single external IPv6 address, or both a single external IPv4 and IPv6 address configured.

Floating IPs

A Floating IP address is an external IPv4 or IPv6 address that can be moved between different types of objects. For example, a Floating IP could be used to represent a service. As the service is upgraded new instances are created and destroyed, and the Floating IP address can be moved along side it. This allows consumers to have a consistent address and name.

A Floating IP address can be assigned to multiple different types of things, including instances, Internet Gateways, and future objects such as load balancers.

Floating IPs have their own DNS names and schemes. They do not show up in an instance’s external DNS, but rather in the Floating IP DNS scheme. Multiple Floating IPs can share the same DNS name. This creates a single DNS entry with multiple records.

IP Pools

An IP pool is a collection of external addresses that are maintained by operators. There are separate pools for IPv4 and IPv6 addresses. A given pool may have more than one IP CIDR block inside of it. IP pools may be made available to all projects or they can be restricted to a specific project.

When allocating an Ephemeral IP, Floating IP, or an Internet Gateway, an IP Pool may be optionally specified, which will cause the corresponding IP address to come from those in the IP Pool. Note, an IP Pool is never used for addresses from VPC Subnets that are used by guests.

DNS

The Oxide API services provide DNS servers for resolving the names of instances to their underlying IP addresses. There are two different types of DNS servers that are provided:

  1. Internal DNS servers that advertise VPC addresses.

  2. External DNS servers that advertise floating IP addresses.

Types of Records

The following types of DNS records are supported for an instance:

  • A records: These map a host name to an IPv4 address.

  • AAAA records: These map a hostname to its corresponding IPv6 address.

  • PTR records: These map an IP address back to the corresponding host name. While less common, these are often required for mail servers and other applications.

For a discussion of other types that could make sense, see the DNS record type design discussion.

Record Structure and Visibility

DNS for Instances

When an instance is created, it is automatically registered in Internal DNS. In this case, the primary IPv4 and IPv6 addresses are registered as A and AAAA records. Internal DNS exists on a per-VPC basis. Using the network’s DNS servers, an instance is always able to resolve any address on its VPC. An instance that is not on that VPC cannot resolve the VPC’s internal names.

When an Ephemeral IP address is assigned to an instance, then that instance will appear in external DNS. Names in external DNS are accessible outside of the Oxide environment by other applications. A DNS A record is created whenever an IPv4 Ephemeral IP address is assigned and a DNS AAAA record is assigned whenever an IPv6 Ephemeral IP address is assigned.

Names in DNS follow the same structure, regardless of whether or not they are being used internally or externally. This structure is:

<instance>.<az>.inst.<vpc>.<project>.<org>.<suffix>

  • <instance> refers to the DNS name of the instance

  • <az> refers to the DNS name of the availability zone

  • <vpc> refers to the DNS name of the VPC

  • <project> refers to the DNS name of the project

  • <org> refers to the DNS name of the organization

  • <suffix> refers to the DNS suffix that is used. For internal DNS this is always .internal. For external DNS, this varies based on the installation.

Let’s look at an example. Here are two names that refer to the same instance. One is in internal DNS and one is in external DNS:

glorfindel.us-east-1.inst.gondolin.noldor.tolkien.internal
glorfindel.us-east-1.inst.gondolin.noldor.tolkien.oxide.fingolfin.org

Here glorfindel is the DNS name of the instance. us-east-1 is the DNS name of the availability zone. gondolin is the DNS name of the VPC, noldor is the DNS name of the project, and tolkien is the DNS name of the organization. The first DNS host name is the name in internal DNS, which is why it has the .internal suffix. The second name is the one in external DNS and oxide.fingolfin.org is the suffix. The DNS suffix is specific to an installation.
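As a small illustration of the naming structure, the following hypothetical helper joins the DNS names of each object into an instance host name. The function name and signature are assumptions made for this sketch.

```python
def instance_dns_name(instance, az, vpc, project, org, suffix):
    """Join the DNS names of each object into an instance host name."""
    # Accept the suffix with or without its leading dot (".internal").
    labels = [instance, az, "inst", vpc, project, org, suffix.lstrip(".")]
    return ".".join(labels)
```

Passing the names from the example with the suffix .internal produces the internal name shown above, and passing oxide.fingolfin.org produces the external one.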

In all of the above objects, we explicitly said it was the DNS name. The DNS name is a separate name for each object that defaults to the object’s name. DNS has some additional constraints in terms of naming that aren’t always there for the main name attributes. In addition, it’s important that renaming something that users see and interact with on a regular basis doesn’t impact the names that machines are using unless intended.

When two VPCs have been peered together, subnets that are shared will show up in DNS with the corresponding names that match that project.

DNS for Floating IPs

Floating IPs are automatically registered in both Internal and External DNS, similar to the instances. However, the same (external) address is always advertised. A DNS A record is created for IPv4 Floating IP addresses and a DNS AAAA record is created for IPv6 Floating IP addresses. Multiple Floating IPs can have the same DNS name. This allows you to create a single hostname with multiple IPv4 and IPv6 records. This can be used to create round-robin DNS or to have both IPv4 and IPv6 support for the same name.

Floating IPs use a similar, but slightly different scheme from instances:

<name>.fip.<vpc>.<project>.<org>.<suffix>

  • <name> refers to the DNS name of the Floating IP

  • <vpc> refers to the DNS name of the VPC

  • <project> refers to the DNS name of the project

  • <org> refers to the DNS name of the organization

  • <suffix> refers to the DNS suffix that is used. For internal DNS this is always .internal. For external DNS, this varies based on the installation.

The main difference is that instead of the <instance>.<az>.inst prefix, you have <name>.fip.

More concretely, let’s say we had the name blog.fip.gondolin.noldor.tolkien.oxide.fingolfin.org. Like in the previous example, here blog is the DNS name of the floating IP. There may be more than one floating IP with the same DNS name, allowing for more than one record to be associated with the name (for example, both an IPv4 and an IPv6 address). gondolin is the DNS name of the VPC, noldor is the DNS name of the project, and tolkien is the DNS name of the organization. Unlike with instances, there is only ever an external suffix; there is no internal one. Therefore oxide.fingolfin.org is the installation-specific suffix.

Let’s say that this corresponded to the example VPC. If both the IPv4 and IPv6 floating IP addresses from the example had the name 'blog', and we ran the host command on this name, here’s what we’d expect to see:

rm@elbereth ~ $ host blog.fip.gondolin.noldor.tolkien.oxide.fingolfin.org
blog.fip.gondolin.noldor.tolkien.oxide.fingolfin.org has address 72.14.186.115
blog.fip.gondolin.noldor.tolkien.oxide.fingolfin.org has IPv6 address 2600:3c00::f03c:91ff:fe96:a264

Note how both the IPv4 and IPv6 addresses show up. If we had three IPv4 floating IP addresses rather than one IPv4 and one IPv6 address, we would see:

rm@elbereth ~ $ host blog.fip.gondolin.noldor.tolkien.oxide.fingolfin.org
blog.fip.gondolin.noldor.tolkien.oxide.fingolfin.org has address 72.2.112.194
blog.fip.gondolin.noldor.tolkien.oxide.fingolfin.org has address 165.225.172.11
blog.fip.gondolin.noldor.tolkien.oxide.fingolfin.org has address 165.225.164.26
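The way several Floating IPs with one DNS name collapse into a single host name with multiple records can be sketched as follows. floating_ip_records is a hypothetical helper; it only models the grouping and says nothing about how the DNS servers store or serve records.

```python
import ipaddress
from collections import defaultdict


def floating_ip_records(floating_ips, vpc, project, org, suffix):
    """Group (dns_name, address) pairs into {host name: [(type, address)]}."""
    records = defaultdict(list)
    for name, address in floating_ips:
        host = ".".join([name, "fip", vpc, project, org, suffix])
        # IPv4 addresses become A records, IPv6 addresses become AAAA records.
        rtype = "A" if ipaddress.ip_address(address).version == 4 else "AAAA"
        records[host].append((rtype, address))
    return dict(records)
```

Given two Floating IPs named blog, one IPv4 and one IPv6, the result is a single host name with one A record and one AAAA record, which is what the first host output above reflects.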

VPC Firewalls

A VPC Firewall is a tool that can be used to limit what instances can and cannot talk to other instances. Each VPC has its own independent VPC firewall. A VPC firewall is made up of a series of rules. There is one set of rules for incoming traffic and a second set of rules for outgoing traffic.

The VPC firewall is a stateful firewall. This means that when a connection is established due to allowed rules, there don’t need to be explicit rules going in the other direction. Consider the case where inbound traffic is denied, but outbound traffic is allowed. When an instance makes an outbound connection, it still expects to receive some amount of inbound traffic. Because the firewall is stateful, an exception is made to allow that specific inbound reply back in.
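A minimal way to picture the stateful behavior is a table of known flows. The class below is a toy model for the case described above, where inbound traffic is otherwise denied. StatefulFirewall and its method names are hypothetical, and real connection tracking also handles protocols, timeouts, and connection teardown.

```python
class StatefulFirewall:
    """Toy connection tracker: inbound is denied unless it answers a flow."""

    def __init__(self):
        self._flows = set()

    def outbound(self, src, sport, dst, dport):
        """Record an outbound connection that the rules allowed."""
        self._flows.add((src, sport, dst, dport))

    def inbound_allowed(self, src, sport, dst, dport):
        """Allow inbound traffic only if it reverses a recorded flow."""
        return (dst, dport, src, sport) in self._flows
```

After an instance at 10.169.20.12 connects from port 40000 to 8.8.8.8 port 53, a packet from 8.8.8.8 port 53 back to 10.169.20.12 port 40000 is let in, while unrelated inbound packets are still denied. The port numbers here are made up for the example.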

Each firewall rule has the following attributes:

  • Status: The rule’s current status. The rule can be enabled or disabled. This allows a rule to be manipulated without deleting it.

  • Direction: Either inbound or outbound to indicate which direction the rule applies to.

  • Target: Indicates the group of instances that the rule applies to.

  • Filters: Indicates a reduction of the Firewall rule. For the rule to apply, it must pass the filter. The filter can cover:

    • The Source (inbound) or Destination (outbound) addresses.

    • The IP protocol of the traffic. This could be, for example, TCP, UDP, or ICMP.

    • The ports that the traffic is using. For example, HTTP traffic commonly uses TCP port 80 or SSH uses TCP port 22.

  • Action: Describes what to do when a rule matches. There are two options: allow and deny. The former allows traffic through and the latter drops traffic.

  • Priority: A value between 0 and 65535. Rules are ordered according to priority. The rule with the lowest priority value is evaluated first. The default priority is 1000.

When specifying a target or a filter’s source or destination, it may be any of the values specified in Target Strings.

For filters, the use of explicit IP addresses and blocks is not recommended. Using tags, subnets, and instances where possible will ensure that nothing slips through the firewall rules as an instance’s IP addresses and NICs change. It also allows most rules to target both IPv4 and IPv6 by default, without having to think about whether the instance is using one or the other or updating a rule when that changes.

The priority of the rule dictates its evaluation order. Rules are sorted according to priority and rules with the lowest priority are evaluated first. The first rule that matches is used. Unlike routing, the prefix does not matter, only the priority. If there is both a matching allow rule and deny rule at the same priority then the deny rule takes priority.
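The evaluation order described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Rule` class, its `matches` predicate, and the dictionary used to represent a packet are hypothetical and not part of the API, and all of the rules are assumed to apply to the same direction.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical in-memory form of a firewall rule (not the API object)."""
    name: str
    priority: int                    # 0-65535; lower values are evaluated first
    action: str                      # "allow" or "deny"
    matches: Callable[[dict], bool]  # stand-in for target and filter matching
    enabled: bool = True


def evaluate(rules: Iterable[Rule], packet: dict) -> Optional[str]:
    """Return the action of the winning rule, or None if nothing matches."""
    candidates = [r for r in rules if r.enabled and r.matches(packet)]
    if not candidates:
        return None
    # Only the matching rules with the lowest priority value are considered.
    best = min(r.priority for r in candidates)
    tied = [r for r in candidates if r.priority == best]
    # At equal priority, a matching deny rule beats a matching allow rule.
    return "deny" if any(r.action == "deny" for r in tied) else "allow"


rules = [
    Rule("implied-deny-inbound", 65535, "deny", lambda p: True),
    Rule("allow-ssh", 65534, "allow", lambda p: p["port"] == 22),
    Rule("allow-web", 1000, "allow", lambda p: p["port"] in (80, 443)),
    Rule("deny-https", 1000, "deny", lambda p: p["port"] == 443),
]
```

With these rules, a packet to port 443 matches both `allow-web` and `deny-https` at priority 1000, so it is denied, while a packet to port 80 is allowed.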

Default and Implicit Rules

Each VPC has several implicit rules that cannot be deleted. These implicit rules operate with a priority of 65535, meaning that they are evaluated last. These rules are:

  • Implied allow outbound: This rule allows all instances to communicate outbound over both IPv4 and IPv6.

  • Implied deny inbound: This rule causes all inbound traffic to instances to be dropped.

The following are rules that are added by default when a VPC is created. These rules have a priority of 65534, meaning that they are evaluated before the implied rules, but every other rule with a lower priority value takes precedence over them.

  • Default allow internal inbound: This rule allows inbound traffic to all instances on the VPC as long as it originated from the VPC.

  • Default allow ssh: This rule allows inbound TCP connections on port 22 from anywhere.

  • Default allow ICMP: This rule allows inbound ICMP traffic from anywhere. This allows network tools like ping to work.

  • Default allow RDP: This rule allows inbound TCP connections on port 3389 from anywhere. This allows Windows remote desktop to operate.

Traffic that is Always Blocked

Some classes of traffic are always blocked, no matter what the source or destination. The only Ethernet protocols that are allowed on the network are:

  • IPv4

  • ARP

  • IPv6

Traffic that does not match one of these three types will be dropped. There are no restrictions on the IP protocol that is sent. It can be TCP, UDP, ICMP, or anything else.
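As a small illustration of the Ethernet protocol allow-list above, the following Python sketch checks a frame's EtherType against the three permitted protocols. The EtherType values are the standard IEEE assignments; the function name is hypothetical, and how the rack actually enforces this is not specified here.

```python
# Standard EtherType values for the three protocols that are allowed.
ALLOWED_ETHERTYPES = {
    0x0800: "IPv4",
    0x0806: "ARP",
    0x86DD: "IPv6",
}


def ethertype_allowed(ethertype: int) -> bool:
    """Return True if a frame with this EtherType would be permitted."""
    return ethertype in ALLOWED_ETHERTYPES
```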

By default, instances are not allowed to spoof the IP address or MAC address that they are sending from. If they do not match what has been assigned to the instance and the interface, then that traffic will be dropped. There are some use cases such as software based routers that want to forward packets with different addresses. This can be enabled on a per-instance basis.

Rule Debugging

A challenging aspect of firewalls is understanding whether they are having the intended effect. The API should provide a means of evaluating whether a given packet would make it to an instance. See the additional discussion of Firewall Flow Logging in the future directions; some of that may be useful sooner.

Examples

Let’s take the example that we used at the start and see how we might use firewall rules to further constrain how different things can talk to one another. The following table summarizes the firewall rules that we’re going to apply. These rules are ordered in their priority order.

Table 5. Example VPC Firewall Rules
| Rule Number | Direction | Action | Target | Filters: IPs | Filters: Protocol | Filters: Ports | Explanation |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Inbound | Allow | VPC Subnet 'DB' | VPC Subnet 'DB' | All | All | This is a rule that allows all of the databases to talk to one another. |
| 2 | Inbound | Allow | VPC Subnet 'DB' | Tag: HTTP | TCP | 5432 | This rule allows instances with the HTTP tag, being the 'HTTP 1' and 'HTTP 2' servers to connect and query the databases. |
| 3 | Inbound | Allow | Instance 'Load Balancer' | All | TCP | 80, 443 | This rule allows the load balancer to receive HTTP and HTTPS traffic from outside of the VPC and from the broader Internet. |
| 4 | Inbound | Allow | VPC Subnet 'App' | Entire VPC | All | All | This rule allows any instance in the 'DB' and 'App' Subnets to make connections to an instance in the 'App' Subnet. |
| 5 | Inbound | Deny | All | All | All | All | This is the default inbound-deny rule that stops all other inbound traffic from functioning. This is the lowest priority Inbound rule. |
| 6 | Outbound | Allow | All | All | All | All | This is the default outbound-allow rule that allows all other outbound traffic to function. This is the lowest priority Outbound rule. |

In this example, we have a number of rules in place that allow inbound connectivity between different groups of instances. If we look back at the original flow we described during the VPC Example, these rules allow traffic to flow in a natural way. Here is a bit more information on some of these rules.

First, you’ll notice that there are a lot of inbound allow rules, but there aren’t many outbound allow rules. This is because of the default rules, which are rules 5 and 6. The default firewall rules allow all outbound traffic, but deny all inbound traffic. Therefore, we have a number of exceptions to that default deny rule, each of which has a higher priority.

Take rule 3. This rule allows the load balancer to receive inbound connections from anywhere, but only on TCP ports 80 and 443, which are the common ports for HTTP and HTTPS. Without this rule, the load balancer wouldn’t be able to receive traffic. The filter ensures that traffic is only allowed on those ports. If some other service was started on the load balancer instance, it wouldn’t immediately be accessible to the rest of the world. Similarly, because the target of rule 3 is the load balancer instance, this doesn’t apply to any other instances.

Rules 1 and 2 lock down the 'DB' Subnet in two different ways. First, Rule 1 allows the instances within the 'DB' Subnet to talk to each other on any port. With Rule 1 alone, however, nothing outside of the 'DB' Subnet could talk to any of the instances within it, which would make it rather hard for the databases to actually serve their traffic.

This is where Rule 2 comes into play. It allows inbound access to the databases, but with a rather constraining filter. Because TCP port 5432 is the main port for Postgres, the filter constrains all communication to that port. The other part of the filter is on a tag. Instances can be grouped together with a tag (See RFD 4 for more on tags).

Using a tag we can collect all of the HTTP instances together. The difference between using the tag and using the whole 'App' VPC Subnet is the 'load balancer'. In this case, we don’t want the load balancer to be able to talk to the databases directly, so we use a more constrained set. By using a tag, if we expand the set of instances needed for serving HTTP traffic, we can add the tag to them and know that the firewall rules will automatically adjust. The same is true if we delete an instance with a tag.

Finally, the 4th rule targets all instances in the 'App' Subnet. It allows all instances in the entire VPC to connect to any instance in the 'App' Subnet.

Based on these rules here are various connections that will be denied and result in the traffic being dropped:

  • The load balancer trying to talk to any of the databases.

  • The HTTP/Application servers trying to talk to anything other than TCP port 5432 on the databases.

  • External communication reaching anything other than the load balancer (which could happen if the Floating IPs were moved or an Ephemeral IP were added).

VPC Peering

While each VPC is independent and private, there are times when you want to privately share traffic between two VPCs within a region, possibly across different projects and organizations, without going out over public IP addresses.

For example, a company that provides a database as a service may have a VPC or subnet per customer. In an ideal world, they would be able to peer that VPC or subnet into their customer’s private networks. An example of this is discussed in RFD 9 Networking Considerations.

To facilitate this, two VPCs can be peered together. When two VPCs are peered together, for each subnet that’s been peered, the following happens:

  • A VPC Peering Gateway is created that represents the remote VPC that can be used in VPC Custom Routers.

  • A route is added for that subnet to the VPC System Router that uses the VPC peering gateway.

  • DNS requests for the remote subnet using the DNS scheme that refers to the far side’s VPC, Project, and Organization will function. Note, the local DNS search path will not be modified to include searching that.

  • One has the ability to specify a remote VPC as the filter of VPC Firewall rules.

When a VPC peering relationship is established, subnet propagation can happen in two different modes:

  1. Automatic mode: All subnets in a VPC are shared. As subnets are created and destroyed, they will automatically be shared or be removed.

  2. Manual mode: Each side can manually specify the subnets that they wish to share.

The system will not allow two subnets, one from each end of a peering relationship, to be shared if their blocks overlap. The use of IPv6 ULA /48 prefixes on a per-VPC basis should eliminate this possibility; however, it is easy to end up in this situation with IPv4 and the default subnet. Automatic mode cannot be enabled if there is existing overlap.
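The overlap check described above can be sketched with Python's standard `ipaddress` module. This is a minimal illustration, not the system's implementation: the function name and the idea of passing each side's CIDR blocks as lists of strings are assumptions.

```python
import ipaddress


def overlapping_pairs(local_blocks, remote_blocks):
    """Return (local, remote) CIDR pairs that overlap and so cannot be peered."""
    local = [ipaddress.ip_network(b) for b in local_blocks]
    remote = [ipaddress.ip_network(b) for b in remote_blocks]
    return [
        (str(a), str(b))
        for a in local
        for b in remote
        # Only compare blocks of the same IP version.
        if a.version == b.version and a.overlaps(b)
    ]
```

For example, a local 172.30.0.0/22 block overlaps a remote 172.30.2.0/24 block, while two distinct ULA /64 blocks do not overlap. Under this sketch, automatic mode could only be enabled when the returned list is empty.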

When a subnet is removed, the routes, firewall rules, and ability to make DNS requests about that subnet are removed. When a peered VPC relationship is removed, then all other information and rules about it are removed as well.

Currently, a VPC can only be peered to one other in this fashion. See also the discussion of future directions for VPC Peering and VPNs.

Inside the Instance

This section describes the behavior of an instance with respect to the network.

VNICs

Inside an instance, you will find one or more Virtual NICs (VNICs). VNICs show up to the operating system as a normal PCI Express Network Interface Card and show up in normal networking tools like ifconfig, ip link, ipadm, and the PowerShell Get-NetIPAddress.

A VNIC behaves the same as a normal device, with a few exceptions:

  • The interface will always appear up.

  • The speed of the interface is not a reflection of the actual speed of the link (there is none, because the NIC is virtual).

  • Certain commands and tools that ask for features of Ethernet (such as link advertisements, auto-negotiation configuration, or blinking a NIC’s LEDs) will not function the same way and will likely fail.

On each Interface inside of the instance, there will be an IPv4 and/or an IPv6 address depending on the instance’s configuration. These addresses will always come from the same subnet.

DHCP and Options

The per-instance IPv4 and IPv6 addresses will be supplied over DHCP for both IPv4 and IPv6. A number of additional values will be transmitted over DHCP. These include:

  • The interface’s MTU, which is always 1500 currently. See the MTU design discussion for more on that.

  • The instance’s hostname, which comes from the instance’s DNS name.

  • The instance’s DNS domain name, which is based on the VPC Subnet.

  • The DNS Domain search domain, which is based on the VPC Subnet.

  • The NTP server for the instance, which is provided by Oxide.

  • The default gateway for all traffic, which is based on the VPC Subnet.

Currently, there are no plans for custom DHCP options on a per-subnet or VPC basis. However, to better assess whether this is required, there are several questions in the DHCP and Options questions section. Several values are already transmitted over DHCP by design, so that if we want to add custom options in the future, it is easy to do so.

With the exception of the instance’s host name and possibly the MTU, all of the other properties are properties of the system and not expected to change. For the instance’s host name updates to be propagated into the instance, DHCP must be restarted. For this to happen before the DHCP lease expires, tools inside of the guest instance may be required.

Routing and Gateways

Inside of instances, we are currently planning on borrowing from GCP: giving instances an off-link gateway and giving each interface the smallest address allocation (a /32 for IPv4 and a /128 for IPv6). This uses the RFC 3442 Local Subnet Routes feature.

While it requires a bit more knowledge from the guest, this minimizes the amount of ARP traffic that the guest sends and makes sure that all traffic is always destined for an IP address that we control. This makes it easier to deal with and capture packets for firewall rules, routing, and more.

Critically, this means that the only route that an instance needs is that of the off-subnet gateway, which will be the VPC Subnet’s gateway. As the VPC System Router or the VPC Subnet’s VPC Custom Router is updated, no changes need to be propagated to individual instances.
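To make the off-link gateway idea more concrete, here is a hedged Python sketch of how such routes could be encoded in the DHCPv4 classless static route option (option 121) defined by RFC 3442. The encoding (prefix length, significant destination octets, then the router address) follows the RFC, and a router of 0.0.0.0 marks a destination as on-link. The specific pair of routes shown, a /32 route to the gateway plus a default route through it, is an assumption modeled on GCP-style behavior, not something this document specifies. It covers IPv4 only.

```python
import ipaddress


def encode_classless_routes(routes):
    """Encode (destination CIDR, router) pairs as an RFC 3442 option payload."""
    out = bytearray()
    for destination, router in routes:
        net = ipaddress.IPv4Network(destination)
        out.append(net.prefixlen)
        # Only the significant octets of the destination are transmitted.
        significant = (net.prefixlen + 7) // 8
        out += net.network_address.packed[:significant]
        out += ipaddress.IPv4Address(router).packed
    return bytes(out)


def off_link_gateway_routes(gateway):
    """Assumed route set for an instance holding a /32 address."""
    return [
        (f"{gateway}/32", "0.0.0.0"),  # the gateway itself is reachable on-link
        ("0.0.0.0/0", gateway),        # everything else goes via the gateway
    ]
```

For a hypothetical gateway of 172.30.0.1, the payload is 14 bytes long: nine bytes for the on-link /32 route and five bytes for the default route.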

The Primary and Multiple Interfaces

An instance may have additional interfaces inside of it. Each interface can belong to a different VPC Subnet from the same VPC. An instance’s networking cannot span more than one VPC.

The first interface is considered the 'primary' interface. The primary interface has several properties:

  • Only the primary interface’s IP addresses show up in Internal DNS.

  • Ephemeral and Instance IPs are always forwarded to it.

  • It is the only interface to receive a default route over DHCP.

Additional interfaces from different subnets can also be allocated. These secondary interfaces will not be registered in DNS, though all firewall rules targeting them will be applied.

See additional discussion on the design of Hot-plug of Interfaces and IPs and future direction on Multiple IPs per Interface.

API

The networking APIs and routes are broken into two categories:

  • VPC-Scoped API Routes

  • Project-Scoped API Routes

While all resources are always owned by a corresponding project, several of the resources and routes are specific to a VPC, rather than the project as a whole. VPC-scoped routes are always under the /projects/{project_name}/vpcs hierarchy, while the other networking functions are found under the top-level project API route /projects/{project_name} and represent resources that can be used across multiple VPCs in the same project, such as floating IPs.

VPC-scoped API Route Summaries

Table 6. API Route Summary: VPCs
| Verb | Route | Purpose |
| --- | --- | --- |
| GET | /projects/{project_name}/vpcs | List all VPCs in a project |
| POST | /projects/{project_name}/vpcs | Create a new VPC in a project |
| GET | /projects/{project_name}/vpcs/{vpc_name} | Get details about a specific VPC |
| PUT | /projects/{project_name}/vpcs/{vpc_name} | Update details about a specific VPC |
| DELETE | /projects/{project_name}/vpcs/{vpc_name} | Delete a VPC |

From here on, all APIs are rooted under the VPC, so there is an implicit /projects/{project_name}/vpcs/{vpc_name} at the start of every route.
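As a small illustration of the implicit prefix, the following Python sketch expands a VPC-relative route into its full path. The helper name is hypothetical; the path layout is the one given above.

```python
from urllib.parse import quote


def vpc_route(project_name: str, vpc_name: str, relative: str = "") -> str:
    """Expand a VPC-relative route such as "/subnets" into its full path."""
    base = f"/projects/{quote(project_name, safe='')}/vpcs/{quote(vpc_name, safe='')}"
    return base + relative
```

For example, `vpc_route("blog", "default", "/firewall/rules")` yields the full path for the firewall rules of the `default` VPC in the `blog` project.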

Table 7. API Route Summary: Subnets
| Verb | Route | Purpose |
| --- | --- | --- |
| GET | /subnets | List all VPC Subnets in a VPC |
| POST | /subnets | Create a new VPC Subnet |
| GET | /subnets/{subnet_name} | Get details of a VPC Subnet |
| PUT | /subnets/{subnet_name} | Update details of a VPC Subnet |
| DELETE | /subnets/{subnet_name} | Delete a VPC Subnet |
| GET | /subnets/{subnet_name}/ips | List IP Addresses on a VPC Subnet |
| GET | /subnets/{subnet_name}/ips/{ip_address} | Get information about an IP Address on a VPC Subnet |
| PUT | /subnets/{subnet_name}/ips/{ip_address} | Update details about an IP Address |

Table 8. API Route Summary: VPC Routers
| Verb | Route | Purpose |
| --- | --- | --- |
| GET | /routers | List all of the VPC Custom and System Routers |
| POST | /routers | Create a new VPC Custom Router |
| GET | /routers/{router_name} | Get all of the details of the VPC System or VPC Custom Router |
| PUT | /routers/{router_name} | Update the details of a VPC Custom Router |
| DELETE | /routers/{router_name} | Delete a VPC Custom Router |
| GET | /routers/{router_name}/routes | List all of the routes on a given router |
| PUT | /routers/{router_name}/routes | Replace the set of routes on the specified VPC router |

Table 9. API Route Summary: Firewalls
| Verb | Route | Purpose |
| --- | --- | --- |
| GET | /firewall/rules | Return the VPC’s firewall rules |
| PUT | /firewall/rules | Update the VPC’s firewall rules |

Table 10. API Route Summary: Internet Gateways
| Verb | Route | Purpose |
| --- | --- | --- |
| GET | /gateways/internet | List Internet gateways |
| POST | /gateways/internet | Create a new Internet gateway |
| GET | /gateways/internet/{inetgw_name} | Get information about the specified Internet Gateway |
| PUT | /gateways/internet/{inetgw_name} | Update information about an Internet Gateway |
| DELETE | /gateways/internet/{inetgw_name} | Delete the specified Internet Gateway |

Table 11. API Route Summary: VPC Peering
| Verb | Route | Purpose |
| --- | --- | --- |
| GET | /gateways/vpcs | List VPC Peering gateways |
| POST | /gateways/vpcs | Create a new VPC Peering gateway |
| GET | /gateways/vpcs/{vpcgw_name} | Get information about the specified VPC Peering gateway |
| PUT | /gateways/vpcs/{vpcgw_name} | Update information about a VPC Peering gateway |
| DELETE | /gateways/vpcs/{vpcgw_name} | Delete the specified VPC Peering gateway |
| GET | /gateways/vpcs/{vpcgw_name}/peered/subnets | List VPC subnets that have been shared by peering |
| GET | /gateways/vpcs/{vpcgw_name}/peered/subnets/{subnet_name} | Get details about a VPC Subnet that has been peered |
| PUT | /gateways/vpcs/{vpcgw_name}/peered/subnets/{subnet_name} | Update information about a peered VPC Subnet |

Project-scoped API Route Summaries

Table 12. API Route Summary: Floating IPs
| Verb | Route | Purpose |
| --- | --- | --- |
| GET | /projects/{project_name}/floating-ips | List all floating IPs currently allocated to the project |
| POST | /projects/{project_name}/floating-ips | Allocate a new floating IP to the project |
| GET | /projects/{project_name}/floating-ips/{ip_name} | Get information about the specified floating IP |
| PUT | /projects/{project_name}/floating-ips/{ip_name} | Update information about the floating IP, such as the instance or Internet gateway it is assigned to or DNS information |
| DELETE | /projects/{project_name}/floating-ips/{ip_name} | Delete the specified floating IP |
| GET | /projects/{project_name}/ip-pools | List all IP Pools that are available to a project |
| GET | /projects/{project_name}/ip-pools/{pool_name} | Get information about a specific IP Pool |

Table 13. API Route Summary: Utility Functions
| Verb | Route | Purpose |
| --- | --- | --- |
| GET | /search/ips | Look up what resources refer to the specified IP address in the project |

General API Concepts

Target Strings

In many places in the networking APIs, there is the ability to specify a target string. This occurs in VPC Router entries, VPC firewall rules, and VPC firewall filters. These target strings are made up of two pieces: a resource scope and a resource identifier. For example, "vpc:my_vpc", "subnet:db", and "ip:192.168.1.1/32". In these cases 'vpc', 'subnet', and 'ip' are the resource scopes, and the parts after the ':' are the resource identifiers. Not all HTTP objects support all of the resource scopes.

In addition, when using lists of resources in the API, we use a similar target string style specification to provide us a good way to extend the types of resources that may be listed in a resource as the system evolves.

The following types of resource scopes and identifiers are generally supported in the API:

Table 14. API Resource and Target Descriptions
| Resource Scope | Resource Identifier | Description | Example | Allowed context |
| --- | --- | --- | --- | --- |
| vpc | The name of the VPC | Targets all networking traffic from the specified VPC. This is either the current VPC or a peered VPC. | "vpc:default" | Firewall rule target, firewall host filter, route target |
| subnet | The name of the VPC Subnet | Targets all networking traffic from the specified VPC Subnet. | "subnet:databases", "subnet:blog-web" | Firewall rule target, firewall host filter, route target |
| instance | The name of the instance | Targets all of the IP addresses that are assigned to that instance. | "instance:frontdoor-lb" | Firewall rule target, firewall host filter, route target |
| tag | The name of a tag | Targets all instances that have the same tag. | "tag:https" | Firewall rule target, firewall host filter |
| ip | A specific IP address or CIDR block | Targets the specified IP address or CIDR block. | "ip:10.20.30.0/24", "ip:2600:3c00::f03c:91ff:fe96:a264" | Firewall host filter, route target |
| inetgw | The name of an Internet Gateway | Targets the specified Internet Gateway. | "inetgw:default" | Firewall host filter, route target |
| fip | The name of a Floating IP | Targets the specified floating IP. | "fip:foobazco.org" | Route target |
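The structure of a target string can be illustrated with a short Python sketch. The function name and the error handling are assumptions; the list of scopes comes from the table above. Note that the string is split on the first ':' only, since IPv6 identifiers contain colons of their own.

```python
import ipaddress

# Resource scopes listed in the table above.
SCOPES = {"vpc", "subnet", "instance", "tag", "ip", "inetgw", "fip"}


def parse_target(target: str):
    """Split a target string into (scope, identifier), validating the scope."""
    scope, sep, identifier = target.partition(":")
    if not sep or scope not in SCOPES or not identifier:
        raise ValueError(f"not a valid target string: {target!r}")
    if scope == "ip":
        # Raises ValueError if this is not an IP address or CIDR block.
        ipaddress.ip_network(identifier, strict=False)
    return scope, identifier
```

This sketch does not check the allowed context for each scope, which depends on where the target string is used.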

Whole Object Updates, ETags, and Conditional PUTs

Every HTTP object has a corresponding GET method and can be updated with a PUT method. When using the GET method, the entire object is returned. To update the object, the entire new version of the object must be included. There is no partial update functionality with the PUT method.

All of the GET and PUT endpoints support the use of an ETag to perform conditional requests. The ETag is returned in an HTTP header of a successful GET or PUT response.
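A whole-object conditional update might be assembled as in the following Python sketch. This document does not say which conditional headers the API honors; the sketch assumes the standard HTTP `If-Match` header, and the function name and the dictionary used to describe the request are purely illustrative.

```python
import json


def conditional_put(path: str, obj: dict, etag: str) -> dict:
    """Describe a PUT that should only succeed if the object is unchanged."""
    return {
        "method": "PUT",
        "path": path,
        "headers": {
            "Content-Type": "application/json",
            # Assumption: the ETag from the earlier GET is echoed in If-Match.
            "If-Match": etag,
        },
        # The entire object is sent; there is no partial update.
        "body": json.dumps(obj),
    }
```

A client would GET the object, record the returned ETag, modify the object locally, and send the whole object back with that ETag so that a concurrent change by someone else is not silently overwritten.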

Object Lists

There are several HTTP resources in the networking APIs that represent endpoints that can be used to list resources. For each resource that can be listed, there is pagination, meaning that not all entries will be returned in a single request.

The networking API will work like our other APIs and should follow what was laid out in RFD 4 User Facing API for pagination.

VPC APIs

VPCs are represented by an object. When a project is created, a VPC is usually created automatically. Every VPC contains the following fields:

Table 15. The VPC API Object
| Key | Value Type | Read-Only | Description |
| --- | --- | --- | --- |
| name | String | No | The name of the resource. This is used as the main key in the API. |
| description | String | No | An optional description. |
| dns_name | String | No | The name that is used for the resource in DNS. |
| id | String | Yes | A UUID for the object. |
| time_created | String | Yes | The date and time that the object was created. |
| time_modified | String | Yes | The date and time that the object was last modified. |

List VPCs (GET /projects/{project_name}/vpcs)

The List VPCs endpoint, GET /projects/{project_name}/vpcs, returns an array of JSON objects, each of which has the form described above.

XXX Example

Create VPC (POST /projects/{project_name}/vpcs)

The Create VPC endpoint, POST /projects/{project_name}/vpcs, can be used to create a new VPC inside of the project {project_name}. When creating a VPC, one can opt to have the VPC defaults set up or not. When creating a VPC with the defaults, the following will be configured:

  • A default VPC Subnet will be created with both IPv4 and IPv6 CIDR blocks.

  • The VPC firewall will be populated with the default rules.

When creating a VPC, the following parameters may be specified.

Table 16. The Create VPC API parameters
| Key | Value Type | Required | Default Value | Description |
| --- | --- | --- | --- | --- |
| name | string | Yes | - | The name of the resource. |
| description | string | No | - | An optional description of the VPC. |
| dns_name | string | No | The value of name | Sets the name of the VPC that will be used in DNS. If this is not specified, this property will be seeded with the value of the name property. |
| ipv6_prefix | string | No | The system default | This sets the default prefix that will be used for IPv6 addresses. This must be a /48 prefix from the IPv6 Unique Local Address range (fd00::/8). If it is not supplied, a default one will be chosen. |
| setup_defaults | boolean | No | false | Causes the VPC to be created with its default properties as discussed above. |
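The parameters above could be assembled and checked on the client side roughly as follows. This is a sketch under assumptions: the helper name is hypothetical, and the check on ipv6_prefix reflects one reading of the requirement, namely a /48 taken from the Unique Local Address range fd00::/8. The server may apply different or additional validation.

```python
import ipaddress

ULA_RANGE = ipaddress.ip_network("fd00::/8")


def create_vpc_body(name, description=None, dns_name=None,
                    ipv6_prefix=None, setup_defaults=False):
    """Build a request body for the Create VPC endpoint."""
    body = {
        "name": name,
        "dns_name": dns_name or name,  # dns_name defaults to the value of name
        "setup_defaults": setup_defaults,
    }
    if description is not None:
        body["description"] = description
    if ipv6_prefix is not None:
        prefix = ipaddress.ip_network(ipv6_prefix)
        # Assumed rule: an IPv6 /48 that falls within the ULA range.
        if (prefix.version != 6 or prefix.prefixlen != 48
                or not prefix.subnet_of(ULA_RANGE)):
            raise ValueError(f"not a ULA /48 prefix: {ipv6_prefix}")
        body["ipv6_prefix"] = str(prefix)
    return body
```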

VPC Subnet APIs

A VPC Subnet is represented as a JSON object with the following fields:

Table 17. The VPC Subnet Object
| Key | Value Type | Read-Only | Description |
| --- | --- | --- | --- |
| name | String | No | The name of the resource. This is used as the main key in the API. |
| description | String | No | An optional description. |
| availability_zone | String | Yes | The availability zone the subnet is located in. |
| custom_router | String | No | The name of the optional custom router that this subnet uses. |
| dns_name | String | No | The name that is used for the resource in DNS. |
| id | String | Yes | A UUID for the object. |
| ipv4_block | String | Yes | The IPv4 CIDR block associated with the VPC Subnet. |
| ipv6_block | String | Yes | The IPv6 CIDR block associated with the VPC Subnet. |
| mtu | Integer | Yes | The MTU of the network in bytes. It is always 1500. |
| time_created | String | Yes | The date and time that the object was created. |
| time_modified | String | Yes | The date and time that the object was last modified. |
| vpc_id | String | Yes | The UUID of the VPC this VPC Subnet is a part of. |

VPC Router APIs

Table 18. The VPC Router Object
| Key | Value Type | Read-Only | Description |
| --- | --- | --- | --- |
| name | String | No | The name of the resource. This is used as the main key in the API. |
| description | String | No | An optional description. |
| id | String | Yes | A UUID for the object. |
| time_created | String | Yes | The date and time that the object was created. |
| time_modified | String | Yes | The date and time that the object was last modified. |
| vpc | String | Yes | The name of the VPC this VPC Router is a part of. |

VPC Route APIs

The VPC Route rules are represented as a JSON array of route objects. Here’s an example of a set of rules that might appear in a VPC Custom Router’s routes:

[
	{
		"name": "vpn_v4",
		"destination": "ip:172.16.0.0/12",
		"target": "instance:my_vpn",
		"description": "Rule to route traffic to our DC in Hobbiton"
	},
	{
		"name": "drop_db",
		"destination": "subnet:cust_db",
		"target": "drop",
		"description": "don't let folks route to that subnet"
	}
]

The body of each rule has the following fields:

Table 19. VPC Route Object Body
| Key | Value Type | Valid Values | Description |
| --- | --- | --- | --- |
| name | string | Standard RFC1035 names, as in the rest of the API | The name of the resource. This is used as the main key in the API. |
| destination | string | The 'vpc', 'subnet', and 'ip' Target Strings | Specifies what network traffic the route matches. See the discussion below. |
| target | string | The 'vpc', 'subnet', 'instance', 'inetgw', and 'ip' Target Strings | Specifies where traffic matching destination should be forwarded to. See the discussion below. |
| description | string | N/A | An optional string that provides additional information about the rule. |

In the destination member, the Target Strings have the following semantics:

  • ip: Indicates that the routing rule matches the specified IP addresses.

  • subnet: Indicates that the routing rule matches the VPC Subnet’s IPv4 and IPv6 prefixes.

  • vpc: This may only be used to specify a peered VPC and represents all of the subnets that are currently peered.

In the target member, the Target Strings have the following semantics:

  • ip: Sends all traffic to the specified IP. If this IP is not something that exists on the network, then traffic may be dropped.

  • instance: Sends all traffic to the primary IP address of the specified instance. If the instance’s IP address changes, this will change along with it.

  • inetgw: Sends all traffic to the specified Internet Gateway.

  • subnet: This ensures that the specified traffic is sent to the specified VPC Subnet. This is used by the VPC System Router to ensure that all traffic for a specific VPC Subnet is directed to it.

  • vpc: This may only be used to specify a peered VPC. This ensures that all traffic is sent to the peered VPC.

Get VPC Router Routes (GET /routers/{router_name}/routes)

The GET VPC Router Routes endpoint, GET /routers/{router_name}/routes, returns the set of VPC Router routes that exist for the specified router.

XXX Example

Set VPC Router Routes (PUT /routers/{router_name}/routes)

The Set VPC Router Routes endpoint, PUT /routers/{router_name}/routes, replaces the set of VPC Router routes for the specified router. The body format is as described above for GET. Note, the VPC System Router rules have restrictions on what can be modified.

XXX Example

VPC Firewall APIs

The VPC Firewall rules are represented as an array of JSON objects. Here’s an example of a set of rules:

[
	{
		"name": "http",
		"status": "enabled",
		"direction": "inbound",
		"targets": [ "tag:web" ],
		"filters": {
			"protocols": [ "TCP" ],
			"ports": [ "80", "443" ]
		},
		"action": "allow",
		"priority": 1000,
		"description": "serve our HTTP traffic externally"
	},
	{
		"name": "bastion",
		"status": "enabled",
		"direction": "inbound",
		"targets": [ "subnet:db", "subnet:admin" ],
		"filters": {
			"hosts": [ "vpc:our_vpc", "ip:1.2.3.4/32", "ip:2600:3c00::f03c:91ff:fe96:a264/128" ],
			"protocols": [ "TCP" ],
			"ports": [ "22" ]
		},
		"action": "allow",
		"priority": 50,
		"description": "allow our bastion host to ssh into systems"
	},
	{
		"name": "no-ssh",
		"status": "disabled",
		"direction": "inbound",
		"targets": [ "vpc:our_vpc" ],
		"filters": {
			"protocols": [ "TCP" ],
			"ports": [ "22" ]
		},
		"action": "deny",
		"priority": 100,
		"description": "drop non-bastion ssh traffic, disabled on 4 APR 2020 for debugging by rm"
	}
]

Each rule has the following fields:

Table 20. Firewall Rule JSON Object Body
| Key | Value Type | Valid Values | Description |
| --- | --- | --- | --- |
| name | string | Standard RFC1035 names, as in the rest of the API | The name of the resource. This is used as the main key in the API. |
| status | string | "enabled" or "disabled" | Determines whether or not the rule is in effect. |
| direction | string | "inbound" or "outbound" | Determines whether the rule is checked when a packet arrives at an instance (inbound) or when it leaves the instance (outbound). |
| targets | string array | The 'vpc', 'subnet', 'instance', 'tag', and 'ip' Target Strings | Lists the sets of instances that the rule applies to. |
| filters | JSON Object | See Firewall Filters | Items that filter the scope of the rule. |
| action | string | "allow" or "deny" | Indicates whether the rule should allow or deny traffic. |
| priority | integer | 0-65535 | Indicates the relative priority of the rule. |
| description | string | N/A | An optional string that provides additional information about the rule. |

Firewall Filters

A filter is a JSON object with a series of keys, each of which describes a different axis to filter on. A filter can have any combination of the following fields, but it must have at least one of them:

  • hosts: An array of strings, each of which is a target string. See Target Strings for further restrictions

  • protocols: An array of strings. Valid protocols are: TCP, UDP, and ICMP

  • ports: An array of strings, each of which is a port to allow. A range of ports can also be specified by including two numbers separated by a '-'. This range is treated as inclusive. For example, "80-100" would match ports 80 through 100, including both 80 and 100.

A given packet must match all of the filter categories to be affected by a rule. For example, if a filter specifies the protocol TCP and port 12345, then a UDP packet on port 12345, or a TCP packet on another port, would not match.

When more than one entry is specified for any of hosts, protocols, or ports, the packet can match any of them. For example, the "http" rule in the earlier example matches both TCP packets on port 80 and TCP packets on port 443.

Phrased a different way, the filter categories (hosts, protocols, and ports) are joined together with a logical AND (&&), while the entries within a category are joined together with a logical OR (||).
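The AND-across-categories, OR-within-a-category semantics can be sketched in Python as follows. The packet representation is an assumption made for illustration: `protocol` and `port` describe the packet, and `peer_targets` is a hypothetical, already-resolved list of target strings that the packet's source (inbound) or destination (outbound) belongs to.

```python
def port_matches(spec: str, port: int) -> bool:
    """Match a single port ("22") or an inclusive range ("80-100")."""
    low, _, high = spec.partition("-")
    return int(low) <= port <= int(high or low)


def filter_matches(flt: dict, packet: dict) -> bool:
    """Every category present must match (AND); any entry in it may match (OR)."""
    if "hosts" in flt and not set(flt["hosts"]) & set(packet["peer_targets"]):
        return False
    if "protocols" in flt and packet["protocol"] not in flt["protocols"]:
        return False
    if "ports" in flt and not any(
        port_matches(spec, packet["port"]) for spec in flt["ports"]
    ):
        return False
    return True
```

With the filter from the "http" rule, a TCP packet to port 443 matches, while a UDP packet to port 80 or a TCP packet to port 22 does not.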

Get VPC Firewall Rules (GET /firewall/rules)

The GET VPC Firewall Rules endpoint, GET /firewall/rules, returns the set of VPC firewall rules that currently exist. The response is a single JSON array containing all of the firewall rules, in the form described above.

XXX Example

Put VPC Firewall Rules (PUT /firewall/rules)

The PUT VPC Firewall rules endpoint, PUT /firewall/rules, replaces the entire set of firewall rules with the new object. The object format is as described above.

XXX Example

VPC Internet Gateway APIs

Table 21. The VPC Internet Gateway Object

| Key | Value Type | Read-Only | Description |
|---|---|---|---|
| name | String | No | The name of the resource. This is used as the main key in the API. |
| description | String | No | An optional description. |
| availability_zone | String | Yes | The availability zone the Internet Gateway is located in. |
| id | String | Yes | A UUID for the object. |
| gateway_ip_mode | String | No | Either 'automatic' or 'manual'. Indicates whether the system should automatically select IPs to use for the Internet gateway on its own or if the user will explicitly scale the IPs up and down. |
| gateway_ip_pools | String Array | No | An optional array of strings that indicate the IP pools to leverage for the Internet gateway when in 'automatic' mode. The gateway must be in 'automatic' mode for this to be used. |
| gateway_ips | String Array | No | An array of strings, each of which indicates a floating IP address to use for the gateway. The gateway must be in 'manual' mode for this to be set. |
| gateway_type | String | Yes | Indicates the type of Gateway. This will always return the string 'internet'. |
| time_created | String | Yes | The date and time that the object was created. |
| time_modified | String | Yes | The date and time that the object was last modified. |
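The relationship between gateway_ip_mode, gateway_ip_pools, and gateway_ips can be sketched as a small validation routine. The `validate_gateway` function and its error strings are hypothetical; only the field names and the 'automatic' and 'manual' values come from the table above.

```python
def validate_gateway(gw: dict) -> list[str]:
    """Return a list of problems with the gateway's IP settings (empty if none)."""
    errors = []
    mode = gw.get("gateway_ip_mode")
    if mode not in ("automatic", "manual"):
        errors.append("gateway_ip_mode must be 'automatic' or 'manual'")
    # IP pools are only meaningful when the system picks the IPs itself.
    if gw.get("gateway_ip_pools") and mode != "automatic":
        errors.append("gateway_ip_pools requires 'automatic' mode")
    # Explicit floating IPs are only meaningful when the user manages them.
    if gw.get("gateway_ips") and mode != "manual":
        errors.append("gateway_ips requires 'manual' mode")
    return errors


print(validate_gateway({"gateway_ip_mode": "manual", "gateway_ips": ["fip-1"]}))
print(validate_gateway({"gateway_ip_mode": "manual", "gateway_ip_pools": ["pool-a"]}))
```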

Floating IP APIs

Table 22. The VPC Floating IP API Object

| Key | Value Type | Read-Only | Description |
|---|---|---|---|
| name | String | No | The name of the resource. This is used as the main key in the API. |
| description | String | No | An optional description. |
| dns_name | String | No | The name that is used for the resource in DNS. |
| ip | String | Yes | The string form of the IP address. |
| ip_type | String | Yes | Indicates if it’s an IPv4 or IPv6 address. |
| ip_pool | String | Yes | An optional parameter that indicates the name of the IP pool this was allocated from. |
| target | String | No | A Target String that indicates what the resource is attached to. Supported targets are instances and VPC Internet Gateways. |
| time_created | String | Yes | The date and time that the object was created. |
| time_modified | String | Yes | The date and time that the object was last modified. |

IP Pool APIs

Table 23. The IP Pool Object

| Key | Value Type | Read-Only | Description |
|---|---|---|---|
| name | String | No | The name of the resource. This is used as the main key in the API. |
| description | String | No | An optional description. |
| ip_block | String | Yes | The IP CIDR block associated with the IP Pool. Whether it is IPv4 or IPv6 depends on the value in ip_type. |
| ip_type | String | Yes | Indicates if this pool references IPv4 or IPv6 addresses. |
| time_modified | String | Yes | The date and time that the object was last modified. |
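Because ip_type must agree with ip_block, one option is to derive it from the CIDR block rather than store it independently. The sketch below uses Python's standard ipaddress module; the `ip_type_of` helper and the "ipv4" and "ipv6" strings are assumptions, since this document does not fix the exact values of ip_type.

```python
import ipaddress


def ip_type_of(ip_block: str) -> str:
    """Derive the address family of a pool from its CIDR block."""
    # strict=True rejects blocks with host bits set, such as 10.0.0.1/8.
    network = ipaddress.ip_network(ip_block, strict=True)
    return "ipv4" if network.version == 4 else "ipv6"


print(ip_type_of("10.0.0.0/8"))
print(ip_type_of("fd00::/48"))
```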

API Summary

| Route | Purpose |
|---|---|
| /projects/{project_name}/vpcs/{vpc_name} | Manage VPCs |
| /projects/{project_name}/vpcs/{vpc_name}/subnets/{subnet_name} | Manage VPC Subnets |
| /projects/{project_name}/vpcs/{vpc_name}/subnets/{subnet_name}/ips/{ip_address} | Manage IPs on a VPC Subnet |
| /projects/{project_name}/vpcs/{vpc_name}/routers/{router_name} | Manage VPC System and Custom Routing Tables |
| /projects/{project_name}/vpcs/{vpc_name}/routers/{router_name}/routes/{route_name} | Manage specific routes for a VPC System and Custom Routing Table |
| /projects/{project_name}/vpcs/{vpc_name}/firewall/rules/{rule_name} | Manage VPC Firewall Rules |
| /projects/{project_name}/vpcs/{vpc_name}/gateways/internet/{inetgw_name} | Manage VPC Internet Gateways |
| /projects/{project_name}/vpcs/{vpc_name}/gateways/vpc/{vpcgw_name} | Manage VPC Peering |
| /projects/{project_name}/vpcs/{vpc_name}/gateways/vpc/{vpcgw_name}/peered/subnets/{subnet_name} | Manage VPC Subnets received through VPC Peering |
| /projects/{project_name}/floating-ips/{ip_name} | Manage Floating IPs |
| /projects/{project_name}/ip-pools/{pool_name} | Manage IP Pools |
| /projects/{project_name}/search/ips | Search for the usage of an IP address |

Open Design Questions

IPv6 Addressing

IPv6 addressing marks an opportunity for a major departure from how IPv4 is often treated in cloud deployments. Because IPv4 addresses are scarce and there is a limited set of both public and private addresses, all of the major cloud providers have virtualized IPv4 addresses that customers use for communication. In addition, they use a 1:1 NAT for mapping Internet-accessible IPv4 addresses to an internal address.

With IPv6, these same constraints aren’t necessarily required. AWS is using their large IPv6 address space allocations to give every VPC a public /56 which is split into explicit /64 allocations for a subnet. The implication is that for those IPv6 uses, there is no difference between whether an address is considered public or private. Rather, it is all about the firewall posture.

A related challenge with this model is the question of how a floating IP works. Critically, AWS does not support Elastic IP for IPv6. This makes sense, given the design that they’re using. AWS is likely not doing any kind of virtualization games here; instead, they’re just using normal routing, ACLs, and firewalling.

While the current API proposes one solution to this, there is a related thing that we need to consider, which is how we map addresses to and from one location. The advantage of the current NAT approach is that it doesn’t require any adjustments in the guest. However, this means that it’s also not possible to have multiple external IP addresses and distinguish them. An alternative approach that would require more work in the guest would be to route those addresses to the guest directly. This relies on the use of addresses outside of the ULA space.

The current API design mimics IPv4 and allows IPv4 and IPv6 to have a similar experience. It also works on the assumption that customers will not have a large amount of publicly routed IPv6 space.

To try and help answer these questions, there is the IPv6 part of the customer questions section.

Routing Table Destinations

Currently, VPC Router destinations are mostly described as subnets. As we have other resources that might dynamically expand to more than one IP block (for example a default subnet with IPv4 and IPv6), it may make sense to allow destinations to be input that way. If we did this, that would mean that the destination would include the following:

  • IPv4 and IPv6 CIDR blocks

  • Specific Subnets

  • Networks from VPC Peering

  • Future looking things like VPNs to and from clouds like AWS or on-premises where routes can be exchanged using BGP.

The main things that this approach raises are questions around things like:

  • Having to explain the concept to customers that may be a little different from what they’re used to (though there are already different types of targets for routes).

  • Complexity in the back end and making sure changes are reflected across the system.

  • Having a way to print out the destinations normalized into IP CIDR block form.

On the other hand, it would solve some potential problems:

  • One wouldn’t forget to route one of IPv4 or IPv6.

  • The system could automatically adjust when faced with changes.

I’m not sure there are many customer questions that I would ask to help us clarify what direction to proceed down. My inclination is probably to add support for it in the user API.
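To make the normalization idea concrete, here is a minimal sketch of expanding a destination into IP CIDR block form. The "kind:value" destination syntax, the `SUBNETS` table, and the `normalize_destination` function are all hypothetical and stand in for whatever representation the API ends up using.

```python
import ipaddress

# Hypothetical VPC state: each subnet name maps to its IPv4 and IPv6 blocks.
SUBNETS = {
    "default": ["10.0.0.0/24", "fd00:1234:5678::/64"],
}


def normalize_destination(dest: str) -> list[str]:
    """Expand a route destination into its IP CIDR blocks."""
    # Split only on the first ':' so IPv6 blocks survive intact.
    kind, _, value = dest.partition(":")
    if kind == "subnet":
        # A subnet destination expands to both of its address families.
        return list(SUBNETS[value])
    if kind == "cidr":
        return [str(ipaddress.ip_network(value))]
    raise ValueError(f"unsupported destination: {dest}")


print(normalize_destination("subnet:default"))
print(normalize_destination("cidr:fd00::/64"))
```

A subnet destination yields both its IPv4 and IPv6 blocks, which is how this approach would avoid forgetting to route one of the two families.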

Internet Gateway Design

The scalability of an Internet Gateway and its design from a user perspective is an interesting challenge. Here are some of the goals and constraints that we have, some of which are in tension:

  1. We want to make it easy for groups of instances to use a fixed set of external IPs to make it easier when external services allow and deny access to them based on the IP address.

  2. We want to be able to make this scale with a user’s footprint in the network. When there is a lot of NAT activity going on we need to figure out how to design this such that we avoid NAT exhaustion. This means we may need the number of IP addresses to scale and indicate when we’re running out. Though if we know when we’re running out, we should avoid asking the user to manage it.

  3. Currently in the VPC Router design there is a VPC-wide System Router. However, the floating IP address may realistically be limited to a single AZ. Is it OK to have users manage addresses on a per-AZ basis?

  4. We want to make sure that we avoid a single instance or point that is doing the NAT, to ensure it’s scalable. In an ideal world, being able to perform the NAT on the local machine where the traffic originates would be great and would fit in with some of what we’re doing from a general architecture and floating IP direction.

I don’t have other concrete API proposals, but these goals and constraints are what led to the current set of thoughts. As we evaluate what makes sense and what doesn’t, they may be useful to keep in mind. There are a bunch of trade offs between scalability by default and having a dedicated set of IP addresses.

The ability for an Internet Gateway to have no IP addresses associated with it feels like a bit of a kludge. While it does simplify the router / floating IP interaction in some ways, I wonder if it makes it more complicated at the same time. It’s not clear to me what percent of instances will want to have all outbound Internet access cordoned off, but also have a floating IP address associated with a single instance in a given subnet.

Floating IPs and Routing Tables

The current design requires that a router entry be present in either the system or a subnet’s VPC Custom Router for an Internet Gateway. The current view is that if there is no entry for an Internet Gateway the instance cannot route to the Internet. It’s not clear how well this matches what someone would naively expect.

We could consider having this bypass the Internet Gateway and associated policy altogether, though we still have to honor the VPC Router for other aspects (for example a deny rule for a CIDR block). Ultimately, I think it’s clearer that we do require it so that things are consistent, even if it does introduce a few oddities for the Internet Gateway discussed above.

DNS Record Types

At the moment we’re only suggesting supporting the A, AAAA, and PTR records in our DNS systems. There are a few other record types we should look at:

SSHFP Records

The SSHFP record type encodes an ssh host key fingerprint in DNS. This gives a second, out of band means for communicating what a host’s ssh host key fingerprint is, which comes up every time a new host is contacted.

OpenSSH (the mainstay ssh server in Unix-like systems) trusts the ssh host key at two different levels. By default, it will note the fact that this is in DNS in addition to the normal prompt about the host key fingerprint. The second level occurs if the DNS zone is signed with DNSSEC. In that case OpenSSH will automatically accept the host key and will not prompt the user about it (unless there is a mismatch with known hosts).

Because this requires DNSSEC to be present to be fully useful, it is not clear whether or not this makes sense. It also would require an agent inside of the guest to publish and update metadata about the key and know about changes that occurred to it.
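For reference, an SSHFP record can be derived from a host's public key line. The sketch below computes a SHA-256 fingerprint (fingerprint type 2) over the decoded key blob, using the algorithm numbers registered for SSHFP. The `sshfp_record` function, the DNS name, and the placeholder key (an all-zero Ed25519 blob built only for illustration) are assumptions, not anything an agent in the guest is specified to do.

```python
import base64
import hashlib

# SSHFP algorithm numbers as registered in RFC 4255, RFC 6594, and RFC 7479.
ALGORITHMS = {"ssh-rsa": 1, "ssh-dss": 2, "ecdsa-sha2-nistp256": 3, "ssh-ed25519": 4}
SHA256_FINGERPRINT = 2


def sshfp_record(dns_name: str, public_key_line: str) -> str:
    """Build an SSHFP record from a line in OpenSSH public key format."""
    key_type, key_b64 = public_key_line.split()[:2]
    # The fingerprint is a hash of the raw (base64-decoded) public key blob.
    blob = base64.b64decode(key_b64)
    digest = hashlib.sha256(blob).hexdigest()
    return f"{dns_name} IN SSHFP {ALGORITHMS[key_type]} {SHA256_FINGERPRINT} {digest}"


# Placeholder key blob: a length-prefixed type string plus 32 zero bytes.
blob = b"\x00\x00\x00\x0bssh-ed25519" + b"\x00\x00\x00\x20" + bytes(32)
line = "ssh-ed25519 " + base64.b64encode(blob).decode() + " placeholder"
print(sshfp_record("web.example", line))
```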

SRV Records

SRV records provide a generic way to indicate a type of service. An SRV record can indicate multiple IP address and port combinations with different weights. For these records to work they would need cooperation from the instance itself. To see examples of where that might make sense, see the DNS future directions section.

DNS Record Scheme

The DNS scheme for records that was proposed is somewhat verbose. The following were the goals that went into this design:

  1. Try to avoid the use of UUIDs by ensuring that names could be sufficiently unique without limiting what a user can call something.

  2. Prefer the use of the same name internally as externally to simplify life.

  3. Make sure that the DNS scheme will deal with future directions.

The current scheme is designed as:

<resource>.<vpc>.<project>.<org>.<suffix>

By using the organization, project, and VPC, we can avoid the issue of whether or not duplicates exist. While it’s easy to enforce that there are no duplicates for a single user, when we start to bridge multiple projects and VPCs together, that becomes much more difficult and cumbersome.

Currently, the defined name spaces are for instances and Floating IPs. These take the forms <instance>.<az>.inst and <name>.fip respectively. If we were to integrate load balancers or service names into DNS, then those could have their own scheme. For example we could see all of them as:

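<instance>.<az>.inst.<vpc>.<project>.<org>.<suffix>
<name>.fip.<vpc>.<project>.<org>.<suffix>
<name>.svc.<vpc>.<project>.<org>.<suffix>
<name>.tag.<vpc>.<project>.<org>.<suffix>
<name>.lb.<vpc>.<project>.<org>.<suffix>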

This currently suggests that we want to include everything as below the VPC for resources. That may be reasonable, given that this is the granularity for networking.

To reduce the friction of names, there are a few things we could do. For example, by default, we can set the DNS search domains in an instance to things like: inst.<vpc>.<project>.<org>.<suffix> or <vpc>.<project>.<org>.<suffix>. This will give us good ergonomics. In addition, because of things like VPC peering, being able to have longer names allows us to actually expose DNS names to all members of the peered VPC.
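As an illustration of how these names compose, the sketch below builds instance and Floating IP names along with the two suggested search domains. The function names and all example values (including the "oxide.example" suffix) are hypothetical.

```python
def instance_fqdn(instance, az, vpc, project, org, suffix):
    """Instance names take the form <instance>.<az>.inst under the VPC."""
    return ".".join([instance, az, "inst", vpc, project, org, suffix])


def floating_ip_fqdn(name, vpc, project, org, suffix):
    """Floating IP names take the form <name>.fip under the VPC."""
    return ".".join([name, "fip", vpc, project, org, suffix])


def search_domains(vpc, project, org, suffix):
    """Default DNS search domains that could be set inside an instance."""
    base = ".".join([vpc, project, org, suffix])
    return [f"inst.{base}", base]


scope = ("prod", "web", "acme", "oxide.example")
print(instance_fqdn("db0", "az1", *scope))
print(floating_ip_fqdn("frontend", *scope))
print(search_domains(*scope))
```

With the first search domain in place, a lookup of "db0.az1" from inside the VPC would expand to the full instance name, which is the ergonomic benefit described above.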

Hot-plug of Interfaces and IPs

We will inevitably hit the point where the wrong IP address is assigned to an instance or an instance wants to have a network interface added or removed. These two are not the same, though they have some related challenges.

Today most virtual network interfaces, whether they are based on virtio or SR-IOV, do not generally show up as a hot-pluggable device. Most networking device drivers have not been made to be hot-pluggable or designed around surprise removal, unless working with a physical USB to Ethernet adapter, which we would not want to emulate for numerous reasons. That means that the only good way to remove an interface from a guest is basically to reboot it.

To that end, GCE explicitly doesn’t allow changing the number of interfaces for an existing instance. Amazon does.

Adding and removing IP addresses from instances is much easier mechanically, but there is still coordination inside of the guest. Because most addresses are assigned via DHCP or statically, someone has to go and update that configuration or trigger a DHCP renewal from the instance. Depending on the amount of machinery that we add to guests, this can be made fairly automatic. Alternatively we can set lower lease times for DHCP, but that has its own trade offs.

While we should design the API in such a fashion that adding support for these is possible, we’ll need to think carefully about the implications of each of the paths and work through that in the design.

MTU

Currently, all networks default to an MTU of 1500 bytes. While this is the most common MTU across the current public Internet, having a higher internal MTU can reduce the overhead of reaching higher throughput. On the other hand, when different networks have different MTUs, that can cause a lot of hard to debug friction, especially due to challenges around Path MTU discovery.

Currently, we include the MTU of subnets and have a default MTU in the top-level VPC. We can make it easy to change these going forward if we’d like as we explore things with customers. Some questions to help us better understand this are in the MTU part of the customer questions section.

Customer Questions

IPv6

To help us understand the open IPv6 questions discussed in IPv6 Addressing, here are some questions we might ask customers:

  • Do you use IPv6 in your environment today? If so, do you have large public IPv6 allocations?

  • How important are floating IPs to you? Is there value in them behaving the same way between IPv4 and IPv6?

  • Do you have applications where multiple different AWS Elastic IPs (or equivalent) are pointed to the same instance? If so, does the fact that you can’t distinguish between them cause problems?

DHCP and Options

  • If you use AWS, have you ever created a custom DHCP Option Set?

  • Do you run DHCP in your on-premises environment today? If so, what information do you distribute via DHCP?

  • When running in the cloud, have you ever used a DNS server other than the one provided?

MTU

  • Do you use Jumbo Frames inside your data center today or on AWS? If so, have you ever had issues with Path MTU discovery?

Future Directions

This section discusses future ways the networking APIs might evolve and how that might fit into what we do. The purpose here is to make sure we haven’t foreclosed developing down a given path.

DNS

While Oxide is not looking to be a public DNS registrar, there are other ideas that we can adopt that might make life easier for application writers, especially in the face of not having a load balancer for version 1. These ideas were developed originally by Alex Wilson as part of the Joyent Triton Container Naming Service (CNS).

The more interesting thing that it did was to provide a bunch of DNS SRV records for services. Services could register an SRV record by pushing a metadata tag out. Once that was the case, CNS would advertise that as part of the SRV record. It would also do some amount of rudimentary health-checking by paying attention to whether the instance was up or down at an API level and if the host it was on was up or down. There was also metadata that an instance could add to temporarily remove itself from the SRV record.

While this requires cooperation, this could prove useful. As part of building out the Manta object store, SRV records where instances would register proved to be quite useful. Though the fact that instances had to explicitly register was important there for liveness and could prove a reason that this doesn’t make sense.

A variant of the above is to allow a collection of addresses based on a tag. This would result in a number of A or AAAA answers depending on which instances had that particular tag.

The current design of our DNS record scheme does allow for expansion in this regard and wouldn’t make it too hard to include either approach.

Multiple IPs per Interface

Currently only a single IPv4 or IPv6 address can be assigned to an interface, though multiple interfaces can be assigned to an instance. In the future we should probably allow additional IP addresses to be assigned to an instance.

Multiple Peered VPCs

Today, we have a restriction on the number of VPCs that can be peered together. As we have customers build up systems that exist across more than one region, we will want to be able to support peering together multiple different VPCs. The biggest gotchas are to understand what the routing and firewall rules between multiple disjoint things look like. For example you can imagine a love triangle of peered VPCs where A peers with both B and C, but B and C are not peered:

        VPC A
       /     \
      /       \
     /         \
  VPC C       VPC B

Working through the semantics of this or much more complicated relationships will be important to understand from an ergonomic perspective and also from the overlapping subnet perspective. It may make sense in such a world to allow a VPC Custom Router to pick how to win in the overlapping subnet problem. These are things that we should think about to make sure that the API is future proof for all of this.
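The triangle above, together with the overlapping subnet problem, can be modeled in a few lines. Everything here is hypothetical: the `PEERINGS` and `SUBNETS` tables, the address blocks, and both helper functions exist only to show that peering is not transitive and that A can see conflicting subnets from two peers that cannot see each other.

```python
import ipaddress
from itertools import combinations

# A peers with both B and C, but B and C are not peered with each other.
PEERINGS = {frozenset({"A", "B"}), frozenset({"A", "C"})}

# Hypothetical IPv4 subnets; B and C overlap with each other, but not with A.
SUBNETS = {
    "A": ["10.0.0.0/24"],
    "B": ["10.1.0.0/24"],
    "C": ["10.1.0.0/25"],
}


def is_peered(x: str, y: str) -> bool:
    """Peering is symmetric but not transitive."""
    return frozenset({x, y}) in PEERINGS


def overlapping_peers(vpc: str) -> list[tuple[str, str, str, str]]:
    """Find subnets that overlap between two different peers of `vpc`."""
    peers = sorted(p for p in SUBNETS if p != vpc and is_peered(vpc, p))
    conflicts = []
    for left, right in combinations(peers, 2):
        for a in SUBNETS[left]:
            for b in SUBNETS[right]:
                if ipaddress.ip_network(a).overlaps(ipaddress.ip_network(b)):
                    conflicts.append((left, a, right, b))
    return conflicts


print(is_peered("B", "C"))
print(overlapping_peers("A"))
```

From the point of view of VPC A, a destination in 10.1.0.0/25 is ambiguous, which is the case where letting a VPC Custom Router pick the winner might make sense.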

VPNs

Based on discussion in RFD 9, we are currently not trying to integrate VPNs in the first version of the product. However, we should consider how they might look.

There are a couple of different types of VPNs that customers are looking for and that we need to consider:

  1. Site to Site VPNs: These are used to bridge on-premises and cloud deployments today.

  2. Remote Access VPNs: These are used where a company has employees that are remote that they’d like to be able to bridge their devices onto a given network. This is sometimes called a 'Road Warrior' VPN.

While both of these are solvable in instances with software, the first case is one where we can really improve the experience with broader integration into the VPC Router and minimizing the setup issues. Realistically, both of these want different looking API structures. For the time being, we’ll only discuss the first category.

Today, all of the major providers utilize IPsec to create these site to site VPNs through a combination of dedicated hardware appliances from companies like Cisco and Juniper as well as scalable software services. BGP routes are exchanged over these links to allow for high-availability and to minimize the manual configuration that is required as networks are added and removed. While there are other technologies such as WireGuard and OpenVPN, we are focusing on IPsec due to the interoperability with AWS, GCP, Azure, and others.

When an IPsec VPN is created, the system will do the following:

  • Add a new VPC Router rule target that represents this remote connection for VPC Custom Routers and firewall rules.

  • Add routes that are pushed over the IPsec VPN as a type of route in the VPC System Router as long as they don’t conflict with existing local or peered subnets. We will also need explicit filter lists and rules to make sure that we can clearly deal with conflicting rules and that one side doesn’t leak all of the rules. It may make sense to have an explicit prefix that we only accept routes for. The goal is to make it easy to deal with multiple actual routes for the VPN (AWS and GCP use this for HA) without wreaking havoc with the network. These BGP-based routes will be automatically inserted into and removed from the VPC System Route Table.

  • Add a new Firewall rule target that covers everything advertised by the IPsec VPN tunnel.

  • Allow one to put a filter on the types of routes that are accepted over the tunnel.

  • Indicate which subnets should be advertised over the IPsec connection, which may be all of them.

  • Potentially allow for Internal DNS queries that originate from over the tunnel to be handled for the shared VPC subnets.
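The route acceptance policy described in the list above (only accept routes inside an explicit prefix, and never ones that conflict with local or peered subnets) might be sketched as follows. The `accept_route` function and its parameters are hypothetical, and a real implementation would operate on BGP updates rather than strings.

```python
import ipaddress


def accept_route(route: str, allowed_prefix: str, local_subnets: list[str]) -> bool:
    """Accept a route learned over the VPN only if it lies inside the allowed
    prefix and does not overlap any local or peered subnet."""
    net = ipaddress.ip_network(route)
    allowed = ipaddress.ip_network(allowed_prefix)
    # subnet_of() requires both networks to be the same IP version.
    if net.version != allowed.version or not net.subnet_of(allowed):
        return False
    return not any(
        net.version == subnet.version and net.overlaps(subnet)
        for subnet in map(ipaddress.ip_network, local_subnets)
    )


local = ["10.0.0.0/24", "fd00:1234:5678::/64"]
print(accept_route("192.168.10.0/24", "192.168.0.0/16", local))
print(accept_route("10.0.0.0/25", "10.0.0.0/8", local))
```

The first route is accepted because it sits inside the allowed prefix and conflicts with nothing local. The second is rejected because it overlaps the local 10.0.0.0/24 subnet.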

This would fit into the API with the following high-level API summary:

Table 24. API Route Summary: VPN Gateways

| Verb | Route | Purpose |
|---|---|---|
| GET | /gateways/vpns | List VPNs |
| POST | /gateways/vpns | Create a new VPN |
| GET | /gateways/vpns/{vpn_name} | Get information about the specified VPN |
| PUT | /gateways/vpns/{vpn_name} | Update information about a VPN |
| DELETE | /gateways/vpns/{vpn_name} | Delete the specified VPN |

Firewall Flow Logging

Firewall flow logging is a feature that many clouds and switches have. It produces a series of flow logs that record which connections were active, statistics about each connection, and what kinds of connections were dropped.

Here are some considerations for the API:

  • What is the granularity of flow logging? Is it enabled on a per-instance, subnet, or VPC basis?

  • When flow logging is enabled, are all rules logged or only a subset of them? What is the granularity for that control?

  • How do we actually collect and manage that data? How do we keep the overhead on the number of connections that are logged to a reasonable volume so as not to disrupt service? Many other cloud systems tie into a logging or analytics engine. It’s not clear exactly what that would look like here.

  • Is logging of an entire VPC or Subnet all at once important?

This would probably cause us to add additional fields to the existing firewall rules objects or other things such as instances for collecting the flow logging.

This could have the following high-level API endpoints for us to consider. These don’t go into all the details that we would like, but at least give us a starting point for how it might fit into the broader API. At a high level one could create a firewall logging session and then mark certain rules with which of the logging instances they belong to.

Table 25. API Route Summary: Firewall Logging

| Verb | Route | Purpose |
|---|---|---|
| GET | /firewall/logging | List all active VPC firewall logging sessions |
| POST | /firewall/logging | Create a new VPC firewall logging session |
| GET | /firewall/logging/{logger_name} | Get information about a specific VPC firewall logging session |
| PUT | /firewall/logging/{logger_name} | Update information about a specific VPC firewall logging session |
| DELETE | /firewall/logging/{logger_name} | Delete a specified VPC firewall logging session |
| GET | /firewall/logging/{logger_name}/data | Get the current session data (this interface will probably need to be re-imagined) |

Multiple Disjoint Firewalls

In the current API design, we have a singular firewall resource that contains all the rules that are present in the VPC. The scope of these rules determines what sets of instances we target. There is a possible future where we would want disjoint collections of rules for a VPC. While I’m not sure where exactly this fits in, it’s worth evaluating how this impacts the API.

Today, in the vein of the global firewall we have a single entry point: /firewall/rules. This represents the system firewall. In a world of multiple disjoint firewalls, this could instead be represented as /firewalls/{firewall_name}/rules.

This means that if we were to make a transition, we would need to pick a name for the primary firewall, perhaps system and then we would need to reserve both system and rules for the names of firewalls and basically rewrite actions on /firewall/rules to /firewalls/system/rules.

An alternative approach would be to say there is an explicit name for the firewall object today and just always have it there, but no means of creating or deleting sets of rules. This would leave /firewalls as more of a general listing endpoint which has a single entry and then we’d have /firewalls/system/ and /firewalls/system/rules. The previously described aspects of firewall logging would all change in an analogous fashion.
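The rewrite from the single-firewall paths to the multiple-firewall layout is mechanical, as the sketch below shows. The `rewrite_firewall_path` function is hypothetical, and the name "system" is only the candidate suggested above.

```python
DEFAULT_FIREWALL = "system"  # candidate name from the text, not a final decision


def rewrite_firewall_path(path: str) -> str:
    """Map a single-firewall path onto the multiple-firewall layout."""
    legacy = "/firewall/"
    if path.startswith(legacy):
        return f"/firewalls/{DEFAULT_FIREWALL}/" + path[len(legacy):]
    # Paths that already name a firewall are left alone.
    return path


print(rewrite_firewall_path("/firewall/rules"))
print(rewrite_firewall_path("/firewall/logging/web"))
```

The same mapping would cover the firewall logging endpoints, since they live under the same /firewall prefix.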

I’m not really sure how we would apply these disjoint rule sets to different VMs, but if we came up with something, it would reduce the complexity of adding it.

Load Balancing

Load balancing is a flagship feature of many clouds and one of the things that customers care quite a lot about. RFD 9 Networking Considerations goes into some rationale as to why it’s not in version 1 of the product, but we can imagine it will be there in a subsequent version before too long.

While a whole separate discussion is needed to describe the actual features of load balancers that are worth having (L4, L7, TLS, health checking, forwarding vs. terminating, etc.) there are a few things that we want to tease out as part of the current API design.

  1. Load balancers will almost certainly appear in some form of DNS and they will fall into and out of it. While we don’t know if there would be a single IP or multiple, we should assume that we’ll want to fit it into our existing schemes for DNS in a reasonable way. A token proposal is in the DNS Record Scheme section.

  2. Load balancers probably want to be a regional entity so that they can easily target anything in the AZ. We also probably want to think about how anycast IP addressing may fit into things if we ever want to go to a global scope.

  3. We should make sure it’s easy for floating IPs to possibly be assigned to not just an instance, but also a load balancer. The load balancer may also be internal, so being able to reassign internal addresses is also useful.

  4. We will want to make sure that when instances are replying to traffic from a load balancer that it is taken into account with the routers and gateways. There are several approaches that we will want to think about depending on how the load balancer is implemented. If this is all based on routing games, then we may want Internet Gateways to be a part of this. On the other hand, allocating explicit IPs on the subnets that the load balancer is forwarding to so that there’s always an on-network place to return the traffic to and that it originates it from, may be a smoother experience.

  5. We want to make it easy to understand traffic that originates from and goes to the load balancer in firewall rules for instances.

  6. As we look at things like firewall rule logging and other statistics mechanisms, we’ll want this to fit in as well.

  7. When we design a scalable NAT, a load balancer is in some ways just a slight rehash on aspects of that problem.

From an API perspective, the load balancer pretty easily fits in as a project-scoped service, as it can potentially operate between VPCs. Here are examples of where it would fit in, with an implicit /projects/{project_name} on the endpoints. However, you could also replace the /net with a specific VPC scope of /vpcs/{vpc_name}. Note the APIs here are completely hypothetical and would need to change based on the service we actually want to create:

Table 26. API Route Summary: Load Balancers

| Verb | Route | Purpose |
|---|---|---|
| GET | /projects/{project_name}/lbs | List load balancers in a project |
| POST | /projects/{project_name}/lbs | Create a new load balancer |
| GET | /projects/{project_name}/lbs/{lb_name} | Get information about a specific load balancer |
| PATCH | /projects/{project_name}/lbs/{lb_name} | Update information about a specific load balancer |
| DELETE | /projects/{project_name}/lbs/{lb_name} | Delete the specified load balancer |
| GET | /projects/{project_name}/lbs/{lb_name}/targets | List the load balancer targets |
| POST | /projects/{project_name}/lbs/{lb_name}/targets | Create a new target for a load balancer |
| GET | /projects/{project_name}/lbs/{lb_name}/targets/{target_name} | Get information about a specific target of the load balancer |
| PUT | /projects/{project_name}/lbs/{lb_name}/targets/{target_name} | Update information about a specific target of the load balancer |
| DELETE | /projects/{project_name}/lbs/{lb_name}/targets/{target_name} | Remove a target from the load balancer |