RFD 4
User Facing API
Note
This RFD served as an early sketch of the API in order to stimulate discussion and external feedback… which it did! The principles and fundamentals of API design described here still apply. The specifics of the API, however, are long out of date. Instead, see RFD 322 for a significant update to the organization of the APIs, other RFDs for domain-specific API information, and, most definitively, the actual OpenAPI description (see here).

The developer and operator experience of interacting with an Oxide rack is of the utmost importance. We want to consider not only the needs of developers and operators but also the needs of integrations into tooling and software already being used. We want to aim for minimalism and simplicity and try to have the most intuitive API, while still maintaining support for up-stack software like Terraform, Packer, and the Kubernetes cloud manager / persistent volumes. This way Kubernetes, Terraform, etc. will run seamlessly on the Oxide rack!

Existing cloud provider APIs are feature-heavy and complex. We will try to keep this API to a minimal interface that fulfills all the needs of its consumers. We will start with the basics and grow with customers' needs. As the API grows, we will want to make sure the behavior and naming stay intuitive to users.

Note
Any and all API schema attributes in this document might change in the future or during implementation. They serve merely as examples.

General API Properties

OpenAPI

We will need to generate clients for our API in many different languages. The de facto way to spec APIs is OpenAPI. We will use the OpenAPI specification to document and generate clients for our API. What is nice about OpenAPI is the rich ecosystem of tooling surrounding it. It also eases pain around API versioning and making sure that we do not make any backwards-incompatible changes.

Client-friendly data modeling

Each language client should be generated in a way that does not make developers cringe. For example, some generated clients for Go are not idiomatic for the language, which makes Go developers go bonkers. This is not just about client generation; it also implies constraints on our API design and implementation. For example, we have found that Rust enums with struct variants translate into an OpenAPI schema that is not handled well by most client generators.

Documentation

The de facto best API developer docs are Stripe's. This is due to the side-by-side documentation and code. Having the ability to copy-paste a code snippet in any language, as well as a curl command, gives this documentation the gold star.

There are a few OpenAPI generators for documentation.

  • redoc: This is the most aesthetically close to the Stripe documentation site. One problem with redoc is that it does not statically generate the files; instead it does all the rendering with JavaScript at runtime. Since we have a spec and will generate docs from it, I would much prefer that the docs be statically generated and not rely on runtime generation. This will give users a much better experience on old or slow browsers.

  • spectacle: This option statically generates the documentation but doesn't have nearly as good a design as redoc. We can spend time modifying the design of spectacle, if it is within scope, to make it more elegant like Stripe's. I think this is a saner route than redoc; modifying the CSS seems easier than changing the behavior of whatever JavaScript runtime redoc is using.

Examples

One of the rather neat things about OpenAPI is the ability to add examples via the YAML files for the API; you can read more on that here. We should use these wherever we can.

Testing

Developer Playground

Stripe does this really nicely.

You can use the Stripe API in test mode, which does not affect your live data or interact with the banking networks. The API key you use to authenticate the request determines whether the request is live mode or test mode.

We could make a playground for the Oxide API where people can perform API calls without modifying actual infrastructure. This would make the life of upstream consumers of our API a lot easier.

Behavior

For upstream consumers of the API (Kubernetes, Terraform, etc.), we should define the behavior of our API and its various calls.

Auditing

All actions will be recorded in an audit log and made accessible to operators. This is exposed through the /system/audit API.

Important
TODO: flesh this out in another RFD and then link back to it here.

Style Conventions

  • Integer counters start with n (like ncpus instead of just cpus).

  • Timestamps start with time (e.g., timeCreated) and are always ISO 8601 timestamps in this format.

  • Property names use camel case with the initial character being lowercase and only the first letter of acronyms capitalized (e.g., billingAccountId, not billingAccountID). Since this is counter to the Go style guidelines, which say all letters of an acronym are capitalized, we will ensure that the clients we generate for Go are idiomatic Go and that billingAccountID maps to json:"billingAccountId". This way we can have our cake and eat it too :) all language clients stay idiomatic (see the sketch below).
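
A minimal sketch of how a generated Go client could reconcile these conventions; the package name, type, and fields below are illustrative assumptions, not the actual generated client:

package oxide // hypothetical generated client package

import "time"

// exampleResource exists only to show the field-name mapping; it is not a
// real API resource. Go fields keep idiomatic trailing-ID capitalization
// while the json tags use the API's camelCase wire names.
type exampleResource struct {
    NCPUs            int64     `json:"ncpus"`            // integer counters start with n
    TimeCreated      time.Time `json:"timeCreated"`      // time* fields, ISO 8601 (simplified here to a bare timestamp)
    BillingAccountID string    `json:"billingAccountId"` // Go keeps ID; the wire format keeps Id
}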

Time

As stated above, timestamps will always be in ISO 8601 format. We will not specify a timezone offset; the full timestamp is always in UTC. This timestamp will appear in any logs and will be returned from the API for any field involving time.
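
As a small, non-normative illustration, Go's RFC 3339 layout produces exactly this kind of ISO 8601 UTC timestamp; whether a trailing Z is emitted is an implementation detail and an assumption here:

package main

import (
    "fmt"
    "time"
)

func main() {
    // Render the current time as an ISO 8601 / RFC 3339 timestamp in UTC,
    // e.g. 2020-04-21T19:02:11Z, the shape expected for time* fields.
    fmt.Println(time.Now().UTC().Format(time.RFC3339))
}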

When we abbreviate time in the UI, it will be shown in the following way per unit of time.

Units of time abbreviation

Name          Abbreviation  Example
Milliseconds  ms            200ms
Seconds       s             7s
Minutes       m             1m
Hours         h             20h
Days          d             6d
Weeks         w             1w
Months        mo            10mo
Years         y             5y

For events that happen at a specific point in time, we can show them in the following, more human-readable way when needed.

Below, Compact refers to the format for lists of items and Extended refers to the format for detail pages (single items).

Human readable event times

When?         Compact       Extended
0s - 1m 59s   Just now      Just now
2m - 59m      3m ago        Today at 2:30 PM
Today         6h ago        Today at 2:30 PM
Yesterday     Yesterday     Yesterday at 2:30 PM
Last 7 days   Tuesday       Tue at 2:30 PM
Current Year  May 12        May 12 at 2:30 PM
Past years    May 12, 2015  May 12, 2015 at 2:30 PM

For showing a duration (the time passed since something started), we will use the following formats.

Duration

When?  Compact     Logs
<1s    <1s         <1s
<1m    4s          4s
<1h    2m 25s      2m 25s
<1d    3h 24m 5s   3h 24m 5s
>1d    4d 6h 5m    4d 6h 5m

Versioning

We will want a way to version our API so it is easy for users to understand. There are a few ways to go about this.

Stripe seems to go above and beyond with their methodology. We should do the same. Instead of using HTTP / in-band metadata to advertise the fact that an old version is going away, they built out migrations which translate older requests into newer ones internally.

Stripe keeps things backwards compatible for a certain amount of time before destroying the migration, emailing the client developers a whole bunch about the change before that point.

Asynchronous Operations

Some API calls tell the control plane to "do a thing", whether that's delete, create, or update a resource. Performing said task takes longer than the API call itself, and the API will not block until it is done; instead it will return an operationId in the response. The operationId can then be used to view the status of the operation without having to poll the resource status repeatedly. This is modeled after GCP's operations.

This will only apply to API calls that are asynchronous. In the clients generated for those API calls, we will denote asynchronous calls as functions ending with Async. For example, a synchronous call to create a project will be CreateProject, but an asynchronous call to create an instance will be CreateInstanceAsync. This way developers utilizing those clients will not need the cognitive load of figuring out whether an API call is asynchronous or synchronous.

For more details see Operations below. You can also use Etags for conditional requests.

We will also implement blocking functions in our client libraries to "wait" until an operation has completed. This will make integrations into upstream tools easier.
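
What this could look like from the caller's side of a generated Go client is sketched below; the client type, request types, and helper names (CreateProject, CreateInstanceAsync, WaitForOperation) are assumptions for illustration, not a committed client surface:

// Illustrative fragment only; assumes a hypothetical generated client that
// follows the naming convention described above.
func createProjectAndInstance(ctx context.Context, client *Client) error {
    // Synchronous: projects are cheap to create, so the created resource
    // comes back directly.
    proj, err := client.CreateProject(ctx, &ProjectCreate{Name: "my-project"})
    if err != nil {
        return err
    }

    // Asynchronous: instance creation is expensive, so the response carries
    // an operationId rather than the finished resource.
    op, err := client.CreateInstanceAsync(ctx, proj.ID, &InstanceCreate{
        Name:         "web-1",
        Image:        "debian",
        BootDiskSize: 10,
    })
    if err != nil {
        return err
    }

    // Blocking helper provided by the client library: waits until the
    // operation's timeDone is set, then surfaces its error or response.
    _, err = client.WaitForOperation(ctx, op.ID)
    return err
}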

PUT and PATCH Behavior

Our default position is that everywhere in the API that you can update a resource, you do it with an HTTP PUT that replaces the whole resource.

RFC 5789 defines the HTTP PATCH verb for allowing clients to specify only what's changed, rather than having to specify the entire contents of the resource. RFC 6902 defines JSON Patch, a JSON-based format for describing changes to JSON documents similar to diffs and patches for text files. Future versions of the API could support HTTP PATCH using JSON Patch if that's deemed useful, and careful design may allow libraries to implement this atop the existing GET and PUT implementations for a resource, but we'll defer this feature for now. Our expectation is that in most cases, clients would want to make changes to resources using ETags and HTTP conditional requests to avoid clobbering changes made by other clients. Further, in most cases it's likely easier for clients to generate an entirely new copy of the resource than a diff from the original. As a result, there's not much advantage to a diff-based format. (We also considered RFC 7396 (JSON Merge Patch), but it's more restricted than RFC 6902 and wouldn't support modifying either arrays or any object where null is meaningful.)

Update Behavior

If you update a resource while another update operation on that resource is being performed, the update will fail. The error returned will denote that a concurrent operation is being performed.

Delete Behavior

If you delete a resource while another delete operation on that resource is being performed, subsequent requests for delete should return that the resource was not found (404).

Deletes can be performed on any resource no matter what state the resource is in.

If the resource is in the failed state, the user can still delete it, provided of course that the failure was not due to trying to delete the resource. We might need a different way to handle that…

Availability Zones and Regions

When configuring their Oxide racks, operators can define which racks are in which availability zones and regions. One thing Azure Stack does super poorly is that each new rack is in its own zone; essentially every Azure Stack instance is its own zone. Obviously this is nuts and makes no sense.

For Oxide, it is up to the operator to define which racks are in which availability zones and regions. The point of availability zones is to help with failover: if one zone fails, the others should still be accessible. Oxide will not guarantee any of the physical elements needed for availability that are out of our control, such as a power outage or network outage. Regardless, operators can manage those risks on their own and classify racks into zones and regions.

A lot of this is thought out in [rfd24].

GraphQL

It has been discussed a bit, but we will not prioritize a GraphQL interface for our API. This is for a bunch of different reasons:

  • GraphQL is still super young, and our customers are not accustomed to it.

  • We want to dog-food our API the same way our customers will consume it, using it for our UI and CLI, so we do not want to build on the interface that is least likely to be used (that being the GraphQL interface).

  • We can always add a GraphQL API down the road but, again, we should dog-food the API that is used the most in our tooling so that we catch bugs.

  • Our consumers are more likely to grasp an actual SQL interface written on top of our API so, alternatively, we might consider that. Especially for [rfd45] where operators might want to query for dropped packets at some range of time on a subset of machines. Of course, that should also not be a priority.

  • We do not have the problems that GraphQL solves… yet.

All this being said, it would be interesting to be the first "cloud-provider-like API" with a native GraphQL interface, though it should not be a priority. And if we do it, we should make sure we have enough testing in place to know that it is always up to date with our actual API and has the same expectations about quality and performance. For example, GitHub's GraphQL is always out of date and never works correctly. That is a terrible user experience.

General API Components

Identity Metadata

All API objects have similar identity metadata stored when the object is created or modified. We will make this global by using one uniform metadata data structure and referencing it in every other object. Some of this data is readOnly, meaning the user cannot modify it.

This means that every resource has:

  • An id: uuid assigned at creation

  • A name: required, see below for details

  • An optional description

  • A timeCreated and a timeModified

More of this is defined in resources.

The IdentityMetadata data type will look like the following OpenAPI schema reference:

IdentityMetadata:
type: object
required:
- name
properties:
id:
description: |
The ID of the resource. This is assigned upon creation and cannot be
modified.
type: string
format: uuid
readOnly: true
name:
description: |
Name of the resource. Provided by the client when the resource is
created. The name can also be modified after creation. Names of
resources must be unique under the scope of the parent resource.
For example, Instance names are required to be unique to a project
and project names are required to be unique to an Organization.
Organization names need to be unique globally.
In the case of a User resource, the name is equivalent to the
username or slug for the user and is required to be unique to the
Organization.
The name must be 1-63 characters long, and comply with
RFC1035. Specifically, the name must be 1-63 characters long and
match the regular expression [a-z]([-a-z0-9]*[a-z0-9])? which means
the first character must be a lowercase letter, and all following
characters must be a dash, lowercase letter, or digit, except the
last character, which cannot be a dash.
type: string
example: excited-torvalds
description:
description: Optional description of the resource.
type: string
timeCreated:
type: object
properties:
timestamp:
description: The date and time the object was created.
type: string
format: date-time
readOnly: true
userId:
description: The user ID that created the object.
type: string
format: uuid
readOnly: true
timeModified:
type: object
properties:
timestamp:
description: The date and time the object was last modified.
type: string
format: date-time
readOnly: true
userId:
description: The user ID that last modified the object.
type: string
format: uuid
readOnly: true

Operations

Asynchronous API responses return an operationId. The API calls themselves are non-blocking but this allows clients to wait for an operation to complete without having to poll the status of a resource.

This makes implementing stuff like Terraform or K8s Operators really nice because they do:

  • Create API

  • Wait API

  • Read API

Usually three blocking network calls. Nice.

Versus without:

  • Create API

  • Loop, exponential back-off, problem left to reader on Read API until "status" == "running" (or equivalent)

N network calls and forces all clients to re-implement wait logic.

The following API endpoints relate to Operations:

GET /operations/{id} - Get an operation by its ID
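
A minimal sketch of the wait logic a client library (or an upstream tool) could build on top of this endpoint, following the Operation schema below; the base URL, authentication, and polling interval are placeholders:

package main

import (
    "encoding/json"
    "fmt"
    "net/http"
    "time"
)

// operation mirrors just the fields of the Operation schema needed to wait.
type operation struct {
    ID       string          `json:"id"`
    TimeDone string          `json:"timeDone,omitempty"`
    Error    *apiError       `json:"error,omitempty"`
    Response json.RawMessage `json:"response,omitempty"`
}

type apiError struct {
    Code    string `json:"code"`
    Message string `json:"message"`
}

// waitForOperation polls GET /operations/{id} until timeDone is set, then
// returns the finished operation or its error.
func waitForOperation(baseURL, id string) (*operation, error) {
    for {
        resp, err := http.Get(fmt.Sprintf("%s/operations/%s", baseURL, id))
        if err != nil {
            return nil, err
        }
        var op operation
        err = json.NewDecoder(resp.Body).Decode(&op)
        resp.Body.Close()
        if err != nil {
            return nil, err
        }
        if op.TimeDone != "" {
            if op.Error != nil {
                return &op, fmt.Errorf("operation %s failed: %s", id, op.Error.Message)
            }
            return &op, nil
        }
        time.Sleep(2 * time.Second) // arbitrary polling interval
    }
}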

The Operation data type will look like the following OpenAPI schema reference:

Operation:
type: object
required:
- id
properties:
id:
description: The ID of the operation.
type: string
format: uuid
readOnly: true
timeDone:
description: |
Time that the operation was completed.
If the value is not empty, it means the operation is completed, and
either error or response is available.
type: date-time
readOnly: true
timeStarted:
description: |
Time that the operation was started.
type: date-time
readOnly: true
error:
$ref: '#/components/schemas/Error'
response:
description: |
The normal response of the operation in case of success. If the
original method returns no data on success, such as Delete, the
response is empty. If the original method is standard
GET/POST/PATCH, the response should be the resource.

Errors

Errors are standard and include an error code and a human readable label.

The Error data type will look like the following OpenAPI schema reference:

Error:
type: object
properties:
code:
description: |
The error code.
type: string
readOnly: true
example: "ENOENT"
message:
description: |
A human readable description of the error.
type: string
readOnly: true

Etags

ETags, short for entity tags, are a common way to conditionally verify an HTTP cache. An ETag is a digest which represents the contents of a given resource.

When a response is returned by the server it will include an ETag to represent the resource’s state as part of the HTTP response headers. Subsequent HTTP requests which want to know whether or not the resource has changed since the last request can send along the stored ETag via the If-None-Match header.

The server will then compare the current ETag for the resource to the one provided by the client. If the two ETags match, the client’s cache is considered fresh and the server can respond with a 304 Not Modified status.

If the resource has changed since the last time the client has requested the resource the server will respond with a new ETag and the updated response.

Etags will also be used to avoid data corruption when concurrent updates are being done at the same time to a resource. For an example of this refer to [rfd43] for IAM policies.
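
As a sketch of that concurrency guard, a client can do an ETag-protected read-modify-write using the standard RFC 7232 If-Match header; treating 412 Precondition Failed as "someone else changed the resource, re-fetch and retry" is an assumption about how this API would use these headers:

package main

import (
    "bytes"
    "fmt"
    "net/http"
)

// conditionalUpdate fetches a resource to learn its current ETag, then sends
// the full replacement with If-Match so the server can reject the write if
// the resource changed in the meantime. URL and body are placeholders.
func conditionalUpdate(url string, newBody []byte) error {
    getResp, err := http.Get(url)
    if err != nil {
        return err
    }
    getResp.Body.Close()
    etag := getResp.Header.Get("Etag")

    req, err := http.NewRequest(http.MethodPut, url, bytes.NewReader(newBody))
    if err != nil {
        return err
    }
    req.Header.Set("If-Match", etag)
    req.Header.Set("Content-Type", "application/json")

    putResp, err := http.DefaultClient.Do(req)
    if err != nil {
        return err
    }
    defer putResp.Body.Close()
    if putResp.StatusCode == http.StatusPreconditionFailed {
        return fmt.Errorf("resource changed since it was read; re-fetch and retry")
    }
    return nil
}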

The Etag header will look like the following OpenAPI schema reference:

Etag:
name: Etag
description: |
The RFC7232 ETag header field in a response provides the current entity-
tag for the selected resource. An entity-tag is an opaque identifier for
different versions of a resource over time, regardless whether multiple
versions are valid at the same time. An entity-tag consists of an opaque
quoted string, possibly prefixed by a weakness indicator.
in: header
example: W/"xy", "5", "7da7a728-f910-11e6-942a-68f728c1ba70"
required: false
schema:
type: string

Authentication

The nice thing about OpenAPI is you can also denote security schemes.

We should implement OAuth2 as that is a widely accepted standard; this can be used for the management dashboard as well as for client interactions. We can also allow for authentication by SSH keys like Triton. Both modes of authentication should be acceptable. SSH keys will be really nice for authentication with our command line tool! We can also hook into the information from the SSH key to know if the user has YubiKey functionality like 2FA, FIDO/U2F, etc.

The permissions and authorization model is defined in [rfd43].

Two-factor Authentication

We will want to support two-factor authentication through mobile device apps like Google Authenticator or Authy, but also U2F and FIDO for YubiKeys.

We will also want a way for operators to force all users to have two-factor authentication enabled. GSuite does this really nicely by giving new users around a week to configure their two-factor authentication method. We can do the same. One thing GSuite is terrible at is reminding the user that they need to enable two-factor authentication. We should send each user a few reminder emails, as well as alerting them through the UI.

We will also want to give users backup passwords to use in case they are locked out. They will be prompted by the UI to save these immediately after configuring two-factor authentication. Slack and GitHub do this nicely in their user flows for two-factor authentication.

Integrations with Third Party Services

Active Directory and Google SSO are possibly the most popular, so we should keep this in mind while building authentication and the user/teams database layout. In the future we might integrate heavily with these third-party services.

We will definitely need support for SSO and SAML.

It should also be noted that lots of enterprises use OIDC or Okta, so we should integrate with those platforms as well.

A lot of the hierarchy in [rfd44] should be built with the thought in mind that it might heavily integrate with Active Directory, Google SSO / GSuite, and other third-party tools.

In this scenario, Google Groups should map directly to our concept of teams. In the Active Directory model, Active Directory groups can map to Oxide teams and a user's LDAP entry will be mapped to their Oxide user.

Access Tokens

The simplicity of GitHub personal tokens with specific scopes is admirable; being able to authenticate easily over curl with a personal token is a great user experience. This also enables us to add to the dashboard/console UX a curl command for each user flow in the UI. Google Cloud does this and it is a widely loved feature.

Concepts and Resources

There are some overall elements that define resources:

API resources are objects (usually pieces of infrastructure) that users can create, use, manage, and delete.

UUIDs. Every resource has a uuid identifier that’s unique for all time across all Oxide racks.

Naming. Every resource also gets an alphanumeric name that’s unique within its scope (e.g., no two resources of the same type in the same project can have the same name).

The API is intended to support an arbitrary number of all types of resources. Pagination is the first part of this, but this can also affect the way resources are addressed (e.g., by id or name). For example, if somebody has to list all of the resources to find the one they want to modify (because they need to find some uuid that corresponds to the name they already know), that’s not very scalable even if the list operation is scalable. We should allow resources to be called with GET by either their name and project or uuid to avoid this. This works since names are unique to the scope of their project.

Therefore, the endpoints that reference an individual resource work on either the uuid or the name.

Descriptions. Every resource has an optional description that can be defined by the user.

Pagination. Every API endpoint that’s used to list resources supports pagination using a limit and marker pattern. The resources are returned in a sorted order based on some field (more on this below) and the marker is generally a particular value of this field.

The endpoints to list resources can sort and paginate using either the uuid or the name. Some endpoints may also support filtering or sorting and pagination using other fields.
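
A sketch of walking a paginated list with the limit and marker pattern; the query parameter names follow the description above, while the plain-array response shape and the use of name as the sort key / marker are assumptions for illustration:

package main

import (
    "encoding/json"
    "fmt"
    "net/http"
    "net/url"
)

// listAllProjects pages through GET /projects using limit and marker: request
// a page, use the last item's sort-key value as the next marker, and stop
// when a short page comes back.
func listAllProjects(baseURL string) ([]map[string]any, error) {
    const limit = 100
    var all []map[string]any
    marker := ""
    for {
        q := url.Values{}
        q.Set("limit", fmt.Sprint(limit))
        if marker != "" {
            q.Set("marker", marker)
        }
        resp, err := http.Get(baseURL + "/projects?" + q.Encode())
        if err != nil {
            return nil, err
        }
        var page []map[string]any
        err = json.NewDecoder(resp.Body).Decode(&page)
        resp.Body.Close()
        if err != nil {
            return nil, err
        }
        all = append(all, page...)
        if len(page) < limit {
            return all, nil
        }
        marker = fmt.Sprint(page[len(page)-1]["name"])
    }
}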

Asynchronous vs. Synchronous. Some resources are expensive to create. For these, the "Create" endpoint is asynchronous and returns an operation identifier (operationId). See operations for more information. Other resources are cheap to create and directly return the created resource, for example projects and users' SSH keys.

Quotas. Although the API supports an arbitrary number of objects, operators or the system itself may place limits on the number of resources that users can create. This is covered by quotas in [rfd52].

There are a few other RFDs covering API resources and there will be more in the future. These default elements apply to all those resources as well as any future resources.

Other API resource RFDs are:

  • [rfd21] covers the networking API components.

  • [rfd43] defines permissions and authorization.

  • [rfd44] covers users, teams, organizations, billing accounts, etc.

  • [rfd45] covers the system level API for operators.

  • [rfd52] covers quota policies.

  • [rfd56] covers the billing API.

Tags

All resources have the concept of tags. Tags are strings tied to a resource, such as production or development. All endpoints that list resources can be filtered on tags if the user specifies them.

It has been noted that tags are often misused on other cloud providers for storing searchable metadata for instances. This is super true: consumers of cloud APIs tend to use automation and populate tags with whatever metadata they have, and usually this is not pretty, often just UUID-like strings.

In an effort to keep tags a useful, human-readable and consumable medium, we will make tags a customizable resource in the following ways:

  • Tags can have a color; users can customize the way tags appear in the UI with a color. This makes user-defined tags easy to distinguish from machine-defined tags.

  • Tags can be denoted as "machine-consumable". A machine-consumable tag is one being used by other software, not a tag that needs priority in the user interface. For example, a machine-consumable tag would not need to be shown on the instances list page, but it should be shown on an individual instance page and classified there as machine-consumable.

The reasoning behind this design is as follows. Regardless of what we do for tags, folks will still abuse them the way they do on various cloud providers today. So why not instead give them the functionality to distinguish between the tags that are generated and read by machines and those that are created and used by humans?

GCP has a concept of both labels and tags, and it is hard to distinguish which should be used for what. I think our approach will be less complex while allowing folks to continue to use the same patterns they do today.

You need the tags IAM permission as covered in [rfd43] to interact with tags on resources. Since tags are how Firewall Rules can be defined, this ensures only folks with correct permissions on tags can perform actions.

Projects

One abstraction that has been beloved by users is Microsoft's concept of a resource group. Resource groups hold compute resources. For example, inside a resource group you can have several static IPs, compute instances, and any other resource in Azure. What is nice about this is that when you delete a resource group, it deletes all the resources in that group. It is a "directory" for resources. In the case of Azure, deletes could perform faster, but the overall design of this grouping is quite nice for testing and organizing resources for your projects.

Other cloud providers do not have this same layer of abstraction. Google Cloud has projects, but GCP projects take up to two weeks to fully delete. This does not allow for fast iteration: grouping resources for quick testing only to delete them all minutes later. Deleting multiple resources quickly on any other cloud provider is quite hard. Even just grouping resources together is not really possible on other cloud providers without tags, and tags were definitely not designed for that.

Behavior. For the Oxide API, we can have a similar behavior to resource groups in the form of "Projects":

  • Users can create as many Projects as they want and create compute resources in each Project.

  • Compute resources must live in a Project; they cannot fly solo. That being said, a "default" project is created for every new user. The project can contain a walk-through with tool-tips to help the user go through creating their first resources, etc. The default project can be deleted, since users might want to clean it up after they have learned.

  • Compute resources cannot live in more than one Project.

  • Projects are isolated from each other in the same way different tenants are isolated from one another.

  • Access control allows users to delegate which users or teams from [rfd44] can read or modify the resources in a Project.

  • Projects can be deleted while instances are running. In the UI, the user will be warned about this and have to confirm when deleting a project. We will also make this very explicit in the documentation. (It might be nice in the future to add a "hold" for a project, meaning another option on a project that prevents it from being deleted unless the hold is first removed from the project.)

  • Projects cannot be nested. This will overly complicate things and confuse the user if we do this right out of the gate.

  • The project name needs to be unique to that Organization.

New resources are created in a project by nested API commands under the parent /projects/{projectId} to denote that the new resource lives in that project.

Can resources be moved from one project to another?

Microsoft does allow for this behavior with resource groups and Google does not allow for this with projects. Let's add this behavior so that if users accidentally spin up resources in the wrong project, they can move them. We will want to warn the user if there are infrastructure implications to moving. Likely we will only have to move the reference to the resource, not the actual resource. For instance, hopefully a disk does not have to be moved; instead we point the database to the same physical location while "moving" it to a different project.

This should happen automatically, but in the UI the user should be warned and have to confirm the move. The move should return an error if moving a resource is not possible due to the infrastructure implications of the move, such as a network that would be incompatible with the destination project's network. We should also move all dependencies automatically; for instance, if a VM has disks attached, the UI should show the user that all dependencies will be moved as well, and then perform the move after the user confirms and verifies the implications of the move.

Security. "Projects" are secure by default. You cannot access resources in one project from another project. By default, resources in a project are not exposed on the public internet and have no ports exposed publicly.

Networking. Each project has its own default network and subnetwork. Instances provisioned in the same project can communicate over the local internal network. Instances in different projects have no network access to each other. Depending on the behavior a user expects, they should place instances either in the same project or in different projects.

If a network exists in project A, can an instance in project B be attached to the network in project A? This seems viable if a database team runs a database in a specific project but other teams need to be consumers of said database. In that case, and other similar situations, the teams should expose the endpoint (the specific IP[s] and port[s]) of the resource that needs to be accessed by other projects. This can be done explicitly with Firewall Rules for the project. This is covered in [rfd21].

For more information on the thought process behind networking see [rfd9]. Also, [rfd21] covers the networking API components.

The following API endpoints relate to Projects:

GET /projects - List all projects
POST /projects - Create a new project
GET /projects/{projectId} - Get a project by ID
PATCH /projects/{projectId} - Update a project by ID
DELETE /projects/{projectId} - Delete a project by ID

The Project data type will look like the following OpenAPI schema reference:

Project:
type: object
required:
- name
properties:
$ref: '#/components/schemas/IdentityMetadata'
billingAccountId:
description: |
The ID for the billing account for this project.
type: string
format: uuid
metrics:
type: object
readOnly: true
description: |
Metrics on the project. Only returned if an extra parameter
is sent with the GET API request for metrics.
properties:
cpu:
type: object
properties:
reservedCores:
description: |
Sum of all the number of cores reserved in the project.
Sampled every 60 seconds. After sampling, data is not
visible for up to 240 seconds.
type: double
usageTime:
description: |
CPU usage for all cores, in seconds.
Sampled every 60 seconds. After sampling, data is not
visible for up to 240 seconds.
type: double
utilization:
description: |
The fraction of the allocated CPU that is currently in use
on all the instances in the project.
This value is reported by the hypervisor for the VM and can
differ from utilization, which is reported from inside the
VM. Sampled every 60 seconds. After sampling, data is not
visible for up to 240 seconds.
type: double
usage:
description: |
CPU usage in megacycles over all instances.
Sampled every 60 seconds. After sampling, data is not
visible for up to 240 seconds.
type: int64
memory:
type: object
properties:
usage:
description: |
Total memory used by running instances in the project,
in MiB.
Sampled every 60 seconds. After sampling, data is not
visible for up to 240 seconds.
type: double
reserved:
description: |
Sum of all the memory reserved for all the instances in the
project, in MiB.
Sampled every 60 seconds. After sampling, data is not
visible for up to 240 seconds.
type: int64

Instances

An instance is a virtual machine to be used for compute. Each instance must live inside a project. When a project gets deleted the instance is deleted as well.

SSH Keys. A thing that GCP does nicely is automatically adding users' SSH keys to the instances. Oxide will do this as well. For every user that has access to an instance in a project, Oxide will add their public SSH keys from their user account to the instance. When SSH keys are modified in a user account or a user/team is removed from accessing a project, the instances will automatically update the allowed SSH keys. If a user updates their SSH keys and we cannot update them on the instance because it is out of disk space or something else, we should place an error on the instance so they can see via the UI and API (by running a GET on the instance) that there is an error on their instance.

For instances, like Windows, that do not use SSH keys, a password for the instance will be used instead.

Networking. Each Project has its own default network and subnetwork. Therefore, instances in the same project can communicate via their private IP addresses or hostname via DNS. DNS inside the project is provided by internal DNS to preserve the hard isolation between projects, therefore it cannot be accessed from outside the project. If you are looking for strong network isolation between instances, put them into different "Projects".

[rfd21] covers the networking API components.

Service class. An instance can be in the "standard" service class or "spot" service class. The default is "standard". All instances are required to have a particular service class. Operators will have controls for the target fraction of utilization for each service class so that they can express policies like "don’t use more than 50% of the system for standard instances; don’t let the system fill up to more than 80%". Instances in the "spot" service class run under the assumption that they might disappear quickly without notice. "Spot" instances will also be billed differently, refer to [rfd56] for billing.

The following API endpoints relate to Instances:

GET /projects/{projectId}/instances - List all instances in a project
POST /projects/{projectId}/instances - Create a new instance in a project
GET /projects/{projectId}/instances/{instanceId} - Get an instance by ID
PATCH /projects/{projectId}/instances/{instanceId} - Update an instance by ID
DELETE /projects/{projectId}/instances/{instanceId} - Delete an instance by ID

# Disks
PUT /projects/{projectId}/instances/{instanceId}/disks - Attach a disk to an instance
DELETE /projects/{projectId}/instances/{instanceId}/disks/{diskId} - Detach a disk from an instance

# Utilities
PUT /projects/{projectId}/instances/{instanceId}/reboot - Reboot an instance by ID
PUT /projects/{projectId}/instances/{instanceId}/start - Start an instance by ID
PUT /projects/{projectId}/instances/{instanceId}/stop - Stop an instance by ID

# Tags
PUT /projects/{projectId}/instances/{instanceId}/tags/{tagName} - Add a tag to an instance
DELETE /projects/{projectId}/instances/{instanceId}/tags/{tagName} - Remove a tag from an instance
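
For completeness, a bare HTTP sketch of driving one of the utility endpoints above; whether these calls return an operationId (and should therefore be awaited like other asynchronous calls) is left open here, so the response body is ignored:

package main

import (
    "fmt"
    "net/http"
)

// rebootInstance issues the PUT .../reboot call from the endpoint list above.
// Base URL and authentication are omitted for brevity.
func rebootInstance(baseURL, projectID, instanceID string) error {
    url := fmt.Sprintf("%s/projects/%s/instances/%s/reboot", baseURL, projectID, instanceID)
    req, err := http.NewRequest(http.MethodPut, url, nil)
    if err != nil {
        return err
    }
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return err
    }
    defer resp.Body.Close()
    if resp.StatusCode >= 400 {
        return fmt.Errorf("reboot failed: %s", resp.Status)
    }
    return nil
}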

The Instance data type will look like the following OpenAPI schema reference:

Instance:
type: object
required:
- name
- image
- bootDiskSize
properties:
$ref: '#/components/schemas/IdentityMetadata'
projectId:
type: string
format: uuid
readOnly: true
type:
description: |
Type of the instance. Either the type of the instance is required
or CPUs and memory need to be explicitly set.
type: string
password:
description: |
ONLY applicable to instances such as Windows that do not use
SSH Keys for authentication.
type: string
ncpus:
description: |
The number of CPUs for this instance. Can be 1 or an even
number up to 32 (2, 4, 6, ... 24, etc).
type: int64
memory:
description: |
The total memory for this instance. Memory must be a multiple of
256 MB and must be supplied in MB (e.g. 5 GB of memory is 5120 MB).
type: int64
image:
description: |
The image used to create the instance. This can be any of the
following:
{name} (ex. debian) - returns the latest version of debian
{name}:{version} (ex. debian:sid) - returns the latest version of debian sid
{name}@{digest} - returns a specific image which is denoted by the content addressable digest of that image
type: string
serviceClass:
description: |
The service class for the instance. This can be either "standard" or
"spot". The default is "standard". "Spot" instances run under the expectation
they might disappear at any given time.
type: string
enum:
- standard
- spot
bootDiskSize:
description: The size of the boot disk for the image, in GiB.
type: integer
format: int64
hostname:
description: |
Specifies the hostname of the instance. The specified hostname
must be RFC1035 compliant. If hostname is not specified, the
default hostname is
[INSTANCENAME].instances.[PROJECTNAME].internal.oxide.
type: string
format: hostname
status:
description: Status of the instance.
readOnly: true
type: string
enum:
- starting
- running
- stopping
- stopped
- repairing
- failed
disks:
description: Disks attached to the instance.
readOnly: true
type: array
items:
$ref: '#/components/schemas/Disks'
network:
type: object
properties:
publicIPAddress:
description: |
The public IP address for the instance. This is given to the
VM by the control plane depending on the setup of the Oxide
rack.
readOnly: true
type: string
interfaces:
description: |
An array of network configurations for this instance. These
specify how interfaces are configured to interact with other
network services, such as connecting to the internet.
Multiple interfaces are supported per instance. If none are
given at creation time, the default subnetwork is added.
type: array
items:
$ref: '#/components/schemas/NetworkInterface'
allowIpForwarding:
description: |
Allows this instance to send and receive packets with non-matching
destination or source IPs. This is required if you plan to use this
instance to forward routes. Defaults to false.
type: bool
tags:
description: |
Tags for the instance. A tag must be 1-63 characters long, and
comply with RFC1035. Specifically, a tag must be 1-63 characters
long and match the regular expression [a-z]([-a-z0-9]*[a-z0-9])?
which means the first character must be a lowercase letter, and all
following characters must be a dash, lowercase letter, or digit,
except the last character, which cannot be a dash.
type: array
items:
type: string
metrics:
type: object
readOnly: true
description: |
Metrics on the compute instance. Only returned if an extra parameter
is sent with the GET API request for metrics. These are cumulative
since the time the instance was created.
properties:
uptime:
description: |
How long the VM has been running, in seconds. Sampled every 60
seconds. After sampling, data is not visible for up to 240
seconds.
type: double
integrity:
type: object
properties:
earlyBootValidationStatus:
description: |
The validation status of early boot integrity policy.
Sampled every 60 seconds. After sampling, data is not
visible for up to 240 seconds.
Early boot is the boot sequence from the start of the UEFI
firmware until it passes control to the bootloader.
type: string
enum: ["passed", "failed", "unknown"]
lateBootValidationStatus:
description: |
The validation status of late boot integrity policy.
Sampled every 60 seconds. After sampling, data is not
visible for up to 240 seconds.
Late boot is the boot sequence from the bootloader until
completion. This includes the loading of the operating
system kernel.
type: string
enum: ["passed", "failed", "unknown"]
firewall:
type: object
properties:
droppedBytesCount:
description: |
Count of incoming bytes dropped by the firewall.
Sampled every 60 seconds. After sampling, data is not
visible for up to 240 seconds.
type: int64
droppedPacketsCount:
description: |
Count of incoming packets dropped by the firewall.
Sampled every 60 seconds. After sampling, data is not
visible for up to 240 seconds.
type: int64
cpu:
type: object
properties:
reservedCores:
description: |
Number of cores reserved on the host of the instance.
Sampled every 60 seconds. After sampling, data is not
visible for up to 240 seconds.
type: double
usageTime:
description: |
CPU usage for all cores, in seconds. To compute the
per-core CPU utilization fraction, divide this value by
(end-start)*N, where end and start define this value's time
interval and N is reservedCores at the end of the interval.
This value is reported by the hypervisor for the VM and can
differ from usageTime, which is reported from inside the VM.
Sampled every 60 seconds. After sampling, data is not
visible for up to 240 seconds.
type: double
utilization:
description: |
The fraction of the allocated CPU that is currently in use
on the instance. This value can be greater than 1.0 on some
machine types that allow bursting. This value is reported
by the hypervisor for the VM and can differ from utilization,
which is reported from inside the VM. Sampled every 60
seconds. After sampling, data is not visible for up to
240 seconds.
type: double
memory:
type: object
properties:
usage:
description: |
The memory being used on the instance, in MiB.
Sampled every 60 seconds. After sampling, data is not
visible for up to 240 seconds.
type: double
network:
type: object
properties:
receivedBytesCount:
description: |
Count of bytes received from the network. Sampled
every 60 seconds. After sampling, data is not visible for
up to 240 seconds.
type: int64
receivedPacketsCount:
description: |
Count of packets received from the network.
Sampled every 60 seconds. After sampling, data is not
visible for up to 240 seconds.
type: int64
sentBytesCount:
description: |
Count of bytes sent over the network. Sampled every
60 seconds. After sampling, data is not visible for up to
240 seconds.
type: int64
sentPacketsCount:
description: |
Count of packets sent over the network. Sampled every
60 seconds. After sampling, data is not visible for up to
240 seconds.
type: int64

Networking

For more information on the thought process behind networking see [rfd9].

[rfd21] covers the networking API components.

Disks

Disks represent storage resources. Disks can be attached to instances, detached from instances, and then reattached to a different instance. You can only have a disk attached to one instance at a time. Disks can only be attached to instances in the same project as the disk.

When attached to an instance, disks get mounted at /mnt/disks/{disk-name}.

Disks can be seeded from a snapshot. At the time the disk is created, the relationship between the original snapshot and the disk is gone (from the perspective of the user). Therefore, you cannot roll back a disk to an earlier snapshot. This is the behavior of the various cloud providers as well. If a user wants to restore a disk to an earlier snapshot, they should create a new disk with the snapshotId of the data they wish to restore.
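
A sketch of that restore flow against the Disk endpoints listed below: create a fresh disk seeded from the snapshot by passing snapshotId in the create request. The field names follow the Disk schema; the request/response handling is simplified and the values are placeholders:

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
)

// diskCreate carries the subset of Disk fields used to seed a new disk from
// an existing snapshot.
type diskCreate struct {
    Name       string `json:"name"`
    Size       int64  `json:"size"` // GiB
    Type       string `json:"type"`
    SnapshotID string `json:"snapshotId,omitempty"`
}

// restoreDiskFromSnapshot creates a new disk whose contents come from the
// given snapshot, rather than rolling back an existing disk.
func restoreDiskFromSnapshot(baseURL, projectID, snapshotID string) error {
    body, err := json.Marshal(diskCreate{
        Name:       "restored-disk",
        Size:       100,
        Type:       "default",
        SnapshotID: snapshotID,
    })
    if err != nil {
        return err
    }
    resp, err := http.Post(
        fmt.Sprintf("%s/projects/%s/disks", baseURL, projectID),
        "application/json",
        bytes.NewReader(body),
    )
    if err != nil {
        return err
    }
    defer resp.Body.Close()
    if resp.StatusCode >= 400 {
        return fmt.Errorf("disk create failed: %s", resp.Status)
    }
    return nil
}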

The following API endpoints relate to Disks:

GET /projects/{projectId}/disks - List all disks in a project
POST /projects/{projectId}/disks - Create a new disk in a project
GET /projects/{projectId}/disks/{diskId} - Get a disk by ID
PATCH /projects/{projectId}/disks/{diskId} - Update a disk by ID
DELETE /projects/{projectId}/disks/{diskId} - Delete a disk by ID

The Disk data type will look like the following OpenAPI schema reference:

Disk:
type: object
required:
- name
- size
- type
properties:
$ref: '#/components/schemas/IdentityMetadata'
projectId:
description: The project the disk belongs to.
type: string
format: uuid
readOnly: true
snapshotId:
description: |
The snapshot ID from which the disk is created. If empty, a blank
disk is created.
type: string
format: uuid
size:
description: Size of the disk, in GiB.
type: integer
format: int64
type:
description: The type of disk, currently only default.
type: string
enum: [default]
mode:
description: |
The mode in which to attach this disk, either read_write or
read_only. If not specified, the default is to attach the disk in
read_write mode.
enum:
- read_write
- read_only
status:
description: Status of the disk.
readOnly: true
type: string
enum:
- creating
- deleting
- failed
- attaching
- attached
- detaching
- detached
device:
description: The device name for the disk if attached to an instance.
readOnly: true
type: string
example: /mnt/disks/disk-name
timeAttached:
description: The time the disk was last attached.
type: string
format: date-time
readOnly: true
timeDetached:
description: The time the disk was last detached.
type: string
format: date-time
readOnly: true
metrics:
type: object
readOnly: true
description: |
Metrics on the disk. Only returned if an extra parameter
is sent with the GET API request for metrics. These are cumulative
since the time the disk was created.
properties:
readBytesCount:
description: |
Count of bytes read from disk. Sampled every 60
seconds. After sampling, data is not visible for up to
240 seconds.
type: int64
readOpsCount:
description: |
Count of disk read IO operations. Sampled every 60
seconds. After sampling, data is not visible for up to
240 seconds.
type: int64
throttledReadBytesCount:
description: |
Count of bytes in throttled read operations. Sampled
every 60 seconds. After sampling, data is not visible for
up to 240 seconds.
type: int64
throttledReadOpsCount:
description: |
Count of throttled read operations. Sampled every 60
seconds. After sampling, data is not visible for up to 240
seconds.
type: int64
throttledWriteBytesCount:
description: |
Count of bytes in throttled write operations. Sampled
every 60 seconds. After sampling, data is not visible for
up to 240 seconds.
type: int64
throttledWriteOpsCount:
description: |
Count of throttled write operations. Sampled every 60
seconds. After sampling, data is not visible for up to 240
seconds.
type: int64
writeBytesCount:
description: |
Count of bytes written to disk. Sampled every 60
seconds. After sampling, data is not visible for up to
240 seconds.
type: int64
writeOpsCount:
description: |
Count of disk write IO operations. Sampled every 60
seconds. After sampling, data is not visible for up to 240
seconds.
type: int64

Disk Life Cycle

As stated above, a disk can be in a variety of different states, denoted by its status.

A disk can have the following states:

  • creating - Resources are being allocated for the disk. The disk is not available yet.

  • deleting - The disk is being deleted because a user has requested it.

  • failed - The disk has failed. This can happen because the disk encountered an internal error. During this time, the disk is unusable. The disk can be deleted in this state as long as the failure was not with the delete itself. A longer, more human-readable error message will be returned to the user that accounts for the failure.

  • attaching - The disk is attaching to an instance.

  • attached - The disk has been attached to an instance.

  • detaching - The disk is detaching from an instance.

  • detached - The disk is not attached to an instance and is ready to be attached to an instance if requested by a user. This is the default state of the disk post-creation.

Snapshots

Snapshots represent a snapshot of a Disk. Snapshots can be used for backups, to make copies of disks, and to save data before shutting down an instance.

When a snapshot is created from an active disk, the state of the data captured is undefined while the disk is attached to an active, running system. This is something to keep in mind when creating a snapshot. The quality of the snapshot depends on the ability of your apps to recover from snapshots that you create during heavy write workloads.

Best Practices for Snapshotting a Disk

When we refer to "data loss", we refer to the fact that changes from the final few seconds before the snapshot may not be reflected after recovery. There are a few best practices for avoiding this.

You can create a snapshot of a persistent disk even while your apps write data to the disk. However, you can prevent data loss in a snapshot if you flush the disk buffers and sync your file system before you create a snapshot. Pause apps or operating system processes that write data to that persistent disk. Then flush the disk buffers before you create the snapshot.

An alternative option is to freeze or unmount the filesystem before you take a snapshot. This is the most reliable way to ensure that your disk buffers are cleared, but it is more time consuming and not as convenient as simply flushing the disk buffers. Unmount the persistent disk completely to ensure that no data is written to it while you create the snapshot. This is usually unnecessary, but it does make data loss less of a risk with the snapshot. Make sure to remember to remount the disk after the snapshot is complete.

To create a disk from a snapshot, refer to the Disk endpoints.

NOTE: not reflected in the API spec below, but a feature that comes with snapshots is the ability to schedule snapshots on a fixed interval, much like a cron job.

The following API endpoints relate to Snapshots:

GET /projects/{projectId}/snapshots - List all snapshots in a project
POST /projects/{projectId}/disks/{diskId}/snapshot - Create a new snapshot of a disk
GET /projects/{projectId}/snapshots/{snapshotId} - Get a snapshot by ID
PATCH /projects/{projectId}/snapshots/{snapshotId} - Update a snapshot by ID
DELETE /projects/{projectId}/snapshots/{snapshotId} - Delete a snapshot by ID

The Snapshot data type will look like the following OpenAPI schema reference:

Snapshot:
type: object
required:
- name
properties:
$ref: '#/components/schemas/IdentityMetadata'
projectId:
description: The project the snapshot belongs to.
type: string
format: uuid
readOnly: true
diskId:
description: The ID of the disk the snapshot is for.
type: string
format: uuid
readOnly: true
progress:
description: Progress of the snapshot, as a percentage.
type: integer
format: int64
readOnly: true
example: 80
status:
description: Status of the snapshot.
readOnly: true
type: string
enum:
- creating
- deleting
- failed
- ready
- uploading
size:
description: The size of the snapshot, in GiB.
type: number
format: double
readOnly: true
example: 15.54

Images

Images are operating system images used to create boot disks for your VM instances. There are global images not confined to a project. You can think of these as defaults like Ubuntu, Debian, Windows, and other popular images.

Oxide will provide a global set of popular operating system images but you can create your own custom operating system images as well.

Images require both a name and a version. This is because there can be multiple images in the family "debian" but various versions of the image. For example, the name debian:sid includes the version of the image, where sid is the version.

You can think of the format much like a Docker image: debian:latest returns the latest version of debian, debian:sid returns the latest version of debian sid, and debian@{digest} returns the specific version of an image at the content-addressable digest passed. When creating a disk from an image, you can use any of the above.

Images can also be tied to a project if the image should not be a global image.

Semantic Versioning of Images

We should make it easy for users to do semantic versioning for their images. If we can parse the version of the image as a semantic version, we should display it nicely in the UI.

The following API endpoints relate to Images:

# Global Images API
GET /images - List all images
POST /images - Create a new image
GET /images/{imageId} - Get an image by ID
PATCH /images/{imageId} - Update an image by ID
DELETE /images/{imageId} - Delete an image by ID

# Project Confined Images API
GET /projects/{projectId}/images - List all images in a project
POST /projects/{projectId}/images - Create a new image in a project
GET /projects/{projectId}/images/{imageId} - Get an image by ID
PATCH /projects/{projectId}/images/{imageId} - Update an image by ID
DELETE /projects/{projectId}/images/{imageId} - Delete an image by ID

The Image data type will look like the following OpenAPI schema reference:

Image:
type: object
required:
- name
- version
- tarball
properties:
$ref: '#/components/schemas/IdentityMetadata'
projectId:
description: The project the image belongs to, if not a global image.
type: string
format: uuid
readOnly: true
version:
description: |
Version of the image. This can be a tag name like sid or a longer
form sha hash of the image. Using a tag always returns the latest
version of that tag, whereas using a sha hash returns an exact
version of an image that cannot be modified.
The name must be 1-63 characters long, and comply with
RFC1035. Specifically, the name must be 1-63 characters long and
match the regular expression [a-z]([-a-z0-9]*[a-z0-9])? which means
the first character must be a lowercase letter, and all following
characters must be a dash, lowercase letter, or digit, except the
last character, which cannot be a dash.
type: string
example: sid
digest:
description: |
The content addressable hash of the image that was uploaded. This
is computed server side after the image has been uploaded. The user
can then create a disk by setting the image name as
{name}@{digest}.
type: string
readOnly: true
example: |
sha256:47bfdb88c3ae13e488167607973b7688f69d9e8c142c2045af343ec199649c09
tarball:
description: Link to a network-reachable tarball containing the image.
type: string
size:
description: Size of the image when restored onto a disk, in GiB.
type: integer
format: int64
status:
description: Status of the image.
type: string
enum:
- deleting
- failed
- pending
- ready

External References