RFD 538
Webhook API

Webhooks provide a mechanism for the Oxide rack to notify external systems when events occur. In particular, notifications for fault management alerts ([rfd307]) are delivered via webhooks. This RFD describes the API used by external systems to subscribe to webhooks from the Oxide control plane, and how webhooks are delivered to receivers.

In addition, Appendix A contains guidance on how to use the mechanisms provided by the Oxide webhook API to implement a fault-tolerant, reliable webhook receiver.

The design of the internal implementation of the webhook system is described in RFD 364 Webhook Implementation.

Overview

Webhook notifications represent events that occur in the Oxide rack. Events include alerts generated by the rack’s fault management subsystem or by user-defined alerting rules, as well as audit log events for user and operator actions. When an event occurs, notifications are delivered as HTTP POST requests sent to a particular URL. An HTTP endpoint to which these requests are sent, and configuration associated with it, is referred to as a webhook receiver. Finally, the internal component of the Oxide control plane that is responsible for delivering webhook notifications to receivers is referred to as the webhook dispatcher.

Events

Events are the central concept of the webhook system’s data model. The data payloads delivered to receivers represent events recorded by the rack’s control plane. Receivers subscribe to events by specifying the event classes they wish to be notified of. The same event may result in notifications being sent to multiple receivers.

See the Delivery Requests section for a description of the format of webhook delivery requests.

Event Classes

Events are categorized into event classes. These classes indicate the resource scope that an event relates to, and what event occurred. An event class is a string consisting of one or more segments, with multiple segments separated by . characters. Each segment describes the category of the event with increasing specificity, with the first segment being the most general. These classes form a hierarchy of event types. Webhook receivers subscribe to one or more event classes to determine which events they will be notified of.

Top-level categories in the event class hierarchy will represent broad categories of entities, followed by increasingly specific subcategories.

Globbing

A receiver’s event subscriptions may include simple globs to subscribe to multiple categories of events. Globbing is performed on a per-segment basis. The "*" character in an event class will match any single segment at that position in the event class string, while "**" will match any number of segments.

For example, a webhook that subscribes to "instance.*" will receive notifications for "instance.create", "instance.delete", "instance.start", and any other event class consisting of instance followed by exactly one more segment. Similarly, a webhook that subscribes to "**.delete" will receive notifications for "project.delete", "instance.delete", and any other event class that ends with delete.
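The matching rules can be sketched as a small function. This is an illustration only, not the control plane's implementation: the function names are hypothetical, and the sketch assumes that "**" matches one or more segments (the text above says only "any number").

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate an event class glob into a compiled regular expression."""
    parts = []
    for segment in pattern.split("."):
        if segment == "**":
            # Assumption: "**" matches one or more whole segments.
            parts.append(r"[^.]+(?:\.[^.]+)*")
        elif segment == "*":
            # "*" matches exactly one segment.
            parts.append(r"[^.]+")
        else:
            # Any other segment must match literally.
            parts.append(re.escape(segment))
    return re.compile(r"\.".join(parts))


def glob_matches(pattern: str, event_class: str) -> bool:
    """Return True if the whole event class string matches the glob."""
    return glob_to_regex(pattern).fullmatch(event_class) is not None
```

Under these assumptions, glob_matches("instance.*", "instance.create") is true, while glob_matches("instance.*", "instance.disks.attach") is false, because "*" covers only a single segment.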

To list all event classes currently known to the system, use the GET /webhook-events/classes API endpoint. To view metadata about a particular event class, use the GET /webhook-events/classes/{class_name} API endpoint.

Event UUIDs

Each event that occurs within the system is uniquely identified by a UUID ([RFC9562]). These UUIDs are included in the event payload sent to webhook receivers. If an event results in notifications being delivered to multiple webhook receivers, the event UUID sent to each receiver will be the same. This may be used to correlate events across multiple receivers, and to de-duplicate repeated deliveries of the same event.

Important

Two webhook notifications with the same event UUID refer to the same event, rather than two distinct events with equivalent data. On the other hand, two webhook notifications with different event UUIDs refer to two distinct events, even if other data in the event payload is the same.

Webhook Receivers

In order to receive webhook notifications, a receiver endpoint must be registered with the Oxide control plane. Multiple receivers, with different URLs, may be registered, and each receiver may be subscribed to different sets of events.

See the Webhook Receivers section under HTTP API Endpoints for the API endpoints used to create, update, view, and delete webhook receiver configurations.

For guidance on steps receiver endpoint implementations should take to ensure webhook notifications are delivered reliably, see Appendix A.

Creating a webhook receiver requires a name, the URL of the receiver endpoint, and a list of one or more secrets used to sign the payloads sent to the receiver. In order to receive notifications when events occur, a receiver must also subscribe to one or more event classes, as discussed in the previous section. Event class subscriptions are specified using the globbing syntax described above. Event subscriptions may be specified when the receiver is created, or can be added to an existing receiver using the webhook configuration update endpoint.

Secrets

Webhook payloads are signed with an HMAC digest ([RFC2104]) using secret keys shared between the webhook receiver and the Oxide control plane to allow the receiver to verify the authenticity of the payload. In a creation request, a user submits a list of secrets they would like to have the webhook dispatcher use for signing payloads. Each secret is assigned an ID generated by the system that is unique within the webhook it belongs to.

Secrets cannot be changed once created, and can only be deleted. To rotate a secret, a user first creates a new secret, verifies that their application can validate the new signature, and then deletes the old secret. When a secret is deleted, the value is erased and the secret is marked as deleted.
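The rotation procedure can be sketched as follows. The endpoints and JSON bodies are those listed under HTTP API Endpoints; the send and can_verify callables are hypothetical stand-ins for an authenticated HTTP client and for whatever check the receiver uses to confirm it can validate signatures made with the new secret.

```python
def rotate_secret(send, webhook, new_secret, old_secret_id, can_verify):
    """Add a new secret, confirm the receiver can verify it, then delete the old one.

    send(method, path, body) is a hypothetical helper that performs an
    authenticated API request and returns the parsed JSON response.
    can_verify(secret_id) is a hypothetical check supplied by the caller.
    """
    created = send("POST", f"/webhooks/{webhook}/secrets", {"secret": new_secret})
    new_id = created["id"]
    if not can_verify(new_id):
        # Leave the old secret in place so existing verification keeps working.
        raise RuntimeError("new secret could not be verified; old secret kept")
    send("DELETE", f"/webhooks/{webhook}/secrets/{old_secret_id}", None)
    return new_id
```

Deleting the old secret only after verification means there is a window in which payloads carry signatures for both secrets, so the receiver can switch over without rejecting any deliveries.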

While users can ignore signature verification in their receivers, it is required that a webhook configuration always has at least one secret.

See the Secrets section under HTTP API Endpoints for the API endpoints used to create, view, and manage the secrets associated with a webhook receiver.

Identity and Access Management

For a webhook receiver to receive an event, the user registering the receiver must have the viewer role for the resource the event relates to. Fault management alerts and other hardware events require the fleet.viewer role.

Currently, webhook receivers may only be created or modified by users with the fleet.admin role, and all webhook receivers operate with the fleet.viewer role. Webhook receivers with more restrictive access control roles may be implemented in future Oxide system software, as described here.

Webhook Delivery

Webhook events are delivered to receiver endpoints as an HTTP POST request with a JSON body containing the event payload. The component of the Oxide control plane responsible for delivering webhook events is referred to as the webhook dispatcher.

The webhook dispatcher attempts to provide at-least-once reliable delivery to live receivers. This means that the control plane will retry unsuccessful delivery attempts in order to ensure that the notification is observed by the receiver. In some cases, this means that a receiver may be sent a notification for the same event multiple times. Therefore, webhook receiver implementations should use the event UUIDs included in the webhook payload to de-duplicate events.

Important

Receiver implementations must treat two webhook payloads with the same event UUID as representing the same event. Two payloads with different event UUIDs should always be treated as representing distinct events.

See the Event Delivery Attempts section under HTTP API Endpoints for the API endpoints used to list the status of delivery attempts for a webhook, and to trigger re-delivery of an event.

Delivery Requests

All webhook delivery requests are structured as described in this section.

Request Headers

Webhook delivery requests always include the following headers:

Request Headers

| Name | Value |
|------|-------|
| content-type | application/json |
| x-oxide-delivery-id | A UUID that uniquely identifies this delivery attempt for the event. |
| x-oxide-webhook-id | The UUID that identifies the webhook receiver to which the event was delivered. |
| x-oxide-event-class | The event class string of the event. |
| x-oxide-event-id | The event UUID of the event. |
| x-oxide-signature | An HMAC signature of the payload for each secret key assigned to this webhook receiver. See the next section for details on the format of this header. |

Signature Header Format

For each secret key assigned to a webhook receiver, an x-oxide-signature header is added with the HMAC digest of the payload signed with that secret key. The value of each header includes the algorithm used to generate the HMAC digest, the UUID of the secret key, and the value of the signature for that secret key. This data is encoded in the following format:

x-oxide-signature: a={algorithm}&id={secret-id}&s={signature}

Receivers may parse this format to extract the algorithm, secret ID, and signature from each header value.

For example, if a receiver has two secrets, with IDs c06487fd-6636-435a-8ade-915b3c3b7ff5 and 72c31177-fb4d-48ad-9a33-714e02f8ab59, and the signatures for both secrets are generated using the SHA256 algorithm, the following headers would be sent:

x-oxide-signature: a=sha256&id=c06487fd-6636-435a-8ade-915b3c3b7ff5&s=d4da173d6473e40e7e0c38b9feb4dc101b362b55e8a2de936e850b8ed307badc
x-oxide-signature: a=sha256&id=72c31177-fb4d-48ad-9a33-714e02f8ab59&s=b437be7187eb33846c7839ab77e9c3ee03c0f4eee0f19aca531a7f6703f63954

Currently, only the SHA256 algorithm is supported.
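A receiver might parse and check these headers along the following lines. This is a minimal sketch under stated assumptions, not a definitive implementation: it assumes the signature is the hex-encoded HMAC-SHA256 of the raw request body bytes, and that the receiver keeps its secrets in a mapping from secret ID to key bytes. The function names are hypothetical.

```python
import hashlib
import hmac
from urllib.parse import parse_qs


def parse_signature_header(value: str) -> dict:
    """Split 'a={algorithm}&id={secret-id}&s={signature}' into its parts."""
    fields = parse_qs(value, strict_parsing=True)
    return {
        "algorithm": fields["a"][0],
        "secret_id": fields["id"][0],
        "signature": fields["s"][0],
    }


def verify_signature(header_values, body: bytes, secrets: dict) -> bool:
    """Return True if any x-oxide-signature value is valid for a known secret.

    secrets maps secret ID -> key bytes (an assumption about receiver storage).
    """
    for value in header_values:
        sig = parse_signature_header(value)
        key = secrets.get(sig["secret_id"])
        if key is None or sig["algorithm"] != "sha256":
            continue
        # Assumption: hex-encoded HMAC-SHA256 over the raw body bytes.
        expected = hmac.new(key, body, hashlib.sha256).hexdigest()
        if hmac.compare_digest(expected, sig["signature"]):
            return True
    return False
```

Accepting a delivery when any one known secret verifies is what makes rotation possible: during a rotation, the receiver may know only one of the secrets that the dispatcher signs with. The constant-time hmac.compare_digest comparison avoids leaking how many leading characters of a forged signature were correct.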

Request Body

All webhook delivery requests include a JSON body that describes the event. This body always contains the following fields:

Event Object

| Name | Type | Description |
|------|------|-------------|
| event_class | String | The event class of this event. |
| event_id | UUID | The event UUID that uniquely identifies this event. |
| version | u64 | The version of the event payload schema. If backwards-incompatible changes are made to the structure of event payloads, this version number is incremented. |
| data | Object | Data payload describing the event. The schema of the data object is specific to the event class. |
| delivery | Delivery | An object containing metadata about this event delivery attempt. |

Delivery Object

| Name | Type | Description |
|------|------|-------------|
| id | UUID | A unique identifier for this delivery attempt. |
| webhook_id | UUID | The UUID of the webhook receiver. |
| sent_at | String | The timestamp at which the delivery request was sent. |

Example delivery request
POST https://company.example HTTP/1.1
content-type: application/json
x-oxide-delivery-id: ea94de84-d0b2-4c6a-b2e9-9dde536bc79a
x-oxide-webhook-id: 8a630640-c5cf-441c-9bdd-163f0323bc29
x-oxide-timestamp: 1672531200
x-oxide-event-class: project.create
x-oxide-event-id: 202b22e5-9d4c-4843-a8ec-a20d501d9e4a
x-oxide-signature: a=sha256&id=c06487fd-6636-435a-8ade-915b3c3b7ff5&s=d4da173d6473e40e7e0c38b9feb4dc101b362b55e8a2de936e850b8ed307badc
x-oxide-signature: a=sha256&id=72c31177-fb4d-48ad-9a33-714e02f8ab59&s=b437be7187eb33846c7839ab77e9c3ee03c0f4eee0f19aca531a7f6703f63954

{
  "event_class": "project.create",
  "event_id": "202b22e5-9d4c-4843-a8ec-a20d501d9e4a",
  "version": 1,
  "data": {
    ...
  },
  "delivery": {
    "id": "ea94de84-d0b2-4c6a-b2e9-9dde536bc79a",
    "webhook_id": "8a630640-c5cf-441c-9bdd-163f0323bc29",
    "sent_at": "2023-01-01T00:00:00Z",
    "trigger": "event"
  }
}

The fields of the example body are:

  1. event_class: The event class that this payload represents.

  2. event_id: The event UUID of the event that generated this webhook notification. This UUID is unique to the event, and can be used to de-duplicate multiple deliveries of the same event.

  3. version: The version of the event payload schema. At present, this will always be 1.

  4. data: The data payload of the event.

  5. delivery: Metadata describing this delivery attempt for the event.
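A receiver could decode this body into typed values as sketched below. The class and function names are hypothetical, and the sketch assumes that sent_at is an RFC 3339 timestamp with a trailing Z, as in the example; the class-specific data object is left as a plain dictionary.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Delivery:
    id: str
    webhook_id: str
    sent_at: datetime


@dataclass(frozen=True)
class Event:
    event_class: str
    event_id: str
    version: int
    data: dict
    delivery: Delivery


def parse_event(body: bytes) -> Event:
    """Decode a delivery request body; raises KeyError if a field is missing."""
    raw = json.loads(body)
    d = raw["delivery"]
    delivery = Delivery(
        id=d["id"],
        webhook_id=d["webhook_id"],
        # Assumption: RFC 3339 with a trailing "Z", as in the example request.
        sent_at=datetime.fromisoformat(d["sent_at"].replace("Z", "+00:00")),
    )
    return Event(
        event_class=raw["event_class"],
        event_id=raw["event_id"],
        version=raw["version"],
        data=raw["data"],
        delivery=delivery,
    )
```

Fields that the sketch does not name, such as trigger in the example, are ignored rather than rejected, so additions to the payload do not break parsing.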

Failures and Delivery Retry

If a webhook notification is not successfully delivered, the dispatcher will attempt to deliver it again, in order to ensure that the receiver receives the notification.

Success

A successful delivery is one in which the dispatcher receives a response back from the receiver endpoint with a 2xx HTTP status code. Once an event is delivered successfully to a receiver, the webhook dispatcher will not attempt to deliver that event to that receiver again.

Caution

Responding to a webhook delivery request with a 2xx HTTP status code acknowledges that event, and the Oxide control plane will not attempt to deliver that event to that receiver again.

This means that a receiver implementation should not return 200 OK until it has durably recorded the event, such as by writing it to persistent storage, or otherwise ensured that any actions in response to that event will reliably be performed. If a receiver process receives a webhook delivery request and responds with a 200 OK before it has durably recorded the event, and then crashes or otherwise loses the event, the Oxide control plane will not attempt to redeliver that event to that receiver. This means that the notification is lost.
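The record-then-acknowledge rule can be sketched with a local SQLite database. This is one possible approach under stated assumptions, not a prescribed design: the table layout and function names are hypothetical, and a real receiver may use any durable store.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open the event store; pass a file path for storage that survives restarts."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "event_id TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    db.commit()
    return db


def handle_delivery(db, body: bytes) -> int:
    """Return the HTTP status code the receiver should respond with."""
    try:
        payload = json.loads(body)
        event_id = payload["event_id"]
        text = body.decode("utf-8")
    except (ValueError, KeyError, TypeError):
        return 400  # malformed request; nothing was recorded
    try:
        with db:  # commits on success, rolls back on error
            # The primary key makes a repeated delivery of the same event a no-op.
            db.execute(
                "INSERT OR IGNORE INTO events (event_id, body) VALUES (?, ?)",
                (event_id, text),
            )
    except sqlite3.Error:
        return 503  # not recorded, so do not acknowledge; the dispatcher may retry
    return 200  # acknowledged only after the commit succeeded
```

Because the 200 is returned only after the transaction commits, a crash before the commit leaves the delivery unacknowledged. Keying the table on the event UUID also de-duplicates repeated deliveries of the same event.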

Webhook delivery does not follow HTTP redirects. If a receiver responds with a 3xx HTTP status code, this is considered a failure.

Failure

Delivering a webhook notification can fail for all of the reasons that an HTTPS request to an arbitrary URL may fail. Failures fall into one of three broad categories:

Failure Reasons
  1. Receiver Endpoint Unreachable: A TCP connection to the receiver endpoint could not be opened.

    This may be because the DNS name for the receiver endpoint URL could not be resolved, the connection was refused, or no connection was established within a 10-second timeout.

  2. Response Timeout: A connection was successfully established, but no response was received within a 30-second timeout.

  3. Receiver HTTP Error: A connection was successfully established and a response was received, but the receiver endpoint returned a 3xx, 4xx, or 5xx HTTP status code.

Retries

Each delivery will be attempted up to three times until a successful delivery is achieved. The first retry attempt occurs one minute after a failed delivery attempt. If the retry attempt fails, a second retry is performed five minutes after the first retry. If a webhook notification cannot be delivered successfully after three attempts, the webhook dispatcher considers that delivery to have permanently failed and will not retry delivery of that event to that receiver. In the future, the health of webhook receivers will be managed through FMA (i.e., an unresponsive receiver may trigger a fault).

A new delivery can be triggered using the POST /webhooks/{webhook}/deliveries/{event_id}/resend API endpoint. The new delivery will also be retried up to three times.
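As a worked example of the retry schedule, the sketch below computes the earliest time of each attempt. It makes the simplifying assumption that every attempt fails immediately; in practice each failed attempt may first consume up to the connection or response timeout. The names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Delay before each retry, as described above: one minute after the first
# attempt fails, then five minutes after the first retry.
RETRY_DELAYS = (timedelta(minutes=1), timedelta(minutes=5))


def attempt_times(first_attempt: datetime) -> list:
    """Earliest times of the three attempts, assuming each fails instantly."""
    times = [first_attempt]
    for delay in RETRY_DELAYS:
        times.append(times[-1] + delay)
    return times
```

For a first attempt at 00:00:00, the attempts fall at 00:00:00, 00:01:00, and 00:06:00, so a receiver that is unavailable for longer than roughly six minutes will need to have the missed events resent.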

Order of Delivery

There is no guarantee about the order in which messages will be sent to a receiver. At any point in time, a dispatcher may have multiple in-progress requests to a receiver. Each request may succeed, fail, and retry independently, resulting in messages appearing out of order in the context of the receiver.

Webhook messages always include a timestamp header (indicating the time at which the message was sent) to allow the receiver to reconcile the order of receipt with the order of sends. This header only provides the order in which the messages were sent, and does not make any claims about the order of the underlying events. Most event payloads will additionally include timestamps in the message body describing the time at which the event was observed.
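A receiver that buffers messages could reorder them by send time as sketched below. The sketch assumes that the timestamp header is the x-oxide-timestamp header shown in the example delivery request, and that its value is a count of seconds since the Unix epoch; the function name and the shape of the messages argument are hypothetical.

```python
def order_by_send_time(messages):
    """Sort (headers, body) pairs by the time the dispatcher sent them.

    headers is a dict keyed by lowercase header name. Messages sent in the
    same second keep their order of receipt, because sorted() is stable.
    """
    return sorted(messages, key=lambda m: int(m[0]["x-oxide-timestamp"]))
```

This recovers only the order of sends. Ordering the underlying events still requires the timestamps carried in the event payloads.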

Liveness Probes

Liveness probes are HTTP requests sent by the webhook dispatcher to a receiver endpoint to determine whether it is available.

These requests do not represent notifications for actual events, and should not be handled as events by the receiver implementation. Instead, probe requests are used to determine whether the receiver endpoint is currently available and could receive an actual event notification if one were to be sent.

Liveness probes have a number of uses:

  • They may be triggered by the receiver itself to indicate that it has become available after an outage, and to resend any missed events. See "resending failed deliveries" for details.

  • They may be triggered by an external system in order to monitor the health of the receiver, as described in "monitoring receiver availability".

  • They may be triggered manually by an operator when testing a receiver implementation.

Liveness probes are particularly valuable when a webhook receiver subscribes to infrequent but high-priority events, such as alerts. When the system is operating normally, no alerts will be generated. If probes are not sent to a receiver endpoint while the rack is operating normally and no faults have been detected, the receiver endpoint could become unavailable at any time without being detected. Should a fault then be detected in the rack, the dispatcher will attempt to deliver an alert to the receiver, but if the receiver is unavailable, the alert will be lost and operators may be unaware of the fault.

Liveness probes are event delivery requests with the "probe" event class. A probe request may be sent to a webhook receiver using the POST /webhooks/{webhook}/probe API endpoint.

Example liveness probe request
POST https://company.example HTTP/1.1
content-type: application/json
x-oxide-delivery-id: 3b04163c-d7e3-4e1a-bcca-48bd49e502f3
x-oxide-webhook-id: 8a630640-c5cf-441c-9bdd-163f0323bc29
x-oxide-timestamp: 1672531200
x-oxide-event-class: probe
x-oxide-event-id: 216cd071-0ec2-4b03-977e-7f1858af294d
x-oxide-signature: a=sha256&id=c06487fd-6636-435a-8ade-915b3c3b7ff5&s=d4da173d6473e40e7e0c38b9feb4dc101b362b55e8a2de936e850b8ed307badc
x-oxide-signature: a=sha256&id=72c31177-fb4d-48ad-9a33-714e02f8ab59&s=b437be7187eb33846c7839ab77e9c3ee03c0f4eee0f19aca531a7f6703f63954

{
  "event_class": "probe",
  "event_id": "216cd071-0ec2-4b03-977e-7f1858af294d",
  "version": 1,
  "data": {},
  "delivery": {
    "id": "3b04163c-d7e3-4e1a-bcca-48bd49e502f3",
    "webhook_id": "8a630640-c5cf-441c-9bdd-163f0323bc29",
    "sent_at": "2023-01-01T00:00:00Z",
    "trigger": "probe"
  }
}
Important

Liveness probe requests are represented as event deliveries to ensure that they are handled by the receiver similarly to actual event notifications. This ensures that probes are routed to the same API endpoint as actual events. It also means that the receiver implementation need not be able to handle a separate JSON body schema, and is only required to parse the JSON body schema of actual event payloads.

However, receiver implementations must take care to distinguish between actual event notifications and probe events. For instance, a receiver that handles alerting should not alert operators when it receives a probe event.

The receiver endpoint must respond with an HTTP 2xx status code to indicate that it is available. If the webhook dispatcher cannot connect to the receiver endpoint, the request times out, or the receiver responds with a 3xx, 4xx, or 5xx status code, the probe is considered to have failed, similarly to the failure criteria for event delivery.

Note

If it is possible for a receiver endpoint to be reachable but unable to process events, the receiver implementation may respond to probe events with a 5xx status code to indicate that it has failed.

This can be used to report conditions where the receiver endpoint is up, but another component, such as a database to which events are written, is down.

If the receiver endpoint responds to a probe request with a 5xx status code, the response body will be included in the alert generated for the receiver failure.
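The probe handling described above might look like the following in a receiver. This is a sketch only: backend_healthy and record_event are hypothetical callables supplied by the receiver, and the choice of 503 is one example of a 5xx status code.

```python
def respond_to_delivery(payload: dict, backend_healthy, record_event):
    """Return (status_code, response_body) for a parsed delivery request.

    backend_healthy() -> bool and record_event(payload) are hypothetical
    hooks into the rest of the receiver.
    """
    if payload["event_class"] == "probe":
        # Probes are never recorded or alerted on; they only report health.
        if backend_healthy():
            return 200, ""
        return 503, "event store unavailable"
    record_event(payload)
    return 200, ""
```

Checking the event class before any other processing keeps probes out of the alerting path, while still exercising the same endpoint and body parsing as real events.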

HTTP API Endpoints

Webhook Events

| Verb | Route | Purpose |
|------|-------|---------|
| GET | /webhook-events/classes | List event classes |
| GET | /webhook-events/classes/{class_name} | Fetch metadata for an event class |

Webhook Receivers

| Verb | Route | Purpose |
|------|-------|---------|
| GET | /webhooks | List webhook receivers |
| POST | /webhooks | Create a new webhook |
| GET | /webhooks/{webhook} | Get the configuration for a webhook |
| PUT | /webhooks/{webhook} | Update a webhook configuration |
| DELETE | /webhooks/{webhook} | Delete a webhook configuration |
| POST | /webhooks/{webhook}/probe | Send a synthesized liveness probe event to a webhook receiver |

Webhook Secrets

| Verb | Route | Purpose |
|------|-------|---------|
| GET | /webhooks/{webhook}/secrets | List the IDs of a webhook’s secrets |
| POST | /webhooks/{webhook}/secrets | Add a new secret to a webhook |
| DELETE | /webhooks/{webhook}/secrets/{secret_id} | Delete a secret from a webhook |

Event Delivery Attempts

| Verb | Route | Purpose |
|------|-------|---------|
| GET | /webhooks/{webhook}/deliveries | List attempted event deliveries for a webhook |
| POST | /webhooks/{webhook}/deliveries/{event_id}/resend | Request re-delivery of an event |

Webhook Events

List event classes (GET /webhook-events/classes)

Query Parameters

| Name | Type | Description |
|------|------|-------------|
| limit | uint32 | The maximum number of event classes to return for this request. |
| page_token | String | The next page token returned by a previous result page (if any). |
| sort_by | String | Supported set of sort modes for scanning by name (either name_ascending or name_descending). |
| filter | String | A glob pattern to filter the list of returned event classes. If this is provided, only event classes that match the pattern will be returned. |

Examples
Paginated request
Request
GET /webhook-events/classes?limit=10&page_token=image.create HTTP/1.1
// (empty body)
Response
HTTP/1.1 200 OK
{
  "items": [
    "image.delete",
    "image.demote",
    "image.promote",
    "instance.create",
    "instance.delete",
    "instance.ephemeral-ip.attach",
    "instance.ephemeral-ip.detach",
    "instance.disks.attach",
    "instance.disks.detach",
    "instance.fail"
  ],
  "next_page": "instance.reboot"
}
Request with glob pattern
Request
GET /webhook-events/classes?filter=project.* HTTP/1.1
// (empty body)
Response
HTTP/1.1 200 OK
{
  "items": [
    "project.create",
    "project.delete",
    "project.update"
  ],
  "next_page": null
}
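A client can walk every page by feeding each response's next_page value back as the page_token of the next request, as sketched below. The fetch_page callable is a hypothetical stand-in for an authenticated GET /webhook-events/classes request that returns the parsed JSON response; the other names are likewise illustrative.

```python
def list_all_event_classes(fetch_page, limit=10, glob=None):
    """Collect event classes across all result pages.

    fetch_page(params) is a hypothetical helper that issues
    GET /webhook-events/classes with the given query parameters and
    returns the parsed JSON response body.
    """
    classes = []
    page_token = None
    while True:
        params = {"limit": limit}
        if page_token is not None:
            params["page_token"] = page_token
        if glob is not None:
            params["filter"] = glob
        page = fetch_page(params)
        classes.extend(page["items"])
        page_token = page.get("next_page")
        if page_token is None:
            # A null next_page means there are no more pages.
            return classes
```

The token is treated as opaque: the client passes it back unchanged and does not interpret it.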

Fetch event class metadata (GET /webhook-events/classes/{class_name})

Path Parameters

| Name | Type | Description |
|------|------|-------------|
| class_name | String | The name of the event class to view |

Examples
Request
GET /webhook-events/classes/project.create HTTP/1.1
// (empty body)
Response
HTTP/1.1 200 OK
{
  "name": "project.create",
  "description": "A project was created"
}

Webhook Receivers

List webhook receivers (GET /webhooks)

This endpoint returns a list of configured webhook receivers.

This list is paginated, and may be ordered either by the receivers' names or by their UUIDs.

Query Parameters

| Name | Type | Description |
|------|------|-------------|
| limit | uint32 | The maximum number of webhook receivers to return for this request. |
| page_token | String | The next page token returned by a previous result page (if any). |
| sort_by | String | Supported set of sort modes for scanning by name or ID (either name_ascending, name_descending, or id_ascending). |

Response Body
Response Object

| Name | Type | Description |
|------|------|-------------|
| items | [Webhook] | A list of Webhook objects representing the webhook receivers returned by this request |
| next_page | String | The name or ID (depending on sorting mode) of the first entry on the next page of responses, if any. This should be passed as the page_token query parameter for a subsequent request. If this is null, all webhook receivers have been listed and there are no more response pages. |

Examples
Request
GET /webhooks?limit=10 HTTP/1.1
// (empty body)
Response
HTTP/1.1 200 OK
{
  "items": [
    {
      "id": "8a630640-c5cf-441c-9bdd-163f0323bc29",
      "name": "my-integration",
      "description": "My cool webhook receiver",
      "endpoint": "https://example.company/integration/oxide",
      "secrets": [
        { "id": "2fa26178-e46a-4c1a-84d5-be2bb89d4c43" }
      ],
      "events": [
        "project.create",
        "project.destroy"
      ]
    },
    {
      "id": "81cb3d5d-a270-4b5a-810d-9a91d4ab23e7",
      "name": "alert-o-mat",
      "description": "Integration with some kind of made-up SaaS alerting thingy",
      "endpoint": "https://alertomat.app/example-company/oxide",
      "secrets": [
        { "id": "c0bc6bbf-ff7a-44ec-8d3d-ff6af79476c5" },
        { "id": "972afc29-6a76-4708-b393-c5caf57c018b" }
      ],
      "events": [
        "**.alert"
      ]
    }
  ],
  "next_page": null
}

Create a webhook (POST /webhooks)

This endpoint accepts a JSON body that describes the configuration for the new webhook receiver.

Request Body

The following fields are required:

Required Fields

| Name | Type | Description |
|------|------|-------------|
| name | String | An identifier for this webhook receiver, which must be unique |
| description | String | Human-readable free-form text describing this webhook receiver. |
| endpoint | String | The URL that webhook notification requests should be sent to |
| secrets | [String] | A non-empty list of secret keys that are used to sign payloads |

The following fields are optional:

Optional Fields

| Name | Type | Description |
|------|------|-------------|
| events | [String] | A list of event classes to subscribe to |

Examples
Request
POST /webhooks HTTP/1.1
{
  "name": "my-integration",
  "endpoint": "https://example.company/integration/oxide",
  "description": "My cool webhook receiver",
  "secrets": [
    "my-secret-key"
  ],
  "events": [
    "project.create",
    "project.destroy"
  ]
}
Response
HTTP/1.1 201 Created
{ "id": "8a630640-c5cf-441c-9bdd-163f0323bc29" }
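A client might assemble the creation request body as sketched below, checking the non-empty secrets requirement locally before sending. The function name is hypothetical, and the server remains the authority on validation.

```python
import json


def build_create_request(name, description, endpoint, secrets, events=()):
    """Return the JSON body for POST /webhooks as a string."""
    secrets = list(secrets)
    if not secrets:
        # A webhook configuration must always have at least one secret.
        raise ValueError("at least one secret is required")
    return json.dumps(
        {
            "name": name,
            "description": description,
            "endpoint": endpoint,
            "secrets": secrets,
            # Optional: subscriptions can also be added later with PUT.
            "events": list(events),
        }
    )
```

Passing an empty events list creates a receiver that will not be notified of any events until subscriptions are added with the update endpoint.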

Get the configuration for a webhook (GET /webhooks/{webhook})

Path Parameters

| Name | Type | Description |
|------|------|-------------|
| webhook | Name or UUID | The name or UUID of the webhook to view |

Response Body
Response Object

| Name | Type | Description |
|------|------|-------------|
| id | UUID | The UUID of the webhook receiver |
| name | String | An identifier for this webhook receiver provided upon creation |
| description | String | Human-readable free-form text describing this webhook receiver. |
| endpoint | String | The URL that webhook notification requests are sent to |
| secrets | [Secret] | A non-empty list of Secret objects that are used to sign payloads |
| events | [String] | A list of event classes that this webhook is subscribed to |

Secret Object

| Name | Type | Description |
|------|------|-------------|
| id | String | The ID that identifies this secret relative to other secrets configured for this webhook receiver |

Examples
Request
GET /webhooks/8a630640-c5cf-441c-9bdd-163f0323bc29 HTTP/1.1
// (empty body)
Response
HTTP/1.1 200 OK
{
  "id": "8a630640-c5cf-441c-9bdd-163f0323bc29",
  "name": "my-integration",
  "description": "My cool webhook receiver",
  "endpoint": "https://example.company/integration/oxide",
  "secrets": [
    { "id": "2fa26178-e46a-4c1a-84d5-be2bb89d4c43" }
  ],
  "events": [
    "project.create",
    "project.destroy"
  ]
}

Update a webhook configuration (PUT /webhooks/{webhook})

This endpoint accepts a JSON body that describes the updated configuration for the webhook receiver.

The provided fields will replace the previous configuration set by the webhook creation endpoint or by a previous call to this endpoint. To update some fields while leaving others in place, it is recommended to first use the webhook view endpoint to retrieve the current configuration, modify the desired fields, and then use this endpoint to update any changed configuration fields.

Note

Webhook secrets are not updated using this endpoint.

To add or remove secrets used to sign payloads, use the POST /webhooks/{webhook}/secrets and DELETE /webhooks/{webhook}/secrets/{secret_id} endpoints, respectively.
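The read-modify-write pattern recommended above can be sketched as follows. The function takes the object returned by the webhook view endpoint and produces a PUT body containing only the fields this endpoint accepts; the names are hypothetical.

```python
# Fields accepted by PUT /webhooks/{webhook}, per the Required Fields table.
UPDATABLE_FIELDS = ("name", "description", "endpoint", "events")


def build_update_body(current: dict, **changes) -> dict:
    """Merge changes into the current configuration for use as a PUT body."""
    unknown = set(changes) - set(UPDATABLE_FIELDS)
    if unknown:
        # For example, secrets are managed through the secrets endpoints.
        raise ValueError(f"not updatable via PUT: {sorted(unknown)}")
    body = {field: current[field] for field in UPDATABLE_FIELDS}
    body.update(changes)
    return body
```

Starting from the current configuration ensures that fields the caller did not intend to change are sent back with their existing values rather than being overwritten.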

Path Parameters

| Name | Type | Description |
|------|------|-------------|
| webhook | Name or UUID | The name or UUID of the webhook to update |

Request Body

The following fields are required:

Required Fields

| Name | Type | Description |
|------|------|-------------|
| name | String | An identifier for this webhook receiver, which must be unique. |
| description | String | Human-readable free-form text describing this webhook receiver. |
| endpoint | String | The URL that webhook notification requests should be sent to |
| events | [String] | A list of event classes to subscribe to |

Response Body

This endpoint returns a Webhook object describing the properties of the webhook receiver after updating it.

Examples
Request
PUT /webhooks/8a630640-c5cf-441c-9bdd-163f0323bc29 HTTP/1.1
{
  "id": "8a630640-c5cf-441c-9bdd-163f0323bc29",
  "name": "my-integration",
  "description": "My cool webhook receiver",
  "endpoint": "https://example.company/integration/oxide",
  "events": [
    "project.create",
    "project.destroy"
  ]
}
Response
HTTP/1.1 200 OK
{
  "id": "8a630640-c5cf-441c-9bdd-163f0323bc29",
  "name": "my-integration",
  "description": "My cool webhook receiver",
  "endpoint": "https://example.company/integration/oxide",
  "secrets": [
    { "id": "2fa26178-e46a-4c1a-84d5-be2bb89d4c43" }
  ],
  "events": [
    "project.create",
    "project.destroy"
  ]
}

Delete a webhook configuration (DELETE /webhooks/{webhook})

Path Parameters

| Name | Type | Description |
|------|------|-------------|
| webhook | Name or UUID | The name or UUID of the webhook to delete |

Examples
Request
DELETE /webhooks/8a630640-c5cf-441c-9bdd-163f0323bc29 HTTP/1.1
// (empty body)
Response
HTTP/1.1 200 OK
{
  "id": "8a630640-c5cf-441c-9bdd-163f0323bc29"
}

Send a liveness probe to a receiver (POST /webhooks/{webhook}/probe)

Calling this endpoint will send a liveness probe event to the receiver with the provided name or UUID.

A liveness probe is a synthesized event used to test whether a receiver is capable of receiving events. Liveness probe requests are similar to the delivery requests sent for actual events, but the event class will be probe, and the event payload’s data field is empty.

The HTTP status code of the response is the status code returned by the receiver endpoint. The response body contains a DeliveryAttempt object describing the probe request’s outcome.

Path Parameters

| Name | Type | Description |
|------|------|-------------|
| webhook | Name or UUID | The name or UUID of the webhook to send a probe request to. |

Query Parameters

| Name | Type | Description |
|------|------|-------------|
| resend | bool | If true, any events which failed to be delivered to this receiver will be resent if the probe request succeeds. |

Response Body
Response Object

| Name | Type | Description |
|------|------|-------------|
| probe | DeliveryAttempt | A DeliveryAttempt object describing the outcome of the probe request. |

Examples
Request
POST /webhooks/8a630640-c5cf-441c-9bdd-163f0323bc29/probe HTTP/1.1
// (empty body)
Response
HTTP/1.1 200 OK
{
  "probe": {
    "id": "f371c094-e389-49b8-9a7a-fd15bfe29709",
    "webhook_id": "8a630640-c5cf-441c-9bdd-163f0323bc29",
    "event_class": "probe",
    "event_id": "3aad7c56-d741-4a08-ac0a-02c5c6282757",
    "state": "delivered",
    "sent_at": "2023-01-01T20:00:00Z",
    "trigger": "probe",
    "response": {
      "status": 200,
      "response_time_ms": 500
    }
  }
}

Secrets

Get the IDs of the secrets for a webhook (GET /webhooks/{webhook}/secrets)

Path Parameters

| Name | Type | Description |
|------|------|-------------|
| webhook | Name or UUID | The name or UUID of the webhook to list the secret IDs of |

Response Body

| Name | Type | Description |
|------|------|-------------|
| secrets | [Secret] | A list of Secret objects containing the IDs of all secrets currently assigned to this webhook receiver. |

Examples
Request
GET /webhooks/8a630640-c5cf-441c-9bdd-163f0323bc29/secrets HTTP/1.1
// (empty body)
Response
HTTP/1.1 200 OK
{
  "secrets": [
    { "id": "c06487fd-6636-435a-8ade-915b3c3b7ff5" }
  ]
}

Add a secret to a webhook (POST /webhooks/{webhook}/secrets)

Path Parameters

| Name | Type | Description |
|------|------|-------------|
| webhook | Name or UUID | The name or UUID of the webhook to add the secret to |

Request Body

The following fields are required:

Required Fields

| Name | Type | Description |
|------|------|-------------|
| secret | String | The value of the shared secret to add to the webhook receiver. |

Response Body

This endpoint returns a Secret object containing the ID of the new secret.

Examples
Request
POST /webhooks/8a630640-c5cf-441c-9bdd-163f0323bc29/secrets HTTP/1.1
{ "secret": "secret-value" }
Response
HTTP/1.1 201 Created
{ "id": "c06487fd-6636-435a-8ade-915b3c3b7ff5" }

Delete a secret from a webhook (DELETE /webhooks/{webhook}/secrets/{secret_id})

Path Parameters

| Name | Type | Description |
|------|------|-------------|
| webhook | Name or UUID | The name or UUID of the webhook the secret is associated with |
| secret_id | UUID | The ID of the secret to delete |

Examples
Request
DELETE /webhooks/8a630640-c5cf-441c-9bdd-163f0323bc29/secrets/c06487fd-6636-435a-8ade-915b3c3b7ff5 HTTP/1.1
// (empty body)
Response
HTTP/1.1 200 OK
{ "id": "c06487fd-6636-435a-8ade-915b3c3b7ff5" }

Event Delivery Attempts

List attempted deliveries (GET /webhooks/{webhook}/deliveries)

This endpoint returns a list of delivery attempts sent to a webhook receiver.

In addition to debugging and monitoring purposes, this endpoint can be used by a receiver implementation to check for failed delivery attempts and resend them as needed. See "resending failed deliveries" for details.

Path Parameters
NameTypeDescription

webhook

Name or UUID

The name or UUID of the webhookto view delivery attempts for

Query Parameters
Name | Type | Description
limit | uint32 | The maximum number of delivery attempts to return for this request.
page_token | String | The UUID of the last delivery attempt returned in the previous page of results.
failed | bool | Whether to include failed delivery attempts. If true, delivery attempts in failed states are included in the results; if false, they are excluded. Default: true
pending | bool | Whether to include pending delivery attempts. If true, delivery attempts that are pending are included in the results; if false, they are excluded. Default: true
delivered | bool | Whether to include delivery attempts that were delivered successfully. If true, successful deliveries are included in the results; if false, they are excluded. Default: true

The failed, pending, and delivered query parameters can be combined to filter which delivery attempts are included in the results. For example, the query string ?failed=true&pending=false&delivered=false will return only delivery attempts that have failed, excluding those which are pending or have been delivered successfully.
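As an illustration of how these filters combine, the following sketch builds the request path for this endpoint. The function name `deliveries_path` is hypothetical, and the lowercase `true`/`false` encoding is taken from the query string shown above.

```python
from urllib.parse import quote, urlencode


def deliveries_path(webhook, *, failed=True, pending=True, delivered=True,
                    limit=None, page_token=None):
    """Build the path and query string for GET /webhooks/{webhook}/deliveries."""
    # Booleans are written as lowercase "true"/"false", matching the example
    # query string in the text.
    params = {
        "failed": str(failed).lower(),
        "pending": str(pending).lower(),
        "delivered": str(delivered).lower(),
    }
    if limit is not None:
        params["limit"] = str(limit)
    if page_token is not None:
        params["page_token"] = page_token
    return f"/webhooks/{quote(webhook, safe='')}/deliveries?{urlencode(params)}"
```

Calling `deliveries_path("my-hook", pending=False, delivered=False)` yields a path that lists only failed delivery attempts.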

Response Body
Response Object
Name | Type | Description
items | [DeliveryAttempt] | A list of delivery attempts matching the query.
next_page | Uuid | The UUID of the first delivery attempt on the page of results, if any. If this field is null, there are no more results to fetch.

DeliveryAttempt object
Name | Type | Description
id | Uuid | The UUID of this delivery attempt.
webhook_id | Uuid | The UUID of the webhook receiver to which the event was delivered.
event_class | String | The event class of the event that was delivered.
event_id | Uuid | The event UUID of the event that was delivered.
state | State | A State enum describing the state of this delivery attempt.
sent_at | String | The timestamp at which the HTTP request was sent, or null if the state field is "pending".
trigger | Trigger | A Trigger enum describing why this delivery attempt occurred.
response | Response | A Response object describing the HTTP response from the receiver endpoint, or null if the state field is "pending", "failed_unreachable", or "failed_timeout".

State enum
Value | Description
"pending" | This delivery attempt has not yet sent a request to the receiver, or is awaiting a response.
"delivered" | A delivery request was sent to the receiver and a successful response was received.
"failed_unreachable" | This delivery attempt failed because a TCP connection to the receiver endpoint could not be established within a 30-second timeout.
"failed_timeout" | This delivery attempt failed because a response to the HTTP request was not received within a 30-second timeout.
"failed_http_error" | This delivery attempt failed because the receiver responded with an HTTP error status code.

Trigger enum
Value | Description
"event" | This delivery was triggered by a new event being published.
"resend" | This delivery was triggered by a liveness probe with ?resend=true succeeding, or by a call to the POST /webhooks/{webhook}/deliveries/{event_id}/resend API endpoint.
"probe" | This delivery is a liveness probe.

Response object
Name | Type | Description
status | u16 | The HTTP status code of the response.
response_time_ms | u64 | The time elapsed between when the request was sent and the response was received, in a whole number of milliseconds (ms).
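A receiver or monitoring tool that consumes this endpoint might model the objects above roughly as follows. This is a minimal sketch: the class and function names are illustrative, and the `is_failed` helper is an addition that groups the three `failed_*` states from the State enum.

```python
from dataclasses import dataclass
from typing import Optional

# The three failure states listed in the State enum above.
FAILED_STATES = frozenset(
    {"failed_unreachable", "failed_timeout", "failed_http_error"}
)


@dataclass(frozen=True)
class Response:
    status: int
    response_time_ms: int


@dataclass(frozen=True)
class DeliveryAttempt:
    id: str
    webhook_id: str
    event_class: str
    event_id: str
    state: str
    sent_at: Optional[str]  # null while the attempt is "pending"
    trigger: str
    response: Optional[Response]  # null if no HTTP response was received

    @property
    def is_failed(self) -> bool:
        return self.state in FAILED_STATES


def parse_delivery_attempt(obj: dict) -> DeliveryAttempt:
    """Convert one decoded JSON item from "items" into a DeliveryAttempt."""
    raw = obj["response"]
    response = None
    if raw is not None:
        response = Response(raw["status"], raw["response_time_ms"])
    return DeliveryAttempt(
        id=obj["id"],
        webhook_id=obj["webhook_id"],
        event_class=obj["event_class"],
        event_id=obj["event_id"],
        state=obj["state"],
        sent_at=obj["sent_at"],
        trigger=obj["trigger"],
        response=response,
    )
```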

Examples
Request
GET /webhooks/8a630640-c5cf-441c-9bdd-163f0323bc29/deliveries HTTP/1.1
// (empty body)
Response
HTTP/1.1 200 OK
{
  "items": [
    {
      "id": "a81d3c93-61b6-4f5f-bf21-7c3a659c5654",
      "webhook_id": "8a630640-c5cf-441c-9bdd-163f0323bc29",
      "event_class": "project.create",
      "event_id": "4dce8979-a97a-482f-a144-21361994c653",
      "state": "pending",
      "sent_at": null,
      "trigger": "event",
      "response": null
    },
    {
      "id": "f371c094-e389-49b8-9a7a-fd15bfe29709",
      "webhook_id": "8a630640-c5cf-441c-9bdd-163f0323bc29",
      "event_class": "project.create",
      "event_id": "3aad7c56-d741-4a08-ac0a-02c5c6282757",
      "state": "delivered",
      "sent_at": "2023-01-01T20:00:00Z",
      "trigger": "resend",
      "response": {
        "status": 200,
        "response_time_ms": 500
      }
    },
    {
      "id": "648cecf7-66e3-4791-9fbf-70745c30a350",
      "webhook_id": "8a630640-c5cf-441c-9bdd-163f0323bc29",
      "event_class": "project.create",
      "event_id": "3aad7c56-d741-4a08-ac0a-02c5c6282757",
      "state": "failed_unreachable",
      "sent_at": "2023-01-01T01:00:00Z",
      "trigger": "event",
      "response": null
    },
    {
      "id": "e0532ce7-ddce-41eb-b2d8-0562ad677f04",
      "webhook_id": "8a630640-c5cf-441c-9bdd-163f0323bc29",
      "event_class": "instance.delete",
      "event_id": "1f34a5ec-94e2-4115-a753-4dbbfc750860",
      "sent_at": "2023-01-01T00:00:00Z",
      "state": "delivered",
      "trigger": "event",
      "response": {
        "status": 200,
        "response_time_ms": 300
      }
    }
  ],
  "next_page": null
}
1. id: A UUID uniquely identifying this delivery of the event.
2. webhook_id: The UUID of the webhook receiver to which the event is delivered.
3. event_id: The event UUID of the event that generated this webhook notification. If an event is delivered multiple times, the event ID will be the same for each delivery.
4. First item ("state": "pending"): This webhook event has not yet been sent to the receiver endpoint.
5. Second item ("trigger": "resend"): This delivery was triggered by resending a previous failed delivery for this event.
6. Third item ("state": "failed_unreachable"): This delivery failed because the receiver endpoint was unreachable.
7. Fourth item ("state": "delivered", "trigger": "event"): This was a successful initial delivery attempt.
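When the list of attempts spans several pages, a client has to walk them in order. The sketch below is one way to do so under stated assumptions: it passes the ID of the last attempt on each page as `page_token` (following the description of that parameter) and stops when `next_page` is null. The `fetch_page` callable is hypothetical and stands in for whatever HTTP client performs the GET request and decodes the JSON body.

```python
def list_all_deliveries(fetch_page, limit=100):
    """Collect every delivery attempt by following pages until next_page is null.

    fetch_page(limit=..., page_token=...) is a caller-supplied (hypothetical)
    function returning the decoded response body: a dict with "items" and
    "next_page" keys.
    """
    attempts = []
    page_token = None
    while True:
        page = fetch_page(limit=limit, page_token=page_token)
        attempts.extend(page["items"])
        # A null next_page means there are no more results to fetch. The
        # empty-items check guards against looping forever on an empty page.
        if page.get("next_page") is None or not page["items"]:
            return attempts
        # Assumption: the token is the UUID of the last attempt just returned.
        page_token = page["items"][-1]["id"]
```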

Request re-delivery of an event (POST /webhooks/{webhook}/deliveries/{event_id}/resend)

This endpoint requests that an event previously dispatched to a webhook receiver be delivered again.

If an event with the provided event UUID has previously been delivered to this webhook receiver, whether successfully or not, the dispatcher will attempt to deliver it again. A new delivery attempt is created, and the UUID of that delivery attempt is returned in the response.

Path Parameters
Name | Type | Description
webhook | Name or UUID | The name or UUID of the webhook to re-deliver the event to
event_id | Uuid | The event UUID of the event that should be re-delivered

Examples
Request
POST /webhooks/8a630640-c5cf-441c-9bdd-163f0323bc29/deliveries/3aad7c56-d741-4a08-ac0a-02c5c6282757/resend HTTP/1.1
// (empty body)
Response
HTTP/1.1 201 Created
{
  "delivery_id": "f371c094-e389-49b8-9a7a-fd15bfe29709"
}

Appendix A: Implementing a Reliable Webhook Receiver

When webhooks are used to notify external systems of high-priority events, such as faults, it is important to ensure that event notifications are delivered reliably. Missing a webhook event for an active problem means that an alerting system could fail to generate an alert, resulting in operators not being notified of the fault. The Oxide webhook API provides mechanisms to ensure that webhook delivery is as reliable as possible, but receiver endpoint implementations must take care to use these mechanisms correctly. This appendix suggests some steps receiver implementations should take to avoid missing webhook events.

Resending Failed Deliveries

If a receiver endpoint is unavailable, events dispatched to that receiver may not be received. The webhook dispatcher will retry failed delivery attempts up to two times, with a one-minute backoff before the first retry and a five-minute backoff before the second retry. However, if a receiver is unavailable for long enough to miss both retry attempts, the webhook dispatcher will not attempt to deliver that request again unless explicitly asked to by the receiver.
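To make the retry window concrete, the following sketch computes when the three attempts for a single event would be sent. It assumes each backoff is measured from the time the previous attempt was sent and ignores the time spent waiting for a connection or response; the text above does not specify this, so the real window may be somewhat longer.

```python
from datetime import datetime, timedelta, timezone

# Backoffs from the text: one minute before the first retry, five minutes
# before the second retry.
RETRY_BACKOFFS = (timedelta(minutes=1), timedelta(minutes=5))


def attempt_times(first_attempt):
    """Return the send times of the initial attempt and both retries.

    Assumption: each backoff starts when the previous attempt was sent.
    """
    times = [first_attempt]
    for backoff in RETRY_BACKOFFS:
        times.append(times[-1] + backoff)
    return times


start = datetime(2023, 1, 1, 0, 0, tzinfo=timezone.utc)
schedule = attempt_times(start)
```

Under these assumptions, a receiver that is down for more than about six minutes after an event is first dispatched misses all three attempts for that event.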

Therefore, it is recommended that receiver endpoints request that any failed deliveries be resent when they start up. This way, if the software implementing a receiver crashes and is restarted, it can trigger a new delivery attempt for any events that it may have missed due to the crash. The simplest way to do so is for the receiver to call the POST /webhooks/{webhook}/probe API endpoint with the ?resend=true query parameter to trigger a liveness probe request to itself. When such a probe request succeeds, the webhook dispatcher will then resend all events which have not been delivered successfully to that receiver.
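A receiver's startup routine might construct the probe request as in the sketch below. The base URL, the bearer-token `Authorization` header, and the function name are assumptions made for illustration; only the endpoint path and the `resend=true` query parameter come from this RFD.

```python
import urllib.request


def build_probe_request(base_url, webhook, token, resend=True):
    """Build (but do not send) a POST /webhooks/{webhook}/probe request."""
    url = f"{base_url}/webhooks/{webhook}/probe"
    if resend:
        # Ask the dispatcher to resend undelivered events if the probe succeeds.
        url += "?resend=true"
    return urllib.request.Request(
        url,
        method="POST",
        # Assumption: the API is authenticated with a bearer token.
        headers={"Authorization": f"Bearer {token}"},
    )


request = build_probe_request("https://oxide.example", "my-hook", "example-token")
```

The request can then be sent with `urllib.request.urlopen(request)`. Because events are resent only when the probe succeeds, the receiver should send it after its own HTTP listener has begun accepting connections.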

Alternatively, if more precise control over which events are resent is required, the GET /webhooks/{webhook}/deliveries API endpoint provides a list of event delivery attempts to a webhook receiver. To list only failed delivery attempts, the query parameters ?failed=true&pending=false&delivered=false can be added to requests to this endpoint. The receiver may then request re-delivery of specific events using the POST /webhooks/{webhook}/deliveries/{event_id}/resend API endpoint. This is more complex than triggering probe requests, but it may be useful in situations where the receiver only wishes to resend a subset of failed events. For example, it may only request re-delivery of events that occurred within a particular time window, or it may only resend events that were not processed by a redundant receiver subscribed to the same event classes.
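The selective approach could look something like the following sketch, which picks the events to resend from a list of decoded delivery attempts. It uses each attempt's `sent_at` timestamp as a stand-in for when the event occurred, since delivery attempts do not carry an event timestamp, and it skips events that already have a successful delivery in the list. The function names are illustrative.

```python
from datetime import datetime, timezone


def parse_timestamp(value):
    """Parse a timestamp such as "2023-01-01T01:00:00Z" into an aware datetime."""
    return datetime.fromisoformat(value.replace("Z", "+00:00")).astimezone(timezone.utc)


def resend_paths(webhook, attempts, since, until):
    """Return resend endpoint paths for events with a failed attempt sent in
    [since, until) and no successful delivery among the given attempts."""
    delivered = {a["event_id"] for a in attempts if a["state"] == "delivered"}
    selected = set()
    paths = []
    for attempt in attempts:
        event_id = attempt["event_id"]
        if not attempt["state"].startswith("failed_"):
            continue
        if event_id in delivered or event_id in selected:
            continue  # already delivered, or already chosen for resend
        if since <= parse_timestamp(attempt["sent_at"]) < until:
            selected.add(event_id)
            paths.append(f"/webhooks/{webhook}/deliveries/{event_id}/resend")
    return paths
```

Each returned path would then be sent as a POST request to request re-delivery of that event.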

In addition to checking for failed delivery attempts on startup, receiver implementations may periodically attempt to resend failed deliveries while they are running. Even if a receiver endpoint has not crashed, event deliveries may have been missed due to network connectivity problems or other transient issues.

Monitoring Receiver Availability

Liveness probes may be used to monitor the health of a receiver endpoint. An external system which periodically triggers liveness probe requests to a receiver using the POST /webhooks/{webhook}/probe API endpoint can detect situations where the receiver is unavailable and alert operators to the outage. This allows receiver failures to be detected and resolved proactively, reducing the likelihood of missing notifications for important events.

Important
It is strongly recommended that the POST /webhooks/{webhook}/probe API endpoint be used as the primary mechanism for monitoring a receiver endpoint’s availability, rather than a request sent by the monitoring system directly to the receiver.

While sending requests directly to the receiver process can detect failures where the receiver is completely offline, it does not exercise communication between the Oxide control plane and the receiver. Outages may still occur when the receiver endpoint is online and capable of processing requests if it is not reachable by the Oxide rack’s webhook dispatcher. Calling the POST /webhooks/{webhook}/probe API endpoint requests that the webhook dispatcher send a request to the receiver, which will fail if the Oxide control plane cannot communicate with the receiver.

It may be valuable for a monitoring system to both trigger liveness probe requests from the webhook dispatcher using the POST /webhooks/{webhook}/probe endpoint and to send its own probes directly to the webhook receiver. This provides separate signals for whether the receiver endpoint is running at all, and for whether it is reachable by the Oxide control plane. Operators can then reason about whether an outage is due to the webhook receiver process not running at all, or due to a network partition between it and the webhook dispatcher.
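The way a monitoring system might combine the two signals can be sketched as a small decision function. The function name and the result labels are illustrative only.

```python
def diagnose(dispatcher_probe_ok, direct_probe_ok):
    """Classify receiver health from two independent probe results.

    dispatcher_probe_ok: result of POST /webhooks/{webhook}/probe.
    direct_probe_ok: result of the monitoring system's own request to the receiver.
    """
    if dispatcher_probe_ok and direct_probe_ok:
        return "healthy"
    if dispatcher_probe_ok:
        # The rack can reach the receiver, so deliveries work; only the
        # monitoring system's own path to the receiver is affected.
        return "monitor_path_degraded"
    if direct_probe_ok:
        # The receiver is running, but the webhook dispatcher cannot reach it.
        return "unreachable_from_dispatcher"
    return "receiver_down"
```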

Redundant Receiver Endpoints

If a receiver endpoint is unavailable, webhook events may be missed. Therefore, when webhooks are used to provide notifications of critical events such as alerts, operators are encouraged to run multiple independent webhook receiver endpoints subscribed to the same event classes. When multiple receiver endpoints are subscribed to the same event classes, the same events will be delivered to all receivers. Therefore, event UUIDs should be used to de-duplicate events that are delivered to multiple receivers: the same event UUID is used for each receiver that receives the event.
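De-duplication by event UUID can be sketched as follows. The in-memory set here is a stand-in: redundant receivers normally run as separate processes, so a real deployment would need a store shared by all of them (for example, a database table with a unique constraint on the event UUID). The class name is illustrative.

```python
class EventDeduplicator:
    """Tracks which event UUIDs have already been processed."""

    def __init__(self):
        # Stand-in for a store shared between redundant receivers.
        self._seen = set()

    def first_delivery(self, event_id):
        """Return True the first time an event UUID is seen, False afterwards."""
        if event_id in self._seen:
            return False
        self._seen.add(event_id)
        return True


dedup = EventDeduplicator()
results = [
    dedup.first_delivery("3aad7c56-d741-4a08-ac0a-02c5c6282757"),
    dedup.first_delivery("3aad7c56-d741-4a08-ac0a-02c5c6282757"),
    dedup.first_delivery("1f34a5ec-94e2-4115-a753-4dbbfc750860"),
]
```

A receiver would process an event only when `first_delivery` returns True, which also makes resent deliveries of an already-processed event harmless.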

Note

Redundant receiver endpoints and periodic checks for failed deliveries (as described in Resending Failed Deliveries) are orthogonal techniques for protecting against missed events due to receiver downtime. Depending on the receiver’s reliability requirements, it may be preferable to use one or both of these mechanisms.

Tolerance for delays in event delivery due to receiver downtime is an important tradeoff to consider when selecting these mechanisms. When only a single receiver endpoint is used, checking for failed deliveries when that receiver starts up will ensure that deliveries missed during receiver downtime are eventually processed. However, during the period of time when the receiver was unavailable, no events will have been processed. In contrast, if there are multiple redundant receivers, events will still be received immediately as long as at least one receiver is available. Therefore, if receiving events in a timely manner is important (e.g. for high-urgency fault-management alerts), consider the use of redundant receivers.

Nonetheless, there are some failure scenarios in which even redundant receivers could miss events. An outage may impact all redundant receivers if they are running in the same failure domain, or if a network partition affects all communication to and from the Oxide rack. Therefore, even when redundant receiver endpoints are used, checking for failed deliveries periodically and on startup is still recommended if these classes of failures are a concern.

Zero-Downtime Secret Rotation

The Oxide webhook API permits multiple shared secrets to be configured for a single webhook receiver. If a receiver is configured with more than one secret, webhook requests will include an x-oxide-signature header for each secret. The value of these headers includes the secret’s ID in addition to the HMAC signature of the payload, allowing the receiver to determine which secret was used to generate that signature.

Multiple secrets may be used during secret key rotations in order to ensure that the webhook payload is always signed with a secret that can be verified by the receiver. When rotating secrets for a webhook receiver, first create the new secret using the POST /webhooks/{webhook}/secrets API endpoint, which returns the ID assigned to that secret. Then, the webhook receiver may be configured to verify signatures with the new secret. During this time, any webhook payload will be signed with both the new secret and the old secret, allowing it to be verified by the receiver regardless of whether it has yet been configured to accept the new secret. Finally, once the receiver is ready to accept the new secret, the old secret may be deleted using the DELETE /webhooks/{webhook}/secrets/{secret_id} API endpoint.
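On the receiver side, tolerating a rotation amounts to accepting a payload if any signature made with a secret the receiver knows is valid. The sketch below assumes the `x-oxide-signature` headers have already been parsed into `(secret_id, hex_signature)` pairs and that signatures are hex-encoded HMAC-SHA256 digests of the request body; the exact header format and hash algorithm are specified elsewhere in this RFD, so both should be treated as placeholders here.

```python
import hashlib
import hmac


def verify_payload(body, signatures, known_secrets):
    """Return True if any signature made with a known secret matches the body.

    body: raw request body as bytes.
    signatures: iterable of (secret_id, hex_signature) pairs, assumed to be
        already parsed from the x-oxide-signature headers.
    known_secrets: dict mapping secret ID to secret value (bytes) for the
        secrets this receiver is currently configured with.
    """
    for secret_id, signature_hex in signatures:
        key = known_secrets.get(secret_id)
        if key is None:
            # Signed with a secret this receiver does not know (for example,
            # a new secret it has not yet been configured with).
            continue
        # Assumption: HMAC-SHA256, hex-encoded.
        expected = hmac.new(key, body, hashlib.sha256).hexdigest()
        if hmac.compare_digest(expected, signature_hex):
            return True
    return False
```

During a rotation, a receiver that knows only the old secret and one that knows only the new secret both accept the same request, because the request carries a signature for each.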

Appendix B: Future Work

This section discusses potential future additions to the webhook API which are considered out of scope for the MVP implementation.

Caution
Everything discussed in this section is subject to change and may not be implemented as described.

Rate Limiting

At present, webhooks are used to deliver relatively low-volume events (e.g. user alerts). In the future, as webhook events are emitted by additional subsystems, it may become necessary to implement mechanisms for rate-limiting the delivery of webhook events, in order to reduce the impact of webhook delivery on other control plane workloads. However, this is deemed out of scope for the MVP implementation.

Long-Polling

An alternative interface based on HTTP long-polling [RFC6202] could be provided for event delivery.

In a long-polling-based interface, the webhook receiver would act as a client of an endpoint served by the Oxide control plane, rather than the Oxide control plane’s webhook dispatcher initiating requests to a receiver endpoint. When such a request is received by the control plane, the request is kept open until events not yet seen by the client are published to that receiver. When events are published, the Oxide control plane would complete the request with a response representing those events. The client would then send a subsequent request, indicating which events it has successfully received, and the process begins again. The acknowledgement of received events could be implemented via parameters on the long-polling request, or through a separate acknowledgement endpoint.

Unlike the traditional webhook notification mechanism described in this RFD, a long-polling mechanism has the webhook receiver acting as a client of the Oxide control plane. This removes the need to serve a webhook receiver endpoint and ensure it is accessible to the webhook delivery client. It also avoids the need to track receiver liveness in the webhook dispatcher, as the receiver is responsible for initiating all communication between the control plane and the receiver. However, the traditional request-based approach is more commonly used for webhook delivery, so the MVP implementation will focus on that mechanism rather than long-polling. In the future, long-polling may be added as an alternative interface for webhook delivery.

Role-Based Payload Filtering

Currently, all webhook receivers are created by a user with fleet admin permissions, and operate with fleet viewer permissions. In the future, we would like to allow webhook receivers to operate with more restrictive permissions, and control what data is included in webhook payloads based on the webhook receiver’s roles. This would permit users with more restrictive permissions to create webhook receivers that only receive data that they are allowed to view.

For a webhook to receive an event, its identity must have permission to access the resource that the event relates to.

A webhook’s RBAC identity must have the viewer role on a resource for the webhook dispatcher to send it an event relating to that resource. Similar to subscriptions, if a webhook does not have any permissions configured, then it will not be sent any events. Roles also limit the data that a webhook is sent.

Examples

A webhook is subscribed to project.delete and has the silo.viewer role. When a project is deleted in the silo the webhook is a resource of, the webhook will be sent a payload that describes the silo and the project:

Payload with silo.viewer role
{
  "event_class": "project.delete",
  "data": {
    "silo": { ... },
    "project": { ... }
  },
  ...
}

If the webhook’s RBAC identity was instead given only the project.viewer role for the project in question, then the silo data would be omitted from the payload sent by the dispatcher:

Payload with project.viewer role
{
  "event_class": "project.delete",
  "data": {
    "project": { ... }
  },
  ...
}

Finally, if the webhook’s RBAC identity was not given any role, then the dispatcher would not send this event to the webhook receiver at all.
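The three cases above can be summarized in a short sketch. Everything here is hypothetical, since this feature is not part of the MVP: the role-to-field mapping simply encodes the two example payloads, and the function name is illustrative.

```python
# Hypothetical mapping from role to the payload fields it permits, taken
# from the two example payloads above.
ROLE_VISIBLE_FIELDS = {
    "silo.viewer": {"silo", "project"},
    "project.viewer": {"project"},
}


def filter_payload(event, roles):
    """Return the event with only the data visible to the given roles,
    or None if the event should not be sent at all."""
    visible = set()
    for role in roles:
        visible |= ROLE_VISIBLE_FIELDS.get(role, set())
    if not visible:
        return None  # no role on the resource: the event is not delivered
    data = {key: value for key, value in event["data"].items() if key in visible}
    return {**event, "data": data}
```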

Receiver Liveness

Delivery of webhook notifications cannot be guaranteed if the receiver endpoint is offline. Continuing to attempt delivery to an unreachable receiver can be costly. Therefore, it may be desirable for the webhook dispatcher to not attempt to deliver events to a receiver that has been unreachable for an extended period of time.

If delivery of an event fails permanently for a receiver (i.e., all retry attempts for that event are exhausted), the receiver could be marked as failed. Receivers could also be marked as failed when a liveness probe request fails. When a receiver is marked as failed, the dispatcher will not attempt to deliver events to that receiver until it is marked as available again. This avoids exhausting the retry budget for events while the receiver is unavailable, which would prevent those events from being delivered successfully.

Liveness probes are still sent to receivers that have been marked as failed. Once a liveness probe request to a failed receiver has succeeded, the receiver is marked as available again, and delivery of events will resume. We may also wish to provide an API endpoint to manually clear the failed status of a receiver.
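The proposed behavior amounts to a two-state machine per receiver. The sketch below is purely illustrative of that proposal; the class and method names are hypothetical and do not describe an existing implementation.

```python
class ReceiverHealth:
    """Tracks whether a receiver is currently marked as failed."""

    def __init__(self):
        self.failed = False

    def on_retries_exhausted(self):
        # All retry attempts for some event have failed permanently.
        self.failed = True

    def on_probe_result(self, succeeded):
        # A failed probe marks the receiver as failed; a successful probe
        # marks it as available again.
        self.failed = not succeeded

    def clear(self):
        # Manual reset, corresponding to the possible API endpoint for
        # clearing the failed status.
        self.failed = False

    def should_deliver(self):
        # Event delivery is paused while the receiver is marked as failed.
        return not self.failed
```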

Once receiver health is tracked by the fault management system, an alert could be generated when a receiver enters the failed state. Of course, a failed receiver endpoint cannot receive an alert indicating its own unavailability. However, if redundant receiver endpoints exist, other replicas could receive an alert indicating that one of their compatriots has gone offline.

Automated Liveness Probes

As part of a system for tracking receiver liveness, it may be desirable for liveness probes to be sent automatically by the webhook dispatcher, as well as being triggered externally by the POST /webhooks/{webhook}/probe endpoint. When a receiver is marked as failed, probes could be sent periodically to determine if it has become available again, clearing the failed state and allowing event delivery to resume.

Additionally, automated periodic liveness probes could be used to proactively detect receiver outages, rather than waiting for an actual event delivery to fail. This could be useful for receivers that are subscribed to important but infrequent events, such as fault management alerts. If probes are sent periodically, a receiver outage could be detected before any fault that generates an alert occurs, allowing operators to resolve the receiver failure before an urgent alert is missed.

External References