Webhooks provide a mechanism for the Oxide rack to notify external systems when events occur. In particular, notifications for fault management alerts ([rfd307]) are delivered via webhooks. This RFD describes the API used by external systems to subscribe to webhooks from the Oxide control plane, and how webhooks are delivered to receivers.
In addition, Appendix A contains guidance on how to use the mechanisms provided by the Oxide webhook API to implement a fault-tolerant, reliable webhook receiver.
The design of the internal implementation of the webhook system is described in RFD 364 Webhook Implementation.
Overview
Webhook notifications represent events that occur in the Oxide rack. Events include alerts generated by the rack’s fault management subsystem or by user-defined alerting rules, as well as audit log events for user and operator actions. When an event occurs, notifications are delivered as HTTP POST requests sent to a particular URL. An HTTP endpoint to which these requests are sent, and configuration associated with it, is referred to as a webhook receiver. Finally, the internal component of the Oxide control plane that is responsible for delivering webhook notifications to receivers is referred to as the webhook dispatcher.
Events
Events are the central concept of the webhook system’s data model. The data payloads delivered to receivers represent events recorded by the rack’s control plane. Receivers subscribe to events by specifying the event classes they wish to be notified of. The same event may result in notifications being sent to multiple receivers.
See here for a description of the format of webhook delivery requests.
Event Classes
Events are categorized into event classes.
These classes indicate the resource scope that an event relates to, and what event occurred.
An event class is a string consisting of one or more segments, with multiple segments separated by "." characters.
Each segment describes the category of the event with increasing specificity, with the first segment being the most general.
These classes form a hierarchy of event types.
Webhook receivers subscribe to one or more event classes to determine what events they will be notified for.
Top-level categories in the event class hierarchy will represent broad categories of entities, followed by increasingly specific subcategories.
Globbing
A receiver’s event subscriptions may include simple globs to subscribe to multiple categories of events.
Globbing is performed on a per-segment basis.
The "*" character in an event class will match any single segment at that position in the event class string, while "**" will match any number of segments.
For example, a webhook that subscribes to "instance.*" will receive notifications for "instance.create", "instance.delete", "instance.start", and any other event class that begins with instance.
Similarly, a webhook that subscribes to "**.delete" will receive notifications for "project.delete", "instance.delete", and any other event class that ends with delete.
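The per-segment matching rules above can be sketched in a few lines of Python. This is an illustrative sketch rather than the control plane's implementation: the function name glob_matches is hypothetical, and it assumes that "**" may match zero or more segments while "*" matches exactly one segment, so "instance.*" is assumed not to match deeper classes such as "instance.disks.attach".

```python
def glob_matches(pattern: str, event_class: str) -> bool:
    """Return True if a subscription glob matches an event class string."""
    pat = pattern.split(".")
    segs = event_class.split(".")

    def match(i: int, j: int) -> bool:
        # i indexes pattern segments, j indexes event class segments.
        if i == len(pat):
            return j == len(segs)
        if pat[i] == "**":
            # Assumption: "**" consumes zero or more segments.
            return any(match(i + 1, k) for k in range(j, len(segs) + 1))
        if j == len(segs):
            return False
        if pat[i] == "*" or pat[i] == segs[j]:
            # "*" consumes exactly one segment; a literal must be equal.
            return match(i + 1, j + 1)
        return False

    return match(0, 0)
```

A receiver-side test suite could use a helper like this to check which event classes a proposed subscription would cover before registering it.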
To list all event classes currently known to the system, use the GET /webhook-events/
API endpoint.
To view metadata about a particular event class, use the GET /webhook-events/
API endpoint.
Event UUIDs
Each event that occurs within the system is uniquely identified by a UUID ([RFC9562]). These UUIDs are included in the event payload sent to webhook receivers. If an event results in notifications being delivered to multiple webhook receivers, the event UUID sent to each receiver will be the same. This may be used to correlate events across multiple receivers, and to de-duplicate repeated deliveries of the same event.
Two webhook notifications with the same event UUID refer to the same event, rather than two distinct events with equivalent data. On the other hand, two webhook notifications with different event UUIDs refer to two distinct events, even if other data in the event payload is the same.
Webhook Receivers
In order to receive webhook notifications, a receiver endpoint must be registered with the Oxide control plane. Multiple receivers, with different URLs, may be registered, and each receiver may be subscribed to different sets of events.
See here for the API endpoints used to create, update, view, and delete webhook receiver configurations.
For guidance on the steps receiver endpoint implementations should take to ensure webhook notifications are delivered reliably, see Appendix A.
Creating a webhook receiver requires a name, the URL of the receiver endpoint, and a list of one or more secrets used to sign the payloads sent to the receiver. In order to receive notifications when events occur, a receiver must also subscribe to one or more event classes, as discussed in the previous section. Event class subscriptions are specified using the globbing syntax described above. Event subscriptions may be specified when the receiver is created, or can be added to an existing receiver using the webhook configuration update endpoint.
Secrets
Webhook payloads are signed with an HMAC digest ([RFC2104]) using secret keys shared between the webhook receiver and the Oxide control plane to allow the receiver to verify the authenticity of the payload. In a creation request, a user submits a list of secrets they would like to have the webhook dispatcher use for signing payloads. Each secret will be assigned an ID generated by the system that is unique within the webhook it belongs to.
Secrets cannot be changed once created, and can only be deleted. To rotate a secret, a user first creates a new secret, verifies that their application can validate the new signature, and then deletes the old secret. When a secret is deleted, its value is erased and the secret is marked as deleted.
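The rotation procedure can be sketched as a small Python function. Everything here is hypothetical scaffolding: add_secret, delete_secret, and receiver_accepts are caller-supplied callables standing in for calls to the secret management endpoints and for whatever check the application uses to confirm that it validates the new signature.

```python
from typing import Callable


def rotate_secret(
    add_secret: Callable[[str], str],
    delete_secret: Callable[[str], None],
    receiver_accepts: Callable[[str], bool],
    old_secret_id: str,
    new_secret_value: str,
) -> str:
    """Add a new secret, confirm it validates, then delete the old one."""
    # Step 1: create the new secret; the system assigns its ID.
    new_id = add_secret(new_secret_value)
    # Step 2: verify the application can validate the new signature.
    if not receiver_accepts(new_id):
        # Leave the old secret in place so deliveries stay verifiable.
        raise RuntimeError("receiver could not validate the new secret")
    # Step 3: only now delete the old secret.
    delete_secret(old_secret_id)
    return new_id
```

Deleting the old secret last means the webhook always has at least one secret that the receiver is known to validate.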
While users can ignore signature verification in their receivers, it is required that a webhook configuration always has at least one secret.
See here for the API endpoints used to create, view, and manage the secrets associated with a webhook receiver.
Identity and Access Management
For a webhook receiver to receive an event, the user registering the receiver must have the viewer role for the resource the event relates to.
Fault management alerts and other hardware events require the fleet.viewer role.
Currently, webhook receivers may only be created or modified by users with the fleet.admin role, and all webhook receivers operate with the fleet.viewer role.
Webhook receivers with more restrictive access control roles may be implemented in future Oxide system software, as described here.
Webhook Delivery
Webhook events are delivered to receiver endpoints as an HTTP POST request with a JSON body containing the event payload. The component of the Oxide control plane responsible for delivering webhook events is referred to as the webhook dispatcher.
The webhook dispatcher attempts to provide at-least-once reliable delivery to live receivers. This means that the control plane will retry unsuccessful delivery attempts in order to ensure that the notification is observed by the receiver. In some cases, this means that a receiver may be sent a notification for the same event multiple times. Therefore, webhook receiver implementations should use the event UUIDs included in the webhook payload to de-duplicate events.
Receiver implementations must treat two webhook payloads with the same event UUID as representing the same event. Two payloads with different event UUIDs should always be treated as representing distinct events.
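As a minimal illustration of de-duplication by event UUID, the following Python sketch tracks the event IDs a receiver has already processed. The class name EventDeduplicator is hypothetical, and the in-memory set is an assumption made for brevity; a real receiver would likely keep this state in persistent storage so that it survives restarts.

```python
class EventDeduplicator:
    """Tracks which event UUIDs have already been seen by this receiver."""

    def __init__(self) -> None:
        self._seen = set()

    def is_new(self, payload: dict) -> bool:
        """Return True the first time an event UUID is seen, False after."""
        event_id = payload["event_id"]
        if event_id in self._seen:
            # Same event UUID: a repeated delivery of the same event.
            return False
        self._seen.add(event_id)
        return True
```

Only the event UUID is compared. Two payloads with identical data but different event UUIDs are both treated as new, which matches the rule that different UUIDs always denote distinct events.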
See here for the API endpoints used to list the status of delivery attempts for a webhook, and to trigger re-delivery of an event.
Delivery Requests
All webhook delivery requests are structured as described in this section.
Request Headers
Webhook delivery requests always include the following headers:
Name | Value |
---|---|
|
|
x-oxide-delivery-id | A UUID that uniquely identifies this delivery attempt for the event. |
x-oxide-webhook-id | The UUID that identifies the webhook receiver to which the event was delivered. |
x-oxide-event-class | The event class string of the event. |
x-oxide-event-id | The event UUID of the event. |
x-oxide-signature | An HMAC signature of the payload for each secret key assigned to this webhook receiver. See the next section for details on the format of this header. |
Signature Header Format
For each secret key assigned to a webhook receiver, an x-oxide-signature header is added with the HMAC digest of the payload signed with that secret key.
The value of each header includes the algorithm used to generate the HMAC digest, the UUID of the secret key, and the value of the signature for that secret key.
This data is encoded in the following format:
x-oxide-signature: a={algorithm}&id={secret-id}&s={signature}
Receivers may parse this format to extract the algorithm, secret ID, and signature from each header value.
For example, if a receiver has two secrets, with IDs c06487fd-6636-435a-8ade-915b3c3b7ff5 and 72c31177-fb4d-48ad-9a33-714e02f8ab59, and the signatures for both secrets are generated using the SHA256 algorithm, the following headers would be sent:
x-oxide-signature: a=sha256&id=c06487fd-6636-435a-8ade-915b3c3b7ff5&s=d4da173d6473e40e7e0c38b9feb4dc101b362b55e8a2de936e850b8ed307badc
x-oxide-signature: a=sha256&id=72c31177-fb4d-48ad-9a33-714e02f8ab59&s=b437be7187eb33846c7839ab77e9c3ee03c0f4eee0f19aca531a7f6703f63954
Currently, only the SHA256 algorithm is supported.
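A receiver might parse and check a signature header along the following lines. This Python sketch rests on assumptions that this document does not state: that the signature is the hex-encoded HMAC-SHA256 of the raw request body bytes, and that the receiver holds each secret as bytes keyed by its secret ID. The function names are hypothetical.

```python
import hashlib
import hmac
from urllib.parse import parse_qs


def parse_signature_header(value: str) -> dict:
    """Split 'a={algorithm}&id={secret-id}&s={signature}' into its fields."""
    fields = parse_qs(value, strict_parsing=True)
    return {key: fields[key][0] for key in ("a", "id", "s")}


def verify_signature(header_value: str, body: bytes, secrets_by_id: dict) -> bool:
    """Check one x-oxide-signature header value against the request body."""
    sig = parse_signature_header(header_value)
    secret = secrets_by_id.get(sig["id"])
    if sig["a"] != "sha256" or secret is None:
        # Unknown algorithm or a secret this receiver does not hold.
        return False
    # Assumption: hex-encoded HMAC-SHA256 over the raw body bytes.
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), sig["s"].lower().encode())
```

Because one header is sent per secret, a receiver in the middle of a rotation can accept a request if any header verifies against a secret it currently holds.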
Request Body
All webhook delivery requests include a JSON body that describes the event. This body always contains the following fields:
Name | Type | Description |
---|---|---|
|
| The event class of this event. |
|
| The event UUID that uniquely identifies this event. |
|
| The version of the event payload schema. If backwards-incompatible changes are made to the structure of event payloads, this version number is incremented. |
|
| Data payload describing the event. The schema of the data object is specific to the event class. |
| An object containing metadata about this event delivery attempt. |
Name | Type | Description |
---|---|---|
|
| A unique identifier for this delivery attempt |
|
| The UUID of the webhook receiver. |
|
| The timestamp at which the delivery request was sent. |
POST https://company.example HTTP/1.1
content-type: application/json
x-oxide-delivery-id: ea94de84-d0b2-4c6a-b2e9-9dde536bc79a
x-oxide-webhook-id: 8a630640-c5cf-441c-9bdd-163f0323bc29
x-oxide-timestamp: 1672531200
x-oxide-event-class: project.create
x-oxide-event-id: 202b22e5-9d4c-4843-a8ec-a20d501d9e4a
x-oxide-signature: a=sha256&id=c06487fd-6636-435a-8ade-915b3c3b7ff5&s=d4da173d6473e40e7e0c38b9feb4dc101b362b55e8a2de936e850b8ed307badc
x-oxide-signature: a=sha256&id=72c31177-fb4d-48ad-9a33-714e02f8ab59&s=b437be7187eb33846c7839ab77e9c3ee03c0f4eee0f19aca531a7f6703f63954
{
"event_class": "project.create",
"event_id": "202b22e5-9d4c-4843-a8ec-a20d501d9e4a",
"version": 1,
"data": {
...
},
"delivery": {
"id": "ea94de84-d0b2-4c6a-b2e9-9dde536bc79a",
"webhook_id": "8a630640-c5cf-441c-9bdd-163f0323bc29",
"sent_at": "2023-01-01T00:00:00Z",
"trigger": "event"
}
}
1 | The event class that this payload represents. |
2 | The event UUID of the event that generated this webhook notification. This UUID is unique to the event, and can be used to de-duplicate multiple deliveries of the same event. |
3 | The version of the event payload schema. At present, this will always be 1 . |
4 | The data payload of the event. |
5 | Metadata describing this delivery attempt for the event. |
Failures and Delivery Retry
If a webhook notification is not successfully delivered, the dispatcher will attempt to deliver it again, in order to ensure that the receiver receives the notification.
Success
A successful delivery is one in which the dispatcher receives a response from the receiver endpoint with a 2xx HTTP status code.
Responding to a webhook delivery request with a 2xx HTTP status code acknowledges that event, and the Oxide control plane will not attempt to deliver that event to that receiver again.
This means that a receiver implementation should not return 200 OK until it has durably recorded the event, such as by writing it to persistent storage, or otherwise ensured that any actions in response to that event will reliably be performed.
If a receiver process receives a webhook delivery request and responds with a 200 OK before it has durably recorded the event, and then crashes or otherwise loses the event, the Oxide control plane will not attempt to redeliver that event to that receiver.
This means that the notification is lost.
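The record-before-acknowledge rule can be sketched as follows. This Python example is a sketch under stated assumptions, not a prescribed design: SQLite stands in for whatever persistent storage a receiver uses, the table layout and function names are hypothetical, and the handler returns the status code that the surrounding HTTP server would send.

```python
import json
import sqlite3


def open_store(path: str) -> sqlite3.Connection:
    """Open (or create) the hypothetical event store."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "event_id TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def handle_delivery(conn: sqlite3.Connection, raw_body: bytes) -> int:
    """Return the HTTP status code to respond with for one delivery."""
    try:
        payload = json.loads(raw_body)
        event_id = payload["event_id"]
    except (ValueError, KeyError, TypeError):
        return 400  # malformed body: not acknowledged
    try:
        with conn:  # commits on success, rolls back on error
            # The primary key makes a repeated delivery a no-op.
            conn.execute(
                "INSERT OR IGNORE INTO events (event_id, body) VALUES (?, ?)",
                (event_id, json.dumps(payload)),
            )
    except sqlite3.Error:
        return 500  # not recorded, so do not acknowledge
    return 200  # acknowledged only after the commit succeeded
```

Returning a 5xx status when the write fails leaves the event unacknowledged, so the dispatcher's retry behavior still applies.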
Webhook delivery does not follow HTTP redirects. If a receiver responds with a 3xx HTTP status code, this is considered a failure.
Failure
Delivering a webhook notification can fail for all of the reasons that an HTTPS request to an arbitrary URL may fail. Failures fall into one of three broad categories:
Receiver Endpoint Unreachable: A TCP connection to the receiver endpoint could not be opened.
This may be because the DNS name for the receiver endpoint URL could not be resolved, the connection was refused, or no connection was established within a 10-second timeout.
Response Timeout: A connection was successfully established, but no response was received within a 30-second timeout.
Receiver HTTP Error: A connection was successfully established and a response was received, but the receiver endpoint returned a 3xx, 4xx, or 5xx HTTP status code.
Retries
Each delivery will be attempted up to three times until a successful delivery is achieved. The first retry attempt occurs one minute after a failed delivery attempt. If the retry attempt fails, a second retry is performed five minutes after the first retry. If a webhook notification cannot be delivered successfully after three attempts, the webhook dispatcher considers that delivery to have permanently failed and will not retry delivery of that event to that receiver. In the future, the health of webhook receivers will be managed through FMA (i.e., an unresponsive receiver may trigger a fault).
A new delivery can be triggered using the POST /webhooks/
API endpoint.
The new delivery will also be retried up to three times.
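The retry schedule described above can be made concrete with a short Python calculation. This is a sketch that assumes each delay is measured from the time of the previous attempt and ignores the time spent waiting for timeouts; the names RETRY_DELAYS and attempt_times are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# One minute before the first retry, five minutes before the second.
RETRY_DELAYS = (timedelta(minutes=1), timedelta(minutes=5))


def attempt_times(first_attempt: datetime) -> list:
    """Return the times of all three attempts if every attempt fails."""
    times = [first_attempt]
    for delay in RETRY_DELAYS:
        # Assumption: each delay counts from the previous attempt.
        times.append(times[-1] + delay)
    return times


start = datetime(2023, 1, 1, 0, 0, tzinfo=timezone.utc)
schedule = attempt_times(start)
```

Under these assumptions, a delivery first attempted at 00:00:00 UTC would be retried at 00:01:00 and 00:06:00 before being considered permanently failed.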
Order of Delivery
There is no guarantee about the order in which messages will be sent to a receiver. At any point in time, a dispatcher may have multiple in-progress requests to a receiver. Each request may succeed, fail, and retry independently, resulting in messages appearing out of order in the context of the receiver.
Webhook messages always include a timestamp header (indicating the time at which the message was sent) to allow the receiver to reconcile the order of receipt with the order of sends. This header only provides the order in which the messages were sent, and does not make any claims about the order of the underlying events. Most event payloads will additionally include timestamps in the message body describing the time at which the event was observed.
Liveness Probes
Liveness probes are HTTP requests sent by the webhook dispatcher to a receiver endpoint to determine whether it is available.
These requests do not represent notifications for actual events, and should not be handled as events by the receiver implementation. Instead, probe requests are used to determine whether the receiver endpoint is currently available and could receive an actual event notification if one were to be sent.
Liveness probes have a number of uses:
They may be triggered by the receiver itself to indicate that it has become available after an outage, and to resend any missed events. See "resending failed deliveries" for details.
They may be triggered by an external system in order to monitor the health of the receiver, as described in "monitoring receiver availability".
They may be triggered manually by an operator when testing a receiver implementation.
Liveness probes are particularly valuable when a webhook receiver subscribes to infrequent but high-priority events, such as alerts. When the system is operating normally, no alerts will be generated. If probes are not sent to a receiver endpoint while the rack is operating normally and no faults have been detected, the receiver endpoint could become unavailable at any time without being detected. Should a fault then be detected in the rack, the dispatcher will attempt to deliver an alert to the receiver, but if the receiver is unavailable, the alert will be lost and operators may be unaware of the fault.
Liveness probes are event delivery requests with the "probe"
event class.
A probe request may be sent to a webhook receiver using the POST /webhooks/
API endpoint.
POST https://company.example HTTP/1.1
content-type: application/json
x-oxide-delivery-id: 3b04163c-d7e3-4e1a-bcca-48bd49e502f3
x-oxide-webhook-id: 8a630640-c5cf-441c-9bdd-163f0323bc29
x-oxide-timestamp: 1672531200
x-oxide-event-class: probe
x-oxide-event-id: 216cd071-0ec2-4b03-977e-7f1858af294d
x-oxide-signature: a=sha256&id=c06487fd-6636-435a-8ade-915b3c3b7ff5&s=d4da173d6473e40e7e0c38b9feb4dc101b362b55e8a2de936e850b8ed307badc
x-oxide-signature: a=sha256&id=72c31177-fb4d-48ad-9a33-714e02f8ab59&s=b437be7187eb33846c7839ab77e9c3ee03c0f4eee0f19aca531a7f6703f63954
{
"event_class": "probe",
"event_id": "216cd071-0ec2-4b03-977e-7f1858af294d",
"version": 1,
"data": {},
"delivery": {
"id": "3b04163c-d7e3-4e1a-bcca-48bd49e502f3",
"webhook_id": "8a630640-c5cf-441c-9bdd-163f0323bc29",
"sent_at": "2023-01-01T00:00:00Z",
"trigger": "probe"
}
}
Liveness probe requests are represented as event deliveries to ensure that they are handled by the receiver similarly to actual event notifications. This ensures that probes are routed to the same API endpoint as actual events. It also means that the receiver implementation need not be able to handle a separate JSON body schema, and is only required to parse the JSON body schema of actual event payloads.
However, receiver implementations must take care to distinguish between actual event notifications and probe events. For instance, a receiver that handles alerting should not alert operators when it receives a probe event.
The receiver endpoint must respond with an HTTP 2xx status code to indicate that it is available.
If the webhook dispatcher cannot connect to the receiver endpoint, the request times out, or the receiver responds with a 3xx, 4xx, or 5xx status code, the probe is considered to have failed, similarly to the failure criteria for event delivery.
If it is possible for a receiver endpoint to be reachable but unable to process events, the receiver implementation may respond to probe events with a 5xx status code to indicate that it has failed.
This can be used to report conditions where the receiver endpoint is up, but another component, such as a database to which events are written, is down.
If the receiver endpoint responds to a probe request with a 5xx status code, the response body will be included in the alert generated for the receiver failure.
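A receiver's handling of probes might look like the following Python sketch. The function name and the storage_healthy flag are hypothetical; the flag stands in for whatever dependency check a receiver performs, such as confirming that its event database is reachable. The choice of 503 is an assumption; any 5xx status would signal failure.

```python
def respond_to_delivery(payload: dict, storage_healthy: bool) -> tuple:
    """Return (status_code, process_as_event) for one delivery request."""
    if not storage_healthy:
        # Reachable but unable to process events: report failure.
        return 503, False
    if payload.get("event_class") == "probe":
        # Available, but a probe must not be handled as a real event.
        return 200, False
    return 200, True
```

The second element of the result tells the caller whether to run its normal event handling, so a probe never reaches alerting logic even though it travels the same code path as real events.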
HTTP API Endpoints
Verb | Route | Purpose |
---|---|---|
| List event classes | |
| Fetch metadata for an event class |
Verb | Route | Purpose |
---|---|---|
| List webhook receivers | |
| Create a new webhook | |
| Get the configuration for a webhook | |
| Update a webhook configuration | |
| Delete a webhook configuration | |
| Send a synthesized liveness probe event to a webhook receiver. |
Verb | Route | Purpose |
---|---|---|
| List the IDs of a webhook’s secrets | |
| Add a new secret to a webhook | |
| Delete a secret from a webhook |
Verb | Route | Purpose |
---|---|---|
| List attempted event deliveries for a webhook | |
| Request re-delivery of an event |
Webhook Events
List event classes (GET /webhook-events/classes)
Name | Type | Description |
---|---|---|
|
| The maximum number of event classes to return for this request. |
|
| The next page token returned by a previous result page (if any). |
|
| Supported set of sort modes for scanning by name (either |
|
| A glob pattern to filter the list of returned event classes. If this is provided, only event classes that match the pattern will be returned. |
Examples
GET /webhook-events/classes?limit=10&page_token=image.create HTTP/1.1
// (empty body)
HTTP/1.1 200 OK
{
"items": [
"image.delete",
"image.demote",
"image.promote",
"instance.create",
"instance.delete",
"instance.ephemeral-ip.attach",
"instance.ephemeral-ip.detach",
"instance.disks.attach",
"instance.disks.detach",
"instance.fail",
],
"next_page": "instance.reboot"
}
GET /webhook-events/classes?filter=project.* HTTP/1.1
// (empty body)
HTTP/1.1 200 OK
{
"items": [
"project.create",
"project.delete",
"project.update",
],
"next_page": null
}
Fetch event class metadata (GET /webhook-events/classes/{class_name})
Name | Type | Description |
---|---|---|
|
| The name of the event class to view |
Examples
GET /webhook-events/classes/project.create HTTP/1.1
// (empty body)
HTTP/1.1 200 OK
{
"name": "project.create",
"description": "A project was created"
}
Webhook Receivers
List webhook receivers (GET /webhooks)
This endpoint returns a list of configured webhook receivers.
This list is paginated, and may be ordered either by the receivers' names or by their UUIDs.
Name | Type | Description |
---|---|---|
|
| The maximum number of webhook receivers to return for this request. |
|
| The next page token returned by a previous result page (if any). |
|
| Supported set of sort modes for scanning by name or ID (either |
Response Body
Name | Type | Description |
---|---|---|
| [ | A list of |
|
| The name or ID (depending on sorting mode) of the first entry on the next page of responses, if any. This should be passed as the |
Examples
GET /webhooks?limit=10 HTTP/1.1
// (empty body)
HTTP/1.1 200 OK
{
"items": [
{
"id": "8a630640-c5cf-441c-9bdd-163f0323bc29",
"name": "my-integration",
"description": "My cool webhook receiver",
"endpoint": "https://example.company/integration/oxide",
"secrets": [
{ "id": "2fa26178-e46a-4c1a-84d5-be2bb89d4c43" }
],
"events": [
"project.create",
"project.destroy",
],
},
{
"id": "81cb3d5d-a270-4b5a-810d-9a91d4ab23e7",
"name": "alert-o-mat",
"description": "Integration with some kind of made-up SaaS alerting thingy",
"endpoint": "https://alertomat.app/example-company/oxide",
"secrets": [
{ "id": "c0bc6bbf-ff7a-44ec-8d3d-ff6af79476c5" },
{ "id": "972afc29-6a76-4708-b393-c5caf57c018b" },
],
"events": [
"**.alert",
],
},
],
"next_page": null
}
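Walking a paginated listing can be sketched with a small Python generator. The fetch_page callable is hypothetical: it stands in for code that performs the GET request, passing the token from the previous page (as page_token does in the event class listing example), and returns the decoded JSON body with its "items" and "next_page" fields.

```python
from typing import Callable, Iterator, Optional


def iter_items(fetch_page: Callable[[Optional[str]], dict]) -> Iterator:
    """Yield every item from a paginated listing, following next_page."""
    token = None  # no token on the first request
    while True:
        page = fetch_page(token)
        yield from page["items"]
        token = page.get("next_page")
        if token is None:
            # A null next_page marks the last page of results.
            break
```

Keeping the HTTP call behind a callable lets the same loop serve any of the paginated listing endpoints.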
Create a webhook (POST /webhooks)
This endpoint accepts a JSON body that describes the configuration for the new webhook receiver.
Request Body
The following fields are required:
Name | Type | Description |
---|---|---|
|
| An identifier for this webhook receiver, which must be unique |
|
| Human-readable free-form text describing this webhook receiver. |
|
| The URL that webhook notification requests should be sent to |
|
| A non-empty list of secret keys that are used to sign payloads |
The following fields are optional:
Name | Type | Description |
---|---|---|
|
| A list of event classes to subscribe to |
Examples
POST /webhooks HTTP/1.1
{
"name": "my-integration",
"endpoint": "https://example.company/integration/oxide",
"description": "My cool webhook receiver",
"secrets": [
"my-secret-key"
],
"events": [
"project.create",
"project.destroy",
],
}
HTTP/1.1 201 Created
{ "id": "8a630640-c5cf-441c-9bdd-163f0323bc29" }
Get the configuration for a webhook (GET /webhooks/{webhook})
Name | Type | Description |
---|---|---|
|
| The name or UUID of the webhook to view |
Response Body
Name | Type | Description |
---|---|---|
|
| The UUID of the webhook receiver |
|
| An identifier for this webhook receiver provided upon creation |
|
| Human-readable free-form text describing this webhook receiver. |
|
| The URL that webhook notification requests are sent to |
| [ | A non-empty list of |
|
| A list of event classes that this webhook is subscribed to |
Name | Type | Description |
---|---|---|
|
| The ID that identifies this secret relative to other secrets configured for this webhook receiver |
Examples
GET /webhooks/8a630640-c5cf-441c-9bdd-163f0323bc29 HTTP/1.1
// (empty body)
HTTP/1.1 200 OK
{
"id": "8a630640-c5cf-441c-9bdd-163f0323bc29",
"name": "my-integration",
"description": "My cool webhook receiver",
"endpoint": "https://example.company/integration/oxide",
"secrets": [
{ "id": "2fa26178-e46a-4c1a-84d5-be2bb89d4c43" }
],
"events": [
"project.create",
"project.destroy",
],
}
Update a webhook configuration (PUT /webhooks/{webhook})
This endpoint accepts a JSON body that describes the updated configuration for the webhook receiver.
The provided fields will replace the previous configuration set by the webhook creation endpoint or by a previous call to this endpoint. To update some fields while leaving others in place, it is recommended to first use the webhook view endpoint to retrieve the current configuration, modify the desired fields, and then use this endpoint to update any changed configuration fields.
Webhook secrets are not updated using this endpoint.
To add or remove secrets used to sign payloads, use the POST /webhooks/
and DELETE /webhooks/
endpoints, respectively.
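The read-modify-write pattern recommended above can be sketched in Python. This is an illustration only: build_update_body is a hypothetical helper, and the field names are taken from the examples in this document ("name", "description", "endpoint", "events"). It assumes the update body carries exactly those four fields and omits the receiver's ID and secrets.

```python
# Fields assumed to make up the update request body.
UPDATE_FIELDS = ("name", "description", "endpoint", "events")


def build_update_body(current: dict, **changes) -> dict:
    """Build a full replacement body from the current config plus changes."""
    unknown = set(changes) - set(UPDATE_FIELDS)
    if unknown:
        # Secrets, for example, are managed through their own endpoints.
        raise ValueError(f"not updatable via this endpoint: {sorted(unknown)}")
    # Start from the current values so untouched fields are preserved.
    body = {field: current[field] for field in UPDATE_FIELDS}
    body.update(changes)
    return body
```

The caller would fetch the current configuration with the webhook view endpoint, pass it in as current, and send the returned dictionary as the JSON body of the update request.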
Name | Type | Description |
---|---|---|
|
| The name or UUID of the webhook to update |
Request Body
The following fields are required:
Name | Type | Description |
---|---|---|
|
| An identifier for this webhook receiver, which must be unique. |
|
| Human-readable free-form text describing this webhook receiver. |
|
| The URL that webhook notification requests should be sent to |
|
| A list of event classes to subscribe to |
Response Body
This endpoint returns a Webhook object describing the properties of the webhook receiver after updating it.
Examples
PUT /webhooks/8a630640-c5cf-441c-9bdd-163f0323bc29 HTTP/1.1
{
"id": "8a630640-c5cf-441c-9bdd-163f0323bc29",
"name": "my-integration",
"description": "My cool webhook receiver",
"endpoint": "https://example.company/integration/oxide",
"events": [
"project.create",
"project.destroy",
],
}
HTTP/1.1 200 OK
{
"id": "8a630640-c5cf-441c-9bdd-163f0323bc29",
"name": "my-integration",
"description": "My cool webhook receiver",
"endpoint": "https://example.company/integration/oxide",
"secrets": [
{ "id": "2fa26178-e46a-4c1a-84d5-be2bb89d4c43" }
],
"events": [
"project.create",
"project.destroy"
]
}
Delete a webhook configuration (DELETE /webhooks/{webhook})
Name | Type | Description |
---|---|---|
|
| The name or UUID of the webhook to delete |
Examples
DELETE /webhooks/8a630640-c5cf-441c-9bdd-163f0323bc29 HTTP/1.1
// (empty body)
HTTP/1.1 200 OK
{
"id": "8a630640-c5cf-441c-9bdd-163f0323bc29"
}
Send a liveness probe to a receiver (POST /webhooks/{webhook}/probe)
Calling this endpoint will send a liveness probe event to the receiver with the provided name or UUID.
A liveness probe is a synthesized event used to test whether a receiver is capable of receiving events.
Liveness probe requests are similar to the delivery requests sent for actual events, but the event class will be probe, and the event payload’s data field is empty.
The HTTP status code of the response is the status code returned by the receiver endpoint.
The response body contains a DeliveryAttempt object describing the probe request’s outcome.
Name | Type | Description |
---|---|---|
|
| The name or UUID of the webhook to send a probe request to. |
Name | Type | Description |
---|---|---|
|
| If |
Response Body
Name | Type | Description |
---|---|---|
| A |
Examples
POST /webhooks/8a630640-c5cf-441c-9bdd-163f0323bc29/probe HTTP/1.1
// (empty body)
HTTP/1.1 200 OK
{
"probe": {
"id": "f371c094-e389-49b8-9a7a-fd15bfe29709",
"webhook_id": "8a630640-c5cf-441c-9bdd-163f0323bc29",
"event_class": "probe",
"event_id": "3aad7c56-d741-4a08-ac0a-02c5c6282757",
"state": "delivered",
"sent_at": "2023-01-01T20:00:00Z",
"trigger": "probe",
"response": {
"status": 200,
"response_time_ms": 500,
},
}
}
Secrets
Get the IDs of the secrets for a webhook (GET /webhooks/{webhook}/secrets)
Name | Type | Description |
---|---|---|
|
| The name or UUID of the webhook to list the secret IDs of |
Response Body
Name | Type | Description |
---|---|---|
| [ | A list of |
GET /webhooks/8a630640-c5cf-441c-9bdd-163f0323bc29/secrets HTTP/1.1
// (empty body)
HTTP/1.1 200 OK
{
"secrets": [
{ "id": "c06487fd-6636-435a-8ade-915b3c3b7ff5" }
]
}
Add a secret to a webhook (POST /webhooks/{webhook}/secrets)
Name | Type | Description |
---|---|---|
|
| The name or UUID of the webhook to add the secret to |
Request Body
The following fields are required:
Name | Type | Description |
---|---|---|
|
| The value of the shared secret to add to the webhook receiver. |
Response Body
This endpoint returns a Secret object containing the ID of the new secret.
POST /webhooks/8a630640-c5cf-441c-9bdd-163f0323bc29/secrets HTTP/1.1
{ "secret": "secret-value" }
HTTP/1.1 201 Created
{ "id": "c06487fd-6636-435a-8ade-915b3c3b7ff5" }
Delete a secret from a webhook (DELETE /webhooks/{webhook}/secrets/{secret_id})
Name | Type | Description |
---|---|---|
|
| The name or UUID of the webhook the secret is associated with |
|
| The ID of the secret to delete |
DELETE /webhooks/8a630640-c5cf-441c-9bdd-163f0323bc29/secrets/c06487fd-6636-435a-8ade-915b3c3b7ff5 HTTP/1.1
// (empty body)
HTTP/1.1 200 OK
{ "id": "c06487fd-6636-435a-8ade-915b3c3b7ff5" }
Event Delivery Attempts
List attempted deliveries (GET /webhooks/{webhook}/deliveries)
This endpoint returns a list of delivery attempts sent to a webhook receiver.
In addition to debugging and monitoring purposes, this endpoint can be used by a receiver implementation to check for failed delivery attempts and resend them as needed. See [resending-failed-deliveries] for details.
Name | Type | Description |
---|---|---|
|
| The name or UUID of the webhook to view delivery attempts for |
Name | Type | Description |
---|---|---|
|
| The maximum number of delivery attempts to return for this request. |
|
| The UUID of the last delivery attempt returned in a previous page of results. |
|
| Whether to include failed delivery attempts. If |
|
| Whether to include pending delivery attempts. If |
|
| Whether to include delivery attempts that were delivered successfully. If |
The failed, pending, and delivered query parameters can be combined to filter which delivery attempts are included in the results.
For example, the query string ?failed=true&pending=false&delivered=false will return only delivery attempts that have failed, excluding those which are pending or have been delivered successfully.
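Building such a query string programmatically is straightforward. In the Python sketch below the function name deliveries_query is hypothetical, and all three flags are required explicitly because the default value of each parameter is not assumed here.

```python
from urllib.parse import urlencode


def deliveries_query(*, failed: bool, pending: bool, delivered: bool) -> str:
    """Build the state filter query string for the deliveries listing."""
    params = {
        # Booleans are written as lowercase true/false, as in the example.
        "failed": str(failed).lower(),
        "pending": str(pending).lower(),
        "delivered": str(delivered).lower(),
    }
    return "?" + urlencode(params)


only_failed = deliveries_query(failed=True, pending=False, delivered=False)
```

Here only_failed reproduces the query string from the example above and can be appended to the deliveries listing path for a receiver.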
Response Body
Name | Type | Description |
---|---|---|
| A list of delivery attempts matching the query. | |
|
| The UUID of the first delivery attempt on the page of results, if any. If this field is |
Name | Type | Description |
---|---|---|
|
| The UUID of this delivery attempt. |
|
| The UUID of the webhook receiver to which the event was delivered. |
|
| The event class of the event that was delivered. |
|
| The event UUID of the event that was delivered. |
| A | |
|
| The timestamp at which the HTTP request was sent, or |
| A | |
| A |
Value | Description |
---|---|
| This delivery attempt has not yet sent a request to the receiver, or is awaiting a response. |
| A delivery request was sent to the receiver and a successful response was received. |
| This delivery attempt failed because a TCP connection to the receiver endpoint could not be established within a 30-second timeout. |
| This delivery attempt failed because a response to the HTTP request was not received within a 30-second timeout. |
| This delivery attempt failed because the receiver responded with an HTTP error status code. |
Value | Description |
---|---|
| This delivery was triggered by a new event being published. |
| This delivery was triggered by a liveness probe with |
| This delivery is a liveness probe. |
Name | Type | Description |
---|---|---|
|
| The HTTP status code of the response. |
|
| The time elapsed between when the request was sent and the response was received, in a whole number of milliseconds (ms). |
Examples
GET /webhooks/8a630640-c5cf-441c-9bdd-163f0323bc29/deliveries HTTP/1.1
// (empty body)
HTTP/1.1 200 OK
{
  "items": [
    {
      "id": "a81d3c93-61b6-4f5f-bf21-7c3a659c5654",
      "webhook_id": "8a630640-c5cf-441c-9bdd-163f0323bc29",
      "event_class": "project.create",
      "event_id": "4dce8979-a97a-482f-a144-21361994c653",
      "state": "pending",
      "sent_at": null,
      "trigger": "event",
      "response": null
    },
    {
      "id": "f371c094-e389-49b8-9a7a-fd15bfe29709",
      "webhook_id": "8a630640-c5cf-441c-9bdd-163f0323bc29",
      "event_class": "project.create",
      "event_id": "3aad7c56-d741-4a08-ac0a-02c5c6282757",
      "state": "delivered",
      "sent_at": "2023-01-01T20:00:00Z",
      "trigger": "resend",
      "response": {
        "status": 200,
        "response_time_ms": 500
      }
    },
    {
      "id": "648cecf7-66e3-4791-9fbf-70745c30a350",
      "webhook_id": "8a630640-c5cf-441c-9bdd-163f0323bc29",
      "event_class": "project.create",
      "event_id": "3aad7c56-d741-4a08-ac0a-02c5c6282757",
      "state": "failed_unreachable",
      "sent_at": "2023-01-01T01:00:00Z",
      "trigger": "event",
      "response": null
    },
    {
      "id": "e0532ce7-ddce-41eb-b2d8-0562ad677f04",
      "webhook_id": "8a630640-c5cf-441c-9bdd-163f0323bc29",
      "event_class": "instance.delete",
      "event_id": "1f34a5ec-94e2-4115-a753-4dbbfc750860",
      "sent_at": "2023-01-01T00:00:00Z",
      "state": "delivered",
      "trigger": "event",
      "response": {
        "status": 200,
        "response_time_ms": 300
      }
    }
  ],
  "next_page": null
}
1 | A UUID uniquely identifying this delivery of the event. |
2 | The UUID of the webhook receiver to which the event is delivered. |
3 | The event UUID of the event that generated this webhook notification. If an event is delivered multiple times, the event ID will be the same for each delivery. |
4 | This webhook event has not yet been sent to the receiver endpoint. |
5 | This delivery was triggered by resending a previous failed delivery for this event. |
6 | This delivery failed because the receiver endpoint was unreachable. |
7 | This was a successful initial delivery attempt. |
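To illustrate how a client might interpret a response like the one above, the following sketch groups delivery attempts by event UUID and reports an overall outcome for each event. The function name `summarize_events` is hypothetical, and treating every state other than `delivered` and `pending` as a failure is an assumption made for the example.

```python
from collections import defaultdict


def summarize_events(items):
    """Map each event UUID to "delivered", "pending", or "failed".

    An event counts as delivered if any attempt succeeded, as pending if
    an attempt is still in flight, and as failed otherwise. Treating all
    other states as failures is an assumption of this sketch.
    """
    states = defaultdict(set)
    for attempt in items:
        states[attempt["event_id"]].add(attempt["state"])

    summary = {}
    for event_id, seen in states.items():
        if "delivered" in seen:
            summary[event_id] = "delivered"
        elif "pending" in seen:
            summary[event_id] = "pending"
        else:
            summary[event_id] = "failed"
    return summary


# Delivery attempts abbreviated from the example response above.
items = [
    {"event_id": "4dce8979-a97a-482f-a144-21361994c653", "state": "pending"},
    {"event_id": "3aad7c56-d741-4a08-ac0a-02c5c6282757", "state": "delivered"},
    {"event_id": "3aad7c56-d741-4a08-ac0a-02c5c6282757", "state": "failed_unreachable"},
    {"event_id": "1f34a5ec-94e2-4115-a753-4dbbfc750860", "state": "delivered"},
]
summary = summarize_events(items)
```

In this example the event that first failed and was later resent successfully is reported as delivered, so no further action would be needed for it.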
Request re-delivery of an event (POST /webhooks/{webhook}/deliveries/{event_id}/resend)
This endpoint requests that an event previously dispatched to a webhook receiver be delivered again.
If an event with the provided event UUID has previously been delivered to this webhook receiver, whether successfully or not, the dispatcher will attempt to deliver it again. A new delivery attempt is created, and the UUID of that delivery attempt is returned in the response.
Name | Type | Description |
---|---|---|
|
| The name or UUID of the webhook to re-deliver the event to |
|
| The event UUID of the event that should be re-delivered |
Examples
POST /webhooks/8a630640-c5cf-441c-9bdd-163f0323bc29/deliveries/3aad7c56-d741-4a08-ac0a-02c5c6282757/resend HTTP/1.1
// (empty body)
HTTP/1.1 201 Created
{
"delivery_id": "f371c094-e389-49b8-9a7a-fd15bfe29709"
}
Appendix A: Implementing a Reliable Webhook Receiver
When webhooks are used to notify external systems of high-priority events, such as faults, it is important to ensure that event notifications are delivered reliably. Missing a webhook event for an active problem means that an alerting system could fail to generate an alert, resulting in operators not being notified of the fault. The Oxide webhook API provides mechanisms to ensure that webhook delivery is as reliable as possible, but receiver endpoint implementations must take care to use these mechanisms correctly. This appendix suggests some steps receiver implementations should take to avoid missing webhook events.
Resending Failed Deliveries
If a receiver endpoint is unavailable, events dispatched to that receiver may not be received. The webhook dispatcher will retry failed delivery attempts up to two times, with a one-minute backoff before the first retry and a five-minute backoff before the second retry. However, if a receiver is unavailable for long enough to miss both retry attempts, the webhook dispatcher will not attempt to deliver that request again unless explicitly asked to by the receiver.
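The retry schedule described above can be worked through as a short calculation. This sketch assumes each backoff is measured from the start of the previous attempt and ignores the time an attempt itself takes (up to the 30-second timeouts); the names used are hypothetical.

```python
# Backoffs before the first and second retry, in seconds (from the text above).
RETRY_BACKOFFS_S = [60, 300]


def attempt_offsets(backoffs=RETRY_BACKOFFS_S):
    """Return when each delivery attempt starts, in seconds after the first.

    Assumes each backoff is counted from the previous attempt and that
    attempts themselves take no time.
    """
    offsets = [0]
    for backoff in backoffs:
        offsets.append(offsets[-1] + backoff)
    return offsets


print(attempt_offsets())  # [0, 60, 360]
```

Under these assumptions, a receiver that is unavailable for somewhat more than six minutes after an event is first dispatched would miss all three attempts for that event.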
Therefore, it is recommended that receiver endpoints request that any failed deliveries be resent when they start up.
This way, if the software implementing a receiver crashes and is restarted, it can trigger a new delivery attempt for any events that it may have missed due to the crash.
The simplest way to do so is for the receiver to call the POST /webhooks/ API endpoint with the ?resend=true query parameter to trigger a liveness probe request to itself.
When such a probe request succeeds, the webhook dispatcher will then resend all events which have not been delivered successfully to that receiver.
Alternatively, if more precise control over which events are resent is required, the GET /webhooks/ API endpoint provides a list of event delivery attempts to a webhook receiver.
To list only failed delivery attempts, the query parameters ?failed=true&pending=false&delivered=false can be added to requests to this endpoint.
The receiver may then request re-delivery of specific events using the POST /webhooks/ API endpoint.
This is more complex than triggering probe requests, but it may be useful in situations where the receiver only wishes to resend a subset of failed events.
For example, it may only request re-delivery of events that occurred within a particular time window, or it may only resend events that were not processed by a redundant receiver subscribed to the same event classes.
In addition to checking for failed delivery attempts on startup, receiver implementations may periodically attempt to resend failed deliveries while they are running. Even if a receiver endpoint has not crashed, event deliveries may have been missed due to network connectivity problems or other transient issues.
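The selective approach could be sketched as follows: given a list of failed delivery attempts, choose the events to resend and build the corresponding resend request paths. The function `resend_paths` and its parameters are hypothetical, and using each attempt's `sent_at` timestamp as a stand-in for when the event occurred is an assumption of this example.

```python
from datetime import datetime, timezone


def parse_timestamp(value: str) -> datetime:
    """Parse a timestamp such as 2023-01-01T01:00:00Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def resend_paths(webhook, failed_attempts, since, already_processed=frozenset()):
    """Return resend request paths for the failed events worth retrying.

    Skips attempts sent before `since`, events a redundant receiver has
    already processed, and duplicate attempts for the same event.
    """
    seen = set()
    paths = []
    for attempt in failed_attempts:
        event_id = attempt["event_id"]
        sent_at = attempt.get("sent_at")
        if sent_at is None or parse_timestamp(sent_at) < since:
            continue
        if event_id in already_processed or event_id in seen:
            continue
        seen.add(event_id)
        paths.append(f"/webhooks/{webhook}/deliveries/{event_id}/resend")
    return paths


failed = [
    {"event_id": "3aad7c56-d741-4a08-ac0a-02c5c6282757", "sent_at": "2023-01-01T01:00:00Z"},
    {"event_id": "3aad7c56-d741-4a08-ac0a-02c5c6282757", "sent_at": "2023-01-01T01:01:00Z"},
    {"event_id": "1f34a5ec-94e2-4115-a753-4dbbfc750860", "sent_at": "2022-12-01T00:00:00Z"},
]
paths = resend_paths(
    "8a630640-c5cf-441c-9bdd-163f0323bc29",
    failed,
    since=datetime(2023, 1, 1, tzinfo=timezone.utc),
)
```

Each returned path would then be sent as a POST request with an empty body, as in the resend example earlier in this document.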
Monitoring Receiver Availability
Liveness probes may be used to monitor the health of a receiver endpoint.
An external system which periodically triggers liveness probe requests to a receiver using the POST /webhooks/ API endpoint can detect situations where the receiver is unavailable and alert operators to the outage.
This allows receiver failures to be detected and resolved proactively, reducing the likelihood of missing notifications for important events.
It is recommended that the POST /webhooks/{webhook}/probe API endpoint be used as the primary mechanism for monitoring a receiver endpoint’s availability, rather than a request sent by the monitoring system directly to the receiver.
While sending requests directly to the receiver process can detect failures where the receiver is completely offline, it does not exercise communication between the Oxide control plane and the receiver.
Outages may still occur when the receiver endpoint is online and capable of processing requests if it is not reachable by the Oxide rack’s webhook dispatcher.
Calling the POST /webhooks/ API endpoint requests that the webhook dispatcher send a request to the receiver, which will fail if the Oxide control plane cannot communicate with the receiver.
It may be valuable for a monitoring system to both trigger liveness probe requests from the webhook dispatcher using the POST /webhooks/ endpoint and to send its own probes directly to the webhook receiver.
This provides separate signals for whether the receiver endpoint is running at all, and for whether it is reachable by the Oxide control plane.
Operators can then reason about whether an outage is due to the webhook receiver process not running at all, or due to a network partition between it and the webhook dispatcher.
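One way a monitoring system might combine the two signals is sketched below. The `Diagnosis` enum and `diagnose` function are hypothetical names, and the interpretation attached to each combination is an illustration of the reasoning above rather than behavior defined by the webhook API.

```python
from enum import Enum


class Diagnosis(Enum):
    HEALTHY = "receiver is running and reachable by the webhook dispatcher"
    PARTITIONED = "receiver is running but not reachable by the webhook dispatcher"
    RECEIVER_DOWN = "receiver does not appear to be running"
    MONITOR_PATH_BROKEN = "dispatcher reaches the receiver but the monitor cannot"


def diagnose(dispatcher_probe_ok: bool, direct_probe_ok: bool) -> Diagnosis:
    """Combine a dispatcher-triggered probe result with a direct probe result."""
    if dispatcher_probe_ok and direct_probe_ok:
        return Diagnosis.HEALTHY
    if dispatcher_probe_ok:
        # Webhooks can still be delivered; the monitor's own path has failed.
        return Diagnosis.MONITOR_PATH_BROKEN
    if direct_probe_ok:
        # The process is up, so suspect the network between rack and receiver.
        return Diagnosis.PARTITIONED
    return Diagnosis.RECEIVER_DOWN


print(diagnose(dispatcher_probe_ok=False, direct_probe_ok=True).value)
```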
Redundant Receiver Endpoints
If a receiver endpoint is unavailable, webhook events may be missed. Therefore, when webhooks are used to provide notifications of critical events such as alerts, operators are encouraged to run multiple independent webhook receiver endpoints subscribed to the same event classes. When multiple receiver endpoints are subscribed to the same event classes, the same events will be delivered to all receivers. Because the same event UUID is used for every receiver that receives an event, event UUIDs should be used to de-duplicate events that are delivered to multiple receivers.
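De-duplication by event UUID can be sketched with a set of already-seen IDs. The class name `EventDeduplicator` is hypothetical, and this in-memory version only works within a single process; redundant receivers running on separate hosts would need to keep the set in shared storage, which is outside the scope of this example.

```python
class EventDeduplicator:
    """Tracks event UUIDs so each event is processed only once."""

    def __init__(self):
        self._seen = set()

    def first_time(self, event_id: str) -> bool:
        """Return True the first time an event UUID is seen, False afterwards."""
        if event_id in self._seen:
            return False
        self._seen.add(event_id)
        return True


dedup = EventDeduplicator()
processed = []
# The same event arrives once from each of two redundant receivers.
for event_id in [
    "3aad7c56-d741-4a08-ac0a-02c5c6282757",
    "3aad7c56-d741-4a08-ac0a-02c5c6282757",
    "1f34a5ec-94e2-4115-a753-4dbbfc750860",
]:
    if dedup.first_time(event_id):
        processed.append(event_id)
```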
Redundant receiver endpoints and periodic checks for failed deliveries (as described in Resending Failed Deliveries) are independent, complementary techniques for protecting against missed events due to receiver downtime. Depending on the receiver’s reliability requirements, it may be preferable to use one or both of these mechanisms.
Tolerance for delays in event delivery due to receiver downtime is an important tradeoff to consider when selecting these mechanisms. When only a single receiver endpoint is used, checking for failed deliveries when that receiver starts up will ensure that deliveries missed during receiver downtime are eventually processed. However, during the period of time when the receiver was unavailable, no events will have been processed. In contrast, if there are multiple redundant receivers, events will still be received immediately as long as at least one receiver is available. Therefore, if receiving events in a timely manner is important (e.g. for high-urgency fault-management alerts), consider the use of redundant receivers.
Nonetheless, there are some failure scenarios in which even redundant receivers could miss events. An outage may impact all redundant receivers if they are running in the same failure domain, or if a network partition affects all communication to and from the Oxide rack. Therefore, even when redundant receiver endpoints are used, checking for failed deliveries periodically and on startup is still recommended if these classes of failures are a concern.
Zero-Downtime Secret Rotation
The Oxide webhook API permits multiple shared secrets to be configured for a single webhook receiver.
If a receiver is configured with more than one secret, webhook requests will include an x-oxide-signature header for each secret.
The value of these headers includes the secret’s ID in addition to the HMAC signature of the payload, allowing the receiver to determine which secret was used to generate that signature.
Multiple secrets may be used during secret key rotations in order to ensure that the webhook payload is always signed with a secret that can be verified by the receiver.
When rotating secrets for a webhook receiver, first create the new secret using the POST /webhooks/ API endpoint, which returns the ID assigned to that secret.
Then, the webhook receiver may be configured to verify signatures with the new secret.
During this time, any webhook payload will be signed with both the new secret and the old secret, allowing it to be verified by the receiver regardless of whether it has yet been configured to accept the new secret.
Finally, once the receiver is ready to accept the new secret, the old secret may be deleted using the DELETE /webhooks/ API endpoint.
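On the receiver side, verification during a rotation might look like the following sketch, which accepts a request if any signature header matches a secret the receiver knows. The header value format used here (`a=sha256&id=<secret ID>&s=<hex digest>`) and the use of HMAC-SHA256 are assumptions made for illustration; the authoritative format is the one given in the description of webhook delivery requests. The helper names are hypothetical.

```python
import hashlib
import hmac
from urllib.parse import parse_qs


def signature_header(secret_id: str, secret: bytes, body: bytes) -> str:
    """Build a signature header value in the ASSUMED format described above."""
    digest = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return f"a=sha256&id={secret_id}&s={digest}"


def verify(body: bytes, signature_headers, known_secrets) -> bool:
    """Return True if any signature header matches a secret we know.

    `known_secrets` maps secret IDs to secret bytes. Headers signed with
    unknown secrets are skipped, which is what makes rotation seamless.
    """
    for header in signature_headers:
        fields = parse_qs(header)
        secret_id = fields.get("id", [None])[0]
        signature = fields.get("s", [None])[0]
        secret = known_secrets.get(secret_id)
        if secret is None or signature is None:
            continue
        expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
        # Constant-time comparison avoids leaking how many characters matched.
        if hmac.compare_digest(expected.encode(), signature.encode()):
            return True
    return False


body = b'{"event_class": "project.create"}'
headers = [
    signature_header("old-id", b"old-secret", body),
    signature_header("new-id", b"new-secret", body),
]
# Mid-rotation: the receiver has not yet been given the new secret.
accepted = verify(body, headers, {"old-id": b"old-secret"})
```

Because the request carries one signature per configured secret, the same request would also verify on a receiver that has already switched to the new secret only.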
Appendix B: Future Work
This section discusses potential future additions to the webhook API which are considered out of scope for the MVP implementation.
Rate Limiting
At present, webhooks are used to deliver relatively low-volume events (e.g. user alerts). In the future, as webhook events are emitted by additional subsystems, it may become necessary to implement mechanisms for rate-limiting the delivery of webhook events, in order to reduce the impact of webhook delivery on other control plane workloads. However, this is deemed out of scope for the MVP implementation.
Long-Polling
An alternative interface based on HTTP long-polling [RFC6202] could be provided for event delivery.
In a long-polling-based interface, the webhook receiver would act as a client of an endpoint served by the Oxide control plane, rather than the Oxide control plane’s webhook dispatcher initiating requests to a receiver endpoint. When such a request is received by the control plane, the request is kept open until events not yet seen by the client are published to that receiver. When events are published, the Oxide control plane would complete the request with a response representing those events. The client would then send a subsequent request, indicating which events it has successfully received, and the process begins again. The acknowledgement of received events could be implemented via parameters on the long-polling request, or through a separate acknowledgement endpoint.
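The acknowledgement cycle described above can be illustrated with a small in-memory simulation. Everything here is hypothetical, since no long-polling interface has been designed: `LongPollSimulator` stands in for the control plane, acknowledgements are passed as a parameter on the next poll, and `poll` returns immediately instead of holding the request open until events are available.

```python
class LongPollSimulator:
    """In-memory stand-in for a hypothetical long-polling endpoint."""

    def __init__(self):
        # event_id -> payload, kept in publication order.
        self._unacked = {}

    def publish(self, event_id, payload):
        """Record a new event for delivery to the client."""
        self._unacked[event_id] = payload

    def poll(self, ack=()):
        """Drop acknowledged events, then return those not yet acknowledged.

        A real endpoint would keep the request open until at least one
        event is available; this simulation returns immediately.
        """
        for event_id in ack:
            self._unacked.pop(event_id, None)
        return list(self._unacked.items())


server = LongPollSimulator()
server.publish("event-1", {"event_class": "project.create"})
first = server.poll()

server.publish("event-2", {"event_class": "instance.delete"})
# The next request acknowledges everything received in the previous response.
second = server.poll(ack=[event_id for event_id, _ in first])
```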
Unlike the traditional webhook notification mechanism described in this RFD, a long-polling mechanism has the webhook receiver acting as a client of the Oxide control plane. This removes the need to serve a webhook receiver endpoint and ensure it is accessible to the webhook delivery client. It also avoids the need to track receiver liveness in the webhook dispatcher, as the receiver is responsible for initiating all communication between the control plane and the receiver. However, the traditional request-based approach is more commonly used for webhook delivery, so the MVP implementation will focus on that mechanism rather than long-polling. In the future, long-polling may be added as an alternative interface for webhook delivery.
Role-Based Payload Filtering
Currently, all webhook receivers are created by a user with fleet admin permissions, and operate with fleet viewer permissions. In the future, we would like to allow webhook receivers to operate with more restrictive permissions, and control what data is included in webhook payloads based on the webhook receiver’s roles. This would permit users with more restrictive permissions to create webhook receivers that only receive data that they are allowed to view.
For a webhook to receive an event, its identity must have permission to access the resource that the event relates to.
A webhook’s RBAC identity must have the viewer role on a resource for the webhook dispatcher to send it an event relating to that resource.
Similar to subscriptions, if a webhook does not have any permissions configured, then it will not be sent any events.
Roles also limit the data that a webhook is sent.
Examples
A webhook is subscribed to project.delete and has the silo.viewer role. When a project is deleted in the silo the webhook is a resource of, the webhook will be sent a payload that describes the silo and the project:
silo.viewer role
{
  "event_class": "project.delete",
  "data": {
    "silo": { ... },
    "project": { ... }
  },
  ...
}
If the webhook’s RBAC identity was instead given only the project.viewer role for the project in question, then the silo data would be omitted from the payload sent by the dispatcher:
project.viewer role
{
  "event_class": "project.delete",
  "data": {
    "project": { ... }
  },
  ...
}
Finally, if the webhook’s RBAC identity was not given any role, the dispatcher would not send this event to the webhook receiver at all.
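The filtering behavior in these examples could be sketched as follows. This is purely illustrative of the proposal: the `VISIBLE_SECTIONS` table and the `filter_payload` function are hypothetical, and the mapping from roles to payload sections is inferred only from the two examples above.

```python
# Hypothetical mapping from a role to the payload sections it permits.
VISIBLE_SECTIONS = {
    "silo.viewer": {"silo", "project"},
    "project.viewer": {"project"},
}


def filter_payload(event_class, data, roles):
    """Return the payload a receiver with `roles` would be sent, or None.

    Sections of `data` that none of the roles permit are omitted. If no
    section remains, the event is not sent at all.
    """
    visible = set().union(*(VISIBLE_SECTIONS.get(role, set()) for role in roles))
    filtered = {key: value for key, value in data.items() if key in visible}
    if not filtered:
        return None
    return {"event_class": event_class, "data": filtered}


data = {"silo": {"name": "example-silo"}, "project": {"name": "example-project"}}
as_silo_viewer = filter_payload("project.delete", data, ["silo.viewer"])
as_project_viewer = filter_payload("project.delete", data, ["project.viewer"])
without_roles = filter_payload("project.delete", data, [])
```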
Receiver Liveness
Delivery of webhook notifications cannot be guaranteed if the receiver endpoint is offline. Continuing to attempt delivery to an unreachable receiver can be costly. Therefore, it may be desirable for the webhook dispatcher to not attempt to deliver events to a receiver that has been unreachable for an extended period of time.
If delivery of an event fails permanently for a receiver (i.e., all retry attempts for that event are exhausted), the receiver could be marked as failed. Receivers could also be marked as failed when a liveness probe request fails. When a receiver is marked as failed, the dispatcher will not attempt to deliver events to that receiver until it is marked as available again. This avoids exhausting the retry budget for events while the receiver is unavailable, which would otherwise prevent those events from being delivered once the receiver recovers.
Liveness probes are still sent to receivers that have been marked as failed. Once a liveness probe request to a failed receiver has succeeded, the receiver is marked as available again, and delivery of events will resume. We may also wish to provide an API endpoint to manually clear the failed status of a receiver.
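The proposed state transitions can be summarized in a short sketch. The class and method names are hypothetical, and this models only the transitions described above, not any existing dispatcher behavior.

```python
class ReceiverHealth:
    """Tracks whether the dispatcher should attempt deliveries to a receiver."""

    def __init__(self):
        self.failed = False

    def on_retries_exhausted(self):
        """All retry attempts for some event failed: mark the receiver failed."""
        self.failed = True

    def on_probe_result(self, succeeded: bool):
        """A failed probe marks the receiver failed; a successful one clears it."""
        self.failed = not succeeded

    def should_deliver(self) -> bool:
        """Events are only dispatched to receivers not marked as failed."""
        return not self.failed


health = ReceiverHealth()
health.on_retries_exhausted()
paused = not health.should_deliver()  # delivery is paused while failed

health.on_probe_result(succeeded=True)
resumed = health.should_deliver()  # a successful probe resumes delivery
```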
Once receiver health is tracked by the fault management system, an alert could be generated when a receiver enters the failed state. Of course, a failed receiver endpoint cannot receive an alert indicating its own unavailability. However, if redundant receiver endpoints exist, other replicas could receive an alert indicating that one of their compatriots has gone offline.
Automated Liveness Probes
As part of a system for tracking receiver liveness, it may be desirable for liveness probes to be sent automatically by the webhook dispatcher, as well as being triggered externally by the POST /webhooks/ endpoint.
When a receiver is marked as failed, probes could be sent periodically to determine if it has become available again, clearing the failed state and allowing event delivery to resume.
Additionally, automated periodic liveness probes could be used to proactively detect receiver outages, rather than waiting for an actual event delivery to fail. This could be useful for receivers that are subscribed to important but infrequent events, such as fault management alerts. If probes are sent periodically, a receiver outage could be detected before any fault that generates an alert occurs, allowing operators to resolve the receiver failure before an urgent alert is missed.
External References
[webhooks] Jeff Lindsay. Web hooks to revolutionize the web. https://web.archive.org/web/20180630220036/http://progrium.com/blog/2007/05/03/web-hooks-to-revolutionize-the-web/
[rfd307] Oxide Computer Company. RFD 307 Alert delivery mechanism. https://rfd.shared.oxide.computer/rfd/0307
[rfd523] Oxide Computer Company. RFD 523 Audit Logging Implementation. https://rfd.shared.oxide.computer/rfd/0523
[rfd364] Oxide Computer Company. RFD 364 Webhook Implementation. https://rfd.shared.oxide.computer/rfd/0364
[RFC2104] H. Krawczyk, M. Bellare, R. Canetti. RFC 2104 HMAC: Keyed-Hashing for Message Authentication. https://datatracker.ietf.org/doc/html/rfc2104
[RFC6202] S. Loreto, P. Saint-Andre, S. Salsano, G. Wilkins. RFC 6202 Known Issues and Best Practices for the Use of Long Polling and Streaming in Bidirectional HTTP. https://datatracker.ietf.org/doc/html/rfc6202
[RFC9562] K. Davis, B. Peabody, P. Leach. RFC 9562 Universally Unique IDentifiers (UUIDs). https://datatracker.ietf.org/doc/html/rfc9562