Telemetry and metric data form an essential feature of the Oxide rack. Software
and hardware components throughout the rack produce data, which is collected by
a program called oximeter and stored in a ClickHouse database. This RFD
describes the Oximeter Query Language (OxQL), a domain-specific language
(DSL) for querying and processing the stored data. It is intended to be used as
a building block for both Oxide-internal and customer-facing software.
The following content is (mostly) intended to be appropriate for end-user documentation. After publication, we expect that the majority of this RFD can be re-published as user documentation, which would then become the document of record for OxQL. This RFD will not be updated further.
Background
Motivation
Oxide is in a unique position. Our product takes a holistic approach to rack-scale computing infrastructure, with hardware and software co-designed and sold to customers as a unit. We therefore have a unique opportunity to provide those customers deep insight into their infrastructure, with telemetry data sourced from the smallest hardware sensors all the way to the largest programs. While collecting data is a necessary first step, it’s critical that we also provide a powerful system with which to access this data. This is the purpose of OxQL.
OxQL should enable users to understand their Oxide deployments. In addition to extracting and exporting raw data, OxQL can process the data to provide insight and serve as a foundation for observability tools ranging from graphical dashboards to alerting mechanisms.
Why a new language?
There already exist many languages for querying metric data. Why do we need a new one? The short answer is that there is no existing language that obviously satisfies our needs. Many telemetry systems have implemented their own DSL, and each tends to be tailored (intentionally or not) to the underlying data model and storage mechanisms. Standard languages like SQL enjoy wide usage in relational databases, but are not as well suited to timeseries data. SQL is also complex and arcane, having accrued a wide array of syntax and semantics over its many decades of use. Adopting SQL would require deciding which subset of SQL to support, and how to translate that into an efficient query on the underlying data storage, an area of continual research and improvement even for battle-tested database engines. That work is likely to be expensive and time-consuming.
Additionally, it’s important that our query system integrate well with other parts of the Oxide rack, both hardware and software. Incorporating an off-the-shelf language requires either tailoring the language to Oxide systems, or the other way around. Neither of these is particularly attractive.
Finally, we should also note that adopting any language will require effort to interpret and translate statements into queries on the underlying tables in ClickHouse, which reduces the differential effort between using an existing language and building our own.
In other words, while we could adopt an existing language, it would probably be a similar amount of work to writing our own; provide only superficial compatibility and familiarity; and likely constrain us in ways that drive us to evolve the language or break compatibility further.
Other background
OxQL is designed to query the timeseries data produced by various components in the Oxide rack. The data model for such data is described in detail in [rfd161]. [rfd162] describes the architecture of the collection system itself. This RFD includes a brief overview of the data model, with a focus on the pieces relevant to OxQL. [rfd442] explores how other, related platforms expose metric data. That survey led us to adopt our own DSL.
Key examples
To motivate the design of OxQL, it is helpful to look at a few key examples. These will be used throughout the document to illustrate OxQL’s main concepts. These queries are accessible enough to give a general sense of the language and its core elements.
Selecting raw data
Let’s start by looking at a simple query. The query below selects the number of bytes sent out of sleds' data links over the last minute.
get sled_data_link:bytes_sent
| filter timestamp > @now() - 1m
This query shows a few key features of OxQL:
1. First, we can select raw data from a named table, using the get table operation. In this case, we’re selecting data from the sled_data_link:bytes_sent table.
2. Data can be piped between table operations, using the familiar Unix pipe symbol, |.
3. We use the filter table operation to limit the returned data, in this case based on the timestamp of the data. Data may also be filtered by comparing field values, which we’ll see later.
Running this query on a live system, we might see data like the following:
Note: the oxide CLI command below returns raw data as a detailed JSON object. For clarity and exposition purposes, a Python script is used to format the output. See the details below for more information.
OxQL queries can be submitted to the Oxide API through the timeseries_query API call, or its corresponding HTTP endpoint. This returns data as JSON, which has a rich structure with both data and metadata. The full schema for that object can be found in the OpenAPI document.
The output is quite verbose, however, and so the following Python script is used
in the examples to simplify and format it. It will accept the output of the
command oxide experimental timeseries query
on standard input, and print the
formatted results shown in this document on standard output. The script uses
only modules from the standard library for portability. The examples assume it
is in the current directory, executable, and named fmt-query.py
.
fmt-query.py
#!/usr/bin/env python3
import json
import sys


def format_points(maybe_start_times, timestamps, values):
    # Cumulative and delta samples have start times; gauge samples do not.
    if maybe_start_times is None:
        for (timestamp, value) in zip(timestamps, values):
            value = list(value) if len(value) > 1 else value[0]
            print(f"  {timestamp}: {value}")
    else:
        for (start_time, timestamp, value) in zip(maybe_start_times, timestamps, values):
            value = list(value) if len(value) > 1 else value[0]
            print(f"  {start_time} - {timestamp}: {value}")


def format_one_timeseries(ts):
    fields = ts["fields"]
    for field_name in sorted(fields.keys()):
        data = fields[field_name]
        typ = data["type"]
        value = data["value"]
        print(f" {field_name} ({typ}): {value}")
    points = ts["points"]
    format_points(
        points.get("start_times"),
        points["timestamps"],
        # Transpose the per-dimension value arrays into one tuple per sample.
        zip(*(pts["values"]["values"] for pts in points["values"])),
    )


def format_timeseries(table: str, timeseries: dict):
    print(f"\n{table}\n")
    for ts in timeseries.values():
        format_one_timeseries(ts)
        print()


if __name__ == '__main__':
    data = json.load(sys.stdin)
    for table in data["tables"]:
        table_name = table["name"]
        timeseries = table["timeseries"]
        format_timeseries(table_name, timeseries)
$ oxide experimental timeseries query --query \
    'get sled_data_link:bytes_sent | filter timestamp > @now() - 1m' | \
    ./fmt-query.py

sled_data_link:bytes_sent

 kind (string): vnic
 link_name (string): oxControlService10
 rack_id (uuid): de608e01-b8e4-4d93-b972-a7dbed36dd22
 sled_id (uuid): 5f6720b8-8a31-45f8-8c94-8e699218f28b
 sled_model (string): 913-0000019
 sled_revision (u32): 6
 sled_serial (string): BRM42220017
 zone_name (string): oxz_crucible_2e32fdcc-737a-4430-8290-cb7028ea4d50
  2024-08-14T03:17:43.131022517Z - 2024-08-14T03:37:58.138871843Z: 4086
  2024-08-14T03:37:58.138871843Z - 2024-08-14T03:38:08.143299789Z: 172
  2024-08-14T03:38:08.143299789Z - 2024-08-14T03:38:18.146477647Z: 0
  2024-08-14T03:38:18.146477647Z - 2024-08-14T03:38:28.151284452Z: 0

 kind (string): physical
 link_name (string): cxgbe1
 rack_id (uuid): de608e01-b8e4-4d93-b972-a7dbed36dd22
 sled_id (uuid): 2707b587-9c7f-4fb0-a7af-37c3b7a9a0fa
 sled_model (string): 913-0000019
 sled_revision (u32): 6
 sled_serial (string): BRM44220011
 zone_name (string): global
  1986-12-28T00:00:35.461760437Z - 2024-08-14T03:37:53.041474188Z: 6683986815
  2024-08-14T03:37:53.041474188Z - 2024-08-14T03:38:03.047111912Z: 105039519
  2024-08-14T03:38:03.047111912Z - 2024-08-14T03:38:14.132695318Z: 117618346
  2024-08-14T03:38:14.132695318Z - 2024-08-14T03:38:24.138699820Z: 106764171
  2024-08-14T03:38:24.138699820Z - 2024-08-14T03:38:34.144176654Z: 105105116

<... output continues ...>
Let’s break down this output.
1. The entire output from this query is a table. It has a name, shown here. In this case, we’re selecting raw data from a table in the database, and so the output has the same name as that table, sled_data_link:bytes_sent. We’ll see in later examples that this isn’t always the case.
2. This paragraph shows one timeseries in the sled_data_link:bytes_sent table. Every timeseries in the table has the same field names and field types. For example, every timeseries in this table has a field named kind with type string.
3. The field values for this timeseries. While every timeseries has the same field names and types, the set of all field values for this timeseries is unique. The values uniquely identify the timeseries within its table, similar to a primary key in a relational database. This particular timeseries has the value vnic for its kind field.
4. After the field values are the data points, one on each line. We can see three columns here: a start time, a timestamp, and a data value. The start time reflects the fact that this is a cumulative counter, where the data measures the total number of bytes sent between the start time and timestamp. Note that the start time of each sample is equal to the timestamp of the preceding sample, if there is one.
5. A different timeseries in the same table. Note that the names and types of the fields are the same as in (2), but the values for the fields differ (e.g., the sled_serial field has value BRM44220011 here).
Selecting raw data can be quite useful, especially when first examining a system or its behavior. It can illuminate the basic shape and structure of the data, such as fields and their values; data and timestamp ranges; etc. Still, we often want to process the raw data in various ways, to better understand a system’s behavior and diagnose issues. The next queries will explore how to do that.
Average network usage
As we saw above, data can be piped from one table operation into another. In addition to filters, table operations can also be used to compute averages or other summaries of the data. The query below computes the number of bytes sent over the last hour, averaged into 5 minute windows.
get sled_data_link:bytes_sent
| filter timestamp > @now() - 1h
| align mean_within(5m)
Its output might look like this:
sled_data_link:bytes_sent

 kind (string): vnic
 link_name (string): oxControlService5
 rack_id (uuid): de608e01-b8e4-4d93-b972-a7dbed36dd22
 sled_id (uuid): dd83e75a-1edf-4aa1-89a0-cd8b2091a7cd
 sled_model (string): 913-0000019
 sled_revision (u32): 6
 sled_serial (string): BRM42220016
 zone_name (string): oxz_crucible_0e708ee3-b7a6-4993-a88a-4489add33e29
  2024-08-15T20:01:53.730368377Z: 41571.83377696665
  2024-08-15T20:06:53.730368377Z: 10126.403066282965
  2024-08-15T20:11:53.730368377Z: 10375.428410271988
  2024-08-15T20:16:53.730368377Z: 35976.89723356015
  2024-08-15T20:21:53.730368377Z: 15632.43617543621
  2024-08-15T20:26:53.730368377Z: 10540.75845058832
  2024-08-15T20:31:53.730368377Z: 9825.903589579711
  2024-08-15T20:36:53.730368377Z: 10653.3210599372
  2024-08-15T20:41:53.730368377Z: 9844.633858761896
  2024-08-15T20:46:53.730368377Z: 9554.318609179172
  2024-08-15T20:51:53.730368377Z: 10030.136672115084
  2024-08-15T20:56:53.730368377Z: 10705.870660262928

 kind (string): physical
 link_name (string): cxgbe1
 rack_id (uuid): de608e01-b8e4-4d93-b972-a7dbed36dd22
 sled_id (uuid): db183874-65b5-4263-a1c1-ddb2737ae0e9
 sled_model (string): 913-0000019
 sled_revision (u32): 6
 sled_serial (string): BRM42220009
 zone_name (string): global
  2024-08-15T20:01:53.730368377Z: 3851101.689250712
  2024-08-15T20:06:53.730368377Z: 3123494.0195476883
  2024-08-15T20:11:53.730368377Z: 3134962.1962845605
  2024-08-15T20:16:53.730368377Z: 3331401.7208908484
  2024-08-15T20:21:53.730368377Z: 3322688.084577879
  2024-08-15T20:26:53.730368377Z: 3143185.7570474516
  2024-08-15T20:31:53.730368377Z: 3227492.9576822957
  2024-08-15T20:36:53.730368377Z: 3098943.827306392
  2024-08-15T20:41:53.730368377Z: 3163852.61823991
  2024-08-15T20:46:53.730368377Z: 3103095.5405027973
  2024-08-15T20:51:53.730368377Z: 3181145.248283609
  2024-08-15T20:56:53.730368377Z: 5273885.797166327

<... output continues ...>
1. It’s important to notice that the initial portion of the query here is the same as the previous example. This shows a strength of OxQL: building complex processing pipelines incrementally, by piping table operations into one another. However, we also see a new table operation, align, on the final line.
2. The align table operation. When merging data between timeseries, it’s important that the timestamps be aligned, so that all timestamps occur at regular intervals. The align table operation uses an alignment function to determine (1) the size of the output time window and (2) how to combine data in each timeseries that falls within a window. Here, the mean_within alignment function is used. This alignment function computes the average of the input data in each timeseries within non-overlapping windows. It takes one argument, which is the window size, here 5 minutes, which is exactly the interval of the output samples shown below.
3. We can see that each timeseries in the output table has exactly the same fields as the earlier query. We haven’t modified which timeseries are returned by this query, only what data points are returned.
4. The data now shows the average number of bytes sent over 5 minute windows. There are two important things to note here. First, the start time is now gone, because it’s implied by the timestamp and the 5 minute window passed to the mean_within alignment function. Second, the values are now floating point numbers, because we’ve averaged many values. (This is also reflected in the full JSON output, which explicitly lists the type of data returned in each query, not just in the numerical values themselves.)
Average network usage by sled
In the preceding example, we saw that we can average data over time within one timeseries. (The number of timeseries in the table didn’t change.) But in many situations one actually wants to average data between timeseries. For example, one might be interested in the aggregate network usage for each sled, regardless of which data link was used to actually send data. That can be done by grouping timeseries together.
The query below computes the number of bytes sent in the last hour, averaged over 5 minute windows, for every sled.
get sled_data_link:bytes_sent
| filter timestamp > @now() - 1h
| align mean_within(5m)
| group_by [sled_serial]
1. As in the previous examples, we can see that the first portion of this query is identical. We’re piping outputs together, and so we can extend the query by adding the new table operation at the end.
2. This is a group_by table operation. It accepts a list of field names. For each timeseries in the input table, it groups together all those which have the same values for the listed fields. So in this case, all timeseries with the same sled serial number will be grouped.
The data for this query might look like this:
sled_data_link:bytes_sent
sled_serial (string): BRM42220031
2024-08-15T20:22:27.190710536Z: 496949.09648107865
2024-08-15T20:27:27.190710536Z: 109857.58221837238
2024-08-15T20:32:27.190710536Z: 109232.30921474376
2024-08-15T20:37:27.190710536Z: 105512.25538612343
2024-08-15T20:42:27.190710536Z: 626023.2184284237
2024-08-15T20:47:27.190710536Z: 106813.24027380218
2024-08-15T20:52:27.190710536Z: 105864.14432594774
2024-08-15T20:57:27.190710536Z: 107299.78825178408
2024-08-15T21:02:27.190710536Z: 105852.9834382634
2024-08-15T21:07:27.190710536Z: 111337.96568275186
2024-08-15T21:12:27.190710536Z: 108153.90428668828
2024-08-15T21:17:27.190710536Z: 110668.32619822676
sled_serial (string): BRM42220014
2024-08-15T20:22:27.190710536Z: 568670.8494093848
2024-08-15T20:27:27.190710536Z: 510732.8644817208
2024-08-15T20:32:27.190710536Z: 507523.0107900741
2024-08-15T20:37:27.190710536Z: 525524.4641971652
2024-08-15T20:42:27.190710536Z: 514333.71314438555
2024-08-15T20:47:27.190710536Z: 512395.6909131015
2024-08-15T20:52:27.190710536Z: 508051.0772952132
2024-08-15T20:57:27.190710536Z: 506067.695052197
2024-08-15T21:02:27.190710536Z: 792672.5493766736
2024-08-15T21:07:27.190710536Z: 511381.2428469829
2024-08-15T21:12:27.190710536Z: 501101.7476959403
2024-08-15T21:17:27.190710536Z: 516705.9677027045
<... output continues ...>
We can see that each timeseries in the output table has just one field, sled_serial, which is the field listed in the group_by operation. Each timeseries in the output table is generated by averaging all the timeseries in the group, i.e., all those with the value shown here for the sled_serial field. See below for more details on the group_by table operation.
Joining network input and output
The above example shows how one might look at average network output over time. However, understanding system behavior or diagnosing issues often requires correlating multiple pieces of data at once. In this case, one might also want to know the number of bytes received during this same time period.
This requires merging data from two different tables, not just two timeseries
within the same table. We can do that with a subquery and the join
table
operation, which looks like this:
{
get sled_data_link:bytes_sent;
get sled_data_link:bytes_received
}
| filter timestamp > @now() - 5m
| align mean_within(1m)
| join
The output of the above query might look like this:
sled_data_link:bytes_sent,sled_data_link:bytes_received

 kind (string): vnic
 link_name (string): oxControlService5
 rack_id (uuid): de608e01-b8e4-4d93-b972-a7dbed36dd22
 sled_id (uuid): 7b862eb6-7f50-4c2f-b9a6-0d12ac913d3c
 sled_model (string): 913-0000019
 sled_revision (u32): 6
 sled_serial (string): BRM44220010
 zone_name (string): oxz_crucible_0e38475e-b8b2-4813-bf80-3c170081081a
  2024-08-14T04:35:17.063247129Z: [0.3079514963268704, 1.1129475130409705]
  2024-08-14T04:36:17.063247129Z: [0.0, 57.333333333333336]
  2024-08-14T04:37:17.063247129Z: [21.0, 0.0]
  2024-08-14T04:38:17.063247129Z: [28.666666666666668, 286.6666666666667]
  2024-08-14T04:39:17.063247129Z: [25.2, 0.0]

 kind (string): physical
 link_name (string): cxgbe0
 rack_id (uuid): de608e01-b8e4-4d93-b972-a7dbed36dd22
 sled_id (uuid): b886b58a-1e3f-4be1-b9f2-0c2e66c6bc88
 sled_model (string): 913-0000019
 sled_revision (u32): 6
 sled_serial (string): BRM42220006
 zone_name (string): global
  2024-08-14T04:35:17.063247129Z: [72891.30744207311, 57202.59302643966]
  2024-08-14T04:36:17.063247129Z: [85104.66198650173, 67958.87151877217]
  2024-08-14T04:37:17.063247129Z: [83203.29649990669, 66604.95842044336]
  2024-08-14T04:38:17.063247129Z: [84199.90222011972, 66680.09252436645]
  2024-08-14T04:39:17.063247129Z: [79358.75400917938, 63100.706405006815]

<... output continues ...>
1. A subquery is shown here, nested in curly braces ({}). A subquery can contain any number of queries (or subqueries!), each separated by semicolons (;). In this case there are two: we first select the number of bytes sent, and then the bytes received, on a compute sled's data links. Note that this means the output of the subquery has two tables, one for each of its queries.
2. As we’ve seen in the previous examples, we are reducing the amount of data with the filter table operation. Note that in this case, the filter applies to the entire output of the subquery, so to both tables! (One can also apply filters inside the subquery, if needed.)
3. As we’ve also seen before, we’re aligning the tables, this time using a 1 minute window. Just like with group_by, tables must be aligned in order to join them.
4. Finally, the query uses the join table operation to combine the results from the two tables. The join operation combines the timeseries from each table that have the same field values. Note that this works because the two tables in the subquery have the same field names and types!
5. The table output from the join table operation has a name that’s derived from the two input table names. In this case, by joining them with a ,.
6. OxQL combines the data values from two tables in a join operation by concatenating the values that occur at the same time. (This is why the tables must be aligned first.) In this case, since there are two tables in the subquery, the output also has two data values at each timestamp. It is now two-dimensional, showing the number of bytes sent and received at every time point, respectively.
Note: the join operation is equivalent to a natural inner join. Other kinds of joins (like outer joins) may be supported in the future, which would allow combining tables with different schema.
Intro summary
By examining a few different example queries, we’ve seen many of the major features of OxQL:
OxQL operates on tables, each of which is a set of timeseries with the same schema: the same field names, field types, and datum type.
Each table contains some number of timeseries, which are identified by their unique field values.
Table operations are used to select or process the timeseries in a table.
Table operations can be piped together to build up complex queries.
Subqueries are written by enclosing multiple queries in braces.
After this quick tour of OxQL, the remaining sections are a detailed reference of the language. We’ll start with the main concepts and terminology.
Terminology
Tables and timeseries
A table is a set of data corresponding to a fixed schema. The schema includes the table name; a set of typed key-value pairs called fields; a datum type; and a metric type.
A timeseries is one element in a table, with unique values for the fields. There may be any number of timeseries in a table. All of them have the same schema, but each is uniquely identified by these field values. The values for the fields are used to generate a timeseries key, which is essentially a hash of the unique data identifying the timeseries. Note that the key is used during query processing and to organize the output data, but it is rarely useful in the query output.
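To illustrate the idea of a timeseries key, here is a minimal Python sketch: a stable hash over the sorted field names and values. The hash function and encoding here are hypothetical; the actual key derivation inside oximeter is an internal implementation detail.

import hashlib

# Hypothetical sketch: derive a stable key from a timeseries's field
# values. The real oximeter key computation is an implementation detail.
def timeseries_key(fields: dict) -> str:
    hasher = hashlib.sha256()
    for name in sorted(fields):
        hasher.update(f"{name}={fields[name]}".encode("utf-8"))
    return hasher.hexdigest()

print(timeseries_key({"kind": "vnic", "sled_serial": "BRM42220017"}))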
Fields
A field is a typed key-value pair. It consists of a name and a value, which is one of the following types:
1-, 2-, 4-, or 8-byte integers, both signed and unsigned
Booleans (true or false)
Strings (UTF-8)
UUIDs (16-octet values, often random, used for unique resource identifiers)
IP addresses (v4 or v6)
Fields are used to provide context for the data in a timeseries. They are also used to identify a timeseries, as discussed above, by generating the timeseries key.
In our examples above, the field sled_serial
is a string
serial number for a sled, while the sled_id
field is a UUID. These identify
the compute sled from which the data is derived, for example. Other fields
provide additional context; for example, the link_name
identifies the
individual network data link in question.
Datum types
The datum refers to the measured value of a single sample in a timeseries. It is always associated with a timestamp, and may be associated with a start time if the datum is cumulative.
A datum may have one of the following types:
1-, 2-, 4-, or 8-byte integers, both signed and unsigned
Booleans (true or false)
Strings (UTF-8)
4- or 8-byte IEEE-754 floating point numbers
Histogram (see the following section)
Histograms
In addition to the types above, a datum may also be a histogram. Histograms
let us compactly represent a large amount of data. Rather than individual
values, we store a set of bins and counts within those bins. At each
bin, we keep track of the total number of samples whose value falls between the
upper and lower edges of the bin. For example, one might track the I/O latency
on a disk using a histogram, because there may be a huge number of total I/Os in
any time interval. For oximeter
histograms, the bins may be any integer or
floating-point type, and the counts are always 64-bit unsigned integers.
Let’s look at an example histogram, one tracking those I/O latencies on a disk.
Start time | Timestamp | Bins (milliseconds) | Counts |
---|---|---|---|
17:44:22 | 17:44:22 | [1, 2, 4, 8, … ] | [0, 0, 0, 0, … ] |
17:44:22 | 17:44:32 | [1, 2, 4, 8, … ] | [0, 2, 9, 13, … ] |
17:44:22 | 17:44:42 | [1, 2, 4, 8, … ] | [1, 8, 24, 78, … ] |
The column "Bins" shows the right edges of the first few bins. Each is a time in milliseconds. As I/Os are completed, their latency is measured, and we look up the smallest bin in this array that is larger than the latency, and increment the count in that bin by 1.
Histograms greatly compress the underlying data. There may be thousands or millions of I/Os every second on modern SSDs, while our histogram may only have a few tens or hundreds of buckets. We lose information, since we don’t know the exact latency of any I/O, only that it was within the appropriate bin. In exchange, we can accurately capture information about the distribution of a large number of I/Os, while using a fixed amount of memory per datum. The memory usage is a function of the number of bins, not the number of samples.
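As a concrete illustration of the binning rule described above, the following Python sketch records latency samples into a histogram whose bins array holds the right edges of each bin, using the standard bisect module. The specific edges and samples are invented for the example.

import bisect

bins = [1, 2, 4, 8, 16]          # right bin edges, in milliseconds (example values)
counts = [0] * (len(bins) + 1)   # one extra slot for samples past the last edge

def record(latency_ms: float) -> None:
    # Find the smallest bin whose right edge is larger than the sample,
    # and increment its count, as described above.
    counts[bisect.bisect_right(bins, latency_ms)] += 1

for sample in (0.5, 3.0, 3.9, 7.2, 100.0):
    record(sample)
print(counts)   # [1, 0, 2, 1, 0, 1]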
Missing data points
Sometimes we fail to collect a sample for a timeseries. For example, we might fail to access the hardware containing a temperature sensor, or be unable to reach a peer over the internal network to determine its state. In these situations, we may generate a missing datum.
Missing data points are represented by the JSON value null
in the output of a
query. Missing points still contain lots of the information that a valid sample
does, including a timestamp, datum type, and start time for cumulative counters
or histograms. It is often important to consider missing data. By providing an
explicit null
as the value along with the timing and type information
present for other points, one can accurately distinguish the absence of valid
data from the absence of any measured activity at all.
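Because missing points are explicit nulls rather than silently omitted samples, consumers of query output should be prepared to encounter them. A minimal Python sketch, using the simplified value lists from this document's examples (the values themselves are invented):

# Missing points appear as JSON null, i.e., None after parsing in Python.
values = [4086, None, 172, None, 0]

present = [v for v in values if v is not None]
print(f"{len(values) - len(present)} of {len(values)} points are missing")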
Metric type
The datum type explored in the previous section describes the range of values that a datum can take. Data in a timeseries also has a metric type, which captures how data at different times relates to each other. The metric type takes one of three values:
Gauges
Cumulative counters
Deltas
Gauges
A gauge represents an instantaneous measurement. It captures the value of an underlying metric at one point in time. Each data point is completely independent of the others: different measurements of some underlying quantity taken at different timestamps. For example, the number of items resident in a queue at any time is a gauge, shown below.
Timestamp | Value |
---|---|
17:44:22 | 3 |
17:44:32 | 7 |
17:44:42 | 0 |
17:44:54 | 10 |
At each timestamp, a new, independent sample is recorded. The values may go up or down arbitrarily — in this case, as the queue receives and processes items at different rates.
Cumulative counters
In contrast to gauges, which are instantaneous measurements, a cumulative counter tracks a total value over a range of time. It has a start time in addition to its timestamp, which marks the beginning of its time range. Its value represents the accumulated total encountered from the start time up to the recorded timestamp.
Cumulative counters are widely used in the Oxide product. It’s always possible that an attempt to collect a sample fails — software may be unable to access a hardware sensor or reach a peer on the internal network. Cumulative counters still provide information about the period of time we missed a sample, because the next acquired sample will count events that occurred during that time. Cumulative counters are also useful for identifying program crashes, which are indicated by a reset of the counter's start time. See the section on missing data points for more information.
Here is an example cumulative counter.
Start time | Timestamp | Value |
---|---|---|
17:44:22 | 17:44:22 | 0 |
17:44:22 | 17:44:32 | 1 |
17:44:22 | 17:44:42 | 8 |
17:44:44 | 17:44:52 | 1 |
Continuing with our example of a queue, suppose this cumulative counter tracks the total number of items sent on the queue. (The gauge above tracked the number of items in the queue at any one time.) The counter begins with a start time of 17:44:22, and a sample is taken immediately. The value is 0, meaning no items have been pushed onto the queue.
Another sample is taken 10 seconds later. The start time is the same, and the value has increased to 1. The queue received 1 new item in this 10 second window, and has now seen 1 item total.
After another 10 seconds, an additional sample is taken with the value 8. The queue has received 8 items in total now, and so 7 new items in this time window.
10 seconds later we see a new sample, but the start time has also changed! This could mean that the program managing the queue has restarted, for example. The value is now 1, which means between 17:44:44 and 17:44:52, there was 1 item sent on the queue.
Deltas
The last metric type in OxQL is a delta. In a timeseries with a delta metric type, the value at each point in time represents the change over a time range. Similar to cumulative counters, deltas have a start time in addition to a timestamp. In contrast, the time ranges are non-overlapping, with the start time of each sample exactly equal to the timestamp of the one before it.
Start time | Timestamp | Value |
---|---|---|
17:44:22 | 17:44:32 | 2 |
17:44:32 | 17:44:42 | 7 |
17:44:42 | 17:44:52 | 1 |
17:44:52 | 17:45:02 | 3 |
Here is an example delta timeseries. We can see that each timestamp is equal to the start time of the following sample (if it exists). The value counts the total number of items just within that window.
Deltas are common in OxQL queries. That is mainly because, when selecting data from a cumulative table, it is automatically converted into a delta! Why is that?
As mentioned above, much of the data collected in the Oxide rack is cumulative, whether a counter or a histogram. Cumulative metrics have a number of nice properties that make the collected data more robust and useful in a complex distributed system. (See the section on cumulatives for details.) Despite their wide use, cumulative data is actually quite tricky to work with. The samples in a cumulative timeseries are not independent, since later values contain overlapping information with earlier values. This makes it difficult to combine data from multiple sources, either within the same timeseries (for example, temporally averaging) or across timeseries (say, averaging during a grouping operation). If one were to temporally average a cumulative counter into large time windows, it would weight earlier samples more heavily, since an increment in an early time range is counted again for each later time range in which it also appears.
To address this, both cumulative counters and histograms are automatically and implicitly converted to deltas when they are selected from the database. (There is currently no way to select raw, cumulative data, although this will likely be supported in the future.) After retrieving the data, OxQL computes the adjacent differences between each sample to convert a cumulative into a delta metric. This lets us aggregate, average, join, group, and align timeseries, since each data point is independent of its neighbors.
Original cumulative:

Start time | Timestamp | Value |
---|---|---|
17:44:22 | 17:44:22 | 0 |
17:44:22 | 17:44:32 | 1 |
17:44:22 | 17:44:42 | 8 |
17:44:44 | 17:44:52 | 1 |

Derived delta:

Start time | Timestamp | Value |
---|---|---|
17:44:22 | 17:44:22 | 0 |
17:44:22 | 17:44:32 | 1 |
17:44:32 | 17:44:42 | 7 |
17:44:44 | 17:44:52 | 1 |
The cumulative data shown above is similar to the previous cumulative counter example. The first three rows all share a start time, and the timestamps increment regularly. The value is also monotonically increasing. In the fourth row, we see a restart of the counter, with the start time and value both resetting.
The delta data on the right is derived from the cumulative, depending on whether the counter reset. If it did not reset, the start time of each row in the delta is set to the timestamp of the previous sample (if one exists), and the value is the difference between the current value and the previous (if it exists). If the cumulative counter did reset, then the start time, timestamp, and value are all equal to the corresponding values from the cumulative counter.
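The following Python sketch implements the conversion just described. It assumes each sample is a (start_time, timestamp, value) tuple, and that a change in start time signals a counter reset; this is an illustration of the rule above, not the actual implementation.

def to_delta(samples):
    deltas = []
    prev = None
    for (start, timestamp, value) in samples:
        if prev is None or start != prev[0]:
            # First sample, or the counter reset: pass through unchanged.
            deltas.append((start, timestamp, value))
        else:
            # Otherwise the window begins at the previous timestamp, and
            # the value is the adjacent difference.
            deltas.append((prev[1], timestamp, value - prev[2]))
        prev = (start, timestamp, value)
    return deltas

# The cumulative counter example from above.
cumulative = [
    ("17:44:22", "17:44:22", 0),
    ("17:44:22", "17:44:32", 1),
    ("17:44:22", "17:44:42", 8),
    ("17:44:44", "17:44:52", 1),
]
print(to_delta(cumulative))
# [('17:44:22', '17:44:22', 0), ('17:44:22', '17:44:32', 1),
#  ('17:44:32', '17:44:42', 7), ('17:44:44', '17:44:52', 1)]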
The Oximeter Query Language
Now that we’ve seen several examples of OxQL queries, and defined the key terms and concepts, this section will serve as a more formal reference to the elements of the language. We’ll first cover the syntax of the language, focusing on how strings are parsed into queries, and then cover the key semantic components of a query.
Lexical items
Lexical items are the basic words or units that form the vocabulary of a formal language. This section describes the recognized elements for OxQL. This includes components like tables or field names; functions or table operations; comparison operators; and literals. The grammar of each of these items is described here.
Identifiers
An identifier is a name, such as for a table operation or the field of a table.
IDENTIFIER = { [a-z]+ [a-z0-9]* ("_" [a-z0-9]+)* | SPECIAL_IDENTIFIER }
SPECIAL_IDENTIFIER = { "timestamp" | "start_time" | "datum" | "bins" | "counts" }
For example, in the table operation filter link_name == "foo"
, both the
operation name filter
and the field name link_name
are identifiers.
Identifiers are formatted as lowercase ASCII characters or digits, possibly delimited by underscores, with the structure shown in the grammar above. Identifiers may not start with a digit, though the underscore-delimited words after the first may start that way. The special identifiers are implicitly added to a table by the OxQL query engine, and are used to refer to the common elements that all tables and timeseries share, such as their timestamp or the datum value itself. See [_terminology] for details on the anatomy of a timeseries.
Valid identifiers include:
timestamp
sled_serial
time0
time_new_0
The following are not valid identifiers:
123
0time
a-b-c
!
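The identifier grammar above corresponds to a simple regular expression, which the following Python snippet uses to check the examples; this is an illustration, not the parser OxQL actually uses.

import re

# Mirrors IDENTIFIER above: lowercase words, digits allowed after the
# first character, delimited by underscores.
IDENT = re.compile(r"[a-z]+[a-z0-9]*(_[a-z0-9]+)*$")

for word in ("timestamp", "sled_serial", "time0", "time_new_0",
             "123", "0time", "a-b-c", "!"):
    print(word, bool(IDENT.match(word)))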
Table names
A table name is used to identify a table in the database or the output of an OxQL query.
TARGET_NAME = { [a-z]+ [a-z0-9]* ("_" [a-z0-9]+)* }
METRIC_NAME = TARGET_NAME
TABLE_NAME = { TARGET_NAME ":" METRIC_NAME }
The initial data input to OxQL queries is a table
stored in the telemetry database. These are identified by a name with a
particular structure: a target resource and a metric name, separated by a :
.
For example, the sled_data_link:bytes_sent
table lists the number of bytes
sent on a single sled’s NIC.
Table names are also used to name the output of a query. In many cases, this is
the same as the input table name, but may change during processing. For example,
the name of the output of a join
operation is a concatenation of the input
table names.
Literals
OxQL supports the common types of literals, usually used in filtering expressions or as arguments to table operations. This includes things like booleans, timestamps, numeric values, and strings. Where possible and helpful, the regular expression matching each literal has been included below.
Booleans
The literals true and false are used for boolean values.
BOOLEAN = { true | false }
Integers
OxQL supports signed and unsigned integers.
DECIMAL_INTEGER = { ["-"]? (0 | [1-9] [0-9]*) }
HEX_INTEGER = { ["-"]? "0x" [0-9a-fA-F]+ }
INTEGER = { DECIMAL_INTEGER | HEX_INTEGER }
Signed and unsigned integers are supported. These are sequences of digits, delimited by whitespace. Both base-10 and base-16 (hexadecimal) integers are currently supported.
Floats
OxQL supports floating point literals, in either decimal or scientific notation
DIGIT = { [0-9] }
SIGN = { [+-] }
EXPONENT = { ("e" | "E") SIGN? DIGIT+ }
NUMBER = { (DIGIT+ | DIGIT+ "." DIGIT* | DIGIT* "." DIGIT+) EXPONENT? }
FLOAT = { SIGN? ("inf" | "infinity" | "nan" | NUMBER) }
Floating point values are supported, written in either decimal
notation, with a point, such as 3.0
, or in scientific notation, such as
3.0e-3
. Floats parsed from queries are stored as IEEE-754 64-bit doubles.
(Floats for data stored in the database may be single- or
double-precision, which are widened if needed during a query.) Values that are
not exactly representable as IEEE-754 values will be rounded to the nearest
representable value.
IP addresses
OxQL supports IP addresses of either the V4 or V6 family.
IPv4 and IPv6 addresses are supported, written in the familiar dotted-decimal
notation for IPv4, e.g. "127.0.0.1"
, and colon-separated hex notation for
IPv6, e.g., "2001:adcd::1"
. Addresses are surrounded by double-quotes, "
.
IPv6 addresses should not be surrounded with brackets ([]
), as is sometimes
common when they are followed by a port or scope ID.
UUIDs
Universally-unique identifiers (UUIDs) are 128-bit values, commonly used to identify resources in the Oxide control plane.
HEX_CHAR = { [a-fA-F0-9] }
DASHED_UUID = { HEX_CHAR{8} "-" (HEX_CHAR{4} "-"){3} HEX_CHAR{12} }
BARE_UUID = { HEX_CHAR{32} }
UUID = { '"' (DASHED_UUID | BARE_UUID) '"' }
UUIDs are written as double-quoted strings of hexadecimal characters. Supported
formats include the common 8-4-4-4-12, where groups of 8, 4, 4, 4, and 12 hex
characters are separated by dashes, such as
"116068ca-dcc7-4c0d-9a24-82dc7e0a0bc1"
. The dashes may also be omitted, so
that the previous value is equivalent to "116068cadcc74c0d9a2482dc7e0a0bc1"
.
The ASCII characters in a UUID are case-insensitive:
"116068ca-dcc7-4c0d-9a24-82dc7e0a0bc1"
is equivalent to
"116068CA-DCC7-4C0D-9A24-82DC7E0A0BC1"
.
Durations
An OxQL literal to denote a duration of time.
YEAR = "Y"
MONTH = "M"
WEEK = "w"
DAY = "d"
HOUR = "h"
MINUTE = "m"
SECOND = "s"
MILLISECOND = "ms"
MICROSECOND = "us"
NANOSECOND = "ns"
UNIT = { YEAR | MONTH | WEEK | DAY | HOUR | MINUTE | SECOND | MILLISECOND | MICROSECOND | NANOSECOND }
DURATION = { DECIMAL_INTEGER UNIT }
A literal duration can be written as an unsigned integer count, followed by a duration unit. Supported units, in decreasing size, are:
Y: an approximate year, 365 days
M: an approximate month, 30 days
w: an approximate week, 7 days
d: an approximate day, one 24-hour period
h: an hour, 3600 seconds
m: a minute, 60 seconds
s: seconds
ms: milliseconds
us: microseconds
ns: nanoseconds
For example, the duration 7d
is equal to 7 24-hour periods. Note that the
larger durations are all approximate. One year is always 365 24-hour periods,
and there is no accounting for leap years.
Note that duration units are case-sensitive: M refers to a month and lowercase m to a minute, for example.
Timestamps
A timestamp literal is used to specify an instant in time.
YEAR = { [1-9] [0-9]+ }
MONTH = { [1-9] | 10 | 11 | 12 }
DAY = { [1-9] | [1-2][0-9] | 3[0-1] }
HOUR = { [0-9] | 1[0-9] | 2[0-4] }
MINUTE = { [0-9] | [1-5][0-9] }
SECOND = { [0-9] | [1-5][0-9] }
SUBSEC = { "." [0-9]{1,6} }
DATE = { "@" YEAR "-" MONTH "-" DAY }
TIME = { "@" HOUR ":" MINUTE ":" SECOND }
DATETIME = { "@" YEAR "-" MONTH "-" DAY "T" HOUR ":" MINUTE ":" SECOND SUBSEC? }
NOW = { "@now()" }
TIMESTAMP = { DATE | TIME | DATETIME | NOW }
Timestamps may be expressed in a few formats, all beginning with the @
symbol.
They can be written as just a date, such as @2024-01-02
. In this case,
midnight UTC on the specified date (2 Jan 2024) is used as the time, so the
above value is equal to 2024-01-02T00:00:00 UTC
. All timestamps are in UTC,
and there is no mechanism to express a particular timezone or offset.
Supported formats are:
@%Y-%m-%d: A single date, with a time given by midnight.
@%H:%M:%S: A time within the current day (UTC). Note that this may differ from the current day of the user’s local timezone.
@%Y-%m-%dT%H:%M:%S.%f: A date and time, including fractions of a second up to 9 significant digits. This is very similar to the RFC 3339 format, except that neither an offset nor timezone is allowed.
@now(): The special timestamp literal @now() can be used to represent the current time, when the query is parsed. In this case, it’s also possible to add or subtract a duration literal from the value. For example, @now() - 1m represents one minute in the past. This kind of arithmetic is exclusive to timestamps expressed with @now(), and may not be used with other timestamp formats.
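The @now() arithmetic can be thought of as ordinary datetime arithmetic performed at query parse time. A rough Python analogy (not the actual implementation):

from datetime import datetime, timedelta, timezone

# `@now() - 1m` resolves to the parse time, in UTC, minus the duration.
now = datetime.now(timezone.utc)
one_minute_ago = now - timedelta(minutes=1)
print(one_minute_ago.isoformat())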
Strings
String literals are UTF-8 strings, quoted in either single- or double-quotes. For example, "this" and 'this' are equivalent. If the string itself contains quotes, the opposite kind of quote should be used to surround it. For example, "this contains 'single quotes'" and 'this contains "double quotes"' are both acceptable. If you need to include both kinds of quotes, escape-sequences are also recognized. For example, 'this string contains \'single quotes\'' is acceptable. The full list of supported escape-sequences is:
\n
\r
\t
\\
\0
\u
The last escape-sequence is used to support full Unicode codepoints. Those must
be written as sequences of hex digits, enclosed in \u{}
. For example, the
value \u{1234}
will decode to the codepoint ሴ
— those two representations
are equivalent.
Comparison and logical operators
Comparison operators are used to compare two values, and return a boolean. Most familiar operators are supported, including equality, less-than, etc. Here is the full list:
EQ = { "==" }
NE = { "!=" }
GT = { ">" }
GE = { ">=" }
LT = { "<" }
LE = { "<=" }
REGEX_MATCH = { "~=" }
COMPARISON_OP = { EQ | NE | GT | GE | LT | LE | REGEX_MATCH }
The last operator, ~=, implements regular expression matching, and is only supported for strings. An expression such as "[a-f]+0$" can be used, for example, to match any string ending with one or more characters a-f and a single digit 0. The full supported regular expression syntax is described here.
Logical operators can also be used to combine one or two boolean values together. The supported operators are:
LOGICAL_OP = { AND | OR | XOR | NOT }
AND = { "&&" }
OR = { "||" }
XOR = { "^" }
NOT = { "!" }
Note that the XOR operator is logical, not bitwise. The precedence of the operators follows the common programming order: in order of increasing precedence, they are OR, AND, XOR, and NOT.
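For example, because AND binds more tightly than OR, the two filter expressions below are equivalent (the field names are hypothetical):

filter x == 0 || y == 1 && z == 2
filter x == 0 || (y == 1 && z == 2)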
Basic query structure
An OxQL query is composed of a pipeline of table operations, each of which applies a kind of data transformation to its inputs. Queries can be either flat, composed of simple table operations, or nested, containing subqueries themselves.
QUERY = { FLAT_QUERY | NESTED_QUERY }
FLAT_QUERY = { TABLE_OP ( "|" TABLE_OP )* }
NESTED_QUERY = { "{" QUERY ( ";" QUERY )+ "}" }
A nested query is written as ;
-delimited subqueries, all contained within
braces ({}
).
Table operations
The fundamental unit of an OxQL query is a table operation. Each table contains zero or more timeseries, all with the same field names, field types, and data types. In this sense, a table is analogous to a table in a traditional relational database: it contains entries all with the same schema. In this case, an entry is a timeseries, which has a unique set of field values along with its timestamped data points. (See [_terminology] for more details.) Below is a summary of each supported table operation, and more details on each are in the following sections.
Name | # input tables | # output tables | Description | Constraints |
---|---|---|---|---|
get | 0 | 1 | Retrieve a table by name | None |
filter | N > 0 | N | Filter timeseries in a table | None |
align | N > 0 | N | Temporally align a table's samples | None |
group_by | 1 | 1 | Group timeseries in a table by some fields, reducing along the unnamed fields | Timeseries must be aligned |
join | N > 1 | 1 | Natural inner join between two or more tables, matching up timeseries by their field names, and data points by timestamp | Timeseries must be aligned |
first / last | N > 0 | N | Limit the data points from each timeseries to the first or last | None |
Each table operation may take zero or more tables, and produces zero or more new tables after processing its input. The number of supported tables, and any restrictions on data types, are listed in the section describing each operation.
All queries or subqueries must begin by extracting some data from the timeseries
database, on which the rest of the query operates. This is done using the get
table operation, described in detail here.
Constructing pipelines
Complex queries are constructed by stringing together table operations with the
familiar pipe operator, |
. The output of each table operation is fed into
the next. For example, the query
get sled_data_link:bytes_sent | filter timestamp > @now() - 1m
will first get the raw data from the sled_data_link:bytes_sent
table, and then
restrict it to the last minute from the time the query is run. This will result
in one table, since there is one table named in the query. There may be zero,
one, or many timeseries in the table, depending on the data available in the
database when the query is run.
The above query’s output might look like this:
sled_data_link:bytes_sent

 kind (string): vnic
 link_name (string): oxBootstrap0
 rack_id (uuid): de608e01-b8e4-4d93-b972-a7dbed36dd22
 sled_id (uuid): 71def415-55ad-46b4-ba88-3ca55d7fb287
 sled_model (string): 913-0000019
 sled_revision (u32): 6
 sled_serial (string): BRM42220051
 zone_name (string): oxz_switch
  1986-12-28T00:00:40.081399425Z - 2024-08-17T21:54:42.930353607Z: 22456314166
  2024-08-17T21:54:42.930353607Z - 2024-08-17T21:54:52.936228303Z: 0
  2024-08-17T21:54:52.936228303Z - 2024-08-17T21:55:02.939660391Z: 7296
  2024-08-17T21:55:02.939660391Z - 2024-08-17T21:55:13.234281307Z: 0
  2024-08-17T21:55:13.234281307Z - 2024-08-17T21:55:23.240183102Z: 0
  2024-08-17T21:55:23.240183102Z - 2024-08-17T21:55:33.243404269Z: 7296

 kind (string): physical
 link_name (string): cxgbe0
 rack_id (uuid): de608e01-b8e4-4d93-b972-a7dbed36dd22
 sled_id (uuid): 2707b587-9c7f-4fb0-a7af-37c3b7a9a0fa
 sled_model (string): 913-0000019
 sled_revision (u32): 6
 sled_serial (string): BRM44220011
 zone_name (string): global
  1986-12-28T00:00:35.747836119Z - 2024-08-17T21:54:41.542113525Z: 19729551115
  2024-08-17T21:54:41.542113525Z - 2024-08-17T21:54:51.547409164Z: 855335
  2024-08-17T21:54:51.547409164Z - 2024-08-17T21:55:02.333056352Z: 332582
  2024-08-17T21:55:02.333056352Z - 2024-08-17T21:55:12.335451187Z: 561008

<... output continues ...>
Subqueries
As mentioned above, queries can be either flat or
nested. A query is nested if it contains subqueries, two or more queries
inside braces ({}
), separated by a semicolon (;
). Each subquery itself can
be flat or nested, so that quite complex processing pipelines can be built up
using these basic elements. For example, the query:
{
get sled_data_link:bytes_sent;
get sled_data_link:bytes_received
}
| filter timestamp > @now() - 1m && link_name == "cxgbe0"
will get the number of bytes sent and received on all sleds' physical data links, over the last minute. Further filtering can be done to limit the data, described in detail below. Here is an example output:
sled_data_link:bytes_sent

 kind (string): physical
 link_name (string): cxgbe0
 rack_id (uuid): de608e01-b8e4-4d93-b972-a7dbed36dd22
 sled_id (uuid): bd96ef7c-4941-4729-b6f7-5f47feecbc4b
 sled_model (string): 913-0000019
 sled_revision (u32): 13
 sled_serial (string): BRM27230045
 zone_name (string): global
  1986-12-28T00:00:35.608843070Z - 2024-08-17T21:55:57.796047457Z: 6232264822
  2024-08-17T21:55:57.796047457Z - 2024-08-17T21:56:07.803023952Z: 89948
  2024-08-17T21:56:07.803023952Z - 2024-08-17T21:56:17.808466667Z: 74393

 kind (string): physical
 link_name (string): cxgbe0
 rack_id (uuid): de608e01-b8e4-4d93-b972-a7dbed36dd22
 sled_id (uuid): 2707b587-9c7f-4fb0-a7af-37c3b7a9a0fa
 sled_model (string): 913-0000019
 sled_revision (u32): 6
 sled_serial (string): BRM44220011
 zone_name (string): global
  1986-12-28T00:00:35.747836119Z - 2024-08-17T21:55:52.346916391Z: 19733555641
  2024-08-17T21:55:52.346916391Z - 2024-08-17T21:56:02.349027176Z: 303074
  2024-08-17T21:56:02.349027176Z - 2024-08-17T21:56:12.353425030Z: 559113
  2024-08-17T21:56:12.353425030Z - 2024-08-17T21:56:22.358928927Z: 783220
  2024-08-17T21:56:22.358928927Z - 2024-08-17T21:56:32.364230588Z: 296332
  2024-08-17T21:56:32.364230588Z - 2024-08-17T21:56:42.752143536Z: 310945

<... output continues ...>

sled_data_link:bytes_received

 kind (string): physical
 link_name (string): cxgbe0
 rack_id (uuid): de608e01-b8e4-4d93-b972-a7dbed36dd22
 sled_id (uuid): 71def415-55ad-46b4-ba88-3ca55d7fb287
 sled_model (string): 913-0000019
 sled_revision (u32): 6
 sled_serial (string): BRM42220051
 zone_name (string): global
  1986-12-28T00:00:35.890800854Z - 2024-08-17T21:55:54.258302609Z: 248829949334
  2024-08-17T21:55:54.258302609Z - 2024-08-17T21:56:04.260650837Z: 542105
  2024-08-17T21:56:04.260650837Z - 2024-08-17T21:56:14.263301567Z: 539631
  2024-08-17T21:56:14.263301567Z - 2024-08-17T21:56:24.265977975Z: 260104

 kind (string): physical
 link_name (string): cxgbe0
 rack_id (uuid): de608e01-b8e4-4d93-b972-a7dbed36dd22
 sled_id (uuid): 2707b587-9c7f-4fb0-a7af-37c3b7a9a0fa
 sled_model (string): 913-0000019
 sled_revision (u32): 6
 sled_serial (string): BRM44220011
 zone_name (string): global
  1986-12-28T00:00:35.747836119Z - 2024-08-17T21:55:52.346916391Z: 1715906069
  2024-08-17T21:55:52.346916391Z - 2024-08-17T21:56:02.349027176Z: 60795
  2024-08-17T21:56:02.349027176Z - 2024-08-17T21:56:12.353425030Z: 89646
  2024-08-17T21:56:12.353425030Z - 2024-08-17T21:56:22.358928927Z: 93582
  2024-08-17T21:56:22.358928927Z - 2024-08-17T21:56:32.364230588Z: 58751
  2024-08-17T21:56:32.364230588Z - 2024-08-17T21:56:42.752143536Z: 90325
Note that there are two tables here, printed one after the other. The first
shows the table sled_data_link:bytes_sent
, with its timeseries. This is first
because the subquery listed it first. Next is the
sled_data_link:bytes_received
table, with all its timeseries.
Supported table operations
Table operations form the main processing component of an OxQL query. Each operation receives some number of tables from the operation before it; processes those tables; and generates some number of tables for the next operation to consume.
get
The primordial table operation is get. This retrieves a single table by its name from the database. It takes zero tables as input, and produces exactly one table as output. That output table may have any number of timeseries in it.
GET = { "get" TABLE_NAME }
Example:
get sled_data_link:bytes_sent
filter
A filter
table operation passes through data which matches the provided
filtering expression. Filters take an identifier, a comparison
operator, and a literal. (In the future, more complex expressions beyond
literals may be supported.)
FILTER = { "filter" FILTER_EXPR }
FILTER_EXPR = { SIMPLE_FILTER | COMPOUND_FILTER }
SIMPLE_FILTER = { IDENTIFIER COMPARISON_OP LITERAL }
COMPOUND_FILTER = { NOT? "("? FILTER_EXPR LOGICAL_OP FILTER_EXPR ")"? }
See [_comparison_and_logical_operators] and [_literals] for details on those components.
The provided identifier may name any one of the
fields in the table, or refer to a special identifier, such
as timestamp
. For cumulative or delta metric types, the start_time
identifier is also supported. For scalar (non-histogram) timeseries, the datum
identifier is used to refer to the value of the data point. For histograms, the
bins
and counts
identifiers refer to the values of the data point instead.
Filters may include compound expressions, joined with the logical operators
defined above. The precedence rules are
also defined in that section, but note that comparison operators always bind
more tightly than logical operators. In other words, foo == 0 && bar == 1
is
equivalent to (foo == 0) && (bar == 1)
. Parentheses may be used to group
expressions.
Below are several examples of filter
table operations.
Pass data within the last 1 minute:
filter timestamp > @now() - 1m
Data within the last 1 minute, where a field named
link_name
is exactly equal to the string"foo"
:
filter timestamp > @now() - 1m && link_name == "foo"
Filter points to those in the last day, with a 500 status code; or in the day before, with a status code between 400 and 500:
filter
(timestamp > @now() - 1d && status_code == 500) ||
(timestamp > @now() - 2d && timestamp < @now() - 1d && status_code > 400 && status_code < 500)
Note that the literals used to compare against a field must be comparable to the
actual data type of that field. It is an error, for example, to write filter
foo == 0
if the field "foo"
of the input table doesn’t have an integer type.
align
The align
table operation is used to generate output samples that are evenly
spaced in time. This is critical for combining data from more than one
timeseries or table. Timeseries data is usually sampled on an approximately
regular interval, but almost never exactly so. The align
operation can be used
to generate exactly-spaced samples. This allows points from one timeseries to be
confidently matched up with those of another, letting us perform grouping,
aggregation, or other multi-timeseries operations.
ALIGN = { "align" ALIGNMENT_METHOD }
ALIGNMENT_METHOD = { "mean_within(" DURATION ")" }
The align
table operation takes one argument, which is the alignment method.
Currently the only supported method is mean_within
, which computes an average
of the input timeseries over the provided duration. For example:
get sled_data_link:bytes_sent
| filter timestamp > @now() - 1h
| align mean_within(5m)
will compute the number of bytes sent on a data link over the last one hour, averaged in five minute windows. The output samples will be spaced exactly five minutes apart.
The alignment method also specifies what it means for an input sample to
"overlap" with the output period. For the mean_within
method, any input
timeseries that has any part of its sample period within the output period is
considered to overlap, with the amount of overlap determining the
contribution of that value to the output. This effectively prorates each point
by the amount its time range overlaps with the output window.
For example, suppose there is an input sample over the range [0.1s, 1.1s], and the output period is exactly [0.0s, 1.0s]. Then the input sample will contribute its value weighted by 0.9, since 90% of its interval overlaps with the output window. That is, the input is linearly weighted in the output, where the weight is the fraction of its interval which overlaps with the output period.
If there are zero data points in an output period, then a missing value is
produced in that period, represented by the JSON value null
in the output
data array.
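A small Python sketch of that proration, assuming each input sample covers an interval (start, end) in seconds with a scalar value; this illustrates the weighting described above rather than the exact implementation:

def mean_within(samples, w_start, w_end):
    total = 0.0
    weight = 0.0
    for (start, end, value) in samples:
        overlap = min(end, w_end) - max(start, w_start)
        if overlap <= 0:
            continue   # sample does not overlap the output window
        frac = overlap / (end - start)   # fraction of the sample inside
        total += frac * value
        weight += frac
    # Zero overlapping points produce a missing value (JSON null).
    return total / weight if weight else None

# The example from the text: a sample over [0.1s, 1.1s] contributes with
# weight 0.9 to the output window [0.0s, 1.0s].
print(mean_within([(0.1, 1.1, 10.0), (-0.4, 0.1, 4.0)], 0.0, 1.0))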
group_by
The group_by
table operation can be used to group more than one timeseries in
a table together. One specifies a list of field names on which to group, and
optionally a method for reducing all the points within a group.
GROUP_BY = { "group_by" "[" (FIELD_NAME),* "]" ("," REDUCER)? }
REDUCER = { "mean" | "sum" }
Note that the input timeseries must be aligned first, so that samples from the input timeseries can be correlated across time. An error is generated if the inputs are not aligned. Example:
get sled_data_link:bytes_received
| align mean_within(1m)
| group_by [sled_id], sum
The above query will compute the average number of bytes received on each sled’s
NICs, within one minute periods, and then group by the sled’s UUID. This will
remove all the other fields, leaving only the sled_id
field. Within each
group, all data points at the same timestamp will be summed to generate the
output result. In other words, this computes the total number of bytes received
for all links on a compute sled.
In contrast, the query:
get sled_data_link:bytes_received
| align mean_within(1m)
| group_by [sled_id], mean
will compute the average number of bytes received in each group.
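To make the grouping concrete, here is a small Python sketch of the reduction step, assuming already-aligned timeseries represented as (fields, values) pairs whose value lists line up by timestamp; handling of missing points is omitted.

from collections import defaultdict

def group_by(timeseries, group_fields, reducer):
    # Bucket timeseries by the values of the grouped fields.
    groups = defaultdict(list)
    for (fields, values) in timeseries:
        key = tuple(fields[f] for f in group_fields)
        groups[key].append(values)
    # Reduce across timeseries, one aligned timestamp at a time.
    return {key: [reducer(col) for col in zip(*members)]
            for (key, members) in groups.items()}

links = [
    ({"sled_id": "a", "link_name": "cxgbe0"}, [1.0, 2.0]),
    ({"sled_id": "a", "link_name": "cxgbe1"}, [3.0, 4.0]),
]
print(group_by(links, ["sled_id"], sum))   # {('a',): [4.0, 6.0]}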
join
Grouping is one way to merge data from multiple timeseries. However, one may also wish to correlate data from more than one timeseries over time. For example, one might want to know how many bytes are sent or received for a given sled.
One way to do this is the join
table operation, which combines two or more
tables together. For example:
{
get sled_data_link:bytes_sent;
get sled_data_link:bytes_received
}
| align mean_within(1m)
| join
This query will first retrieve data for both the bytes sent and bytes received
tables. They will separately be aligned to one minute periods by averaging,
since the align
operation takes any number of tables as input, and produces
the same number as its output.
Finally, the join table operation (which takes no arguments) performs a natural inner join on two or more tables. This merges data from timeseries in each table with the exact same values for their fields. It combines data points by concatenating those values which occur at the same timestamp.
For example, the query:
{
get sled_data_link:bytes_sent;
get sled_data_link:bytes_received
}
| filter timestamp > @2024-03-16T00:00:00 && timestamp < @2024-03-16T00:02:00
| align mean_within(20s)
| join
will produce data at 20s intervals, each with two values for the point: the first will give the average number of bytes sent in that interval, and the second the average bytes received. For example, entries might look like this:
2024-03-16 00:00:20.000000000: [1296.3357361535034,1347.6610151429818]
2024-03-16 00:00:40.000000000: [1174.90794741706,1191.0918657310574]
2024-03-16 00:01:00.000000000: [1069.655113621735,1214.166748853574]
2024-03-16 00:01:20.000000000: [1409.5862705234695,1411.122044253881]
2024-03-16 00:01:40.000000000: [1163.3038173144982,1158.1461762530694]
2024-03-16 00:02:00.000000000: [609.0538622363117,642.6238774860385]
Note: as with group_by, the join table operation requires that its input timeseries be aligned. An error is generated if that’s not the case.
first and last
The first
and last
table operations are used to limit the number of data
points from each query. They both take an unsigned integer argument k
,
limiting the output to at most k
samples, independently for each timeseries in
each table.
For example:
get sled_data_link:bytes_sent
| filter kind == "physical"
| last 1
will select the timeseries from the sled_data_link:bytes_sent
table
corresponding to the physical data links, and take the very last sample from
each. The output might look like this:
sled_data_link:bytes_sent

 kind (string): physical
 link_name (string): cxgbe0
 rack_id (uuid): de608e01-b8e4-4d93-b972-a7dbed36dd22
 sled_id (uuid): 71def415-55ad-46b4-ba88-3ca55d7fb287
 sled_model (string): 913-0000019
 sled_revision (u32): 6
 sled_serial (string): BRM42220051
 zone_name (string): global
  1986-12-28T00:00:35.449571704Z - 2024-08-23T16:49:51.282315601Z: 23912605713

 kind (string): physical
 link_name (string): cxgbe1
 rack_id (uuid): de608e01-b8e4-4d93-b972-a7dbed36dd22
 sled_id (uuid): 2707b587-9c7f-4fb0-a7af-37c3b7a9a0fa
 sled_model (string): 913-0000019
 sled_revision (u32): 6
 sled_serial (string): BRM44220011
 zone_name (string): global
  1986-12-28T00:00:35.212682494Z - 2024-08-23T16:49:41.356651510Z: 18131229015

<... output continues ...>
Query output
All OxQL queries output a list of tables. Each table includes some number of timeseries, and each timeseries includes metadata about the data types; field names and values; and the data itself. (See [_terminology] for more details on these terms.)
As mentioned in the examples, the outputs shown above have been simplified and formatted to show the most salient components. The query output returned from the Oxide API does contain all the metadata needed to understand it, including the timeseries names, details about the fields, and data and metric types.
See the OpenAPI spec for the full JSON schema describing a query output.
OxQL cookbook
This section is a quick reference of useful queries. It’s a sort of cookbook for answering common questions, with links to detail sections where useful.
Last sample
get sled_data_link:bytes_sent | last 1
Select the last sample from all timeseries in the sled_data_link:bytes_sent
table. See this section for more details.
Last 5 minutes
get sled_data_link:bytes_sent | filter timestamp > @now() - 5m
Select the last 5 minutes of data from all timeseries in the
sled_data_link:bytes_sent
table. See the section on the filter
table operation and the section on timestamps.
Average in 5 minute windows
get sled_data_link:bytes_sent | align mean_within(5m)
Average all timeseries in the sled_data_link:bytes_sent
table into
non-overlapping 5 minute time windows. See the section on the align
table operation.
Group by a unique ID
get sled_data_link:bytes_sent | align mean_within(5m) | group_by [sled_id]
Average all timeseries in the sled_data_link:bytes_sent
table into
non-overlapping 5 minute time windows, and also average all timeseries with the
same value for the sled_id
field. See the section on the
group_by
table operation.
Correlate two timeseries
{
get sled_data_link:bytes_sent;
get sled_data_link:bytes_received
}
| align mean_within(5m)
| join
Average all timeseries independently from the sled_data_link:bytes_sent
and
sled_data_link:bytes_received
tables into 5 minute windows, and then
concatenate the samples from timeseries with the same field values, at the same
timestamps. See the section on the join
table operation.
Security Considerations
Ideally, queries in OxQL would take the exact same security stance as the rest of Oxide’s public API. We should avoid leaking information in query results that users cannot otherwise see. As an example, it would be nice if users could not see the vCPU utilization of instances in a project for which they don’t have read access. Ideally, we would effectively show the user only the subset of the metric database that refers to objects they can read in other parts of the API.
However, this approach seems complex and expensive, and possibly intractable. Rather than answer this particular question here, we defer to future RFDs to discuss the goals for authorizing metric data, and how we might implement it efficiently in the control plane.
External References
[RFD 125] RFD 125, Telemetry building blocks and requirements
[RFD 161] RFD 161, Metrics data model
[RFD 162] RFD 162, Metrics collection architecture and design
[RFD 304] RFD 304, You CAN Have Your Metric and Query it Too: A Proposal for Future Telemetry APIs
[RFD 442] RFD 442, Requirements for metrics API, and analysis of endpoints from cloud vendors (and VMware)