12 - Host CPU Evaluation / RFD / Oxide

The host CPU determination is an early and important decision for Oxide. As this document will elucidate, this determination is both essential and nuanced: there are two options that seem to have very different tradeoffs, each one carrying with it high levels of risk and uncertainty.

The following image relates this RFD to other ones in a hierarchical fashion.

RFD Relationships

Constraints

Instruction set architecture

While we laud alternative instruction set architectures, server-side computing remains dominated by x86. This dominance stems from multiple historical factors, not least that it triumphed in personal computing nearly four decades ago. Now, there is reason to believe that this dominance is more fragile than ever: with ARM’s Ares core (the basis for Amazon’s Graviton 2) there is at last a non-x86 core that can meaningfully compete with (Intel) x86 on the server — but it is not clear to what degree this will get adoption in the long tail of software development. (For example, will software developers who are increasingly returning to a binary model embrace cross compilation and two different instruction set targets?) In our target market of the enterprise space, it seems likely that any adoption of hybrid instruction sets will be slower still; Oxide will have x86-based host CPUs for the foreseeable future.

Evaluation Criteria

Performance

The most obvious criterion for the host CPU is performance. While quantifiable, performance is also incredibly nuanced: with so many different ways of describing the performance of a system, it’s easy to measure (or emphasize) an aspect of the system that is practically irrelevant. Ultimately, the performance that matters is the performance of a customer workload, which may or may not be what vendors measure or design around! That said, there remain obvious axes of microprocessor performance: clock rate; number of cores; cache size; cache architecture. (In terms of quantifying these, SPECrate is generally reasonable.) These axes are directly affected by density/process: barring architectural differences, one cannot expect (say) a 14nm part to ever be broadly competitive with a 7nm one. There are more architectural aspects to performance as well: on-die acceleration (e.g., SIMD extensions), compiler optimization, cache coherence architecture, etc. And then there are system aspects of performance that can be implied by processor choice: bus speed, memory speed, memory sizing, etc. Performance certainly isn’t our only criterion, and its importance must be kept in perspective, but it also must be viewed as central: Oxide racks will compete with extant infrastructure; the higher their performance (however measured, observed or perceived), the more those racks will enjoy a tail wind in our customer base.

Price

The second obvious criterion for any microprocessor is price. Historically, the CPU is the most expensive single component in the BOM, but the price can vary widely within a single family. For example, when it launched, Intel’s Cascade Lake ranged in price from $1,600 to $17,900 for a single socket, and that only includes the "Gold" and "Platinum" classes of Xeon. At the quantities that we are seeking (on the order of 96 sockets per rack), these price differences have enormous impact on the economics of the entire rack. In this regard, price almost always needs to be normalized, e.g., as $/core or $/SPECrate.
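As a minimal sketch of what normalizing price looks like, the following computes $/core and dollars per SPEC point for three rows copied from the comparison table later in this RFD. The function name and the tuple layout are illustrative assumptions, and the SPEC column is taken as given.

```python
# Normalize list price by core count and by benchmark score.
# Rows are copied from this RFD's comparison table; "spec" is that
# table's SPEC column, used as-is.
parts = [
    # (name, cores, spec, list_price_usd)
    ("EPYC 7702P", 64, 249.6, 4425),
    ("CLX 6262V", 24, 103.8, 2900),
    ("CLX 8280", 28, 138.0, 10009),
]


def normalized_price(cores, spec, price_usd):
    """Return (dollars per core, dollars per SPEC point)."""
    return price_usd / cores, price_usd / spec


for name, cores, spec, price in parts:
    per_core, per_spec = normalized_price(cores, spec, price)
    print(f"{name:12s} ${per_core:7.2f}/core  ${per_spec:6.2f}/SPEC")
```

Normalizing this way lets parts with very different core counts be compared on the same footing, which the raw per-socket price does not.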

Thermal design point (TDP)

Critical for us (or anyone looking to achieve any kind of density) is the thermal design point (TDP). This is, more or less, the power draw of the part that needs to be cooled. Historically, dense form factors like the OCP Tioga Pass form factor have allowed TDP per socket of around 165W; beyond that, a different form factor would need to be considered. Note that 250W is generally the threshold for air cooling; beyond TDP of 250W, a CPU must be water cooled. Moreover, power for the rack must be considered: assuming an Open Rack, we should assume no more than ~15kW at ~32OU, or about 450W per OU. This must be all in (DRAM, NVMe, NIC, etc.), so the CPU power budget must be less than this, although to what degree is to be determined.
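The rack arithmetic above can be sketched as follows. The exact quotient of 15 kW over 32 OU is about 469 W, which the text rounds to roughly 450 W. The non-CPU load and sockets-per-OU values are hypothetical inputs chosen only to exercise the function; the RFD leaves the real split to be determined.

```python
RACK_POWER_W = 15_000      # ~15 kW rack budget, from the text
RACK_HEIGHT_OU = 32        # ~32 OU, from the text
AIR_COOLING_LIMIT_W = 250  # air-cooling threshold per CPU, from the text


def power_per_ou(rack_power_w=RACK_POWER_W, height_ou=RACK_HEIGHT_OU):
    """All-in power available per OU."""
    return rack_power_w / height_ou


def cpu_headroom_w(non_cpu_w_per_ou, sockets_per_ou=1):
    """Per-socket CPU power left after DRAM, NVMe, NIC, etc.

    The result is capped at the air-cooling threshold, since a part
    above that would need a different cooling approach.
    """
    budget = (power_per_ou() - non_cpu_w_per_ou) / sockets_per_ou
    return min(budget, AIR_COOLING_LIMIT_W)


print(f"{power_per_ou():.2f} W per OU")
# Hypothetical: 200 W of non-CPU load per OU.
print(cpu_headroom_w(200, sockets_per_ou=1))
print(cpu_headroom_w(200, sockets_per_ou=2))
```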

Transistor density/process

Transistor density has many implications for the system, in terms of performance, power and price. Regrettably, for historical reasons, transistor density (which ideally should be expressed in terms of transistors per fixed unit area) is expressed by the industry in terms of feature size, which doesn’t actually correspond to anything. (That is, in a 7nm process, and especially in the forthcoming 5nm processes, there is nothing that is in fact as small as the putative feature size.) This has given rise to absurd process nodes like Intel’s "14nm+++" (yes, three plusses). Making things worse, vendors make claims that are difficult to verify ("our 10nm is equivalent to their 7nm"). Density is critically important (there is unquestionably a density difference between, say, 14nm and 7nm), but it is likely best measured through its manifestation in the artifact: core count, cache sizing, TDP, etc.

Core density

Closely related to TDP and transistor density is core density. We are seeking a dense product offering, with more than 1344 cores in a 15 kW rack. Core density should be thought of per U — and if achieving that desired core density necessitates multiple sockets, issues around multiple sockets must be considered (see, e.g. "Cross-connect", below).
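To make the density target concrete, here is a small sketch that derives two numbers from the figures in the text: the all-in power available per core at the 1344-core floor, and the number of sockets needed to reach that floor for a given core count per socket. The function name is an illustrative assumption.

```python
import math

TARGET_CORES = 1344    # minimum core count per rack, from the text
RACK_POWER_W = 15_000  # 15 kW rack, from the text

# All-in rack power per core at the target (CPU, DRAM, NVMe, NIC, ...).
all_in_w_per_core = RACK_POWER_W / TARGET_CORES


def sockets_needed(cores_per_socket, target_cores=TARGET_CORES):
    """Smallest number of sockets that reaches the core target."""
    return math.ceil(target_cores / cores_per_socket)


print(f"{all_in_w_per_core:.2f} W per core, all in")
for cores in (64, 32, 28, 24):
    print(f"{cores}-core parts: {sockets_needed(cores)} sockets")
```

Because the per-core figure is all-in, the CPU's own W/core has to sit comfortably below it.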

Yield

Related to transistor density is yield. We have no control over yield and yield often isn’t disclosed, but there are architectural decisions that can materially affect yield. In particular, chiplet designs can result in (much) higher yield because of their (much) smaller size. Similarly, tiled designs can circumvent some of the yield problems that may be present in a large monolithic design.
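The relationship between die size and yield can be illustrated with a common first-order approximation, the Poisson yield model, in which the fraction of good dies is exp(-area × defect density). This model is not taken from the RFD, and the die areas and defect density below are purely hypothetical values chosen to show the shape of the effect.

```python
import math


def die_yield(area_cm2, defects_per_cm2):
    """Poisson yield model: probability that a die has zero defects."""
    return math.exp(-area_cm2 * defects_per_cm2)


# Hypothetical inputs, for illustration only.
DEFECT_DENSITY = 0.1   # defects per cm^2 (assumed)
MONOLITHIC_CM2 = 7.0   # one large die (assumed)
CHIPLET_CM2 = 0.75     # one small chiplet (assumed)

print(f"monolithic: {die_yield(MONOLITHIC_CM2, DEFECT_DENSITY):.1%}")
print(f"chiplet:    {die_yield(CHIPLET_CM2, DEFECT_DENSITY):.1%}")
```

Under this model a defect spoils only the small die it lands on, which is why splitting a design into chiplets can raise the fraction of usable silicon.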

Platform

CPUs do not exist on their own; they exist in a larger system that must support them. CPUs are mated to specific platforms in several key attributes, but especially via I/O (i.e., generation of PCIe) and memory (i.e., generation of DDR SDRAM). This is particularly germane for us because we happen to be targeting a product that is arguably at the tail end of one generation or the leading edge of the next. Specifically, both Intel and AMD have roadmap CPUs (Sapphire Rapids and Genoa, respectively) that are targeting PCIe Gen 5 and DDR5; to build on either of these parts is to build a dependency on both of these technologies — each of which has economic, performance, complexity, and schedule risk implications.

Roadmap/schedule

Microprocessors take a long time to develop, especially given dense processes like 7nm. A roadmap can therefore be a concrete way of predicting the future in that one can often know years ahead of time if a particular CPU is going to be competitive or not. Roadmaps should be taken as a best case; schedules certainly can slip. In general, late projects get later, and when there is a history of slip, more can be reasonably expected.

Strategic relationship

Our strategic relationship with a microprocessor vendor can have an outsized effect: the ability to get the support we need when we need it can often come down to the depth of our relationship. As a startup, it is a challenge for us to be strategically relevant to much larger companies.

Security

Microprocessor security was brought to the fore with Spectre and Meltdown, which presaged a deluge of microprocessor vulnerabilities. There are many aspects to security: some practical (degree of historical vulnerability, history of effective embargoed communication) and some more architectural (role of secure enclaves, secure virtualization, etc.).

Firmware

Microprocessors require firmware to boot and operate correctly. Firmware considerations include: degree of openness, degree of support, degree of platform specificity, etc. The most important single aspect for us with respect to firmware is our ability to control and attest our entire stack: we want to minimize our dependency on third parties (that is, firmware delivered neither by us nor by the component vendor). In particular, nearly all computing systems have a dependency on third-party BIOS vendors such as AMI (formerly American Megatrends, Inc.); we view it as a constraint to avoid this spurious dependency, for myriad reasons.

Hyperprivileged software

x86 architectures contain software that runs in a hyperprivileged state: for Intel, this is the Intel Management Engine (ME); for AMD, it is the Platform Security Processor (PSP). Understanding this software — its scope, its firmware, the conditions under which it operates — can be critical for understanding many different aspects of the system (not least its security).

Cross-connect

Where multi-socket systems are to be considered, cross-connect architecture (protocol, bandwidth, latency) should be understood. Intel and AMD have different architectures for cache coherence (UPI in Cascade Lake and PCIe Gen 4 in Rome, respectively), which can introduce asymmetries in the system. Specifically, in two-socket systems, the I/O path must be understood to know if sockets are asymmetric with respect to I/O.

ISA extensions

Both x86 vendors extend the ISA in different ways. Some of these extensions can be significant (e.g., AVX-512) and can have an outsized impact on performance.

Top-of-rack considerations

The Oxide rack integrates a top-of-rack controller. It is desirable for this controller to have the same CPU architecture as the rest of the rack. Some microprocessors may lend themselves better to this integration than others.

Root-of-trust implications

Different microprocessors have different mechanisms for executing the first instruction. We believe that our root-of-trust will be implementable with either Intel or AMD, but the effort involved in each path will surely not be identical.

RAS features

Features for reliability, availability and serviceability differ in ways that can affect the ability of system software to be resilient to certain kinds of failure.

Microcode architecture

Microprocessors have microcode that the host can dynamically load. How the microprocessor is able to load microcode can have ramifications for security and availability.

Evaluation

Given the constraints, the evaluation boils down to AMD (Rome, Milan, Genoa) versus Intel (Cascade Lake, Ice Lake, Sapphire Rapids).

Density/Performance/TDP/Price

This table compares publicly available CPUs, AMD’s Rome (EPYC) and Intel’s Cascade Lake (CLX), listed roughly in ascending order of W/core (TDP divided by core count):

Processor	Cores	Threads	Base Freq (GHz)	L3	TDP (W)	W/core	SPEC	Price	$/core
EPYC 7702P	64	128	2	256 MB	200	3.13	249.6	$4,425	$69
EPYC 7702	64	128	2	256 MB	200	3.13	234.9	$6,450	$101
EPYC 7742	64	128	2.25	256 MB	225	3.52	261.7	$6,950	$109
EPYC 7552	48	96	2.2	192 MB	200	4.17	201.8	$4,025	$84
EPYC 7642	48	96	2.3	256 MB	225	4.69	230.5	$4,775	$99
EPYC 7452	32	64	2.35	128 MB	155	4.84	175.9	$2,025	$63
CLX 6262V	24	48	1.9	33 MB	135	5.63	103.8	$2,900	$121
CLX 6238T	22	44	1.9	30.25 MB	125	5.68	104.5	$2,742	$125
CLX 6222V	20	40	1.8	27.5 MB	115	5.75	91.6	$1,600	$80
CLX 5220T	18	36	1.9	24.75 MB	105	5.83	90.1	$1,727	$96
CLX 8276	28	56	2.2	38.5 MB	165	5.89	127.2	$8,719	$311
CLX 8276L	28	56	2.2	38.5 MB	165	5.89	128	$11,722	$419
CLX 8276M	28	56	2.2	38.5 MB	165	5.89	127	$11,722	$419
CLX 4216	16	32	2.1	22 MB	100	6.25	83.9	$1,002	$63
EPYC 7502P	32	64	2.5	128 MB	200	6.25	184.1	$2,300	$72
EPYC 7502	32	64	2.5	128 MB	200	6.25	183.5	$2,600	$81
CLX 6252	24	48	2.1	35.75 MB	150	6.25	118.4	$3,655	$152
CLX 6252N	24	48	2.3	35.75 MB	150	6.25	119	$3,984	$166
CLX 6230	20	40	2.1	27.5 MB	125	6.25	102.5	$1,894	$95
CLX 6230T	20	40	2.1	27.5 MB	125	6.25	101.5	$1,988	$99
CLX 6230N	20	40	2.3	27.5 MB	125	6.25	102.8	$2,046	$102
EPYC 7402	24	48	2.8	128 MB	155	6.46	169.8	$1,783	$74
CLX 6238	22	44	2.1	30.25 MB	140	6.36	109.1	$2,612	$119
CLX 6238M	22	44	2.1	30.25 MB	140	6.36	110.9	$5,615	$255
CLX 6238L	22	44	2.1	30.25 MB	140	6.36	110.3	$5,615	$255
CLX 5218T	16	32	2.1	22 MB	105	6.56	85.2	$1,349	$84
CLX 5218N	16	32	2.3	22 MB	110	6.88	86.5	$1,375	$86
CLX 8260	24	48	2.4	35.75 MB	165	6.88	119.3	$4,702	$196
CLX 8260Y	24	48	2.4	35.75 MB	165	6.88	122.5	$5,320	$222
CLX 8260L	24	48	2.4	35.75 MB	165	6.88	122.7	$7,705	$321
CLX 8260M	24	48	2.4	35.75 MB	165	6.88	112.8	$7,705	$321
CLX 5220	18	36	2.2	24.75 MB	125	6.94	94.5	$1,555	$86
CLX 5220S	18	36	2.7	24.75 MB	125	6.94	95.5	$2,000	$111
EPYC 7542	32	64	2.9	128 MB	225	7.03	183.2	$3,400	$106
CLX 4214	12	24	2.2	16.5 MB	85	7.08	68	$694	$58
CLX 4214Y	12	24	2.2	16.5 MB	85	7.08	67.4	$768	$64
CLX 8280	28	56	2.7	38.5 MB	205	7.32	138	$10,009	$357
CLX 8280L	28	56	2.7	38.5 MB	205	7.32	138	$13,012	$465
CLX 8280M	28	56	2.7	38.5 MB	205	7.32	136.6	$13,012	$465
EPYC 7282	16	32	2.8	64 MB	120	7.50	92.8	$650	$41
EPYC 7352	24	48	2.3	128 MB	180	7.50	160.3	$1,350	$56
CLX 6248	20	40	2.5	27.5 MB	150	7.50	110	$3,072	$154
CLX 5218B	16	32	2.3	22 MB	125	7.81	89	$1,273	$80
CLX 5218	16	32	2.3	22 MB	125	7.81	87.5	$1,273	$80
CLX 8253	16	32	2.2	22 MB	125	7.81	79.1	$3,115	$195
CLX 8270	26	52	2.7	35.75 MB	205	7.88	131.6	$7,405	$285
EPYC 7402P	24	48	2.8	128 MB	200	8.33	169.1	$1,250	$52
CLX 6240	18	36	2.6	24.75 MB	150	8.33	103.5	$2,445	$136
CLX 6240Y	18	36	2.6	24.75 MB	150	8.33	103.6	$2,726	$151
CLX 6240M	18	36	2.6	24.75 MB	150	8.33	104.1	$5,448	$303
CLX 6240L	18	36	2.6	24.75 MB	150	8.33	103.4	$5,448	$303
CLX 4210	10	20	2.2	13.75 MB	85	8.50	57.9	$501	$50
CLX 5215	10	20	2.5	13.75 MB	85	8.50	61.8	$1,221	$122
CLX 5215M	10	20	2.5	13.75 MB	85	8.50	62.1	$4,224	$422
CLX 5215L	10	20	2.5	13.75 MB	85	8.50	61.8	$4,224	$422
CLX 8268	24	48	2.9	35.75 MB	205	8.54	129.2	$6,302	$263
CLX 4209T	8	16	2.2	11 MB	70	8.75	46.4	$501	$63
CLX 6242	16	32	2.8	22 MB	150	9.38	99.1	$2,529	$158
EPYC 7302P	16	32	3	128 MB	155	9.69	141.1	$825	$52
EPYC 7302	16	32	3	128 MB	155	9.69	140.8	$978	$61
CLX 6226	12	24	2.7	19.25 MB	125	10.42	82.8	$1,776	$148
CLX 4208	8	16	2.1	11 MB	85	10.63	45.3	$417	$52
CLX 4215	8	16	2.5	11 MB	85	10.63	52.4	$794	$99
CLX 6254	18	36	3.1	24.75 MB	200	11.11	112.9	$3,803	$211
EPYC 7272	12	24	2.9	64 MB	155	12.92	83.5	$625	$52
CLX 6246	12	24	3.3	24.75 MB	165	13.75	92.1	$3,286	$274
CLX 3204	6	6	1.9	8.25 MB	85	14.17	27	$213	$36
CLX 5217	8	16	3	11 MB	115	14.38	56.5	$1,522	$190
EPYC 7232P	8	16	3.1	32 MB	120	15.00	?	$450	$56
EPYC 7252	8	16	3.1	64 MB	120	15.00	67	$475	$59
EPYC 7262	8	16	3.2	128 MB	120	15.00	89	$575	$72
CLX 6234	8	16	3.3	24.75 MB	130	16.25	68.9	$2,214	$277
CLX 6244	8	16	3.6	24.75 MB	150	18.75	73.4	$2,925	$366
CLX 5222	4	8	3.8	16.5 MB	105	26.25	38.6	$1,221	$305
CLX 8256	4	8	3.8	16.5 MB	105	26.25	38.6	$7,007	$1,752

Intel’s forthcoming Ice Lake will likely have a 6262V equivalent (dubbed 6362V) with better density (32 or 36 cores), but with no better TDP/core than Cascade Lake. In essentially every conceivable way of looking at this data, Rome is superior to Cascade Lake: in price, in performance, in density, and in TDP, whether each is taken alone or normalized by any of the others.
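One way to check the efficiency side of this conclusion is to recompute W/core and derive SPEC per watt for a few representative rows. The sketch below uses four rows copied from the table; the choice of rows and the derived "SPEC per watt" metric are illustrative assumptions and do not appear as columns in the table.

```python
# (name, cores, tdp_w, spec) copied from the comparison table.
rows = [
    ("EPYC 7702P", 64, 200, 249.6),
    ("EPYC 7742", 64, 225, 261.7),
    ("CLX 6262V", 24, 135, 103.8),
    ("CLX 8280", 28, 205, 138.0),
]


def efficiency(cores, tdp_w, spec):
    """Return W/core (lower is better) and SPEC per watt (higher is better)."""
    return {"w_per_core": tdp_w / cores, "spec_per_w": spec / tdp_w}


# Rank the sample by SPEC per watt, best first.
ranked = sorted(rows, key=lambda r: efficiency(*r[1:])["spec_per_w"], reverse=True)
for name, cores, tdp_w, spec in ranked:
    m = efficiency(cores, tdp_w, spec)
    print(f"{name:12s} {m['w_per_core']:5.2f} W/core  {m['spec_per_w']:.3f} SPEC/W")
```

In this sample both Rome parts rank ahead of both Cascade Lake parts on each metric.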

Security

The following table lists known vulnerabilities and the degree to which they historically affected Intel CPUs vs. AMD CPUs, circa February 2020:

Vulnerability	Intel	AMD
Spectre Variant 1	Vulnerable	Vulnerable
Spectre Variant 1.2	Vulnerable	Not vulnerable
Spectre Variant 2	Vulnerable	Vulnerable
Spectre Variant 3 (Meltdown)	Vulnerable	Not vulnerable
Spectre Variant 3a	Vulnerable	Not vulnerable
Spectre Variant 4 (Speculative Store Bypass)	Vulnerable	Vulnerable
Spectre Variant 5 (SpectreRSB)	Vulnerable	Not vulnerable
LazyFPU	Vulnerable	Not vulnerable
TLBleed	Vulnerable	Not vulnerable
L1TF/Foreshadow	Vulnerable	Not vulnerable
Spoiler	Vulnerable	Not vulnerable
MDS (ZombieLoad, Fallout, RIDL)	Vulnerable	Not vulnerable
SWAPGS	Vulnerable	Not vulnerable
PortSmash	Vulnerable	Vulnerable
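The table can be summarized with a simple tally. The sketch below encodes each row as a pair of booleans (Intel vulnerable, AMD vulnerable), transcribed directly from the table; the variable names are illustrative. A raw count says nothing about severity or mitigation cost, so it should be read only as a rough indicator.

```python
# (intel_vulnerable, amd_vulnerable) per row, transcribed from the table.
vulnerabilities = {
    "Spectre Variant 1": (True, True),
    "Spectre Variant 1.2": (True, False),
    "Spectre Variant 2": (True, True),
    "Spectre Variant 3 (Meltdown)": (True, False),
    "Spectre Variant 3a": (True, False),
    "Spectre Variant 4 (Speculative Store Bypass)": (True, True),
    "Spectre Variant 5 (SpectreRSB)": (True, False),
    "LazyFPU": (True, False),
    "TLBleed": (True, False),
    "L1TF/Foreshadow": (True, False),
    "Spoiler": (True, False),
    "MDS (ZombieLoad, Fallout, RIDL)": (True, False),
    "SWAPGS": (True, False),
    "PortSmash": (True, True),
}

intel_count = sum(intel for intel, _ in vulnerabilities.values())
amd_count = sum(amd for _, amd in vulnerabilities.values())
shared = [name for name, (i, a) in vulnerabilities.items() if i and a]

print(f"Intel: {intel_count} of {len(vulnerabilities)}")
print(f"AMD:   {amd_count} of {len(vulnerabilities)}")
print("Affecting both:", ", ".join(shared))
```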

Determinations

Based on the comparison table and its subsequent analysis, we are opting to go with AMD’s Milan CPUs. As we are trying to minimize the W/core and $/core, while still retaining a reasonable amount of frequency, we have chosen to build a platform that targets the SP3 Group D CPUs (§2.2.1 [amd-irm]) and a configurable TDP (cTDP) and package power limit (PPL) up to 240 W in a single socket (1P aka 1S) configuration. 64-core processors are the desired launch target.

AMD’s 64-core product stack has changed somewhat, first from Rome to our earlier expectations for Milan, and then to the documented Milan set at launch; this determination has not changed. As shown in [amd-rome-ptds] and [amd-milan-ptds], AMD offered 4 different 64-core Rome parts, of which we considered the 7702P (group A) and 7742 (group D) most likely to represent the type of processor we want to offer in our product. There is at present no Milan analogue to the 7742; AMD are planning a 7763 (group X, analogous to the 7H12 in group Z) and the two 7713 variants (the 1P 7713P and the 2P 7713, both in group D and analogous to the 7702P/7702 that were both in group A).

Therefore, the launch target is the 7713P. Due to availability constraints, bringup and early validation phases may utilize the 7713 and/or inexpensive Rome processors as required.

Non-Goals

Group X/Z processor support.
This does leave out processors such as the slightly higher-clocked 7763 and the frequency-optimized processors such as the 32-core 2.95 GHz 75F3; however, these all run hot with a default TDP and PPL of 280W and are designed for more specialty use cases that we do not believe make sense for us to target in this version of the product. Group X (Milan-only) parts also include configurable EDC up to 300 A. Support for these processors in our platform is mildly desirable for future flexibility but not a requirement and we do not expect to offer configurations with these processors.
2S configurations.
A 2S SP3 platform is not planned; however, every 1S platform automatically provides hardware support for both 1P and 2P processors of the same power/thermal infrastructure groups.
Other group D and lower processors.
By virtue of AMD’s infrastructure design, every group D platform will also support all group A, B, and C processors simply by meeting the group D requirements. Software support for and testing of the platform with group D or lower processors other than the 7713P (and 7713, due to availability constraints) is not planned but may be undertaken later with business justification.
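The group relationships described in this section can be sketched as a small lookup. The part-to-group assignments below are the ones stated in this RFD; the function name and the treatment of groups X and Z as simply "outside the group D design point" are modeling assumptions for illustration, not AMD's definitions.

```python
# Infrastructure groups covered by a group D platform, lowest first.
ORDERED_GROUPS = ["A", "B", "C", "D"]
PLATFORM_GROUP = "D"

# Part-to-group assignments as stated in this RFD.
PART_GROUP = {
    "7702P": "A",
    "7702": "A",
    "7742": "D",
    "7713P": "D",
    "7713": "D",
    "7763": "X",
    "7H12": "Z",
}


def hardware_supported(part, platform_group=PLATFORM_GROUP):
    """True if the part's group is at or below the platform's group.

    Groups outside A-D (X and Z here) are treated as unsupported,
    matching the non-goal stated above.
    """
    group = PART_GROUP[part]
    if group not in ORDERED_GROUPS:
        return False
    return ORDERED_GROUPS.index(group) <= ORDERED_GROUPS.index(platform_group)


for part in PART_GROUP:
    status = "ok" if hardware_supported(part) else "not targeted"
    print(f"{part:6s} group {PART_GROUP[part]}: {status}")
```

Hardware support in this sense is separate from software support and testing, which the text limits to the 7713P and 7713.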

Documentation and Sources

[amd-irm] Advanced Micro Devices. Infrastructure Roadmap (IRM) for Socket SP3 Processors. Publication number 55418, revision 1.18. 2020. Distributed only under NDA.
[amd-rome-ptds] Advanced Micro Devices. Power and Thermal Data Sheet for AMD Family 17h Models 30h-3Fh Socket SP3 Processors. Publication number 56585, revision 0.87. 2019. Distributed only under NDA.
[amd-milan-ptds] Advanced Micro Devices. Power and Thermal Data Sheet for AMD Family 19h Models 00h-0Fh Socket SP3 Processors. Publication number 56958, revision 0.88. January 2021. Distributed only under NDA.
