RFD 12
Host CPU Evaluation

The host CPU determination is an early and important decision for Oxide. As this document will elucidate, this determination is both essential and nuanced: there are two options that seem to have very different tradeoffs, each carrying high levels of risk and uncertainty.


The following image relates this RFD to other ones in a hierarchical fashion.

Figure 1. RFD Relationships

Constraints

Instruction set architecture

While we laud alternative instruction set architectures, server-side computing remains dominated by x86. This dominance stems from multiple historical factors, not least that it triumphed in personal computing nearly four decades ago. Now, there is reason to believe that this dominance is more fragile than ever: with ARM’s Ares core (the basis for Amazon’s Graviton 2) there is at last a non-x86 core that can meaningfully compete with (Intel) x86 on the server — but it is not clear to what degree this will get adoption in the long tail of software development. (For example, will software developers who are increasingly returning to a binary model embrace cross compilation and two different instruction set targets?) In our target market of the enterprise space, it seems likely that any adoption of hybrid instruction sets will be slower still; Oxide will have x86-based host CPUs for the foreseeable future.

Evaluation Criteria

Performance

The most obvious criterion for the host CPU is performance. While quantifiable, performance is also incredibly nuanced: with so many different ways of describing the performance of a system, it’s easy to measure (or emphasize) an aspect of the system that is practically irrelevant. Ultimately, the performance that matters is the performance of a customer workload, which may or may not be what vendors measure or design around! That said, there remain obvious axes of microprocessor performance: clock rate; number of cores; cache size; cache architecture. (In terms of quantifying these, SPECrate is generally reasonable.) These axes are directly affected by density/process: barring architectural differences, one cannot expect (say) a 14nm part to ever be broadly competitive with a 7nm one. There are more architectural aspects to performance as well: on-die acceleration (e.g., SIMD extensions), compiler optimization, cache coherence architecture, etc. And then there are system aspects of performance that can be implied by processor choice: bus speed, memory speed, memory sizing, etc. Performance certainly isn’t our only criterion, and its importance must be kept in perspective, but it also must be viewed as central: Oxide racks will compete with extant infrastructure; the higher their performance (however measured, observed or perceived), the more those racks will enjoy a tail wind in our customer base.

Price

The second obvious criterion for any microprocessor is price. Historically, the CPU is the most expensive single component in the BOM, but the price can vary widely within a single family. For example, when it launched, Intel’s Cascade Lake ranged in price from $1600 to $17900 for a single socket, and that only includes the "Gold" and "Platinum" classes of Xeon. At the quantities that we are seeking (on the order of ~96 sockets per rack), these price differences have enormous impact on the economics of the entire rack. In this regard, price almost always needs to be normalized, e.g., as $/core or $/SPECrate.
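As a minimal sketch of this normalization, the Python below divides list price by core count and by benchmark score. The two sample rows are copied from the comparison table later in this document; the class, function, and field names are illustrative assumptions, not part of any vendor tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    name: str
    cores: int
    spec: float       # benchmark score as listed in the comparison table
    price_usd: float  # list price in US dollars


def dollars_per_core(part: Part) -> float:
    """List price normalized by core count."""
    return part.price_usd / part.cores


def dollars_per_spec(part: Part) -> float:
    """List price normalized by benchmark score."""
    return part.price_usd / part.spec


# Sample rows taken from the comparison table in this document.
PARTS = [
    Part("EPYC 7702P", cores=64, spec=249.6, price_usd=4425),
    Part("CLX 8280", cores=28, spec=138.0, price_usd=10009),
]

for part in PARTS:
    print(f"{part.name}: ${dollars_per_core(part):.0f}/core, "
          f"${dollars_per_spec(part):.2f}/SPEC")
```

Normalizing both ways matters because a part can look inexpensive per core while being costly per unit of delivered throughput, or the reverse.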

Thermal design point (TDP)

Critical for us (or anyone looking to achieve any kind of density) is the thermal design point (TDP). This is, more or less, the power draw of the part that needs to be cooled. Historically, dense form factors like the OCP Tioga Pass form factor have allowed TDP per socket of around ~165W; beyond that, a different form factor would need to be considered. Note that 250W is generally the threshold for air cooling; beyond TDP of 250W, a CPU must be water cooled. Moreover, power for the rack must be considered: assuming an Open Rack, we should assume no more than ~15kW at ~32OU, or about 450W per OU. This must be all in (DRAM, NVMe, NIC, etc.), so the CPU power budget must be less than this, although to what degree is to be determined.
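The budget arithmetic above can be sketched in a few lines of Python. The rack power, rack height, and air-cooling threshold come from the text; the one-socket-per-OU layout and the 225 W TDP in the example call are purely hypothetical inputs chosen for illustration. With these round inputs the per-OU figure comes out slightly above the approximately 450W quoted above, which is within the looseness of the "~" estimates.

```python
RACK_POWER_W = 15_000        # "~15kW" rack budget from the text
RACK_HEIGHT_OU = 32          # "~32OU" of usable height from the text
AIR_COOLING_LIMIT_W = 250    # air-cooling threshold from the text


def per_ou_budget_w(rack_power_w: float, rack_height_ou: int) -> float:
    """All-in power budget per OU (CPU, DRAM, NVMe, NIC, ...)."""
    return rack_power_w / rack_height_ou


def non_cpu_headroom_w(budget_w: float, cpu_tdp_w: float, sockets: int = 1) -> float:
    """Power left for everything else once CPU TDP is subtracted."""
    return budget_w - sockets * cpu_tdp_w


def needs_liquid_cooling(cpu_tdp_w: float) -> bool:
    """True when the TDP exceeds the air-cooling threshold."""
    return cpu_tdp_w > AIR_COOLING_LIMIT_W


budget = per_ou_budget_w(RACK_POWER_W, RACK_HEIGHT_OU)
# Hypothetical layout: one 225 W socket per OU.
headroom = non_cpu_headroom_w(budget, cpu_tdp_w=225)
print(f"per-OU budget: {budget:.0f} W, non-CPU headroom: {headroom:.0f} W")
print("280 W part needs liquid cooling:", needs_liquid_cooling(280))
```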

Transistor density/process

Transistor density has many implications for the system, in terms of performance, power and price. Regrettably, for historical reasons, transistor density (which ideally should be expressed in terms of transistors per fixed unit area) is expressed by the industry in terms of feature size, which doesn’t actually correspond to anything. (That is, in a 7nm process, and especially in the forthcoming 5nm processes, there is nothing that is in fact as small as the putative feature size.) This has given rise to absurd process nodes like Intel’s "14nm+++" (yes, three plusses). Making things worse, vendors make claims that are difficult to verify ("our 10nm is equivalent to their 7nm"). Density is critically important (there is unquestionably a density difference between, say, 14nm and 7nm), but it is likely best measured through its manifestation in the artifact: core count, cache sizing, TDP, etc.

Core density

Closely related to TDP and transistor density is core density. We are seeking a dense product offering, with more than 1344 cores in a 15 kW rack. Core density should be thought of per U — and if achieving that desired core density necessitates multiple sockets, issues around multiple sockets must be considered (see, e.g. "Cross-connect", below).
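To make the density target concrete, the sketch below works out how many sockets the 1344-core goal implies at a few per-socket core counts, and what the 15 kW rack budget allows per core once everything else in the rack is included. The per-socket core counts are sample values (64 and 28 both appear in the comparison table later in this document); the function names are illustrative.

```python
import math

TARGET_CORES = 1344      # minimum core count per rack, from the text
RACK_POWER_W = 15_000    # rack power budget, from the text


def sockets_needed(target_cores: int, cores_per_socket: int) -> int:
    """Smallest number of sockets that reaches the target core count."""
    return math.ceil(target_cores / cores_per_socket)


def all_in_watts_per_core(rack_power_w: float, cores: int) -> float:
    """Rack power divided by cores; includes DRAM, NVMe, NIC, fans, etc."""
    return rack_power_w / cores


for cores_per_socket in (64, 32, 28):
    n = sockets_needed(TARGET_CORES, cores_per_socket)
    print(f"{cores_per_socket} cores/socket -> {n} sockets")

print(f"all-in budget: {all_in_watts_per_core(RACK_POWER_W, TARGET_CORES):.2f} W/core")
```

The all-in figure is an upper bound on what the CPU alone may draw per core, which is why the W/core column in the comparison table is a useful first filter.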

Yield

Related to transistor density is yield. We have no control over yield and yield often isn’t disclosed, but there are architectural decisions that can materially affect yield. In particular, chiplet designs can result in (much) higher yield because of their (much) smaller size. Similarly, tiled designs can circumvent some of the yield problems that may be present in a large monolithic design.

Platform

CPUs do not exist on their own; they exist in a larger system that must support them. CPUs are mated to specific platforms in several key attributes, but especially via I/O (i.e., generation of PCIe) and memory (i.e., generation of DDR SDRAM). This is particularly germane for us because we happen to be targeting a product that is arguably at the tail end of one generation or the leading edge of the next. Specifically, both Intel and AMD have roadmap CPUs (Sapphire Rapids and Genoa, respectively) that are targeting PCIe Gen 5 and DDR5; to build on either of these parts is to build a dependency on both of these technologies — each of which has economic, performance, complexity, and schedule risk implications.

Roadmap/schedule

Microprocessors take a long time to develop, especially given dense processes like 7nm. A roadmap can therefore be a concrete way of predicting the future, in that one can often know years ahead of time whether a particular CPU is going to be competitive. Roadmaps should be taken as a best case; schedules certainly can slip. In general, late projects get later, and when there is a history of slip, more can be reasonably expected.

Strategic relationship

Our strategic relationship with a microprocessor vendor can have an outsized effect: the ability to get the support we need when we need it can often come down to the depth of our relationship. As a startup, it is a challenge for us to be strategically relevant to much larger companies.

Security

Microprocessor security was brought to the fore with Spectre and Meltdown, which presaged a deluge of microprocessor vulnerabilities. There are many aspects to security: some practical (degree of historical vulnerability, history of effective embargoed communication) and some more architectural (role of secure enclaves, secure virtualization, etc.).

Firmware

Microprocessors require firmware to boot and operate correctly. Firmware considerations include: degree of openness, degree of support, degree of platform specificity, etc. The most important single axis for us with respect to firmware is our ability to control and attest our entire stack: we want to minimize our dependency on third parties (that is, firmware delivered neither by us nor by the component vendor). In particular, nearly all computing systems have a dependency on third-party BIOS vendors such as AMI (formerly American Megatrends, Inc.); we view it as a constraint to avoid this spurious dependency, for myriad reasons.

Hyperprivileged software

x86 architectures contain software that runs in a hyperprivileged state: for Intel, this is the Intel Management Engine (ME); for AMD, it is the Platform Security Processor (PSP). Understanding this software — its scope, its firmware, the conditions under which it operates — can be critical for understanding many different aspects of the system (not least its security).

Cross-connect

Where multi-socket systems are to be considered, cross-connect architecture (protocol, bandwidth, latency) should be understood. AMD and Intel have different architectures for cache coherence across sockets (PCIe Gen 4 vs. UPI in Rome and Cascade Lake, respectively), which can introduce asymmetries in the system. Specifically, in two-socket systems, the I/O path must be understood to know if sockets are asymmetric with respect to I/O.

ISA extensions

Both x86 vendors extend the ISA in different ways. Some of these extensions can be significant (e.g., AVX-512) and can have an outsized impact on performance.

Top-of-rack considerations

The Oxide rack integrates a top-of-rack controller. It is desirable for this controller to have the same CPU architecture as the rest of the rack. Some microprocessors may lend themselves better to this integration than others.

Root-of-trust implications

Different microprocessors have different mechanisms for executing the first instruction. We believe that our root-of-trust will be implementable with either Intel or AMD, but the effort involved in each path will surely not be identical.

RAS features

Features for reliability, availability and serviceability differ in ways that can affect the ability of system software to be resilient to certain kinds of failure.

Microcode architecture

Microprocessors have microcode that the host can dynamically load. How the microprocessor is able to load microcode can have ramifications for security and availability.

Evaluation

Given the constraints, the evaluation boils down to AMD (Rome, Milan, Genoa) versus Intel (Cascade Lake, Ice Lake, Sapphire Rapids).

Density/Performance/TDP/Price

This table compares publicly available CPUs: AMD’s Rome (EPYC) to Intel’s Cascade Lake (CLX):

Processor   Cores  Threads  Base (GHz)  L3        TDP (W)  W/core  SPEC   Price    $/core
EPYC 7702P  64     128      2           256 MB    200      3.13    249.6  $4,425   $69
EPYC 7702   64     128      2           256 MB    200      3.13    234.9  $6,450   $101
EPYC 7742   64     128      2.25        256 MB    225      3.52    261.7  $6,950   $109
EPYC 7552   48     96       2.2         192 MB    200      4.17    201.8  $4,025   $84
EPYC 7642   48     96       2.3         256 MB    225      4.69    230.5  $4,775   $99
EPYC 7452   32     64       2.35        128 MB    155      4.84    175.9  $2,025   $63
CLX 6262V   24     48       1.9         33 MB     135      5.63    103.8  $2,900   $121
CLX 6238T   22     44       1.9         30.25 MB  125      5.68    104.5  $2,742   $125
CLX 6222V   20     40       1.8         27.5 MB   115      5.75    91.6   $1,600   $80
CLX 5220T   18     36       1.9         24.75 MB  105      5.83    90.1   $1,727   $96
CLX 8276    28     56       2.2         38.5 MB   165      5.89    127.2  $8,719   $311
CLX 8276L   28     56       2.2         38.5 MB   165      5.89    128    $11,722  $419
CLX 8276M   28     56       2.2         38.5 MB   165      5.89    127    $11,722  $419
CLX 4216    16     32       2.1         22 MB     100      6.25    83.9   $1,002   $63
EPYC 7502P  32     64       2.5         128 MB    200      6.25    184.1  $2,300   $72
EPYC 7502   32     64       2.5         128 MB    200      6.25    183.5  $2,600   $81
CLX 6252    24     48       2.1         35.75 MB  150      6.25    118.4  $3,655   $152
CLX 6252N   24     48       2.3         35.75 MB  150      6.25    119    $3,984   $166
CLX 6230    20     40       2.1         27.5 MB   125      6.25    102.5  $1,894   $95
CLX 6230T   20     40       2.1         27.5 MB   125      6.25    101.5  $1,988   $99
CLX 6230N   20     40       2.3         27.5 MB   125      6.25    102.8  $2,046   $102
EPYC 7402   24     48       2.8         128 MB    155      6.46    169.8  $1,783   $74
CLX 6238    22     44       2.1         30.25 MB  140      6.36    109.1  $2,612   $119
CLX 6238M   22     44       2.1         30.25 MB  140      6.36    110.9  $5,615   $255
CLX 6238L   22     44       2.1         30.25 MB  140      6.36    110.3  $5,615   $255
CLX 5218T   16     32       2.1         22 MB     105      6.56    85.2   $1,349   $84
CLX 5218N   16     32       2.3         22 MB     110      6.88    86.5   $1,375   $86
CLX 8260    24     48       2.4         35.75 MB  165      6.88    119.3  $4,702   $196
CLX 8260Y   24     48       2.4         35.75 MB  165      6.88    122.5  $5,320   $222
CLX 8260L   24     48       2.4         35.75 MB  165      6.88    122.7  $7,705   $321
CLX 8260M   24     48       2.4         35.75 MB  165      6.88    112.8  $7,705   $321
CLX 5220    18     36       2.2         24.75 MB  125      6.94    94.5   $1,555   $86
CLX 5220S   18     36       2.7         24.75 MB  125      6.94    95.5   $2,000   $111
EPYC 7542   32     64       2.9         128 MB    225      7.03    183.2  $3,400   $106
CLX 4214    12     24       2.2         16.5 MB   85       7.08    68     $694     $58
CLX 4214Y   12     24       2.2         16.5 MB   85       7.08    67.4   $768     $64
CLX 8280    28     56       2.7         38.5 MB   205      7.32    138    $10,009  $357
CLX 8280L   28     56       2.7         38.5 MB   205      7.32    138    $13,012  $465
CLX 8280M   28     56       2.7         38.5 MB   205      7.32    136.6  $13,012  $465
EPYC 7282   16     32       2.8         64 MB     120      7.50    92.8   $650     $41
EPYC 7352   24     48       2.3         128 MB    180      7.50    160.3  $1,350   $56
CLX 6248    20     40       2.5         27.5 MB   150      7.50    110    $3,072   $154
CLX 5218B   16     32       2.3         22 MB     125      7.81    89     $1,273   $80
CLX 5218    16     32       2.3         22 MB     125      7.81    87.5   $1,273   $80
CLX 8253    16     32       2.2         22 MB     125      7.81    79.1   $3,115   $195
CLX 8270    26     52       2.7         35.75 MB  205      7.88    131.6  $7,405   $285
EPYC 7402P  24     48       2.8         128 MB    200      8.33    169.1  $1,250   $52
CLX 6240    18     36       2.6         24.75 MB  150      8.33    103.5  $2,445   $136
CLX 6240Y   18     36       2.6         24.75 MB  150      8.33    103.6  $2,726   $151
CLX 6240M   18     36       2.6         24.75 MB  150      8.33    104.1  $5,448   $303
CLX 6240L   18     36       2.6         24.75 MB  150      8.33    103.4  $5,448   $303
CLX 4210    10     20       2.2         13.75 MB  85       8.50    57.9   $501     $50
CLX 5215    10     20       2.5         13.75 MB  85       8.50    61.8   $1,221   $122
CLX 5215M   10     20       2.5         13.75 MB  85       8.50    62.1   $4,224   $422
CLX 5215L   10     20       2.5         13.75 MB  85       8.50    61.8   $4,224   $422
CLX 8268    24     48       2.9         35.75 MB  205      8.54    129.2  $6,302   $263
CLX 4209T   8      16       2.2         11 MB     70       8.75    46.4   $501     $63
CLX 6242    16     32       2.8         22 MB     150      9.38    99.1   $2,529   $158
EPYC 7302P  16     32       3           128 MB    155      9.69    141.1  $825     $52
EPYC 7302   16     32       3           128 MB    155      9.69    140.8  $978     $61
CLX 6226    12     24       2.7         19.25 MB  125      10.42   82.8   $1,776   $148
CLX 4208    8      16       2.1         11 MB     85       10.63   45.3   $417     $52
CLX 4215    8      16       2.5         11 MB     85       10.63   52.4   $794     $99
CLX 6254    18     36       3.1         24.75 MB  200      11.11   112.9  $3,803   $211
EPYC 7272   12     24       2.9         64 MB     155      12.92   83.5   $625     $52
CLX 6246    12     24       3.3         24.75 MB  165      13.75   92.1   $3,286   $274
CLX 3204    6      6        1.9         8.25 MB   85       14.17   27     $213     $36
CLX 5217    8      16       3           11 MB     115      14.38   56.5   $1,522   $190
EPYC 7232P  8      16       3.1         32 MB     120      15.00   ?      $450     $56
EPYC 7252   8      16       3.1         64 MB     120      15.00   67     $475     $59
EPYC 7262   8      16       3.2         128 MB    120      15.00   89     $575     $72
CLX 6234    8      16       3.3         24.75 MB  130      16.25   68.9   $2,214   $277
CLX 6244    8      16       3.6         24.75 MB  150      18.75   73.4   $2,925   $366
CLX 5222    4      8        3.8         16.5 MB   105      26.25   38.6   $1,221   $305
CLX 8256    4      8        3.8         16.5 MB   105      26.25   38.6   $7,007   $1,752

Intel’s forthcoming Ice Lake will likely have a 6262V equivalent (dubbed 6362V) with better density (32 or 36 cores), but with no better TDP/core than Cascade Lake. In essentially every conceivable way of looking at this data, Rome is superior to Cascade Lake: in price, in performance, in density, in TDP — and in all of these things when viewed in terms of the other.
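One way to spot-check this claim is to derive throughput per watt and throughput per dollar from the table and compare the weakest Rome entry against the strongest Cascade Lake entry. The sketch below does this for four rows copied from the table above; it is an illustration on a small sample, not a verification of the full table, and the helper names are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    name: str
    family: str       # "Rome" or "CLX"
    tdp_w: float
    spec: float
    price_usd: float


# Four sample rows copied from the comparison table.
ROWS = [
    Row("EPYC 7702P", "Rome", 200, 249.6, 4425),
    Row("EPYC 7742", "Rome", 225, 261.7, 6950),
    Row("CLX 6262V", "CLX", 135, 103.8, 2900),
    Row("CLX 8280", "CLX", 205, 138.0, 10009),
]


def spec_per_watt(row: Row) -> float:
    return row.spec / row.tdp_w


def spec_per_dollar(row: Row) -> float:
    return row.spec / row.price_usd


def worst(rows, family, metric) -> float:
    """Lowest value of the metric within one family."""
    return min(metric(r) for r in rows if r.family == family)


def best(rows, family, metric) -> float:
    """Highest value of the metric within one family."""
    return max(metric(r) for r in rows if r.family == family)


for metric in (spec_per_watt, spec_per_dollar):
    rome_floor = worst(ROWS, "Rome", metric)
    clx_ceiling = best(ROWS, "CLX", metric)
    print(f"{metric.__name__}: worst Rome {rome_floor:.4f} "
          f"vs best CLX {clx_ceiling:.4f}")
```

In this sample, even the least favorable Rome row beats the most favorable Cascade Lake row on both metrics; extending `ROWS` to the full table would be the natural next step.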

Security

The following table lists known vulnerabilities and whether they affected Intel CPUs and AMD CPUs as of February 2020:

Vulnerability                                 Intel       AMD
Spectre Variant 1                             Vulnerable  Vulnerable
Spectre Variant 1.2                           Vulnerable  Not vulnerable
Spectre Variant 2                             Vulnerable  Vulnerable
Spectre Variant 3 (Meltdown)                  Vulnerable  Not vulnerable
Spectre Variant 3a                            Vulnerable  Not vulnerable
Spectre Variant 4 (Speculative Store Bypass)  Vulnerable  Vulnerable
Spectre Variant 5 (SpectreRSB)                Vulnerable  Not vulnerable
LazyFPU                                       Vulnerable  Not vulnerable
TLBleed                                       Vulnerable  Not vulnerable
L1TF/Foreshadow                               Vulnerable  Not vulnerable
Spoiler                                       Vulnerable  Not vulnerable
MDS (ZombieLoad, Fallout, RIDL)               Vulnerable  Not vulnerable
SWAPGS                                        Vulnerable  Not vulnerable
PortSmash                                     Vulnerable  Vulnerable

Determinations

Based on the comparison table and its subsequent analysis, we are opting to go with AMD’s Milan CPUs. As we are trying to minimize W/core and $/core while still retaining a reasonable clock frequency, we have chosen to build a platform that targets the SP3 Group D CPUs (§2.2.1 [amd-irm]) with a configurable TDP (cTDP) and package power limit (PPL) of up to 240 W in a single-socket (1P, aka 1S) configuration. 64-core processors are the desired launch target.

Although AMD’s 64-core product stack has changed somewhat from Rome to our previous expectations for Milan to the documented Milan set at launch, this determination has not changed. As shown in [amd-rome-ptds] and [amd-milan-ptds], AMD offered 4 different 64-core Rome parts, of which we considered the 7702P (group A) and 7742 (group D) most likely to represent the type of processor we want to offer in our product. There is at present no Milan analogue to the 7742; AMD are planning a 7763 (group X, analogous to the 7H12 in group Z) and the two 7713 variants (the 1P 7713P and the 2P 7713, both in group D and analogous to the 7702P/7702 that were both in group A).

Therefore, the launch target is the 7713P. Due to availability constraints, bringup and early validation phases may utilize the 7713 and/or inexpensive Rome processors as required.

Non-Goals

  • Group X/Z processor support.

    This does leave out processors such as the slightly higher-clocked 7763 and the frequency-optimized processors such as the 32-core 2.95 GHz 75F3; however, these all run hot with a default TDP and PPL of 280W and are designed for more specialty use cases that we do not believe make sense for us to target in this version of the product. Group X (Milan-only) parts also include configurable EDC up to 300 A. Support for these processors in our platform is mildly desirable for future flexibility but not a requirement, and we do not expect to offer configurations with these processors.

  • 2S configurations.

    A 2S SP3 platform is not planned; however, every 1S platform automatically provides hardware support for both 1P and 2P processors of the same power/thermal infrastructure groups.

  • Other group D and lower processors.

    By AMD’s infrastructure design, every platform that meets the group D requirements also supports all group A, B, and C processors. Software support for and testing of the platform with group D or lower processors other than the 7713P (and 7713, due to availability constraints) is not planned but may be undertaken later with business justification.
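The group rules in these non-goals can be summarized as a small compatibility check. The sketch below encodes only what this document states: a group D platform also covers groups A, B, and C, while groups X and Z are out of scope. Treating A through D as a strict ordering that applies to lower-group platforms as well is an assumption made for illustration, and the part-to-group mapping is limited to the parts this document assigns to a group.

```python
# Assumption: groups A-D are ordered by increasing power/thermal requirement.
GROUP_ORDER = ["A", "B", "C", "D"]

# Part-to-group assignments as stated in this document.
CPU_GROUPS = {
    "7702P": "A",
    "7702": "A",
    "7742": "D",
    "7713P": "D",
    "7713": "D",
    "7763": "X",
    "7H12": "Z",
}


def platform_supports(platform_group: str, cpu_group: str) -> bool:
    """True if a platform built to platform_group can host a cpu_group part.

    Groups outside A-D (such as X and Z) are treated as unsupported,
    matching the non-goal above.
    """
    if platform_group not in GROUP_ORDER or cpu_group not in GROUP_ORDER:
        return False
    return GROUP_ORDER.index(cpu_group) <= GROUP_ORDER.index(platform_group)


for part, group in CPU_GROUPS.items():
    verdict = "supported" if platform_supports("D", group) else "not supported"
    print(f"{part} (group {group}): {verdict} on a group D platform")
```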

Documentation and Sources

  • [amd-irm] Advanced Micro Devices. Infrastructure Roadmap (IRM) for Socket SP3 Processors. Publication number 55418, revision 1.18. 2020. Distributed only under NDA.

  • [amd-rome-ptds] Advanced Micro Devices. Power and Thermal Data Sheet for AMD Family 17h Models 30h-3Fh Socket SP3 Processors. Publication number 56585, revision 0.87. 2019. Distributed only under NDA.

  • [amd-milan-ptds] Advanced Micro Devices. Power and Thermal Data Sheet for AMD Family 19h Models 00h-0Fh Socket SP3 Processors. Publication number 56958, revision 0.88. January 2021. Distributed only under NDA.