363 - Minibar / RFD / Oxide

General Description

Minibar is a manufacturing tester for Oxide’s Socket SP3 (Gimlet) and Socket SP5 (Cosmo) compute sleds. Minibar plugs into the sled’s backplane connectors and provides an Oxide Rack-like mechanical and electrical interface between the sled and the programming station. Minibar breaks out the sled’s backplane PCIe interface to a x16 PCIe slot and converts the sled’s SGMII management network Ethernet links to BASE-T Ethernet, allowing users to connect to the sled’s management network using commodity networking hardware. Minibar also contains an Ignition controller which the programming station can access through Minibar to test the sled’s Ignition target.

Oxide plans to use Minibar in both manufacturing and lab environments. On the manufacturing floor, Minibar allows Oxide’s programming stations to program, test, and lock compute sleds in a single step, eliminating the need to test compute sleds in a rack between programming and locking. In lab environments, Minibar allows users to connect compute sleds to commodity networking equipment and PCIe devices.

Block Diagram

Top level block diagram

Motivation

Oxide realized that in order to scale up compute sled manufacturing, we needed to be able to program, test, and lock compute sleds in a single step while the sled is connected to the programming station. The ability to program, test, and lock sleds at the programming station would reduce the amount of handling each compute sled experienced during the manufacturing process, and would allow Oxide to test compute sleds without needing an Oxide Rack dedicated for testing on the manufacturing floor.

Manufacturing Test Needs

Oxide identified five specific manufacturing test capabilities Minibar needed so that programming stations can program, test, and lock sleds in a single step:

The programming station needs to verify that the sled’s Ignition target is functioning and that both Ignition LVDS links are working.
The programming station needs to test that the sled’s backplane PCIe interface operates at PCIe Gen3 x4 speed.
The programming station needs to be able to connect an Ethernet NIC to the sled’s backplane PCIe interface to upload the host OS image.
The programming station needs to be able to verify that both of the sled’s 200G/100GBASE-KR4 Ethernet links come up at full speed.
The programming station needs to be able to verify that both of the sled’s management network Ethernet links are operating correctly.

Test the Ignition Target

The Ignition subsystem, described in RFD 142, provides rack switches with physical presence detection, device identification, power control, and low level fault reporting before devices are available on the management network. Each compute sled contains an Ignition target, and each rack switch contains an Ignition controller. Ignition targets and Ignition controllers communicate through the backplane over 8B10B-encoded LVDS.

In order to test the sled’s Ignition target at the programming station, Minibar needs to contain an Ignition controller and connect to both of the compute sled’s Ignition LVDS links.

Test the Backplane PCIe Interface and Load the Host OS

Each compute sled exposes a PCIe Gen3/Gen4 x4 interface on a dedicated backplane connector. In the Oxide Rack, the compute sleds in cubby 14 and cubby 16 use this interface to manage the Tofino 2 ASICs in the rack switches. In all other cubbies, the backplane PCIe interface is left unconnected.

Note

This RFD refers to the sleds in cubby 14 and cubby 16 that manage the rack switches as Management Sleds.

Minibar’s hardware needs to satisfy two requirements:

To test the backplane PCIe interface, Minibar needs to include a PCIe Gen3/Gen4 x4 endpoint.
The programming station needs to be able to connect an Ethernet NIC to the sled’s backplane PCIe interface instead of one of the U.2 SSD slots, so that we can program and test sleds with all ten U.2 SSDs installed. Previously, since we couldn’t connect anything to the backplane PCIe interface, operators had to remove one of the U.2 SSDs for programming so we could use a K.2 adapter card to plug a PCIe Ethernet NIC into the U.2 slot and load the host OS image onto the internal M.2 SSDs.

Our solution to both problems is to have Minibar break out the sled’s PCIe backplane interface to a CEM slot. We then install a PCIe Gen3 x4 NIC in Minibar’s PCIe slot to allow the programming station to communicate with the sled’s host CPU and load the host OS image without having to remove one of the U.2 SSDs. The programming station can also use the NIC as a PCIe endpoint to perform link tests beyond "can the sled see Ethernet traffic". For example, if the link comes up in Gen3 x4 at 8 GT/s, the PCIe interface is probably working up to the backplane interface. See [section_pcie] for more information.

Test the Management Network

The programming station needs to be able to verify that both of the sled’s management network Ethernet links are working up to the sled’s backplane interface. Normally, this would require installing the sled in an Oxide Rack and checking the link status in the rack switch.

To test the sled’s management network Ethernet links, Minibar needed hardware to convert the sled’s management network Ethernet from SGMII into something like BASE-T that we could connect to the programming station. We initially considered using two separate MAX24287 Parallel-to-Serial MII Converters to convert the sled’s SGMII links to MII/RMII/RGMII, then using a KSZ8463 or another PHY to convert MII/RMII/RGMII to BASE-T Ethernet. However, we discovered two potential drawbacks with this architecture during the design process:

The MAX24287 can convert SGMII to either RGMII or MII, but not RMII. We would either need to find a new IC to convert from RGMII to BASE-T Ethernet or order the KSZ8463ML/FML version of the KSZ8463. Both options require more software work than using devices Oxide already has code for.
The MAX24287 and KSZ8463 data sheets contained inconsistencies or errors that made us concerned we wouldn’t be able to configure the interface for our desired use case.

Eventually, we realized Oxide had already solved this problem in the Sidecar rack switch, which uses a Microchip VSC7448 to handle management network Ethernet switching alongside the Tofino 2 ASIC. It would be easier for Minibar to just include a copy of Sidecar’s management network switch than it would be to design a new hardware and software solution from scratch. From the sled’s perspective, Minibar’s management network interface would be indistinguishable from Sidecar’s, and Minibar could leverage Oxide’s existing software and drivers to configure the management network. Similar to how Sidecar’s technician ports work, Minibar includes a PHY that drives three BASE-T Ethernet ports - one for each of the sled’s management network Ethernet links, and one port dedicated to Minibar’s service processor. The programming station can use Minibar’s management network switch to check link status and see that both of the sled’s management network links are operating at full speed, and it can talk directly to the sled’s SP. This configuration has the added benefit of allowing the programming station to control Minibar over Ethernet. See [section_management_network] for more information.

Test the 200G/100G Ethernet

Compute sleds have two 200G/100GBASE-KR4 Ethernet links - one to each rack switch - exposed on the backplane interface. The programming station needs to observe both links come up at full speed.

We initially considered having Minibar break out the sled’s two 200G/100GBASE-KR4 Ethernet links to QSFP ports. Our intention was that we could connect the two ports together with a passive DAC cable to test connectivity at the programming station. For a while, we also thought that the QSFP ports would allow us to plug in optical transceivers and connect the sled to other 100G/200G network equipment. However, the sled’s T6/T7 NIC requires that the QSFP port control signals (ModPrs, ModSel, Reset, LpMode, and the two-wire serial interface) be wired up to its GPIO pins so that it can manage the transceiver, and because the sled will only ever speak BASE-KR4 Ethernet over a copper backplane in the Oxide Rack, we don’t expose those signals on the sled’s backplane connectors. Simply put, a T6/T7 NIC in a sled would be physically unable to manage a QSFP port on the other side of the backplane interface. Even if we had exposed the module control signals on the sled’s backplane connectors, there aren’t any other devices that speak BASE-KR4 Ethernet, and the T6/T7 would need a manufacturing firmware update from Chelsio to speak BASE-CR4 or other protocols that are supported by active optical modules.

Instead, we decided to have Minibar electrically connect the sled’s two BASE-KR4 Ethernet links in loopback as shown in [figure_ethernet_loopback]. When a sled is plugged in to Minibar, Minibar connects the TX lanes of one link to the RX lanes of the other link, and vice versa. If the links come up at full speed, we have confidence that the data plane Ethernet links are functioning correctly up to the sled’s backplane interface. This approach is extremely simple to implement in hardware, but requires a special host OS image for testing. The BASE-KR4 Ethernet loopback configuration is described in [section_data_plane].
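The loopback wiring and the pass criterion can be sketched as a small model. This is illustrative only: the lane names, the straight lane-n-to-lane-n mapping, and the `loopback_test_passed` helper are assumptions, not Minibar net names or host OS interfaces.

```python
# Hypothetical model of the BASE-KR4 loopback. Each link has four TX lanes and
# four RX lanes; names like "link0.tx2" are illustrative, not schematic nets.
LANES_PER_LINK = 4


def loopback_wiring():
    """Map every TX lane to the RX lane it drives on the other link.

    Assumes lane n of one link is wired to lane n of the other; the real PCB
    may permute lanes.
    """
    wiring = {}
    for link in (0, 1):
        peer = 1 - link
        for lane in range(LANES_PER_LINK):
            wiring[f"link{link}.tx{lane}"] = f"link{peer}.rx{lane}"
    return wiring


def loopback_test_passed(link_speeds_gbps, expected_gbps):
    """Pass only if exactly two links are reported and both run at full speed."""
    return len(link_speeds_gbps) == 2 and all(
        speed == expected_gbps for speed in link_speeds_gbps
    )
```

Because each link's transmitter drives the other link's receiver, a single bad lane in either direction should keep at least one link from training at full speed, which is why checking both link speeds is enough for this test.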

Lab Needs

As Minibar’s feature set developed, Oxide realized that Minibar’s backplane breakout capabilities would also enable easier compute sled development and debugging in a lab environment. Users could use Minibar to connect commodity network equipment to the sled’s management network links, communicate with the sled’s Ignition target without having to use a Sidecar rack switch, and test sleds with different PCIe add-in cards. However, to support Socket SP5 compute sled development, Minibar needed two additional capabilities:

Minibar’s PCIe slot needed to support Chelsio T6 and T7 PCIe Ethernet NICs, which require more power than the Ethernet NICs we planned to use on the programming stations. This required Minibar to have a 75 W capable PCIe slot, and clarified the requirement that Minibar needed to support PCIe Gen4 speeds.
Minibar needed to break out the PCIe auxiliary signals - PRSNT#, PERST#, PWRFLT#, and the I2C interface - to debug headers where they can be monitored with a logic analyzer.

Oxide plans to leverage these capabilities to develop system software for future Ethernet ASICs and PCIe accelerator cards.

Architecture

A close reader may have noticed that Minibar is essentially a two-channel Sidecar rack switch, minus the front-panel QSFP ports and with a PCIe slot instead of a Tofino 2 switch ASIC. Minibar leverages Sidecar’s management network switch and Ignition controller architecture, as well as the hardware root of trust (RoT) and service processor (SP) used by all Oxide products. This section describes each hardware function in more detail.

Power

Note

Add a block diagram showing how power control functions are divided between the SP and FPGA.

Ignition Controller

The programming station needs to test that the sled’s Ignition target is functioning properly and confirm that both Ignition LVDS links are functional up to the sled’s backplane interface. The easiest way to perform this test is to put an Ignition controller in Minibar.

Minibar’s FPGA implements two channels of an Ignition controller and connects to both of the sled’s Ignition LVDS links. The Ignition controller can be accessed over SPI from Minibar’s service processor.

Ignition Controller

Initially, Minibar’s Ignition controller will have the same functionality as Sidecar’s Ignition controller. Eventually, we intend for Minibar’s Ignition controller to be able to send arbitrary Ignition packets so we can fuzz the Ignition target, but we do not currently have a roadmap for when this will be implemented.

PCIe

Minibar breaks the sled’s PCIe Gen3/Gen4 x4 backplane interface out to a 75 W PCIe Gen3/Gen4 x16 PCIe slot as shown in [figure_pcie_breakout]. Although the PCIe interface still only operates in x4 mode, the x16 PCIe slot allows users to plug in full-size PCIe add-in cards.

PCIe Breakout Interface

Minibar routes the PCIe data lanes directly from the sled to the CEM connector; however, the PCIe auxiliary signals (PERST#, PRSNT#, and PWRFLT#) and the PCIe auxiliary I2C interface pass through Minibar’s FPGA. In normal operation, a PCIe Gen3 x4 Ethernet NIC is installed in Minibar’s PCIe slot. The PCIe Ethernet NIC acts as a PCIe endpoint for link testing (i.e., did the interface come up in Gen3 x4 mode at 8 GT/s?) and allows the programming station to connect to the sled’s host CPU over PCIe to load the host OS onto the internal M.2 SSDs. Minibar’s FPGA passes the PCIe auxiliary signals back and forth between the sled and the PCIe slot, and allows users to observe and control the PCIe auxiliary signals in real time for debugging.
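As a minimal sketch of the link test, the check "did the interface come up in Gen3 x4 mode?" reduces to decoding two fields of the standard PCIe Link Status register. The function names are hypothetical, and how the programming station obtains the raw register value from the sled is not specified here.

```python
# Field layout of the PCIe Link Status register (PCI Express Base Specification):
#   bits 3:0  Current Link Speed (1 = 2.5, 2 = 5, 3 = 8, 4 = 16 GT/s)
#   bits 9:4  Negotiated Link Width (number of lanes)
SPEED_GT_PER_S = {1: 2.5, 2: 5.0, 3: 8.0, 4: 16.0}


def decode_link_status(lnksta):
    """Return (speed in GT/s or None, negotiated width) for a raw 16-bit value."""
    speed_code = lnksta & 0xF
    width = (lnksta >> 4) & 0x3F
    return SPEED_GT_PER_S.get(speed_code), width


def link_is_gen3_x4(lnksta):
    """True when the link trained at 8 GT/s with four lanes."""
    return decode_link_status(lnksta) == (8.0, 4)
```

For example, a raw value of `0x0043` decodes to 8 GT/s and four lanes, while `0x0041` (Gen1 x4) or `0x0013` (Gen3 x1) would fail the check and point to a speed or lane problem somewhere between the host CPU and the endpoint.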

The compute sled generates the 100 MHz PCIe reference clock and sends it to Minibar over the backplane interface. In the Oxide Rack, the PCIe reference clock is AC-coupled and terminated at the clock generator in the rack switch. To keep the electrical interface as close to the Oxide Rack as possible, Minibar AC-couples and terminates the PCIe reference clock, then uses a PCIe Gen4 reference clock buffer to redrive a DC-coupled HCSL reference clock to the CEM slot.

Note

In the Oxide Rack, management sleds and rack switches are configured to use Separate Reference with Independent Spread (SRIS) and Spread Spectrum Clocking (SSC) instead of having the management sled send the PCIe reference clock over the backplane cable. However, Minibar doesn’t support SRIS or SSC because very few PCIe add-in cards support SRIS or SSC, and the benefits of supporting SRIS and SSC did not offset the additional hardware complexity and cost of including a clock generator in every Minibar. As a result, compute sleds need to use a special testing host OS image that enables the PCIe reference clock output on the backplane PCIe interface. Alternatively, the host OS can use the PCIe auxiliary I2C interface to detect that it is connected to a Minibar instead of a rack switch and decide whether to enable the PCIe reference clock based on the PCIe endpoint it detects.

Management Network Ethernet

[figure_management_network_block_diagram] illustrates Minibar’s management network switch architecture. The two SGMII management network links from the sled and the two SGMII management network links from Minibar’s SP are routed to a VSC7448 switch, which uses a VSC8504 PHY over QSGMII to break out the three Ethernet links to three 100BASE-T ports - one for each of the sled’s two links, and one for Minibar’s SP. Like in Sidecar, the VSC7448 is responsible for isolating Minibar’s SP from the sled’s SP.

Management Network Block Diagram

Using a VSC7448 switch makes the management network interface between Minibar and the sled as electrically similar to the Oxide Rack as possible and requires the smallest amount of software development, but it adds an extra $150 to Minibar’s cost per board. We believe the small cost increase per board is justified given Minibar’s small projected build quantities and the need for the management network architecture to work on the first spin.
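The isolation described above can be pictured as a port-to-VLAN table. The sketch below is a toy model under stated assumptions: the port names, the VLAN IDs, and the choice to give each sled link its own VLAN are hypothetical and are not taken from the actual VSC7448 configuration.

```python
# Hypothetical port names and VLAN IDs; the real switch configuration is not
# given in this document.
PORT_VLAN = {
    "sled_sgmii0": 10, "baset_sled0": 10,  # sled link 0 <-> its BASE-T port
    "sled_sgmii1": 20, "baset_sled1": 20,  # sled link 1 <-> its BASE-T port
    "sp_sgmii0": 30, "baset_sp": 30,       # Minibar SP <-> its BASE-T port
    "sp_sgmii1": None,                     # no VLAN: nothing is forwarded
}


def can_forward(src, dst):
    """Frames pass only between two distinct ports that share a VLAN."""
    if src == dst:
        return False
    vlan = PORT_VLAN.get(src)
    return vlan is not None and vlan == PORT_VLAN.get(dst)
```

In this model `can_forward("sled_sgmii0", "baset_sled0")` holds, while `can_forward("sled_sgmii0", "sp_sgmii0")` does not, which is the property that keeps Minibar’s SP and the sled’s SP apart.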

Note that both of the SGMII links from Minibar’s SP are connected to the VSC7448, even though Minibar’s SP is only given a single BASE-T Ethernet port on the back panel. The decision to wire up both of the SP’s SGMII links was actually motivated by software - it allows Minibar’s SP to reuse the same KSZ8463/VSC8562 configuration as any other SP without us having to go in and disable one link. We can simply have the VSC7448 - which already needs an updated configuration - blackhole one of the ports. It’s also ea

200G/100G Ethernet

The Minibar PCB wires the sled’s two 200G/100GBASE-KR4 Ethernet links in a loopback configuration as shown in [figure_ethernet_loopback]. If the host OS checks the T6 link status and both links are up at full speed, the sled’s BASE-KR4 Ethernet links are working correctly up to the sled’s backplane interface.

200G/100G Ethernet Loopback

Note

Minibar should support loopback testing at 200G even if the current Oxide Rack backplane only operates at 100G.

Mechanical Design

Minibar comes in two mechanical form factors:

The Minibar production tester looks like a single cubby partition in the Oxide Rack with a blind-mate power and backplane interface. The Minibar production tester is rugged enough to withstand continuous use in a production environment and includes additional mechanical safeguards to prevent damage to the compute sled, as well as safeguards to protect the user from electrical hazards.
Minibar Lite, a small form factor version of Minibar, omits the blind-mate backplane interface and repackages the Minibar electronics in a 3D-printed case that installs on top of the compute sled. Minibar Lite is designed to be deployed in laboratory environments where it will be semi-permanently installed on compute sleds that are being used for software development.

Both Minibar form factors use the same electronics and similar cables of different lengths.

Minibar

Minibar’s production tester configuration mimics a single cubby partition in an Oxide Rack, including the blind-mate backplane. Compute sleds slide into the Minibar cubby and lock in place with the same latching handle and hardware users would encounter in the Oxide Rack. The Minibar electronics are installed in the back of the cubby and connect to the compute sled using the same floating backplane cartridge and backplane cables used in the Oxide Rack. Instead of more backplane cables, Minibar’s rear panel has a power button, a power connector, three Ethernet ports, and a PCIe slot.

Minibar’s cubby and enclosure are constructed from extruded aluminum framing and 3D-printed plastic parts rather than a bulk sheet metal design to minimize design time, cost, mass, and tooling NRE. A fully assembled Minibar production tester measures 1.07 m long, 305 mm wide, and 135 mm tall.

Minibar Manufacturing Tester

Driven by the need for a blind-mateable interface and the desire for a long-term tester solution, the signal and power interconnect between the Minibar tester PCBA and the compute sled PCBA is made with patch cables. This extends the service life of the tester PCBAs, as the Samtec ExaMAX connectors used are only rated for 250 mating cycles. Without an intermediate cable to act as a sacrificial link in the interconnect chain, which can be rotated or replaced as part of periodic tester maintenance, the tester PCBAs would conservatively only last long enough to build approximately 5 full racks, assuming 1-2 mating cycles per test. While it is unlikely that any lab-use versions would see this number of mating cycles, the desire for a single PCBA and interconnect design drives this decision. Additionally, the flexible cable interconnect allows flexibility in placement in the lab-use designs, which should allow for a design with a smaller overall footprint.
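The connector-life estimate can be reproduced with a few lines of arithmetic. The 250-cycle rating and the 1-2 mating cycles per test come from the paragraph above; the figure of 32 compute sleds per fully populated rack is an assumption added for this sketch.

```python
RATED_MATING_CYCLES = 250  # connector rating quoted in the text
SLEDS_PER_RACK = 32        # assumption: a fully populated Oxide Rack


def racks_before_wearout(cycles_per_test,
                         rated_cycles=RATED_MATING_CYCLES,
                         sleds_per_rack=SLEDS_PER_RACK):
    """Racks' worth of sleds one connector can test before reaching its rating."""
    sleds_tested = rated_cycles // cycles_per_test
    return sleds_tested / sleds_per_rack


for cycles in (1, 2):
    print(f"{cycles} cycle(s) per test: {racks_before_wearout(cycles):.1f} racks")
```

Under these assumptions the result falls between roughly 4 and 8 racks, which brackets the "approximately 5 full racks" figure and shows why a replaceable cable is preferable to wearing out the connectors on the PCBA itself.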

Minibar Lite

At over 1 meter long, Minibar manufacturing testers were too large to fit on the shelves in Oxide’s lab, so Minibar’s lab configuration, called Minibar Lite, packages the Minibar electronics in a 3D-printed enclosure that mounts on top of the compute sled. Minibar Lite does not blind mate to the compute sled; instead, users manually install the power cable and backplane cables and secure them with 3D-printed retention clips. Additionally, the overall materials cost of this configuration is significantly less than the production version.

Minibar Lite

Safety Considerations

Mechanical Safety

Minibar contains two physical controls to help prevent compute sleds from being damaged during programming and testing:

Minibar’s cubby uses the same floating backplane cartridge and locking mechanism as the cubbies in the Oxide Rack, which ensures safe, controlled, and consistent backplane connector mate/demate cycles. Additionally, Minibar maintains the same interaction patterns operators are used to when handling sleds. Installing a sled in Minibar is the same as installing a sled in an Oxide Rack during final assembly.
Once a compute sled is inserted in Minibar’s cubby, operators must press down on a second locking handle to plug in the programming adapter. The second locking handle clamps the sled in place using the fan tray and prevents operators from accidentally pulling a sled out of Minibar’s cubby while the programming cables are connected, which could damage the cables or break the programming header off the compute sled. To remove the sled, operators have to disengage the second locking handle, which automatically unplugs the programming adapter.

Electrical Safety

Minibar provides two physical controls to help prevent operators from coming into contact with the portions of the compute sled and Minibar electronics that operate at the bus voltage:

Minibar’s sled cubby and electronics cubby have sheet metal covers to keep fingers out. In normal operation, it isn’t possible for operators to touch portions of the compute sled and Minibar electronics that operate at the bus voltage.
Minibar does not enable its bus power output unless a sled is installed in the Minibar enclosure and all the backplane connectors are fully seated. Minibar detects when a sled is connected by using one otherwise unused signal pin in each of the three ExaMAX backplane connectors for physical presence detection, and will not enable sled power unless all three presence detection signals are asserted.
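The power interlock in the second control amounts to a three-input AND. A minimal sketch follows; the function name and the representation of the presence signals as booleans are assumptions, and whether the real check lives in the FPGA or in SP firmware is not stated here.

```python
def bus_power_allowed(presence_signals):
    """Allow bus power only when all three connector presence pins are asserted.

    presence_signals: iterable of three booleans, one per ExaMAX backplane
    connector (True means the presence pin is asserted).
    """
    signals = tuple(presence_signals)
    # Anything other than exactly three readings is treated as "not seated".
    return len(signals) == 3 and all(signals)
```

Treating a missing reading as a failure keeps the sketch fail-safe: `bus_power_allowed([True, True, False])` and `bus_power_allowed([True, True])` both return `False`, so a partially seated sled never receives bus power.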

Security Considerations

Minibar is a small computer with persistent storage that connects to the sled’s power, Ignition, and management network interfaces. When Minibar is used on a programming station:

A threat actor could use Minibar to turn off power to the sled and interrupt the programming, testing, or locking processes at any point.
A threat actor could use Minibar’s Ignition controller to command the sled into the A3 power state any time after the Ignition target Flash has been programmed, which would interrupt the programming station workflow.
A threat actor could use Minibar to interrupt the process of loading the host OS onto the internal M.2 SSDs by turning off power to Minibar’s PCIe slot, or by using Minibar’s FPGA to reset the Ethernet NIC installed in Minibar’s PCIe slot.
A threat actor could configure Minibar’s management network switch so that Minibar’s SP and the sled’s SP were on the same VLAN. Once configured, the threat actor could then use Minibar to do anything to the sled that you could normally do over the management network.
A threat actor could use Minibar to communicate with the host CPU over the PCIe auxiliary I2C interface via Minibar’s FPGA. At the time of writing, Oxide doesn’t use this interface, and the I2C peripheral is not initialized in the compute sled. However, management sleds will eventually use the PCIe auxiliary I2C interface to retrieve the rack switch hardware configuration and MAC address from the rack switch SP before booting the Tofino 2 switch ASIC. Since this doesn’t have a specific implementation yet, we don’t currently understand the security considerations associated with having Minibar connect to the PCIe auxiliary I2C interface.

We can mitigate these security considerations by using Minibar’s RoT to perform measured boot validation on the SP firmware and the contents of Minibar’s auxiliary Flash, and have the programming station validate the measurements before each programming operation. We also have to weigh these security considerations against the clear need for Minibar’s hardware capabilities on the programming station.
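The validation step on the programming station could look roughly like the sketch below. It covers only the final comparison of reported measurements against expected values. The hash algorithm, the component names, and the dictionary format are assumptions, and the sketch does not model how the RoT produces or authenticates its measurements.

```python
import hashlib
import hmac


def measure(image: bytes) -> str:
    """Hypothetical measurement: SHA3-256 digest of an image, as a hex string."""
    return hashlib.sha3_256(image).hexdigest()


def measurements_match(reported: dict, expected: dict) -> bool:
    """Accept only if every expected component is reported with the same digest."""
    if reported.keys() != expected.keys():
        return False
    return all(
        hmac.compare_digest(reported[name], expected[name]) for name in expected
    )


# Example with made-up image contents; names are illustrative only.
expected = {"sp_firmware": measure(b"known-good SP image"),
            "aux_flash": measure(b"known-good aux flash")}
reported = {"sp_firmware": measure(b"known-good SP image"),
            "aux_flash": measure(b"tampered aux flash")}
ok = measurements_match(reported, expected)  # False: aux flash differs
```

A programming station following this pattern would refuse to start a programming operation whenever the comparison fails, rather than attempting to continue with an unverified Minibar.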

Determinations

Oxide will build and deploy at least one Minibar for each compute sled programming station, and a TBD number of Minibar Lite units for lab use based on user demand. The total build quantity is expected to be 20-40 units.
We will understand more about Minibar’s security considerations over time, and in future hardware revisions, we may decide to make architectural changes or add additional electrical or physical controls to Minibars that are deployed in production environments. However, having the programming station perform measured boot validation on Minibar’s firmware is an acceptable mitigation for Minibar’s security concerns in the near term.
