RFD 630
BMR491 Glitch Mitigation Plan

This document analyzes our options for working around a design flaw in the Flex BMR491 IBC as used in all Oxide computers to date, including a proposed mitigation from Flex. It is a far lower-level counterpart to RFD 596, which concerns the host-level recovery from conditions induced by the defect.

Background

The Flex BMR491 R1C and earlier, as installed on all of our machines produced to date, contains a design flaw that can cause its input undervoltage protection to misfire. This manifests as a momentary (milliseconds) droop on its 12 V output to something like 8 V, which it turns out computers don’t like very much.

Flex believes they have found the problem and will be fixing it in revision R1D. Some behavior in the auxiliary power supply causes transients on the internal circuit used for sensing the input voltage level, and the digital controller IC inside the BMR491 responds by briefly dropping the output.

However, the BMR491 is really hard to remove from our printed circuit boards, so we’d like to have a solution that doesn’t involve rebuilding every server we’ve made. So, Flex have proposed a configuration change that they say should mitigate the issue for the time being.

Flex’s proposed mitigation

Flex’s suggested change was delivered in a confidential document numbered BMR4912203/851. The document is quite short and doesn’t go into tremendous detail about the change being proposed (it winds up suggesting a list of hexadecimal register writes). We’ve taken the time to reverse engineer it.

Flex’s suggestion is: since it’s the VIN under-voltage protection that’s misfiring, turn it off. Instead, if under-voltage (and brown-out) protection is desired, use the output under-voltage protection, which they don’t appear to believe is affected by the same sort of design flaw.

Concretely, the register writes they proposed would have the following effects:

  1. Adjust the maximum duty cycle permitted in the power supply to 95%.

  2. Adjust the VOUT_COMMAND to exactly 12 V.

  3. Adjust the VOUT_UV_FAULT threshold to 11 V.

  4. Change the VOUT_UV_FAULT_RESPONSE to power off with no retries.

  5. Write all of this to persistent storage on the IBC, so that it becomes the new power-on default.

Working the math backwards

Those numbers might seem a bit arbitrarily chosen. They make more sense than they initially appear to, once you determine the input constraints.

These constraints appear to be:

  1. Input should be treated as undervoltage below 35 V. (This appears to be chosen based on the default undervoltage shutdown threshold.)

  2. The nominally 12 V output should be allowed to dip to 11 V. (This number I can’t explain, but it’s below the 11.54 V minimum loaded output voltage of the supply in expected operating conditions.)

We can use this to work out the rest of the math:

  • Per the datasheet, the BMR491 uses a 3:1 winding ratio on its transformer, and the representative fundamental circuit diagram on page 4 of the datasheet shows that this is being used to divide the input voltage by 3.

  • A 35 V input, divided by 3, would produce 11.67 V.

  • The "buck" DC-DC converter portion of the BMR491’s circuit limits this further through PWM (pulse width modulation), and the control electronics govern the output after PWM by adjusting the duty cycle — the fraction of a pulse cycle where the input is "on." As the input voltage drops, the control electronics will compensate by increasing the duty cycle of the converter, up to the maximum imposed by configuration.

  • Flex’s proposed maximum duty cycle (95%) prevents the converter from making it all the way up to 11.67 V for 35 V input. Instead, it stops at about 11.08 V.

  • This helps to ensure that the 11 V undervoltage fault trips at just below 35 V input.
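The arithmetic in the list above can be checked with a few lines of Python. This is only a sketch of the idealized relationship described here (input divided by 3, then scaled by the duty cycle). It ignores losses and load, and the function name is ours, not Flex's.

```python
TURNS_RATIO = 3  # 3:1 transformer ratio, per the datasheet

def max_output_voltage(vin, max_duty):
    """Idealized ceiling on VOUT: input divided by 3, scaled by the duty cap.

    Ignores conduction losses and load regulation, so real outputs are lower.
    """
    return vin / TURNS_RATIO * max_duty

print(round(max_output_voltage(35.0, 1.00), 2))  # 11.67
print(round(max_output_voltage(35.0, 0.95), 2))  # 11.08
```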

Trying to rederive Flex’s constants

Flex’s mitigation document indicates that the MAX_DUTY register should be written with hex 0xF8EA, which they describe as 0.9428 or 0.95 depending on where you look in the document. They also provide a screenshot from a Word document describing PMBus register writes in hex, which gives a value that differs in one character: 0xF0EA.

Both of these values are wrong.

The MAX_DUTY register is specified by PMBus, but the standard gives the vendor leeway on the format used:

The PMBus device product literature shall clearly state which format the device uses.

PMBus Specification Rev 1.3.1 Part II

Fortunately, the "device product literature" (which is to say, Flex’s technical specification) does in fact specify the format used in MAX_DUTY as a PMBus LINEAR11 value.

LINEAR11 is a slightly idiosyncratic PMBus encoding for a 16-bit floating point number.

  • The most significant 5 bits are the exponent in two’s complement format.

  • The lower 11 bits are the mantissa. PMBus defines this as two’s complement too, but a duty cycle is never negative, so we can treat it as an unsigned integer here.

Concretely, you can convert a LINEAR11 value to a decimal number using this Python fragment:

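def linear11(bits):
    """Decode a 16-bit PMBus LINEAR11 word into a Python number."""
    exp5 = bits >> 11                    # top 5 bits: exponent
    mantissa = bits & ((1 << 11) - 1)    # low 11 bits, treated as unsigned
    if exp5 >= 16:
        # Sign bit set: undo the 5-bit two's complement.
        exp = -((exp5 ^ 0b11111) + 1)
    else:
        exp = exp5
    return mantissa * 2**exp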

The technical specification says that the MAX_DUTY register defaults to a 99% duty cycle cap, and has the hex value 0xEB18. (We have also verified that an R1C BMR491 reports 0xEB18 in reality.) This value makes sense:

>>> linear11(0xeb18)
99.0

However, the two values given by Flex as proposed mitigation values do not make sense.

>>> linear11(0xf0ea)
58.5
>>> linear11(0xf8ea)
117.0

In fact, we tried loading these values onto a BMR491. The higher value was rejected. The lower value forced the power supply to a max duty cycle of 58.5%, which caps its unloaded output voltage at 10.53 V; the output drops rapidly lower under load. This proved insufficient to boot a server.

But, given the reverse engineering of the method described above, we can work out our own encoding for 95%:

>>> linear11(0xeaf8)
95.0

You might observe that this value is one of the two values Flex sent, but with its bytes swapped. It appears that at some point between devising the mitigation and writing the document, Flex had an internal endianness disagreement.
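To make the byte-swap observation concrete, here is a small sketch that encodes a duty cycle as LINEAR11 and swaps the bytes of the result. The helper names are ours. As a simplifying assumption, the encoder takes the exponent as a parameter (defaulting to -3, the exponent used by the 0xEB18 factory default) rather than searching for the best one.

```python
def linear11_decode(bits):
    """Decode LINEAR11, treating the mantissa as unsigned (fine for duty cycles)."""
    exp5 = bits >> 11
    mantissa = bits & 0x7FF
    exp = exp5 - 32 if exp5 >= 16 else exp5  # 5-bit two's complement
    return mantissa * 2.0**exp

def linear11_encode(value, exp=-3):
    """Encode a non-negative value with a fixed exponent (assumed, not searched)."""
    mantissa = round(value / 2.0**exp)
    if not 0 <= mantissa < (1 << 10):
        raise ValueError("value does not fit as a non-negative mantissa")
    return ((exp & 0x1F) << 11) | mantissa

def swap_bytes(word):
    """Swap the two bytes of a 16-bit word."""
    return ((word & 0xFF) << 8) | (word >> 8)

print(hex(linear11_encode(95.0)))  # 0xeaf8
print(hex(swap_bytes(0xEAF8)))     # 0xf8ea
print(linear11_decode(0xEAF8))     # 95.0
```

Encoding 99.0 the same way reproduces the documented default of 0xEB18, which is a useful sanity check on the fixed exponent.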

Does this make sense for Oxide?

The choice of 35 V as the critical input threshold is surprising. The BMR491 can only produce its full rated power down to an input of 40 V; below that, it becomes progressively less powerful and efficient. Below 36 V it becomes electrically incapable of maintaining 12 V output. Since our nominal bus bar input is 54 V, 35 V is 65% of normal voltage.

Because our systems are designed for hotplug, we have a hotplug controller IC (an ADM1272) upstream of the BMR491, controlling its input. It turns out that on most of our systems, the BMR491’s input will never sag to 35 V while the BMR491 is still powered, because that hotplug controller’s undervoltage fault trips first:

System     Hotplug VIN threshold
Gimlet     ~44.6 V
Sidecar    ~44.6 V
Cosmo      31 V

Below these voltages, the hotplug controller will shut off power to the system, including the BMR491.

Determinations

In summary (each point is expanded below):

  1. We should follow part of Flex’s advice and disable the VIN undervoltage check on BMR491 R1C. Concretely, we would write a very low value to the VIN_OFF register, asking the IBC to tolerate arbitrarily low input voltage levels.

  2. We should enable the VOUT undervoltage check on machines with a hotplug VIN threshold under 40 V. Currently, this is only Cosmo. On such machines, we may need to adjust the max duty cycle to achieve the check we want.

  3. We should not persist these settings, so that we can easily revisit the details in the future. We have a race-free opportunity for applying these settings during startup, described below.
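One way to picture how these determinations combine is as a function from (system, IBC revision) to a list of settings for the Service Processor to apply. This is a sketch only. The function and table names are hypothetical, values are in engineering units rather than register encodings, and the 95% duty cap and 11 V threshold are simply Flex's numbers, which remain open questions for Cosmo. It also assumes the VOUT check is only wanted where the VIN check has been disabled.

```python
# Hotplug controller VIN undervoltage thresholds in volts (approximate).
HOTPLUG_VIN_THRESHOLD = {"Gimlet": 44.6, "Sidecar": 44.6, "Cosmo": 31.0}

def plan_settings(system, revision):
    """Return (register, value) pairs to apply at startup; nothing is persisted."""
    if revision != "R1C":
        return []  # assumption: other revisions keep the manufacturer's configuration
    # Determination 1: disable the broken VIN undervoltage check.
    settings = [("VIN_OFF", 0.0)]
    if HOTPLUG_VIN_THRESHOLD[system] < 40.0:
        # Determination 2: fall back on the VOUT undervoltage check.
        settings.append(("MAX_DUTY", 95.0))       # percent, Flex's value
        settings.append(("VOUT_UV_FAULT", 11.0))  # volts, Flex's value
    # Determination 3: deliberately no STORE_USER_ALL here.
    return settings

print(plan_settings("Gimlet", "R1C"))  # [('VIN_OFF', 0.0)]
print(plan_settings("Cosmo", "R1C"))
```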

Disabling the VIN undervoltage check on rev R1C

Flex’s description of the design flaw and its effects on undervoltage detection in the face of transients seems convincing, and we can conclude that undervoltage detection is effectively broken on the BMR491 rev R1C. We currently expect the detection to be fixed on rev R1D, and leaving it enabled seems nice if it works. Thus, we should try to only disable the feature where it’s broken.

Concretely,

  • The defective units respond to a block read of the PMBus MFR_REVISION field with a byte string beginning b"R1C " (and then followed by some other stuff). We can selectively enable the mitigation in response to this revision.

  • We can then write the VIN_OFF register to value 0, asking it to tolerate any input voltage. (This is the key part of Flex’s mitigation.)

Warning
This of course assumes that the fixed BMR491 will report a new revision R1D. Without samples in hand, we don’t know this for certain yet.
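The revision check itself is small. As a sketch, assuming the MFR_REVISION block read has already been performed and its payload is available as a byte string (the function name is hypothetical, and the trailing bytes in the examples are made up):

```python
def needs_vin_uv_mitigation(mfr_revision: bytes) -> bool:
    """True if the MFR_REVISION payload identifies a defective R1C unit."""
    return mfr_revision.startswith(b"R1C ")

print(needs_vin_uv_mitigation(b"R1C 1234"))  # True
print(needs_vin_uv_mitigation(b"R1D 1234"))  # False, if R1D reports itself as such
```

Matching on the prefix rather than the whole string keeps the check independent of whatever follows the revision in the field.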

Enabling VOUT undervoltage and choosing a threshold

The BMR491 monitors its own output voltage and can cut power if it drops below a programmable level. Currently, that value is 0 V — that is, the feature is effectively disabled. Flex’s mitigation suggested using this to indirectly detect drops in the input voltage, with the obvious drawback that anything powered downstream of the BMR491 will notice the voltage drop before action is taken.

Note
Recall that on both Gimlet and Sidecar, the input to the IBC will be cut off by the hotplug controller far above 40 V, so we don’t really need to do this at all on those machines.

Under normal operation, the specified minimum output level for the BMR491 is 11.54 V, and devices like disks appear to generally tolerate about 10% variation (so down to 10.8 V in extreme cases). This suggests that Flex’s undervoltage threshold choice of 11 V is probably reasonable. (Our CPU voltage regulators are currently configured to start alerting us at 11.75 V; it is not currently clear how far their input voltage can sag before they start to have problems, but it’s likely lower than the threshold that upsets the disks.)

We could use Flex’s VIN level of 35 V by adopting their proposed mitigation directly, and setting max duty to about 95% (0xeaf8).

If we wanted to take the same approach but respond to VIN below 40 V, we would need to limit the duty cycle more aggressively, to 82.5%. This might have knock-on effects on the peak power available from the IBC, and is not obviously a good idea.
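The 82.5% figure follows from running the earlier arithmetic in reverse. A minimal sketch, using the same idealized divide-by-3 model and ignoring losses and load (the function name is ours):

```python
TURNS_RATIO = 3  # 3:1 transformer ratio, per the datasheet

def duty_cap_for(vin_threshold, vout_uv_threshold):
    """Duty cap at which VOUT reaches the UV threshold as VIN hits vin_threshold."""
    return vout_uv_threshold / (vin_threshold / TURNS_RATIO)

print(round(duty_cap_for(40.0, 11.0) * 100, 1))  # 82.5
print(round(duty_cap_for(35.0, 11.0) * 100, 1))  # 94.3
```

The 35 V result, about 94.3%, is close to the 0.9428 figure that appears in Flex's document. That may be why both 0.9428 and 0.95 show up there, but this is our guess and not something Flex states.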

When to apply the changes and whether to persist

For background context, note that we do not currently persist any settings to the BMR491. We use it in the manufacturer’s configuration.

Flex proposed applying the mitigation once and persisting it to internal flash on the BMR491. We could do this, but we have another option that might be preferable.

During system power state A2, the Service Processor is "awake" and the BMR491 is running but essentially unloaded. The only loads on the BMR491’s output are the step-down supplies that produce power for the subset of the system that is powered in A2. These systems are comparatively tolerant of voltage droop because they only require input voltages of slightly above 5 V to function. None of them noticed the output glitches to 8 V, because the 5 V and 3.3 V supplies were still stable. This means the BMR491 defect cannot cause problems between power-on and the transition out of A2, so reconfiguring the IBC in that window does not race against the defect.

So, we could have the Service Processor reconfigure the BMR491, if necessary, during A2.

The main advantage of this is flexibility. If we want to apply a different or more complex configuration change in a future firmware version, we don’t need to worry about "undoing" the previous persistent mitigation, and we don’t need to distinguish between BMR491s that have had the mitigation applied vs those newly installed in manufacturing.

This also simplifies things if we decide that a different subset of the fleet should receive the mitigation: for instance, if we decide to reconfigure BMR491 R1C on Gimlet and Sidecar after all, or if we detect a related issue and need to apply a change to the upcoming R1D revision.

Finally, this choice avoids a potential gotcha in the PMBus persistence operations. Quoth the spec,

It is permitted to use the STORE_USER_ALL command while the device is operating. However, the device may be unresponsive during the copy operation with unpredictable, undesirable or even catastrophic results. PMBus device users are urged to contact the PMBus device manufacturer about the consequences of using the STORE_USER_ALL command while the device is operating and providing output power.

PMBus Power System Mgt Protocol Specification – Part II – Revision 1.3.1

Flex has not provided assurances that STORE_USER_ALL will not behave in this way, and may have assumed that we’d apply the mitigation in some sort of manufacturing or remanufacturing environment, where disruptions could be tolerated. Since we’re planning on applying it in-system, it seems best to avoid the possibility of disruption.