This document analyzes our options for working around a design flaw in the Flex BMR491 IBC as used in all Oxide computers to date, including a proposed mitigation from Flex. It is a far lower-level counterpart to RFD 596, which concerns host-level recovery from conditions induced by the defect.
Background
The Flex BMR491 R1C and earlier, as installed on all of our machines produced to date, contains a design flaw that can cause its input undervoltage protection to misfire. This manifests as a momentary (milliseconds) droop on its 12 V output to something like 8 V, which it turns out computers don’t like very much.
Flex believes they have found the problem and will be fixing it in revision R1D. Some behavior in the auxiliary power supply causes transients on the internal circuit used for sensing the input voltage level, and the digital controller IC inside the BMR491 responds by briefly dropping the output.
However, the BMR491 is really hard to remove from our printed circuit boards, so we’d like to have a solution that doesn’t involve rebuilding every server we’ve made. So, Flex have proposed a configuration change that they say should mitigate the issue for the time being.
Flex’s proposed mitigation
Flex’s suggested change was delivered in a confidential document numbered BMR4912203/851. The document is quite short and doesn’t go into tremendous detail about the change being proposed (it winds up suggesting a list of hexadecimal register writes). We’ve taken the time to reverse engineer it.
Flex’s suggestion is: since it’s the VIN under-voltage protection that’s misfiring, turn it off. Instead, if under-voltage (and brown-out) protection is desired, use the output under-voltage protection, which they don’t appear to believe is affected by the same sort of design flaw.
Concretely, the register writes they proposed would have the following effects:
Adjust the maximum duty cycle permitted in the power supply to 95%.
Adjust the VOUT_COMMAND to exactly 12 V.
Adjust the VOUT_UV_FAULT threshold to 11 V.
Change the VOUT_UV_FAULT_RESPONSE to power off with no retries.
Write all of this to persistent storage on the IBC, so that it becomes the new power-on default.
Working the math backwards
Those numbers might seem a bit arbitrarily chosen. They make more sense than they initially appear to, once you determine the input constraints.
These constraints appear to be:
Input should be treated as undervoltage below 35 V. (This appears to be chosen based on the default undervoltage shutdown threshold.)
The nominally 12 V output should be allowed to dip to 11 V. (This number I can’t explain, but it’s below the 11.54 V minimum loaded output voltage of the supply in expected operating conditions.)
We can use this to work out the rest of the math:
Per the datasheet, the BMR491 uses a 3:1 winding ratio on its transformer, and the representative fundamental circuit diagram on page 4 of the datasheet shows that this is being used to divide the input voltage by 3.
A 35 V input, divided by 3, would produce 11.67 V.
The "buck" DC-DC converter portion of the BMR491’s circuit limits this further through PWM (pulse width modulation), and the control electronics govern the output after PWM by adjusting the duty cycle — the fraction of a pulse cycle where the input is "on." As the input voltage drops, the control electronics will compensate by increasing the duty cycle of the converter, up to the maximum imposed by configuration.
Flex’s proposed maximum duty cycle (95%) prevents the converter from making it all the way up to 11.67 V for 35 V input. Instead, it stops at about 11.08 V.
This helps to ensure that the 11 V undervoltage fault trips at just below 35 V input.
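The arithmetic above can be sketched in a few lines of Python. This is an idealized model that assumes only the 3:1 ratio and the duty-cycle cap matter (no losses, no load regulation); the function names are ours, not Flex's.

```python
TURNS_RATIO = 3.0  # 3:1 transformer ratio, per the datasheet


def vout_ceiling(vin, max_duty_percent, ratio=TURNS_RATIO):
    """Highest output voltage reachable at a given input and duty-cycle cap."""
    return vin / ratio * (max_duty_percent / 100.0)


def vin_at_vout(vout, max_duty_percent, ratio=TURNS_RATIO):
    """Input voltage at which the output ceiling has fallen to `vout`."""
    return vout * ratio / (max_duty_percent / 100.0)


print(round(vout_ceiling(35.0, 100.0), 2))  # 11.67: 35 V input, no duty cap
print(round(vout_ceiling(35.0, 95.0), 2))   # 11.08: 35 V input, 95% cap
print(round(vin_at_vout(11.0, 95.0), 2))    # 34.74: input where 11 V is reached
```

Under this model, the 11 V output fault would trip at roughly 34.7 V input, which is consistent with "just below 35 V."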
Trying to rederive Flex’s constants
Flex’s mitigation document indicates that the MAX_DUTY register should be
written with hex 0xF8EA, which they describe as 0.9428 or 0.95 depending on
where you look in the document. They also provide a screenshot from a Word
document describing PMBus register writes in hex, which gives a value that
differs in one character: 0xF0EA.
Both of these values are wrong.
The MAX_DUTY register is specified by PMBus, but the standard gives the vendor
leeway on the format used:
The PMBus device product literature shall clearly state which format the device uses.
Fortunately, the "device product literature" (which is to say, Flex’s technical
specification) does in fact specify the format used in MAX_DUTY as a PMBus
LINEAR11 value.
LINEAR11 is a slightly idiosyncratic PMBus encoding for a 16-bit floating point
number.
The most significant 5 bits are the exponent in two’s complement format.
The lower 11 bits are the mantissa, which PMBus also defines as two’s complement. Every value discussed here has a non-negative mantissa, so we can read it as an unsigned integer.
Concretely, you can convert a LINEAR11 value to a decimal number using this
Python fragment:
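def linear11(bits):
    """Decode a 16-bit PMBus LINEAR11 word into a number."""
    # Upper 5 bits: the exponent, in two's complement.
    exp5 = bits >> 11
    # Lower 11 bits: the mantissa, read as unsigned here (this is only
    # valid for non-negative values such as a duty cycle).
    mantissa = bits & ((1 << 11) - 1)
    if exp5 >= 16:
        # Sign bit set: undo two's complement to get the negative exponent.
        exp = -((exp5 ^ 0b11111) + 1)
    else:
        exp = exp5
    return mantissa * 2**exp

The technical specification says that the MAX_DUTY register defaults to a 99%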
duty cycle cap, and has the hex value 0xEB18. (We have also verified that an
R1C BMR491 reports 0xEB18 in reality.) This value makes sense:
>>> linear11(0xeb18)
99.0

However, the two values given by Flex as proposed mitigation values do not make sense.
>>> linear11(0xf0ea)
58.5
>>> linear11(0xf8ea)
117.0

In fact, we tried loading these values onto a BMR491. The higher value was rejected. The lower value forced the power supply to a max duty cycle of 58.5%, which caps its unloaded output voltage at 10.53 V; the output drops rapidly lower under load. This proved insufficient to boot a server.
But, using the decoding described above, we can work out our own encoding for 95%:
>>> linear11(0xeaf8)
95.0

You might observe that this value is one of the two values Flex sent, but with its bytes swapped. It appears that at some point between devising the mitigation and writing the document, Flex had an internal endianness disagreement.
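Both the 95% encoding and the byte-swap observation can be checked mechanically. The sketch below is ours, not Flex's tooling: `linear11_encode` is a hypothetical helper that takes the exponent as an argument (we reuse the -3 exponent found in the default 0xEB18) and only handles non-negative values.

```python
def linear11_decode(bits):
    """Decode a LINEAR11 word, reading the mantissa as unsigned."""
    exp5 = bits >> 11
    mantissa = bits & 0x7FF
    exp = exp5 - 32 if exp5 >= 16 else exp5  # 5-bit two's complement
    return mantissa * 2.0**exp


def linear11_encode(value, exp):
    """Encode a non-negative value using a caller-chosen exponent."""
    if not -16 <= exp <= 15:
        raise ValueError("exponent does not fit in 5 bits")
    mantissa = round(value / 2.0**exp)
    if not 0 <= mantissa <= 0x3FF:
        raise ValueError("mantissa out of range for this exponent")
    return ((exp & 0x1F) << 11) | mantissa


def swap_bytes(word):
    """Swap the two bytes of a 16-bit word."""
    return int.from_bytes(word.to_bytes(2, "big"), "little")


word = linear11_encode(95.0, -3)
print(hex(word))                           # 0xeaf8
print(hex(swap_bytes(word)))               # 0xf8ea, as printed in Flex's document
print(linear11_decode(swap_bytes(word)))   # 117.0
```

Encoding 99.0 with the same exponent yields 0xEB18, matching the factory default, which gives some confidence that -3 is the exponent the device expects.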
Does this make sense for Oxide?
The choice of 35 V as the critical input threshold is surprising. The BMR491 can only produce its full rated power down to an input of 40 V; below that, it becomes progressively less powerful and efficient. Below 36 V it becomes electrically incapable of maintaining 12 V output. Since our nominal bus bar input is 54 V, 35 V is 65% of normal voltage.
Because our systems are designed for hotplug, we have a hotplug controller IC (an ADM1272) upstream of the BMR491, controlling its input. It turns out that on most of our systems, the BMR491’s input will never make it down to 35 V, because of that hotplug controller’s undervoltage fault setting:
| System | Hotplug VIN threshold |
|---|---|
| Gimlet | ~44.6 V |
| Sidecar | ~44.6 V |
| Cosmo | 31 V |
Below these voltages, the hotplug controller will shut off power to the system, including the BMR491.
Determinations
In summary, expanded below:
We should follow part of Flex’s advice and disable the VIN undervoltage check on BMR491 R1C. Concretely, we would write the VIN_OFF register to a very low number, asking it to tolerate arbitrarily low input voltage levels.
We should enable the VOUT undervoltage check on machines with a hotplug VIN threshold under 40 V. Currently, this is only Cosmo. On such machines, we may need to adjust the max duty cycle to achieve the check we want.
We should not persist these settings, so that we can easily revisit the details in the future. We have a race-free opportunity for applying these settings during startup, described below.
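As a quick illustration of the second point, the following sketch applies the 40 V criterion to the hotplug thresholds from the table above. The dictionary and function names are made up for this example, and the Gimlet and Sidecar figures are the approximate values from the table.

```python
# Hotplug controller VIN undervoltage thresholds, in volts (approximate).
HOTPLUG_VIN_THRESHOLD = {
    "Gimlet": 44.6,
    "Sidecar": 44.6,
    "Cosmo": 31.0,
}

# Below this input the BMR491 can no longer deliver its full rated power.
FULL_POWER_VIN = 40.0


def needs_vout_uv_check(system):
    """True if the hotplug controller would let VIN sag below 40 V."""
    return HOTPLUG_VIN_THRESHOLD[system] < FULL_POWER_VIN


print([s for s in HOTPLUG_VIN_THRESHOLD if needs_vout_uv_check(s)])  # ['Cosmo']
```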
Disabling the VIN undervoltage check on rev R1C
Flex’s description of the design flaw and its effects on undervoltage detection in the face of transients seems convincing, and we can conclude that undervoltage detection is effectively broken on the BMR491 rev R1C. We currently expect the detection to be fixed on rev R1D, and leaving it enabled seems nice if it works. Thus, we should try to only disable the feature where it’s broken.
Concretely,
The defective units respond to a block read of the PMBus MFR_REVISION field with a byte string beginning b"R1C " (and then followed by some other stuff). We can selectively enable the mitigation in response to this revision.
We can then write the VIN_OFF register to value 0, asking it to tolerate any input voltage. (This is the key part of Flex’s mitigation.)
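The two steps above might look roughly like the following. This is a sketch only: `FakeBmr491` is a hypothetical stand-in for a real PMBus transport, its `block_read` and `write_word` methods are invented for illustration, and the bytes after `b"R1C "` are a placeholder rather than a real revision string.

```python
class FakeBmr491:
    """Hypothetical stand-in for a PMBus device; records writes for inspection."""

    def __init__(self, revision):
        self._revision = revision
        self.writes = []

    def block_read(self, command):
        if command != "MFR_REVISION":
            raise ValueError(f"unsupported block read: {command}")
        return self._revision

    def write_word(self, command, value):
        self.writes.append((command, value))


def apply_vin_uv_workaround(bus):
    """Disable the VIN undervoltage check, but only on affected revisions."""
    revision = bus.block_read("MFR_REVISION")
    if not revision.startswith(b"R1C "):
        return False  # leave other revisions in their default configuration
    bus.write_word("VIN_OFF", 0)  # tolerate any input voltage
    return True


affected = FakeBmr491(b"R1C placeholder")
print(apply_vin_uv_workaround(affected), affected.writes)
```

Matching on the `b"R1C "` prefix follows the text above; units at revisions earlier than R1C would need their own prefixes added if we want to cover them as well.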
R1D. Without samples in hand, we don’t know this for certain yet.
Enabling VOUT undervoltage and choosing a threshold
The BMR491 monitors its own output voltage and can cut power if it drops below a programmable level. Currently, that value is 0 V — that is, the feature is effectively disabled. Flex’s mitigation suggested using this to indirectly detect drops in the input voltage, with the obvious drawback that anything powered downstream of the BMR491 will notice the voltage drop before action is taken.
Under normal operation, the specified minimum output level for the BMR491 is 11.54 V, and devices like disks appear to generally tolerate about 10% variation (so down to 10.8 V in extreme cases). This suggests that Flex’s undervoltage threshold choice of 11 V is probably reasonable. (Our CPU voltage regulators are currently configured to start alerting us at 11.75 V; it is not currently clear how far their input voltage can sag before they start to have problems, but it’s likely lower than the threshold that upsets the disks.)
We could use Flex’s VIN level of 35 V by adopting their proposed mitigation
directly, and setting max duty to about 95% (0xeaf8).
If we wanted to take the same approach but respond to VIN below 40 V, we would need to limit the duty cycle more aggressively, to 82.5%. This might have knock-on effects on the peak power available from the IBC, and is not obviously a good idea.
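The 82.5% figure falls out of inverting the same idealized model used earlier (output ceiling = VIN / 3 × duty). A minimal sketch, with a function name of our own choosing:

```python
def max_duty_for(vin_threshold, vout_uv_threshold=11.0, ratio=3.0):
    """Duty-cycle cap (percent) at which the output ceiling equals
    vout_uv_threshold when the input is at vin_threshold."""
    return 100.0 * vout_uv_threshold * ratio / vin_threshold


print(round(max_duty_for(40.0), 2))  # 82.5
print(round(max_duty_for(35.0), 2))  # 94.29
```

The 35 V result of about 94.3% is close to the 0.9428 figure that appears in Flex’s document, which may be where that number came from; we have not confirmed this with Flex.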
When to apply the changes and whether to persist
For background context, note that we do not currently persist any settings to the BMR491. We use it in the manufacturer’s configuration.
Flex proposed applying the mitigation once and persisting it to internal flash on the BMR491. We could do this, but we have another option that might be preferable.
During system power state A2, the Service Processor is "awake" and the BMR491 is running but essentially unloaded. The only loads on the BMR491’s output are the step-down supplies that produce power for the subset of the system that is powered in A2. These systems are comparatively tolerant of voltage droop because they only require input voltages of slightly above 5 V to function, so none of them noticed the output glitches to 8 V: the 5 V and 3.3 V supplies were still stable. This means the BMR491 defect cannot cause problems between power-on and the transition out of A2, so there is no race condition as long as we reconfigure before leaving A2.
So, we could have the Service Processor reconfigure the BMR491, if necessary, during A2.
The main advantage of this is flexibility. If we want to apply a different or more complex configuration change in a future firmware version, we don’t need to worry about "undoing" the previous persistent mitigation, and we don’t need to distinguish between BMR491s that have had the mitigation applied vs those newly installed in manufacturing.
This also simplifies things if we decide that a different subset of the fleet should receive the mitigation: for instance, if we decide to reconfigure BMR491 R1C on Gimlet and Sidecar after all, or if we detect a related issue and need to apply a change to the upcoming R1D revision.
Finally, this choice avoids a potential gotcha in the PMBus persistence operations. Quoth the spec,
It is permitted to use the STORE_USER_ALL command while the device is operating. However, the device may be unresponsive during the copy operation with unpredictable, undesirable or even catastrophic results. PMBus device users are urged to contact the PMBus device manufacturer about the consequences of using the STORE_USER_ALL command while the device is operating and providing output power.
Flex has not provided assurances that STORE_USER_ALL will not behave in this
way, and may have assumed that we’d apply the mitigation in some sort of
manufacturing or remanufacturing environment, where disruptions could be
tolerated. Since we’re planning on applying it in-system, it seems best to avoid
the possibility of disruption.