observer: MWOCP67 PMBus Support#2549
Conversation
| const REVISION_LEN: usize = 14; | ||
|
|
||
| let mut data = [0u8; REVISION_LEN]; | ||
| let expected = b"XXXX-YYYY-0000"; | ||
|
|
||
| let len = self | ||
| .device | ||
| .read_block(CommandCode::MFR_REVISION as u8, &mut data) | ||
| .map_err(|code| Error::BadFirmwareRevRead { code })?; | ||
|
|
||
| // | ||
| // Per ACAN-157, we are expecting this to be of the format: | ||
| // | ||
| // XXXX-YYYY-0000 | ||
| // | ||
| // Where XXXX is the firmware revision on the primary MCU (AC input | ||
| // side) and YYYY is the firmware revision on the secondary MCU (DC | ||
| // output side). We aren't going to be rigid about the format of | ||
| // either revision, but we will be rigid about the rest of the format. | ||
| // | ||
| if len != REVISION_LEN { | ||
| return Err(Error::BadFirmwareRevLength); | ||
| } | ||
|
|
||
| for index in 0..len { | ||
| if expected[index] == b'X' || expected[index] == b'Y' { | ||
| continue; | ||
| } | ||
|
|
||
| if data[index] != expected[index] { | ||
| return Err(Error::BadFirmwareRev { index: index as u8 }); | ||
| } | ||
| } | ||
|
|
||
| // | ||
| // Return the primary MCU version | ||
| // | ||
| Ok(FirmwareRev([data[0], data[1], data[2], data[3]])) |
There was a problem hiding this comment.
This looks like the same format as the one used by the MWOCP68, right? That observation kinda makes me want to push for pulling out the parsing code into a common "Murata FRU ID" thingy. But, well...I don't think that actually gains us all that much, I just have some aesthetic desire to not duplicate things.
There was a problem hiding this comment.
I pulled it out into a mwocp6x module along with the shared types. What do you think of the new module structure? It might need to change later if the firmware update process ends up being significantly different, but hopefully not.
There was a problem hiding this comment.
It's unfortunate that we can't share more of the implementations - they look similar but use different types from the pmbus crate.
| // ACAN-157 doesn't specify what page this is on. | ||
| // Assume it's on page 0, as it is on the better-documented mwocp68. |
There was a problem hiding this comment.
Out of curiosity, have we tested what happens when we read this (and the other status registers) on page 1? Does it also work, or does the MCU get mad at us?
There was a problem hiding this comment.
I tried reading a bunch of these registers on both pages and always got the same value on both. I don't know if that's because the page is ignored or they happened to be the same on both pages.
There was a problem hiding this comment.
Based on experience with other PMBus devices, I would guess that it's just that the value was the same on both pages; I'd expect that it should be possible to see a fault on one rail and not the other. I think we could probably test this by setting a warning threshold on one page (say VIN_WARN_LIMIT or something) but not the other, and seeing if the warning bit gets set on one page and not the other. I could be wrong about this, but I think it's worth investigating, because the current behavior of both MWOCPx drivers always sending page 0 feels sketchy to me.
I'm fine with merging this as-is and investigating as a follow-up though.
There was a problem hiding this comment.
I'll leave a comment on #2410 about investigating this more
| Slot::Psu5 => '5', | ||
| }; | ||
| FixedString::try_from_utf8(&v54_psu[..]).unwrap_lite() | ||
| Mwocp6x::rail_name(psu_label) |
There was a problem hiding this comment.
I found the decision to move this into the drivers a bit weird at first once I saw it, but now that I see how it's being used, it's not unreasonable. However, I don't love putting the rail name in the drv-i2c-devices crate, since we could imagine a universe where we have a MWOCP6x PSU on a board where we use different rail names (although admittedly, that's unlikely). Personally, I might have considered just leaving all the code here, and having one of two different "base" strings ("V54_PSUx" or "V50_PSUx") based on the board cfg. I'm not super bothered by the current approach, but I feel like it might be a bit simpler that way. Up to you!
There was a problem hiding this comment.
Yeah I don't love this approach either. On one hand it seems like the Mwocp67 class shouldn't need to know about rail naming conventions, and on the other hand scattering too many cfgs in the psc-seq-server could get confusing. So is this what you're suggesting?
let rail = {
#[cfg(any(target_board = "psc-b", target_board = "psc-c"))]
let mut rail_name = *b"V54_PSUx";
#[cfg(target_board = "observer-a")]
let mut rail_name = *b"V50_MAIN_PSUx";
rail_name[rail_name.len() - 1] = match self.slot {
Slot::Psu0 => b'0',
...
};
FixedString::try_from_utf8(&rail_name[..]).unwrap_lite()
};There was a problem hiding this comment.
Actually it's a problem that the rail names are different lengths now but EreportFields is still using FixedString<8>.
There was a problem hiding this comment.
Yeah, we will have to either just make it longer and null pad it on the MWOCP68 version, or make the const size depend on which board we're building. But I do think having all of this right there in that file feels better to me, rather than hiding it in the driver.
There was a problem hiding this comment.
FWIW I agree that the names should be here in psc-seq-server, rather than in the I2C driver (which should be rail-agnostic).
There was a problem hiding this comment.
Done. I should probably test this... any suggestions for how to generate an ereport and see if the rail name looks good?
There was a problem hiding this comment.
Tested! I generated a spurious ereport and read it with humility. The rail name looks good.
| Device::Tmp117 => { | ||
| for &s in self.temperature_sensors.iter() { | ||
| let m = Tmp117::new(&dev); | ||
| match m.read_temperature() { | ||
| Ok(v) => sensor_api.post_now(s, v.0), | ||
| Err(e) => { | ||
| let e = Error::Tmp117Error(e); | ||
| ringbuf_entry!(Trace::TemperatureReadFailed(s, e)); | ||
| sensor_api.nodata_now(s, e.into()) | ||
| } | ||
| } | ||
| } | ||
| // The tmp117 is just a temperature sensor, not a speed sensor. | ||
| assert_eq!(self.speed_sensors.len(), 0); | ||
| } |
There was a problem hiding this comment.
Am I correct that, after this PR merges, if we were to add a temperature sensor to the TMP117 in the PSC config, we would also be able to fix #2547?
There was a problem hiding this comment.
Yeah you'd need to update the PSC's base.toml, add an entry to the SENSORS array in sensor-polling, and test that the ComponentDetails part works (not just humility sensors).
| // ACAN-157 doesn't specify what page this is on. | ||
| // Assume it's on page 0, as it is on the better-documented mwocp68. |
There was a problem hiding this comment.
Based on experience with other PMBus devices, I would guess that it's just that the value was the same on both pages; I'd expect that it should be possible to see a fault on one rail and not the other. I think we could probably test this by setting a warning threshold on one page (say VIN_WARN_LIMIT or something) but not the other, and seeing if the warning bit gets set on one page and not the other. I could be wrong about this, but I think it's worth investigating, because the current behavior of both MWOCPx drivers always sending page 0 feels sketchy to me.
I'm fine with merging this as-is and investigating as a follow-up though.
| @@ -308,6 +321,7 @@ const ALL_PSU_PWR_OK_PINS: sys_api::PinSet = | |||
|
|
|||
| // Our notification configuration system doesn't have any concept of arrays, so, | |||
| // collect its predefined masks into convenient arrays. | |||
| #[cfg(any(target_board = "psc-b", target_board = "psc-c"))] | |||
| const PSU_PWR_OK_NOTIF: [u32; PSU_COUNT] = [ | |||
| notifications::PSU_PWR_OK_1_MASK, | |||
| notifications::PSU_PWR_OK_2_MASK, | |||
| @@ -316,8 +330,18 @@ const PSU_PWR_OK_NOTIF: [u32; PSU_COUNT] = [ | |||
| notifications::PSU_PWR_OK_5_MASK, | |||
| notifications::PSU_PWR_OK_6_MASK, | |||
| ]; | |||
| #[cfg(target_board = "observer-a")] | |||
| const PSU_PWR_OK_NOTIF: [u32; PSU_COUNT] = [ | |||
| notifications::PSU_PWR_OK_0_MASK, | |||
| notifications::PSU_PWR_OK_1_MASK, | |||
| notifications::PSU_PWR_OK_2_MASK, | |||
| notifications::PSU_PWR_OK_3_MASK, | |||
| notifications::PSU_PWR_OK_4_MASK, | |||
| notifications::PSU_PWR_OK_5_MASK, | |||
| ]; | |||
|
|
|||
| /// In order to get the PMBus devices by PSU index, we need a little lookup table guy. | |||
| #[cfg(any(target_board = "psc-b", target_board = "psc-c"))] | |||
| const PSU_PMBUS_DEVS: [fn(TaskId) -> (I2cDevice, u8); PSU_COUNT] = [ | |||
| i2c_config::pmbus::v54_psu0, | |||
| i2c_config::pmbus::v54_psu1, | |||
| @@ -326,6 +350,15 @@ const PSU_PMBUS_DEVS: [fn(TaskId) -> (I2cDevice, u8); PSU_COUNT] = [ | |||
| i2c_config::pmbus::v54_psu4, | |||
| i2c_config::pmbus::v54_psu5, | |||
| ]; | |||
| #[cfg(target_board = "observer-a")] | |||
| const PSU_PMBUS_DEVS: [fn(TaskId) -> (I2cDevice, u8); PSU_COUNT] = [ | |||
| i2c_config::pmbus::v50_main_psu0, | |||
| i2c_config::pmbus::v50_main_psu1, | |||
| i2c_config::pmbus::v50_main_psu2, | |||
| i2c_config::pmbus::v50_main_psu3, | |||
| i2c_config::pmbus::v50_main_psu4, | |||
| i2c_config::pmbus::v50_main_psu5, | |||
| ]; | |||
There was a problem hiding this comment.
not a huge deal, but a fairly common practice in tasks that support multiple slightly different boards with mostly common code is to put all of the board specific stuff in a BSP (Board Support Package) submodule which is #[cfg(...)] flagged for the appropriate board, rather than having to put a bunch of #[cfg(...)] attributes on every definition. so something like this:
#[cfg(target_board = "observer-a")]
mod bsp {
pub(crate) const PSU_ENABLE_L_PORT = ...;
pub(crate) const PSU_ENABLE_L_PINS = ...;
pub(crate) const ALL_PSU_ENABLE_L_PINS = ...;
pub(crate) const PSU_PWR_OK_NOTIF = ...;
pub(crate) const PSU_PMBUS_DEVS = ...;
}
#[cfg(any(target_board = "psc-b", target_board = "psc-c"))]
mod bsp {
pub(crate) const PSU_ENABLE_L_PORT = ...;
pub(crate) const PSU_ENABLE_L_PINS = ...;
pub(crate) const ALL_PSU_ENABLE_L_PINS = ...;
pub(crate) const PSU_PWR_OK_NOTIF = ...;
pub(crate) const PSU_PMBUS_DEVS = ...;
}
use self::bsp::*;this ends up being a little neater most of the time, IMO.
There was a problem hiding this comment.
Another pattern is to have an explicit bsp module, then change its path depending on board, e.g.
#[cfg_attr(
any(target_board = "minibar-a", target_board = "minibar-b"),
path = "bsp/minibar.rs"
)]
#[cfg_attr(
any(target_board = "cosmo-a", target_board = "cosmo-b"),
path = "bsp/cosmo_ab.rs"
)]
mod bsp;There was a problem hiding this comment.
I created a bsp module, but it got a bit messy and might need some tweaks.
- I duplicated the PRESENT and OK pin arrays in both files even though they happen to be the same on both boards. I think that's cleaner than interleaving shared and board-specific arrays, but maybe not.
- The bsp files need to use the PSU_COUNT constant in their array definitions. I had them each define their own constant then I assert that's its equal to 6. I guess I could define a shared PSU_COUNT somewhere else instead.
- The bsp files need to use the codegen'ed
notificationsandi2c_configmodules, so I stuckinclude!(concat!(env!("OUT_DIR"), "/notifications.rs"));etc in both of them. Not sure how weird that is.
There was a problem hiding this comment.
- I duplicated the PRESENT and OK pin arrays in both files even though they happen to be the same on both boards. I think that's cleaner than interleaving shared and board-specific arrays, but maybe not.
I think this is fine; just because it happens to be the same on both boards doesn't mean it needs to be.
- The bsp files need to use the PSU_COUNT constant in their array definitions. I had them each define their own constant then I assert that's its equal to 6. I guess I could define a shared PSU_COUNT somewhere else instead.
Hmm, I was going to say that we should actually not have to do this and that all the code should basically just work with arrays of any length, but then I remembered about the Slot enum which has six slot numbers. That kinda makes me think that it would be nicer to just define PSU_COUNT in main.rs and have the BSPs use super::PSU_COUNT, rather than asserting that every BSP uses 6, which feels a bit convoltued.
- The bsp files need to use the codegen'ed
notificationsandi2c_configmodules, so I stuckinclude!(concat!(env!("OUT_DIR"), "/notifications.rs"));etc in both of them. Not sure how weird that is.
This seems fine to me. You could also leave them in main.rs and have the BSPs use super::i2c_config and super::notifications, but since (at a glance) they are now basically just referenced in the BSP modules, making them belong to the BSP is also totally fine.
There was a problem hiding this comment.
Ok I moved PSU_COUNT to main. I've tended to think of use super:: as a code-smell but I don't think that's actually true, and it's definitely an improvement in this situation.
There was a problem hiding this comment.
I moved the codegen out too because main was using bsp::notifications::TIMER_MASK.
hawkw
left a comment
There was a problem hiding this comment.
One last thing. After that, I think this will be all good!
| // Currently the psc-seq-server only supports boards with 6 PSU slots. | ||
| const _: () = assert!(bsp::PSU_COUNT == 6); |
There was a problem hiding this comment.
As I mentioned, I think I'd really prefer we just define PSU_COUNT in main and make all the BSPs reference it as super::PSU_COUNT. Making the BSP define the constant and then asserting that it defined the right thing feels a bit goofy.
(also, the place where we initialize a [Slot; bsp::PSU_COUNT] to a 6-element array literal also serves as a slightly more implicit version of this assertion...but again, I think we should just not be asserting it)
hawkw
left a comment
There was a problem hiding this comment.
looks good to me! thank you for all your hard work on this!
This PR implements oxidecomputer/pmbus#28. It:
psc-seq-serverandobserver-seq-serverback into one task, since the differences can be managed by a fewcfgs. (I considered using anMwocp6Xtrait instead to make it generic over PSU models, but that's hard because of the methods that return device-specificCommandDatatypes).framulatortask to dev builds, to test that the FRAM works.Known issues / remaining todos:
Testing: