Skip to content

observer: MWOCP67 PMBus Support#2549

Merged
evan-oxide merged 27 commits into
masterfrom
evan/observer-pmbus
Jun 10, 2026
Merged

observer: MWOCP67 PMBus Support#2549
evan-oxide merged 27 commits into
masterfrom
evan/observer-pmbus

Conversation

@evan-oxide

@evan-oxide evan-oxide commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

This PR implements oxidecomputer/pmbus#28. It:

  • Adds support for PMBus communication with the MWOCP67 PSUs.
  • Combines the psc-seq-server and observer-seq-server back into one task, since the differences can be managed by a few cfgs. (I considered using an Mwocp6X trait instead to make it generic over PSU models, but that's hard because of the methods that return device-specific CommandData types).
  • Polls the onboard tmp117 temperature sensor.
  • Adds the framulator task to dev builds, to test that the FRAM works.

Known issues / remaining todos:

  • It's still unclear how many power rails the MWOCP67 actually has, what their voltages are, and what pages are used to access them. It looks suspiciously like it's reporting stats for the same 50V rail on both pages 0 and 1. There's also insufficient documentation about which other registers are paged.
  • The MWOCP67 has 2 new temperature sensors on the positive and negative bus bar clips and they both report ~170°C. This is probably incorrect.
  • I wasn't able to manually control the fan speed. I don't know if we care.
  • I'm not sure what model of EEPROM is built into the PSUs. I'm assuming it's still the M24C02, like the MWOCP68 had, and I was able to read some bytes off it.
  • I skipped implementing the blackbox feature because it looks complicated and we weren't using the (somewhat different) blackbox feature on the PSC.
  • PSU firmware updates aren't supported yet.

Testing:

  • Basic PMBus reads:
    • Can read model number, manufacturer ID, and serial number
    • Can read temperatures 1, 2 and 3
    • Can read both bus bar clip temperatures
    • Can read fan speed
    • Can read status_word
  • humility dashboard shows 5 temps and 1 fan speed per PSU, plus the onboard tmp117
  • humility power shows stats for power rails (though they might be wrong, see above)
  • When someone removes and reinserts a PSU, I see appropriate events in the ringbuf
  • Can read something from the at24csw080 VPD eeprom
  • Can read something from the PSUs' m24c02 eeproms
  • rail name looks right in ereports

@evan-oxide evan-oxide added the Observer PSC + 5.5 kW fun. label Jun 4, 2026
@evan-oxide evan-oxide marked this pull request as ready for review June 4, 2026 18:34
@evan-oxide evan-oxide requested review from hawkw and mkeeter June 4, 2026 19:58
Comment thread app/observer/base.toml
Comment thread app/observer/dev.toml Outdated
Comment thread drv/i2c-devices/src/mwocp67.rs Outdated
Comment thread drv/i2c-devices/src/mwocp6x/mwocp67.rs Outdated
Comment thread drv/i2c-devices/src/mwocp67.rs Outdated
Comment on lines +411 to +448
const REVISION_LEN: usize = 14;

let mut data = [0u8; REVISION_LEN];
let expected = b"XXXX-YYYY-0000";

let len = self
.device
.read_block(CommandCode::MFR_REVISION as u8, &mut data)
.map_err(|code| Error::BadFirmwareRevRead { code })?;

//
// Per ACAN-157, we are expecting this to be of the format:
//
// XXXX-YYYY-0000
//
// Where XXXX is the firmware revision on the primary MCU (AC input
// side) and YYYY is the firmware revision on the secondary MCU (DC
// output side). We aren't going to be rigid about the format of
// either revision, but we will be rigid about the rest of the format.
//
if len != REVISION_LEN {
return Err(Error::BadFirmwareRevLength);
}

for index in 0..len {
if expected[index] == b'X' || expected[index] == b'Y' {
continue;
}

if data[index] != expected[index] {
return Err(Error::BadFirmwareRev { index: index as u8 });
}
}

//
// Return the primary MCU version
//
Ok(FirmwareRev([data[0], data[1], data[2], data[3]]))

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like the same format as the one used by the MWOCP68, right? That observation kinda makes me want to push for pulling out the parsing code into a common "Murata FRU ID" thingy. But, well...I don't think that actually gains us all that much, I just have some aesthetic desire to not duplicate things.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pulled it out into a mwocp6x module along with the shared types. What do you think of the new module structure? It might need to change later if the firmware update process ends up being significantly different, but hopefully not.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unfortunate that we can't share more of the implementations - they look similar but use different types from the pmbus crate.

Comment on lines +490 to +491
// ACAN-157 doesn't specify what page this is on.
// Assume it's on page 0, as it is on the better-documented mwocp68.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, have we tested what happens when we read this (and the other status registers) on page 1? Does it also work, or does the MCU get mad at us?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried reading a bunch of these registers on both pages and always got the same value on both. I don't know if that's because the page is ignored or they happened to be the same on both pages.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on experience with other PMBus devices, I would guess that it's just that the value was the same on both pages; I'd expect that it should be possible to see a fault on one rail and not the other. I think we could probably test this by setting a warning threshold on one page (say VIN_WARN_LIMIT or something) but not the other, and seeing if the warning bit gets set on one page and not the other. I could be wrong about this, but I think it's worth investigating, because the current behavior of both MWOCPx drivers always sending page 0 feels sketchy to me.

I'm fine with merging this as-is and investigating as a follow-up though.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll leave a comment on #2410 about investigating this more

Comment thread drv/psc-seq-server/src/main.rs Outdated
Comment thread drv/psc-seq-server/src/main.rs Outdated
Slot::Psu5 => '5',
};
FixedString::try_from_utf8(&v54_psu[..]).unwrap_lite()
Mwocp6x::rail_name(psu_label)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found the decision to move this into the drivers a bit weird at first once I saw it, but now that I see how it's being used, it's not unreasonable. However, I don't love putting the rail name in the drv-i2c-devices crate, since we could imagine a universe where we have a MWOCP6x PSU on a board where we use different rail names (although admittedly, that's unlikely). Personally, I might have considered just leaving all the code here, and having one of two different "base" strings ("V54_PSUx" or "V50_PSUx") based on the board cfg. I'm not super bothered by the current approach, but I feel like it might be a bit simpler that way. Up to you!

@evan-oxide evan-oxide Jun 4, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I don't love this approach either. On one hand it seems like the Mwocp67 class shouldn't need to know about rail naming conventions, and on the other hand scattering too many cfgs in the psc-seq-server could get confusing. So is this what you're suggesting?

let rail = {
    #[cfg(any(target_board = "psc-b", target_board = "psc-c"))]
    let mut rail_name = *b"V54_PSUx";
    #[cfg(target_board = "observer-a")]
    let mut rail_name = *b"V50_MAIN_PSUx";

    rail_name[rail_name.len() - 1] = match self.slot {
        Slot::Psu0 => b'0',
        ...
    };
    FixedString::try_from_utf8(&rail_name[..]).unwrap_lite()
};

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it's a problem that the rail names are different lengths now but EreportFields is still using FixedString<8>.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we will have to either just make it longer and null pad it on the MWOCP68 version, or make the const size depend on which board we're building. But I do think having all of this right there in that file feels better to me, rather than hiding it in the driver.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I agree that the names should be here in psc-seq-server, rather than in the I2C driver (which should be rail-agnostic).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I should probably test this... any suggestions for how to generate an ereport and see if the rail name looks good?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested! I generated a spurious ereport and read it with humility. The rail name looks good.

Comment on lines +141 to +155
Device::Tmp117 => {
for &s in self.temperature_sensors.iter() {
let m = Tmp117::new(&dev);
match m.read_temperature() {
Ok(v) => sensor_api.post_now(s, v.0),
Err(e) => {
let e = Error::Tmp117Error(e);
ringbuf_entry!(Trace::TemperatureReadFailed(s, e));
sensor_api.nodata_now(s, e.into())
}
}
}
// The tmp117 is just a temperature sensor, not a speed sensor.
assert_eq!(self.speed_sensors.len(), 0);
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I correct that, after this PR merges, if we were to add a temperature sensor to the TMP117 in the PSC config, we would also be able to fix #2547?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah you'd need to update the PSC's base.toml, add an entry to the SENSORS array in sensor-polling, and test that the ComponentDetails part works (not just humility sensors).

Comment thread drv/i2c-devices/src/mwocp6x/mod.rs
@evan-oxide evan-oxide requested a review from hawkw June 8, 2026 19:59
Comment thread task/control-plane-agent/build.rs
Comment thread app/observer/base.toml
Comment thread app/observer/base.toml
Comment thread app/psc/rev-c-dev.toml
Comment thread drv/i2c-devices/src/mwocp6x/mod.rs Outdated
Comment thread drv/i2c-devices/src/mwocp6x/mod.rs
Comment on lines +490 to +491
// ACAN-157 doesn't specify what page this is on.
// Assume it's on page 0, as it is on the better-documented mwocp68.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on experience with other PMBus devices, I would guess that it's just that the value was the same on both pages; I'd expect that it should be possible to see a fault on one rail and not the other. I think we could probably test this by setting a warning threshold on one page (say VIN_WARN_LIMIT or something) but not the other, and seeing if the warning bit gets set on one page and not the other. I could be wrong about this, but I think it's worth investigating, because the current behavior of both MWOCPx drivers always sending page 0 feels sketchy to me.

I'm fine with merging this as-is and investigating as a follow-up though.

Comment thread drv/i2c-devices/src/lib.rs Outdated
Comment thread drv/i2c-devices/Cargo.toml Outdated
Comment thread drv/psc-seq-server/src/main.rs Outdated
Comment on lines +289 to +361
@@ -308,6 +321,7 @@ const ALL_PSU_PWR_OK_PINS: sys_api::PinSet =

// Our notification configuration system doesn't have any concept of arrays, so,
// collect its predefined masks into convenient arrays.
#[cfg(any(target_board = "psc-b", target_board = "psc-c"))]
const PSU_PWR_OK_NOTIF: [u32; PSU_COUNT] = [
notifications::PSU_PWR_OK_1_MASK,
notifications::PSU_PWR_OK_2_MASK,
@@ -316,8 +330,18 @@ const PSU_PWR_OK_NOTIF: [u32; PSU_COUNT] = [
notifications::PSU_PWR_OK_5_MASK,
notifications::PSU_PWR_OK_6_MASK,
];
#[cfg(target_board = "observer-a")]
const PSU_PWR_OK_NOTIF: [u32; PSU_COUNT] = [
notifications::PSU_PWR_OK_0_MASK,
notifications::PSU_PWR_OK_1_MASK,
notifications::PSU_PWR_OK_2_MASK,
notifications::PSU_PWR_OK_3_MASK,
notifications::PSU_PWR_OK_4_MASK,
notifications::PSU_PWR_OK_5_MASK,
];

/// In order to get the PMBus devices by PSU index, we need a little lookup table guy.
#[cfg(any(target_board = "psc-b", target_board = "psc-c"))]
const PSU_PMBUS_DEVS: [fn(TaskId) -> (I2cDevice, u8); PSU_COUNT] = [
i2c_config::pmbus::v54_psu0,
i2c_config::pmbus::v54_psu1,
@@ -326,6 +350,15 @@ const PSU_PMBUS_DEVS: [fn(TaskId) -> (I2cDevice, u8); PSU_COUNT] = [
i2c_config::pmbus::v54_psu4,
i2c_config::pmbus::v54_psu5,
];
#[cfg(target_board = "observer-a")]
const PSU_PMBUS_DEVS: [fn(TaskId) -> (I2cDevice, u8); PSU_COUNT] = [
i2c_config::pmbus::v50_main_psu0,
i2c_config::pmbus::v50_main_psu1,
i2c_config::pmbus::v50_main_psu2,
i2c_config::pmbus::v50_main_psu3,
i2c_config::pmbus::v50_main_psu4,
i2c_config::pmbus::v50_main_psu5,
];

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a huge deal, but a fairly common practice in tasks that support multiple slightly different boards with mostly common code is to put all of the board specific stuff in a BSP (Board Support Package) submodule which is #[cfg(...)] flagged for the appropriate board, rather than having to put a bunch of #[cfg(...)] attributes on every definition. so something like this:

#[cfg(target_board = "observer-a")]
mod bsp {
    pub(crate) const PSU_ENABLE_L_PORT = ...;
    pub(crate) const PSU_ENABLE_L_PINS = ...;
    pub(crate) const ALL_PSU_ENABLE_L_PINS = ...;
    pub(crate) const PSU_PWR_OK_NOTIF = ...;
    pub(crate) const PSU_PMBUS_DEVS = ...;
}
#[cfg(any(target_board = "psc-b", target_board = "psc-c"))]
mod bsp {
    pub(crate) const PSU_ENABLE_L_PORT = ...;
    pub(crate) const PSU_ENABLE_L_PINS = ...;
    pub(crate) const ALL_PSU_ENABLE_L_PINS = ...;
    pub(crate) const PSU_PWR_OK_NOTIF = ...;
    pub(crate) const PSU_PMBUS_DEVS = ...;
}
use self::bsp::*;

this ends up being a little neater most of the time, IMO.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another pattern is to have an explicit bsp module, then change its path depending on board, e.g.

#[cfg_attr(
    any(target_board = "minibar-a", target_board = "minibar-b"),
    path = "bsp/minibar.rs"
)]
#[cfg_attr(
    any(target_board = "cosmo-a", target_board = "cosmo-b"),
    path = "bsp/cosmo_ab.rs"
)]
mod bsp;

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created a bsp module, but it got a bit messy and might need some tweaks.

  1. I duplicated the PRESENT and OK pin arrays in both files even though they happen to be the same on both boards. I think that's cleaner than interleaving shared and board-specific arrays, but maybe not.
  2. The bsp files need to use the PSU_COUNT constant in their array definitions. I had them each define their own constant then I assert that's its equal to 6. I guess I could define a shared PSU_COUNT somewhere else instead.
  3. The bsp files need to use the codegen'ed notifications and i2c_config modules, so I stuck include!(concat!(env!("OUT_DIR"), "/notifications.rs")); etc in both of them. Not sure how weird that is.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I duplicated the PRESENT and OK pin arrays in both files even though they happen to be the same on both boards. I think that's cleaner than interleaving shared and board-specific arrays, but maybe not.

I think this is fine; just because it happens to be the same on both boards doesn't mean it needs to be.

  1. The bsp files need to use the PSU_COUNT constant in their array definitions. I had them each define their own constant then I assert that's its equal to 6. I guess I could define a shared PSU_COUNT somewhere else instead.

Hmm, I was going to say that we should actually not have to do this and that all the code should basically just work with arrays of any length, but then I remembered about the Slot enum which has six slot numbers. That kinda makes me think that it would be nicer to just define PSU_COUNT in main.rs and have the BSPs use super::PSU_COUNT, rather than asserting that every BSP uses 6, which feels a bit convoltued.

  1. The bsp files need to use the codegen'ed notifications and i2c_config modules, so I stuck include!(concat!(env!("OUT_DIR"), "/notifications.rs")); etc in both of them. Not sure how weird that is.

This seems fine to me. You could also leave them in main.rs and have the BSPs use super::i2c_config and super::notifications, but since (at a glance) they are now basically just referenced in the BSP modules, making them belong to the BSP is also totally fine.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I moved PSU_COUNT to main. I've tended to think of use super:: as a code-smell but I don't think that's actually true, and it's definitely an improvement in this situation.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the codegen out too because main was using bsp::notifications::TIMER_MASK.

Comment thread drv/psc-seq-server/src/main.rs Outdated
Comment thread task/sensor-polling/src/main.rs Outdated
@evan-oxide evan-oxide requested a review from hawkw June 10, 2026 18:55
@evan-oxide evan-oxide requested a review from mkeeter June 10, 2026 19:05

@hawkw hawkw left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last thing. After that, I think this will be all good!

Comment thread drv/psc-seq-server/src/main.rs Outdated
Comment on lines +284 to +285
// Currently the psc-seq-server only supports boards with 6 PSU slots.
const _: () = assert!(bsp::PSU_COUNT == 6);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned, I think I'd really prefer we just define PSU_COUNT in main and make all the BSPs reference it as super::PSU_COUNT. Making the BSP define the constant and then asserting that it defined the right thing feels a bit goofy.

(also, the place where we initialize a [Slot; bsp::PSU_COUNT] to a 6-element array literal also serves as a slightly more implicit version of this assertion...but again, I think we should just not be asserting it)

@hawkw hawkw left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me! thank you for all your hard work on this!

@evan-oxide evan-oxide enabled auto-merge (squash) June 10, 2026 20:04
@evan-oxide evan-oxide merged commit 3dc0535 into master Jun 10, 2026
190 checks passed
@evan-oxide evan-oxide deleted the evan/observer-pmbus branch June 10, 2026 20:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Observer PSC + 5.5 kW fun.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants