Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Device inventory over IPCC #1469

Merged
merged 50 commits into from
Aug 4, 2023
Merged

Device inventory over IPCC #1469

merged 50 commits into from
Aug 4, 2023

Conversation

mkeeter
Copy link
Collaborator

@mkeeter mkeeter commented Jul 27, 2023

This PR adds device inventory to the IPCC implementation in host-sp-comms

See https://github.com/oxidecomputer/rfd/pull/629 and https://github.com/oxidecomputer/rfd/pull/628 for the relevant RFD changes.

It includes most of the requested devices from #1264, with a few limitations:

  • RoT info is not yet plumbed through
  • A few devices are not yet implemented
  • Inventory data only includes device identity, and does not include SensorId's. Because the reply is extensible, we can add them in future updates.

There's one infrastructure change: we split the local-vpd driver into two pieces:

  • local-vpd continues to support reading the local VPD EEPROM, but it now calls an implementation in...
  • oxide-vpd, which supports reading any VPD EEPROM (in the Oxide format)

Copy link
Contributor

@andrewjstone andrewjstone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good! It will be a boon to get this in. I know it will make @rmustacc happy.

One note is that a bunch of function names in header comments need changing to the new names. I didn't bother tagging them all.

/// ```
/// (where `TAG*` are example tags)
///
/// `read_config` should be called with a tag nested under `FRU0` (e.g. `TAG1`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this comment was copied a few times, but the name of the function (read_config) was not updated.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, done in baab9d5

let mut out = V::new_zeroed();
let n = read_config_nested_from_into(eeprom, tags, out.as_bytes_mut())?;

// `read_config_into()` fails if the data is too large for `out`, but will
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

function name needs updating

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also done in baab9d5

e => e,
})?;

// Iterate over our tag list, finding inner chunks
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I"m a little confused here. Won't this loop only end up storing the last fount chunk in chunk and overwriting the previously found matching chunks?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's intentional. We're looking for a single chunk, with a nested path, e.g.

[
    ("FOO", [ [...] ]),
    ("TAG1", [
        ("TAG2", [ [...] ]),
    ])
]

If we're looking for ["TAG1", "TAG2"], then we'll first select the chunk

    ("TAG1", [
        ("TAG2", [ [...] ]),
    ])

and then

        ("TAG2", [ [...] ]),

on repeated iterations through the loop.

#[derive(
Debug, Clone, Copy, PartialEq, Eq, Deserialize, Serialize, SerializedSize,
)]
pub enum InventoryDataResult {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, why not make this a Result<(), InventoryDataError>? MSG size?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a fair question; it's for ease of serialization / deserialization.

See RFD 316 for details, but in short, having this be a single enum that includes an Ok variant makes it constant size, instead of being either [0] (for Ok(())) or [1, i] (for Err(..)).

) where
F: FnOnce() -> Result<InventoryData, InventoryDataResult>,
{
let mut name_array = [0u8; 32];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this just a memcpy?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kinda, it's a memcpy of the shorter of the two lengths. Simplified in 7316fe0

dev: usize,
out: LenLimit<Leased<idol_runtime::W, [u8]>, 512>,
) -> Result<(), RequestError<Infallible>> {
if out.len() != spd::MAX_SIZE {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could out.len() be safely greater than spd::MAX_SIZE ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be technically safe, but would suggest an error on the part of the callers; it would be weird to copy data into a subsection of the lease.

Comment on lines 208 to 209
max-sizes = {flash = 65536, ram = 32768}
stacksize = 6000
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a pretty big jump. Do you know where the extra size is coming from?

Copy link
Collaborator Author

@mkeeter mkeeter Aug 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I had the same reaction. I think it's that the InventoryData type is ~512 bytes (since it holds a full DIMM SPD), and there are a few stack frames that aren't being inlined since we're using lambda functions here.

I knocked about 2K off it by allocating the InventoryData once in the outer frame, then using references to it in the lambda functions. It currently stands at

situation stack depth
baseline 864
reading DIMM 1672
reading from packrat 1776
reading BRM491 1768
reading AT24 barcode 2712 (!)

I'm mystified as to why reading the AT24 barcode adds a whole 1K of stack depth. I'm going to look at read_at24csw080_id a little more and see if I can figure it out, but more eyes are welcome here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha: read_at24csw080 wasn't being inlined, so we had one stack frame in perform_inventory_lookup and another in read_at24csw080_id, costing us an extra 512 bytes. Explicitly allocating the data in the caller and passing it by reference brings us down to 2200, which seems reasonable.

Copy link
Contributor

@rmustacc rmustacc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few questions on some of the types as I've stated to plumb up specifics around them.

Comment on lines 242 to 245
Max5970 {
voltage: [SensorId; 2],
current: [SensorId; 2],
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd help for me to talk through how we're planning to use these a bit. If the answer is that's just future work that's fine and I won't rig them up. I think the main question is are we going to basically assign semantic meaning to a given index in the structure and that the various properties of the actual sensor, like the units, etc. are going to be fixed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My intention was to punt that to a future PR; this is a unused leftover variant that I should probably delete before merging!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// Fan subassembly identity
FanIdentity {
identity: oxide_barcode::VpdIdentity,
fans: [oxide_barcode::VpdIdentity; 4],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does the max of 4 entries come from? Right now we only have up to 3 fans in the tray assembly. I don't think we'd reuse this for Sidecar exactly unless we had an unsigned int of some kind that said how many entries were valid because we'd only have a single one that'd be valid since each fan assembly is disjoint there.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is based on the VPD data that you gave me in chat, which has four barcodes under SASY:

[
    ("FRU0", [
        ("BARC", [
            "0XV1:9910000084:002:BRM42220085",
        ]),
        ("SASY", [
            ("BARC", [
                "0XV1:9130000022:001:BRM39220046",
            ]),
            ("BARC", [
                "0XV1:9910000094:001:BRM42220005",
            ]),
            ("BARC", [
                "0XV1:9910000094:001:BRM42220091",
            ]),
            ("BARC", [
                "0XV1:9910000094:001:BRM42220072",
            ]),
        ]),
    ]),
]

Looking more closely, I'm guessing three of them are fans (9910000094), and one is something else (9130000022)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, 913-22 refers to the vpd board's identity.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I made changes in 52d463c to include identity, vpd_identity, and 3 items in the fans array.

/// STM32H7 UID information
Stm32H7 {
/// 96-bit unique identifier
uid: [u32; 3],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is a 96-bit ID, but we're getting it as 3 32-bit little endian integers. Do we know how ST would re-encode the ID to get the 96-bit ID they expect? Mostly I've just been bit by things say this is 12 byte array versus as u32s and endianness.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the chip, the UID is stored in three adjacent 32-bit registers (since that's our native word size), so this is a direct translation. The registers / array items represent UID[31:0], UID[63:32], and UID[95:64] respectively; maybe I should put that in a comment?

DimmSpd([u8; 512]),

/// Device or board identity, typically stored in a VPD EEPROM
VpdIdentity(oxide_barcode::VpdIdentity),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make sure we're on the same page here, if for some reason the sizing of the VPD Identity were ever to change beyond the 11/11 characters for the MPN/SN, we would introduce a newer type here. This does make me wonder if we should be passing something here where we use either a larger size than the others and zero-terminate this or have an explicit integer encoding lengths so we're ok to change this in the future.

In particular, the existing MODEL_LEN and SERIAL_LEN which are used by the Identity structure are 51 characters (50 + \0), so given that it is already future proofed, I wonder if this should be consistent or not.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's correct that moving away from oxide_barcode::VpdIdentity would require making a new message type for InventoryData::VpdIdentity and InventoryData::FanIdentity.

Do you think it would make sense to use host_sp_messages::Identity (which stores the 51-byte values) instead here? It's hard to imagine that encoding becoming defunct.

Copy link
Contributor

@andrewjstone andrewjstone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fixes @mkeeter

@mkeeter mkeeter marked this pull request as ready for review August 3, 2023 19:10
Copy link
Contributor

@jgallagher jgallagher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! My only concern is the get_inventory idol OP - rest of my comments are minor / nitpicks.

/// `read_config` should be called with a tag nested under `FRU0` (e.g. `TAG1`
/// in the example above). It will copy the raw byte array (shown as
/// `[...]`) into `out`, returning the number of bytes written.
/// Calls into [`read_config_from_into`]; see details in that docstring
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit - I don't think this docstring link will work without an explicit module:

Suggested change
/// Calls into [`read_config_from_into`]; see details in that docstring
/// Calls into [`drv_oxide_vpd::read_config_from_into`]; see details in that docstring

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 40bb753

@@ -175,6 +186,181 @@ pub enum KeyLookupResult {
MaxResponseLenTooShort,
}

/// Results for an inventory data request
///
/// These **cannot be reordered**; the host and SP must agree on them.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe worth adding a unit test for the serialized values? E.g., host_to_sp_command_values() in this file.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 6cf4d21


/// Data payload for an inventory data request
///
/// These **cannot be reordered**; the host and SP must agree on them. New
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto adding a unit test for the serialized values, with apologies because constructing values for each variant on this one looks pretty annoying.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 6cf4d21 (and yes, it was annoying!)

let data = InventoryData::Stm32H7 {
uid,
dbgmcu_rev_id: 0, // TODO
dbgmcu_dev_id: 0, // TODO
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just double-checking - did you want to address these TODOs before merging?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, done in 84a157c

Comment on lines 884 to 887
hubpack::serialize(buf, &REPLY).unwrap_lite()
},
);
core::mem::size_of_val(&REPLY)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is correct, but looks a little weird at first glance: we're using hubpack to serialize, but then returning mem::size_of_val for the number of bytes serialized. Those are definitely the same, but could we add an assertion on the value returned by hubpack::serialize() to document that?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I switched to matching existing behavior in f5f02fc

let mut data = InventoryData::At24csw08xSerial([0u8; 16]);
self.read_at24csw080_id(sequence, &name, f, &mut data)
}
// TODO: Sharkfin HSC?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"do you want to address this before merging" check

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not in this PR, no; the HSCs will be relevant for the future PR that adds SensorId to all of our types

let data = InventoryData::Stm32H7 {
uid,
dbgmcu_rev_id: 0, // TODO
dbgmcu_dev_id: 0, // TODO
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"do you want to address this before merging" check

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 84a157c

/// Look up a device in our inventory, by index
///
/// Indexes are assigned arbitrarily and may change freely with SP
/// revisions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We currently require a host reboot any time we're updating the SP, so this seems totally fine. However, if we ever want to support updating the SP without rebooting the host, it could be problematic: if the host caches the set of device indices and uses them to (e.g.) query sensors at runtime, and the SP is updates and reorders the indices, that host cache will now be out of date. Chatted about this briefly - to support that, we probably want to add something that will allow the SP to tell if a host's request for sensor data matches its current sensor list, and return a "your cache is out of date / I've been updated" error if it does not. (To be clear, this is a problem for future us: "updating the SP requires rebooting the host" is fine!)

let InventoryData::At24csw08xSerial(id) = data
else { unreachable!(); };
for (i, b) in id.iter_mut().enumerate() {
// TODO: add an API to make this a single IPC call?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"do you want to address this before merging" check

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I think it's fine as-is

ResponseCode::NoDevice,
)) =>
{
// TODO: ringbuf logging here?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"do you want to address this before merging" check

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah, deleted the comment

@mkeeter mkeeter enabled auto-merge (squash) August 4, 2023 20:28
@mkeeter mkeeter merged commit b7123a9 into master Aug 4, 2023
66 checks passed
@mkeeter mkeeter deleted the ipcc-device-id branch August 4, 2023 20:40
@mkeeter mkeeter mentioned this pull request Aug 30, 2023
mkeeter added a commit that referenced this pull request Sep 12, 2023
This PR extends the inventory IPCC messages (#1469) with trailing `SensorId`
values, so that the host can get sensor data over MGS (#1505).

To simplify and deduplicate the code, I modified the I2C code-gen in two ways.

The first change is minor: I added devices and sensors by refdes (if present),
e.g. `fn at24csw080_u615(task: TaskId) -> I2cDevice` and
`const TMP451_U491_TEMPERATURE_SENSOR: SensorId`.

The second change is a bit larger: I added a per-device-type `struct
Sensors_{device}` which encodes sensors on a per-device basis, i.e.

```rust
#[allow(non_camel_case_types)]
pub struct Sensors_tps546b24a {
    pub temperature: SensorId,
    pub current: SensorId,
    pub voltage: SensorId,
}
#[allow(dead_code)]
pub const TPS546B24A_V3P3_SP_A2_SENSORS: Sensors_tps546b24a = Sensors_tps546b24a {
    temperature: SensorId(54),
    voltage: SensorId(56),
    current: SensorId(55),
};
#[allow(dead_code)]
pub const TPS546B24A_U522_SENSORS: Sensors_tps546b24a = Sensors_tps546b24a {
    voltage: SensorId(56),
    current: SensorId(55),
    temperature: SensorId(54),
};
```
(note that we produce `struct`'s by name and refdes, containing equivalent data)

Being able to get the device and sensors by refdes makes it simpler to load
inventory data, since we already need the refdes as our canonical key; I wrote a
small macro (`by_refdes`) to simplify this further.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants