-
Notifications
You must be signed in to change notification settings - Fork 172
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Device inventory over IPCC #1469
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good! It will be a boon to get this in. I know it will make @rmustacc happy.
One note is that a bunch of function names in header comments need changing to the new names. I didn't bother tagging them all.
drv/oxide-vpd/src/lib.rs
Outdated
/// ``` | ||
/// (where `TAG*` are example tags) | ||
/// | ||
/// `read_config` should be called with a tag nested under `FRU0` (e.g. `TAG1` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this comment was copied a few times, but the name of the function (read_config
) was not updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, done in baab9d5
drv/oxide-vpd/src/lib.rs
Outdated
let mut out = V::new_zeroed(); | ||
let n = read_config_nested_from_into(eeprom, tags, out.as_bytes_mut())?; | ||
|
||
// `read_config_into()` fails if the data is too large for `out`, but will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
function name needs updating
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also done in baab9d5
e => e, | ||
})?; | ||
|
||
// Iterate over our tag list, finding inner chunks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I"m a little confused here. Won't this loop only end up storing the last fount chunk in chunk
and overwriting the previously found matching chunks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's intentional. We're looking for a single chunk, with a nested path, e.g.
[
("FOO", [ [...] ]),
("TAG1", [
("TAG2", [ [...] ]),
])
]
If we're looking for ["TAG1", "TAG2"]
, then we'll first select the chunk
("TAG1", [
("TAG2", [ [...] ]),
])
and then
("TAG2", [ [...] ]),
on repeated iterations through the loop.
#[derive( | ||
Debug, Clone, Copy, PartialEq, Eq, Deserialize, Serialize, SerializedSize, | ||
)] | ||
pub enum InventoryDataResult { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, why not make this a Result<(), InventoryDataError>
? MSG size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a fair question; it's for ease of serialization / deserialization.
See RFD 316 for details, but in short, having this be a single enum
that includes an Ok
variant makes it constant size, instead of being either [0]
(for Ok(())
) or [1, i]
(for Err(..)
).
) where | ||
F: FnOnce() -> Result<InventoryData, InventoryDataResult>, | ||
{ | ||
let mut name_array = [0u8; 32]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this just a memcpy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kinda, it's a memcpy
of the shorter of the two lengths. Simplified in 7316fe0
dev: usize, | ||
out: LenLimit<Leased<idol_runtime::W, [u8]>, 512>, | ||
) -> Result<(), RequestError<Infallible>> { | ||
if out.len() != spd::MAX_SIZE { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could out.len()
be safely greater than spd::MAX_SIZE
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be technically safe, but would suggest an error on the part of the callers; it would be weird to copy data into a subsection of the lease.
app/gimlet/rev-d.toml
Outdated
max-sizes = {flash = 65536, ram = 32768} | ||
stacksize = 6000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a pretty big jump. Do you know where the extra size is coming from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I had the same reaction. I think it's that the InventoryData
type is ~512 bytes (since it holds a full DIMM SPD), and there are a few stack frames that aren't being inlined since we're using lambda functions here.
I knocked about 2K off it by allocating the InventoryData
once in the outer frame, then using references to it in the lambda functions. It currently stands at
situation | stack depth |
---|---|
baseline | 864 |
reading DIMM | 1672 |
reading from packrat | 1776 |
reading BRM491 | 1768 |
reading AT24 barcode | 2712 (!) |
I'm mystified as to why reading the AT24 barcode adds a whole 1K of stack depth. I'm going to look at read_at24csw080_id
a little more and see if I can figure it out, but more eyes are welcome here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha: read_at24csw080
wasn't being inlined, so we had one stack frame in perform_inventory_lookup
and another in read_at24csw080_id
, costing us an extra 512 bytes. Explicitly allocating the data in the caller and passing it by reference brings us down to 2200, which seems reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few questions on some of the types as I've stated to plumb up specifics around them.
lib/host-sp-messages/src/lib.rs
Outdated
Max5970 { | ||
voltage: [SensorId; 2], | ||
current: [SensorId; 2], | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd help for me to talk through how we're planning to use these a bit. If the answer is that's just future work that's fine and I won't rig them up. I think the main question is are we going to basically assign semantic meaning to a given index in the structure and that the various properties of the actual sensor, like the units, etc. are going to be fixed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My intention was to punt that to a future PR; this is a unused leftover variant that I should probably delete before merging!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lib/host-sp-messages/src/lib.rs
Outdated
/// Fan subassembly identity | ||
FanIdentity { | ||
identity: oxide_barcode::VpdIdentity, | ||
fans: [oxide_barcode::VpdIdentity; 4], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does the max of 4 entries come from? Right now we only have up to 3 fans in the tray assembly. I don't think we'd reuse this for Sidecar exactly unless we had an unsigned int of some kind that said how many entries were valid because we'd only have a single one that'd be valid since each fan assembly is disjoint there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is based on the VPD data that you gave me in chat, which has four barcodes under SASY
:
[
("FRU0", [
("BARC", [
"0XV1:9910000084:002:BRM42220085",
]),
("SASY", [
("BARC", [
"0XV1:9130000022:001:BRM39220046",
]),
("BARC", [
"0XV1:9910000094:001:BRM42220005",
]),
("BARC", [
"0XV1:9910000094:001:BRM42220091",
]),
("BARC", [
"0XV1:9910000094:001:BRM42220072",
]),
]),
]),
]
Looking more closely, I'm guessing three of them are fans (9910000094), and one is something else (9130000022)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, 913-22 refers to the vpd board's identity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I made changes in 52d463c to include identity
, vpd_identity
, and 3 items in the fans
array.
/// STM32H7 UID information | ||
Stm32H7 { | ||
/// 96-bit unique identifier | ||
uid: [u32; 3], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is a 96-bit ID, but we're getting it as 3 32-bit little endian integers. Do we know how ST would re-encode the ID to get the 96-bit ID they expect? Mostly I've just been bit by things say this is 12 byte array versus as u32s and endianness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the chip, the UID is stored in three adjacent 32-bit registers (since that's our native word size), so this is a direct translation. The registers / array items represent UID[31:0]
, UID[63:32]
, and UID[95:64]
respectively; maybe I should put that in a comment?
lib/host-sp-messages/src/lib.rs
Outdated
DimmSpd([u8; 512]), | ||
|
||
/// Device or board identity, typically stored in a VPD EEPROM | ||
VpdIdentity(oxide_barcode::VpdIdentity), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to make sure we're on the same page here, if for some reason the sizing of the VPD Identity were ever to change beyond the 11/11 characters for the MPN/SN, we would introduce a newer type here. This does make me wonder if we should be passing something here where we use either a larger size than the others and zero-terminate this or have an explicit integer encoding lengths so we're ok to change this in the future.
In particular, the existing MODEL_LEN
and SERIAL_LEN
which are used by the Identity
structure are 51 characters (50 + \0
), so given that it is already future proofed, I wonder if this should be consistent or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's correct that moving away from oxide_barcode::VpdIdentity
would require making a new message type for InventoryData::VpdIdentity
and InventoryData::FanIdentity
.
Do you think it would make sense to use host_sp_messages::Identity
(which stores the 51-byte values) instead here? It's hard to imagine that encoding becoming defunct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fixes @mkeeter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! My only concern is the get_inventory
idol OP - rest of my comments are minor / nitpicks.
drv/local-vpd/src/lib.rs
Outdated
/// `read_config` should be called with a tag nested under `FRU0` (e.g. `TAG1` | ||
/// in the example above). It will copy the raw byte array (shown as | ||
/// `[...]`) into `out`, returning the number of bytes written. | ||
/// Calls into [`read_config_from_into`]; see details in that docstring |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit - I don't think this docstring link will work without an explicit module:
/// Calls into [`read_config_from_into`]; see details in that docstring | |
/// Calls into [`drv_oxide_vpd::read_config_from_into`]; see details in that docstring |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 40bb753
@@ -175,6 +186,181 @@ pub enum KeyLookupResult { | |||
MaxResponseLenTooShort, | |||
} | |||
|
|||
/// Results for an inventory data request | |||
/// | |||
/// These **cannot be reordered**; the host and SP must agree on them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe worth adding a unit test for the serialized values? E.g., host_to_sp_command_values()
in this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 6cf4d21
|
||
/// Data payload for an inventory data request | ||
/// | ||
/// These **cannot be reordered**; the host and SP must agree on them. New |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto adding a unit test for the serialized values, with apologies because constructing values for each variant on this one looks pretty annoying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 6cf4d21 (and yes, it was annoying!)
let data = InventoryData::Stm32H7 { | ||
uid, | ||
dbgmcu_rev_id: 0, // TODO | ||
dbgmcu_dev_id: 0, // TODO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just double-checking - did you want to address these TODO
s before merging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, done in 84a157c
task/host-sp-comms/src/main.rs
Outdated
hubpack::serialize(buf, &REPLY).unwrap_lite() | ||
}, | ||
); | ||
core::mem::size_of_val(&REPLY) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is correct, but looks a little weird at first glance: we're using hubpack to serialize, but then returning mem::size_of_val
for the number of bytes serialized. Those are definitely the same, but could we add an assertion on the value returned by hubpack::serialize()
to document that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I switched to matching existing behavior in f5f02fc
let mut data = InventoryData::At24csw08xSerial([0u8; 16]); | ||
self.read_at24csw080_id(sequence, &name, f, &mut data) | ||
} | ||
// TODO: Sharkfin HSC? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"do you want to address this before merging" check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not in this PR, no; the HSCs will be relevant for the future PR that adds SensorId
to all of our types
let data = InventoryData::Stm32H7 { | ||
uid, | ||
dbgmcu_rev_id: 0, // TODO | ||
dbgmcu_dev_id: 0, // TODO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"do you want to address this before merging" check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 84a157c
/// Look up a device in our inventory, by index | ||
/// | ||
/// Indexes are assigned arbitrarily and may change freely with SP | ||
/// revisions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We currently require a host reboot any time we're updating the SP, so this seems totally fine. However, if we ever want to support updating the SP without rebooting the host, it could be problematic: if the host caches the set of device indices and uses them to (e.g.) query sensors at runtime, and the SP is updates and reorders the indices, that host cache will now be out of date. Chatted about this briefly - to support that, we probably want to add something that will allow the SP to tell if a host's request for sensor data matches its current sensor list, and return a "your cache is out of date / I've been updated" error if it does not. (To be clear, this is a problem for future us: "updating the SP requires rebooting the host" is fine!)
let InventoryData::At24csw08xSerial(id) = data | ||
else { unreachable!(); }; | ||
for (i, b) in id.iter_mut().enumerate() { | ||
// TODO: add an API to make this a single IPC call? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"do you want to address this before merging" check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I think it's fine as-is
ResponseCode::NoDevice, | ||
)) => | ||
{ | ||
// TODO: ringbuf logging here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"do you want to address this before merging" check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah, deleted the comment
This PR extends the inventory IPCC messages (#1469) with trailing `SensorId` values, so that the host can get sensor data over MGS (#1505). To simplify and deduplicate the code, I modified the I2C code-gen in two ways. The first change is minor: I added devices and sensors by refdes (if present), e.g. `fn at24csw080_u615(task: TaskId) -> I2cDevice` and `const TMP451_U491_TEMPERATURE_SENSOR: SensorId`. The second change is a bit larger: I added a per-device-type `struct Sensors_{device}` which encodes sensors on a per-device basis, i.e. ```rust #[allow(non_camel_case_types)] pub struct Sensors_tps546b24a { pub temperature: SensorId, pub current: SensorId, pub voltage: SensorId, } #[allow(dead_code)] pub const TPS546B24A_V3P3_SP_A2_SENSORS: Sensors_tps546b24a = Sensors_tps546b24a { temperature: SensorId(54), voltage: SensorId(56), current: SensorId(55), }; #[allow(dead_code)] pub const TPS546B24A_U522_SENSORS: Sensors_tps546b24a = Sensors_tps546b24a { voltage: SensorId(56), current: SensorId(55), temperature: SensorId(54), }; ``` (note that we produce `struct`'s by name and refdes, containing equivalent data) Being able to get the device and sensors by refdes makes it simpler to load inventory data, since we already need the refdes as our canonical key; I wrote a small macro (`by_refdes`) to simplify this further.
This PR adds device inventory to the IPCC implementation in
host-sp-comms
See https://github.com/oxidecomputer/rfd/pull/629 and https://github.com/oxidecomputer/rfd/pull/628 for the relevant RFD changes.
It includes most of the requested devices from #1264, with a few limitations:
SensorId
's. Because the reply is extensible, we can add them in future updates.There's one infrastructure change: we split the
local-vpd
driver into two pieces:local-vpd
continues to support reading the local VPD EEPROM, but it now calls an implementation in...oxide-vpd
, which supports reading any VPD EEPROM (in the Oxide format)