Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

I2C: Fix broken timeout handler (!) #212

Open
omeh-a opened this issue Aug 19, 2024 · 8 comments · May be fixed by #241
Open

I2C: Fix broken timeout handler (!) #212

omeh-a opened this issue Aug 19, 2024 · 8 comments · May be fixed by #241
Assignees
Labels
bug Something isn't working drivers Issues pertaining to driver code for a device class meson Devices on a Meson platform - e.g. ODROID C4 protocol Issues pertaining to protocols for interfacing with drivers

Comments

@omeh-a
Copy link
Member

omeh-a commented Aug 19, 2024

Currently, the timeout handler in the Meson I2C driver is unable to function properly. It was firing constantly when operating demo code and has been disabled as a result. This is a serious problem however - in the event that a device ACKs a transaction and stops responding, the entire I2C driver will stall forever. This compromises the trustworthiness of the entire driver.

This issue proposes fixing this handler such that:
a. It is able to function properly for all our demos - inc. PN532
b. It doesn't spuriously fire.

This issue might be related to the current hardcoded clock dividers used for I2C. If this is the case, this issue actually requires a clock driver to be implemented in order for correct clock setting.

@omeh-a omeh-a added bug Something isn't working drivers Issues pertaining to driver code for a device class meson Devices on a Meson platform - e.g. ODROID C4 protocol Issues pertaining to protocols for interfacing with drivers labels Aug 19, 2024
@Ivan-Velickovic
Copy link
Collaborator

Duplicate of #138.

@Ivan-Velickovic
Copy link
Collaborator

Although not sure when we'll get to this one so probably better to have it as a separate issue.

@omeh-a omeh-a self-assigned this Aug 29, 2024
@omeh-a
Copy link
Member Author

omeh-a commented Aug 29, 2024

@Ivan-Velickovic Have we got a preferred way of handling real errors? As of right now, the only mechanism we have for raising the alarm to the virtualiser is to notify and enqueue a request. We ought to have a discussion about how we want to deal with issues arising in terms of handling.

As a general case though, I think it might be smartest to have a retry system which will drop the request once exhausted. E.g. retry 5 times, drop request and return to virt with error if no successful ops between those retries.

@Ivan-Velickovic
Copy link
Collaborator

I'm not sure what you mean, an error from which side? The client or the driver?

@Ivan-Velickovic
Copy link
Collaborator

For real timeout errors, which I think is what you're referring to, we should just forward the error to the client and leave it up to them to decide what kind of retry policy they want. In reality they'll be some default policy in the client-level library.

We should avoid any policy in the driver whenever possible.

@omeh-a
Copy link
Member Author

omeh-a commented Sep 6, 2024

I meant on the driver side, but yes I think it's reasonable to just try and leave it in a state where the client can manage the rest.

In other news, upon closer inspection this issue is actually much worse than I anticipated. Using our PN532 example shows that the IRQs fire continuously whenever the driver is not operating, giving us >1000 timeouts between every PN532
read attempt. I know this is a PN532 specific thing, but pragmatically there really is nothing we can do for it other than neglect the timeouts because they are all simply artefacts of the NACKS from issue #235. If we do anything like discarding the request upon a timeout, the PN532 simply cannot operate since all the frames transmitted are too small for a single R/W buffer on the I2C hardware.

By constrast, there are never timeouts for the DS3231 example. Bearing this in mind, would it be inappropriate to consider adding a no-timeout flag to requests? This way we can handle timeouts for all other devices that aren't borked and don't need to reschedule work on the PN532 immediately.

Perhaps this is the best way to handle this for now, because having a timeout handler is needed for any kind of prospective complex application but any timeout handler of any type would de-facto break the PN532 which is needed immediately.

Alternatively we can just neglect this until #235 is resolved. What do you think? Pragmatically the only actual problem here is the PN532, writing the timeout is trivial otherwise unless there is some other problem we don't know about. @wom-bat @Ivan-Velickovic

@Ivan-Velickovic
Copy link
Collaborator

You mentioned that the PN532 is using I2C incorrectly? Can you describe why? I find it surprising since it seems like a fairly popular device.

@omeh-a
Copy link
Member Author

omeh-a commented Sep 13, 2024

You mentioned that the PN532 is using I2C incorrectly? Can you describe why? I find it surprising since it seems like a fairly popular device.

See #235 - it creates a constant stream of NACK conditions that seem to be related to the I2C part of the PN532's microcontroller not properly halting the bus when it's busy doing NFC operations. It happens both with our driver and off-the-shelf ones I have tested with Arduino

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working drivers Issues pertaining to driver code for a device class meson Devices on a Meson platform - e.g. ODROID C4 protocol Issues pertaining to protocols for interfacing with drivers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants