Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor LPSPI with word packing, futures #157

Merged
merged 1 commit into from
May 27, 2024

Conversation

mciantyre
Copy link
Member

@mciantyre mciantyre commented Apr 30, 2024

Instead of using LPSPI continuous transactions, we pack the user's data
into u32 words. The primitives for translating the user's data to / from
the data FIFOs should help us as we consider async LPSPI drivers. And,
in order to implement embedded-hal 1.0 traits, we'll need something like
the dummy transmit / receive helpers, since we need to handle differing
transmit / receive buffer sizes. The included unit tests don't trigger
an error in Miri, and they try to simulate how we'd use the primitives
in firmware. (This is an "it's not obviously wrong" test, not an "it's
correct" test; help me review here.)

The commit introduces spinning futures into the LPSPI driver. By
combining and spinning on these futures, we can realize in-place
transfers, read-only transactions, write-only transactions, etc.

We no longer return the Busy error; we'll wait for transmit FIFO space.
We also never return the NoData error, instead returning success when
there's no I/O to do. Since this commit is a non-breaking change, the
two errors are still available in the error enum. I'll remove them
later.

I'm moving the blocking SPI example into RTIC and rewriting the driver
test. The tests demonstrate overlapping writes, writes with flushes, and
in-place transfers with a physical loopback. There's also tests that
show how word sizes and bit orders interact. I'd appreciate if folks
could test these changes in their system, since it affects how the
embedded-hal implementations behave. I'm only testing this commit on a
1170EVK with the new example.

Copy link
Contributor

@Florob Florob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested this on my custom i.MX-RT 1020 board. It breaks the SPI display I'm using, which used to work (with a small workaround for the BUSY issue). Unfortunately I can't sniff that bus. I've bisected the issue to the last commit, introducing word packing and async functions.
I'll try to see if I can also see issues with the EVK where I can look at what's being transmitted.
Generally I'm not sure how to feel about using async. Looking at the generated code I have to say it optimizes much better than I would have expected. However, there is still some code size and runtime overhead.

src/lib.rs Outdated Show resolved Hide resolved
@mciantyre
Copy link
Member Author

mciantyre commented May 20, 2024

Thanks for the feedback. One thing I noticed about this change: without a spin on the busy flag, this driver reduces the delay between transfers, or the time that chip select is deasserted between subsequent transactions. Given a 1MHz CLK, I see a reduction from 1430ns to 530ns for that delay.

If your workaround includes a spin on the busy flag, if you're bursting write(...)s, and if you were previous at your SPI display's minimum delay between transfers, then this could affect you.


There's some refactoring we could do to reclaim code space. I pushed a commit to show those changes. I agree that it's still not optimal; in particular, defining transfer with a try_join combinator doesn't get close to what might be ideal.

It's convenient (and fun) to define I/O like this, but I'm happy to drop it if folks need that space. mciantyre/imxrt-hal@ab053df isn't applied here, but it drops the futures from this branch.

@Florob
Copy link
Contributor

Florob commented May 20, 2024

Thanks for the feedback. One thing I noticed about this change: without a spin on the busy flag, this driver reduces the delay between transfers, or the time that chip select is deasserted between subsequent transactions. Given a 1MHz CLK, I see a reduction from 1430ns to 530ns for that delay.

That's interesting. I had tested the EVK yesterday and thought I saw the opposite result. So I re-tested and it turns out that inter-transfer gaps can be really strange at higher clock frequencies (I had 11 MHz). There are sometimes gaps in the order of multiple microseconds. Toggling a GPIO between each write() call indicates that in these cases there is actually more time spend in the write() call. However, that is AFAICT not a regression, and there are interactions with I-cache that might cause this, so it's probably fine.
The implication being that this is absolutely fine on i.MX RT 1021 and at least as far as bus performance goes does not appear to have a negative impact. Rather spinning on FIFO space instead of Busy seems to be an improvement.

If your workaround includes a spin on the busy flag, if you're bursting write(...)s, and if you were previous at your SPI display's minimum delay between transfers, then this could affect you.

Fortunately (because that would have been virtually impossible to fix) it's likely something else. For one the minimum allowed transfer gap is specified as 40ns, but I've also realized something else:
SPI MIPI interfaces typically have a separate D/C (data/command) pin, which is used to indicate what the data on the SPI bus means. This requires the SPI transfers to be blocking, or at least the ability to flush. That does not entirely explain why spinning for Busy on subsequent write() calls worked, but fundamentally brings us back to a variation on the original bug report: Users of e-h 0.2 expect the trait's methods to block. e-h 1.0 allows not blocking for the SpiBus trait, adding flush(). The SpiDevice trait's methods are however still blocking (implicitly flush).

There's some refactoring we could do to reclaim code space. I pushed a commit to show those changes. I agree that it's still not optimal; in particular, defining transfer with a try_join combinator doesn't get close to what might be ideal.

Interesting, that having two spin_on() (one via wait_for_fifo_space()) calls per operation reduces code size. I assume that is partially due to the inliner refusing to inline the wait_for_fifo_space().

It's convenient (and fun) to define I/O like this, but I'm happy to drop it if folks need that space. mciantyre/imxrt-hal@ab053df isn't applied here, but it drops the futures from this branch.

It's probably better to worry about this later. Having the bug fixed and e-h 1.0 support is more important than code size at least to me. If this turns out to be a real problem we can still try to optimize further down the line.

@mciantyre
Copy link
Member Author

SPI MIPI interfaces typically have a separate D/C (data/command) pin, which is used to indicate what the data on the SPI bus means. This requires the SPI transfers to be blocking, or at least the ability to flush. That does not entirely explain why spinning for Busy on subsequent write() calls worked, but fundamentally brings us back to a variation on the original bug report: Users of e-h 0.2 expect the trait's methods to block.

I keep forgetting folks want to synchronize external components with the LPSPI through these eh02 interfaces. The latest commit adds a flush at the end of writes and transfers, striving for approach 1 in #136. I'm also not sure why a busy spin in the subsequent write worked, but a concluding flush should provide better guarantees.

Any other thoughts on these changes? Otherwise, I plan to release this into the 0.5 series for more user testing.

CHANGELOG.md Show resolved Hide resolved
Copy link
Member

@teburd teburd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, and makes sense. I love all the tests here.

I guess if you move to an async API some form of completion notification (interrupt) would be needed to mark a Future<> complete is that right? Is there an interrupt for FIFO empty/FIFO watermark/FIFO full events? I'd guess so? I don't have the ref manual handy or lpspi IP block handy in my mind to answer those, and not really needed here, but were curiosity questions that popped up while reading through this.

@Florob
Copy link
Contributor

Florob commented May 24, 2024

I keep forgetting folks want to synchronize external components with the LPSPI through these eh02 interfaces.

Obviously so do I, even though I'm technically one of these folks…

Any other thoughts on these changes? Otherwise, I plan to release this into the 0.5 series for more user testing.

Sounds and looks good to me.

Instead of using LPSPI continuous transactions, we pack the user's data
into u32 words. The primitives for translating the user's data to / from
the data FIFOs should help us as we consider async LPSPI drivers. And,
in order to implement embedded-hal 1.0 traits, we'll need something like
the dummy transmit / receive helpers, since we need to handle differing
transmit / receive buffer sizes. The included unit tests don't trigger
an error in Miri, and they try to simulate how we'd use the primitives
in firmware. (This is an "it's not obviously wrong" test, not an "it's
correct" test; help me review here.)

The commit introduces spinning futures into the LPSPI driver. By
combining and spinning on these futures, we can realize in-place
transfers, read-only transactions, write-only transactions, etc. These
implementations flush the FIFOs, allowing users to synchronize external
components with LPSPI I/O.

We no longer return the Busy error; we'll wait for transmit FIFO space.
We also never return the NoData error, instead returning success when
there's no I/O to do. Since this commit is a non-breaking change, the
two errors are still available in the error enum. I'll remove them
later.

I'm moving the blocking SPI example into RTIC and rewriting the driver
test. The tests demonstrate overlapping writes, writes with flushes, and
in-place transfers with a physical loopback. There's also tests that
show how word sizes and bit orders interact. I'd appreciate if folks
could test these changes in their system, since it affects how the
embedded-hal implementations behave. I'm only testing this commit on a
1170EVK with the new example.
@mciantyre
Copy link
Member Author

Thanks for the reviews!

I guess if you move to an async API some form of completion notification (interrupt) would be needed to mark a Future<> complete is that right? Is there an interrupt for FIFO empty/FIFO watermark/FIFO full events? I'd guess so?

Yup, the LPSPI supports watermarks and interrupt events for both FIFOs. We could use those to drive async I/O.

@mciantyre mciantyre merged commit f4a369e into imxrt-rs:main May 27, 2024
31 checks passed
@mciantyre mciantyre deleted the lpspi-spin branch May 27, 2024 22:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants