Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spi transfer16 is not 16 bit #2514

Open
drmcnelson opened this issue Sep 12, 2024 · 21 comments
Open

spi transfer16 is not 16 bit #2514

drmcnelson opened this issue Sep 12, 2024 · 21 comments

Comments

@drmcnelson
Copy link

drmcnelson commented Sep 12, 2024

Describe the bug

SPI transfer16() produces two 8 byte transfers rather than one 16 bit transfer.

To Reproduce
Here is a code that is used for testing SPI transfers. In this instance, the CS pin tells an ADC to convert its input to digital, then 730 nanoseconds later the data is ready for a 16 bit SPI transfer.

With an SPI clock of 50kHz, this should be able to run at close to 1MSPS.

   for (int n = 0; n < NKNTS; n++ ) {

   // For the ADC
DIGITALWRITE(CSPin,HIGH);
DELAYNANOSECONDS(ADC_WAIT_NANOSECONDS);
DIGITALWRITE(CSPin,LOW);

// For the oscilloscope
DIGITALWRITE(SPAREPIN,HIGH);  

    // 16 bit transfer
data[n] = SPI.transfer16(0xFFFF);

// For the oscilloscope       
DIGITALWRITE(SPAREPIN,LOW);

  }

`

And here is the oscilloscope trace. Notice that rather than one 16 bit transfer, we have two 8 bit transfers. And note that they are separated by 400 nsecs, but there is 1200 nsecs before and after.

TEK0001

Here is the implementation, for transfer16() it calls spi_transfer( .... length )

uint16_t SPIClass::transfer16(uint16_t data, bool skipReceive)
{
uint16_t tmp;
if (_spiSettings.bitOrder) {
tmp = ((data & 0xff00) >> 8) | ((data & 0xff) << 8);
data = tmp;
}
spi_transfer(&_spi, (uint8_t *)&data, (!skipReceive) ? (uint8_t *)&data : NULL, sizeof(uint16_t));
if (_spiSettings.bitOrder) {
tmp = ((data & 0xff00) >> 8) | ((data & 0xff) << 8);
data = tmp;
}
return data;
}

And, in utility/spi.c, we find that spi_transfer() simply loops over LL_SPI_TransmitData8( ).

spi_status_e spi_transfer(spi_t *obj, const uint8_t *tx_buffer, uint8_t *rx_buffer,
uint16_t len)
{
spi_status_e ret = SPI_OK;
uint32_t tickstart, size = len;
SPI_TypeDef *_SPI = obj->handle.Instance;
uint8_t *tx_buf = (uint8_t *)tx_buffer;
if (len == 0) {
ret = SPI_ERROR;
} else {
tickstart = HAL_GetTick();
#if defined(SPI_CR2_TSIZE)
/* Start transfer */
LL_SPI_SetTransferSize(_SPI, size);
LL_SPI_Enable(_SPI);
LL_SPI_StartMasterTransfer(_SPI);
#endif
while (size--) {
#if defined(SPI_SR_TXP)
while (!LL_SPI_IsActiveFlag_TXP(_SPI));
#else
while (!LL_SPI_IsActiveFlag_TXE(_SPI));
#endif
LL_SPI_TransmitData8(_SPI, tx_buf ? *tx_buf++ : 0XFF);
#if defined(SPI_SR_RXP)
while (!LL_SPI_IsActiveFlag_RXP(_SPI));
#else
while (!LL_SPI_IsActiveFlag_RXNE(_SPI));
#endif
if (rx_buffer) {
*rx_buffer++ = LL_SPI_ReceiveData8(_SPI);
} else {
LL_SPI_ReceiveData8(_SPI);
}
if ((SPI_TRANSFER_TIMEOUT != HAL_MAX_DELAY) &&
(HAL_GetTick() - tickstart >= SPI_TRANSFER_TIMEOUT)) {
ret = SPI_TIMEOUT;
break;
}
}
#if defined(SPI_IFCR_EOTC)
// Add a delay before disabling SPI otherwise last-bit/last-clock may be truncated
// See https://github.com/stm32duino/Arduino_Core_STM32/issues/1294
// Computed delay is half SPI clock
delayMicroseconds(obj->disable_delay);
/* Close transfer */
/* Clear flags */
LL_SPI_ClearFlag_EOT(_SPI);
LL_SPI_ClearFlag_TXTF(_SPI);
/* Disable SPI peripheral */
LL_SPI_Disable(_SPI);
#else
/* Wait for end of transfer */
while (LL_SPI_IsActiveFlag_BSY(_SPI));
#endif
}
return ret;
}
#ifdef __cplusplus
}
#endif

You already have a LL_SPI_TransmitData16( ), under hardware/stm32/2.8.1/system/Drivers/

What is missing in this implementation is a spi_transfer16() and to change transfer16() to call it.

Thank you

@drmcnelson drmcnelson changed the title spi transfer16 is not 16 bit, and excessive overhead spi transfer16 is not 16 bit Sep 12, 2024
@drmcnelson
Copy link
Author

drmcnelson commented Sep 13, 2024

Yes, notice that they are derived from the same earlier version, The earliest version that I found, ran on the uno, an 8bit 16MHZ legacy platform marketed to hobbyists.

Adapting it in form to the much faster 16 and 32 bit platforms, leaving the internal kluges and inefficiencies in place was amateur and hurried, and the result is simply not correct in many places. (I am referring to the history of this code, not a specific person).

Two errors that require urgent attention are: (a) 16 bit transfers implemented as 8 bit transfers, and (b) failure to also provide a loop friendly interface (i.e., separate calls for setup and transfer, rather than forcing setup and transfer into every call)

The erroneous implementation of 16 bit transfers as two 8 bit transfers, alone, reduces performance by something like a factor of 2 or more. Forcing setups into very transfer and not providing a loop friendly API, reduces performance by a solid factor of 8.

The result is that the API is therefore incompatible with a large class of important peripherals; c.f., any device that requires a pin assertion between transfers and better than 250KSPS transfers. An ADC reading a linear CCD would be a good example.

Sometimes there has been push back on fixing this, it is usually about the transaction interface, and it usually comes from that same hobbyist crowd. The logic does not hold up. Reserving the device, even pushing the settings onto a stack, if you think you must despite it being an on-the-metal platform, should not be effected by what you do with it once it is reserved.

Concluding this introduction, lets note that limiting a 200MHz MCU with a 50MHZ SPI clock to 250ksps transfers, only because the code was inherited from a slow 8 bit legacy platform that catered to hobbyists, does not make a lot of sense and it is not good for the platform.

My solution for that for the UNO R4, is here . There is also a solution for the Teensy 4.x, the discussion is here

In both of those, the patch (a) implements a true 16 bit transfer, and (b) provides two new calls, one for the setup and another for the actual transfer.

A loop in a user program can look like this:

  SPI.transfer16_setup();
 for (n=0;n<2048;n++) {

     // Toggle the CONVERT for the ADC
     digitalWrite(some_pin,HI);
     nanosecondwait(500);
     digitalWrite(some_pin,LOW);

     // 200 nanosec to the first clock, so 700 nanosec total, as required by the ADC
     SPI.transfer16_transfer(inbuffer[n].outbuffer[n]);
 }

Note that with a 50MHZ SPI clock, this could run at close to 1MSPS.

The STM32 could be a serious platform. It deserves a little more attention to the API for the SPI, one of its most important peripheral interfaces.

(BTW The original title referred also to overheads. Note that is indeed part of the problem. But feel free to choose a title that makes sense.)

@drmcnelson
Copy link
Author

P/S, i forgot, in the loop-friendly API, you may want a "restore" as well. Looking at the transfer times. I see there is only 400 nsecs between the two 8 bit transfers, but 1200nsecs before and 1100nsecs after. So that means a correct loop-friendly call, besides the 16bits/50MHZ, only needs an additional 400nsecs.

@fpistm
Copy link
Member

fpistm commented Sep 13, 2024

Main issue with your setup/transfer api's is it is not Arduino api compatible.

@drmcnelson
Copy link
Author

What makes it incompatible?

a) Those calls that I suggest, are what is missing from the Arduino API to make it actually useful.

b) You already added new calls and functionality to the API, setSSEL() etc.

@drmcnelson
Copy link
Author

drmcnelson commented Sep 13, 2024

Looking at the code again, if you could save me a little time and explain what the conditional compiles are about in this, I might try a patch and you can see if you like it enough for a PR.

spi_status_e spi_transfer(spi_t *obj, const uint8_t *tx_buffer, uint8_t *rx_buffer,
uint16_t len)
{
spi_status_e ret = SPI_OK;
uint32_t tickstart, size = len;
SPI_TypeDef *_SPI = obj->handle.Instance;
uint8_t *tx_buf = (uint8_t *)tx_buffer;
if (len == 0) {
ret = SPI_ERROR;
} else {
tickstart = HAL_GetTick();
#if defined(SPI_CR2_TSIZE)
/* Start transfer */
LL_SPI_SetTransferSize(_SPI, size);
LL_SPI_Enable(_SPI);
LL_SPI_StartMasterTransfer(_SPI);
#endif
while (size--) {
#if defined(SPI_SR_TXP)
while (!LL_SPI_IsActiveFlag_TXP(_SPI));
#else
while (!LL_SPI_IsActiveFlag_TXE(_SPI));
#endif
LL_SPI_TransmitData8(_SPI, tx_buf ? *tx_buf++ : 0XFF);
#if defined(SPI_SR_RXP)
while (!LL_SPI_IsActiveFlag_RXP(_SPI));
#else
while (!LL_SPI_IsActiveFlag_RXNE(_SPI));
#endif
if (rx_buffer) {
*rx_buffer++ = LL_SPI_ReceiveData8(_SPI);
} else {
LL_SPI_ReceiveData8(_SPI);
}
if ((SPI_TRANSFER_TIMEOUT != HAL_MAX_DELAY) &&
(HAL_GetTick() - tickstart >= SPI_TRANSFER_TIMEOUT)) {
ret = SPI_TIMEOUT;
break;
}
}
#if defined(SPI_IFCR_EOTC)
// Add a delay before disabling SPI otherwise last-bit/last-clock may be truncated
// See https://github.com/stm32duino/Arduino_Core_STM32/issues/1294
// Computed delay is half SPI clock
delayMicroseconds(obj->disable_delay);
/* Close transfer */
/* Clear flags */
LL_SPI_ClearFlag_EOT(_SPI);
LL_SPI_ClearFlag_TXTF(_SPI);
/* Disable SPI peripheral */
LL_SPI_Disable(_SPI);
#else
/* Wait for end of transfer */
while (LL_SPI_IsActiveFlag_BSY(_SPI));
#endif
}
return ret;
}
#ifdef __cplusplus
}
#endif

@fpistm
Copy link
Member

fpistm commented Sep 16, 2024

What makes it incompatible?

a) Those calls that I suggest, are what is missing from the Arduino API to make it actually useful.

b) You already added new calls and functionality to the API, setSSEL() etc.

Well several examples already exists, if new one are created using those new API they will make it incompatible with other core.
the setxxx are dedicated but only for setup purpose and it is easy to use it and having a sketch compatible with other core:

#if defined ARDUINO_ARCH_STM32
SPI.setxxx();
#endif

While new API simply change the way you implement the SPI transaction.

About the "conditional compiles", it is simply to have the code working for each STM32 series which can have different SPI IP and so required different configuration or API.

@drmcnelson
Copy link
Author

drmcnelson commented Sep 16, 2024

I am sorry, that is not correct at alll.

Incompatibility means that something you add or change, causes something in the existing API to no longer meet it's specification.

Think of it in terms of code that uses the library: As long as existing code that uses the library continues to work to spec, then an added function cannot be said to be incompatible.

You have undoubtedly seen the term "backward compatibility".

In this case, transfer16() was one such example, I remember when it was added. Teensy did it correctly, Arduino did not. But both are lacking a way to run an efficient loop.

As long as you insist on this very incorrect position, your SPI implementation will be limited to taking 5 usecs for transfers that should take 320 nsecs.

Aside, your SPI.setxxx() would of course be backward compatible. And I am very interestred in setSSEL(), but unfortunately when I add that call to my setup(), it crashes.


P/S Adding SPI.transfe16_setup() and SPI.transfer16_transfer(), will of course help sell the board and your API.

The 50MHz SPI is a big deal, it is a terrible shame that is squandered by stopping short when you already did so much work and you are already so close.

@fpistm
Copy link
Member

fpistm commented Sep 16, 2024

My concern is about the example (sketch) which "should" as much as possible be compatible between core. This is the way Arduino works. That's all 😉

As long as you insist on this very incorrect position, your SPI implementation will be limited to taking 5 usecs for transfers that should take 320 nsecs.

I don't insist 😉 we discuss. I never told it will not be implemented. This is a community project, any contribution are welcome.

Aside, your SPI.setxxx() would of course be backward compatible. And I am very interestred in setSSEL(), but unfortunately when I add that call to my setup(), it crashes.

See my answer on the forum -> https://www.stm32duino.com/viewtopic.php?p=14825#p14825
As usual wrong usage.

The 50MHz SPI is a big deal, it is a terrible shame that is squandered by stopping short when you already did so much work and you are already so close.

Why? First goal was to provide support for STM32 series in Arduino with Arduino API references. I know it provides limitations and it could be improved but as always time is the missing ingredient. Several other features required to be implemented or enhanced., do my best. As stated any contribution are welcome.

@drmcnelson
Copy link
Author

My concern is about the example (sketch) which "should" as much as possible be compatible between core. This is the way Arduino works. That's all 😉

Correcting transfer16() to do 16bit transfers, and adding loop friendly calls should not change the way the existing example works.

I don't insist 😉 we discuss. I never told it will not be implemented. This is a community project, any contribution are welcome.

Yes I apologize for sounding argumentative. There is a lot of resistance and appeal to dogma in the community. The transaction api is a good example.

See my answer on the forum -> https://www.stm32duino.com/viewtopic.php?p=14825#p14825 As usual wrong usage.

Thank you, I am in the lab at the moment and dont have that log-in. I see that it was the wrong pin., no doubt a result of failing to digest the datasheet.

Are D13, D12, D11 then not the pins that it is using for the SPI? My intent was to use the next pin, as is customary,

Also I wasnt sure which SPI it is using, it seems like SP1,

The 50MHz SPI is a big deal, it is a terrible shame that is squandered by stopping short when you already did so much work and you are already so close.

Why?

Because that it is what enables 1MSPS reads and that ties into a large swath of applications that are important in using arduino class boards for doing science.

First goal was to provide support for STM32 series in Arduino with Arduino API references. I know it provides limitations and it could be improved but as always time is the missing ingredient. Several other features required to be implemented or enhanced., do my best. As stated any contribution are welcome.

My philosophy is that while I might not implement everything, I try to at least, not introduce any limitation not present in the hardware. So there is a difference between fully supporting a rich set of features provided by the hardware, and for that I agree, who has time. Unnecessarily choking off performance is something else, and that is what the historic SPI library does. Fortunately, it can be fixed with a patch or two plus a few extra calls that do not disturb the API already in place.

@drmcnelson
Copy link
Author

Are you using CLA for PR's?

If not, then I might take a look at patching the library, perhaps even today.

It looks like you did a good job with factoring, so it seems like it should be easy.

@fpistm
Copy link
Member

fpistm commented Sep 16, 2024

Are D13, D12, D11 then not the pins that it is using for the SPI? My intent was to use the next pin, as is customary,

Also I wasnt sure which SPI it is using, it seems like SP1,

D11 to D13 are usually the default SPI pins (for boards with Arduino UNO connector) and D10 used a CS. But in a general way CS is managed by the application (software). When you used setSSEL(), you ask to use the Hardware CS and so no need to drive the pin at application level.
For the Nucleo F722, unfortunately the D10 pin does not have this capability.

To go deeper, when a board have a Arduino connector, default SPI pins are D10 to D13:

/* Default for Arduino connector compatibility */
/* SPI Definitions */
#ifndef PIN_SPI_SS
#define PIN_SPI_SS 10
#endif
#ifndef PIN_SPI_SS1
#define PIN_SPI_SS1 4
#endif
#ifndef PIN_SPI_SS2
#define PIN_SPI_SS2 7
#endif
#ifndef PIN_SPI_SS3
#define PIN_SPI_SS3 8
#endif
#ifndef PIN_SPI_MOSI
#define PIN_SPI_MOSI 11
#endif
#ifndef PIN_SPI_MISO
#define PIN_SPI_MISO 12
#endif
#ifndef PIN_SPI_SCK
#define PIN_SPI_SCK 13
#endif

In our case for Nucleo F722:
#define PA7 PIN_A10
#define PA6 PIN_A11
#define PA5 PIN_A12

Thanks to the PeripheralPins.c:
{PA_7, SPI1, STM_PIN_DATA(STM_MODE_AF_PP, GPIO_PULLUP, GPIO_AF5_SPI1)},

{PA_6, SPI1, STM_PIN_DATA(STM_MODE_AF_PP, GPIO_PULLUP, GPIO_AF5_SPI1)},

{PA_5, SPI1, STM_PIN_DATA(STM_MODE_AF_PP, GPIO_PULLUP, GPIO_AF5_SPI1)},

So to answer your question SPI uses SPI1.

Finally, you can check in the below table which SPI SSEL to use:

WEAK const PinMap PinMap_SPI_SSEL[] = {
{PA_4, SPI1, STM_PIN_DATA(STM_MODE_AF_PP, GPIO_PULLUP, GPIO_AF5_SPI1)},
{PA_4_ALT1, SPI3, STM_PIN_DATA(STM_MODE_AF_PP, GPIO_PULLUP, GPIO_AF6_SPI3)},
{PA_15, SPI1, STM_PIN_DATA(STM_MODE_AF_PP, GPIO_PULLUP, GPIO_AF5_SPI1)},
{PA_15_ALT1, SPI3, STM_PIN_DATA(STM_MODE_AF_PP, GPIO_PULLUP, GPIO_AF6_SPI3)},
{PB_4, SPI2, STM_PIN_DATA(STM_MODE_AF_PP, GPIO_PULLUP, GPIO_AF7_SPI2)},
{PB_9, SPI2, STM_PIN_DATA(STM_MODE_AF_PP, GPIO_PULLUP, GPIO_AF5_SPI2)},
{PB_12, SPI2, STM_PIN_DATA(STM_MODE_AF_PP, GPIO_PULLUP, GPIO_AF5_SPI2)},
{PE_4, SPI4, STM_PIN_DATA(STM_MODE_AF_PP, GPIO_PULLUP, GPIO_AF5_SPI4)},
{PE_11, SPI4, STM_PIN_DATA(STM_MODE_AF_PP, GPIO_PULLUP, GPIO_AF5_SPI4)},
{PF_6, SPI5, STM_PIN_DATA(STM_MODE_AF_PP, GPIO_PULLUP, GPIO_AF5_SPI5)},
{NC, NP, 0}

Are you using CLA for PR's?

No. Only follow the CONTRIBUTING.md

@drmcnelson
Copy link
Author

drmcnelson commented Sep 16, 2024

Thank you, quite elaborate and seems like several layers of naming. So it seems like SPI1 uses PA_5,PA_6,PA_7 and PA_4 for CS.

What is confusing then, is this from the user manual (UM1974), page 58, where it says the arduino SPI uses PA5, PA6, PA7, as D13,D12,D11, and for D10 it is not PA4, but rather PD14

image

So, bottom line, what goes in the call to setSSEL(), and what pin does it appear on?

Thank you

@drmcnelson
Copy link
Author

I found it, page 32 in the user manual. If one uses an Arduino shield adapter, the pin is covered. So that means it needs a custom adapter board. Not a big deal, the first thing though is to see the timing and whether it works with the ADC.

image

@fpistm
Copy link
Member

fpistm commented Sep 17, 2024

So, bottom line, what goes in the call to setSSEL(), and what pin does it appear on?

Well, if PD14 is described as SPI1_CS then there is a mistake in the datasheet or in the stm32 open pin data: https://github.com/STMicroelectronics/STM32_open_pin_data/blob/f4ec11f00e762e37ffc4020f6d4f20d225bc061d/mcu/STM32F722Z(C-E)Tx.xml#L626-L632

The PeripheralPins.c is generated from this file that's why PD14 is not in the SPI SSEL array.
Looking at the STM32F722ZE datasheet, there is no SPI_CS for PD14:
image

My guess is the Nucleo F722ZE datasheet simply tell that PD14 can be used for CS on the Arduino connector to be compatible. As Arduino manages CS by software (gpio toggle) and not by hardware (manage by the SPI IP itself).

I found it, page 32 in the user manual. If one uses an Arduino shield adapter, the pin is covered. So that means it needs a custom adapter board. Not a big deal, the first thing though is to see the timing and whether it works with the ADC.

About this it is not page 32 but 35 even if several pins match.
image

@drmcnelson
Copy link
Author

drmcnelson commented Sep 18, 2024

Yes that was the table that I meant to cite.

@drmcnelson
Copy link
Author

Okay, so the code compiles and runs using PA_4 or D24. i.e. SPI.setSSEL(PA_4) and SPI.setSSEL(24), both compile and run.

In the following, the traces from top to bottom are yellow = D13 (SCK), blue = D12 (MISO), red = D11 (MOSI), and green = PA_4 (D24).

Notice that the first time the SPI is called, the MOSI is low and goes high for the transfers. That is actually not correct, It needs to be high when not active.

The second time the SPI loop is invoked, MOSI stays high. So it is not consistent,.

The PA_4 pin (SSE)L does nothing.

STM32_SPI_run1

STM32_SPI_run2

And here is the interface using D10 for the CSEL, under program control. You can see the MOSI again starts low, and then goes to the high state after the second pair of 8 bit transfers.

STM32_SPI_withCNVST

@drmcnelson
Copy link
Author

Okie dokie,

Here is an implementation for 16 bit transfers. The attached is a zip of the SPI/src tree with the 16 bit transfer implemented. I left the old routine in place with a suffix on the name. The mod for the 16 bit transfer is pretty simple, really. Does it look right?

SPI_src.zip

The good news is that there seems to be no per-transfer setup that could be optimized out of the call for loops. So there is nothing that requires or would benefit from a loop friendly API. (I am still testing this, so please consider it preliminary).

There are two items of less good news.

A) The overheads anyway are large, and for the one before transfer starts it is not a constant delay. So that is a big deal for some real time applications.

TEK0003

I think the delays comes from here, does that seem right?

while (size--) {
#if defined(SPI_SR_TXP)
while (!LL_SPI_IsActiveFlag_TXP(_SPI));
#else
while (!LL_SPI_IsActiveFlag_TXE(_SPI));
#endif

#if defined(SPI_SR_RXP)
while (!LL_SPI_IsActiveFlag_RXP(_SPI));
#else
while (!LL_SPI_IsActiveFlag_RXNE(_SPI));
#endif

Aside, for the ADC, there is a pin CNVST (connected to the SSEL line) that starts the conversion, conversion then takes 700nsecs and the CNVST pin needs to be cleared at least 30 nsecs before readout. That means to get best performance, I would have to hack that into the transfer routine, i.e. set it, do the idiot loop for the interface ready flag, then clear cnvst, and start the transfer. Doable, but tedious.

B) The MOSI line is, well, flakey, and not completely correct.

I am guessing there is a setup or config somewhere that controls the idle state for the MOSI line, is that correct?

Here is the aberrant behavior that needs to be fixed:

i) The apparent idle state before the first call is low (should be high)

ii) For txdata, it goes high momentarily for the first transfer, that is half-correct (see above, i)

iii) For the second transfer it goes high and stays high. It stays high after this.

TEK0002

The correct or at lease more conventional, behavior for txdata 0xFFFF, of course, is that it starts high and stays high.

@fpistm
Copy link
Member

fpistm commented Sep 23, 2024

Does it look right?

SPI_src.zip

Please create a PR in draft. Archive is really not efficient to track and comment.

@drmcnelson
Copy link
Author

The PR draft is submitted.

I also want to take up the issue with the MOSI pin state. That too is critical. Shall I make a new issue for it?

@fpistm
Copy link
Member

fpistm commented Sep 23, 2024

I also want to take up the issue with the MOSI pin state. That too is critical. Shall I make a new issue for it?

No. It is not a bug. MOSI pins kept the last value level. We have no way to specify the level when CS is not active. And see no reference it should be high in SPI documentations. It is simply "don't care"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants