packet dropped on packet forwarder sx1303 corecell #134

Open
queifaro opened this issue Dec 9, 2024 · 12 comments

Comments

@queifaro

queifaro commented Dec 9, 2024

Hello,
My environment:
  • CPU: ARM
  • Linux OpenWrt (kernel 2.6)
  • Corecell mini-PCIe (SX1303)
  • u-blox GPS module (indoor antenna) with PPS connected to the SX1303 (GPS locked, time OK)
  • packet forwarder on localhost (up and down ports = 1700) running in one shell
  • in another shell: net_downlink -f 865.1 -s 7 -b 125 -r 8 -t 500 -x 1 -P 1700 (1 packet)

In the packet forwarder, all packets (except sometimes the very first) result in:

JSON down: {"txpk":{"imme":true,"freq":865.100000,"rfch":0,"powe":27,"modu":"LORA","datr":"SF7BW125","codr":"4/5","ipol":false,"prea":8,"ncrc":true,"nhdr":false,"size":4,"data":"AAAAAA=="}}
INFO: [down] a packet will be sent in "immediate" mode
src/jitqueue.c:131:jit_enqueue(): Current concentrator time is 400097239, pkt_type=2
src/jitqueue.c:172:jit_enqueue(): DEBUG: insert IMMEDIATE downlink, first in JiT queue (count_us=400177239)
src/jitqueue.c:109:jit_sort_queue(): sorting queue in ascending order packet timestamp - queue size:1
src/jitqueue.c:111:jit_sort_queue(): sorting queue done - swapped
src/jitqueue.c:439:jit_print_queue(): INFO: [jit] queue contains 1 packets:
src/jitqueue.c:440:jit_print_queue(): INFO: [jit] queue contains 0 beacons:
src/jitqueue.c:446:jit_print_queue(): - node[0]: count_us=400177239 - type=2
src/jitqueue.c:309:jit_enqueue(): enqueued packet with count_us=400177239 (size=4 bytes, toa=31000 us, type=2)
WARNING: --- Packet dropped (current_time=400202948, packet_time=400177239) ---

Please note that all the utility tools work fine.

Please note that the same host and OS work fine with the 1.5 reference design, the https://github.com/Lora-net/lora_gateway library and https://github.com/Lora-net/packet_forwarder.

Could you please provide some suggestions to investigate further?
Thanks in advance
B/R
Fabio

@queifaro
Author

Hello,
A little update:
after increasing TX_JIT_DELAY from 40000 to 60000, the packet dropped warning mostly disappears (it happens only sometimes), but I always get:
WARNING: a downlink was already scheduled on rf_chain 0, overwritting it...
INFO: [jit] lgw_status returned TX_SCHEDULED

@mcoracin
Contributor

Hello,
You may need to check the actual speed of the SPI writes on your platform.
The first warning you got, "Packet dropped", means that the packet was taken from the JIT queue too late to be written to the sx1303 internal TX buffer in time.
So yes, increasing JIT_DELAY can help, but you may need to check why you have such latency on your platform.

The second warning means that too much time elapsed between jit_dequeue and the moment the packet was actually written to the sx1303 TX buffer. The sx1303 tries to send a packet at the timestamp it has been programmed for. If the sx1303 internal counter has already passed that timestamp when the packet arrives in the TX buffer, it waits for the counter to roll over, so the packet stays in the TX buffer and is overwritten by the next request.

So overall it seems there are some timing issues on your platform, to be investigated.

Best regards,
Michael

@queifaro
Author

Hi Michael,
I forgot to mention that the Corecell <-> host connection is over USB. So the problem is probably not related to SPI speed?
Last-minute update: my developers told me that our uClibc-0.9.29 library does not implement the clock_nanosleep function, so they used syscall(__NR_clock_nanosleep, clock_id, flags, request, remain) instead.
Could this be the timing issue?
Thanks
Best regards,
Fabio

@mcoracin
Contributor

Hi Fabio,
OK, if the interface is USB, speed issues can be even worse.
One thing you can try is to set the DEBUG_PERF flag to 2 to add various timing measurements:

#define DEBUG_PERF 0 /* Debug timing performances: level [0..4] */

This flag enables prints when the _meas_time_start/_meas_time_stop macros are called in the code. For example:
_meas_time_start(&tm);

If it is too verbose, you'll need to tweak it.
But at least it should measure the complete execution time of the sx1302_send() function, which is the function that configures the sx1303 for the packets to be sent.

Hope this helps.
Best regards,
Michael

@queifaro
Author

Hi Michael, thanks for your support.
Keeping TX_JIT_DELAY = 60000 and setting DEBUG_PERF = 2, the sx1302_send() execution time is:

lgw_send:1339: --- IN
PERF: .. sx1302_send 10.061000 ms
PERF: lgw_send 10.331000 ms
lgw_send:1473: --- OUT

Is this what you expect (more or less)?

A note: my host processor supports USB 2.0 Full Speed (12 Mbps).

Thanks
Best regards,
Fabio

@mcoracin
Contributor

Hi Fabio,

Indeed, it seems that you have more latency than I get on my side with a Raspberry Pi: I get 3 ms for lgw_send() where you get 10 ms.

This means there could be latencies here and there that make packets reach the sx1303 TX buffer too late to be sent on time.

For USB Corecells, communication between the Linux host and the Corecell MCU works by grouping "SPI" commands into bulks to minimize the number of USB transactions, which (if I remember well) have a minimum duration of 1 ms.
You can search for LGW_COM_WRITE_MODE_BULK and lgw_com_flush in the code to see how it is used to group communication with the sx1303.

In your case, it would be worth checking the timings you get along the downlink request execution chain:

  • jit_queue
  • jit_peek
  • jit_dequeue
  • lgw_send => the call to lgw_com_flush in the sx1302_send() function.

The packet is supposed to be peeked from the JIT queue JIT_DELAY in advance of its departure time.
You need to ensure that it reaches the SX1303 before the departure time.

I hope you'll be able to progress on this.

Which Linux host are you using?

Best regards,
Michael

@queifaro
Author

Hi Michael,
I will try to investigate following your suggestions.

My Linux host is a Microchip AT91SAM9260:

https://ww1.microchip.com/downloads/aemDocuments/documents/OTH/ProductDocuments/DataSheets/ARM_926EJS_TRM.pdf

  • 180 MHz ARM926EJ-S™ ARM® Thumb® Processor
    - 8 Kbytes Data Cache, 8 Kbytes Instruction Cache, MMU
  • Memories
    - 32-bit External Bus Interface supporting 4-bank SDRAM/LPSDR, Static Memories, CompactFlash, SLC NAND Flash with ECC
    - Two 4-Kbyte internal SRAM, single-cycle access at system speed
    - One 32-Kbyte internal ROM, embedding bootstrap routine
  • Peripherals
    - ITU-R BT. 601/656 Image Sensor Interface (ISI)
    - USB Device and USB Host with dedicated On-Chip Transceiver
    - 10/100 Mbps Ethernet MAC Controller (EMAC)
    - One High Speed Memory Card Host
    - Two Master/Slave Serial Peripheral Interfaces (SPI)
    - Two 3-channel 32-bit Timer/Counters (TC)
    - One Synchronous Serial Controller (SSC)
    - One Two-wire Interface (TWI)
    - Four USARTs
    - Two UARTs
    - 4-channel 10-bit ADC

Thanks
Best regards,
Fabio

@queifaro
Author

Hi Michael,
after investigating as you suggested (analyzing the packet forwarder and library source code), I think there is no tuning opportunity in the bulk sending of SPI commands to the MCU. For example, the lgw_send HAL function calls sx1302_send, which stores all the SPI messages in the buffer and only flushes them near the end of the function by calling lgw_com_flush. I think this way the buffer is always completely flushed to the MCU. So the problem is probably (though I'm not sure) some latency on my USB 2.0 Full Speed (12 Mbps) host bus, which also includes an embedded root hub and an external (on-board) USB hub.
The only other timing-related change is the replacement of clock_nanosleep with syscall(__NR_clock_nanosleep, clock_id, flags, request, remain), because my uClibc 0.9.29 does not support it.
May I kindly ask your opinion?
Thanks in advance
Best regards,
Fabio

@mcoracin
Contributor

Hi Fabio,

The fact that your system has larger USB latency than usual should not be a blocking issue. Your gateway will just be a bit less reactive than others, but that is fine.
Have you tried simply increasing JIT_DELAY further, to 100 ms or even 200 ms, to see what happens?

You just need to be sure that the packet is peeked from the JIT queue early enough that it is written to the sx1302 before its departure time.

But of course, if the latencies are large, you'll need to go easy on the net_downlink TX request rate (the -t argument).

Can you try, for example, setting JIT_DELAY to 100 ms and checking that, if you send a packet every second with net_downlink, everything is fine and no TX packet is lost?
Then increase the net_downlink rate to find the limit?

Best regards,
Michael

@queifaro
Author

Hi Michael,
thanks a lot for your suggestion. Setting JIT_DELAY to 200 ms seems to work fine. Now I will decrease the delay to find the limit.
Do you see any drawbacks to increasing JIT_DELAY ?
Many thanks
Best regards,
Fabio

@mcoracin
Contributor

The main drawback of increasing JIT_DELAY is that you won't be able to send packets from the gateway faster than this delay. So if you set 200 ms, you won't be able to send more than 1 packet every 200+ ms.
If that fits your needs, I don't think it is more problematic than that.
Best regards,
Michael

@queifaro
Author

Hi Michael,
ok. Thanks a lot for your support. Have a nice Christmas.
Best regards,
Fabio
