Time sync issue with uxr_sync_session() #381
Comments
Could you provide a quick replicator?
Sure, you can use this repo: https://github.com/erikboto/microros-sync-bug-trigger. The example uses a really low timeout (3 ms) because that triggers the bug on the first run, which makes it easy to reproduce. In our actual application the timeout is 5 ms, which works most of the time but occasionally fails. Using 5 ms here always works, so the bug is never triggered, since there aren't many interrupts or other threads running. I'm using an https://www.olimex.com/Products/IoT/ESP32/ESP32-DevKit-LiPo/open-source-hardware board, but I guess any hardware will give the same result. On this board I'm using the "USB-uart" for microROS, and I set up the log output on a UART on one of the GPIO pins. I run the microROS agent using the docker images:
Why do you need to sync time so quickly? This approach was designed to calculate the drift between clocks and then assume a certain stability for the drift variance. What is your use case?
Well, this minimal replicator isn't really representative of how we are using it; here it is taken to an extreme just to replicate the issue. The reason I want to use a short timeout is that the system also samples sensors at high sample rates, so I want to make sure the sync affects this as little as possible. So the real use case is just to be able to call sync() with a short timeout, but with a long time between sync() calls, without risking that future sync() calls trigger this bug, where an old reply is parsed and the round-trip time calculations end up way off. My understanding was that the sync() call measures the round-trip time, which it uses to set the offset from the received timestamp. But is there some periodic timestamp message coming from the microROS agent even if I don't call sync() again, which it can use to avoid long-term drift? Or how does it calculate the drift? (I'm by no means a time-sync expert, so I might be missing something that is obvious to you.)
This approach uses an NTP-like algorithm to calculate the time offset between the MCU's and the Agent's clocks. It is not a high-precision approach; it aims to provide a valid (POSIX-like) time reference to an MCU that probably boots with a POSIX timestamp set to zero. Ideally, the sync procedure should be run once at startup, and the result is then used internally in the Micro XRCE-DDS Client to offset the timestamp, as you can see here:
One approach that you can use is configuring the In any case, since you are right that time sync responses should not be processed outside of a sync call, I have prepared this PR: #382. Could you take a look and validate that it fixes your use case? Finally, there is some documentation about this feature here: https://micro-xrce-dds.docs.eprosima.com/en/latest/time_sync.html
I had a quick look at the commit and I don't think it will solve the issue I see, which is that the old reply is read the next time I call sync(). So in my case Yeah, using a custom on_time() callback could definitely work to make sure I don't set the time wrong, but it would still leave me in a state where I always parse the reply from the previous sync() call, which means that the new reply is left for the next time. So I can never "catch up" and process the extra timestamp reply that is left in some buffer (at least that's what I think is happening).
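As a rough illustration of the on_time() idea mentioned above, a callback could reject any reply whose round-trip delay is implausibly long before applying it. This is only a sketch: it assumes the callback has access to the four NTP-style timestamps (t0 request sent, t1 request received by the agent, t2 reply sent by the agent, t3 reply processed by the client), and the function names and the threshold parameter are hypothetical, not part of the client API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* NTP-style round-trip delay in nanoseconds: total elapsed time on the
 * client (t3 - t0) minus the time the agent spent handling the request
 * (t2 - t1). */
static int64_t round_trip_ns(int64_t t0, int64_t t1, int64_t t2, int64_t t3)
{
    return (t3 - t0) - (t2 - t1);
}

/* Hypothetical filter: accept a reply only if its round-trip delay fits
 * within the limit the application expects (for example the sync timeout).
 * A reply left over from an earlier sync() has a delay of roughly one
 * sync period, so it is rejected. */
static bool reply_is_plausible(int64_t t0, int64_t t1, int64_t t2, int64_t t3,
                               int64_t max_round_trip_ns)
{
    assert(max_round_trip_ns >= 0);
    int64_t rtt = round_trip_ns(t0, t1, t2, t3);
    return rtt >= 0 && rtt <= max_round_trip_ns;
}
```

As the comment above points out, a filter like this only prevents a wrong offset from being applied. It does not remove the stale reply from the backlog, so the next call would still see the previous reply.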
You are right, replies from different requests can still cause problems. It seems that the wire protocol is not designed to handle this: the definitions of the sync request and reply do not include any kind of request_id, so we cannot differentiate between responses to different requests.
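The "never catches up" behaviour can be shown with a toy model. It assumes, as suspected in this thread but not confirmed, that replies wait in a FIFO buffer and that each sync() call consumes at most one of them. All names here are hypothetical and none of this is the client's actual code.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAP 8

/* Toy FIFO of buffered replies; each entry is the number of the request
 * that the reply answers. */
typedef struct {
    int items[QUEUE_CAP];
    size_t head;
    size_t count;
} ReplyQueue;

static void queue_push(ReplyQueue* q, int request_no)
{
    assert(q->count < QUEUE_CAP);
    q->items[(q->head + q->count) % QUEUE_CAP] = request_no;
    q->count++;
}

static int queue_pop(ReplyQueue* q)
{
    assert(q->count > 0);
    int value = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return value;
}

/* Models one sync() call. The agent always answers, so the reply is always
 * buffered eventually. If the call does not time out, it processes the
 * oldest buffered reply, which is not necessarily the one it asked for.
 * Returns the request number of the processed reply, or -1 on timeout. */
static int model_sync(ReplyQueue* q, int request_no, int reply_in_time)
{
    queue_push(q, request_no);
    if (!reply_in_time) {
        return -1; /* reply arrives too late and stays in the buffer */
    }
    return queue_pop(q);
}
```

In this model, after a single timeout on request 2, the call for request 3 processes reply 2, the call for request 4 processes reply 3, and so on. One reply always stays buffered because nothing tells the replies apart.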
I'm using an ESP32 with the micro-ROS component and a custom transport (serial port), and in my application I periodically call rmw_uros_sync_session() with a rather short timeout. The idea is to make sure the clock doesn't drift during longer recordings, while keeping the timeout short so it doesn't interfere with all my sensor sampling. It's not a big deal if the sync fails every once in a while, since the periodic sync is mainly there to correct any drift.
But every now and then I notice that we suddenly get a jump backwards in time. Looking into this in more detail, I found that it sometimes happens after a successful sync following an unsuccessful one (unsuccessful due to timeout), but far from every time. Another thing I noticed is that the jump backwards in time was always half the time I set between my sync attempts.
After a quick look at the code, my guess is that this is caused by a timestamp_reply that was left unprocessed because the sync timed out before handling it, since the calculations done in process_timestamp_reply() would produce this "sync period / 2" shift. On the next sync, the old reply is processed and the new reply is left for the following sync(), so the time shift persists even after more syncs.
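The "sync period / 2" figure follows from the standard NTP offset formula. The sketch below uses that textbook formula with a hypothetical function name; the exact expression and sign convention inside process_timestamp_reply() may differ. Here t0 is when the request was sent, t1 and t2 are the agent's receive and transmit timestamps, and t3 is when the client processes the reply.

```c
#include <assert.h>
#include <stdint.h>

/* Textbook NTP clock offset in nanoseconds. A positive result means the
 * agent clock is ahead of the client clock. */
static int64_t ntp_offset_ns(int64_t t0, int64_t t1, int64_t t2, int64_t t3)
{
    assert(t3 >= t0); /* client timestamps are taken from one clock, in order */
    assert(t2 >= t1); /* same for the agent timestamps */
    return ((t1 - t0) + (t2 - t3)) / 2;
}
```

Take two perfectly aligned clocks, a 4 ms round trip, and a 10 s sync period. A fresh reply gives `ntp_offset_ns(0, 2000000, 2000000, 4000000)`, which is 0. If the same reply is only processed during the next sync, t3 becomes 10004000000 while t0, t1 and t2 are unchanged, and the result is -5000000000 ns: the client concludes the agent is half a sync period behind, which matches the observed jump backwards.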
I'm not sure how best to fix this. Perhaps there's some way to invalidate previous timestamp replies when issuing a new sync request, or to add an id field to the request so that we can verify that we process the correct reply when calculating session->time_offset.
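A minimal sketch of the id-field idea, purely hypothetical: according to the discussion in this thread the current wire protocol carries no such field, so the struct layouts, field names and functions below are all assumptions made for illustration. Issuing a new request also invalidates any reply to an older one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reply carrying the id of the request it answers. */
typedef struct {
    uint16_t request_id;
    int64_t t1_ns; /* agent receive time */
    int64_t t2_ns; /* agent transmit time */
} HypotheticalReply;

/* Hypothetical client-side sync state. */
typedef struct {
    uint16_t pending_id; /* id of the most recent request */
    bool awaiting;       /* true while that request is unanswered */
    int64_t t0_ns;       /* client time when the request was sent */
    int64_t offset_ns;   /* last accepted offset (agent minus client) */
} HypotheticalSync;

/* Start a new sync. Any reply to an earlier request becomes stale. */
static uint16_t begin_sync(HypotheticalSync* s, int64_t now_ns)
{
    s->pending_id++;
    s->awaiting = true;
    s->t0_ns = now_ns;
    return s->pending_id;
}

/* Apply a reply only if it answers the most recent request. Stale or
 * duplicate replies are dropped and leave the offset untouched. */
static bool accept_reply(HypotheticalSync* s, const HypotheticalReply* r,
                         int64_t now_ns)
{
    if (!s->awaiting || r->request_id != s->pending_id) {
        return false;
    }
    s->offset_ns = ((r->t1_ns - s->t0_ns) + (r->t2_ns - now_ns)) / 2;
    s->awaiting = false;
    return true;
}
```

With a check like this, a late reply from a timed-out request is discarded as soon as it is read, so the backlog drains instead of shifting every later sync by one reply.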