Guarding NMT transitions is broken #6

g-arjones · 2020-05-26T15:03:01Z

The current implementation (setting and resetting heartbeat period during state transitions) is a pain. Besides the fact that we have to handle RESET, RESET_COMMUNICATION, and STOP differently, it's not possible to transition back to start after stopping (because writing SDO's is not allowed while stopped). IMO, setting up the heartbeat period to a reasonable value (to allow guarding the transition) should be left to the driver developer. We could also consider a flag to disable this in toNMTState(). @doudou Thoughts?

The text was updated successfully, but these errors were encountered:

doudou · 2020-05-30T13:27:45Z

I agree that it's very sub-optimal.

The problem it is (poorly) trying to "fix" is to not have very long start/stop because we're waiting for heartbeat messages. Maybe simply having "runtime" and "configure" heartbeat periods set to reasonably high for the second (even could be zero) and rather low for the latter (100ms ?).

g-arjones · 2020-05-31T18:35:54Z

Not sure I understand what you are suggesting. How does having different periods change the fact that you can't do SDO download requests while the node is in a stopped state? At which point would you set these periods in the toNMTState() API?

doudou · 2020-06-01T15:14:37Z

Some context: I don't even go into STOP in the driver I'm currently using. Probably because it wasn't working :P But I do mandatorily reset in configureHook. Maybe that's the easiest behavior ? Since reset is the one transition that is acknowledged through the bootup message.

Another option, since we can't do SDOs in STOP state, one possibility would be to reach PreOperational as first step in configureHook. That would be (in pseudo-code):

while !confirmed && configure_deadline not reached
    send enter_pre_operational
    # Wait some time :( thanks guys for "non-acknowledged state transitions"
    set hearbeat SDO to configure heartbeat period
    # > on timeout, send reset-node and wait for bootup message, then retry
    wait for heartbeat
    # > on timeout, send reset-node and wait for bootup message, then retry
    confirmed = last hearbeat confirmed pre-operational
end

Then, transition to a slower "runtime heartbeat" in startHook after the start state has been reached:

# set the period to the "fast" period so that we don't have to do it in stopHook (and have an essentially inactive device spamming the bus)
set heatbeat period to configure period
send start remote node
wait for heartbeat confirming OPERATIONAL state
# > fail on timeout
set heartbeat period to runtime period

finally, reset the node in exceptionHook (with synchro to bootup message) and stop it in cleanupHook.

This way, none of the NMT methods will need to change the heartbeat (as you wished). It will break existing CANOpen components, but I don't think there are any in the wild except for those you and I are writing.

Implementation-wise, I would suggest putting the implementation of the code snippets to utility methods within SlaveTask, and create a CommonSlaveTask that calls them in their lifecycle hooks. This way, people will be able to create non-standard lifecycle handling for devices that don't match this particular "common" behavior.

g-arjones · 2020-06-01T16:20:00Z

Yeah, well, that still wouldn't work for me and I think it still wouldn't work in the most basic use case which is to enter STOPPED state in stopHook and to enter OPERATIONAL in startHook. What I was trying to achieve was to be able to stop the component and start it without having to go through configuration again. For that to work, you have to set up the heartbeat periods in configureHook only (which makes sense, IMO). I do understand your concern about having the device spamming the bus with heartbeat messages but unless there are many nodes on the bus this should not be a problem for most people.

doudou · 2020-06-01T17:25:40Z

the most basic use case

Well. Depends on what you call "the most basic use case". You never said what you were looking for (which is going to STOP in RTT's stopHook), so I couldn't guess. What I proposed was for me "the most basic use case". We definitely had two different views on the basic use cases -- not to say that we can't agree on a final version :P

unless there are many nodes on the bus this should not be a problem for most people.

I find it interesting that you want to transition to STOPPED as a generic change (with the associated problems), but don't consider that spamming a very bandwidth-limited bus a potential problem. For the three controllers I worked on, the NMT state machine was essentially separate from the actual controller's state machine, which is why I did not see the urge of going into STOPPED.

Now, why not replicating the algorithm I was proposing for PreOperational for Operational ? It would not even need to have a heartbeat set up. This is essentially a fixed version of the toNMTState algorithm that works when going out of STOP. Would "just" need a special case for STOPPED :(

g-arjones · 2020-06-01T18:52:21Z

you never said what you were looking for

Yes, poor wording on my part.

you want to transition to STOPPED as a generic change

From my perspective, this is something that was intended in the original implementation (transitioning to STOPPED was supposed to work). So I opened the issue to discuss a potential fix since it actually doesn't.

don't consider that spamming a very bandwidth-limited bus a potential problem

I do, but I also think that if the period of the heartbeat messages is not a problem during OPERATIONAL state then they probably won't be during STOPPED unless you have a set of nodes in each state (which sounds rather specific to me).

Would "just" need a special case for STOPPED :(

Do you mean relying on the StateMachine current known node state to determine whether or not we should set up the periods? I think that would make sense.

Just to be clear, I am not trying to impose my use case here (I actually worked around it by resetting the node in startHook). The goal really is to improve the library if we both agree that improvement is due.

doudou · 2020-06-05T00:06:19Z

@g-arjones I appreciate you making the effort of discussing this issue. The transition to STOPPED is interesting/useful. Just felt (maybe wrongly) a little bit attacked by the wording indeed. Oh well.

The "new" situation - even with having a high-frequency heartbeat message - would be better than the current one, so that's one positive thing for your approach.

Now, heartbeat messages are very high-priority on the CAN bus, so while we're looking into allowing a transition to STOPPED (CANOpen) in STOPPED (Rock) - confusing wording intended - I thought we might want to do it right if we can.

Do you mean relying on the StateMachine current known node state to determine whether or not we should set up the periods? I think that would make sense.

Yes. To be more specific, I mean doing a reset routine in configureHook that guarantees our knowledge of the CANOpen's device state, and then rely on our knowledge of each transition to allow us to do just that.

g-arjones · 2020-06-05T00:21:08Z

Just felt (maybe wrongly) a little bit attacked by the wording indeed.

Oh. That was most certainly not my intention. I'm so sorry.

Well, we agree on the fix at least. As I said, I worked around it but I will find the time to implement what we've discussed. Thank you for the thoughts.

doudou · 2020-06-05T00:22:05Z

Oh. That was most certainly not my intention. I'm so sorry.

Don't be. Knowing you, that was a very wrong reaction of me.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Guarding NMT transitions is broken #6

Guarding NMT transitions is broken #6

g-arjones commented May 26, 2020

doudou commented May 30, 2020

g-arjones commented May 31, 2020

doudou commented Jun 1, 2020

g-arjones commented Jun 1, 2020

doudou commented Jun 1, 2020

g-arjones commented Jun 1, 2020

doudou commented Jun 5, 2020

g-arjones commented Jun 5, 2020

doudou commented Jun 5, 2020

Guarding NMT transitions is broken #6

Guarding NMT transitions is broken #6

Comments

g-arjones commented May 26, 2020

doudou commented May 30, 2020

g-arjones commented May 31, 2020

doudou commented Jun 1, 2020

g-arjones commented Jun 1, 2020

doudou commented Jun 1, 2020

g-arjones commented Jun 1, 2020

doudou commented Jun 5, 2020

g-arjones commented Jun 5, 2020

doudou commented Jun 5, 2020