Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Examples Error on alternating runs without device reset #23

Closed
dcyoung opened this issue Jan 19, 2017 · 6 comments
Closed

Examples Error on alternating runs without device reset #23

dcyoung opened this issue Jan 19, 2017 · 6 comments
Labels

Comments

@dcyoung
Copy link

dcyoung commented Jan 19, 2017

Running the examples ./example-c, ./example-c++ or ./example-viewer works the first time, but unless the sensor is reset between each run, the example will error on subsequent attempts in an alternating pattern. Ie: the 2nd attempt will fail, but the 3rd will succeed, 4th will fail, etc

In the case of ./example-c the stated error is:

Error: unable to receive motor speed command response

In the case of ./example-c++ the stated error is:

Error: unable to receive stop scanning command response

And in the case of ./example-viewer the stated error is:

Error: unable to receive start scanning command response

Has this always been the behavior or is it a new issue? I don't believe any changes were made to the protocol for DS or DX receipts. I haven't started tracking down the origin of the issue yet, but I'm assuming the device is not being shutdown properly. Whether its a logical error in the example itself, or an error with libsweep is unclear.

Edit: Looking at the example code, I don't see where we send a DX stop command to the sensor. However, after the example runs the device does stop transmitting serial data (judging from the TX led on the USB Serial adapter which stops blinking). It was my understanding that the device would continue to transmit data regardless of the state of the host port.... so somehow the device is getting that stop command. Is it hidden somewhere I might have overlooked?

If the example does trigger the host to send a DX stop data command before or during the process of shutting down, are the still incoming Data Block receipts handled properly? Recall that after sending a DX command, the host should still expect to receive Data Block receipts until the DX receipt is received, which indicates no further Data Block receipts are coming. Is it possible that the host sends the DX and then immediately shuts down? This might be leaving incoming bytes in the serial buffer that are misinterpreted at the beginning of the example's next execution... producing the observed alternating errors.

@dcyoung dcyoung changed the title Examples Error on subsequent runs without device reset Examples Error on alternating runs without device reset Jan 19, 2017
@daniel-j-h daniel-j-h added the Bug label Jan 19, 2017
@daniel-j-h
Copy link
Collaborator

No, this should not happen. I never use the reset command in my examples and playground work (except for testing it once while implementing it). But I also used some older device versions.

The sweep_device_stop_scanning function is the only place where we send the stop command. We

  • send the stop command
  • wait a bit (kinda ugly) for the device to receive and stop sending data
  • send the stop command again and wait for the ack response

When we initialize a new Sweep device in sweep_device_construct we also call the sweep_device_stop_scanning function here to work around already started devices.

I remember talking to @kent-williams about this but back then this was what we were supposed to do.

@daniel-j-h
Copy link
Collaborator

Just tried to reproduce this issue with the device I have here (note this is a old version, the new ones will hopefully arrive next week). But I was unable to.

What I tried:

  • compile and install libsweep
  • compile the examples
  • plug in the device
  • start the viewer example (works), stop the example
  • start the viewer example a second time (works), stop the example
  • unplug the device
  • plug in the device
  • start the viewer example (works), stop the example
  • start the viewer example a second time (works), stop the example

All the viewer examples does is starting the scanning and receiving responses here.

As soon as the new units arrive I can test and try to reproduce again. Any hints from your side what could happen here? cc @kent-williams @dcyoung

@dcyoung
Copy link
Author

dcyoung commented Jan 23, 2017

Odd that the behavior was not reproduce-able. I'll double check on my end again in a bit.

I never use the reset command in my examples

Sorry for the confusion... When i said "reset", I was actually referring to resetting the device manually (unplugging it and plugging it back in), not sending the reset command over serial.

The sweep_device_stop_scanning function is the only place where we send the stop command

I'm aware that the stop command is transmitted in the referenced method, but I'm still unable to find where the method sweep_device_stop_scanning is called in the examples, post data acquisition? Consider example.c. It appears that:

  • sweep_device_start_scanning method initiates the acquisition
  • a scan is read with sweep_device_get_scan, and then parsed with other methods (REPEATS 10 times)
  • the device is destroyed using sweep_device_destruct

Am I missing where the sweep_device_stop_scanning method comes into play? It seems like it would be placed at the end of the acquisition, and before the destruction of the device.

Edit: I missed the bit in your original response regarding the behavior of sweep_device_construct. Upon further inspection, I see that the sweep_device_construct calls the sweep_device_stop_scanning method. While it is a good idea to ensure data is not streaming when you create a new device, i think it should also be standard practice to stop scanning in the following scenarios:

  • prompted by the application code once desired data acquisition is over (ie: prompted by the example.c when it finishes gathering 10 scans)
  • if the device is destroyed while it is scanning

In example.c, adding a call to sweep_device_stop_scanning immediately before the call to sweep_device_destruct solved the issue. This should probably be added to the examples to indicate best practice... but also this case (destroying a device when it is acquiring data) should be considered by the SDK and dealt with.

Destroying a device would obviously occur in multiple circumstances, some of which might not even permit data transmission (ie: disconnected device). We want destroy the device gracefully in all situations. So, perhaps we should consider a shutdown method that first stops any existing scanning and then destroys the device.

Edit 2: Given that sweep_device_construct calls sweep_device_stop_scanning, I'm still unsure why I received errors on subsequent runs before my modifications to the example.c. @kent-williams was mentioning that the serial port needs to be flushed on destruction, which I don't think we're doing currently. Perhaps the issue is accumulating contents in the buffer before the next sweep_device_construct is called. It seems that the sweep_serial_device_construct and sweep_device_stop_scanning are the only places we flush the buffer. Not entirely clear on all the possible sequence of events yet, but this might be worth looking into. At the very least we should probably flush the port when it is destroyed, ie: gracefully shutdown in sweep_serial_device_destruct, which again should only be done after we have guaranteed that data streaming has stopped (either by detecting a disconnected device or by sending the stop command to a connected device).

@daniel-j-h
Copy link
Collaborator

daniel-j-h commented Jan 25, 2017

Am I missing where the sweep_device_stop_scanning method comes into play? It seems like it would be placed at the end of the acquisition, and before the destruction of the device.

Good catch, this could be done here before shutdown.

Not sure if the sweep_device_destruct function should stop the device from scanning. I'd like to keep the symmetry as in device construct / destruct, start / stop scanning. But then again we can just call sweep_device_stop_scanning in sweep_device_destruct:

  • in case the user already sent the stop command we exit early here anyway
  • in case the user did not send the stop command, we handle it for the user transparently

@kent-williams was mentioning that the serial port needs to be flushed on destruction, which I don't think we're doing currently.

Oh? This is good to know! If so we should do this in the destruct function here where we still have access to the serial port. Check the serial interface for a flush function.

To summarize:

  • Let sweep_device_destruct call sweep_device_stop_scanning before destructing the device
  • Flush the serial device before destructing it in sweep_device_destruct. Should this be done in sweep_serial_device_destruct? Strictly speaking flushing the device is the Sweep driver's and not the serial driver's responsibility.
  • Adapt examples (and bindings) to still call sweep_device_stop_scanning to show user API symmetry
  • Verify with device and check if this fixes your issue

@daniel-j-h
Copy link
Collaborator

The new units just arrived and I could reproduce the issue there!
For the record the old unit's serial no. was in the two digit range, this is in the three digit range.

I created a pull request fixing this issue and could confirm it working now on the recent units I have here.

@dcyoung
Copy link
Author

dcyoung commented Jan 25, 2017

Looks good, I was using the same internal fix myself. This resolves issues with subsequent runs. This issue can be effectively closed when the PR is merged.

The example-c++ and example-viewer are still throwing an occasional (read ~10% of the time) error...

unable to receive stop scanning command response

However, this is being thrown on program exit, not startup. From some early debugging, it looks like this is because the sleep duration in the sweep_device_stop_scanning method is occasionally too short. Increasing it a bit resolves the issue, but is NOT the ideal fix. This is related to a different issue, continue it here #27

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants