Device 697EE stops updating on AWS #76

Open
jlandau10 opened this issue Mar 4, 2020 · 8 comments

Comments

@jlandau10
Contributor

Figure out why it stops updating. It looks like the RTC isn't changing properly.

At 4:43 PM PST on 3/3/20: tried to connect over Bluetooth, uncredentialed, to get a printout. Waited a few minutes and tried again:

{2020-03-03T23:53:20Z INFO src\Bluetooth.cpp:268 GAP_ConnectionComplete_CB , device : 77-1C-8D-CC-1C-5E }
{2020-03-03T23:53:20Z INFO src\Bluetooth.cpp:274 GAP_DisconnectionComplete_CB , _ : BLE Disconnected }
{2020-03-03T23:53:20Z INFO src\Bluetooth.cpp:268 GAP_ConnectionComplete_CB , device : 77-1C-8D-CC-1C-5E }
{2020-03-03T23:53:20Z INFO src\Bluetooth.cpp:274 GAP_DisconnectionComplete_CB , _ : BLE Disconnected

23:53 UTC is 3:53 PM PST, so the device clock is roughly 50 minutes behind the time of the attempt.

Last AWS update:

Shadow update accepted Mar 3, 2020 3:49:57 PM -0800

Disconnected:

Thing disconnected Mar 3, 2020 4:22:50 PM -0800

{
"clientId": "0123CCBCCC98B697EE",
"timestamp": 1583281370574,
"eventType": "disconnected",
"clientInitiatedDisconnect": false,
"sessionIdentifier": "bc0979f4-4110-4254-b3bd-be1174219d8a",
"principalIdentifier": "2393989bc18ff0ef0a7670cd4d519ac712dd0ece790ac91fe1240c071f14e170",
"disconnectReason": "MQTT_KEEP_ALIVE_TIMEOUT",
"versionNumber": 5
}

Makes sense that it would forget to keep alive if it doesn't know what time it is.
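
If the keep-alive really is scheduled from the wall clock (an assumption; the thread doesn't show the MQTT scheduling code), one way to decouple it is to time pings from a monotonic millisecond counter such as Arduino's `millis()`. A minimal sketch, where `KeepAliveTimer` and its methods are hypothetical names and not part of the firmware:

```cpp
#include <cstdint>

// Hypothetical helper: decides when an MQTT keep-alive ping is due using a
// monotonic millisecond counter (e.g. millis()) instead of the RTC.
class KeepAliveTimer {
public:
    explicit KeepAliveTimer(uint32_t intervalMs)
        : intervalMs_(intervalMs), lastSentMs_(0) {}

    // Record that a packet was just sent to the broker.
    void markSent(uint32_t nowMs) { lastSentMs_ = nowMs; }

    // True once at least intervalMs_ has elapsed since the last send.
    // Unsigned subtraction stays correct across the 32-bit rollover.
    bool pingDue(uint32_t nowMs) const {
        return static_cast<uint32_t>(nowMs - lastSentMs_) >= intervalMs_;
    }

private:
    uint32_t intervalMs_;
    uint32_t lastSentMs_;
};
```

With this shape, a stuck RTC would no longer suppress heartbeats, because `pingDue()` only depends on uptime.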

@jlandau10
Contributor Author

Added a bunch of print logs at each keep-time loop to determine when it dies. Unfortunately, now I have to sit here and wait, and it probably won't even happen this time.

Been going well for half an hour or so

10:16:11.604 -> {2020-03-04T18:16:11Z INFO src\\System.cpp:279 keepTime , _ : internet time }
10:16:11.641 -> {2020-03-04T18:16:11Z INFO src\\System.cpp:284 keepTime , _ : keep time }
10:16:12.688 -> {2020-03-04T18:16:12Z INFO src\\System.cpp:276 keepTime , _ : 10 mod has remainder, millis }
10:16:12.891 -> {2020-03-04T18:16:12Z INFO src\\System.cpp:284 keepTime , _ : keep time }
10:16:13.951 -> {2020-03-04T18:16:13Z INFO src\\System.cpp:276 keepTime , _ : 10 mod has remainder, millis }
10:16:13.951 -> {2020-03-04T18:16:13Z INFO src\\System.cpp:284 keepTime , _ : keep time }

@jlandau10
Contributor Author

It disconnected from AWS, but not under the same condition: it was then basically stuck, never updating its time, until it crashed:

Thing disconnected Mar 4, 2020 4:23:06 PM -0800

{
  "clientId": "0123CCBCCC98B697EE",
  "timestamp": 1583367786066,
  "eventType": "disconnected",
  "clientInitiatedDisconnect": false,
  "sessionIdentifier": "a2a3676e-17f6-4d14-bf3c-9550e28086c0",
  "principalIdentifier": "2393989bc18ff0ef0a7670cd4d519ac712dd0ece790ac91fe1240c071f14e170",
  "disconnectReason": "CONNECTION_LOST",
  "versionNumber": 9
}
16:31:59.023 -> cede
16:31:59.023 -> 1ff9a
16:31:59.023 -> 1ff9a
16:31:59.023 -> cfd4
16:31:59.023 -> 1067c
16:31:59.023 -> cfb6
16:31:59.161 -> ret: 1,j: 72
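
The `timestamp` fields in the two AWS lifecycle events are Unix epoch milliseconds. To line them up with the `...Z` timestamps in the device logs, they can be converted to UTC. A small sketch (the function name `epochMsToIsoUtc` is made up for illustration):

```cpp
#include <cstdint>
#include <cstdio>
#include <ctime>
#include <string>

// Convert Unix epoch milliseconds to an ISO 8601 UTC string (seconds precision).
// Returns an empty string if the conversion fails.
std::string epochMsToIsoUtc(int64_t epochMs) {
    std::time_t secs = static_cast<std::time_t>(epochMs / 1000);
    // std::gmtime returns a pointer to shared static storage, so copy it out.
    const std::tm* p = std::gmtime(&secs);
    if (p == nullptr) return "";
    std::tm utc = *p;
    char buf[32];
    std::strftime(buf, sizeof buf, "%Y-%m-%dT%H:%M:%SZ", &utc);
    return buf;
}
```

`epochMsToIsoUtc(1583281370574)` gives `2020-03-04T00:22:50Z` and `epochMsToIsoUtc(1583367786066)` gives `2020-03-05T00:23:06Z`, which match the 4:22:50 PM and 4:23:06 PM -0800 times reported for the two disconnects.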

@jlandau10
Contributor Author

another crash:

16:38:03.972 -> cede
16:38:03.972 -> cfd4
16:38:03.972 -> 1067c
16:38:03.972 -> cfb6
16:38:03.972 -> 107fa
16:38:03.972 -> d284
16:38:04.074 -> ret: 1,j: 72

and again on the mqtt connect:

16:38:29.791 -> {2020-03-05T00:38:28Z INFO src\\Cellular.cpp:139 connect , carrier : AT&T Sierra Wireless , connectTime :7292, signal :99}
16:38:29.827 -> {2020-03-05T00:38:28Z DEBUG src\\Mqtt.cpp:96 connect , signal :99}
16:38:29.827 -> {2020-03-05T00:38:28Z INFO src\\Mqtt.cpp:101 connect , broker : a2ink9r2yi1ntl-ats.iot.us-east-2.amazonaws.com }
16:42:47.714 -> shutdown() ret: 1
16:42:47.749 -> cede
16:42:47.749 -> cfd4
16:42:47.749 -> 1067c
16:42:47.749 -> cfb6
16:42:47.749 -> 107fa
16:42:47.749 -> d284
16:42:47.888 -> ret: 1,j: 72

this is not the same failure condition.

trying a hard reset

@jlandau10
Contributor Author

Still hitting the watchdog timeout on the MQTT connect.

16:47:12.255 -> {2020-03-05T00:47:11Z INFO src\\Cellular.cpp:139 connect , carrier : AT&T Sierra Wireless , connectTime :8559, signal :18}
16:47:12.290 -> {2020-03-05T00:47:11Z DEBUG src\\Mqtt.cpp:96 connect , signal :18}
16:47:12.324 -> {2020-03-05T00:47:11Z INFO src\\Mqtt.cpp:101 connect , broker : a2ink9r2yi1ntl-ats.iot.us-east-2.amazonaws.com }
16:51:30.137 -> shutdown() ret: 1
16:51:30.205 -> cede
16:51:30.205 -> cfd4
16:51:30.205 -> 1067c
16:51:30.205 -> cfb6
16:51:30.205 -> 107fa
16:51:30.205 -> d284
16:51:30.306 -> ret: 1,j: 72

The connection isn't failing and isn't refused; there's just nothing until about 4 and a half minutes into the attempt. Changing the WDT to 8M.

@jlandau10
Contributor Author

And now it dies at 8M, so it's confirmed to be the WDT. The EEPROM isn't corrupted, since it reports the right URL.

17:42:17.659 -> {2020-03-05T01:42:17Z INFO src\\Cellular.cpp:139 connect , carrier : AT&T Sierra Wireless , connectTime :8712, signal :18}
17:42:17.693 -> {2020-03-05T01:42:17Z DEBUG src\\Mqtt.cpp:96 connect , signal :18}
17:42:17.693 -> {2020-03-05T01:42:17Z INFO src\\Mqtt.cpp:101 connect , broker : a2ink9r2yi1ntl-ats.iot.us-east-2.amazonaws.com }
17:50:54.074 -> shutdown() ret: 1
17:50:54.142 -> cede
17:50:54.142 -> cfd4
17:50:54.142 -> 1067c
17:50:54.142 -> cfb6
17:50:54.142 -> 107fa
17:50:54.142 -> d284
17:50:54.247 -> ret: 1,j: 72
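
To check the WDT theory against the serial timestamps, the gap between the last `Mqtt.cpp:101 connect` line and the `shutdown()` line can be computed for both runs. A small sketch, assuming the serial monitor prefix always has the form `HH:MM:SS.mmm` with exactly three fractional digits and that both stamps fall on the same day (`parseClockMs` and `elapsedMs` are illustrative names):

```cpp
#include <cstdint>
#include <cstdio>

// Parse "HH:MM:SS.mmm" into milliseconds since midnight; returns -1 on bad input.
int64_t parseClockMs(const char* s) {
    int h = 0, m = 0, sec = 0, ms = 0;
    if (std::sscanf(s, "%d:%d:%d.%d", &h, &m, &sec, &ms) != 4) return -1;
    return ((static_cast<int64_t>(h) * 60 + m) * 60 + sec) * 1000 + ms;
}

// Milliseconds between two same-day serial timestamps, or -1 if either is malformed.
int64_t elapsedMs(const char* start, const char* end) {
    const int64_t a = parseClockMs(start);
    const int64_t b = parseClockMs(end);
    if (a < 0 || b < 0) return -1;
    return b - a;
}
```

For the earlier run, `elapsedMs("16:47:12.324", "16:51:30.137")` is 257813 ms (about 4 min 18 s); for this run, `elapsedMs("17:42:17.693", "17:50:54.074")` is 516381 ms (about 8 min 36 s). The second gap is almost exactly double the first, which is consistent with the reset tracking the WDT setting.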

@jlandau10
Contributor Author

hasn't been connected to aws to update the shadow since 3/5.

Here is the "i don't know what time it is" printout:

16:34:31.950 -> {2020-03-06T02:05:40Z INFO src\\System.cpp:279 keepTime , _ : internet time }
16:34:31.950 -> {2020-03-06T02:05:40Z INFO src\\System.cpp:284 keepTime , _ : keep time }
16:34:32.917 -> {2020-03-06T02:05:40Z INFO src\\System.cpp:279 keepTime , _ : internet time }
16:34:32.917 -> {2020-03-06T02:05:40Z INFO src\\System.cpp:284 keepTime , _ : keep time }
16:34:33.881 -> {2020-03-06T02:05:40Z INFO src\\System.cpp:279 keepTime , _ : internet time }
16:34:33.881 -> {2020-03-06T02:05:40Z INFO src\\System.cpp:284 keepTime , _ : keep time }
16:34:34.848 -> {2020-03-06T02:05:40Z INFO src\\System.cpp:279 keepTime , _ : internet time }
16:34:34.883 -> {2020-03-06T02:05:40Z INFO src\\System.cpp:284 keepTime , _ : keep time }
16:34:35.810 -> {2020-03-06T02:05:40Z INFO src\\System.cpp:279 keepTime , _ : internet time }
16:34:35.844 -> {2020-03-06T02:05:40Z INFO src\\System.cpp:284 keepTime , _ : keep time }

changed time keeping code to:

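  if (time % 10 != 0) {
    // time is not a multiple of 10: advance the clock from millis() only.
    keepTimeWithMillis();
    logInfo("10 mod has remainder, millis");
  } else if (Internet.isConnected()) {
    // time is a multiple of 10 and we are online: resync from network time.
    setTimes(Internet.getTime());
    logInfo("internet time",Internet.isConnected(),Internet.getTime());
  } else if (!Gps.poll()) {
    // Offline and Gps.poll() returned false: fall back to millis().
    keepTimeWithMillis();
    logInfo("no gps poll, millis");
  }
  logInfo("keep time");
}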

I think that if Internet.getTime() returns 0, we need to fall back to keepTimeWithMillis().
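
A possible reading of the stuck printout: the stuck timestamp (02:05:40) is a multiple of 10 seconds, so with the code above every pass takes the internet branch; if `Internet.getTime()` keeps returning the same value, `time` never leaves that multiple of 10 and the millis branch is never reached. A minimal sketch of the fallback idea, covering both a zero and a stale network time. `Clock`, `advanceWithMillis`, and `updateClock` are hypothetical stand-ins, not the firmware's real `setTimes()` / `keepTimeWithMillis()`:

```cpp
#include <cstdint>

// Hypothetical stand-in for the firmware's clock state.
struct Clock {
    uint32_t epochSec;    // current best estimate of Unix time, in seconds
    uint32_t lastMillis;  // millis() reading that epochSec corresponds to
};

// Advance the clock by elapsed uptime (roughly what keepTimeWithMillis()
// is assumed to do). The sub-second remainder is carried to the next call.
void advanceWithMillis(Clock& c, uint32_t nowMillis) {
    const uint32_t elapsed = nowMillis - c.lastMillis;  // rollover-safe
    c.epochSec += elapsed / 1000;
    c.lastMillis += (elapsed / 1000) * 1000;
}

// Always advance from millis first, then accept network time only if it is
// nonzero and not behind the local estimate. Returns true if it was accepted.
bool updateClock(Clock& c, uint32_t networkEpochSec, uint32_t nowMillis) {
    advanceWithMillis(c, nowMillis);
    if (networkEpochSec == 0 || networkEpochSec < c.epochSec) {
        return false;  // zero or stale: keep the millis-based estimate
    }
    c.epochSec = networkEpochSec;
    return true;
}
```

The trade-off is that a local clock that has run ahead will reject a correct network time until the network catches up; a real version would probably want a tolerance window or a trusted source such as GPS to break the tie.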

@jlandau10
Contributor Author

Okay, so there are a couple of issues.

  1. It takes too long to connect and triggers the WDT. This occurs at flag marker 2.1, somewhere in BearSSL I think.
  2. It loses time, so it doesn't send heartbeats and times out.
  3. After a reset, it is still connected, and the new connection is refused for a duplicate client ID.

Isolating problem 2 by skipping the MQTT aspect. Got it to get stuck on a time: it always takes the internet-time branch but never gets a new time.

{2020-03-18T21:31:40Z INFO src\\System.cpp:279 keepTime , internet time : ␡ }
{2020-03-18T21:31:40Z INFO src\\System.cpp:284 keepTime , _ : keep time }
{2020-03-18T21:31:40Z INFO src\\Main.cpp:

How does this happen?

@jlandau10
Contributor Author

It never loses time with just GPS and the cell antenna unplugged. Is it possible that this is a second damaged modem?
