Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTP2 connection not being reset when error received #1401

Open
davidmytton opened this issue Aug 24, 2024 · 4 comments
Open

HTTP2 connection not being reset when error received #1401

davidmytton opened this issue Aug 24, 2024 · 4 comments
Assignees
Labels
bug Something isn't working node Related to the Node SDK.

Comments

@davidmytton
Copy link
Contributor

A user is receiving various HTTP2 errors e.g. {"error":{"message":"[internal] Stream closed with error code NGHTTP2_ENHANCE_YOUR_CALM"}} and {"error":{"message":"[internal] Stream closed with error code NGHTTP2_INTERNAL_ERROR"}} for every call to decide, which is then being logged via report.

I'm unsure about the cause of this, but it may be a timeout or error on a previous request. This is referenced in various places, but all suggest fixes have already gone out:

The only way to resolve this is to restart the Node process, which isn't ideal. When we get these errors, can we re-establish the connection?

@davidmytton davidmytton added bug Something isn't working node Related to the Node SDK. labels Aug 24, 2024
@davidmytton davidmytton changed the title Connection not being reset when error received HTTP2 connection not being reset when error received Aug 25, 2024
@blaine-arcjet
Copy link
Contributor

This is a message the load balancer sends back to the client. It can be handled in the client via killing the entire session and reopening from scratch, but we don't have control over that with ConnectRPC. This is something we can revisit when we write our own session manager, but in the meantime @davidmytton can look into the load balancer configuration.

@davidmytton
Copy link
Contributor Author

The load balancer isn't under our control - it's from Fly.io. I've opened a support case with them.

@davidmytton
Copy link
Contributor Author

Fly have now confirmed their proxy may return NGHTTP2_ENHANCE_YOUR_CALM in some circumstances. I'm still working with them to figure out why. I expect it's to do with the network jitter and tail latency we're seeing in yul and yuz whereby enough timeouts cause it to hit an error threshold.

@davidmytton
Copy link
Contributor Author

davidmytton commented Aug 31, 2024

We have had the high tail latency (network jitter) support case open with Fly.io for over a week. The tail latency causes internal slowdowns in our API which results in a timeout on the client, and I think goes back to the Fly proxy as a reset. Given enough timeouts / resets (1024 by default), this causes the Fly proxy to trigger ENHANCE_YOUR_CALM (here) which node reports as NGHTTP2_ENHANCE_YOUR_CALM.

This is what hyperium/h2 describes at:

https://github.com/hyperium/h2/blob/90359ba6d38843b106967a6ac9419a500ea26873/src/server.rs#L892

However, that error isn't being handled properly by the client - it continues to send requests on the same connection. What we see is the Arcjet SDK client connects to our API endpoint on Fly, Fly proxy proxies it our app, the API processes the request normally and returns the response, the client gets NGHTTP2_ENHANCE_YOUR_CALM. It never sees the response we send.

Some additional notes:

  • Fly report they're receiving a reset - could this be the connection being reset due to the timeout? See the stream tracking logging below (from here in hyperium/h2)
  • The full error that the Arcjet client sees is [internal] Stream closed with error code NGHTTP2_INTERNAL_ERROR
  • When it receives that error it should end the stream

https://github.com/connectrpc/connect-es/blob/dc7d75a1eaaac12a06b4dcffd0cfd370c5432648/packages/connect-node/src/node-universal-handler.spec.ts#L122

https://github.com/nodejs/node/blob/2f0b3713efd1f8f06a38115393cee8531803ce43/lib/internal/errors.js#L1307

  • Which I think is then coming into connect-es here:

https://github.com/connectrpc/connect-es/blob/dc7d75a1eaaac12a06b4dcffd0cfd370c5432648/packages/connect-node/src/node-error.ts#L22

Stream tracking

Aug 31, 2024 @ 19:59:04.147141000
received Reset { stream_id: StreamId(1306619), error_code: ENHANCE_YOUR_CALM }

Aug 31, 2024 @ 19:59:04.146676000
send Data { stream_id: StreamId(1306619), flags: (0x1: END_STREAM) }

Aug 31, 2024 @ 19:59:04.146645000
send Headers { stream_id: StreamId(1306619), flags: (0x4: END_HEADERS) }

Aug 31, 2024 @ 19:59:04.142444000
received Data { stream_id: StreamId(1306619), flags: (0x1: END_STREAM) }

Aug 31, 2024 @ 19:59:04.141560000
received Data { stream_id: StreamId(1306619) }

Aug 31, 2024 @ 19:59:04.141500000
received Headers { stream_id: StreamId(1306619), flags: (0x4: END_HEADERS) }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working node Related to the Node SDK.
Projects
None yet
Development

No branches or pull requests

2 participants