Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

After an hour without use, client instances start spontaneously logging error messages related to OAUTHBEARER invalid tokens #49

Open
lpsinger opened this issue Dec 4, 2024 · 0 comments · May be fixed by #22
Labels
bug Something isn't working

Comments

@lpsinger
Copy link
Member

lpsinger commented Dec 4, 2024

Current behavior

If you create a client object but do not call any transport methods (poll, consume, produce) for an hour or more, then the Kafka client neglects to renew the access token JWT. Once the original JWT expires, the Python console spontaneously starts printing:

%4|1733276150.871|MAXPOLL|rdkafka#consumer-1| [thrd:main]: Application maximum poll interval (300000ms) exceeded by 72ms (adjust max.poll.interval.ms for long-running message processing): leaving group
%6|1733279074.050|FAIL|rdkafka#consumer-1| [thrd:sasl_ssl://kafka.test.gcn.nasa.gov:9092/bootstrap]: sasl_ssl://kafka.test.gcn.nasa.gov:9092/bootstrap: Disconnected (after 20ms in state AUTH_HANDSHAKE)
%3|1733279074.050|FAIL|rdkafka#consumer-1| [thrd:sasl_ssl://kafka.test.gcn.nasa.gov:9092/bootstrap]: sasl_ssl://kafka.test.gcn.nasa.gov:9092/bootstrap: SASL OAUTHBEARER mechanism handshake failed: Local: Broker transport failure: broker's supported mechanisms: (n/a) (after 0ms in state DOWN)
%6|1733279074.077|FAIL|rdkafka#consumer-1| [thrd:GroupCoordinator]: GroupCoordinator: kafka1.test.gcn.nasa.gov:9092: Disconnected (after 17ms in state AUTH_HANDSHAKE)
%3|1733279074.077|FAIL|rdkafka#consumer-1| [thrd:GroupCoordinator]: GroupCoordinator: kafka1.test.gcn.nasa.gov:9092: SASL OAUTHBEARER mechanism handshake failed: Local: Broker transport failure: broker's supported mechanisms: (n/a) (after 0ms in state DOWN)
%3|1733279433.943|FAIL|rdkafka#consumer-1| [thrd:GroupCoordinator]: GroupCoordinator: kafka1.test.gcn.nasa.gov:9092: SASL OAUTHBEARER mechanism handshake failed: Broker: Request not valid in current SASL state: broker's supported mechanisms:  (after 28ms in state AUTH_HANDSHAKE)
%3|1733279433.945|FAIL|rdkafka#consumer-1| [thrd:sasl_ssl://kafka.test.gcn.nasa.gov:9092/bootstrap]: sasl_ssl://kafka.test.gcn.nasa.gov:9092/bootstrap: SASL OAUTHBEARER mechanism handshake failed: Broker: Request not valid in current SASL state: broker's supported mechanisms:  (after 30ms in state AUTH_HANDSHAKE)
%3|1733279433.946|FAIL|rdkafka#consumer-1| [thrd:sasl_ssl://kafka2.test.gcn.nasa.gov:9092/2]: sasl_ssl://kafka2.test.gcn.nasa.gov:9092/2: SASL OAUTHBEARER mechanism handshake failed: Broker: Request not valid in current SASL state: broker's supported mechanisms:  (after 31ms in state AUTH_HANDSHAKE)
%6|1733279434.146|FAIL|rdkafka#consumer-1| [thrd:GroupCoordinator]: GroupCoordinator: kafka1.test.gcn.nasa.gov:9092: Disconnected (after 15ms in state UP)
%6|1733279434.163|FAIL|rdkafka#consumer-1| [thrd:sasl_ssl://kafka1.test.gcn.nasa.gov:9092/1]: sasl_ssl://kafka1.test.gcn.nasa.gov:9092/1: Disconnected (after 22ms in state UP)
%6|1733279434.346|FAIL|rdkafka#consumer-1| [thrd:sasl_ssl://kafka1.test.gcn.nasa.gov:9092/1]: sasl_ssl://kafka1.test.gcn.nasa.gov:9092/1: Disconnected (after 16ms in state UP, 1 identical error(s) suppressed)
%6|1733279434.368|FAIL|rdkafka#consumer-1| [thrd:sasl_ssl://kafka.test.gcn.nasa.gov:9092/bootstrap]: sasl_ssl://kafka.test.gcn.nasa.gov:9092/bootstrap: Disconnected (after 143ms in state UP)
%6|1733279434.386|FAIL|rdkafka#consumer-1| [thrd:GroupCoordinator]: GroupCoordinator: kafka1.test.gcn.nasa.gov:9092: Disconnected (after 76ms in state UP, 1 identical error(s) suppressed)
%6|1733279434.502|FAIL|rdkafka#consumer-1| [thrd:sasl_ssl://kafka2.test.gcn.nasa.gov:9092/2]: sasl_ssl://kafka2.test.gcn.nasa.gov:9092/2: Disconnected (after 46ms in state UP)
%6|1733279434.521|FAIL|rdkafka#consumer-1| [thrd:sasl_ssl://kafka3.test.gcn.nasa.gov:9092/3]: sasl_ssl://kafka3.test.gcn.nasa.gov:9092/3: Disconnected (after 34ms in state UP)
%6|1733279434.560|FAIL|rdkafka#consumer-1| [thrd:sasl_ssl://kafka.test.gcn.nasa.gov:9092/bootstrap]: sasl_ssl://kafka.test.gcn.nasa.gov:9092/bootstrap: Disconnected (after 18ms in state UP, 1 identical error(s) suppressed)
%4|1733279434.690|FAIL|rdkafka#consumer-1| [thrd:sasl_ssl://kafka2.test.gcn.nasa.gov:9092/2]: sasl_ssl://kafka2.test.gcn.nasa.gov:9092/2: Disconnected (after 14ms in state UP, 1 identical error(s) suppressed)
%6|1733279434.714|FAIL|rdkafka#consumer-1| [thrd:sasl_ssl://kafka3.test.gcn.nasa.gov:9092/3]: sasl_ssl://kafka3.test.gcn.nasa.gov:9092/3: Disconnected (after 20ms in state UP, 1 identical error(s) suppressed)
%3|1733279467.855|FAIL|rdkafka#consumer-1| [thrd:sasl_ssl://kafka1.test.gcn.nasa.gov:9092/1]: sasl_ssl://kafka1.test.gcn.nasa.gov:9092/1: SASL authentication error: {"status":"invalid_token"} (after 387ms in state AUTH_REQ)
%3|1733279469.507|FAIL|rdkafka#consumer-1| [thrd:GroupCoordinator]: GroupCoordinator: kafka1.test.gcn.nasa.gov:9092: SASL authentication error: {"status":"invalid_token"} (after 356ms in state AUTH_REQ)
%3|1733279470.109|FAIL|rdkafka#consumer-1| [thrd:sasl_ssl://kafka.test.gcn.nasa.gov:9092/bootstrap]: sasl_ssl://kafka.test.gcn.nasa.gov:9092/bootstrap: SASL authentication error: {"status":"invalid_token"} (after 287ms in state AUTH_REQ)
%3|1733279470.579|FAIL|rdkafka#consumer-1| [thrd:sasl_ssl://kafka2.test.gcn.nasa.gov:9092/2]: sasl_ssl://kafka2.test.gcn.nasa.gov:9092/2: SASL authentication error: {"status":"invalid_token"} (after 465ms in state AUTH_REQ)
%3|1733279471.612|FAIL|rdkafka#consumer-1| [thrd:sasl_ssl://kafka3.test.gcn.nasa.gov:9092/3]: sasl_ssl://kafka3.test.gcn.nasa.gov:9092/3: SASL authentication error: {"status":"invalid_token"} (after 462ms in state AUTH_REQ)
%3|1733279477.909|FAIL|rdkafka#consumer-1| [thrd:sasl_ssl://kafka1.test.gcn.nasa.gov:9092/1]: sasl_ssl://kafka1.test.gcn.nasa.gov:9092/1: SASL authentication error: {"status":"invalid_token"} (after 437ms in state AUTH_REQ, 1 identical error(s) suppressed)

From then on, the last 6 lines are repeated once a second, and at that point the Kafka server log starts to fill up with the expired JWT messages.

On the server side, we are flooded with these error messages once per second:

[2024-11-08 11:19:59,871] WARN Could not validate the access token: JWT (claims->
{
    "sub": "(redacted)",
    "token_use": "access",
    "scope": "gcn.nasa.gov/kafka-public-consumer",
    "auth_time": 1727363658,
    "iss": "https://cognito-idp.us-east-1.amazonaws.com/us-east-1_r6fgBMOKr",
    "exp": 1727367258,
    "iat": 1727363658,
    "version": 2,
    "jti": "(redacted)",
    "client_id": "(redacted)"
}
) rejected due to invalid claims or other invalid content. Additional details: [[1] The JWT is no longer valid - the evaluation time NumericDate{1731064799 -> Nov 8, 2024, 11:19:59 AM UTC} is on or after the Expiration Time (exp=NumericDate{1727367258 -> Sep 26, 2024, 4:14:18 PM UTC}) claim value (even when providing 30 seconds of leeway to account for clock skew).] (org.apache.kafka.common.security.oauthbearer.OAuthBearerValidatorCallbackHandler)

Expected behavior

The Kafka client should automatically renew the JWT in a background thread. The client should never send expired JWTs to the server.

Steps to reproduce

  1. Execute our Python client sample code (https://gcn.nasa.gov/docs/client#python) in an interactive Python console.
  2. After a few seconds, type Control-C to interrupt the while loop in the sample code. The client is now still connected, but not polling regularly.
  3. Wait an hour for the original JWT to expire.

Environment

Any

@lpsinger lpsinger added the bug Something isn't working label Dec 4, 2024
lpsinger added a commit to lpsinger/gcn-kafka-python that referenced this issue Dec 4, 2024
lpsinger added a commit to lpsinger/gcn-kafka-python that referenced this issue Dec 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant