Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reqwest seems to hang around after instrumented function is done #3090

Open
SorenHolstHansen opened this issue Sep 26, 2024 · 2 comments
Open

Comments

@SorenHolstHansen
Copy link

SorenHolstHansen commented Sep 26, 2024

Bug Report

Version

reqwest_tracing_bug v0.1.0
│   │   └── tracing v0.1.40
│   │       ├── tracing-attributes v0.1.27 (proc-macro)
│   │       └── tracing-core v0.1.32
│   │   │   └── tracing v0.1.40 (*)
│   │   └── tracing v0.1.40 (*)
│   │   └── tracing v0.1.40 (*)
│   └── tracing v0.1.40 (*)
│   │       │   └── tracing v0.1.40 (*)
│   │       └── tracing v0.1.40 (*)
│   └── tracing v0.1.40 (*)
├── tracing v0.1.40 (*)
├── tracing-opentelemetry v0.26.0
│   ├── tracing v0.1.40 (*)
│   ├── tracing-core v0.1.32 (*)
│   ├── tracing-log v0.2.0
│   │   └── tracing-core v0.1.32 (*)
│   └── tracing-subscriber v0.3.18
│       ├── tracing v0.1.40 (*)
│       ├── tracing-core v0.1.32 (*)
│       └── tracing-log v0.2.0 (*)
└── tracing-subscriber v0.3.18 (*)

Platform

Darwin Srens-MacBook-Air.local 23.6.0 Darwin Kernel Version 23.6.0: Wed Jul 31 20:53:05 PDT 2024; root:xnu-10063.141.1.700.5~1/RELEASE_ARM64_T8112 arm64

Description

We have noticed a weird situation in our production traces for our axum server where sometimes an endpoint, even if the endpoint makes a response very quickly, that the traces say that the endpoint took a very long time. This doesn't happen all the time, but when it happens, it is always related to reqwest calls (i.e. the reqwest crate).

A trace could look something like this (this is how it looks in our prod metrics platform in azure, you will see below that another platform shows the endpoint as shorter)

endpoint:     -------------
something:    --
reqwest_call:   -----------
stuff_after:       --
response:           --

that is, the response is done quite quickly, but because the reqwest call seems to hang, the whole endpoint shows as taking a long time. Now, the reqwest calls actually all finish very quickly. You can see that the "stuff_after" continues shortly after, and we have done other approaches to show that they actually finish fast. So something is hanging in the reqwest client that makes this show as taking a long time.

We even added a field to the root request span (using tower tracing) that shows that the endpoint is done quickly. So from the users point of view, nothing is taking a long time, but for us, we can't trust our tracing.

I made a reproducible case here: https://github.com/SorenHolstHansen/reqwest_tracing_bug

To reproduce, I started a local open telemetry platform (signoz or jaeger), started the server in the repo i linked, and then ran seq 1 5000 | xargs -P0 -I{} curl http://127.0.0.1:8008

In signoz, you should then be able to see cases like these:
Screenshot 2024-09-26 at 12 18 23

You can see in the bottom right, that the endpoint took 0 seconds, but resolve took 35 seconds.

I am not sure if this is strictly the fault of tracing, or something weird happens internally in reqwest or somewhere else.

@SorenHolstHansen SorenHolstHansen changed the title reqwest seems to hand around after instrumented function is done reqwest seems to hang around after instrumented function is done Sep 26, 2024
@mladedav
Copy link
Contributor

I believe stuff like this happens when something holds onto a span longer than expected. I know this happened to me with h2 and I think it could be a cause here since reqwest pools connections.

Basically the connection has a span that is a child of your endpoint span and thus keeps it alive.

You can try doing something like let _guard = info_span!(parent :none "dummy").entered(); just before doing the reqwest call (and maybe dropping the guard right after). That should fix the endpoint span but you'll no longer see the reqwest span in your trace.

Otherwise just filtering reqwest out should also fix your trace length.

I don't think this can really be fixed in tracing, tracing itself can just give advise on whether libraries should or should not hold onto spans like this and better convey the nuances and what will the user see.

@SorenHolstHansen
Copy link
Author

Aha, that makes a lot of sense! Thanks for the quick answer, I will open an issue over there as well

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants