Unmatched responses due to new contexts and other causes #44
Comments
…eb worker and service worker requests. Fixes #44
Thank you for such a detailed report! #46 fixes the issues with service workers and web workers (web workers are some of the "other" issues). As you guessed, attaching takes time, so I'm now forcing new contexts to wait for the debugger. This didn't work for popups for some reason, so I'll create a follow-up task to look at popups separately. In the case of ca.gov there is no missing request; rather, we are seeing an "unmatched response" event for a script created dynamically by https://translate.googleapis.com/element/TE_20210224_00/e/js/element/element_main.js . I have no idea why this produces a network request. You can debug cases like that by fetching request body for them (
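For readers unfamiliar with the "wait for the debugger" approach, here is a minimal sketch of the idea using Chrome DevTools Protocol commands. It is not the code from #46. It assumes `session` is any object with a promise-returning `send(method, params)`, such as a Puppeteer CDPSession. The function names and the recording stand-in session are hypothetical and exist only for illustration.

```javascript
// Sketch only: `session` is any object exposing send(method, params) that
// returns a promise, such as a Puppeteer CDPSession. The function names are
// hypothetical and are not taken from Tracker Radar Collector.

/**
 * Ask the browser to pause every newly created target (worker, service
 * worker, ...) until a debugger has attached and explicitly resumed it.
 */
async function pauseNewTargets(session) {
    await session.send('Target.setAutoAttach', {
        autoAttach: true,
        waitForDebuggerOnStart: true,
        flatten: true,
    });
}

/**
 * Once a session for the paused target exists, enable network events first
 * and only then let the target run, so its first requests are observed.
 */
async function instrumentAndResume(targetSession) {
    await targetSession.send('Network.enable');
    await targetSession.send('Runtime.runIfWaitingForDebugger');
}

// Hypothetical stand-in that records calls, so the ordering can be inspected
// without launching a browser.
function createRecordingSession() {
    const calls = [];
    return {
        calls,
        send: async (method, params = {}) => {
            calls.push({method, params});
            return {};
        },
    };
}
```

The ordering inside `instrumentAndResume` is the point of the sketch: if the target is resumed before `Network.enable` completes, the request for the worker script can go out unobserved and its response later shows up as unmatched.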
Thanks so much!
Thanks again @gunesacar for reporting this! I added an integration test that checks some of those edge cases (workers) - https://github.com/duckduckgo/tracker-radar-collector/blob/main/tests/integration/requestCollection.test.js - and created a follow-up to look into popups in #47. Please let me know if I missed something.
Hi!
Going through crawl data for the Tranco top 10K sites, I found that ~15-16% of websites have at least one unmatched response. Almost half of these websites have some new contexts (e.g. service workers) initialized during the page visit.
Just to be clear, compared to total requests/responses captured by Tracker Radar Collector, the unmatched ones seem to be rare occurrences. Also, this may be an upstream issue with Puppeteer, but I couldn't figure out how to find out. I wanted to give a heads up anyways.
Here's the breakdown of unmatched responses based on new contexts initialized on websites:
Service workers
770 (of the ~10K) sites have at least one service worker context initiated. 660 (86%) of these have at least one unmatched response. The unmatched response seems to come immediately after the service worker context initialization. Spot-checking the URLs, they look like requests to download the service worker script itself.
Reproducible on: pptr.dev, whatwg.org, coinbase.com
Web workers
81 sites have unmatched responses and an (other) context initiated. For a few of these scripts I checked the devtools in a headful browser and verified that they are web worker scripts.
Reproducible on: fivethirtyeight.com, venturebeat.com, marieclaire.com, twitch.tv
Popups
On a handful of sites unmatched responses seem to happen due to popup windows. To my surprise it turns out Puppeteer does not block (some?) popup windows as headful Chromium does. (Perhaps due to this open issue: puppeteer/puppeteer#6161)
Reproducible on: naukri.com, see screenshots below:
Headful Chrome
Tracker Radar Collector (with VISUAL_DEBUG=true)
Popup windows are represented as (page) context in the logs.
Other (no new contexts)
345 sites have no popups or new contexts such as service_worker, shared_worker, or (other), but they still have unmatched responses.
Reproducible on: ca.gov, 9gag.com, bbc.co.uk
I had the suspicion that attaching to targets may take a while for new contexts, during which some requests may be missed.
But that doesn't explain why requests are missed on websites with no new contexts.
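To make "unmatched response" concrete, here is a minimal sketch of how such a response can be detected. I am assuming it means a Network.responseReceived event whose requestId never appeared in a Network.requestWillBeSent event; this is not Tracker Radar Collector's actual matching code. The event objects below are hypothetical and reduced to the fields used.

```javascript
// Sketch only: events mimic the CDP Network.requestWillBeSent and
// Network.responseReceived payloads, reduced to the fields used here.

/**
 * Return the params of every response whose requestId was not seen in an
 * earlier request event. Events are processed in arrival order.
 */
function findUnmatchedResponses(events) {
    const seenRequestIds = new Set();
    const unmatched = [];
    for (const event of events) {
        if (event.method === 'Network.requestWillBeSent') {
            seenRequestIds.add(event.params.requestId);
        } else if (
            event.method === 'Network.responseReceived' &&
            !seenRequestIds.has(event.params.requestId)
        ) {
            unmatched.push(event.params);
        }
    }
    return unmatched;
}

// Hypothetical log: the request with id '2' was sent before the collector
// attached to the new context, so only its response was observed.
const events = [
    {method: 'Network.requestWillBeSent', params: {requestId: '1', request: {url: 'https://example.com/'}}},
    {method: 'Network.responseReceived', params: {requestId: '1', response: {url: 'https://example.com/'}}},
    {method: 'Network.responseReceived', params: {requestId: '2', response: {url: 'https://example.com/sw.js'}}},
];

console.log(findUnmatchedResponses(events).map(r => r.response.url));
```

Under the late-attach hypothesis, the request event for the worker script is the one that goes missing, which fits the service worker and web worker cases but not the sites with no new contexts.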
Anyhow, hope this helps. If you need any other info, or raw crawl data, just let me know.