splash:go() does not always return the same HAR data #646

Closed
k-liao opened this issue Jul 12, 2017 · 1 comment

@k-liao
k-liao commented Jul 12, 2017

When I call splash:go() with the URL http://bit.ly/1dNVPAW and then wait 0.5 seconds with splash:wait(), I don't always get the same HAR entries (see screenshots). I want to use splash:history() to build a redirect chain, covering both JavaScript and HTTP redirects, starting from a given URL, but the HAR data has been inconsistent between runs.

Is there a better or more reliable API call for building the redirect chain?

Here's the Lua script that I'm using:

```lua
function main(splash)
    assert(splash:go(splash.args.url))
    splash:wait{time=0.5}
    return {
        har = splash:har(),
    }
end
```
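As a rough sketch of the goal described above, the script could return only the sequence of main-page URLs and status codes instead of the full HAR. The helper name `build_redirect_chain` is hypothetical, the 2-second wait is an arbitrary assumption, and the code assumes each `splash:history()` entry carries `request.url` and `response.status` as in the HAR format.

```lua
-- Hypothetical helper: turn HAR entries (as returned by splash:history())
-- into an ordered list of {url = ..., status = ...} steps.
-- Assumes each entry has request.url and (optionally) response.status.
function build_redirect_chain(entries)
    local chain = {}
    for _, entry in ipairs(entries) do
        -- status stays nil if an entry has no response recorded
        local status = entry.response and entry.response.status
        chain[#chain + 1] = {url = entry.request.url, status = status}
    end
    return chain
end

function main(splash)
    assert(splash:go(splash.args.url))
    -- Assumed value: a longer wait gives JavaScript redirects more time to fire.
    splash:wait(2.0)
    return {
        chain = build_redirect_chain(splash:history()),
    }
end
```

Because this reads the same history data, it would not remove the inconsistency by itself; any step served from cache may still be missing.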

[Screenshot 2017-07-12 at 2:37:32 PM]

[Screenshot 2017-07-12 at 2:37:44 PM]

@nirvana-msu
This is likely because cached requests/responses do not appear in the HAR or in splash:history(). There is more discussion in scrapy-plugins/scrapy-splash#168. You could work around this by building a custom Splash image with caching disabled.

3 participants