Douban limits a single IP to at most 10 pages at a time #5

Open
gasolin opened this issue Dec 25, 2021 · 1 comment


gasolin commented Dec 25, 2021

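While testing today, I found that Douban limits a single IP to at most 10 pages within a short period.

I made a small change, adding a pagination = 1 parameter, as follows: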

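def export(user_id):
    urls = url_generator(user_id)
    info = []
    pagination = 1  # 1-based page to start from; raise it to resume a later run
    page_no = pagination
    for idx, url in enumerate(urls, start=1):
        if idx < pagination:
            continue  # skip pages already fetched in an earlier run
        if IS_OVER:  # or page_no == pagination + 5
            break
        print(f'开始处理第 {page_no} 页...')  # "Processing page {page_no}..."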
...

By adjusting the pagination value and switching to a different VPN server for each run, everything can be fetched.
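The resume-from-page idea can be sketched on its own, outside the exporter. This is a minimal illustration, not the project's code: `pages_for_run`, `PAGES_PER_IP`, and the example URLs are hypothetical names, and the limit of 10 is simply the figure observed in this issue.

```python
from itertools import islice

PAGES_PER_IP = 10  # assumption: the per-IP limit observed in this issue


def pages_for_run(urls, pagination, limit=PAGES_PER_IP):
    """Yield (page_no, url) for one run, starting at 1-based page `pagination`."""
    start = pagination - 1  # convert the 1-based page number to a 0-based offset
    for offset, url in enumerate(islice(urls, start, start + limit)):
        yield pagination + offset, url


# Hypothetical page URLs, used only to show which pages each run would cover.
urls = [f"https://example.invalid/page/{n}" for n in range(1, 26)]
second_run = list(pages_for_run(urls, pagination=11))
third_run = list(pages_for_run(urls, pagination=21))
```

With 25 pages, the second run covers pages 11 to 20 and the third covers the remaining 21 to 25, so each run stays within the limit before the VPN server is switched.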

gasolin changed the title from "Douban limits a single IP to 10 pages at a time" to "Douban limits a single IP to at most 10 pages at a time" on Dec 25, 2021

niauah commented Apr 24, 2022

My workaround is to automatically change IP whenever I get blocked.
I'm using Windscribe VPN, following the instructions in the first part of this article. After logging in manually through the terminal once, I can change IP with a single Windscribe command that needs no further input, which makes it easy to integrate into Python code.

Some pseudo-code snippets:

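import os

os.system("windscribe connect US")  # connect before the first request
try:
    get_info(url)
except TypeError:  # get_info(url) returns None once the per-IP request limit is reached
    os.system("windscribe connect US")  # reconnect to obtain a new IP
    get_info(url)  # retry once on the new connection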
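A slightly more general version of this retry can be sketched as a helper that switches IP more than once. This is only a sketch under assumptions: `fetch_with_ip_rotation`, `switch_ip`, and `max_switches` are hypothetical names, `fetch` stands in for `get_info`, and a `None` result is taken as the "blocked" signal as described in the comment above. The Windscribe command is the same one used in the snippet and assumes the CLI is installed and already logged in.

```python
import subprocess


def switch_ip(location="US"):
    # Assumption: the Windscribe CLI is installed and logged in, as described above.
    subprocess.run(["windscribe", "connect", location], check=False)


def fetch_with_ip_rotation(fetch, url, rotate=switch_ip, max_switches=3):
    """Call fetch(url); when it returns None, rotate the IP and try again.

    Raises RuntimeError if every attempt returns None.
    """
    for attempt in range(max_switches + 1):
        result = fetch(url)
        if result is not None:
            return result
        if attempt < max_switches:
            rotate()  # blocked on this IP: switch before the next attempt
    raise RuntimeError(f"still blocked after {max_switches} IP switches: {url}")
```

Passing `rotate` as a parameter keeps the VPN call replaceable, so the retry logic can be exercised with a stub instead of a real connection.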
