[Suspected bug] A problem in pretend.py in version 0.1.2 causes scraping to fail #29

Open
DeSireFire opened this issue Jul 22, 2021 · 0 comments


DeSireFire commented Jul 22, 2021

I ran into a problem while deploying to a new server. By comparing versions, I tracked down the cause to:
GerapyPyppeteer/gerapy_pyppeteer/pretend.py
With version 0.0.13, which works correctly, the code is:
SET_WEBDRIVER = '''() => {Object.defineProperty(navigator, 'webdriver', {get: () => undefined})}'''
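As a possible workaround until the root cause is confirmed, the 0.0.13 override above could be swapped back into the script list before it is injected. This is a hedged sketch, not a fix from the maintainers: it assumes `gerapy_pyppeteer.pretend` exposes both `SCRIPTS` and `SET_WEBDRIVER` as plain strings, and `patch_pretend_scripts` is a hypothetical helper that is not part of the library.

```python
from typing import List

# The navigator.webdriver override shipped in gerapy_pyppeteer 0.0.13 (quoted above)
SET_WEBDRIVER_0_0_13 = '''() => {Object.defineProperty(navigator, 'webdriver', {get: () => undefined})}'''


def patch_pretend_scripts(scripts: List[str], new_webdriver: str,
                          old_webdriver: str = SET_WEBDRIVER_0_0_13) -> List[str]:
    """Hypothetical helper: return a copy of `scripts` in which every entry equal to
    `new_webdriver` is replaced by `old_webdriver`; other entries keep their order."""
    return [old_webdriver if script == new_webdriver else script for script in scripts]


# Usage (assumes these names exist in gerapy_pyppeteer 0.1.2's pretend.py):
# from gerapy_pyppeteer.pretend import SCRIPTS, SET_WEBDRIVER
# for script in patch_pretend_scripts(SCRIPTS, SET_WEBDRIVER):
#     await page.evaluateOnNewDocument(script)
```

Matching by string equality keeps the rest of the anti-detection scripts untouched, so only the webdriver override changes between runs.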
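In 0.1.2, the SET_WEBDRIVER variable on line 73 is the problem: when requesting a certain site, the browser is detected and the server returns HTTP 400.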

Test code:

import asyncio

from pyppeteer import launch
from gerapy_pyppeteer.pretend import SCRIPTS as PRETEND_SCRIPTS


async def main():
    # Launch a visible Chromium; --no-sandbox is typically required when running as root on a server
    browser = await launch({'headless': False, 'timeout': 10000, 'args': ['--no-sandbox']})
    page = await browser.newPage()
    # Inject every gerapy_pyppeteer anti-detection script before any page script runs
    for script in PRETEND_SCRIPTS:
        await page.evaluateOnNewDocument(script)

    print(len(await browser.pages()))
    await page.goto('http://www.某个网址.com.cn/old_house/old_house.html')  # remember to change this URL

    # Wait for the page to navigate again after the initial load
    await page.waitForNavigation()

    # Give the page 10 seconds to finish running its scripts
    await page.waitFor(10 * 1000)

    print(await page.evaluate('document.cookie'))
    print('Finished waiting for the URL')

    # await page.waitFor(10 * 1000)
    print(await page.content())

    await browser.close()


asyncio.get_event_loop().run_until_complete(main())
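One way to confirm that SET_WEBDRIVER alone is responsible, rather than another entry in PRETEND_SCRIPTS, is a leave-one-out ablation: rerun the test once per subset with a single script removed and note which removal makes the page load. A minimal sketch of the subset generator follows; `leave_one_out` is a hypothetical helper, and it assumes `main()` is adapted to take the list of scripts to inject as a parameter.

```python
from typing import Iterator, List, Sequence, Tuple


def leave_one_out(scripts: Sequence[str]) -> Iterator[Tuple[int, List[str]]]:
    """Yield (dropped_index, remaining_scripts) for each script in `scripts`,
    preserving the original order of the scripts that are kept."""
    for i in range(len(scripts)):
        yield i, [s for j, s in enumerate(scripts) if j != i]


# Usage (assumes a hypothetical main(scripts) that injects only `scripts`):
# for dropped, subset in leave_one_out(PRETEND_SCRIPTS):
#     print('without script', dropped)
#     asyncio.get_event_loop().run_until_complete(main(subset))
```

If only the run without the SET_WEBDRIVER entry returns a non-blank page, that isolates the regression to that one script.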

This returns a blank page.
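To make the "blank page" symptom checkable in a script rather than by eye, the HTML returned by `page.content()` can be tested for visible text. This is a heuristic sketch using only the standard library's `html.parser`; `looks_blank` is a hypothetical name, and treating "no text outside `<script>`/`<style>`" as blank is an assumption about what the failing page looks like.

```python
from html.parser import HTMLParser
from typing import List


class _TextCollector(HTMLParser):
    """Collects non-whitespace text that is not inside <script> or <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ('script', 'style'):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ('script', 'style') and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def looks_blank(html: str) -> bool:
    """Heuristic: True if the document has no visible text outside scripts/styles."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return not collector.chunks


# Usage inside main(): print(looks_blank(await page.content()))
```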

DeSireFire changed the title from "[Bug] A problem in pretend.py in version 0.1.2 causes scraping to fail" to "[Suspected bug] A problem in pretend.py in version 0.1.2 causes scraping to fail" on Jul 23, 2021