"webpage fetching error for url" #39

Open
Kamran12646 opened this issue Nov 20, 2023 · 27 comments

Comments

@Kamran12646

Hello dear community, Hello dear vinc3po,
Since 5 p.m. I've been getting the following error with the ebAlert bot: "webpage fetching error for url:"
Has the HTML structure of Kleinanzeigen really been changed again in this short time or is it due to my Raspberry Pi 4?


@makedamnsure

Same for me on Win10 since today.

@Pelicana

Same issue here :) I'm running locally on Windows, so it shouldn't be your Raspberry Pi.

@Kamran12646
Author

Did anyone find a solution yet?

@Pelicana

nope, would love to know too.

@DanielZ3108

Same issue here on an Intel NUC (Proxmox).

@henengel

Same for me on macOS.

@vinc3PO
Owner

vinc3PO commented Nov 26, 2023

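Here is the hidden message:
'In deinem IP-Bereich kam es vor Kurzem mehrfach zu unsicheren Versuchen, unsere Plattform zu verwenden. Dies kann auch durch andere Personen erfolgt sein. Daher wurde dieser IP-Bereich zur Vorbeugung von Betrug zeitweilig von der Nutzung von Kleinanzeigen ausgeschlossen. Bitte versuche es später erneut.'
In English: 'There have recently been several insecure attempts to use our platform from your IP range. These may also have been made by other people. To prevent fraud, this IP range has therefore been temporarily excluded from using Kleinanzeigen. Please try again later.'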
It seems that they are implementing some sort of anti-scalping security.
I'll investigate if this can be bypassed somehow.

I'll keep you updated
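
A bot could tell this block apart from an ordinary network failure by looking for the hidden message in the response. A minimal sketch, assuming the block page contains the phrase `IP-Bereich` from the message above and is served with HTTP status 403 (a later comment in this thread reports a 403); `looks_blocked` is a hypothetical helper, not part of ebAlert:

```python
# Hypothetical helper, not part of ebAlert.
BLOCK_MARKER = "IP-Bereich"  # phrase taken from the hidden message above


def looks_blocked(status_code: int, body: str) -> bool:
    """Return True if a response looks like the Kleinanzeigen IP-range block.

    Assumption: the block arrives with HTTP 403 and/or contains the
    marker phrase. The real block page may differ.
    """
    return status_code == 403 or BLOCK_MARKER in body
```

With a check like this, the log could say "IP range blocked" instead of the generic "webpage fetching error for url".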

@Araibona

Any news?

@max49944

Unfortunately nothing yet.

@alafad

alafad commented Dec 29, 2023

I've tried changing the User-Agent header, but unfortunately that didn't solve it.

@max49944

I don't think there will be a solution anymore; unfortunately, I think it's been put on ice.
Unfortunately, no one cares about it.

@workinghard

The best advice is to fetch less frequently and at random intervals.
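
One way to do this is to add random jitter around a base polling interval. A minimal sketch: the 600-second base, the 50% jitter and the names `next_delay` and `poll_forever` are illustrative assumptions, not values or functions from ebAlert:

```python
import random
import time
from typing import Callable


def next_delay(base_seconds: float, jitter: float = 0.5) -> float:
    """Return a delay drawn uniformly from base*(1-jitter) to base*(1+jitter)."""
    low = base_seconds * (1.0 - jitter)
    high = base_seconds * (1.0 + jitter)
    return random.uniform(low, high)


def poll_forever(check: Callable[[], None], base_seconds: float = 600.0) -> None:
    """Call check() repeatedly, sleeping a randomized interval in between."""
    while True:
        check()
        time.sleep(next_delay(base_seconds))
```

Whether this is enough to avoid the block is not something the thread confirms; it only makes the request timing less regular.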

@alafad

alafad commented Dec 30, 2023

I don't think there will be a solution anymore, unfortunately I think it's been put on ice. Unfortunately no one cares about it

I don't think this is really a bug. It should work again with different headers and, for high-frequency usage, with proxied requests.

@vinc3PO
Owner

vinc3PO commented Dec 30, 2023

Indeed, the problem is not the header.
If you try from the same IP address with similar headers in Postman, the request goes through.
However, with Python requests it does not.
I have an old version that still works, but the new ones do not.
I tried downgrading to the version that works, without success.
I have been too busy to fully investigate the error.
My next attempt will be to try without the requests library, as I believe that might be the problem: eBay recognises the client as Python and assumes it is a bot.

To be continued...
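
A fetch that does not go through the requests library could be sketched with the standard library alone. The header values and the names `build_request` and `fetch` are illustrative assumptions, and nothing in this thread confirms that this gets past the block:

```python
import urllib.request

# Assumed, illustrative header values; the thread does not say which were used.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Accept-Language": "de-DE,de;q=0.9",
}


def build_request(url: str) -> urllib.request.Request:
    """Build a GET request without involving the requests library."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS)


def fetch(url: str, timeout: float = 10.0) -> str:
    """Fetch a page; urlopen raises urllib.error.HTTPError on 4xx/5xx."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Comparing the result of `fetch(url)` with the same call made through requests would show whether the library itself makes a difference.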

@svenisda

svenisda commented Feb 1, 2024

Hey man, did you find any solution to bypass the detection? In the past I made my own monitor, which worked. I wanted to reactivate it today, but I only get a 403 and the same message.

I personally use Discord for notifications and a simple proxy function to monitor multiple URLs at once. Maybe these would be good features for your bot too.

Would share my code but it is really shitty because I’m not a good coder 😄

@Zippochonda

I would also appreciate a solution, is there any way to help?

@svenisda

svenisda commented Feb 2, 2024

I would also appreciate a solution, is there any way to help?

Today I will try other scraping methods instead of bs4; if I find one that works, I will reply. Maybe we can all connect and build a super Kleinanzeigen monitor 😁

@vinc3PO
Owner

vinc3PO commented Feb 2, 2024

Good news everyone,

It seems that updating to the latest requests and urllib3 makes it work again.
requests==2.31.0
urllib3==2.2.0

Not sure how long it will take eBay to shut it down again.
But for the time being it should work.

pip install --upgrade requests urllib3

Let me know if that works for you as well.
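
To confirm that the upgrade actually took effect in the Python environment that runs the bot, the installed versions can be compared with the ones named above. A small sketch using only the standard library; `parse_version` and `meets_minimum` are hypothetical helpers and only handle plain dotted numeric versions:

```python
from importlib import metadata

# Versions named in the comment above.
MINIMUMS = {"requests": "2.31.0", "urllib3": "2.2.0"}


def parse_version(text: str) -> tuple:
    """Turn '2.31.0' into (2, 31, 0); stops at the first non-numeric part."""
    parts = []
    for part in text.split("."):
        if not part.isdigit():
            break
        parts.append(int(part))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two dotted version strings numerically."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    for name, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            print(f"{name}: not installed")
            continue
        status = "ok" if meets_minimum(installed, minimum) else "too old"
        print(f"{name}: {installed} ({status}, need >= {minimum})")
```

Run it with the same interpreter that starts ebAlert (for example the one used with `sudo`), since pip may have upgraded a different environment.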

@beleza-pura

For me it is still not working after the update of requests and urllib3. I was also experimenting with different headers, but no solution so far.

@beleza-pura

I've been investigating a little. Looks like they introduced Akamai Bot Protection. Mainly there are two things keeping the bot from working properly:

⚠️ FingerprintJS detected: 
https://static.kleinanzeigen.de/static/js/top.yt20r2l2bahn.js

⚠️ Akamai detected: 
https://www.kleinanzeigen.de/akam/13/435259e7

More information about this bot protection you can find here. My quick workaround was to use the trial version of ZenRows API service to bypass anti-bot protection. But since in the future I will have to pay for it, I'll try to figure something out on my own.

@svenisda

svenisda commented Feb 3, 2024

Found a very good workaround using Playwright and Chromium. I can use it as a headless browser, so it runs in the background and you don't have to pay for any Akamai-solving service.

I have some contacts who sell Akamai, px and other APIs, but why pay when Playwright works 😊
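
The code for this workaround was not posted, so the following is only a guess at its general shape: a minimal sketch using Playwright's synchronous API with headless Chromium. `fetch_rendered_html` is a hypothetical name, and it assumes Playwright and its Chromium build are installed (`pip install playwright`, then `playwright install chromium`):

```python
def fetch_rendered_html(url: str, timeout_ms: int = 30_000) -> str:
    """Load a page in headless Chromium and return the rendered HTML."""
    # Imported lazily so this module still loads where Playwright is absent.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()
```

The returned HTML could then be passed to the existing bs4 parsing in place of the response body from requests.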

@makedamnsure

it seems that if you update the latest requests and urllib it would work.

For me this fixes the problem on my work machine (Win10 Home), but unfortunately not on my thin client (Win10 IoT LTSC 21H2).

@Zippochonda

Good news everyone,

it seems that if you update the latest requests and urllib it would work. requests=2.31.0 urllib3=2.2.0

Not sure how long it will take ebay to shut it down again. But for the time being it should work

pip install --upgrade requests urllib3

Let me know if that works for you as well.

It doesn't work for me (RPi3). I used pip install --upgrade requests urllib3 to update, but I still get the web fetching error:

 sudo python -m ebAlert links -a https://www.kleinanzeigen.de/s-dreibaum/k0
>> Adding url
<< webpage fetching error for url: https://www.kleinanzeigen.de/s-dreibaum/k0
<< Link and post added to the database

@DanielZ3108

DanielZ3108 commented Feb 6, 2024

It worked with this command on a Proxmox Container:

It doesn't work for me (RPi3), I used pip install --upgrade requests urllib3 to update. But i get still the webfetching error

Bot is currently working again

Thanks!

@vinc3PO
Owner

vinc3PO commented Feb 8, 2024

Bad News Everyone!!!

As noted by @beleza-pura it seems that ebay is starting a fight against bots.

I've been investigating a little. Looks like they introduced Akamai Bot Protection. Mainly there are two things keeping the bot from working properly:

⚠️ FingerprintJS detected: 
https://static.kleinanzeigen.de/static/js/top.yt20r2l2bahn.js

⚠️ Akamai detected: 
https://www.kleinanzeigen.de/akam/13/435259e7

More information about this bot protection you can find here. My quick workaround was to use the trial version of ZenRows API service to bypass anti-bot protection. But since in the future I will have to pay for it, I'll try to figure something out on my own.

In this case, the problem is not the Akamai bot protection as such, since the requests library can't perform those JavaScript challenges.
However, this means they are actively trying to stop us from using bots. It seems that they have invested money and have tools that analyse traffic, learn from it and block whatever they suspect is a bot. That means it is only a matter of time before the newly updated library gets blacklisted and blocked again.

If you start using Selenium or another library that drives a virtual browser, the Akamai bot protection will start learning your bot's behaviour and eventually block it.

What does it mean?

It means that this simple bot will soon be archived. To counter the Akamai bot protection, a much larger project would have to be undertaken, in which the browser activity is randomized to act like a human.

Thank you all!

@yamanatoo

This currently works. But as you mentioned it might be countered in the future.
#41

@tchleb

tchleb commented Jun 8, 2024

Since yesterday at 12:20 I have been getting the "webpage fetching error for url" error. I tried #41, but it doesn't work.
@yamanatoo does your fix still work for you?

This is the error message:

Starting Ebay alert
Processing link - id: 1 - link: https://www.kleinanzeigen.de/s-test/k0
2024-06-08 21:07:10,903 - get_session in ebAlert.crud.base - ERROR - Message: session not created: Chrome failed to start: exited normally.
(session not created: DevToolsActivePort file doesn't exist)
(The process started from chrome location /snap/chromium/2873/usr/lib/chromium-browser/chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
Stacktrace:
#0 0x55f7bce8e63a
#1 0x55f7bcb8f65c
#2 0x55f7bcbc3c95
#3 0x55f7bcbbff8f
#4 0x55f7bcc099a4
#5 0x55f7bcbfd313
#6 0x55f7bcbcd586
#7 0x55f7bcbcdefe
#8 0x55f7bce57b7f
#9 0x55f7bce5bd0a
#10 0x55f7bce459dc
#11 0x55f7bce5c491
#12 0x55f7bce2b7ee
#13 0x55f7bce7dc28
#14 0x55f7bce7de36
#15 0x55f7bce8d6f1
#16 0x7f174c46eac3

ERROR:ebAlert.crud.base:Message: session not created: Chrome failed to start: exited normally.
(session not created: DevToolsActivePort file doesn't exist)
(The process started from chrome location /snap/chromium/2873/usr/lib/chromium-browser/chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
Stacktrace:
#0 0x55f7bce8e63a
#1 0x55f7bcb8f65c
#2 0x55f7bcbc3c95
#3 0x55f7bcbbff8f
#4 0x55f7bcc099a4
#5 0x55f7bcbfd313
#6 0x55f7bcbcd586
#7 0x55f7bcbcdefe
#8 0x55f7bce57b7f
#9 0x55f7bce5bd0a
#10 0x55f7bce459dc
#11 0x55f7bce5c491
#12 0x55f7bce2b7ee
#13 0x55f7bce7dc28
#14 0x55f7bce7de36
#15 0x55f7bce8d6f1
#16 0x7f174c46eac3

<< Ebay alert finished
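
The "DevToolsActivePort file doesn't exist" message is commonly reported when Chrome cannot start in a headless or restricted environment, for example with no display, when running as root, or with a snap-packaged Chromium. One thing that may be worth trying is passing explicit startup flags. This sketch assumes the fix from #41 drives Chrome through Selenium 4 (the ChromeDriver message suggests so, but that code is not shown in this thread); `make_driver` and the flag list are illustrative and not taken from ebAlert:

```python
# Commonly suggested Chrome flags for headless servers and containers.
CHROME_ARGS = [
    "--headless=new",           # run without a display
    "--no-sandbox",             # often required when running as root
    "--disable-dev-shm-usage",  # avoid crashes on a small /dev/shm
]


def make_driver():
    """Create a Chrome WebDriver with the flags above (assumes selenium >= 4)."""
    # Imported lazily so this module still loads where Selenium is absent.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in CHROME_ARGS:
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

If Chrome still exits immediately, installing a non-snap Chrome or Chromium build is another commonly suggested thing to try.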
