
GHDB scraper produces inaccurate output #96

Open
marz-hunter opened this issue Feb 3, 2024 · 7 comments

Comments

@marz-hunter

I found that the output from the GHDB scraper is not precise. For example, the title of an entry is "Google Dork", but when the entry is viewed the actual dork is site:".edu" intitle:"index of"|".db". The tool saves "Google Dork" as its output instead of site:".edu" intitle:"index of"|".db".

bandicam.2024-02-03.13-23-30-421.mp4
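
As a stopgap, saved output could be checked for generic titles that slipped in where a query was expected. The snippet below is only a minimal sketch: `find_placeholder_dorks` and `PLACEHOLDER_TITLES` are hypothetical names, not part of the scraper, and the set contains only the one placeholder reported here.

```python
# Hypothetical helper, not part of the scraper discussed in this issue.
# Assumption: a saved value is a placeholder if it matches a known generic title.
PLACEHOLDER_TITLES = {"google dork"}


def find_placeholder_dorks(dorks):
    """Return (index, value) pairs for entries that look like titles, not queries."""
    flagged = []
    for index, dork in enumerate(dorks):
        if dork.strip().lower() in PLACEHOLDER_TITLES:
            flagged.append((index, dork))
    return flagged


saved = ["Google Dork", 'site:".edu" intitle:"index of"|".db"']
print(find_placeholder_dorks(saved))
```

This only flags suspicious entries; it does not recover the real query.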
@marz-hunter
Author

Here is the tool's output. I think crawling each entry URL, such as https://www.exploit-db.com/ghdb/8389, would be more accurate, although it would take a little longer.
output

@marz-hunter
Author

Crawling each entry page and then extracting this part would be more accurate:
get
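
A rough sketch of the per-entry extraction suggested above, using only the standard library. The page markup is not shown in this thread, so treating the first `<h1>` as the element that holds the dork is purely an assumption for illustration; `HeadingTextParser`, `extract_dork`, and `sample_html` are hypothetical, and the real page would need to be inspected (and fetched) separately.

```python
from html.parser import HTMLParser

# URL pattern taken from the example in this thread.
GHDB_ENTRY_URL = "https://www.exploit-db.com/ghdb/{entry_id}"


class HeadingTextParser(HTMLParser):
    """Collect the text of the first <h1> element (assumed location of the dork)."""

    def __init__(self):
        super().__init__()
        self._in_heading = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and not self._done:
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag == "h1" and self._in_heading:
            self._in_heading = False
            self._done = True

    def handle_data(self, data):
        if self._in_heading:
            self._parts.append(data)

    @property
    def text(self):
        # Collapse runs of whitespace and trim the ends.
        return " ".join("".join(self._parts).split())


def extract_dork(html):
    """Return the text of the first <h1>, or an empty string if there is none."""
    parser = HeadingTextParser()
    parser.feed(html)
    parser.close()
    return parser.text


# Hypothetical markup standing in for a downloaded entry page.
sample_html = """<html><body><h1>
    site:".edu" intitle:"index of"|".db"
</h1></body></html>"""

print(GHDB_ENTRY_URL.format(entry_id=8389))
print(extract_dork(sample_html))
```

Fetching one page per entry is what makes this approach slower than reading a single listing.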

@opsdisk
Owner

opsdisk commented Feb 3, 2024

Hi @marz-hunter - thanks for opening an issue. In the past, I've also noticed that exploit-db.com doesn't structure the data as precisely as it should. I vaguely recall concluding that exploit-db.com should clean up the data, rather than me handling edge cases or investing more time in the script. Give me a week or two to dig deeper into it though.

In the meantime, you could reach out to them and see if they could clean up the ones you found. Their email can be found here: https://www.exploit-db.com/submit

image

@opsdisk
Owner

opsdisk commented Feb 3, 2024

I also just found this, which could be used: https://gitlab.com/exploit-database/exploitdb/-/blob/main/ghdb.xml

Mind checking it for the same discrepancies you found?

Edit: You may be able to submit a PR against it for any you find as well. I'm guessing that is what powers https://www.exploit-db.com/google-hacking-database
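
For checking the XML file, something like the following could pull out the queries for comparison. This is a sketch under assumptions: the element names `entry`, `id`, and `query` are my guess at the layout of ghdb.xml and should be verified against the real file, and `dorks_from_ghdb_xml` and `sample_xml` are hypothetical.

```python
import xml.etree.ElementTree as ET


def dorks_from_ghdb_xml(xml_text):
    """Map entry id -> query string for every entry that has both fields.

    Assumption: each record is an <entry> with <id> and <query> children.
    """
    root = ET.fromstring(xml_text)
    dorks = {}
    for entry in root.iter("entry"):
        entry_id = entry.findtext("id", default="").strip()
        query = entry.findtext("query", default="").strip()
        if entry_id and query:
            dorks[entry_id] = query
    return dorks


# Hypothetical sample in the assumed layout; the id is made up.
sample_xml = """<ghdb>
  <entry>
    <id>1</id>
    <query>site:".edu" intitle:"index of"|".db"</query>
  </entry>
</ghdb>"""

print(dorks_from_ghdb_xml(sample_xml))
```

Reading one local file avoids a request per entry, at the cost of depending on how often the file is updated.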

@marz-hunter
Author

Yeah, this one looks pretty good: https://gitlab.com/exploit-database/exploitdb/-/blob/main/ghdb.xml. Updates seem to take a while, though.

@opsdisk
Owner

opsdisk commented Feb 11, 2024

I'll keep this issue open for a while to see if they update it. Let me know if you do submit a PR against https://gitlab.com/exploit-database/exploitdb/-/blob/main/ghdb.xml

@opsdisk
Owner

opsdisk commented Apr 11, 2024

Just wanted to check back on this one, @marz-hunter. Do you have any updates?
