Bulk-extractor generates SegFault midway through processing complex directory hierarchy with -R #396

Open
zdavatz opened this issue Apr 3, 2023 · 4 comments

Comments


zdavatz commented Apr 3, 2023

Test case:

./bulk_extractor -o out -R <directory>

output:

zsh: segmentation fault  bulk_extractor
Owner

simsong commented Apr 3, 2023

Hi. We are thankful for your bug report.

First, you should know that bulk_extractor has a restart system. You can simply re-run the command (hit up-arrow and return) and the program will continue where it left off, avoiding the data that made it crash.

Second, we would be thrilled to receive your data. If you don't want to post a link here, please email me at [email protected].

Third, if you want to try to debug this yourself, we have instructions for how to do that:

https://github.com/simsong/bulk_extractor/tree/main/doc/Diagnostic_Notes

Thanks again!

Author

zdavatz commented Apr 4, 2023

Here are all the report files that were generated after rerunning the command: report.xml.tar.gz

Owner

simsong commented Apr 5, 2023

I've obtained the .tar file of the directory and could process it without error, so the problem appears to be the directory iterator. If I can replicate this, I'll create a test case for the iterator that makes a large number of files in the directory tree and tries to process them.
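
A fixture along those lines could be generated with a short script. The following is a minimal Python sketch, not bulk_extractor's own test code: `make_tree`, `count_files`, and the `depth`, `fanout`, and `files_per_dir` parameters are hypothetical names chosen for illustration, and the file contents are placeholder text.

```python
import os
from pathlib import Path


def make_tree(root: Path, depth: int, fanout: int, files_per_dir: int) -> int:
    """Create a nested directory tree under root; return the number of files written.

    Every directory gets files_per_dir small files. While depth > 0, each
    directory also gets fanout subdirectories, built recursively.
    """
    root.mkdir(parents=True, exist_ok=True)
    written = 0
    for i in range(files_per_dir):
        # Placeholder content; real reproduction data would go here.
        (root / f"file_{i}.txt").write_text(f"sample data {i}\n")
        written += 1
    if depth > 0:
        for d in range(fanout):
            written += make_tree(root / f"dir_{d}", depth - 1, fanout, files_per_dir)
    return written


def count_files(root: Path) -> int:
    """Walk the tree and count regular files, as an independent check."""
    return sum(len(files) for _, _, files in os.walk(root))
```

Pointing `./bulk_extractor -o out -R` at the generated root would then exercise the recursive directory traversal. Whether a tree of this shape triggers the crash is untested.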

simsong changed the title from "Bulk-extractor is giving me a SegFault midway" to "Bulk-extractor generates SegFault midway through processing complex directory hierarchy with -R" on Apr 5, 2023

Owner

simsong commented Jan 25, 2024

Okay. I'm turning my attention back to this, but I find that I cannot reproduce it. Do you have a dataset I can use?
