Question About Initial Input Seeds for Fuzzing Libraries in OSS-Fuzz #12422

Open
fedecerno opened this issue Aug 29, 2024 · 4 comments

@fedecerno

I have some questions regarding the initial input seeds that OSS-Fuzz uses to fuzz the following libraries: binutils, cairo, libzip, llvm, mupdf, and sqlite3.
I would like to know:

  1. Are the initial input seeds used by OSS-Fuzz manually created by humans, or are they generated through fuzzing campaigns?

  2. If there are human-made initial input seeds, can the most recent versions be accessed?

  3. Alternatively, if only seeds from fuzzing campaigns are available, could I obtain the oldest initial input seeds that have undergone the fewest fuzzing iterations?

@maflcko
Contributor

maflcko commented Aug 29, 2024

The initial seeds are usually added in the build.sh of that project. For example:

zip -j "${OUT}/clang-objc-fuzzer_seed_corpus.zip" $SRC/$LLVM/../clang/tools/clang-fuzzer/corpus_examples/objc/*
zip -j "${OUT}/clangd-fuzzer_seed_corpus.zip" $SRC/$LLVM/../clang-tools-extra/clangd/test/*
zip -j "${OUT}/clang-fuzzer_seed_corpus.zip" $SRC/llvm-project/clang/test/Parser/*.cpp

If you want to go back in time, you'll have to follow the git history of the build.sh, or the corresponding source of the inputs.
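
A minimal sketch of following that history, assuming a local clone of google/oss-fuzz. The function names and the `OSS_FUZZ_DIR` variable are hypothetical helpers, not part of OSS-Fuzz; only standard `git log` and `git show` options are used:

```shell
# Hypothetical helpers; OSS_FUZZ_DIR should point at the root of a local
# oss-fuzz clone (defaults to the current directory).

# List the commits that touched a project's build.sh, newest first,
# so the last line printed is the oldest recorded version.
seed_history() {
  git -C "${OSS_FUZZ_DIR:-.}" log --follow --date=short \
    --format='%h %ad %s' -- "projects/$1/build.sh"
}

# Show the seed-corpus lines of build.sh as they were at a given commit.
seeds_at() {
  git -C "${OSS_FUZZ_DIR:-.}" show "$1:projects/$2/build.sh" | grep 'seed_corpus'
}
```

For example, `seed_history llvm` followed by `seeds_at <commit> llvm` (assuming the project directory is named `llvm`) would show which files were zipped into the seed corpus at that point. When the zipped files come from the upstream repository, as in the lines quoted above, the history of the seed files themselves lives in that repository rather than in oss-fuzz.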

@DavidKorczynski
Collaborator

Could you clarify what you mean by "created by humans"? I think a human has been involved in setting up the seeds for every OSS-Fuzz project, but there is a spectrum of involvement: whether the seed files already existed and were simply copied into the harness corpus folder; whether there was some effort involved, e.g. finding relevant pre-existing images that can be used as seeds; whether a human actively assembled the seeds programmatically, e.g. through structured generation; or whether a human assembled a given seed file manually, byte by byte.

I don't think there are any cases of the last, but there are many variations of the first three -- do they constitute "created by humans", though?

There are no initial input seeds used by OSS-Fuzz that are "generated through fuzzing campaigns", at least not campaigns run by OSS-Fuzz itself -- a developer may have run a fuzzer locally and uploaded the results, and that's not something OSS-Fuzz maintainers would keep track of.

@fedecerno
Author

By "created by humans," I mean that there was human involvement in creating the seeds, as opposed to the seeds simply resulting from a series of fuzzing campaigns where the "best" seeds from one campaign are taken as input for the next.

I think the last part of your message has clarified my doubts — OSS-Fuzz uses seeds uploaded by developers rather than seeds generated through automatically run fuzzing campaigns.
That said, it's possible that the input seeds currently being used result from prior fuzzing campaigns, but there's no way to know for sure.

@DavidKorczynski
Collaborator

> as opposed to the seeds simply resulting from a series of fuzzing campaigns where the "best" seeds from one campaign are taken as input for the next.

OSS-Fuzz naturally saves the generated corpus and carries it forward across iterations, which as far as I can tell is what you describe here. OSS-Fuzz also does corpus minimization to "narrow down the corpus to a set of optimal inputs" -- but that is all done by https://github.com/google/clusterfuzz
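
To approximate that minimization step locally, libFuzzer's `-merge=1` mode copies into the first directory only those inputs from the following directories that add coverage. The sketch below wraps it in a hypothetical `minimize_corpus` helper. It assumes a libFuzzer-built fuzz target and is a local stand-in, not ClusterFuzz's actual pruning pipeline:

```shell
# Hypothetical helper:
#   minimize_corpus <fuzzer_binary> <full_corpus_dir> <minimized_dir>
# Assumes the binary is a libFuzzer target. With -merge=1, libFuzzer writes
# to the first directory the inputs from the second that add new coverage.
minimize_corpus() {
  fuzzer="$1"
  full="$2"
  minimized="$3"
  mkdir -p "$minimized"
  "$fuzzer" -merge=1 "$minimized" "$full"
}
```

Running this against an unzipped seed corpus gives a rough idea of how much of it survives minimization before any fuzzing has added new inputs.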
