Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coverage Guided / Mutation Based Fuzzing #687

Draft
wants to merge 21 commits into
base: master
Choose a base branch
from
Draft

Coverage Guided / Mutation Based Fuzzing #687

wants to merge 21 commits into from

Conversation

d-xo
Copy link
Contributor

@d-xo d-xo commented Jul 2, 2021

Still very much WIP (and currently based off a pretty outdated commit).

This PR introduces a coverage guided / mutation based approach to fuzzing. All fuzz tests are now run with coverage enabled. We store each path seen, along with the input that we used to get there. When generating calldata for a fuzz run, we sometimes use the existing strategy (random), and sometimes choose instead to mutate one of the examples from the corpus of visited paths.

This should hopefully help us get even deeper inside complex contracts.

Still TODO:

  • rebase on master
  • general tidyup
  • generate coverage reports
  • use the symbolic execution engine to prefill the corpus with interesting calldata
  • apply a weighting to some examples (e.g. paths where the symbolic execution engine timed out can be targeted for fuzzing)

@@ -134,6 +143,10 @@ data AbiType
| AbiTupleType (Vector AbiType)
deriving (Read, Eq, Ord, Generic)

instance ToJSON AbiType
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all these ToJSON/FromJSON instances are not neeeded anymore, they're just leftover from a time when I was serializing the corpus as json.

@@ -101,6 +106,12 @@ data TestVMParams = TestVMParams
, testChainId :: W256
}

-- | For each tuple of (contract, method) we store the calldata required to
-- reach each known path in the method
type Corpus = Map (W256,Text) (Map [(W256, Int)] AbiValue)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

W256 is the codehash


type TraceIdState = (VM, [(W256, Int)])

-- | This interpreter is similar to interpretWithCoverage, except instead of
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is a lie, the hash accumulator was an old idea that actually ended up slowing things down. However I did end up going with a much faster data structure for the traces. Instead of a MultiSet OpLocation where OpLocation is a Contract and a bytecode index, we store insted a list of (codehash, bytecode index).

Using a codehash keeps the size of the corpus on disk small, and using a list means we can get O(1) insertion of new elements by just consing them onto the head of the list (instead of O(log n) for a MultiSet).

With these optimizations in place the extra overhead introduced by the serilization / instrumentation is somewhere between 10 and 15 percent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the use of codehash means we'll need some extra state if we want to show pretty coverage reports to the user (perhaps a mapping from codehash to SolcContract or smth in the Corpus?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we probably want to rewrite the existing coverage based interpreter to used the new data structures before merging, but I was still just trying to move fast and figure out what worked best so I used a seperate one to start with.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh also it's worth noting that using a MultiSet or a list instead of a Set for the traces means that traces that get further into a loop will be treated as a new trace, which seems like a nice thing.

@transmissions11
Copy link
Contributor

does this include constant mining? crytic/echidna#262

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants