-
Notifications
You must be signed in to change notification settings - Fork 325
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Coverage Guided / Mutation Based Fuzzing #687
base: master
Are you sure you want to change the base?
Conversation
@@ -134,6 +143,10 @@ data AbiType | |||
| AbiTupleType (Vector AbiType) | |||
deriving (Read, Eq, Ord, Generic) | |||
|
|||
instance ToJSON AbiType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think all these ToJSON
/FromJSON
instances are not neeeded anymore, they're just leftover from a time when I was serializing the corpus as json.
@@ -101,6 +106,12 @@ data TestVMParams = TestVMParams | |||
, testChainId :: W256 | |||
} | |||
|
|||
-- | For each tuple of (contract, method) we store the calldata required to | |||
-- reach each known path in the method | |||
type Corpus = Map (W256,Text) (Map [(W256, Int)] AbiValue) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
W256
is the codehash
|
||
type TraceIdState = (VM, [(W256, Int)]) | ||
|
||
-- | This interpreter is similar to interpretWithCoverage, except instead of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is a lie, the hash accumulator was an old idea that actually ended up slowing things down. However I did end up going with a much faster data structure for the traces. Instead of a MultiSet OpLocation
where OpLocation
is a Contract
and a bytecode index, we store insted a list of (codehash, bytecode index)
.
Using a codehash keeps the size of the corpus on disk small, and using a list means we can get O(1)
insertion of new elements by just consing them onto the head of the list (instead of O(log n)
for a MultiSet).
With these optimizations in place the extra overhead introduced by the serilization / instrumentation is somewhere between 10 and 15 percent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that the use of codehash means we'll need some extra state if we want to show pretty coverage reports to the user (perhaps a mapping from codehash to SolcContract
or smth in the Corpus
?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we probably want to rewrite the existing coverage based interpreter to used the new data structures before merging, but I was still just trying to move fast and figure out what worked best so I used a seperate one to start with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh also it's worth noting that using a MultiSet
or a list instead of a Set
for the traces means that traces that get further into a loop will be treated as a new trace, which seems like a nice thing.
does this include constant mining? crytic/echidna#262 |
Still very much WIP (and currently based off a pretty outdated commit).
This PR introduces a coverage guided / mutation based approach to fuzzing. All fuzz tests are now run with coverage enabled. We store each path seen, along with the input that we used to get there. When generating calldata for a fuzz run, we sometimes use the existing strategy (random), and sometimes choose instead to mutate one of the examples from the corpus of visited paths.
This should hopefully help us get even deeper inside complex contracts.
Still TODO: