Can I directly infer the types for a stripped binary that is not in the train or test set? #1

Open
hackhaye opened this issue Dec 11, 2024 · 3 comments

Comments

@hackhaye

According to the README, the model takes a pkl file as input during the test phase. Is there any way to predict the types for a stripped binary directly and output them?

@hackhaye
Author

I compiled a binary with the flags "-g -fcf-protection=none -fno-eliminate-unused-debug-types -frecord-gcc-switches -pipe -fno-lto -fno-inline-functions -fno-inline-small-functions -fno-inline-functions-called-once -fno-inline -O2" and ran the command "./TYGR datagen ./binaryDATASET.pkl". Why do I get the following results?

Source #functions: 0
Well Formed #functions: 0
Source #vars: 0
Well Formed #vars: 0

It seems that no variables or functions were found at all, but the binary does contain variables and functions.

@ChangZhu1997
Member

Hey! The reason it shows no source functions/variables is that there is no DWARF info. Right now the logic is: if no DWARF info is matched for a variable, that variable is skipped.

DWARF info is used to match each node to its ground-truth type. In the training/testing phase we need DWARF to evaluate performance (e.g., accuracy). The DWARF info is used only for evaluation, and the model never sees it.
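One way to confirm whether a binary actually carries DWARF info before running datagen is to look for a `.debug_info` section. The following is a minimal, standard-library-only sketch under stated assumptions: the function names `elf_section_names` and `has_dwarf_sections` are hypothetical and not part of TYGR, and the check ignores separate debug files and ELF files that use extended section numbering.

```python
import struct


def elf_section_names(data: bytes) -> list:
    """Return the section names found in an ELF image (32/64-bit, either endianness)."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is64 = data[4] == 2                      # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    end = "<" if data[5] == 1 else ">"       # EI_DATA: 1 = little-endian

    # Locate the section header table from the ELF header.
    if is64:
        (e_shoff,) = struct.unpack_from(end + "Q", data, 0x28)
        e_shentsize, e_shnum, e_shstrndx = struct.unpack_from(end + "HHH", data, 0x3A)
    else:
        (e_shoff,) = struct.unpack_from(end + "I", data, 0x20)
        e_shentsize, e_shnum, e_shstrndx = struct.unpack_from(end + "HHH", data, 0x2E)

    def header(i):
        """Return (sh_name, sh_offset, sh_size) for section header i."""
        base = e_shoff + i * e_shentsize
        (name_off,) = struct.unpack_from(end + "I", data, base)
        if is64:
            offset, size = struct.unpack_from(end + "QQ", data, base + 24)
        else:
            offset, size = struct.unpack_from(end + "II", data, base + 16)
        return name_off, offset, size

    # The section-name string table is itself a section, indexed by e_shstrndx.
    _, str_off, str_size = header(e_shstrndx)
    strtab = data[str_off:str_off + str_size]

    names = []
    for i in range(e_shnum):
        name_off, _, _ = header(i)
        nul = strtab.index(b"\0", name_off)  # names are NUL-terminated
        names.append(strtab[name_off:nul].decode("ascii", "replace"))
    return names


def has_dwarf_sections(data: bytes) -> bool:
    """True if the image contains a (possibly legacy-compressed) .debug_info section."""
    names = elf_section_names(data)
    return ".debug_info" in names or ".zdebug_info" in names
```

Reading the file with `open(path, "rb").read()` and passing the bytes to `has_dwarf_sections` gives a quick yes/no answer. A `False` result for a binary built with `-g` would suggest the debug sections were stripped or the wrong file was passed in.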

If you want to directly output predictions for stripped binaries, I would suggest modifying glow/datagen.py to create pkl files without DWARF info (ground truth), and then modifying the test module to output only the predicted results.
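To illustrate the shape of that change, here is a minimal sketch. It is not the actual contents of glow/datagen.py or the test module: `build_samples`, `summarize`, the `keep_unlabeled` flag, and the dictionary layout are all hypothetical names chosen for illustration. The idea is to keep variables that have no ground-truth label instead of skipping them, and to compute accuracy only over the samples that do have one.

```python
from typing import Optional


def build_samples(var_nodes, dwarf_types: Optional[dict], keep_unlabeled: bool):
    """Pair each recovered variable with its ground-truth type, if any.

    var_nodes: identifiers of variables recovered from the binary (hypothetical).
    dwarf_types: identifier -> ground-truth type, or None for a stripped binary.
    """
    samples = []
    for node in var_nodes:
        label = dwarf_types.get(node) if dwarf_types else None
        if label is None and not keep_unlabeled:
            # Evaluation mode: a variable with no DWARF match is skipped.
            continue
        samples.append({"node": node, "label": label})
    return samples


def summarize(samples, predictions):
    """Report every prediction; accuracy only covers samples with a label."""
    labeled = [(s, p) for s, p in zip(samples, predictions) if s["label"] is not None]
    accuracy = (
        sum(s["label"] == p for s, p in labeled) / len(labeled) if labeled else None
    )
    return {
        "predictions": {s["node"]: p for s, p in zip(samples, predictions)},
        "accuracy": accuracy,  # None when there is no ground truth at all
    }
```

With `keep_unlabeled=False` and no DWARF mapping, `build_samples` returns an empty list, which matches the all-zero counts reported earlier in this thread. With `keep_unlabeled=True`, the same variables are kept and can be fed to the model for prediction only.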

@hackhaye
Author

This work is great, but I found that on my 32GB machine, running with "--parallel" very easily hangs on a 5MB program. Even if I reduce pool_size, for example to pool_size=3, it still freezes.

Are there any optimizations that would help with this?
