The artifact that we are submitting for evaluation is the func2vec tool (originally called program2vec as will be evident below). Anything named program2vec is simply our internal name for what became the func2vec tool.

This tool can be used to produce an embedding for any program compilable by LLVM, and includes the accompanying Linux dataset used to evaluate the ability of func2vec to detect function synonyms.

Running func2vec consists of a pipeline with several stages:

Produce a bitcode file for the software system
A set of LLVM passes that extract interprocedural control flow from a bitcode file
The random walker which translates the LLVM control flow data into an l-PDS and performs the walks
Commands for training a model from these random walks using the Gensim implementation of word2vec.
Commands for evaluating the model, given a gold standard of function synonyms.

NOTE: Unzipping the artifact file creates an "artifact" directory. It is assumed that all commands are run from the root of that directory. This artifact directory should be the /program2vec in the docker container (see "Running the docker container").

The data directory (download separately)

data/definedfunctions.txt A list of all functions in linux (generated as a byproduct of func2vec): data
data/pathbased.walks: Path-based random walks from Linux
data/flat.walks: Flat random walks from Linux
data/pathbased.model: Path-based func2vec embedding of Linux kernel (text word2vec format)
data/flat.model: Path-based func2vec embedding of Linux kernel (text word2vec format)
data/pcisound.sqlite: The error handler database (input to miner) for PCI sound drivers in the Linux kernel.
data/5fs.sqlite: The input to the error-handling specification miner for file system mining
data/manual-golden.txt: The full gold standard that was manually reviewed
data/underscores.txt: The underscore naming convention gold standard

Running the docker container

Download program2vec.tar which contains the docker container for running this artifact.

$PATHTOARTIFACT should be the unzipped artifact directory (contains src directory).

docker load --input program2vec.tar
docker run -it -v $PATHTOARTIFACT:/program2vec program2vec

The run command starts a docker container with the local artifact folder mounted inside the container at /program2vec.

* IT IS IMPORTANT THAT THE DOCKER VOLUME IS MOUNTED AT THIS LOCATION *

ls /program2vec should show a "src" directory.

Re-creating the docker image (optional)

cd docker
docker build -t program2vec .

Building

Issue the following commands inside the docker container:

cd /program2vec
mkdir build
cd build
cmake ..
make -j$(nproc)

Running the full replication script (optional)

The run.sh in script in the root of the artifact replicates all of the main paper results. This takes several hours and 20G to run (three hours on an i7-4500). Alternatively, individual steps are described in this document.

cd /program2vec
run.sh

Explanation of output:

The run.sh replication script creates an output folder with all of the main results of the evaluation of func2vec.

output/pathbased.walks: The random walks generated by the random walker using path-based func2vec
output/flat.walks: The random walks generated by the random walker using flat func2vec
output/pathbased.model: This is the path-based model
output/flat.model: The flat model
output/pathbased-manual-roc.png: ROC curve image for Figure (path-based manual)?
output/pathbased-flat-roc.png: ROC curve image for Figure (flat manual)?
output/pathbased-underscores-roc.png: ROC curve image for Figure (path-based underscores)?
output/flat-underscores-roc.png: ROC curve image for Figure (flat underscores)?
output/pathbased-clusters.txt: Path-based k-Means clusters
output/flat-clusters.txt: Flat k-Means clusters
output/pathbased-manual-f1.txt: F1 score for path-based manual gold standard. Result at the end of the file.
output/flat-manual-f1.txt: F1 score for flat manual gold standard. Result at the end of the file.
output/pathbased-underscores-f1.txt: F1 score for path-based underscores gold standard
output/flat-underscores-f1.txt: F1 score for flat underscores gold standard
output/sound-unmerged.specs: Sound PCI mining results without using function synonyms
output/sound-merged.specs: Sound PCI mining results with using function synonyms
output/fs-unmerged.specs: File system mining results without using function synonyms
output/fs-merged.specs: File system mining results with using function synonyms
output/table3.txt: Matches table 3 in the paper. Shows increase in support of mining rules discussed in paper. These results will differ in absolute numbers, but increase in support for the rules is consistent with what is reported in the paper.

Running func2vec on any C file

For any single C file, func2vec can be run as follows:

clang -c -emit-llvm example.c -o example.bc
python2 -m walker walk --bitcode example.bc --output example.walks
python2 -m walker train --input examples.walks --output example.model

Walking the Linux bitcode file

Memory requirement: Approximately 20G

Time: Approximately an hour

INPUT: data/vmlinux.o.bc

OUTPUTS: data/pathbased.walks data/flat.walks

The data directory contains a pre-generated bitcode file that corresponds to Linux ??. The only modifications to the source code where patches provided by the LLVMLinux project required to compile Linux with clang.

Our evaluation of func2vec compares the standard invocation of func2vec, which is path-based, with a flat version that does not use paths.

The parameters used in the paper for the "path-based" model are:

bias 2.0
length 100
numwalks 100
interproc 1
remove-cross-folder

The parameters used in the paper for the "flat" model are:

length 100
numwalks 100
interproc 0
flat

(The other parameters are not applicable to the flat method of walking)

Run these commands from /program2vec/data:

python2 -m walker walk --bitcode vmlinux.o.bc --output pathbased.walks --length 100 --walks 100 --interprocedural 1 --bias 2.0 --remove-cross-folder
python2 -m walker walk --bitcode vmlinux.o.bc --output flat.walks --length 100 --walks 100 --interprocedural 0 --flat

Training Linux models

INPUTS: data/pathbased.walks data/flat.walks

OUTPUTS: data/pathbased.model data/flat.mode1l

Training takes approximately 10 minutes.

python2 -m walker train --input pathbased.walks --output pathbased.model --mincount 5 --window 1
python2 -m walker train --input flat.walks --output flat.model --mincount 5 --window 1

The Gold Standards

We constructed two different gold standard data sets to evaluate func2vec.

The first data set consists of 2,652 unique functions that were identified by hand and reviewed with the assistance of a Linux kernel developer. These functions form 265 equivalence classes of function synonyms. This data set, along with the explicit false relations used to check for conflicts, is in the file data/manual-golden.txt
The second data set consists of pairs of functions that follow an underscore naming scheme. This is a strong naming convention in Linux. These pairs have been randomly sampled and spot checked, but not every pair has been examined by hand. This data set is located in the file data/underscores.txt.

Evaluating the model against a gold standard (ROC)

There are two metrics that each gold standard uses, that correspond to the two different data sets that are a part of this artifact.

The first data set is manual-synonyms.txt, which are the synonyms that were manually reviewed. The second data set is underscores.txt, which are synonyms that were identified based on the use of the underscore naming convention.

Each command will output the full ROC curve to a .png file, and print the AUROC score to the screen.

(From the /func2vec/data directory)

python2 -m walker metric roc --model linux-paths.model --reference manual-synonyms.txt --png pathbased-manual-roc.png
python2 -m walker metric roc --model linux-paths.model --reference underscores.txt --png pathbased-underscores-roc.png
python2 -m walker metric roc --model linux-flat.model --reference manual-synonyms.txt --png flat-manual-roc.png
python2 -m walker metric roc --model linux-flat.model --reference underscores.txt --png flat-underscores.roc.png

Evaluating the model against a gold standard (Clustering)

Clustering the models with k-Means

INPUTS: data/pathbased.model data/flat.model

OUTPUTS: data/pathbased_clusters.txt data/flat_clusters.txt

python2 -m walker cluster kmeans --k 3500 --model pathbased.model --output pathbased-clusters-k3500.txt
python2 -m walker cluster kmeans --k 3500 --model flat.model --output flat-clusters-k3500.txt

Evaluating the f1 score (overlap between clusters and gold standard)

python2 -m walker metric f1cluster --reference manual-synonyms.txt --clusters pathbased-clusters.txt
python2 -m walker metric f1cluster --reference manual-synonyms.txt --clusters flat-clusters.txt
python2 -m walker metric f1cluster --reference underscores.txt --clusters pathbased-clusters.txt
python2 -m walker metric f1cluster --reference underscores.txt --clusters flat-clusters.txt

Note that the k-Means clustering implementation is not entirely deterministic, and replication may produce slightly different results. Nonetheless, the numbers should be close and consistent with the reported results in Table 1.

Exploratory queries of the model

ipython2
from gensim.models import KeyedVectors
model = gensim.load_word2vec_format('MODELPATH', binary=False)
model.similarity('FN1', 'FN2')

Case study application: Mining Error-handling Specifications

The example application of func2vec presented in the paper is the enhancement of specification mining using function synonyms. This section reproduces the main results of Tables 2 and 3, which shows that function synonyms can improve the support of individual specifications.

The input to the miner is a mining database that consists of a list of error handling locations, and the function calls that are in the context and response of the

Detecting error-handler locations is outside the scope of this paper and relies on other tools that are not included in this artifact. Given an error-handling database that is to be mined, this artifact shows how function synonyms can be applied.

Counting error handlers (Table 2)

Support in our mining setup is counted in terms of the number of individual error handlers that support an association rule. Each error handler has a corresponding row in the Handler table of the database. The following SQL statement counts the number of error handlers for the PCI sound database:

sqlite3 pcisound.sqlite
select count(*) from handler;
3173

Case study rank improvement (Table 3)

Execute the runminer python script. This wil put table3-* files in output/

::: python2 runminer.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.rst

README.rst

The data directory (download separately)

Running the docker container

Re-creating the docker image (optional)

Building

Running the full replication script (optional)

Running func2vec on any C file

Walking the Linux bitcode file

Training Linux models

The Gold Standards

Evaluating the model against a gold standard (ROC)

Evaluating the model against a gold standard (Clustering)

Clustering the models with k-Means

Evaluating the f1 score (overlap between clusters and gold standard)

Exploratory queries of the model

Case study application: Mining Error-handling Specifications

Counting error handlers (Table 2)

Case study rank improvement (Table 3)

Files

README.rst

Latest commit

History

README.rst

File metadata and controls

The data directory (download separately)

Running the docker container

Re-creating the docker image (optional)

Building

Running the full replication script (optional)

Running func2vec on any C file

Walking the Linux bitcode file

Training Linux models

The Gold Standards

Evaluating the model against a gold standard (ROC)

Evaluating the model against a gold standard (Clustering)

Clustering the models with k-Means

Evaluating the f1 score (overlap between clusters and gold standard)

Exploratory queries of the model

Case study application: Mining Error-handling Specifications

Counting error handlers (Table 2)

Case study rank improvement (Table 3)