[8] Find a way how to get project name from NVD CVE data #2485

msrb · 2018-03-07T08:15:16Z

Description

Mapping CVE entries to actual package names is much easier when we at least know the name of the project (e.g. "Apache NiFi" or "Apache POI") that is affected by a given vulnerability. Knowing the project name will help us get better results and fewer false positives.

The output of this task should be a function that takes one NVD CVE record as input and returns a list of possible project name candidates. Having a confidence score for each candidate would be nice, but is not necessary.
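To make the requested interface concrete, here is a minimal sketch of such a function. It is a naive capitalization heuristic, not the approach chosen later in this issue. The `Candidate` type, the `STOPWORDS` set, the weighting scheme and the simplified record layout (a dict with a `description` key) are all assumptions made for illustration.

```python
from __future__ import annotations

import re
from collections import Counter
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    confidence: float


# Capitalized words that often open a sentence and are not project names
# (hypothetical, deliberately short list).
STOPWORDS = {"A", "An", "The", "In", "On", "When", "If", "This"}


def capitalized_runs(text: str) -> list[str]:
    """Return runs of consecutive capitalized words, e.g. 'Apache POI'."""
    runs, current = [], []
    for token in re.findall(r"[A-Za-z0-9]+|[^\sA-Za-z0-9]", text):
        if token[0].isupper() and token not in STOPWORDS:
            current.append(token)
        else:
            if current:
                runs.append(" ".join(current))
            current = []
    if current:
        runs.append(" ".join(current))
    return runs


def project_name_candidates(record: dict) -> list[Candidate]:
    """Return candidates sorted by descending confidence.

    Assumes a simplified record: {"description": "<CVE description text>"}.
    """
    counts = Counter(capitalized_runs(record.get("description", "")))
    # Longer and more frequent runs get more weight (an arbitrary choice).
    weights = {name: len(name.split()) * n for name, n in counts.items()}
    total = sum(weights.values())
    if total == 0:
        return []
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return [Candidate(name, weight / total) for name, weight in ranked]
```

The confidences here are only normalized weights, so they sum to 1 but are not calibrated probabilities.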

Sub tasks for sprint #2433

Have a set of labeled data to train, validate and test accuracy with.
Update: It is possible to label the NVD feed entries that reference GitHub, so that the project name can be inferred from the description of the CPE. This set, however, might not be sufficient, and the approach should be discussed further.
Discover whether the data evinces a latent pattern.
Model selection based on the description pattern properties
Update: Based on the description properties, a Naive Bayes classifier was selected for the implementation.
Classifier implementation
Accuracy evaluation
Update: Accuracy has been evaluated on a relatively small dataset (approx. 20% of the real data) due to a lack of labeled data.
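One possible reading of the labeling sub task above is sketched below: take the repository name from the first GitHub reference of a CVE entry and use it as the label for that entry's description. That reading is an assumption, and so are the helper name `label_from_references` and the example URLs. The issue does not spell out the exact labeling procedure.

```python
import re
from typing import Iterable, Optional

# Matches https://github.com/<owner>/<repo>[/...]
GITHUB_REPO = re.compile(r"https?://github\.com/([^/\s]+)/([^/\s#?]+)")


def label_from_references(reference_urls: Iterable[str]) -> Optional[str]:
    """Return the repository name of the first GitHub reference, or None."""
    for url in reference_urls:
        match = GITHUB_REPO.match(url)
        if match:
            repo = match.group(2)
            # Strip a trailing ".git" from clone-style URLs.
            return repo[:-4] if repo.endswith(".git") else repo
    return None


# Hypothetical reference list for one CVE entry.
references = [
    "https://example.com/advisory/123",
    "https://github.com/example-org/example-project/commit/abc123",
]
label = label_from_references(references)
```

Entries without a GitHub reference stay unlabeled, which is one reason such a set may be too small on its own.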

CermakM · 2018-03-07T09:48:28Z

A sub task has been added. In order to be able to at least estimate the accuracy of a model, we need an accessible toy set of labeled data.

krishnapaparaju · 2018-03-14T12:11:17Z

@CermakM can you please share the approach for retrieving the project name? Which ecosystems are being considered for this work?

commit jupyter notebook

DISCLAIMER: the code in the notebook is in NON-production quality and serves only as sketch of possible solution

- the notebook provides a POC for project name inference from cpe description
- the notebook is supposed to visualize possible results when implementing such kind of classifier for the task

GitHub issue: openshiftio/openshift.io#2485

Signed-off-by: Marek Cermak <[email protected]>

new file: cve-desrciption-cracker.ipynb

CermakM · 2018-03-14T16:45:32Z

@krishnapaparaju sure thing,
This approach is not ecosystem specific, since it is based on project name extraction from CPE description only, without using any external search engines.

The notebook suggests an approach that could greatly improve on our current one (it also includes a brief comparison of the two).
This is still a WIP and a lot of things need to be discussed / implemented / verified.
Also, please expect and tolerate some slapdash code.

msrb · 2018-03-27T09:17:12Z

This experiment will continue in Sprint 147.

@CermakM please update this issue, thanks 👍

CermakM · 2018-03-27T10:49:34Z

Conclusion for the current sprint #2433

We were able to prove that the description data evince a pattern. With a suitable feature extractor, a classifier can be trained to provide decent predictions of project name candidates. Each candidate is assigned a numeric confidence score and can be further processed (ordered, filtered, etc.).
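The combination of feature extractor, classifier and confidence score can be sketched in a few lines of plain Python. This is not the notebook's code. The three boolean features, the `VERSION_WORDS` set, the tiny `NaiveBayes` class and the two training sentences (invented, not real NVD descriptions) are assumptions chosen only to show the mechanics.

```python
import math
from collections import Counter, defaultdict

VERSION_WORDS = {"before", "through", "version"}


def token_features(tokens, i):
    """Boolean features for tokens[i]; the feature set is illustrative."""
    following = tokens[i + 1].lower() if i + 1 < len(tokens) else ""
    return {
        "capitalized": tokens[i][:1].isupper(),
        "among_first_three": i < 3,
        "before_version_word": following in VERSION_WORDS,
    }


class NaiveBayes:
    """Naive Bayes over boolean features with Laplace smoothing."""

    def __init__(self):
        self.label_counts = Counter()
        self.value_counts = defaultdict(Counter)  # (label, feature) -> value counts

    def train(self, samples):
        for features, label in samples:
            self.label_counts[label] += 1
            for name, value in features.items():
                self.value_counts[(label, name)][value] += 1

    def probabilities(self, features):
        """Return {label: P(label | features)}."""
        total = sum(self.label_counts.values())
        log_scores = {}
        for label, count in self.label_counts.items():
            score = math.log(count / total)
            for name, value in features.items():
                seen = self.value_counts[(label, name)][value]
                score += math.log((seen + 1) / (count + 2))  # two possible values
            log_scores[label] = score
        top = max(log_scores.values())
        unnormalized = {l: math.exp(s - top) for l, s in log_scores.items()}
        norm = sum(unnormalized.values())
        return {l: u / norm for l, u in unnormalized.items()}


def labeled_samples(description, project_tokens):
    """Label each token True if it is part of the known project name."""
    tokens = description.split()
    return [(token_features(tokens, i), tokens[i] in project_tokens)
            for i in range(len(tokens))]


# Invented toy sentences, not real NVD descriptions.
TRAINING = [
    ("Apache NiFi before 1.0 allows remote attackers to read files", {"Apache", "NiFi"}),
    ("Buffer overflow in LibFoo through 2.3 allows remote attackers to crash", {"LibFoo"}),
]

model = NaiveBayes()
for description, names in TRAINING:
    model.train(labeled_samples(description, names))


def score_tokens(description):
    """Return (token, P(token is part of a project name)) pairs."""
    tokens = description.split()
    return [(t, model.probabilities(token_features(tokens, i))[True])
            for i, t in enumerate(tokens)]
```

Calling `score_tokens` on a new description yields per-token scores that can then be ordered or filtered by a threshold, which is the kind of post-processing mentioned above.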

To be done in sprint #2775 :

It is not yet known whether the accuracy of the classifier is sufficient. Only a basic accuracy evaluation has been implemented so far; cross-validation remains to be done, as suggested by @rootAvish.
Also, the classifier used for this proof was an implicit one and can be further improved to provide a more flexible and more suitable way to integrate with our current tools. The decision about the implementation will be based on the cross-validation results and team members' opinions.
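For reference, the cross-validation that remains to be done could look roughly like the k-fold sketch below. The helper names, the `fit`/`predict` callback interface and the majority-class baseline standing in for the real classifier are assumptions. A real run would also shuffle the samples first and needs at least `k` samples.

```python
from collections import Counter


def k_folds(n_samples, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    for fold in range(k):
        test = indices[fold::k]  # every k-th sample, offset by the fold number
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


def cross_validate(samples, k, fit, predict):
    """Return the accuracy on each held-out fold.

    samples: list of (features, label) pairs
    fit:     callable(train_samples) -> model
    predict: callable(model, features) -> label
    """
    scores = []
    for train_idx, test_idx in k_folds(len(samples), k):
        model = fit([samples[i] for i in train_idx])
        hits = sum(predict(model, samples[i][0]) == samples[i][1] for i in test_idx)
        scores.append(hits / len(test_idx))
    return scores


# Stand-in model: always predict the most common training label.
def fit_majority(train_samples):
    return Counter(label for _, label in train_samples).most_common(1)[0][0]


def predict_majority(model, features):
    return model


# Synthetic samples: every fourth one is labeled True.
toy_samples = [(i, i % 4 == 0) for i in range(20)]
fold_scores = cross_validate(toy_samples, 5, fit_majority, predict_majority)
```

Comparing the per-fold scores of the real classifier against such a baseline gives a first indication of whether its accuracy is sufficient.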

cc @msrb

msrb added type/task area/analytics team/analytics team/analytics/core labels Mar 7, 2018

msrb added this to the Sprint 146 milestone Mar 7, 2018

krishnapaparaju mentioned this issue Mar 7, 2018

Sprint plan for Fabric8 Analytics: #146 #2433

Closed

41 tasks

msrb mentioned this issue Mar 7, 2018

Determine version ranges from NVD data #2006

Closed

CermakM self-assigned this Mar 7, 2018

CermakM changed the title ~~Find a way how to get project name from NVD CVE data~~ [8] Find a way how to get project name from NVD CVE data Mar 7, 2018

krishnapaparaju mentioned this issue Mar 27, 2018

Sprint plan for Fabric8 Analytics: #147 #2775

Closed

56 tasks

msrb modified the milestones: Sprint 146, Sprint 147 Mar 27, 2018

CermakM mentioned this issue Mar 27, 2018

[8] Implementation and improvements of project name acquiring approach from NVD CVE data #2783

Closed

4 tasks

CermakM closed this as completed Mar 27, 2018
