Wikipedia is a distributed content repository built on large-scale collaboration: a public encyclopedia that anyone can edit. The quality of its content is maintained by a large community that actively reverts edits from spammers and vandals. In collaboration at this scale there are too many contributors to remember each one's track record, so a per-user trust score would help in choosing collaborators. In this work, we propose a reputation system built on such a trust score, which predicts an author's behavior from the survival rate of their past edits. We use these scores to estimate the likelihood of vandalism and to predict the longevity of future revisions. Our results show that the chance of vandalism decreases when the scores stabilize over time. The system relies solely on the evolution of past contributions and highlights users with low trust scores.

We also devised an algorithm that derives a user's trust score from the quality of their contributions, measuring past contributions in terms of article quality. Determining the quality of an article is another resource-intensive task currently performed manually by the community. This classification is essential for steering readers toward high-quality articles and identifying low-quality content for improvement, and at Wikipedia's edit volume it calls for an automated quality classifier. We use readability scores together with textual and structural features of an article to predict its quality class, with an approach based on a Deep Neural Network (DNN) and feature selection. Our results are comparable with existing approaches and show improvements in accuracy and information gain.
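To give a concrete feel for the survival-rate idea, here is a minimal illustrative sketch, not the exact metric implemented in TrustScore.py: a user's trust score is approximated as the average fraction of their added text that survives into later revisions.

```python
# Hypothetical sketch: trust as the survival rate of an author's past edits.
# The real metric lives in TrustScore.py; the names and data here are illustrative only.

def edit_survival(added_tokens, later_revision_tokens):
    """Fraction of tokens added in an edit that still appear in a later revision."""
    if not added_tokens:
        return 0.0
    later = set(later_revision_tokens)
    surviving = sum(1 for tok in added_tokens if tok in later)
    return surviving / len(added_tokens)

def trust_score(edits):
    """Average survival rate over all of an author's past edits.

    `edits` is a list of (added_tokens, later_revision_tokens) pairs.
    """
    if not edits:
        return 0.0
    rates = [edit_survival(added, later) for added, later in edits]
    return sum(rates) / len(rates)

# Example: one edit that survived and one that was reverted.
edits = [
    (["verified", "citation"], ["verified", "citation", "added", "later"]),  # survived
    (["buy", "cheap", "stuff"], ["the", "original", "sentence"]),            # reverted
]
print(trust_score(edits))  # 0.5
```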
File | Description |
---|---|
testAlorithmReadiability.py | Contains functions for feature selection, building and testing multiple machine learning classifiers, and hyper-tuning the classifiers (a hedged sketch of this workflow appears after this table). It outputs graphs of feature importance and confusion matrices of the different classifiers before and after hyper-tuning. All the methods are called in the main section. You can execute the file with: "python testAlorithmReadiability.py" |
UserContributionMetric.py | Contains functions for plotting the user-contribution graphs and the trust values of those contributions used in our thesis. It requires the user contributions to be computed first with calcFeatureContrib.py. All the methods are called in the main section. You can execute the file with: "python UserContributionMetric.py" |
UserLongevityMetric.py | Contains functions for calculating text longevity and plotting the longevity graphs and their trust values used in our thesis. All the methods are called in the main section. You can execute the file with: "python UserLongevityMetric.py" |
LongevityRegressionGraph.py | Contains functions for calculating a multi-variable regression of longevity against trust score (see the regression sketch after this table). All the methods are called in the main section. You can execute the file with: "python LongevityRegressionGraph.py" |
TrustScore.py | Contains functions for calculating the trust score; it is the Python implementation of the trust metric. You can execute the file with: "python TrustScore.py" |
calcFeatureContrib.py | Contains functions for calculating user contributions and the quality of those contributions. All the methods are called in the main section. You can execute the file with: "python calcFeatureContrib.py" |
calcFeatureNew.py | Contains functions for calculating readability scores and the new text-based features (see the readability sketch after this table). It requires downloading the 2015 Wikipedia quality dataset. All the methods are called in the main section. You can execute the file with: "python calcFeatureNew.py" |
Machine Learning Classifiers | Directory containing all the machine learning classifiers that we used for our predictions. |
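For orientation, the kind of feature selection and classifier hyper-tuning performed in testAlorithmReadiability.py can be sketched with scikit-learn. The pipeline, parameter grid, and synthetic data below are assumptions for illustration, not the thesis configuration:

```python
# Illustrative sketch of feature selection + classifier hyper-tuning with scikit-learn.
# Synthetic data stands in for the Wikipedia quality dataset used in the thesis.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=500, n_features=20, n_informative=8,
                           n_classes=3, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipeline = Pipeline([
    ("select", SelectKBest(score_func=f_classif)),    # keep the k most informative features
    ("clf", RandomForestClassifier(random_state=0)),  # one of several candidate classifiers
])

param_grid = {
    "select__k": [5, 10, 20],
    "clf__n_estimators": [100, 300],
    "clf__max_depth": [None, 10],
}

search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)
print("best params:", search.best_params_)
print("test accuracy:", search.score(X_test, y_test))
```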
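Similarly, a minimal sketch of the multi-variable regression of longevity against trust-related features (as in LongevityRegressionGraph.py) might look like the following; the feature names and synthetic data are made up:

```python
# Hedged sketch of a multi-variable regression of longevity on trust-related features.
# The data and feature names below are invented; the thesis uses the computed user metrics.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
trust_score = rng.uniform(0, 1, 200)
edit_count = rng.integers(1, 500, 200)
# Synthetic target: longevity loosely increases with trust score.
longevity = 30 * trust_score + 0.01 * edit_count + rng.normal(0, 2, 200)

X = np.column_stack([trust_score, edit_count])
model = LinearRegression().fit(X, longevity)
print("coefficients:", model.coef_)
print("intercept:", model.intercept_)
print("R^2:", model.score(X, longevity))
```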
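Finally, the readability features computed in calcFeatureNew.py are in the spirit of this sketch, which uses the `textstat` package; the actual feature set in the thesis code may differ:

```python
# Minimal sketch of readability-style features for an article's plain text.
# Assumes `pip install textstat`; the real feature set in calcFeatureNew.py may differ.
import textstat

def readability_features(text):
    return {
        "flesch_reading_ease": textstat.flesch_reading_ease(text),
        "flesch_kincaid_grade": textstat.flesch_kincaid_grade(text),
        "gunning_fog": textstat.gunning_fog(text),
        "smog_index": textstat.smog_index(text),
        "automated_readability_index": textstat.automated_readability_index(text),
        # Simple structural/textual counts
        "sentence_count": textstat.sentence_count(text),
        "word_count": textstat.lexicon_count(text, removepunct=True),
    }

if __name__ == "__main__":
    sample = ("Wikipedia is a free online encyclopedia. "
              "Articles are written collaboratively by volunteers around the world.")
    for name, value in readability_features(sample).items():
        print(f"{name}: {value}")
```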
You can report bugs at the issue tracker, or message me if you can't get it to work. In fact, you should message me anyway.
This program was developed to support the results of my thesis, so the coding standards might not be up to the mark. Don't be shy about making a pull request :)
To contribute:
- Fork it
- Clone it:
    - `git clone https://github.com/wahabjawed/wiki-maya.git`
    - `cd wiki-maya`
- Open the project in PyCharm
Built with ♥ by Abdul Wahab (@wahabjawed) under the MIT License
This is free software, and may be redistributed under the terms specified in the LICENSE file.
You can find a copy of the License at http://wahabjawed.mit-license.org/