Merge branch 'master' into demo
c-martinez committed Oct 19, 2016
2 parents 3695c7e + e91849f commit a73fd40
Showing 9 changed files with 189 additions and 79 deletions.
78 changes: 5 additions & 73 deletions README.MD
@@ -9,81 +9,13 @@ ShiCo is a tool for visualizing time shifting concepts. We refer to a concept as
![Mock concept shift](./docs/mockConcept1.png)
![Mock concept shift](./docs/mockConcept2.png)

You can find more details of how the concept shift works [here](./docs/howItWorks.md).
You can find more details of how the concept shift works [here](./docs/howItWorks.md) and you can read the user documentation [here](./docs/ui.md).

- *How is it structured (backend/frontend)*
- *What the back end does?*
- *What do I see in the front end?*
## How to use it?
You can read how to get your own instance of ShiCo up and running [here](./docs/deploy.md).

*How to use it?*
- *What do I need to run it? (python, some web server, word2vec models)*

*How to extend it?*
- *Use different semantic model (other than word2vec)*

## Launching server

To launch the server run:
```
# python shico/server.py -f "word2vecModels/195?_????.w2v"
```

*Note:* loading the word2vec models takes some time and may consume a large amount of memory.

Then you can trace a concept by connecting to the server using curl (or your web browser). Examples:

```
http://localhost:5000/track/oorlog
http://localhost:5000/track/oorlog?startKey=1952_1961
http://localhost:5000/track/oorlog?startKey=1952_1961&maxTerms=5
http://localhost:5000/track/oorlog?startKey=1952_1961&maxTerms=5&forwards=
http://localhost:5000/track/nederland?maxTerms=5&sumDistances=true
http://localhost:5000/track/nederland?maxTerms=5&sumDistances=
http://localhost:5000/track/oorlog,oorlogse
```
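
As a rough sketch, the example URLs above can also be assembled programmatically. The helper below is hypothetical (it is not part of ShiCo); it only assumes the `/track/<terms>` route and the query parameter names that appear in the examples.

```python
from urllib.parse import quote, urlencode


def build_track_url(terms, base_url="http://localhost:5000", **params):
    """Build a /track URL for one or more seed terms.

    Hypothetical helper, not part of ShiCo: it only mirrors the URL
    shapes listed above (comma-separated terms, optional query string).
    """
    # Seed terms are joined with commas, as in /track/oorlog,oorlogse
    path = ",".join(quote(term) for term in terms)
    url = "{}/track/{}".format(base_url, path)
    if params:
        # Query parameters are appended in the order they were given
        url += "?" + urlencode(params)
    return url


print(build_track_url(["oorlog"], startKey="1952_1961", maxTerms=5))
```

The resulting string can be passed to curl or opened in a browser in the same way as the hand-written URLs.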

## Web app

### Adding hooks

You can add your own custom behaviour to the force directed graphs like this:
```
(function() {
  'use strict';

  angular
    .module('shico')
    .run(runBlock);

  function runBlock(GraphConfigService) {
    GraphConfigService.addForceGraphHook(function(node) {
      node.select('circle').attr('r', function(d) {
        return d.name.length;
      });
    });
  }
})();
```

This snippet modifies the size of the force directed graph nodes, and makes them dependent on the length of the name in the node's data.


## Unit testing
To run Python unit tests, run:
```
$ nosetests
```

## Cleaning functions
In some cases, resulting vocabularies may contain words which we would like to filter. ShiCo offers the possibility of using a *cleaning* function, for filtering vocabularies after they have been generated. To use this option, it is necessary to indicate the name of the cleaning function when starting the ShiCo server. A sample cleaning function is provided (*shico.extras.cleanTermList*). You can use this function as follows:
```
$ python shico/server.py -c "shico.extras.cleanTermList"
```

## Speeding up ShiCo

The current implementation of ShiCo relies on the `most_similar` function of the gensim word2vec model, which in turn computes the dot product between two large matrices via the `numpy.dot` function. For this reason, ShiCo benefits greatly from libraries which accelerate matrix multiplication, such as OpenBLAS. ShiCo has been tested using [NumPy with OpenBLAS](https://hunseblog.wordpress.com/2014/09/15/installing-numpy-and-openblas/), producing a significant increase in speed.
## How to extend it
If you would like to modify ShiCo, read the developer manual [here](./docs/develop.md).

## Licensing

82 changes: 77 additions & 5 deletions docs/deploy.md
@@ -1,6 +1,78 @@
# Making a release
# Deploying ShiCo
If you want to run your own instance of ShiCo, there are a few things you will need:

- Merge changes into the `demo` branch
- Run `gulp build`
- Make a GitHub release

- A set of word2vec models which your ShiCo instance will use.
- A server to run the Python back end (it needs enough memory to hold your word2vec models).
- A web server to serve the front end to the browser.

## Word2vec models

You are welcome to use our [existing w2v models](https://github.com/NLeSC/ShiCo/tree/master/word2vecModels); you might need to use [git-lfs](https://git-lfs.github.com/) to download them. If you do, please contact us for more details on how the models were built and on how to cite our work. You can also [create your own](./docs/buildingModels.md) models, based on your own corpus.

## Launching the back end

Once you have downloaded the code (or cloned this repo) and installed all Python requirements (listed in *requirements.txt*), you can launch the Flask server as follows:
```
$ python shico/server/app.py -f "word2vecModels/????_????.w2v"
```

*Note:* loading the word2vec models takes some time and may consume a large amount of memory.

You can check that the server is up and running by connecting to the server using curl (or your web browser):
```
http://localhost:5000/load-settings
```
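
The same check can be scripted. The following is a minimal sketch using only the Python 3 standard library; the function name is illustrative, and it assumes (without having verified it) that `/load-settings` answers with JSON.

```python
import json
from urllib.request import urlopen


def load_settings(base_url="http://localhost:5000", timeout=5):
    """Return the parsed /load-settings response, or None on failure.

    Assumption: the endpoint answers with JSON; adjust the parsing
    if your server responds differently.
    """
    try:
        with urlopen(base_url + "/load-settings", timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts and HTTP errors;
        # ValueError covers a body that is not valid JSON.
        return None
```

A return value of `None` means the server could not be reached or did not answer as expected; keep in mind that loading the models may take a while before the server starts responding.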

Alternatively, you can use [Gunicorn](http://gunicorn.org/) by setting your configuration in *shico/server/config.py* and then running:

```
$ gunicorn --bind 0.0.0.0:8000 --timeout 1200 shico.server.wsgi:app
```

## Launching the front end

The files needed to serve the front end are located in the *webapp* folder. You will need to edit your configuration file (*webapp/src/config.json*) to tell the front end where your back end is running. For example, if your back end is running on *localhost* port 5000, as in the example above, your configuration file would look as follows:

```
{
"baseURL": "http://localhost:5000"
}
```

If you are familiar with the JavaScript world, you can use the *gulp* tasks provided. You can serve your front end as follows (from the *webapp* folder):
```
$ gulp serve
```

You can build a deployable version (minified, uglified, etc.) as follows:
```
$ gulp build
```
This will build a deployable version in the *webapp/dist* folder.

## Pre-built deployable version

If you are not familiar with the JavaScript world (or just don't feel like building your own deployable version), the *demo* branch of this repository contains a pre-built version of the front end. You can check out (or download) that branch, and then you are ready to go.

## Serve with your favorite web server

Once you have a *webapp/dist* folder (whether downloaded or self-built), you can serve its contents using your favorite web server. For example, you could use Python's SimpleHTTPServer as follows (from the *webapp/dist* folder):
```
$ python -m SimpleHTTPServer
```
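
`SimpleHTTPServer` is a Python 2 module; on Python 3 the equivalent command is `python -m http.server`. If you would rather start the server from a script, a minimal sketch could look like this (the function name and the default port are illustrative choices, not part of ShiCo; it requires Python 3.7 or later):

```python
import functools
import http.server


def make_static_server(directory="webapp/dist", port=8000):
    """Create an HTTP server that serves the files found in `directory`."""
    # Bind the handler to the folder to serve, so the script does not
    # depend on the current working directory.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.ThreadingHTTPServer(("", port), handler)


# Typical use (blocks until interrupted):
#   make_static_server().serve_forever()
```

This is only meant for local testing; for a public deployment a dedicated web server is the safer choice.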

## Cleaning functions
In some cases, resulting vocabularies may contain words which we would like to filter. ShiCo offers the possibility of using a *cleaning* function, for filtering vocabularies after they have been generated. To use this option, it is necessary to indicate the name of the cleaning function when starting the ShiCo server. A sample cleaning function is provided (*shico.extras.cleanTermList*). You can use this function as follows:
```
$ python shico/server/app.py -c "shico.extras.cleanTermList"
```

If you are using gunicorn, in your *config.py*, you can set `cleaningFunctionStr` to the name of your cleaning function, for instance:

```
cleaningFunctionStr = "shico.extras.cleanTermList"
```
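
If the sample function does not fit your corpus, you could write your own. The sketch below is only an illustration: it assumes that a cleaning function receives a list of terms and returns the filtered list, which should be checked against *shico.extras.cleanTermList* before relying on it. The function name `dropShortTerms` is hypothetical.

```python
def dropShortTerms(terms, minLength=3):
    """Keep only alphabetic terms of at least `minLength` characters.

    Assumed interface (list of terms in, filtered list out); verify it
    against shico.extras.cleanTermList before using it with ShiCo.
    """
    return [
        term for term in terms
        if term.isalpha() and len(term) >= minLength
    ]


print(dropShortTerms(["oorlog", "de", "1945", "vrede"]))
```

Such a function would then be referenced by its dotted name, in the same way as `shico.extras.cleanTermList` above.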

## Speeding up ShiCo

The current implementation of ShiCo relies on the `most_similar` function of the gensim word2vec model, which in turn computes the dot product between two large matrices via the `numpy.dot` function. For this reason, ShiCo benefits greatly from libraries which accelerate matrix multiplication, such as OpenBLAS. ShiCo has been tested using [NumPy with OpenBLAS](https://hunseblog.wordpress.com/2014/09/15/installing-numpy-and-openblas/), producing a significant increase in speed.
48 changes: 48 additions & 0 deletions docs/develop.md
@@ -0,0 +1,48 @@
# What should you do if you want to modify ShiCo?

Be brave! And get in touch if you need help. Pull requests are very welcome.

## Backend

Written in Python.

### Unit testing
If you modify the ShiCo back end, make sure to write unit tests for your code.

To run Python unit tests, run:
```
$ nosetests
```

## Web app

Written in JavaScript (Angular).

### Adding hooks

You can add your own custom behaviour to the force directed graphs like this:
```
(function() {
  'use strict';

  angular
    .module('shico')
    .run(runBlock);

  function runBlock(GraphConfigService) {
    GraphConfigService.addForceGraphHook(function(node) {
      node.select('circle').attr('r', function(d) {
        return d.name.length;
      });
    });
  }
})();
```

This snippet modifies the size of the force directed graph nodes, and makes them dependent on the length of the name in the node's data.

## Making a release on GitHub
- Merge changes into the `demo` branch
- Run `gulp build`
- Make a GitHub release
Binary file added docs/embeddingGraph.png
Binary file added docs/networkGraph.png
Binary file added docs/searchBar.png
Binary file added docs/streamGraph.png
57 changes: 57 additions & 0 deletions docs/ui.md
@@ -0,0 +1,57 @@
# How to use ShiCo?

This guide introduces the elements of ShiCo's user interface.

## User interface components

When you first open ShiCo in your browser, you will see a simple search bar:

![Search bar](./searchBar.png)

You can enter one or more (comma-separated) *seed terms*. These seed terms are the entry point for your concept search. Click *Submit* to begin your search. The results of your search will be displayed in the results panel below the search bar.

The search bar has some additional features:
- It allows you to modify the search parameters. Click the *+* button to display additional search parameters.
- It allows you to save the parameters of your current search, or load the parameters of a previous search.

## Search parameters

The following is the list of parameters (with a link to a brief explanation) which can be used to control your concept search:

- [Max Terms](/webapp/src/help/maxTerms.md)
- [Max related terms](/webapp/src/help/maxRelatedTerms.md)
- [Minimum concept similarity](/webapp/src/help/minSim.md)
- [Word boost](/webapp/src/help/wordBoost.md)
- [Boost method](/webapp/src/help/boostMethod.md)
- [Algorithm](/webapp/src/help/algorithm.md)
- [Track direction](/webapp/src/help/direction.md)
- [Years in interval](/webapp/src/help/yearsInInterval.md)
- [Words per year](/webapp/src/help/wordsPerYear.md)
- [Weighing function](/webapp/src/help/weighFunc.md)
- [Function shape](/webapp/src/help/wFParam.md)
- [Do cleaning ?](/webapp/src/help/doCleaning.md) (only shown if your backend uses a cleaning function).
- [Year period](/webapp/src/help/yearPeriod.md)

## Produced graphics

Once a search is complete, ShiCo displays results in the results panel. Results are displayed using various graphs:

- Stream graph -- this shows each word of the resulting vocabulary as a stream over time. The stream gets wider or narrower according to the weight the word is given in the vocabulary.

![Stream graph](./streamGraph.png)

- Network graphs -- this shows a collection of graphs displaying the resulting vocabulary as a network graph. Words which are related to each other are connected with an arrow. The direction of the arrow indicates which word was the product of which seed word.

![Network graph](./networkGraph.png)

- Space embedding -- this shows an estimate of the spatial relationship between words in the final vocabulary at every time step. Please keep in mind that these spatial relations are approximate and should be considered with care.

![Space embedding graph](./embeddingGraph.png)

- Plain text vocabulary -- this shows a text representation of the concept search. This consists, for each time step, of the seed words used and the produced vocabulary.

## Saving and loading search parameters

When you click the *Save parameters* button, a text box with your search parameters will be displayed. Copy these parameters and save them somewhere. Click *Ok* to hide the text box.

When you click the *Load parameters* button, another text box will be displayed. Enter previously saved search parameters in this box and click *Ok* to load the parameters.
3 changes: 2 additions & 1 deletion shico/server/app.py
@@ -1,7 +1,7 @@
'''ShiCo server.
Usage:
server.py [-f FILES] [-n] [-d] [-p PORT] [-c FUNCTIONNAME]
app.py [-f FILES] [-n] [-d] [-p PORT] [-c FUNCTIONNAME]
-f FILES Path to word2vec model files (glob format is supported)
[default: word2vecModels/195[0-1]_????.w2v]
@@ -48,6 +48,7 @@ def trackWord(terms):
response.'''
params = app.config['trackParser'].parse_args()
termList = terms.split(',')
termList = [ term.strip() for term in termList ]
termList = [ term.lower() for term in termList ]
results, links = \
app.config['vm'].trackClouds(termList, maxTerms=params['maxTerms'],