This repository contains ipython notebooks for the evaluation of the e-mission platform. These notebooks re-use code from the e-mission-server codebase, so it needs to be included while running them.
-
Install the e-mission server, including setting it up https://github.com/e-mission/e-mission-server
-
Set the home environment variable
$ export EMISSION_SERVER_HOME=<path_to_emission_server_repo>
To verify, check the environment variables using
$ env
and ensure
ENV_SERVER_HOME
is present in the list and has the right path (as mentioned above). -
If you haven't setup before, set up the evaluation system
$ source setup.sh
-
If you have, activate
$ source activate.sh
-
Access the visualizations of interest and copy the config over. The
<eval_folder>
mentioned below can be any folder containing notebooks and/or .py files for visualisation or other purposes. E.g. :TRB_label_assist
is one such folder.
$ cd <eval_folder>
$ cp -r ../conf .
- Start the notebook server
$ ../bin/em-jupyter-notebook.sh
- To get the data for the notebooks to run on, look at the dataset listed at the top of the notebook, and request the data for research purposes using https://github.com/e-mission/e-mission-server/wiki/Requesting-data-as-a-collaborator
After completing analysis, tear down
$ source teardown.sh
Note that all notebooks checked in here are completely public. All results included in them can be viewed by anybody, even malicious users. Therefore, you need to split your analysis into two groups:
- aggregate only: results are not specific for a single user. The scripts in such notebooks should not include uuids, and should use the aggregate timeseries instead of the default timeseries.
- example: number of walking and biking trips over all users in the control group
- individual analyses: results are specific for a single user. The scripts in such notebooks can include uuids, and potentially even user emails or tokens.
- example: varation in walking and biking trips over time for user
uuid1
- example: varation in walking and biking trips over time for user
Notebooks that include aggregate analyses can be checked in with outputs included. This is because it is hard to tease out the contributions by individuals to the aggregate statistics, and so the chances of leaking information are low. However, notebooks that include individual analyses should be checked in after deleting all outputs (Kernel -> Restart and clear output).
Aggregate results | Individual results | |
---|---|---|
with outputs | Y | N |
after clearing outputs | Y | Y |