Tanagra is a project to build a configurable cohort builder and data explorer. Our goal is to make it easy to set up a new dataset for exploring with little or no custom code required, so everything we've built is configuration-driven.
The project has three main pieces: indexer, service, UI. All three pieces are highly interconnected and are not intended to be used or deployed separately. Everything lives in this single GitHub repository.
The indexer takes the source dataset and produces a logical copy that's better suited to the types of queries the UI needs to run. It denormalizes some data, precomputes some derived values, and reorganizes tables. The goal is not to meet a particular query benchmark, only to keep UI queries from timing out.
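As one illustration of what "precomputing" can mean for a cohort builder, here is a minimal sketch of two related steps: flattening a parent-child hierarchy (for example, condition codes) into ancestor-descendant pairs, then rolling up per-code person counts so that selecting a parent code also counts persons who only have a descendant code. This is an assumption for illustration, not a description of Tanagra's actual indexing jobs; the function names and data shapes are hypothetical, and a real indexer works over tables in the dataset rather than in-memory Python collections.

```python
from collections import defaultdict


def flatten_hierarchy(child_parent):
    """Expand (child, parent) edges into the full set of (ancestor, descendant) pairs.

    Hypothetical helper for illustration; assumes the hierarchy is a DAG.
    """
    parents = defaultdict(set)
    for child, parent in child_parent:
        parents[child].add(parent)

    pairs = set()
    for node in list(parents):
        # Walk upward from each node, recording every ancestor reached.
        stack = list(parents[node])
        seen = set()
        while stack:
            ancestor = stack.pop()
            if ancestor in seen:
                continue
            seen.add(ancestor)
            pairs.add((ancestor, node))
            stack.extend(parents.get(ancestor, ()))
    return pairs


def rollup_counts(pairs, occurrences):
    """Count distinct persons per code, including persons who only have a descendant code."""
    persons = defaultdict(set)
    for person, code in occurrences:
        persons[code].add(person)

    # Start from each code's own persons, then fold in every descendant's persons.
    rolled = {code: set(members) for code, members in persons.items()}
    for ancestor, descendant in pairs:
        rolled.setdefault(ancestor, set()).update(persons.get(descendant, set()))
    return {code: len(members) for code, members in rolled.items()}


edges = [("b", "a"), ("c", "b"), ("d", "b")]  # (child, parent)
pairs = flatten_hierarchy(edges)
counts = rollup_counts(pairs, [("p1", "c"), ("p2", "d"), ("p1", "d")])
print(sorted(pairs))
print(counts)
```

Doing this once at index time means the UI can answer "how many persons have this code or any code under it" with a simple lookup instead of a recursive query.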
The service processes queries for the UI and manages the application database, which stores user-managed artifacts like cohorts and data feature sets.
The UI includes the cohort builder, data feature set builder, export, and cohort review interfaces.
Tanagra supports data patterns, rather than specific SQL schemas. Check the list of currently supported patterns to see how they map to your dataset.
Tanagra defines a custom object model on top of the underlying relational data. The dataset configuration language is based on this object model, so it's helpful to be familiar with the main concepts.
A dataset configuration is spread across multiple files, to improve readability and allow easier sharing across datasets. See an overview of the different files and directory structure, as well as pointers to example files. Check the full dataset configuration schema documentation to look up specific properties. Documentation for the protocol buffers used by visualization and criteria plugins is also available.
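To show why splitting configuration across files helps with sharing, here is a minimal sketch in which two hypothetical dataset configs both reference one shared entity file, and a loader assembles each into a single object. The file names, the `name` and `entityFiles` keys, and the `load_dataset_config` helper are all assumptions for illustration; the configuration files overview and schema documentation describe Tanagra's actual layout and properties. Reading goes through a `read_file` callable so the example runs without touching disk.

```python
import json
from typing import Callable, Dict


def load_dataset_config(top_level: str, read_file: Callable[[str], str]) -> Dict:
    """Assemble one dataset config from a top-level file and the entity files it lists.

    Hypothetical layout: the top-level JSON has a "name" and a list of "entityFiles".
    """
    root = json.loads(read_file(top_level))
    entities = {}
    for path in root.get("entityFiles", []):
        entity = json.loads(read_file(path))
        entities[entity["name"]] = entity
    return {"name": root["name"], "entities": entities}


# In-memory stand-in for a config directory; the shared file is reused by both datasets.
files = {
    "dataset_a/dataset.json": '{"name": "dataset_a", "entityFiles": ["shared/person.json"]}',
    "dataset_b/dataset.json": '{"name": "dataset_b", "entityFiles": ["shared/person.json"]}',
    "shared/person.json": '{"name": "person", "attributes": ["id", "age"]}',
}

config_a = load_dataset_config("dataset_a/dataset.json", files.__getitem__)
config_b = load_dataset_config("dataset_b/dataset.json", files.__getitem__)
print(config_a["entities"] == config_b["entities"])
```

Editing `shared/person.json` once would change the entity definition for both datasets.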
Choose a deployment pattern and configure the GCP project(s).
Once you've defined the configuration files for a dataset, run the indexer. Check the full indexer CLI documentation to look up specific commands.
Tanagra does not provide an API for managing access control for a population of users. Instead, we provide an interface for calling an external access control service. For example, the VUMC admin service serves as the external access control service for the SD deployment. Either reuse an existing access control implementation or add your own.
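A minimal sketch of the pluggable shape this implies, written in Python for brevity. The `AccessControl` protocol, the `is_allowed` method, and both implementations are hypothetical; Tanagra's own interface lives in its Java service code, and its names and signatures may differ. The point is that application code depends only on the interface, so a deployment can plug in a client for its own external service without changing the rest of the code.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class AccessControl(Protocol):
    """Hypothetical interface the application calls before acting on a resource."""

    def is_allowed(self, user_id: str, action: str, resource_id: str) -> bool: ...


@dataclass
class ExternalServiceAccessControl:
    """Delegates every decision to an external access control service."""

    # Stand-in for an HTTP client call to the external service.
    check_remote: Callable[[str, str, str], bool]

    def is_allowed(self, user_id: str, action: str, resource_id: str) -> bool:
        return self.check_remote(user_id, action, resource_id)


class OpenAccessControl:
    """Allows everything; only suitable for local development."""

    def is_allowed(self, user_id: str, action: str, resource_id: str) -> bool:
        return True


def can_read_cohort(acl: AccessControl, user_id: str, cohort_id: str) -> bool:
    # Application code depends only on the interface, not on a specific implementation.
    return acl.is_allowed(user_id, "READ", cohort_id)


# Fake remote service: only "alice" may read anything.
fake_service = lambda user, action, resource: user == "alice"
acl = ExternalServiceAccessControl(fake_service)
print(can_read_cohort(acl, "alice", "cohort-1"), can_read_cohort(acl, "bob", "cohort-1"))
```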
We expect different deployments to need different ways of exporting data. Either reuse an existing export implementation or add your own.
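One way to structure "reuse or add your own" is a registry of named export implementations. The sketch below is hypothetical: the `register` decorator, the `"csv"` name, `run_export`, and the row format are assumptions, not Tanagra's export API. It only illustrates adding a new exporter without modifying existing ones.

```python
import csv
import io
from typing import Callable, Dict, List

Exporter = Callable[[List[dict]], str]
EXPORTERS: Dict[str, Exporter] = {}


def register(name: str) -> Callable[[Exporter], Exporter]:
    """Decorator that makes an export implementation available under a name."""

    def wrap(fn: Exporter) -> Exporter:
        EXPORTERS[name] = fn
        return fn

    return wrap


@register("csv")
def export_csv(rows: List[dict]) -> str:
    """Write rows to CSV text, using the first row's keys as the header."""
    if not rows:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def run_export(name: str, rows: List[dict]) -> str:
    # Look up the requested implementation; fail loudly for unknown names.
    if name not in EXPORTERS:
        raise ValueError(f"no export implementation named {name!r}")
    return EXPORTERS[name](rows)


print(run_export("csv", [{"person_id": 1, "age": 40}, {"person_id": 2, "age": 71}]))
```

A deployment-specific exporter (for example, one that writes to a cloud bucket) would be one more decorated function, leaving the existing ones untouched.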
Check the full application configuration documentation to look up specific deployment properties.
Once your deployment is up and running, create a regression test suite to detect unexpected changes caused by config or underlying data changes, and run it regularly.
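A minimal sketch of the core check such a suite might run, assuming you record the count for a set of named cohort definitions as a baseline and re-run the same definitions after each config, data, or version change. The cohort names, counts, and the `find_regressions` helper are hypothetical; how you compute the current counts depends on your deployment and is left out here.

```python
from typing import Dict, List, Optional


def find_regressions(baseline: Dict[str, int], current: Dict[str, int]) -> List[str]:
    """Report every cohort whose count changed, appeared, or disappeared since the baseline."""
    problems = []
    for name in sorted(baseline.keys() | current.keys()):
        expected: Optional[int] = baseline.get(name)
        actual: Optional[int] = current.get(name)
        if expected != actual:
            problems.append(f"{name}: expected {expected}, got {actual}")
    return problems


# Hypothetical baseline saved from an earlier run, and counts from today's run.
baseline = {"age_over_65": 300, "type_2_diabetes": 120}
current = {"age_over_65": 295, "type_2_diabetes": 120, "new_cohort": 10}
for line in find_regressions(baseline, current):
    print(line)
```

Exact comparison is the simplest choice; if the underlying data is refreshed regularly, you might instead allow a small tolerance or re-baseline after each expected refresh.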
Tanagra supports multiple deployments, each with its own release cadence. See more details about the codebase versioning and release process, and how you can manage the version for a specific deployment.
When you're planning to bump a deployment to a newer version of this codebase, use this tool to diff the two release tags.
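For a quick look at what changed without the tool, plain git can also compare two tags. This is a minimal sketch, not the linked tool: the tag names are placeholders, and it assumes a local clone with both tags fetched. `git log --oneline A..B` lists commits reachable from B but not from A, and `git diff --stat A B` summarizes the files changed between them.

```python
import subprocess
from typing import List


def release_diff_commands(old_tag: str, new_tag: str) -> List[List[str]]:
    """Build git commands that summarize changes between two release tags."""
    return [
        ["git", "log", "--oneline", f"{old_tag}..{new_tag}"],  # commits added since old_tag
        ["git", "diff", "--stat", old_tag, new_tag],  # files changed between the tags
    ]


def run_release_diff(old_tag: str, new_tag: str) -> None:
    # Must be run inside a clone of the repository with both tags fetched.
    for cmd in release_diff_commands(old_tag, new_tag):
        print("$", " ".join(cmd))
        print(subprocess.run(cmd, capture_output=True, text=True, check=True).stdout)


if __name__ == "__main__":
    # Placeholder tag names; substitute the deployment's current and target tags.
    for cmd in release_diff_commands("OLD_TAG", "NEW_TAG"):
        print(" ".join(cmd))
```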
Check the guidelines for developers, including instructions for getting things running locally on your machine.
See an overview of the codebase structure, and information specifically about the UI.
These documents are all linked in the sections above; they are collected here as a list for when you already know what you're looking for.
- Project overview
- Configure a new dataset
- Set up a new deployment
  - Deployment Overview
  - Indexing
  - Indexer CLI Manpage
  - Access Control
  - VUMC Admin Service
  - Data Export
  - Deployment Config Properties
  - Regression Testing
- Manage releases
- Contribute to the codebase