👍🎉 First off, thanks for taking the time to contribute! 🎉👍
This is a guide for contributing to the CG package. Please check here first if you want to set up an environment and develop, open an issue, suggest an enhancement, open a pull request, etc.
Communicating around code can be a sensitive thing, so please do your best to keep a positive tone. Remember that people put a significant amount of work into a PR or a review; stay humble ⭐
CG uses the GitHub flow branching model as described in our development manual.
This section guides you through submitting a bug report to CG. Following these guidelines helps other developers and contributors understand your report :pencil:, reproduce the behavior :computer: :computer:, and find related issues and reports :mag_right:
Before creating a bug report, please search the issues (open and closed) to see whether the problem has already been described; there might be no reason to create a new one. When creating a bug report, please include as many details as possible.
Note: If you find a Closed issue that seems like it is the same thing that you're experiencing, open a new issue and include a link to the original issue in the body of your new one.
Bugs are tracked as GitHub issues.
Explain the problem and include additional details to help maintainers reproduce the problem:
- Use a clear and descriptive title for the issue to identify the problem.
- Describe the exact steps which reproduce the problem in as much detail as possible. For example, start by explaining where CG was run and how it was used, i.e. exactly which command you used in the terminal.
- Provide specific examples to demonstrate the steps. Include links to files or case IDs, or copy/pasteable snippets, which you use in those examples. If you're providing snippets in the issue, use Markdown code blocks.
- Describe the behavior you observed after following the steps and point out what exactly is the problem with that behavior.
- Explain which behavior you expected to see instead and why.
Provide more context by answering these questions:
- Can you reproduce the problem?
- Did the problem start happening recently (e.g. after updating to a new version of CG) or was this always a problem?
- If the problem started happening recently, can you reproduce the problem in
an older version of CG? What's the most recent version in which the problem
doesn't happen? You can test and run older versions of CG in the stage
environments by using the `update-cg-stage.sh` script.
Include details about your configuration and environment:
- Which version of CG are you using? You can get the exact version by
running `cg --version` in your terminal.
- What's the name of the environment you're using?
This section guides you through submitting an enhancement suggestion for CG, including completely new features and minor improvements to existing functionality. Following these guidelines helps maintainers and the community understand your suggestion 📝 and find related suggestions 🔎
Enhancement suggestions are tracked as GitHub issues. To suggest an enhancement, create an issue in the repository and provide the following information:
- Use a clear and descriptive title for the issue to identify the suggestion.
- Provide a step-by-step description of the suggested enhancement in as much detail as possible.
- Provide specific examples to demonstrate the steps. Include copy/pasteable snippets which you use in those examples, as Markdown code blocks.
- Describe the current behavior and explain which behavior you expected to see instead and why.
- Explain why this enhancement would be useful.
NEVER USE PREINSTALLED PYTHON
First of all, make sure that you manage the Python versions used on your machine yourself; never use the OS-native Python. Suggested ways to handle Python versions are Homebrew (macOS), pyenv, or conda.
For local development, it is recommended to use Poetry. Ensure that you have Poetry installed and run:
poetry install
On our servers, where the production and stage versions of CG are run, the packages are maintained using conda environments. For local development, follow the Python packaging guidelines and manage your local Python environment with Poetry.
The process described here has several goals:
- Maintain CG's quality
- Engage the developers in working toward the best possible CG
- Enable a sustainable system for CG's maintainers to review contributions
Please follow these steps to have your contribution considered by the maintainers:
- Follow all instructions in the template
- Follow the styleguides
- After you submit your pull request, verify that all status checks are passing
What if the status checks are failing?
If a status check is failing, and you believe that the failure is unrelated to your change, please leave a comment on the pull request explaining why you believe the failure is unrelated. A maintainer will re-run the status check for you.
- Update CHANGELOG.md with relevant information
While the prerequisites above must be satisfied prior to having your pull request reviewed, the reviewer(s) may ask you to complete additional design work, tests, or other changes before your pull request can be ultimately accepted.
- Use the present tense ("Add feature" not "Added feature")
- Limit the first line to 72 characters or less
- Reference issues and pull requests liberally after the first line
We use black to format all files. This is done automatically with each push on
GitHub, so don't forget to update your local branch with `git pull` after
pushing to the origin. More details are described in the general development manual.
This package is a little special. Essentially it should include all the "Clinical"-specific code that has to be integrated across multiple tools such as LIMS, Trailblazer, Scout etc. However, we still aim to structure it in such a way as to make maintenance as smooth as possible!
This part of the package contains connectors to the various tools that we
integrate with. An app interface can be a wrapper for an external tool like
Trailblazer (`tb`) or be implemented completely in `cg`, like `lims`. It's very
important that the code stays confined to each individual tool. The Housekeeper
connector cannot talk directly to Trailblazer, for example; such
communication has to go through a `meta` module.
We also try to group all app-related imports and functionality in these interfaces. You shouldn't import e.g. a function from Scout from any other place than its app interface. This way, it's easier to see whether an update to an external package will affect the rest of the system.
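The separation described above can be sketched as follows. This is only an illustration under stated assumptions: the class and method names (`HousekeeperAPI.files_for_case`, `TrailblazerAPI.analysis_status`, `CleanupMeta`) are hypothetical stand-ins and not the package's actual interfaces. The point is that the two app interfaces never reference each other, and only the meta class combines them.

```python
from dataclasses import dataclass, field


@dataclass
class HousekeeperAPI:
    """Stand-in for the Housekeeper app interface (hypothetical methods)."""

    files: dict[str, list[str]] = field(default_factory=dict)

    def files_for_case(self, case_id: str) -> list[str]:
        return self.files.get(case_id, [])


@dataclass
class TrailblazerAPI:
    """Stand-in for the Trailblazer app interface (hypothetical methods)."""

    statuses: dict[str, str] = field(default_factory=dict)

    def analysis_status(self, case_id: str) -> str:
        return self.statuses.get(case_id, "unknown")


class CleanupMeta:
    """Hypothetical meta module: the only place where both apps meet."""

    def __init__(self, hk: HousekeeperAPI, tb: TrailblazerAPI):
        self.hk = hk
        self.tb = tb

    def files_safe_to_archive(self, case_id: str) -> list[str]:
        # Ask Trailblazer for the status first, then Housekeeper for the files.
        if self.tb.analysis_status(case_id) != "completed":
            return []
        return self.hk.files_for_case(case_id)
```

Because neither app class imports the other, either one can be swapped or updated without touching the other; only the meta class needs to change.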
Interface to Chanjo. It is used to load coverage information from Sambamba output.
Internal app for working with invoices of groups of samples or pools.
Internal app for interfacing with the Clarity LIMS API. We use the `genologics`
Python API as much as possible. Some actions are not supported, however, and
then we fall back to using the official XML-based REST API directly.
We convert all the info that we get from LIMS/`genologics` to dictionaries
before passing it along to other tools. We don't pass around objects that have
some implicit connection to update things in LIMS; such actions need to go
through the `lims` app interface explicitly.
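A minimal sketch of the conversion idea, assuming a simplified sample object. `FakeLimsSample` is a stand-in and not the real `genologics` class, and the UDF name `"Received at"` is a made-up example. The function copies plain values into a dictionary so that nothing downstream holds an object that could write back to LIMS.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class FakeLimsSample:
    """Stand-in for a LIMS sample object (not the real genologics class)."""

    id: str
    name: str
    udf: dict[str, Any]

    def put(self) -> None:
        # The real object could update LIMS here; we never want to leak that.
        raise RuntimeError("would write to LIMS")


def sample_to_dict(sample: FakeLimsSample) -> dict[str, Any]:
    """Copy plain values out of the LIMS object into a dictionary."""
    return {
        "id": sample.id,
        "name": sample.name,
        "received": sample.udf.get("Received at"),  # hypothetical UDF name
    }
```

The returned dictionary has no `put` method or other link to LIMS, so any update has to be requested explicitly through the `lims` app interface.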
Interface to Trailblazer.
- Monitor analysis workflow status
Interface to Genotype. Used for uploading genotype results from the workflow so that we can compare them and validate that we are clear of sample mix-ups.
Interface to Housekeeper. For storing files from analysis runs and FASTQ files from demultiplexing.
Interface to LoqusDB. For loading observation counts from the analysis output.
Internal app for opening tickets in SupportSystems. We use this mainly to link a ticket with the opening of an order for new samples/analyses.
Interface to Scout. For uploading analysis results to Scout. It's also used to access the generation of gene panel files used in the analysis workflow.
Module to generate Delivery Reports. This module is designed to convey the results of genetic analysis to the customer. It includes information on sample characteristics, laboratory preparation, sequencing attributes, as well as data analysis performance and limitations.
The command line code is written in the Click framework.
This set of commands lets you quite easily add things to the status database. For example, when a new customer is signed, you could run:
cg add customer cust101 "Massachusetts Institute of Technology"
You can also accomplish similar tasks through the admin interface of the REST server.
Includes: `status`, `lims`
Some info is primarily stored in LIMS and needs to be synchronized over to
`status`. This is the case for both the date when a sample was received and
when it was finally delivered. This interface is intended to run continuously
as part of a crontab job.
cg transfer lims --status received
And similarly for filling in the delivery date:
cg transfer lims --status delivered
Includes: `stats`, `hk`, `status`
The API accepts the name of a flow cell which will be looked up in `stats`. For
all samples on the flow cell it will:
- Check if the quality (Q30) is good enough to include the sequencing results
- Update the number of reads that the sample has been sequenced with overall and match this with the requirement given by the application
- Accordingly, look up FASTQ files and store them using `hk`
- If a sample has a sufficient number of reads, fill in the `sequenced_at` date (`status`) according to the sequencing date of the most recent flow cell
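The per-sample logic in the steps above could look roughly like this. It is a sketch under assumptions, not the package's implementation: the `Sample` fields, the function name, and the Q30 cut-off of 75 are all invented for illustration, and the FASTQ storage step is left out.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

Q30_THRESHOLD = 75.0  # assumed cut-off, purely illustrative


@dataclass
class Sample:
    """Simplified sample record (hypothetical fields)."""

    name: str
    expected_reads: int  # requirement given by the application
    reads: int = 0
    sequenced_at: Optional[date] = None


def register_flow_cell_result(
    sample: Sample, flow_cell_reads: int, q30: float, sequenced_on: date
) -> bool:
    """Return True if the flow cell result was included for the sample."""
    if q30 < Q30_THRESHOLD:
        # Quality too low: ignore this flow cell for the sample.
        return False
    sample.reads += flow_cell_reads
    if sample.reads >= sample.expected_reads:
        # Enough reads: record the date of the most recent flow cell.
        if sample.sequenced_at is None or sequenced_on > sample.sequenced_at:
            sample.sequenced_at = sequenced_on
    return True
```

A sample that needs 100 reads and gets 60 from a first flow cell keeps `sequenced_at` empty; once a second, good-quality flow cell pushes the total past 100, the date of that flow cell is filled in.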
This is the interface that bridges various apps. An example is the "orders"
module. When placing orders we need to coordinate information/actions between
the apps `lims`, `status`, and `osticket`. It also provides some additional
functionality such as setting up the basis for the orders API and defining
which fields are required for different order types.
Includes: `lims`, `status`, `osticket`
The API exposes a single endpoint for submitting a batch of new samples/external samples for analysis. It handles a mix of updates to existing samples with entirely new ones. The API is designed to work well in a REST API context.
The interface supports:
- samples for sequencing + analysis (scout)
- samples for sequencing only (fastq)
- sequencing of ready-made libraries (rml)
- analysis of externally sequenced samples (external)
- sequencing and analysis of microbial whole genomes (microbial)
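One way to model the order types listed above is an enum with a table of required fields per type. This is a hedged sketch: the enum values come from the list, but the field names in `REQUIRED_FIELDS` and the `missing_fields` helper are hypothetical and do not reflect the actual order schema.

```python
from enum import Enum


class OrderType(str, Enum):
    SCOUT = "scout"
    FASTQ = "fastq"
    RML = "rml"
    EXTERNAL = "external"
    MICROBIAL = "microbial"


# Hypothetical required fields per order type, for illustration only.
REQUIRED_FIELDS: dict[OrderType, set[str]] = {
    OrderType.SCOUT: {"name", "application", "family_name"},
    OrderType.FASTQ: {"name", "application"},
    OrderType.RML: {"name", "application", "pool"},
    OrderType.EXTERNAL: {"name", "application", "family_name"},
    OrderType.MICROBIAL: {"name", "application", "organism"},
}


def missing_fields(order_type: str, sample: dict) -> set[str]:
    """Return the required fields that a sample in an order lacks."""
    # OrderType(...) raises ValueError for an unknown order type.
    required = REQUIRED_FIELDS[OrderType(order_type)]
    return required - set(sample)
```

For example, `missing_fields("fastq", {"name": "s1"})` reports that `application` is missing, while an unknown order type is rejected before any field check happens.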
It opens a ticket using the `osticket` API for each order which it links with
the ticket number. It stores information in both LIMS and `status` for samples
and pools linked by LIMS id. It stores only a minimum of information about each
sample in LIMS. Most of the critical information is stored in `status` and this
is also the primary place to go if we need to update e.g. application tag for a
sample.
Includes: `status`, `hk`, `coverage`
The API will fetch information about an analysis like name and case ID and
related samples from `status`. It will get the Sambamba output from `hk` and
use the `coverage` app interface to upload the data to Chanjo for each
individual sample.
Includes: `status`, `hk`, `tb`, `gt`
Given an analysis, the API will fetch information about the family. It will
fetch the gBCF + qcmetrics files from `hk`. It will parse the qcmetrics file
using `tb` to find out the predicted sex of each sample. It will then upload
the results to Genotype. Subsequent upload of the same samples will overwrite
existing information while logging the event.
Includes: `status`, `hk`, `loqus`
Given an analysis record, it will fetch required files from `hk` and upload the
variants to the observation count database using `loqus`. This only works for
cases with at least one affected individual.
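The restriction to cases with at least one affected individual can be expressed as a small guard before the upload. The sketch below assumes samples are plain dictionaries with a `status` key whose value may be `"affected"`; both the key and the function name are assumptions for illustration.

```python
def has_affected_sample(samples: list[dict]) -> bool:
    """Return True if at least one sample in the case is marked as affected."""
    # The "status" key and the "affected" value are assumed for this sketch.
    return any(sample.get("status") == "affected" for sample in samples)
```

An upload routine could call this first and skip the case, or raise an error, when it returns `False`.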
Includes: `status`, `hk`, `scoutapi`
Given the analysis record it will generate a Scout config file using
information from `status`. It will use the "meta/analysis" API to convert the
default panels for the family to the corresponding set of panels used to run
the analysis. It will fetch all related VCF files and others from `hk`. Finally
it will use the config to upload the results to Scout. The `scoutapi` interface
will figure out if there's an existing analysis that needs to be replaced.
Includes: `status`, `lims`
The creation of an invoice is initiated in LIMS. You then use the process ID to parse out which samples should be part of the invoice. This process is set up to be run when a button is clicked in LIMS but can be run externally from any server with access to LIMS.
The invoice itself will be stored in an intermediate state in `status` while
links will be created in LIMS to be able to keep track of which samples were
part of which invoice.
The REST API server handles a number of actions. It's written in Flask and exposes an admin interface for quickly editing information in the backend MySQL database. The admin interface is served under a hidden route but the plan is to move it to Google OAuth.
The API is protected by JSON Web Tokens generated by Google OAuth. It authorizes access using the user table in the database.
The `/order/<type>` endpoint accepts orders for new samples. If you supply a
JSON document in the expected format, a new order is opened in `status` and
LIMS.
This really is the `status` app more or less. It's the interface to the central
database that keeps track of samples and in which state they are currently.
All records that enter the database go through this API. Simple updates to
properties on records are handled directly on the model instances followed by a
manual commit.
There's one file for storing all constants like how priority levels are translated between the database representation and the human readable equivalent.
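As an illustration of such a constants module, the sketch below maps priority names to integers and back. The specific names and numbers are assumptions made for the example and may not match the values actually stored in the database.

```python
# Assumed mapping between human readable priority and database value.
PRIORITY_MAP = {"research": 0, "standard": 1, "priority": 2, "express": 3}
REV_PRIORITY_MAP = {value: key for key, value in PRIORITY_MAP.items()}


def priority_to_human(priority: int) -> str:
    """Translate the database representation to the human readable name."""
    return REV_PRIORITY_MAP[priority]


def priority_to_db(name: str) -> int:
    """Translate the human readable name to the database representation."""
    return PRIORITY_MAP[name]
```

Keeping both directions derived from one dictionary means a new priority level only has to be added in a single place.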
Another module, `/exc.py`, contains the custom Exception classes that are used
across the package.
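A package-wide exception module typically follows the pattern below: one base class that callers can catch, plus narrower subclasses. The class names here (`CgError`, `OrderError`, `LimsDataError`) and the `submit_order` helper are illustrative assumptions; check the module itself for the classes that really exist.

```python
class CgError(Exception):
    """Hypothetical base class for all errors raised by the package."""

    def __init__(self, message: str):
        super().__init__(message)
        self.message = message


class OrderError(CgError):
    """Hypothetical error for problems with a submitted order."""


class LimsDataError(CgError):
    """Hypothetical error for unexpected or missing data in LIMS."""


def submit_order(samples: list[dict]) -> int:
    """Toy function showing how a custom exception might be raised."""
    if not samples:
        raise OrderError("an order must contain at least one sample")
    return len(samples)
```

Callers that do not care about the exact failure can catch the base class, while code that needs to react to a specific problem catches the subclass.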