Study Comparison Support #91

Open
Luke-Sikina opened this issue Feb 22, 2022 · 15 comments

@Luke-Sikina
Member

Luke-Sikina commented Feb 22, 2022

Background:

The cBioPortal is an open-access, open-source resource for interactive exploration of multidimensional cancer genomics data sets, which are collected from a multitude of sources such as published research papers, publicly available data repositories, and private data sets. Please refer to the cBioPortal home page for an overview.

The public instance of the cBioPortal hosts hundreds of curated cancer genomics data sets sourced from public data repositories and published research articles. Many of these studies share data sources, sequencing platforms, gene panels, etc., and being able to compare two studies would be a powerful tool for users looking to understand how studies differ from one another.

Furthermore, a Cancer Study Comparison tool would be useful in other ways as well, such as for data curation and for comparing mutation and annotation tools used on the same set of data, among many other potential uses.


Goal:

Create an in-app tool that allows end users to compare two studies.

The tool should show differences in:

  • Samples / Patients
  • Gene panels
  • Molecular Profiles

Approach:

There should be a backend API that accepts a list of study IDs and returns a structured diff of the requested studies. The backend should be integrated into the existing cBioPortal codebase, and should have the route /api/study_comparison?study_ids=study_a,study_b.
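
As a rough illustration of what a "structured diff" could mean for a single category (say, sample IDs), the sketch below parses the comma-separated study_ids parameter and splits two ID collections into entries unique to each study and entries shared by both. All names here (StudyDiffSketch, SetDiff, parseStudyIds, diff) are hypothetical and not part of the cBioPortal codebase; the actual response shape is left to the proposal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch: none of these names exist in the cBioPortal codebase.
class StudyDiffSketch {

    /** Diff of one category (e.g. sample IDs) between two studies. */
    static final class SetDiff {
        final SortedSet<String> onlyInFirst;
        final SortedSet<String> onlyInSecond;
        final SortedSet<String> inBoth;

        SetDiff(SortedSet<String> onlyInFirst, SortedSet<String> onlyInSecond,
                SortedSet<String> inBoth) {
            this.onlyInFirst = onlyInFirst;
            this.onlyInSecond = onlyInSecond;
            this.inBoth = inBoth;
        }
    }

    /** Splits a value such as "study_a,study_b" into trimmed, non-empty IDs. */
    static List<String> parseStudyIds(String param) {
        List<String> ids = new ArrayList<>();
        for (String part : param.split(",")) {
            String id = part.trim();
            if (!id.isEmpty()) {
                ids.add(id);
            }
        }
        return ids;
    }

    /** Computes the two set differences and the intersection, sorted by ID. */
    static SetDiff diff(Collection<String> first, Collection<String> second) {
        SortedSet<String> onlyInFirst = new TreeSet<>(first);
        onlyInFirst.removeAll(second);
        SortedSet<String> onlyInSecond = new TreeSet<>(second);
        onlyInSecond.removeAll(first);
        SortedSet<String> inBoth = new TreeSet<>(first);
        inBoth.retainAll(second);
        return new SetDiff(onlyInFirst, onlyInSecond, inBoth);
    }

    public static void main(String[] args) {
        // Made-up sample IDs, for illustration only.
        SetDiff d = diff(Arrays.asList("s1", "s2"), Arrays.asList("s2", "s3"));
        System.out.println("study IDs:      " + parseStudyIds("study_a,study_b"));
        System.out.println("only in first:  " + d.onlyInFirst);
        System.out.println("only in second: " + d.onlyInSecond);
        System.out.println("in both:        " + d.inBoth);
    }
}
```

The same three-way split could be repeated per category (samples, patients, gene panels, molecular profiles) so that each one appears as its own section of the response.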

There should be a frontend that consumes that API and presents its output. The presentation of the information is up to you. You should try to find a way to categorize the various types of information, so that diffs of different data types don't blend together. Within each data type, you should reference how established diffing tools display their output when designing your UI.
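
One possible way to keep categories separate while borrowing from established diff tools is a unified-diff-style listing: a header per category, then "-" lines for entries found only in the first study and "+" lines for entries found only in the second. The sketch below shows the idea as plain text. DiffRenderSketch and its input shape are illustrative assumptions, and a real frontend would render this in the browser rather than as a string.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: the class name and input shape are assumptions.
class DiffRenderSketch {

    /**
     * Renders one block per category so that different data types never mix.
     * Both maps are keyed by category name (e.g. "Samples").
     */
    static String render(Map<String, List<String>> onlyInFirst,
                         Map<String, List<String>> onlyInSecond) {
        // Union of categories, keeping the order in which they were supplied.
        Set<String> categories = new LinkedHashSet<>(onlyInFirst.keySet());
        categories.addAll(onlyInSecond.keySet());

        StringBuilder out = new StringBuilder();
        for (String category : categories) {
            out.append("@@ ").append(category).append(" @@\n");
            for (String id : onlyInFirst.getOrDefault(category, Collections.emptyList())) {
                out.append("- ").append(id).append('\n');
            }
            for (String id : onlyInSecond.getOrDefault(category, Collections.emptyList())) {
                out.append("+ ").append(id).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up IDs, for illustration only.
        Map<String, List<String>> first = new LinkedHashMap<>();
        first.put("Samples", Arrays.asList("s1"));
        Map<String, List<String>> second = new LinkedHashMap<>();
        second.put("Samples", Arrays.asList("s3"));
        second.put("Gene panels", Arrays.asList("panel_x"));
        System.out.print(render(first, second));
    }
}
```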

Resources:
Here are some API endpoints that provide information for studies. You shouldn't use these directly, as we want the comparison done on the backend, but you might want to use their underlying service methods when making a comparison endpoint. Running these curls might also give you a better idea of what these different data objects look like.

  • Samples: curl -X GET "https://www.cbioportal.org/api/studies/acc_tcga/samples?direction=ASC&pageNumber=0&pageSize=10000000&projection=SUMMARY" -H "accept: application/json"
  • Patients: curl -X GET "https://www.cbioportal.org/api/studies/acc_tcga/patients?direction=ASC&pageNumber=0&pageSize=10000000&projection=SUMMARY" -H "accept: application/json"
  • Gene Panels: curl -X POST "https://www.cbioportal.org/api/gene-panel-data/fetch" -H "accept: application/json" -H "Content-Type: application/json" -d "{ \"molecularProfileIds\": [ \"acc_tcga_rppa\", \"acc_tcga_rna_seq_v2_mrna_median_Zscores\", \"acc_tcga_linear_CNA\" ]}"
  • Molecular Profiles: curl -X GET "https://www.cbioportal.org/api/studies/acc_tcga/molecular-profiles?direction=ASC&pageNumber=0&pageSize=10000000&projection=SUMMARY" -H "accept: application/json"

Codebase:
When building a REST endpoint in cBioPortal, you need to either add a controller method to an existing class or make a new controller class. You can look at some of our existing controllers here: https://github.com/cBioPortal/cbioportal/tree/master/web/src/main/java/org/cbioportal/web

In general, controllers call service methods. Service methods retrieve data from repository methods and process that data, returning the result to the controller. This means that in addition to making a new controller method, you should plan on adding a new service as well. You can find examples of service classes here: https://github.com/cBioPortal/cbioportal/tree/master/service/src/main/java/org/cbioportal/service/impl
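
The controller → service → repository layering described above can be sketched in plain Java as follows. This is only an illustration under stated assumptions: every name (SampleIdRepository, LayeredComparisonService, LayeredComparisonController) is hypothetical, the framework annotations and wiring that real cBioPortal controllers use are deliberately omitted so the sketch compiles on its own, and only sample IDs are compared.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Repository layer (hypothetical): fetches raw data for one study.
interface SampleIdRepository {
    List<String> findSampleIds(String studyId);
}

// Service layer (hypothetical): pulls data from the repository and processes it.
class LayeredComparisonService {
    private final SampleIdRepository repository;

    LayeredComparisonService(SampleIdRepository repository) {
        this.repository = repository;
    }

    Map<String, List<String>> compareSamples(String firstStudyId, String secondStudyId) {
        List<String> first = repository.findSampleIds(firstStudyId);
        List<String> second = repository.findSampleIds(secondStudyId);
        Map<String, List<String>> result = new LinkedHashMap<>();
        result.put("onlyInFirst", minus(first, second));
        result.put("onlyInSecond", minus(second, first));
        return result;
    }

    // Returns the elements of 'from' that do not occur in 'remove'.
    private static List<String> minus(List<String> from, List<String> remove) {
        Set<String> exclude = new HashSet<>(remove);
        List<String> kept = new ArrayList<>();
        for (String id : from) {
            if (!exclude.contains(id)) {
                kept.add(id);
            }
        }
        return kept;
    }
}

// Controller layer (hypothetical): validates the request, delegates to the service.
class LayeredComparisonController {
    private final LayeredComparisonService service;

    LayeredComparisonController(LayeredComparisonService service) {
        this.service = service;
    }

    // 'studyIds' stands in for the study_ids query parameter, e.g. "study_a,study_b".
    Map<String, List<String>> compare(String studyIds) {
        String[] ids = studyIds.split(",");
        if (ids.length != 2) {
            throw new IllegalArgumentException("Expected exactly two study IDs");
        }
        return service.compareSamples(ids[0].trim(), ids[1].trim());
    }

    public static void main(String[] args) {
        // In-memory stand-in for a database-backed repository, with made-up IDs.
        SampleIdRepository repo = studyId -> "study_a".equals(studyId)
                ? Arrays.asList("s1", "s2")
                : Arrays.asList("s2", "s3");
        LayeredComparisonController controller =
                new LayeredComparisonController(new LayeredComparisonService(repo));
        System.out.println(controller.compare("study_a,study_b"));
    }
}
```

Keeping the comparison logic in the service means the controller stays thin and the diff can be unit-tested against an in-memory repository, as the main method does here.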


Needed skills:

  • Java, JavaScript, SQL
  • General good programming skills and willingness to learn.

Possible mentors:

@Luke-Sikina

@ao508

ao508 commented Feb 25, 2022

@Luke-Sikina What do you mean by data source? Institution? Sequencing platform? Sample source? It's a bit unclear.

ao508 changed the title from "Study Comparison" to "Study Comparison Support" on Feb 28, 2022
ao508 removed the "help wanted" label on Feb 28, 2022
@devanshcache

Hi! I'm Devansh, a student and web developer intern. I have some experience in React.js and Java. I would like to work on this task. Is this task still available?

@abhijain2003

Hey! I am Abhi Jain, a frontend web developer. I can code in JavaScript and React.js. I would like to work on this issue. Can you explain it further?

@ao508

ao508 commented Mar 11, 2022

Hi @git-devansh! Thank you for reaching out :) I will make sure @Luke-Sikina follows up with you soon about applying.

@ao508

ao508 commented Mar 11, 2022

> Hey! I am Abhi Jain, a frontend web developer. I can code in JavaScript and React.js. I would like to work on this issue. Can you explain it further?

Hi @abhijain2003! Thank you for reaching out :) I will make sure @Luke-Sikina follows up with you soon about applying.

@abhijain2003

abhijain2003 commented Mar 12, 2022 via email

@devanshcache

> Hi @git-devansh! Thank you for reaching out :) I will make sure @Luke-Sikina follows up with you soon about applying.

Thank you! Looking forward to it.

@abhijain2003

abhijain2003 commented Mar 13, 2022 via email

@Luke-Sikina
Member Author

@git-devansh @abhijain2003 Yes, this issue is open and we are looking for applicants. If you have any questions, you can ask them here. I'm looking forward to reading your proposals!

@OmarAshraf1

Hi, I am Omar.
I am interested in applying to this project. Could I ask a question, please?
Are the Samples / Patients, Gene panels, and Molecular Profiles stored in a relational database?
Thanks.

@Luke-Sikina
Member Author

Luke-Sikina commented Mar 14, 2022

> Hi, I am Omar. I am interested in applying to this project. Could I ask a question, please? Are the Samples / Patients, Gene panels, and Molecular Profiles stored in a relational database? Thanks.

Hi Omar,

Great question. Yes, all data is stored in a MySQL 5.7 database. You can find the schema here: https://github.com/cBioPortal/cbioportal/blob/master/db-scripts/src/main/resources/cgds.sql
Within the schema, here are the tables for the respective objects:

  • Patients: patient
  • Samples: sample
  • Gene Panels: gene_panel
  • Molecular Profiles: genetic_profile

@OmarAshraf1

@OmarAshraf1

@Luke-Sikina
Thanks a lot, Luke.
That is great; I understand the project better now. I am looking forward to applying, and I hope to be a part of this.
Thanks.

@MayaSatishRao

Is this issue still open? I want to contribute and wanted to know whether this issue is open for GSoC 2023.

@harsh2929

@Luke-Sikina I'm a second-year bachelor's student in CSE and I would love to contribute to this issue. Please let me know if it's still open for contribution.

@RINO-GAELICO

Hi @Luke-Sikina, I am just touching base to see if there is still interest in this enhancement. I am interested in studying this issue and making a proposal based on it. But first I'd like to see if there is still movement around here.

Thank you


9 participants