cBioPortal Data Collection Automation #84

Open
inodb opened this issue Jan 25, 2018 · 7 comments

Comments

@inodb
Member

inodb commented Jan 25, 2018

Background:

The cBioPortal is an open-access, open-source resource for interactive exploration of multidimensional cancer genomics data sets, which are collected from a multitude of sources such as published research papers, publicly available data repositories, and private data sets. Please refer to the cBioPortal home page for an overview.

Whenever data submissions come from external sources, a lot of manual curation is needed to make sure the data is imported smoothly and rendered correctly in the cBioPortal. We would like to automate parts of this curation process, in part through our datahub, a data repository that stores all cancer study data currently available in the cBioPortal.

Currently, whenever a Pull Request is made to datahub, the data undergoes a series of validation steps run by our data validation tool. However, to ensure that the data looks and renders as expected in the cBioPortal, one must manually import the data into a live instance of the portal. Automating this step in particular will be hugely beneficial to the QC process and greatly improve the turnaround time from data submission to import and visualization in the cBioPortal.


Goal:

Streamline and improve the turnaround time and review process for cancer study data submissions by automating the import of validated data files into a live instance of the cBioPortal.

Approach:

One option for spinning up review apps is Heroku, which we already use for reviewing changes to the cBioPortal backend.

Another option might be a GitHub Action for AWS Lightsail.

Both platforms support Docker Compose, for which configuration files already exist.
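
As a rough illustration of the first step such automation would need, the sketch below works out which study directories a pull request touches and lists a validate-then-import plan for each. It assumes changed files follow a `public/<study_id>/<file>` layout; the function names `changed_studies` and `review_plan`, the step labels, and the example study IDs are all hypothetical, and the actual validation and import commands are deliberately left out.

```python
from pathlib import PurePosixPath


def changed_studies(changed_files, root="public"):
    """Return the sorted study directories touched by a pull request.

    Assumes (hypothetically) that study files live at <root>/<study_id>/<file>.
    """
    studies = set()
    for name in changed_files:
        parts = PurePosixPath(name).parts
        # Only paths at least three levels deep under <root> belong to a study.
        if len(parts) >= 3 and parts[0] == root:
            studies.add(f"{root}/{parts[1]}")
    return sorted(studies)


def review_plan(changed_files, root="public"):
    """List (step, study_dir) pairs in the order a review job would run them."""
    plan = []
    for study in changed_studies(changed_files, root):
        plan.append(("validate", study))  # run the data validation tool first
        plan.append(("import", study))    # then load into the review instance
    return plan


if __name__ == "__main__":
    # Hypothetical list of files changed in a pull request.
    files = [
        "public/brca_example/meta_study.txt",
        "public/brca_example/data_clinical_sample.txt",
        "README.md",
    ]
    for step, study in review_plan(files):
        print(step, study)
```

A CI job could feed the list of changed files into `review_plan` and map each step onto whatever commands the review instance actually uses.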


Needed skills:

  • General problem solving skills.
  • Some basic knowledge of *nix, Bash and DevOps would be useful, but can be learned during the project.

Possible mentors:
@inodb

@inodb inodb self-assigned this Jan 25, 2018
@css911

css911 commented Feb 28, 2019

Hello! It's Chetan. The idea is quite interesting, and I would like to work on it. What task should I start with?

@ao508 ao508 transferred this issue from cBioPortal/GSoC Jan 24, 2020
@inodb
Member Author

inodb commented Aug 10, 2020

@ao508 I noticed this was transferred from GSoC. If we are not working on it, maybe we can transfer it back?

@ao508

ao508 commented Aug 10, 2020

@inodb that's okay with me

@inodb inodb transferred this issue from cBioPortal/datahub Aug 10, 2020
@inodb inodb added the GSoC-2021 GSoC 2021 Candidate Projects label Nov 16, 2020
@cBioPortal cBioPortal deleted a comment from pieterlukasse Jan 25, 2021
@inodb inodb added GSoC-2022 GSoC 2022 Candidate Projects devops Size: Medium (175h) and removed GSoC-2021 GSoC 2021 Candidate Projects labels Feb 17, 2022
@inodb inodb removed their assignment Feb 22, 2022
@cBioPortal cBioPortal deleted a comment from stale bot Feb 24, 2022
@daniocionini

Very interesting idea. I would like to have a go at it. Where is the open source code to start from?

@jagnathan

The source code is available on GitHub: https://github.com/cBioPortal

@jagnathan jagnathan reopened this Apr 14, 2022
@inodb inodb added GSoC-2023 GSoC 2023 Candidate Projects and removed GSoC-2022 GSoC 2022 Candidate Projects labels Jan 25, 2023
@devharsh2k4

Hey, I am interested in this project. Can you guide me further, @inodb?

@muskan-k

muskan-k commented Feb 25, 2023

Hi @inodb! I'm Muskan Kothari, currently a CSE senior at PES University, India. I'm here to contribute to this project through GSoC '23. I studied biology before starting my undergraduate degree in CSE, and I'm highly interested in applying CSE to interdisciplinary domains. I have worked on multiple projects applying computer science fundamentals to biology (measures of lexical diversity and Alzheimer's detection) and physics (tree-based models for the critical temperature of superconductors).

I also have experience with big data and DevOps technologies such as Docker and Kubernetes (converting a monolithic application to microservices), and PySpark and Hadoop (sentiment analysis of Twitter).

I am proficient in programming languages like Python, C++ and Java, and comfortable using Git.

I found the cBioPortal organization a perfect mix of my interests in interdisciplinary projects and my skills in various technologies that particularly help this project - cBioPortal Data Collection Automation. I'd love to learn and contribute to this project.

I understand that working on some issues would strengthen my application, and I will also spend time getting to know the organization. I'd like to get started with my proposal. I've joined the Slack as well.

Could we perhaps set up a discussion call? Could you tell me what technologies would be involved under DevOps?

Thanks!
Muskan
