[r-client] cBioPortalData: Example Bioconductor Workflow #83

Open
LiNk-NY opened this issue Feb 4, 2020 · 17 comments

@LiNk-NY

LiNk-NY commented Feb 4, 2020

Background:

The cBioPortal R client opens up cancer sequencing data hosted on the cBioPortal for Cancer Genomics to alternative analysis platforms such as Bioconductor, an open-source software project for bioinformatics built on R.

Bioconductor provides many workflows for demonstrating use-cases for particular packages, analyses, visualizations, and technologies including (but not limited to):

All of the available Bioconductor workflows may be found here: http://bioconductor.org/packages/release/BiocViews.html#___Workflow

The cBioPortal provides a REST API for programmatic access to the data and leverages this same service to generate the visualizations and reports seen throughout the site. Although the types of visualizations and reports already available and provided by the cBioPortal are extensive, one may require additional customization options for their specific needs that cannot yet be done through the cBioPortal itself. Connecting to the API directly allows anyone to build their own custom visualizations and reports to suit their needs.

Users may access the REST API through command line tools, such as curl, or through API clients. The cBioPortal team has made two such API clients available: one written in R and another written in Python. More information on these API clients and how to access and use them can be found here.
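For readers who have not called the API directly, here is a minimal sketch in Python of what such a request could look like. It assumes the public instance at https://www.cbioportal.org/api and its `studies` endpoint with a `projection` query parameter; the helper names (`build_url`, `fetch_json`, `study_ids`, `list_studies`) are illustrative only and are not part of either official client.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the public cBioPortal instance; a private instance would use its own base URL.
BASE_URL = "https://www.cbioportal.org/api"


def build_url(path, **params):
    """Join the API base URL, an endpoint path, and optional query parameters."""
    url = f"{BASE_URL}/{path.lstrip('/')}"
    # Drop parameters that were left unset so they do not appear in the query string.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{url}?{query}" if query else url


def fetch_json(url, timeout=30):
    """Send a GET request and decode the JSON response body."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


def study_ids(studies):
    """Extract the sorted study identifiers from a list of study records."""
    return sorted(study["studyId"] for study in studies)


def list_studies():
    """Fetch all studies (requires network access) and return their identifiers."""
    return study_ids(fetch_json(build_url("studies", projection="SUMMARY")))
```

Calling `list_studies()` performs the network request; `build_url` and `study_ids` can be used and checked offline. The R client wraps the same service, so this is only meant to show what the clients do under the hood.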

R is one of the leading programming languages in data science. As such, building an example Bioconductor workflow demonstrating the use of the cBioPortalData R client will greatly benefit the cancer research community as a whole by making analysis and visualization of cancer sequencing data even more accessible.


Goal:
To create an example Bioconductor workflow and IPython notebook demonstrating the use of the cBioPortalData R client and a general Bioconductor approach to data analysis. To write supporting functions for visualizing and parsing metadata from the cBioPortalData endpoints, as provided in the MultiAssayExperiment object obtained from cBioPortalData.

Approach:

  • Provide a template workflow using cBioPortalData à la Bioconductor Workflows (package)
  • Implement exploratory visualizations using MultiAssayExperiment (e.g., from trackViewer)
  • Incorporate metadata from cBioPortalData effectively using Bioconductor data classes

Needed skills:

  • Some basic knowledge of working with web services
  • R (analysis and pkg dev), Bioconductor

Possible mentors:
@LiNk-NY @lwaldron

@inodb added the GSoC-2020 (GSoC 2020 Candidate Projects) label on Feb 5, 2020
@banerjeeshayantan

I work in the area of cancer informatics and develop machine learning models to distinguish between driver and passenger mutations. This link contains more details about my work. I have used R/Bioconductor extensively for my research. Can I take up this project? I am aware that the GSoC application period is over, but I want to contribute anyway.

@alisman

alisman commented Apr 14, 2020 via email

@lwaldron

Dear @banerjeeshayantan, thanks for your interest! It would be great to have you take up this project. We can start a project and define some concrete issues at https://github.com/waldronlab/cBioPortalData/projects. Would be happy to set up a call to meet and discuss.

@lwaldron

Update: I have created a long list of potential TODO items ranging from relatively quick to potentially hard at https://github.com/waldronlab/cBioPortalData/projects. @LiNk-NY @lgeistlinger feel free to edit/add.

@banerjeeshayantan

I apologise for not replying earlier due to some other commitments. Can we talk sometime this week? I have already downloaded the package with all the necessary dependencies. Please let me know.

@lwaldron

lwaldron commented May 6, 2020

No problem @banerjeeshayantan, thanks for your continued interest. I will touch base with @LiNk-NY today to plan, and we'll be in touch again soon.

@inodb added the GSoC-2021 (GSoC 2021 Candidate Projects) label and removed the GSoC-2020 (GSoC 2020 Candidate Projects) label on Nov 16, 2020
@inodb added the GSoC-2022 (GSoC 2022 Candidate Projects), R, and Size: Medium (175h) labels and removed the GSoC-2021 (GSoC 2021 Candidate Projects) and R labels on Feb 17, 2022
@martinnnuez

My name is Martin Rodríguez Nuñez. I graduated in 2020 as an environmental engineer from the National University of Córdoba, Argentina (UNC). I am currently enrolled in a PhD program in engineering sciences focused on modeling fine particulate matter (PM2.5) levels using meteorological, geographic, remote sensing, and land use variables as predictors. After finishing college I started a master's degree in applied statistics, where I realized that this was my true passion. Only the thesis remains for me to obtain my master's degree.
I have a passion for statistical data analysis and predictive modeling, and I have experience with both in R and Python. I am currently working on time series analysis and predictive modeling. I am interested in the project and especially in its goal, since it will be very beneficial to the cancer research community. I have no experience in this particular topic, but I know R and data analysis well and believe I am well suited to the work; besides, having a professional as a mentor would help me improve my skills.
Before submitting an application, I have some questions:
1- I would like to know what the application process is and whether the position is still available.
2- I would also like to ask whether you know which support functions you would like developed.
Thank you very much in advance. I look forward to hearing from you.
Best regards,
Martin Rodriguez Nuñez.

@lwaldron

Hi @martinnnuez, thanks for your interest! I think we have a good project for someone of your background, focusing on increasing the coverage of cBioPortal data imported by its Bioconductor client. Many datasets fail to import for one reason or another (https://waldronlab.io/cBioPortalData/articles/cBioPortalDataErrors.html); resolving them will require some combination of custom data loaders, additional rules for the existing loader, or corrections to the data. @LiNk-NY is the developer and can describe this in more detail, then we could all meet on a Zoom call.

@LiNk-NY
Author

LiNk-NY commented Mar 28, 2022

Hi Martin, @martinnnuez

Thank you for your interest! We are glad to have you help us.
Please let us know what day works for you, and either Levi or I (or both) can meet with you.
I can go over the details of getting the datasets into analysis-ready shape.

Saludos,
Marcel

@martinnnuez

martinnnuez commented Apr 1, 2022 via email

@lwaldron

lwaldron commented Apr 1, 2022 via email

@martinnnuez

Perfect, @LiNk-NY let me know when we can coordinate a meeting. Thank you very much.

@LiNk-NY
Author

LiNk-NY commented Apr 6, 2022

Hi Martin, @martinnnuez

I can meet on Friday or next week.
Afternoons work best for me.
You can reach me on the Bioc-community Slack (my handle there is mramos148).
Register at https://bioc-community.herokuapp.com/
Looking forward to meeting with you!

Best,
Marcel

@imsarath

imsarath commented Apr 7, 2022

Hi, @LiNk-NY @lwaldron,

I am interested in working on this project. Having read Levi's comment above, I would like to know more about the data wrangling aspect of it. Please let me know how to proceed.

Thanks,
Sarath

@lwaldron

lwaldron commented Apr 11, 2022 via email

@LiNk-NY
Author

LiNk-NY commented May 4, 2022

Hi Sarath, @imsarath
Any updates on this? Are you still interested?
Thanks!
-Marcel

@bhavy2202

Hey, is this issue still open for GSoC 2023? I'm interested in working on this project.
Thank you

9 participants