Permanent link: ngs-docs/2024-ggg-201b-lab
[toc]
C. Titus Brown, [email protected], UC Davis
GitHub site: github.com/ngs-docs/2024-ggg201b-lab
This site will contain all of the actual lab lessons, and I'll post links to them on Canvas each week.
Please abide by my lab's Code of Conduct in this course.
In particular, this is not an intellectual contest, and please realize that we all have plenty of things to learn.
What: hands-on computational work
When: Friday 9-10:50am
Where: Shields 360; Zoom.
Lab sessions will be recorded and available for a month; the links to lab sessions and any associated notes will be in the comments under the announcement for that lab.
We'll be using the UC Davis High Performance Compute Cluster for all our work; all you'll need is an Internet-connected laptop.
There will be four homeworks, submitted via GitHub Classroom.
Homeworks can be done collaboratively, but you need to hand them in individually. You are responsible for what you hand in.
I grade each homework S/U. The division of grading between labs and the rest of the course is described in the syllabus for the whole course.
C. Titus Brown (IOR) ([email protected])
Office hours will be by arrangement. Mondays and Tuesday afternoons I will be in DataLab; Wednesday and Thursday mornings I will be in VetMed.
Please contact Titus via e-mail at [email protected] at least a day in advance if you want to come to office hours.
In this lab, we will work with three different automated workflows for three common bioinformatics tasks: variant calling, genome assembly, and RNAseq differential expression. (These will not be cutting-edge workflows and should not be used directly for your own work, but they will be complete and functional.)
The overall learning goals for the lab are to:
- familiarize you with the basic operational concepts involved in variant calling, genome assembly, and RNAseq differential expression.
- introduce the use of workflow management tools as a core aspect of biological data analysis.
- describe scientific issues surrounding data analysis techniques and processes, including statistical issues, reproducibility, provenance, and publication.
In terms of technology, we'll be using the snakemake workflow system, running on the farm cluster. We'll touch briefly on git/GitHub and conda, but those topics will be discussed in much more detail in GGG 298, Tools for Data Intensive Research (winter 2024)!
- Labs 1-4 - variant calling and snakemake workflows
- Labs 5-7 - de novo genome assembly
- Labs 8-10 - RNAseq for differential expression analysis
The labs are taught in a very hands-on way, with an emphasis on running things first, and then going back and exploring what the commands are doing. That means that not everything is explained in detail the first, or even the second, time you see it! For the first few sessions, please bear with me; hopefully it will all make sense by the end, and if not, ask questions!
Lab will be on Fridays 9-10:50am in Shields 360. You are welcome to attend remotely if need be, but in-person attendance is encouraged, even if it's really cold and dark outside.
Shields 360 is the DataLab classroom.
It is not easy to find the first time, so plan to take a few extra minutes to get there!
You can find it by going into the Shields Library, then climbing up the two flights of stairs and going to the right (or taking an elevator up to the 3rd floor). It is all the way in the back of the library on the 3rd floor, in the southeast corner. You should see a big “DataLab” sign on the wall in front of you as you walk back from the stairs in front.