tags
summer2023, collaboratory


Collaboratory schedule - DataLab June 2023


When: June 20th-30th, 2023, ~9am-5pm each day

Where: Shields Library 360 (DataLab), UC Davis main campus.

Contact: [email protected], [email protected]

Parking suggestion: lots 5/5A are not that far, and you can walk through a small redwood grove! MU parking structure is also nearby (A and C parking). TAPS enforces parking; day parking permits can be purchased through the ParkMobile app.

Food: DataLab staff have started a list of quick lunch options on and around campus. Note: on-campus venues may be closed due to summer break.

All information will be posted to our GitHub repository, ngs-docs/2023-june-datalab-collaboratory, and will be available indefinitely.

Introduction and Expectations

This workshop is focused on enabling attendees to improve and expand their existing workflows. All activities are optional, but we hope to keep them interesting enough that everyone will attend and participate in the all-hands sessions. You are also welcome to hide out in a corner, work on your own problems, and ask for help periodically!

In particular, we hope to fill in gaps in people's mental models of computing, and to provide many ideas for improving how efficiently you work and compute!

This is designed to be a super-friendly workshop where you can ask all those questions about computing that you never felt comfortable asking before.

We're looking forward to seeing you all!

Facilitators and Helpers!!!

Lead facilitators

  • Pamela
    • data science, R, team science, etc!
  • Hannah
    • VS Code, GitHub Desktop (git GUI), R
  • Wes
    • Statistics and R programming, etc.
  • Sophie
    • pop gen, workflows
  • Dani
    • HPC

Helpers

  • Mo
    • ChatGPT and Git Colab, python
  • Nistara
    • R, some git (command line), emacs
  • Makan
    • Pop-up leader for AI/ML
  • Colton
    • workflow, machine learning, multiomics, image processing, benchmarking
  • Nick
    • statistics, R, Python, Julia, etc.

Daily schedule

Each all-hands session below will follow the same basic format:

  • intro to topic (15 min)
  • Q&A, discussion and comparison (30 min - 1 hour)
  • break out into facilitated co-working groups
  • reconvene at the end to summarize and reflect; take notes on topics for pop-ups.

We expect to have "pop-up" sessions on additional topics or techniques as needed/desired.

Days will start at ~9:15, with lunch from noon-2pm; we will end before 5pm every day!

Schedule of topics by day

Tue, June 20th - welcome; git

Setup:

  • make sure you're on wifi (eduroam) and slack (DataLab, #2023-june-collaboratory)!

9:30am: Morning: welcome & introductions

Sticky note exercise/questions: write 3 sticky notes and put them in groups on the back whiteboard!

  • Name + scientific domain
  • Name + computational tools/approach/??
  • Name + goal for workshop (automation, scalability, validation, ???), or "what you want to work on most".

Lunch: pizza!

2pm: Afternoon session: pinning your project down with version control (git and github)

Need help with git, R, Python, or any other data science topics? Check the directory of DataLab workshops!

Wed, June 21st - slurm; conda

9:30am Morning session: (ab)using the HPC for fun and profit (slurm, srun, and sbatch)

2pm: Afternoon session: software installations that (usually) just work (conda)

Th, June 22nd - scripting; organization

Morning session: automating the heck out of everything (shell scripts, R, Python)

Afternoon session: dude, where's my file? (organizing your files)

Fri, June 23rd - snakemake

Morning session: automating stuff even more with workflow systems (snakemake)

Afternoon session: finishing stuff off

Mon, June 26th

Work day + pop-up topics; schedule TBD

Tue, June 27th

Work day + pop-up topics; schedule TBD

Wed, June 28th

Work day + pop-up topics; schedule TBD

Th, June 29th

Work day + pop-up topics; schedule TBD

Fri, June 30th

Work day + pop-up topics; schedule TBD

(workshop ends at noon)

Advice for ongoing work goals!

Throughout this workshop, you may find that you have downtime and/or need some short-term direction. Here are some ideas for what you can do!

  • write a brief markdown document (maybe on hackmd??) describing how to run your project.
  • make a small test or example data set or analysis that is (a) quick to run and (b) uses most of your scripts. This is a good way to make sure that your scripts still work and is really helpful for us in trying to help you!
  • refactor/edit your scripts to run from a single working directory, without absolute paths, etc.
  • diagram out your workflow!
  • play with technology in a safe environment! hackmd, etc.
  • take a step back and think about what you'd like to try out or achieve while you have lots of expert help around!
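To make the "single working directory" and "small test data set" suggestions a bit more concrete, here is a minimal Python sketch. It assumes a hypothetical project layout (a `data/samples.txt` input with one sample name per line, and a `results/` output folder); the function name `run` and the file names are illustrative only, not part of any workshop material. The point is that every path is built from one project directory, so nothing like `/home/yourname/...` is hard-coded.

```python
"""Minimal sketch: a script that only uses paths relative to one project directory.

Hypothetical layout (an assumption for illustration):
    project/
        data/samples.txt   <- small test input, one sample name per line
        results/           <- created by the script if missing
"""
from pathlib import Path


def run(project_dir: Path) -> Path:
    # Build every path from project_dir, so the project can be moved,
    # shared, or run on the HPC without editing absolute paths.
    input_file = project_dir / "data" / "samples.txt"
    output_dir = project_dir / "results"
    output_dir.mkdir(parents=True, exist_ok=True)

    # Keep only non-empty lines (one sample name per line).
    samples = [
        line.strip()
        for line in input_file.read_text().splitlines()
        if line.strip()
    ]

    # Write a one-line summary into the results folder.
    output_file = output_dir / "sample_count.txt"
    output_file.write_text(f"{len(samples)}\n")
    return output_file
```

From inside the project directory you could call `run(Path("."))`; because the inputs are tiny, this also doubles as a quick check that the script still works end to end.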

Topics for pop-ups

Pop-ups are as-needed demos, lessons, and discussions where one or more people lead a short session on an interesting piece of technology or theory.

We'll run pop-ups as needed throughout the workshop!

For technical details on pop-ups, see the pop-up resources page!

Pop-up topics we anticipate:

Audience suggested

  • Google cloud
  • How to use ChatGPT for fun and profit
    • GPT-3 is not reliable, but GPT-4 is good
  • GitHub Copilot
  • GitHub Actions
  • GitHub Codespaces
  • GitHub licenses
    • scholarly communications at UC Davis
  • Bash command line variables

Pre-existing ideas

  • hackmd for collaborative Markdowning
  • RStudio Server, VS Code, etc.
  • Screen, tmux
  • SSH tunneling
  • Editing bash profile
  • tmux/screen popup
  • Obsidian, and/or hackmd extras
  • Advanced GitHub: branches, PRs?
  • RMarkdown
  • JupyterLab
  • GitHub Desktop
  • How to post an issue on the repository of a package I’m having trouble with
  • Getting your data ready to apply ML and AI

Please add your own questions and suggestions to this section!

Background reading

Here are a few papers for encouragement!

Best Practices for Scientific Computing, Wilson et al., 2014.

Ten simple rules for making research software more robust, Taschuk and Wilson, 2017.

Streamlining data-intensive biology with workflow systems, Reiter et al., 2021.

Principles for data analysis workflows, Stoudt, Vazquez, and Martinez, 2021.

Perspectives on automated composition of workflows in the life sciences, Lamprecht et al., 2021.

Funding and support acknowledgement

This workshop is supported by internal funding to the UC Davis DataLab!