tags |
---|
summer2023, collaboratory |
[toc]
When: June 20th-30th, 2023, ~9am-5pm each day
Where: Shields Library 360 (DataLab), UC Davis main campus.
Contact: [email protected], [email protected]
Parking suggestion: lots 5/5A are not that far, and you can walk through a small redwood grove! MU parking structure is also nearby (A and C parking). TAPS enforces parking; day parking permits can be purchased through the ParkMobile app.
Food: DataLab staff have started a list of quick lunch options on and around campus. Note: on-campus venues may be closed due to summer break.
All information will be posted to our GitHub repository, ngs-docs/2023-june-datalab-collaboratory, and will be available indefinitely.
This workshop is focused on enabling attendees to improve and expand their existing workflows. All activities are optional, but we hope to keep things interesting enough that everyone will attend and participate in the all-hands sessions. You are also welcome to hide out in a corner, work on your own problems, and ask for help periodically!
In particular, we hope to fill in a lot of gaps for people in their mental models of computing, and provide many ideas for how to improve the efficiency with which you work and compute!
This is designed to be a super-friendly workshop where you can ask all those questions about computing that you never felt comfortable asking before.
We're looking forward to seeing you all!
- Pamela
- data science, R, team science, etc!
- Hannah
- VS Code, GitHub Desktop (git GUI), R
- Wes
- Statistics and R programming, etc.
- Sophie
- pop gen, workflows
- Dani
- HPC
- Mo
- ChatGPT and Git Colab, Python
- Nistara
- R, some git (command line), emacs
- Makan
- Pop-up leader for AI/ML
- Colton
- workflow, machine learning, multiomics, image processing, benchmarking
- Nick
- statistics, R, Python, Julia, etc.
Each all-hands session below will follow the same basic format:
- intro to topic (15 min)
- Q&A, discussion and comparison (30 min - 1 hour)
- break out into facilitated co-working groups
- reconvene at end to coalesce and retrospect; take notes for pop-ups.
We expect to have "pop-up" sessions on additional topics or techniques as needed/desired.
Days will start at ~9:15, with lunch from noon-2pm; we will end before 5pm every day!
Setup:
- make sure you're on wifi (eduroam) and slack (DataLab, #2023-june-collaboratory)!
9:30am: Morning: welcome & introductions
Sticky note exercise/questions: write 3 sticky notes and put them in groups on the back whiteboard!
- Name + scientific domain
- Name + computational tools/approach/??
- Name + goal for workshop (automation, scalability, validation, ???), or "what you want to work on most".
Lunch: pizza!
2pm: Afternoon session: pinning your project down with version control (git and github)
Need help with git, R, Python, or any other data science topics? Check the directory of DataLab workshops!
9:30am Morning session: (ab)using the HPC for fun and profit (slurm, srun, and sbatch)
- additional topic: setting your default editor on Linux
2pm: Afternoon session: software installations that (usually) just work (conda)
Morning session: automating the heck out of everything (shell scripts, R, Python)
Afternoon session: dude, where's my file? (organizing your files)
Morning session: automating stuff even more with workflow systems (snakemake)
Afternoon session: finishing stuff off
Work day + pop-up topics; schedule TBD
Work day + pop-up topics; schedule TBD
Work day + pop-up topics; schedule TBD
Work day + pop-up topics; schedule TBD
Work day + pop-up topics; schedule TBD
(workshop ends at noon)
Throughout this workshop, you may find you have downtime and/or need some sort of short term direction. Here are some ideas for what you can do!
- write a brief markdown document (maybe on hackmd??) describing how to run your project.
- make a small test or example data set or analysis that is (a) quick to run and (b) uses most of your scripts. This is a good way to make sure that your scripts still work and is really helpful for us in trying to help you!
- refactor/edit your scripts to run out of a single working directory - without absolute paths, etc. etc.
- diagram out your workflow!
- play with technology in a safe environment! hackmd, etc.
- take a step back and think about what you'd like to try out or achieve while you have lots of expert help around!
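As one way to approach the "single working directory, no absolute paths" idea above, here is a minimal Python sketch. It assumes your project is a git repository (so a `.git` directory marks its top level); the helper names `find_project_root` and `project_path`, and the `data/samples.csv` layout, are hypothetical and only for illustration.

```python
from pathlib import Path


def find_project_root(start: Path, marker: str = ".git") -> Path:
    """Walk upward from `start` until a directory containing `marker` is found."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"no {marker!r} found at or above {start}")


def project_path(root: Path, *parts: str) -> Path:
    """Build a path inside the project from pieces relative to its root."""
    return root.joinpath(*parts)


if __name__ == "__main__":
    import tempfile

    # Build a throwaway project layout so the example runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "my-project"
        (root / ".git").mkdir(parents=True)   # stand-in for a real git repo
        (root / "scripts").mkdir()
        (root / "data").mkdir()
        (root / "data" / "samples.csv").write_text("id,value\n1,10\n")

        # A script living in scripts/ can locate data/ without hard-coding
        # anything like /home/yourname/...
        found = find_project_root(root / "scripts")
        print(project_path(found, "data", "samples.csv").read_text())
```

In a real script you would typically start the search from `Path(__file__).parent`, so the script keeps working when the project is cloned somewhere else or copied to the cluster.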
Pop-ups are as-needed demos, lessons, and discussions where someone (or someones) leads a short session on some interesting piece of technology or theory.
We'll run pop-ups as needed throughout the workshop!
For technical details on pop-ups, see the pop-up resources page!
Pop-up topics we anticipate:
Audience suggested
- Google cloud
- How to use ChatGPT for fun and profit
- GPT3 is not reliable, but 4 is good
- GitHub Copilot
- GitHub Actions
- GitHub Codespaces
- GitHub license
- scholarly communications at UC Davis
- Bash command line variables
Pre-existing ideas
- hackmd for collaborative Markdowning
- RStudio Server, VS Code, etc.
- Screen, tmux
- SSH tunneling - pop up
- Editing bash profile
- tmux/screen popup
- Obsidian, and/or hackmd extras
- Advanced GitHub? Branches, PRs?
- RMarkdown
- JupyterLab
- GitHub Desktop
- How to post an issue on the repository of a package I’m having trouble with
- Getting your data ready to apply ML and AI
Please add your own questions and suggestions to this section!
Here are a few papers for encouragement!
Best Practices for Scientific Computing, Wilson et al., 2014.
Ten simple rules for making research software more robust, by Taschuk and Wilson, 2017.
Streamlining data-intensive biology with workflow systems, Reiter et al., 2021.
Principles for data analysis workflows, Stoudt, Vazquez, and Martinez, 2021.
Perspectives on automated composition of workflows in the life sciences, Lamprecht et al., 2021.
This workshop is supported by internal funding to the UC Davis DataLab!