From b81ba9edf83a279cc7062262c04f172fe532d312 Mon Sep 17 00:00:00 2001 From: Aleksandra Badaczewska Date: Wed, 21 Feb 2024 17:11:06 -0600 Subject: [PATCH] added hands-on exercises with solutions and extensive cross-linking to other tutorials --- .../03-PRODUCTIVITY/03-reproducibility.md | 204 ++++++++++++++++++ 1 file changed, 204 insertions(+) diff --git a/09-ProjectManagement/03-PRODUCTIVITY/03-reproducibility.md b/09-ProjectManagement/03-PRODUCTIVITY/03-reproducibility.md index ef66d6fb..3e44ddae 100644 --- a/09-ProjectManagement/03-PRODUCTIVITY/03-reproducibility.md +++ b/09-ProjectManagement/03-PRODUCTIVITY/03-reproducibility.md @@ -557,6 +557,210 @@ Open-source statistical and computational tools offer powerful, transparent, and --- +# Practice Your Skills + +These exercises are crafted to provide you with practical skills and hands-on experience necessary for conducting reproducible research. By engaging with these tasks, you'll not only learn to use essential tools but also understand how each contributes to the overall reproducibility of your research projects. + +### E1: Version Control Basics + +This exercise is designed to illustrate the critical role of version control in tracking and managing changes in research projects, promoting collaboration, and ensuring that all project versions are accessible and reproducible. + +
+Version Control Basics with Git +

Objective:
+Familiarize yourself with Git for version control to manage changes in your research project efficiently. +

Instructions:
+1. Install Git: Download and install Git from https://git-scm.com/.
+2. Create a Repository: +
  • Open Terminal (Mac/Linux) or Git Bash (Windows).
  • +
  • Navigate to your project folder and run git init to initialize a new Git repository on your local machine.
  • +3. Make Your First Commit: +
  • Create a file in the repository (e.g., README.md) and edit it.
  • +
  • Run git add README.md to stage the file for committing.
  • +
  • Then, commit it using git commit -m "Initial commit".
  • +4. Explore History: +
  • Make further changes to your file and commit them.
  • +
  • Then, use git log to view the history of changes.
  • +
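The commit-and-inspect cycle from steps 2 to 4 can also be scripted. The sketch below is one possible way to do it from Python, assuming Git is installed and on your PATH; the helper names `run_git` and `demo_history`, the placeholder identity, and the second commit message are illustrative assumptions, not part of the exercise.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_git(args, cwd):
    """Run one git command inside `cwd` and return its standard output."""
    # Identity and signing are passed per command so the sketch neither relies on
    # nor changes your global Git configuration. Name and e-mail are placeholders.
    base = [
        "git",
        "-c", "user.name=Example Researcher",
        "-c", "user.email=researcher@example.com",
        "-c", "commit.gpgsign=false",
    ]
    result = subprocess.run(base + args, cwd=cwd, check=True,
                            capture_output=True, text=True)
    return result.stdout


def demo_history(project_dir):
    """Initialize a repository, make two commits, and return the commit subjects."""
    project = Path(project_dir)
    readme = project / "README.md"

    run_git(["init"], project)

    # First commit: create the file, stage it, commit it.
    readme.write_text("# My research project\n")
    run_git(["add", "README.md"], project)
    run_git(["commit", "-m", "Initial commit"], project)

    # Second commit: edit the file, stage it again, commit the change.
    readme.write_text("# My research project\n\nNotes on data sources.\n")
    run_git(["add", "README.md"], project)
    run_git(["commit", "-m", "Describe data sources"], project)

    # --format=%s prints only the subject line of each commit, newest first.
    return run_git(["log", "--format=%s"], project).splitlines()


if __name__ == "__main__":
    if shutil.which("git") is None:
        print("git is not installed or not on PATH")
    else:
        # A temporary folder keeps the demo away from your real project.
        with tempfile.TemporaryDirectory() as tmp:
            for subject in demo_history(tmp):
                print(subject)
```

Running the script in a throwaway folder is a low-risk way to see how `git log` lists the newest commit first before you try the same commands by hand in your own project.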
    +PRO TIP: +
    +To see a more detailed step-by-step guide with graphical aids to complete this exercise, navigate to tutorial GIT - a distributed version control system ⤴ and follow the steps in the hands-on sections: +
  • Install Git ⤴
  • +
  • Working with Local Repos ⤴
  • +
    +
    + + +### E2: Documenting with Jupyter Notebooks + +This exercise is designed to underscore the importance of using open-source tools for data analysis, emphasizing clear, well-commented scripts that can be easily shared and replicated. + +
    +Documenting Research with Jupyter Notebooks +

    Objective:
    +Create a Jupyter Notebook that integrates code, outputs, and narrative, showcasing how this tool can enhance research documentation. +

    Instructions:
    +1. Install Jupyter: Follow the instructions at https://jupyter.org/install.
    +2. Create a New Notebook: +
  • Navigate to your project folder.
  • +
  • Launch JupyterLab using the command jupyter lab.
  • +
  • Create a new notebook file using the File option in the top menu bar. It will be saved to your file system.
  • +3. Document Your Process: +
  • Write a brief introduction in a markdown cell explaining what the notebook will achieve.
  • +
  • Add separate markdown cells to explain each step, including your thought process and interpretation of results.
  • +
  • Below each markdown cell include code cells to load a dataset and perform simple analyses (you will use them in Exercise 3). +
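The pattern described in step 3 (an introduction, then a markdown explanation paired with a code cell for each step) is also visible in the notebook file itself, which is plain JSON. The sketch below builds such a file with the standard library only, assuming the version 4 notebook layout; the helper names and the example cell texts are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def markdown_cell(text):
    """Return a markdown cell in the nbformat 4 on-disk layout."""
    return {"cell_type": "markdown", "metadata": {},
            "source": text.splitlines(keepends=True)}


def code_cell(code):
    """Return an unexecuted code cell (no outputs, no execution count)."""
    return {"cell_type": "code", "execution_count": None, "metadata": {},
            "outputs": [], "source": code.splitlines(keepends=True)}


def build_notebook(introduction, steps):
    """Build a notebook: one intro cell, then (explanation, code) pairs."""
    cells = [markdown_cell(introduction)]
    for explanation, code in steps:
        cells.append(markdown_cell(explanation))  # narrative first
        cells.append(code_cell(code))             # then the code it describes
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


if __name__ == "__main__":
    notebook = build_notebook(
        "# Iris exploration\nThis notebook loads a dataset and summarizes it.",
        [
            ("## Step 1\nLoad the data.", "import seaborn as sns\niris = sns.load_dataset('iris')\n"),
            ("## Step 2\nInspect the first rows.", "iris.head()\n"),
        ],
    )
    # Written to a temporary folder here; in practice you would save it in your project.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "analysis.ipynb"
        path.write_text(json.dumps(notebook, indent=1))
        print(f"Wrote {len(notebook['cells'])} cells to {path.name}")
```

You would normally create cells through the JupyterLab interface. The sketch is only meant to show that narrative and code live side by side in a single file, which is what makes a notebook easy to share and version.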
    +PRO TIP: +
    +To see a more detailed step-by-step guide with graphical aids to complete this exercise, navigate to tutorial Jupyter: Web-Based Programming Interface ⤴ and follow the installation steps in the Installing Jupyter ⤴ hands-on section. Then jump to the Getting Started with JupyterLab ⤴ tutorial to launch the Jupyter interface, learn about the components of the GUI, create a new notebook, and add cells of various types. To start using rich text markup in your documentation, check out the Introduction to Markdown ⤴ tutorial. +
    +
  • + + +### E3: Data Analysis in R or Python + +This exercise is designed to underscore the importance of using open-source tools for data analysis, emphasizing clear, well-commented scripts that can be easily shared and replicated. + +
    +Data Analysis in R or Python +

    Objective:
    +Perform a basic data analysis using R or Python, focusing on making the script reproducible. +

    Instructions:
    +1. Choose Your Tool +
  • Install R (programming language) and RStudio (development environment, DE)
  • +  and/or +
  • Install Python (programming language) and use Jupyter Notebook (development environment, DE)
  • +2. Analysis Task: +
  • Choose a simple dataset (e.g., Iris dataset is available in both R and Python).
  • +
  • Load the dataset.
  • +
    See Code Example + +
    +#1 Loading the Iris Dataset in R
    +The Iris dataset is available in R by default through the datasets package, which is part of the standard R distribution. No additional installation is required for this exercise.
    +# Load the Iris dataset +data(iris) +# View the first few rows of the dataset +head(iris) +
    +#2 Loading the Iris Dataset in Python
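    +In Python, the Iris dataset can be loaded as a Pandas DataFrame with seaborn's load_dataset() function, which downloads the data on first use and therefore needs an internet connection. Neither Pandas nor seaborn is part of the Python Standard Library, but both are widely used for data manipulation and visualization. If you don't have them installed, you can install them using the following command in the terminal: pip install pandas seaborn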
    +Then, you can load the Iris dataset in a Jupyter Notebook or script file: +# Import pandas and seaborn +import pandas as pd +import seaborn as sns +# Load the iris dataset +iris = sns.load_dataset('iris') +# View the first few rows of the dataset +iris.head() +
    +
    +3. Perform Basic Analysis: +
  • Perform simple statistical summaries (mean, median, standard deviation).
  • +
  • Create a basic plot (e.g., scatter plot or histogram).
  • +
    See Code Example (step 1) + +
    +#1 Exploring the Iris Dataset in R
    +Using R's built-in functions, we can quickly generate summary statistics and species counts, providing insights into the dataset's composition without the need for additional packages.
    +# Summary statistics +summary(iris) +# Count of species +table(iris$Species) +
    +#2 Exploring the Iris Dataset in Python
    +Exploring the Iris dataset in Python requires the use of Pandas for data manipulation and analysis. Pandas' functions, such as describe() for summary statistics and value_counts() for species counts, allow for an in-depth exploration of the dataset. (This step assumes Pandas is installed and imported.)
    +# Summary statistics +iris.describe() +# Count of species +iris['species'].value_counts() + +
    +
    +
    See Code Example (step 2) + +
    +This step enhances our exploratory data analysis by allowing us to observe the distribution of sepal lengths and widths across different species.

    +#1 Plotting the Iris Dataset in R
    +To visualize the Iris dataset in R, we utilize the ggplot2 package, a powerful tool for creating complex plots from data in a dataframe. If ggplot2 is not already installed, it can be easily added.
    +# Install ggplot2 if not already installed +if(!require(ggplot2)) install.packages("ggplot2") +library(ggplot2) +# Scatter plot of Sepal.Length vs Sepal.Width colored by Species +ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) + geom_point() + + theme_minimal() + ggtitle("Sepal Length vs. Sepal Width") +
    +#2 Plotting the Iris Dataset in Python
    +To plot the Iris dataset in Python, both matplotlib and seaborn libraries are essential. These libraries are not part of the Python Standard Library but can be installed via pip. Together, they provide a comprehensive toolkit for creating a variety of visualizations, including scatter plots. If needed, use the following command in your terminal: pip install matplotlib seaborn
    +# Import matplotlib for plotting +import matplotlib.pyplot as plt +import seaborn as sns # optional here, if loaded earlier along with pandas +# Scatter plot of Sepal.Length vs Sepal.Width colored by Species +sns.scatterplot(data=iris, x="sepal_length", y="sepal_width", hue="species") +plt.title("Sepal Length vs. Sepal Width") +plt.show() + +
    +
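Step 3 asks for the mean, median, and standard deviation explicitly, which can be computed one at a time. The sketch below uses only Python's built-in `statistics` module, so it runs without Pandas; the function name `summarize` is hypothetical, and the five sepal lengths are typed in by hand as illustrative values rather than loaded from the dataset.

```python
import statistics


def summarize(values):
    """Return the mean, median, and sample standard deviation of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # stdev() is the sample standard deviation (divides by n - 1),
        # the same convention used by R's sd() and by pandas' std().
        "stdev": statistics.stdev(values),
    }


if __name__ == "__main__":
    # Illustrative sepal lengths in cm, entered manually for this sketch.
    sepal_lengths = [5.1, 4.9, 4.7, 4.6, 5.0]
    for name, value in summarize(sepal_lengths).items():
        print(f"{name}: {value:.3f}")
```

With the full dataset loaded as a DataFrame, the same three numbers are available per column through `iris["sepal_length"].mean()`, `.median()`, and `.std()`.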
    +4. Ensure Reproducibility: +
  • Comment your code thoroughly.
  • +
  • Include instructions for installing any required packages, and record the versions of the language and packages you used.
  • +   (You can draw inspiration from the code block instructions provided in the earlier steps of this exercise.) +
  • Save your script or notebook file.
  • +
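One way to record the versions mentioned in step 4 is to print them at the top of your script or in the first cell of your notebook. The sketch below relies on the standard library's `importlib.metadata` (Python 3.8 or newer); the function name `environment_report` and the package list are assumptions chosen to match this exercise.

```python
import platform
from importlib import metadata


def environment_report(packages):
    """Return one line for the Python version and one line per package."""
    lines = [f"Python {platform.python_version()} on {platform.system()}"]
    for name in packages:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Report missing packages instead of failing, so the report
            # is still useful on a partially set-up machine.
            version = "not installed"
        lines.append(f"{name}: {version}")
    return lines


if __name__ == "__main__":
    # Packages used in the Python code examples of this exercise.
    for line in environment_report(["pandas", "seaborn", "matplotlib"]):
        print(line)
```

Saving this output next to your results, or committing it with your notebook, gives collaborators the information they need to rebuild a matching environment.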
    +PRO TIP: +
    +To see a more detailed step-by-step guide with graphical aids to complete this exercise, explore the following tutorials: +
  • Getting Started with JupyterLab on a local machine ⤴
  • +
  • Python Programming Environment(s) ⤴
  • +
  • Introduction to Python Programming ⤴
  • +
  • Pandas Library - Data Structure Manipulation Tool ⤴

  • +
  • RStudio: Integrated Environment for R Programming ⤴
  • +
  • R Programming Environment(s) ⤴
  • +
  • Introduction to R programming ⤴
  • +
  • Ggplot2 - R package for customizable graphs and charts ⤴
  • +
    +
    +
    + + +### E4: Share Your Work with Collaborators + +This exercise is designed to highlight the importance of collaboration and community engagement in ensuring reproducibility and quality in research. + +
    +Community Standards and Collaboration using GitHub +

    Objective:
    +Explore collaborative features of GitHub, such as issues and pull requests, to understand community standards in code development. +

    Instructions:
    +1. Create a GitHub Account online: Sign up at https://github.com/ if you haven't already. (It is free.)
    +2. Create Your Own Empty Repository: +
  • Navigate to the Repositories tab and click the "New" button.
  • +
  • Name your repository and add a brief description. Ensure the repository is set to public.
  • +
  • Initialize the repository without a README, .gitignore, or license (because you initialized your repo locally in the previous exercise).
  • +
  • Navigate to your GitHub repository page, click the "Code" button, and then copy the URL of your repo to the clipboard. (You will use it in the next step.)
  • +3. Push Changes from Your Locally Created Git Repository: +
  • Open a terminal or Git Bash on your computer.
  • +
  • Navigate to the folder containing your Jupyter Notebook.
  • +
  • Link your local repository to the GitHub repository you created using
  • +   git remote add origin [repository-URL] +
  • Push your changes to GitHub using
  • +   git push -u origin master or git push -u origin main
    +   depending on your default branch name.
    +4. Share Your Notebook via GitHub: +
  • Ensure your Jupyter Notebook is clearly named and includes comments or markdown cells that explain the analysis steps.
  • +
  • After pushing, navigate to your GitHub repository online to verify that your notebook is visible and accessible.
  • +
  • Once your Jupyter Notebook is on GitHub, you can preview it directly in the browser and copy the URL to share it with your collaborators. For broader context, you can also send the link to the entire repository if needed, allowing others to access not only the notebook but any associated data and documentation.

  • +Optional Steps for Further Engagement:
    +5. Fork a Repository: Explore other users' repositories related to your research interests, fork one, and consider contributing by adding improvements or additional analysis.
    +6. Engage with the Community: Participate in discussions, open issues for any bugs you find, or offer solutions to existing issues. Collaboration is key to advancing reproducible research. +
    +PRO TIP: +
    +To see a more detailed step-by-step guide with graphical aids to complete this exercise, navigate to tutorial GIT - a distributed version control system ⤴ and follow the steps in the Working with Remote Repos ⤴ hands-on section. You can engage further by exploring the Collaborating on Projects ⤴ section. +
    +
    ___