This repository has been archived by the owner on Apr 22, 2022. It is now read-only.

ST-314-nilsstreedain/DA2


Data Analysis 2 - Exploring Data

Part 1. (5 Points)

A paint manufacturing company claims the following in one of its advertisements.

“We have the fastest drying paint! In a designed, randomized experiment, our paint dried faster than our competitors’ paint.”

Below is the data from the experiment:

Dry time in minutes for Manufacturer’s Paint: 33.4, 29.2, 35.7
Dry time in minutes for Competition’s Paint: 33.5, 35.8, 29.4

  1. (1 point) What are the average dry times for each company’s paint?
  2. (2 points) Is the manufacturer’s claim truthful? Either way, is the advertisement misleading? Why or why not?
  3. (2 points) Suppose the company advertising the faster drying paint performed the experiment themselves. Why could this be a potential problem?
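
The arithmetic behind question 1 can be sketched as follows. The course itself works in R; this is only an illustrative Python version using the standard library, and the variable names are assumptions chosen for readability. It computes the two sample means from the data listed above and the gap between them.

```python
from statistics import mean

# Dry times in minutes, copied from the experiment data above.
manufacturer = [33.4, 29.2, 35.7]
competitor = [33.5, 35.8, 29.4]

manufacturer_mean = mean(manufacturer)
competitor_mean = mean(competitor)

# Positive difference means the manufacturer's paint dried faster on average.
difference = competitor_mean - manufacturer_mean

print(f"Manufacturer mean: {manufacturer_mean:.2f} min")  # 32.77
print(f"Competitor mean:   {competitor_mean:.2f} min")    # 32.90
print(f"Difference:        {difference:.2f} min")         # 0.13
```

Each mean is based on only three observations, which is worth keeping in mind when answering questions 2 and 3.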

Part 2. (18 Points)

On Canvas, you’ll find the R script One_Variable_Display_and_Summary_Stats.R and the ST314 student survey dataset st314_student_survey.csv. You’ll use both of these to explore one categorical and one quantitative variable from the survey. Download the R script and the dataset, open the R script, and follow the command instructions. Then answer the following questions:

Categorical Variable

The variable “Major” describes the individual’s major in school. The variable “Phone” identifies the type of phone the individual has (iOS, Android, other). Both of these variables are categorical. Select one of the two categorical variables just mentioned and answer the following three questions.

  1. (1 point) Choose a categorical variable to explore. Which variable did you choose?
  2. (2 points) Paste the table of counts and bar chart for the categorical variable of your choosing. Include color and an appropriate title on your plot.
  3. (2 points) Briefly describe the distribution in context. Recall that categorical variables are summarized by counts and/or percents.
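
As a rough illustration of what "summarized by counts and/or percents" means, here is a minimal Python sketch. The provided R script is the intended tool for the assignment; the responses below are made up for illustration and are not taken from st314_student_survey.csv, and the variable names are assumptions.

```python
from collections import Counter

# Hypothetical "Phone" responses; NOT the real survey data.
phone = ["iOS", "Android", "iOS", "iOS", "Other", "Android", "iOS", "Android"]

# Table of counts: one entry per category.
counts = Counter(phone)
total = sum(counts.values())

# Percent of respondents in each category.
percents = {level: 100 * n / total for level, n in counts.items()}

# Print the categories from most to least common.
for level, n in counts.most_common():
    print(f"{level:<8} {n:>3} {percents[level]:5.1f}%")
```

A bar chart displays the same counts (or percents) as bar heights, one bar per category.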

Quantitative Variable

The variable “Credit Hours” indicates the number of credit hours the individual was enrolled in during the term the survey was completed. The variable “Gaming Hours” describes approximately how many hours a week the survey participant games. Both of these variables are quantitative. Select one of the two quantitative variables just mentioned and answer the following three questions.

  1. (1 point) Choose a quantitative variable to explore. Which variable did you choose? Is the variable discrete or continuous?
  2. (2 points) Create a histogram of the variable. Include color and an appropriate title on your plot. Paste the plot.
  3. (2 points) Create a boxplot of the variable. Include color and an appropriate title on your plot. Paste the plot.
  4. (1 point) Which plot do you prefer (histogram or boxplot) to visualize the variable? Why?
  5. (2 points) Give a table that includes the mean, standard deviation, minimum, 1st quartile, median, 3rd quartile, maximum and IQR.
  6. (3 points) Use the plots and summary statistics to describe the data in the context of the problem. Include the shape, center and spread in your description. State whether there are any outliers.
  7. (2 points) Given the shape of the data, which measure (the mean, the median, or either) would be more appropriate to represent the center of the data? Explain your reasoning.
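
The summary statistics in question 5 and the outlier check in question 6 can be sketched as below. This is an illustrative Python version, not the course's R script. The data are hypothetical weekly gaming hours invented for the example, and the 1.5 × IQR fence is the common boxplot convention, assumed here rather than stated in the assignment. The `method="inclusive"` setting interpolates linearly between order statistics, which to my understanding corresponds to R's default quantile type, but results should be checked against the R output.

```python
from statistics import mean, median, quantiles, stdev

# Hypothetical weekly gaming hours; NOT the real survey data.
gaming_hours = [0.0, 0.0, 1.0, 2.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0, 10.0, 40.0]

# Quartiles by linear interpolation between order statistics.
q1, q2, q3 = quantiles(gaming_hours, n=4, method="inclusive")
iqr = q3 - q1

summary = {
    "mean": mean(gaming_hours),
    "sd": stdev(gaming_hours),  # sample standard deviation (n - 1 divisor)
    "min": min(gaming_hours),
    "q1": q1,
    "median": median(gaming_hours),
    "q3": q3,
    "max": max(gaming_hours),
    "iqr": iqr,
}

# Assumed convention: flag values beyond 1.5 * IQR from the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in gaming_hours if x < lower_fence or x > upper_fence]

for name, value in summary.items():
    print(f"{name:<7}{value:8.2f}")
print("outliers:", outliers)
```

In this made-up sample the single value of 40 pulls the mean (6.75) well above the median (3.5), which is the kind of pattern question 7 asks you to consider.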

Gradescope Page Matching (2 points)

When you upload your PDF file to Gradescope, you will need to match each question on this assignment to the correct pages. Video instructions for doing this are available in the Start Here module on Canvas on the page “Submitting Assignments in Gradescope”. Failure to follow these instructions will result in a 2-point deduction on your assignment grade. Match this page to outline item “Gradescope Page Matching”.