Chapter_4.Rmd

---
title: "Chapter 4"
author: "Julin Maloof"
date: "2022-09-24"
output: 
  html_document: 
    keep_md: yes
---

## Quiz

1. What is the result of subsetting a vector with positive integers, negative integers, a logical vector, or a character vector?

_Positive integers select elements by position; negative integers exclude elements by position; a logical vector keeps the elements where the index is `TRUE` and is recycled if it is shorter than the vector; a character vector selects elements by name._

2. What’s the difference between [, [[, and $ when applied to a list? 

_`[` returns a list containing the selected elements; `[[` extracts a single element and does not return a list (unless the element itself is a list); `x$name` is roughly equivalent to `x[["name"]]` and extracts a single element by name._

3. When should you use drop = FALSE?

_When subsetting a matrix, array, or data frame and you want to keep the original dimensions, for example so that selecting a single column of a data frame returns a data frame rather than a vector._

4. If x is a matrix, what does x[] <- 0 do? How is it different from x <- 0?

_`x[] <- 0` will set all elements to 0.  `x <- 0` will change x to a vector of length 1 with a value of 0._ 

5. How can you use a named vector to relabel categorical variables?

_I am not sure what this is asking_
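
One plausible reading of question 5 is the lookup-table idiom: subset a named character vector by the codes you want to relabel. A minimal sketch, where `codes` and `lookup` are made-up names and values for illustration only:

```{r}
# Hypothetical data: short codes to be relabelled
codes <- c("m", "f", "u", "f", "f", "m", "m")

# Named vector acting as a lookup table: names are the codes, values the labels
lookup <- c(m = "Male", f = "Female", u = NA)

# Character subsetting matches each code against the names of `lookup`
relabelled <- unname(lookup[codes])
relabelled
```

`unname()` is only there to drop the codes that would otherwise be carried along as names.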

## 4.2 Selecting multiple elements

Skipped what I already knew. This is the code for subsetting a matrix with a single vector and then with a matrix:

```{r}
vals <- outer(1:5, 1:5, FUN = "paste", sep = ",")
vals
#>      [,1]  [,2]  [,3]  [,4]  [,5] 
#> [1,] "1,1" "1,2" "1,3" "1,4" "1,5"
#> [2,] "2,1" "2,2" "2,3" "2,4" "2,5"
#> [3,] "3,1" "3,2" "3,3" "3,4" "3,5"
#> [4,] "4,1" "4,2" "4,3" "4,4" "4,5"
#> [5,] "5,1" "5,2" "5,3" "5,4" "5,5"

vals[c(4, 15)] #Selects the 4th and 15th elements, going down columns

```

```{r}
select <- matrix(ncol = 2, byrow = TRUE, c(
  1, 1,
  3, 1,
  2, 4
))

select

vals[select]

```
Oh, cool, so these are selecting by "address": each row of `select` is a (row, column) pair.

### 4.2.6 Exercises

#### 1. Fix each of the following common data frame subsetting errors:

__Original__
```{r, eval=FALSE}
mtcars[mtcars$cyl = 4, ]
mtcars[-1:4, ]
mtcars[mtcars$cyl <= 5]
mtcars[mtcars$cyl == 4 | 6, ]
```

__Fixed__
```{r}
mtcars[mtcars$cyl == 4, ]
mtcars[-1:-4, ]
mtcars[mtcars$cyl <= 5, ]
mtcars[mtcars$cyl == 4 | mtcars$cyl == 6, ]
```
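
To see why two of the original lines fail, it may help to evaluate the index expressions on their own. This is only a sketch of the diagnosis; the variable names are mine, and `%in%` is shown as one possible alternative to the fix above:

```{r}
# -1:4 parses as (-1):4, which mixes negative and positive positions
bad_index <- -1:4
bad_index
good_index <- -(1:4)   # same as -1:-4
good_index

# `== 4 | 6` compares first, then ORs with 6; any non-zero number counts as TRUE
always_true <- mtcars$cyl == 4 | 6
all(always_true)

# %in% is a compact alternative to chaining == with |
four_or_six <- mtcars[mtcars$cyl %in% c(4, 6), ]
nrow(four_or_six)
```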

#### 2. Why does the following code yield five missing values? (Hint: why is it different from x[NA_real_]?)


```{r}
x <- 1:5
x[NA]
x[NA_real_]
```

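_`NA` on its own is a logical constant, so `x[NA]` is logical subsetting: the index is recycled to the length of `x`, and a missing value in the index always yields `NA` in the output, giving five `NA`s. `NA_real_` is a double, so `x[NA_real_]` is positional subsetting with a single missing position and returns a single `NA`._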
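
A quick check of the types involved may make the difference between the two calls easier to see. This is a small sketch, not taken from the book:

```{r}
x <- 1:5

typeof(NA)        # "logical": used as a logical index and recycled
typeof(NA_real_)  # "double": used as a single (missing) position

length(x[NA])          # one result per element of x
length(x[NA_real_])    # one result for the one position asked for
length(x[NA_integer_]) # integer NA behaves like NA_real_ here
```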

#### 3. What does upper.tri() return? How does subsetting a matrix with it work? Do we need any additional subsetting rules to describe its behaviour?

```{r}
x <- outer(1:5, 1:5, FUN = "*")
x
upper.tri(x)
x[upper.tri(x)]
```

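_`upper.tri()` returns a logical matrix of the same shape as `x`, with `TRUE` in the upper triangle (above the diagonal). Subsetting with it works like subsetting with any logical vector, because a matrix is a vector with a `dim` attribute; the result is a plain vector in column-major order. We do not need any additional rules._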
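
One way to check that no extra rule is involved is to strip the dimensions from the logical index and confirm that the result does not change. A sketch, with `idx` as a name I introduced:

```{r}
x <- outer(1:5, 1:5, FUN = "*")
idx <- upper.tri(x)

# Dropping dim from the index changes nothing: it is ordinary logical subsetting
identical(x[idx], x[as.vector(idx)])

# The same elements, addressed by their column-major positions
identical(x[idx], x[which(idx)])

# Number of elements strictly above the diagonal of a 5 x 5 matrix
length(x[idx])
```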

#### 4. Why does mtcars[1:20] return an error? How does it differ from the similar mtcars[1:20, ]?

```{r}
dim(mtcars)
```

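_When you subset a data frame with a single index and no comma, it behaves like a list and selects columns. `mtcars` has only 11 columns, so `mtcars[1:20]` fails. `mtcars[1:20, ]` selects the first 20 rows and all columns._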
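
The two calls can be compared directly. In this sketch the error is caught with `tryCatch()` so the chunk still knits; `res` is just a name I chose for the captured message:

```{r}
# Single-index subsetting treats the data frame as a list of columns
ncol(mtcars)
res <- tryCatch(mtcars[1:20], error = function(e) conditionMessage(e))
res

# With the comma, 1:20 indexes rows instead
dim(mtcars[1:20, ])
```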

#### 5. Implement your own function that extracts the diagonal entries from a matrix (it should behave like diag(x) where x is a matrix).

```{r}
m <- matrix(1:32, ncol=8)
m
diag(m)

diag2 <- function(x) {
  end <- if (nrow(x) > ncol(x)) length(x) else nrow(x)^2 # any more clever way to do this?
  d <- seq(1, end, by=nrow(x)+1)
  x[d]
}

diag2(m)

```

Alternate way, indexing with a two-column matrix of (row, column) pairs:

```{r}
diag3 <- function(x) {
  select <- seq_len(min(dim(x)))
  select <- cbind(select,select)
  x[select]
}

diag(matrix(1:32,ncol=4))
diag3(matrix(1:32,ncol=4))

diag(matrix(1:32,ncol=8))
diag3(matrix(1:32,ncol=8))
```


#### 6. What does df[is.na(df)] <- 0 do? How does it work?

```{r}
df <- as.data.frame(m)
df
df[3,2] <- NA
df
df[is.na(df)] <- 0
df
```
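_`is.na(df)` returns a logical matrix with `TRUE` at each missing value. Subsetting `df` with that matrix selects those positions, and the assignment replaces them with 0._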
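
Looking at the intermediate logical matrix may make the mechanism clearer. A small sketch on a made-up data frame (`df_demo` and `na_mask` are illustrative names):

```{r}
df_demo <- data.frame(a = c(1, NA, 3), b = c(NA, 5, 6))

# is.na() on a data frame gives a logical matrix of the same shape
na_mask <- is.na(df_demo)
na_mask

# Subset-assignment with the logical matrix overwrites only the TRUE positions
df_demo[na_mask] <- 0
df_demo
```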

## 4.3 Selecting a single element

Huh?
```{r, eval=FALSE}
for (i in 2:length(x)) {
  out[[i]] <- fun(x[[i]], out[[i - 1]])
}
```
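
My best guess at what this snippet is getting at: using `[[` makes it explicit that exactly one value is read and one value is written on each iteration. A runnable sketch, where `x`, `out`, and `fun` are stand-ins I made up (here `fun` simply adds, so the loop computes a running sum):

```{r}
x <- c(3, 1, 4, 1, 5)
fun <- function(current, previous) current + previous  # hypothetical placeholder

out <- vector("numeric", length(x))
out[[1]] <- x[[1]]
for (i in 2:length(x)) {
  # each step combines the current input with the previous result
  out[[i]] <- fun(x[[i]], out[[i - 1]])
}
out
identical(out, cumsum(x))
```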


### 4.3.5 Exercises

#### 1. Brainstorm as many ways as possible to extract the third value from the cyl variable in the mtcars dataset.

```{r}
mtcars[3,"cyl"]
mtcars[3,2]
mtcars$cyl[3]
mtcars[["cyl"]][3]
purrr::pluck(mtcars, "cyl", 3)
```


#### 2. Given a linear model, e.g., mod <- lm(mpg ~ wt, data = mtcars), extract the residual degrees of freedom. Then extract the R squared from the model summary (summary(mod))

```{r}
mod <- lm(mpg ~ wt, data = mtcars)
str(mod)
```

```{r}
mod$df.residual
```
```{r}
mod.sum <- summary(mod)
str(mod.sum)
```

```{r}
mod.sum$r.squared
```

### 4.5.9 Exercises

#### 1. How would you randomly permute the columns of a data frame? (This is an important technique in random forests.) Can you simultaneously permute the rows and columns in one step?

```{r}
df <- data.frame(x=LETTERS[1:5], y=1:5, z=letters[26:22])
df

# permute columns
df[, sample(ncol(df))]

# permute rows and columns in one step
df[sample(nrow(df)), sample(ncol(df))]

```


#### 2. How would you select a random sample of m rows from a data frame? What if the sample had to be contiguous (i.e., with an initial row, a final row, and every row in between)?

_I think the question means a random subset or random subsample_
```{r}
# 5 rows at random
mtcars[sample(nrow(mtcars), size = 5),]

# contiguous 5 rows
start <- sample(nrow(mtcars)-4, size=1)

mtcars[start:(start+4),]

```
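
The same idea can be wrapped up for an arbitrary `m`. This is a sketch only; `sample_rows()` and its `contiguous` argument are hypothetical names, not functions from any package:

```{r}
# Hypothetical helper: m random rows, optionally as one contiguous block
sample_rows <- function(data, m, contiguous = FALSE) {
  n <- nrow(data)
  stopifnot(m >= 1, m <= n)
  if (contiguous) {
    # the block can start anywhere that leaves room for m rows
    start <- sample(n - m + 1, size = 1)
    rows <- seq(start, length.out = m)
  } else {
    rows <- sample(n, size = m)
  }
  data[rows, , drop = FALSE]
}

block <- sample_rows(mtcars, 5, contiguous = TRUE)
block
```

`drop = FALSE` keeps the result a data frame even if `data` has a single column.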


#### 3. How could you put the columns in a data frame in alphabetical order?

```{r}
mtcars[, order(colnames(mtcars))]
```