Skip to content

Latest commit

 

History

History
1367 lines (816 loc) · 54.1 KB

swirl_R_program.md

File metadata and controls

1367 lines (816 loc) · 54.1 KB

You can find the PDF and doc version at http://pan.baidu.com/s/1dDqyeo5

===========

  • Part2 1: R Programming 2: Take me to the swirl course repository!

Selection: 1

| Please choose a lesson, or type 0 to return to course menu.

1: Basic Building Blocks 2: Sequences of Numbers 3: Vectors 4: Missing Values 5: Subsetting Vectors 6: Matrices and Data Frames

Selection: 2

| | 0%

| In this lesson, you'll learn how to create sequences of numbers in R.

...

|====== | 5%

| The simplest way to create a sequence of numbers in R is by using the : operator. Type 1:20 to see how it works.

1:20 [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

| You are really on a roll!

|============ | 9%

| That gave us every integer between (and including) 1 and 20. We could also use it to create a sequence of real numbers. For example, try | pi:10.

pi:10 [1] 3.141593 4.141593 5.141593 6.141593 7.141593 8.141593 9.141593

| You are doing so well!

|================== | 14%

| The result is a vector of real numbers starting with pi (3.142...) and increasing in increments of 1. The upper limit of 10 is never | reached, since the next number in our sequence would be greater than 10.

...

|======================== | 18%

| What happens if we do 15:1? Give it a try to find out.

15:1 [1] 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

| You're the best!

|============================== | 23%

| It counted backwards in increments of 1! It's unlikely we'd want this behavior, but nonetheless it's good to know how it could happen.

...

|==================================== | 27%

| Remember that if you have questions about a particular R function, you can access its documentation with a question mark followed by the | function name: ?function_name_here. However, in the case of an operator like the colon used above, you must enclose the symbol in | backticks like this: ?:. (NOTE: The backtick (`) key is generally located in the top left corner of a keyboard, above the Tab key. If | you don't have a backtick key, you can use regular quotes.)

...

|========================================== | 32%

| Pull up the documentation for : now.

?:

| You're the best!

|================================================ | 36%

| Often, we'll desire more control over a sequence we're creating than what the : operator gives us. The seq() function serves this | purpose.

...

|====================================================== | 41%

| The most basic use of seq() does exactly the same thing as the : operator. Try seq(1, 20) to see this.

seq(1, 20) [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

| You are quite good my friend!

|============================================================ | 45%

| This gives us the same output as 1:20. However, let's say that instead we want a vector of numbers ranging from 0 to 10, incremented by | 0.5. seq(0, 10, by=0.5) does just that. Try it out.

seq(0, 10, by=0.5) [1] 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 8.5 9.0 9.5 10.0

| You're the best!

|================================================================== | 50%

| Or maybe we don't care what the increment is and we just want a sequence of 30 numbers between 5 and 10. seq(5, 10, length=30) does the | trick. Give it shot now and store the result in a new variable called my_seq.

my_seq<-seq(5, 10, length=30)

| You're the best!

|======================================================================= | 55%

| To confirm that my_seq has length 30, we can use the length() function. Try it now.

length(my_seq) [1] 30

| You are doing so well!

|============================================================================= | 59%

| Let's pretend we don't know the length of my_seq, but we want to generate a sequence of integers from 1 to N, where N represents the | length of the my_seq vector. In other words, we want a new vector (1, 2, 3, ...) that is the same length as my_seq.

...

|=================================================================================== | 64%

| There are several ways we could do this. One possibility is to combine the : operator and the length() function like this: | 1:length(my_seq). Give that a try.

1:length(my_seq) [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

| You got it!

|========================================================================================= | 68%

| Another option is to use seq(along = my_seq). Give that a try.

seq(along = my_seq) [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

| You are quite good my friend!

|=============================================================================================== | 73%

| However, as is the case with many common tasks, R has a separate built-in function for this purpose called seq_along(). Type | seq_along(my_seq) to see it in action.

seq_along(my_seq) [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

| Great job!

|===================================================================================================== | 77%

| There are often several approaches to solving the same problem, particularly in R. Simple approaches that involve less typing are | generally best. It's also important for your code to be readable, so that you and others can figure out what's going on without too much | hassle.

...

|=========================================================================================================== | 82%

| If R has a built-in function for a particular task, it's likely that function is highly optimized for that purpose and is your best | option. As you become a more advanced R programmer, you'll design your own functions to perform tasks when there are no better options. | We'll explore writing your own functions in future lessons.

...

|================================================================================================================= | 86%

| One more function related to creating sequences of numbers is rep(), which stands for 'replicate'. Let's look at a few uses.

...

|======================================================================================================================= | 91%

| If we're interested in creating a vector that contains 40 zeros, we can use rep(0, times = 40). Try it out.

rep(0, times = 40) [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

| Keep up the great work!

|============================================================================================================================= | 95%

| If instead we want our vector to contain 10 repetitions of the vector (0, 1, 2), we can do rep(c(0, 1, 2), times = 10). Go ahead.

rep(c(0, 1, 2), times = 10) [1] 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2

| Excellent job!

|===================================================================================================================================| 100%

| Finally, let's say that rather than repeating the vector (0, 1, 2) over and over again, we want our vector to contain 10 zeros, then 10 | ones, then 10 twos. We can do this with the each argument. Try rep(c(0, 1, 2), each = 10).

rep(c(0, 1, 2), each = 10) [1] 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2

| You are doing so well!

Part3 1: R Programming 2: Take me to the swirl course repository!

Selection: 1

| Please choose a lesson, or type 0 to return to course menu.

1: Basic Building Blocks 2: Sequences of Numbers 3: Vectors 4: Missing Values 5: Subsetting Vectors 6: Matrices and Data Frames

Selection: 3

| | 0%

| The simplest and most common data structure in R is the vector.

...

|==== | 3%

| Vectors come in two different flavors: atomic vectors and lists. An atomic vector contains exactly one data type, whereas a list may | contain multiple data types. We'll explore atomic vectors further before we get to lists.

...

|======= | 5%

| In previous lessons, we dealt entirely with numeric vectors, which are one type of atomic vector. Other types of atomic vectors include | logical, character, integer, and complex. In this lesson, we'll take a closer look at logical and character vectors.

...

|=========== | 8%

| Logical vectors can contain the values TRUE, FALSE, and NA (for 'not available'). These values are generated as the result of logical | 'conditions'. Let's experiment with some simple conditions.

...

|============== | 11%

| First, create a numeric vector num_vect that contains the values 0.5, 55, -10, and 6.

num_vect <- c( 0.5, 55, -10, 6)

| Keep up the great work!

|================== | 14%

| Now, create a variable called tf that gets the result of num_vect < 1, which is read as 'num_vect is less than 1'.

tf <- num_vect < 1

| Nice work!

|===================== | 16%

| What do you think tf will look like?

1: a single logical value 2: a vector of 4 logical values

Selection: 2

| Great job!

|========================= | 19%

| Print the contents of tf now.

tf [1] TRUE FALSE TRUE FALSE

| Excellent job!

|============================ | 22%

| The statement num_vect < 1 is a condition and tf tells us whether each corresponding element of our numeric vector num_vect satisfies this | condition.

...

|================================ | 24%

| The first element of num_vect is 0.5, which is less than 1 and therefore the statement 0.5 < 1 is TRUE. The second element of num_vect is | 55, which is greater than 1, so the statement 55 < 1 is FALSE. The same logic applies for the third and fourth elements.

...

|=================================== | 27%

| Let's try another. Type num_vect >= 6 without assigning the result to a new variable.

num_vect >= 6 [1] FALSE TRUE FALSE TRUE

| You got it right!

|======================================= | 30%

| This time, we are asking whether each individual element of num_vect is greater than OR equal to 6. Since only 55 and 6 are greater than | or equal to 6, the second and fourth elements of the result are TRUE and the first and third elements are FALSE.

...

|========================================== | 32%

| The < and >= symbols in these examples are called 'logical operators'. Other logical operators include >, <=, == for exact | equality, and != for inequality.

...

|============================================== | 35%

| If we have two logical expressions, A and B, we can ask whether at least one is TRUE with A | B (logical 'or' a.k.a. 'union') or whether | they are both TRUE with A & B (logical 'and' a.k.a. 'intersection'). Lastly, !A is the negation of A and is TRUE when A is FALSE and vice | versa.

...

|================================================== | 38%

| It's a good idea to spend some time playing around with various combinations of these logical operators until you get comfortable with | their use. We'll do a few examples here to get you started.

...

|===================================================== | 41%

| Try your best to predict the result of each of the following statements. You can use pencil and paper to work them out if it's helpful. If | you get stuck, just guess and you've got a 50% chance of getting the right answer!

...

|========================================================= | 43%

| (3 > 5) & (4 == 4)

1: FALSE 2: TRUE

Selection: 1

| Excellent job!

|============================================================ | 46%

| (TRUE == TRUE) | (TRUE == FALSE)

1: TRUE 2: FALSE

Selection: 1

| You got it right!

|================================================================ | 49%

| ((111 >= 111) | !(TRUE)) & ((4 + 1) == 5)

1: TRUE 2: FALSE

Selection: 1

| You are quite good my friend!

|=================================================================== | 51%

| Don't worry if you found these to be tricky. They're supposed to be. Working with logical statements in R takes practice, but your efforts | will be rewarded in future lessons (e.g. subsetting and control structures).

...

|======================================================================= | 54%

| Character vectors are also very common in R. Double quotes are used to distinguish character objects, as in the following example.

...

|========================================================================== | 57%

| Create a character vector that contains the following words: "My", "name", "is". Remember to enclose each word in its own set of double | quotes, so that R knows they are character strings. Store the vector in a variable called my_char.

my_char <- c("My","name","is")

| You are doing so well!

|============================================================================== | 59%

| Print the contents of my_char to see what it looks like.

my_char [1] "My" "name" "is"

| You got it!

|================================================================================= | 62%

| Right now, my_char is a character vector of length 3. Let's say we want to join the elements of my_char together into one continuous | character string (i.e. a character vector of length 1). We can do this using the paste() function.

...

|===================================================================================== | 65%

| Type paste(my_char, collapse = " ") now. Make sure there's a space between the double quotes in the collapse argument. You'll see why in | a second.

paste(my_char, collapse = " ") [1] "My name is"

| That's a job well done!

|========================================================================================= | 68%

| The collapse argument to the paste() function tells R that when we join together the elements of the my_char character vector, we'd like | to separate them with single spaces.

...

|============================================================================================ | 70%

| It seems that we're missing something.... Ah, yes! Your name!

...

|================================================================================================ | 73%

| To add (or 'concatenate') your name to the end of my_char, use the c() function like this: c(my_char, "your_name_here"). Place your name | in double quotes where I've put "your_name_here". Try it now, storing the result in a new variable called my_name.

c(my_char, "kk") [1] "My" "name" "is" "kk"

| You almost had it, but not quite. Try again. Or, type info() for more options.

| Tack your name on to the end of the my_char vector using the c() function. Be sure to assign the result to a new variable called my_name. | If your name was "Swirl", you would type my_name <- c(my_char, "Swirl").

c(my_char, "mengnan") [1] "My" "name" "is" "mengnan"

| You're close...I can feel it! Try it again. Or, type info() for more options.

| Tack your name on to the end of the my_char vector using the c() function. Be sure to assign the result to a new variable called my_name. | If your name was "Swirl", you would type my_name <- c(my_char, "Swirl").

c(my_char, "Swirl") [1] "My" "name" "is" "Swirl"

| Almost! Try again. Or, type info() for more options.

| Tack your name on to the end of the my_char vector using the c() function. Be sure to assign the result to a new variable called my_name. | If your name was "Swirl", you would type my_name <- c(my_char, "Swirl").

my_name <- c(my_char,"qq")

| Great job!

|=================================================================================================== | 76%

| Take a look at the contents of my_name.

my_name [1] "My" "name" "is" "qq"

| You got it!

|======================================================================================================= | 78%

| Now, use the paste() function once more to join the words in my_name together into a single character string. Don't forget to say collapse | = " "!

paste(my_name,collapse = " ") [1] "My name is qq"

| That's correct!

|========================================================================================================== | 81%

| In this example, we used the paste() function to collapse the elements of a single character vector. paste() can also be used to join the | elements of multiple character vectors.

...

|============================================================================================================== | 84%

| In the simplest case, we can join two character vectors that are each of length 1 (i.e. join two words). Try paste("Hello", "world!", sep | = " "), where the sep argument tells R that we want to separate the joined elements with a single space.

paste("Hello", "world!", sep = " ") [1] "Hello world!"

| You nailed it! Good job!

|================================================================================================================= | 86%

| For a slightly more complicated example, we can join two vectors, each of length 3. Use paste() to join the integer vector 1:3 with the | character vector c("X", "Y", "Z"). This time, use sep = "" to leave no space between the joined elements.

paste(1:3, c("X", "Y", "Z"),sep = "") [1] "1X" "2Y" "3Z"

| You got it!

|===================================================================================================================== | 89%

| What do you think will happen if our vectors are of different length? (Hint: we talked about this in a previous lesson.)

...

|======================================================================================================================== | 92%

| Vector recycling! Try paste(LETTERS, 1:4, sep = "-"), where LETTERS is a predefined variable in R containing a character vector of all 26 | letters in the English alphabet.

paste(LETTERS, 1:4, sep = "-") [1] "A-1" "B-2" "C-3" "D-4" "E-1" "F-2" "G-3" "H-4" "I-1" "J-2" "K-3" "L-4" "M-1" "N-2" "O-3" "P-4" "Q-1" "R-2" "S-3" "T-4" "U-1" "V-2" [23] "W-3" "X-4" "Y-1" "Z-2"

| You nailed it! Good job!

|============================================================================================================================ | 95%

| Since the character vector LETTERS is longer than the numeric vector 1:4, R simply recycles, or repeats, 1:4 until it matches the length | of LETTERS.

...

|=============================================================================================================================== | 97%

| Also worth noting is that the numeric vector 1:4 gets 'coerced' into a character vector by the paste() function.

...

|===================================================================================================================================| 100%

| We'll discuss coercion in another lesson, but all it really means that the numbers 1, 2, 3, and 4 in the output above are no longer | numbers to R, but rather characters "1", "2", "3", and "4".

...

Part5 1: R Programming 2: Take me to the swirl course repository!

Selection: 1

| Please choose a lesson, or type 0 to return to course menu.

1: Basic Building Blocks 2: Sequences of Numbers 3: Vectors 4: Missing Values 5: Subsetting Vectors 6: Matrices and Data Frames

Selection: 5

| | 0%

| In this lesson, we'll see how to extract elements from a vector based on some conditions that we specify.

...

|=== | 3%

| For example, we may only be interested in the first 20 elements of a vector, or only the elements that are not NA, or only those that are | positive or correspond to a specific variable of interest. By the end of this lesson, you'll know how to handle each of these scenarios.

...

|======= | 5%

| I've created for you a vector called x that contains a random ordering of 20 numbers (from a standard normal distribution) and 20 NAs. | Type x now to see what it looks like.

x [1] -0.72710135 NA NA NA NA 1.72500426 0.04650719 1.05771270 NA NA NA [12] 0.56231459 NA -1.43583067 NA 2.05595356 0.02575668 -0.57366218 0.94364110 NA 0.08617605 -1.30075176 [23] 0.21624823 NA NA NA NA NA -0.88428489 -0.24683732 0.80597510 2.47432768 NA [34] NA -0.91037672 0.04124237 NA NA NA 2.33516583

| Keep up the great work!

|========== | 8%

| The way you tell R that you want to select some particular elements (i.e. a 'subset') from a vector is by placing an 'index vector' in | square brackets immediately following the name of the vector.

...

|============== | 11%

| For a simple example, try x[1:10] to view the first ten elements of x.

x[1:10] [1] -0.72710135 NA NA NA NA 1.72500426 0.04650719 1.05771270 NA NA

| You are really on a roll!

|================= | 13%

| Index vectors come in four different flavors -- logical vectors, vectors of positive integers, vectors of negative integers, and vectors | of character strings -- each of which we'll cover in this lesson.

...

|===================== | 16%

| Let's start by indexing with logical vectors. One common scenario when working with real-world data is that we want to extract all | elements of a vector that are not NA (i.e. missing data). Recall that is.na(x) yields a vector of logical values the same length as x, | with TRUEs corresponding to NA values in x and FALSEs corresponding to non-NA values in x.

...

|======================== | 18%

| What do you think x[is.na(x)] will give you?

1: A vector of TRUEs and FALSEs 2: A vector of length 0 3: A vector with no NAs 4: A vector of all NAs

Selection: Enter an item from the menu, or 0 to exit Selection: 4

| You are amazing!

|============================ | 21%

| Prove it to yourself by typing x[is.na(x)].

x[is.na(x)] [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

| You are amazing!

|=============================== | 24%

| Recall that ! gives us the negation of a logical expression, so !is.na(x) can be read as 'is not NA'. Therefore, if we want to create a | vector called y that contains all of the non-NA values from x, we can use y <- x[!is.na(x)]. Give it a try.

y <- x[!is.na(x)]

| You nailed it! Good job!

|================================== | 26%

| Print y to the console.

y [1] -0.72710135 1.72500426 0.04650719 1.05771270 0.56231459 -1.43583067 2.05595356 0.02575668 -0.57366218 0.94364110 0.08617605 [12] -1.30075176 0.21624823 -0.88428489 -0.24683732 0.80597510 2.47432768 -0.91037672 0.04124237 2.33516583

| You got it!

|====================================== | 29%

| Now that we've isolated the non-missing values of x and put them in y, we can subset y as we please.

...

|========================================= | 32%

| Recall that the expression y > 0 will give us a vector of logical values the same length as y, with TRUEs corresponding to values of y | that are greater than zero and FALSEs corresponding to values of y that are less than or equal to zero. What do you think y[y > 0] will | give you?

1: A vector of length 0 2: A vector of all the negative elements of y 3: A vector of all the positive elements of y 4: A vector of all NAs 5: A vector of TRUEs and FALSEs

Selection: 3

| You are really on a roll!

|============================================= | 34%

| Type y[y > 0] to see that we get all of the positive elements of y, which are also the positive elements of our original vector x.

y[y > 0] [1] 1.72500426 0.04650719 1.05771270 0.56231459 2.05595356 0.02575668 0.94364110 0.08617605 0.21624823 0.80597510 2.47432768 0.04124237 [13] 2.33516583

| You are quite good my friend!

|================================================ | 37%

| You might wonder why we didn't just start with x[x > 0] to isolate the positive elements of x. Try that now to see why.

x[x > 0] [1] NA NA NA NA 1.72500426 0.04650719 1.05771270 NA NA NA 0.56231459 NA [13] NA 2.05595356 0.02575668 0.94364110 NA 0.08617605 0.21624823 NA NA NA NA NA [25] 0.80597510 2.47432768 NA NA 0.04124237 NA NA NA 2.33516583

| Excellent job!

|==================================================== | 39%

| Since NA is not a value, but rather a placeholder for an unknown quantity, the expression NA > 0 evaluates to NA. Hence we get a bunch of | NAs mixed in with our positive numbers when we do this.

...

|======================================================= | 42%

| Combining our knowledge of logical operators with our new knowledge of subsetting, we could do this -- x[!is.na(x) & x > 0]. Try it out.

x[!is.na(x) & x > 0] [1] 1.72500426 0.04650719 1.05771270 0.56231459 2.05595356 0.02575668 0.94364110 0.08617605 0.21624823 0.80597510 2.47432768 0.04124237 [13] 2.33516583

| You are doing so well!

|=========================================================== | 45%

| In this case, we request only values of x that are both non-missing AND greater than zero.

...

|============================================================== | 47%

| I've already shown you how to subset just the first ten values of x using x[1:10]. In this case, we're providing a vector of positive | integers inside of the square brackets, which tells R to return only the elements of x numbered 1 through 10.

...

|================================================================== | 50%

| Many programming languages use what's called 'zero-based indexing', which means that the first element of a vector is considered element | 0. R uses 'one-based indexing', which (you guessed it!) means the first element of a vector is considered element 1.

...

|===================================================================== | 53%

| Can you figure out how we'd subset the 3rd, 5th, and 7th elements of x? Hint -- Use the c() function to specify the element numbers as a | numeric vector.

x[c(3,5,7)] [1] NA NA 0.04650719

| Great job!

|======================================================================== | 55%

| It's important that when using integer vectors to subset our vector x, we stick with the set of indexes {1, 2, ..., 40} since x only has | 40 elements. What happens if we ask for the zeroth element of x (i.e. x[0])? Give it a try.

x[0] numeric(0)

| You're the best!

|============================================================================ | 58%

| As you might expect, we get nothing useful. Unfortunately, R doesn't prevent us from doing this. What if we ask for the 3000th element of | x? Try it out.

x(3000) Error: could not find function "x" x[3000] [1] NA

| You are quite good my friend!

|=============================================================================== | 61%

| Again, nothing useful, but R doesn't prevent us from asking for it. This should be a cautionary tale. You should always make sure that | what you are asking for is within the bounds of the vector you're working with.

...

|=================================================================================== | 63%

| What if we're interested in all elements of x EXCEPT the 2nd and 10th? It would be pretty tedious to construct a vector containing all | numbers 1 through 40 EXCEPT 2 and 10.

...

|====================================================================================== | 66%

| Luckily, R accepts negative integer indexes. Whereas x[c(2, 10)] gives us ONLY the 2nd and 10th elements of x, x[c(-2, -10)] gives us all | elements of x EXCEPT for the 2nd and 10 elements. Try x[c(-2, -10)] now to see this.

x[c(-2,-10)] [1] -0.72710135 NA NA NA 1.72500426 0.04650719 1.05771270 NA NA 0.56231459 NA [12] -1.43583067 NA 2.05595356 0.02575668 -0.57366218 0.94364110 NA 0.08617605 -1.30075176 0.21624823 NA [23] NA NA NA NA -0.88428489 -0.24683732 0.80597510 2.47432768 NA NA -0.91037672 [34] 0.04124237 NA NA NA 2.33516583

| You are quite good my friend!

|========================================================================================== | 68%

| A shorthand way of specifying multiple negative numbers is to put the negative sign out in front of the vector of positive numbers. Type | x[-c(2, 10)] to get the exact same result.

x[-c(2,10)] [1] -0.72710135 NA NA NA 1.72500426 0.04650719 1.05771270 NA NA 0.56231459 NA [12] -1.43583067 NA 2.05595356 0.02575668 -0.57366218 0.94364110 NA 0.08617605 -1.30075176 0.21624823 NA [23] NA NA NA NA -0.88428489 -0.24683732 0.80597510 2.47432768 NA NA -0.91037672 [34] 0.04124237 NA NA NA 2.33516583

| You nailed it! Good job!

|============================================================================================= | 71%

| So far, we've covered three types of index vectors -- logical, positive integer, and negative integer. The only remaining type requires us | to introduce the concept of 'named' elements.

...

|================================================================================================= | 74%

| Create a numeric vector with three named elements using vect <- c(foo = 11, bar = 2, norf = NA).

vect <- c(foo = 11, bar = 2, norf = NA)

| You are really on a roll!

|==================================================================================================== | 76%

| When we print vect to the console, you'll see that each element has a name. Try it out.

vect foo bar norf 11 2 NA

| You are really on a roll!

|======================================================================================================= | 79%

| We can also get the names of vect by passing vect as an argument to the names() function. Give that a try.

name(foo) Error: could not find function "name" name(vect) Error: could not find function "name" names(vect) [1] "foo" "bar" "norf"

| You are quite good my friend!

|=========================================================================================================== | 82%

| Alternatively, we can create an unnamed vector vect2 with c(11, 2, NA). Do that now.

vect2 <- c(11,2,NA)

| You got it right!

|============================================================================================================== | 84%

| Then, we can add the names attribute to vect2 after the fact with names(vect2) <- c("foo", "bar", "norf"). Go ahead.

names(vect) <- c("foo", "bar", "norf")

| Nice try, but that's not exactly what I was hoping for. Try again. Or, type info() for more options.

| Add names to vect2 with names(vect2) <- c("foo", "bar", "norf").

vect foo bar norf 11 2 NA

| You're close...I can feel it! Try it again. Or, type info() for more options.

| Add names to vect2 with names(vect2) <- c("foo", "bar", "norf").

names(vect2) <- c("foo", "bar", "norf")

| You nailed it! Good job!

|================================================================================================================== | 87%

| Now, let's check that vect and vect2 are the same by passing them as arguments to the identical() function.

identical(vect,vect2) [1] TRUE

| That's a job well done!

|===================================================================================================================== | 89%

| Indeed, vect and vect2 are identical named vectors.

...

|========================================================================================================================= | 92%

| Now, back to the matter of subsetting a vector by named elements. Which of the following commands do you think would give us the second | element of vect?

1: vect["bar"] 2: vect[bar] 3: vect["2"]

Selection: Enter an item from the menu, or 0 to exit Selection: 1

| That's correct!

|============================================================================================================================ | 95%

| Now, try it out.

vect["bar"] bar 2

| You're the best!

|================================================================================================================================ | 97%

| Likewise, we can specify a vector of names with vect[c("foo", "bar")]. Try it out.

vect[c("foo","bar")] foo bar 11 2

| You are really on a roll!

|===================================================================================================================================| 100%

| Now you know all four methods of subsetting data from vectors. Different approaches are best in different scenarios and when in doubt, try | it out!

Part6 1: R Programming 2: Take me to the swirl course repository!

Selection: Enter an item from the menu, or 0 to exit Selection: 1

| Please choose a lesson, or type 0 to return to course menu.

1: Basic Building Blocks 2: Sequences of Numbers 3: Vectors 4: Missing Values 5: Subsetting Vectors 6: Matrices and Data Frames

Selection: 6

| | 0%

| In this lesson, we'll cover matrices and data frames. Both represent 'rectangular' data types, meaning that they are used to store tabular | data, with rows and columns.

...

|==== | 3%

| The main difference, as you'll see, is that matrices can only contain a single class of data, while data frames can consist of many | different classes of data.

...

|======= | 6%

| Let's create a vector containing the numbers 1 through 20 using the : operator. Store the result in a variable called my_vector.

my_vector <- 1:20

| You are quite good my friend!

|=========== | 9%

| View the contents of the vector you just created.

my_vector [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

| You're the best!

|=============== | 11%

| The dim() function tells us the 'dimensions' of an object. What happens if we do dim(my_vector)? Give it try.

dim(my_vector) NULL

| You got it right!

|=================== | 14%

| Clearly, that's not very helpful! Since my_vector is a vector, it doesn't have a dim attribute (so it's just NULL), but we can find its | length using the length() function. Try that now.

length(my_vector) [1] 20

| You nailed it! Good job!

|====================== | 17%

| Ah! That's what we wanted. But, what happens if we give my_vector a dim attribute? Let's give it a try. Type dim(my_vector) <- c(4, 5).

dim(my_vector) <- c(4, 5)

| Nice work!

|========================== | 20%

| It's okay if that last command seemed a little strange to you. It should! The dim() function allows you to get OR set the dim attribute | for an R object. In this case, we assigned the value c(4, 5) to the dim attribute of my_vector.

...View(my_vector)

|============================== | 23%

| Use dim(my_vector) to confirm that we've set the dim attribute correctly.

View(my_vector)

| Almost! Try again. Or, type info() for more options.

| Just type dim(my_vector) to make sure the dim attribute has been set.

dim(my_vector) [1] 4 5

| That's a job well done!

|================================== | 26%

| Another way to see this is by calling the attributes() function on my_vector. Try it now.

attributes(my_vector) $dim [1] 4 5

| Excellent job!

|===================================== | 29%

| Just like in math class, when dealing with a 2-dimensional object (think rectangular table), the first number is the number of rows and | the second is the number of columns. Therefore, we just gave my_vector 4 rows and 5 columns.

...

|========================================= | 31%

| But, wait! That doesn't sound like a vector any more. Well, it's not. Now it's a matrix. View the contents of my_vector now to see what it | looks like.

my_vector [,1] [,2] [,3] [,4] [,5] [1,] 1 5 9 13 17 [2,] 2 6 10 14 18 [3,] 3 7 11 15 19 [4,] 4 8 12 16 20

| You are really on a roll!

|============================================= | 34%

| Now, let's confirm it's actually a matrix by using the class() function. Type class(my_vector) to see what I mean.

class(my_vector) [1] "matrix"

| Excellent job!

|================================================= | 37%

| Sure enough, my_vector is now a matrix. We should store it in a new variable that helps us remember what it is. Store the value of | my_vector in a new variable called my_matrix.

my_matrix<- my_vector

| Excellent job!

|==================================================== | 40%

| The example that we've used so far was meant to illustrate the point that a matrix is simply an atomic vector with a dimension attribute. | A more direct method of creating the same matrix uses the matrix() function.

...

|======================================================== | 43%

| Bring up the help file for the matrix() function now using the ? function.

?matrix

| You got it right!

|============================================================ | 46%

| Now, look at the documentation for the matrix function and see if you can figure out how to create a matrix containing the same numbers | (1-20) and dimensions (4 rows, 5 columns) by calling the matrix() function. Store the result in a variable called my_matrix2.

my_matrix2<-matrix(1:20,nrow = 4, ncol = 5)

| Nice work!

|================================================================ | 49%

| Finally, let's confirm that my_matrix and my_matrix2 are actually identical. The identical() function will tell us if its first two | arguments are the same. Try it out.

identical(my_matrix,my_matrix2) [1] TRUE

| You are quite good my friend!

|=================================================================== | 51%

| Now, imagine that the numbers in our table represent some measurements from a clinical experiment, where each row represents one patient | and each column represents one variable for which measurements where taken.

...3

|======================================================================= | 54%

| We may want to label the rows, so that we know which numbers belong to each patient in the experiment. One way to do this is to add a | column to the matrix, which contains the names of all four people.

...

|=========================================================================== | 57%

| Let's start by creating a character vector containing the names of our patients -- Bill, Gina, Kelly, and Sean. Remember that double | quotes tell R that something is a character string. Store the result in a variable called patients.

patients <- c("Bill","Gina","Kelly","Sean")

| That's correct!

|=============================================================================== | 60%

| Now we'll use the cbind() function to 'combine columns'. Don't worry about storing the result in a new variable. Just call cbind() with | two arguments -- the patients vector and my_matrix.

cbind(my_matrix,patients) patients [1,] "1" "5" "9" "13" "17" "Bill"
[2,] "2" "6" "10" "14" "18" "Gina"
[3,] "3" "7" "11" "15" "19" "Kelly" [4,] "4" "8" "12" "16" "20" "Sean"

| Nice try, but that's not exactly what I was hoping for. Try again. Or, type info() for more options.

| Type cbind(patients, my_matrix) to add the names of our patients to the matrix of numbers.

cbind(patients,my_matrix) patients
[1,] "Bill" "1" "5" "9" "13" "17" [2,] "Gina" "2" "6" "10" "14" "18" [3,] "Kelly" "3" "7" "11" "15" "19" [4,] "Sean" "4" "8" "12" "16" "20"

| That's a job well done!

|================================================================================== | 63%

| Something is fishy about our result! It appears that combining the character vector with our matrix of numbers caused everything to be | enclosed in double quotes. This means we're left with a matrix of character strings, which is no good.

...

|====================================================================================== | 66%

| If you remember back to the beginning of this lesson, I told you that matrices can only contain ONE class of data. Therefore, when we | tried to combine a character vector with a numeric matrix, R was forced to 'coerce' the numbers to characters, hence the double quotes.

...

|========================================================================================== | 69%

| This is called 'implicit coercion', because we didn't ask for it. It just happened. But why didn't R just convert the names of our | patients to numbers? I'll let you ponder that question on your own.

...

|============================================================================================== | 71%

| So, we're still left with the question of how to include the names of our patients in the table without destroying the integrity of our | numeric data. Try the following -- my_data <- data.frame(patients, my_matrix)

my_data <- data.frame(patients,my_matrix)

| Keep up the great work!

|================================================================================================= | 74%

| Now view the contents of my_data to see what we've come up with.

my_data patients X1 X2 X3 X4 X5 1 Bill 1 5 9 13 17 2 Gina 2 6 10 14 18 3 Kelly 3 7 11 15 19 4 Sean 4 8 12 16 20

| That's correct!

|===================================================================================================== | 77%

| It looks like the data.frame() function allowed us to store our character vector of names right alongside our matrix of numbers. That's | exactly what we were hoping for!

...

|========================================================================================================= | 80%

| Behind the scenes, the data.frame() function takes any number of arguments and returns a single object of class data.frame that is | composed of the original objects.

...

|============================================================================================================= | 83%

| Let's confirm this by calling the class() function on our newly created data frame.

class(my_data) [1] "data.frame"

| Keep up the great work!

|================================================================================================================ | 86%

| It's also possible to assign names to the individual rows and columns of a data frame, which presents another possible way of determining | which row of values in our table belongs to each patient.

...

|==================================================================================================================== | 89%

| However, since we've already solved that problem, let's solve a different problem by assigning names to the columns of our data frame so | that we know what type of measurement each column represents.

...

|======================================================================================================================== | 91%

| Since we have six columns (including patient names), we'll need to first create a vector containing one element for each column. Create a | character vector called cnames that contains the following values (in order) -- "patient", "age", "weight", "bp", "rating", "test".

cnames<-c("patient","age","weight","bp","rating","test")

| You're the best!

|============================================================================================================================ | 94%

| Now, use the colnames() function to set the colnames attribute for our data frame. This is similar to the way we used the dim() function | earlier in this lesson.

colnames(cnames,my_data) Error in if (do.NULL) NULL else if (nc > 0L) paste0(prefix, seq_len(nc)) else character() : argument is not interpretable as logical In addition: Warning message: In if (do.NULL) NULL else if (nc > 0L) paste0(prefix, seq_len(nc)) else character() : the condition has length > 1 and only the first element will be used colnames(cnames,my_matrix) NULL Warning message: In if (do.NULL) NULL else if (nc > 0L) paste0(prefix, seq_len(nc)) else character() : the condition has length > 1 and only the first element will be used

| Almost! Try again. Or, type info() for more options.

| Try colnames(my_data) <- cnames.

colnames(my_data) <- cnames

| You are amazing!

|=============================================================================================================================== | 97%

| Let's see if that got the job done. Print the contents of my_data.

my_data patient age weight bp rating test 1 Bill 1 5 9 13 17 2 Gina 2 6 10 14 18 3 Kelly 3 7 11 15 19 4 Sean 4 8 12 16 20

| You are doing so well!

|===================================================================================================================================| 100%

| In this lesson, you learned the basics of working with two very important and common data structures -- matrices and data frames. There's | much more to learn and we'll be covering more advanced topics, particularly with respect to data frames, in future lessons.

...

| Are you currently enrolled in the Coursera course associated with this lesson?

1: Yes 2: No

Selection: 1

| Would you like me to notify Coursera that you've completed this lesson? If so, I'll need to get some more info from you.

1: Yes 2: No 3: Maybe later

Selection: 1

| Is the following information correct?

Course ID: rprog-005 Submission login (email): [email protected] Submission password: DCAEhgghma

1: Yes, go ahead! 2: No, I need to change something.

Selection: 1

| I'll try to tell Coursera you've completed this lesson now.

| Great work!

| I've notified Coursera that you have completed rprog-005, Matrices_and_Data_Frames.

| You've reached the end of this lesson! Returning to the main menu...

| Please choose a course, or type 0 to exit swirl.