# Programming
Programming in **[R]{.sans-serif}** is much like programming in most
other languages: a program is controlled by a series of if-then
statements, loops, print statements, calls to other functions, and
return statements. The main differences are minor syntax conventions
that become familiar with practice.
## Branching
To begin with, let's look at an example of how if-then statements work
in **[R]{.sans-serif}**. Try submitting this command at the prompt to
see how it works:
`> if (1>0) print("I Like Binary")`
Note that **[R]{.sans-serif}** prints "I Like Binary" because the
condition in the *if* statement is true. The `print` function is very
useful for programming purposes, in that it prints simple strings in the
**[R]{.sans-serif}** Console. No explicitly labeled *then* statement is
needed after an *if* statement; you simply type what you would like
**[R]{.sans-serif}** to do if the *if* condition is true. In general, if
you want to do more than one thing if the *if* condition is true, you
use this bracketed structure:
`if (logical condition) {`\
`do this`\
`and this`\
`and this`\
`}`
The statements in the brackets are usually function calls and object
assignments, and each simply needs to be on its own line (no
punctuation necessary!). If necessary, you can add an *else* clause:
`> if (x>5) print("x is big") else print("x is small")`
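The bracketed structure combines with *else* in the same way. The following is a minimal sketch; the value of `x` and the object name `size` are assumptions made for illustration, since `x` is not defined in the text above. When typing at the prompt, keep `else` on the same line as the closing bracket of the *if* block, otherwise **[R]{.sans-serif}** treats the *if* statement as finished before it reaches the `else`.

```r
# Hypothetical value for illustration; x is not defined in the text above.
x = 7

if (x > 5) {
  # Both statements run only when the condition is TRUE.
  size = "big"
  print("x is big")
} else {
  size = "small"
  print("x is small")
}
```
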
## Looping
Now let's take a look at how a "for" loop works in **[R]{.sans-serif}**.
Try submitting the following syntax at the command prompt:
`> for (i in 1:5) print(i)`
**[R]{.sans-serif}** prints 1, 2, 3, 4, and 5. As with *if* statements,
if you want **[R]{.sans-serif}** to do more than one thing in each pass
of a *for* loop, use brackets around the commands:
`for(i in a:b) {`\
`do this`\
`and this`\
`and this`\
`}`
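As a concrete instance of the bracketed *for* loop, the sketch below keeps a running total of the integers 1 through 5. The object name `total` is an assumption chosen for illustration.

```r
# Running total of 1, 2, ..., 5; `total` is a name chosen for illustration.
total = 0
for (i in 1:5) {
  total = total + i   # update the running total
  print(total)        # show the total after each pass
}
```

After the loop finishes, `total` holds the sum of 1 through 5.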
`while` and `repeat` loops in **[R]{.sans-serif}** work in a manner very
similar to other programming languages. In a `while` loop, one or more
commands are executed repeatedly as long as a condition remains true.
Typically a counter object is initialized before the loop and then
incremented inside it, alongside the commands to be run on each
repetition. The loop ends when the `while` condition becomes false. A
`repeat` loop has no condition of its own and runs until a `break`
statement is reached.
`t = 0`\
`while(t < 7) {`\
`print(t)`\
`t = t+1`\
`}`
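For comparison, here is a sketch of the same counting task written with `repeat`. Because `repeat` has no condition of its own, the loop must be ended explicitly with `break`. The object name `count` is an assumption chosen for illustration.

```r
# Same counting task as the while loop, written with repeat.
count = 0
repeat {
  print(count)
  count = count + 1
  # Without this break the loop would never end.
  if (count >= 7) break
}
```
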
## Blood Pressure Example Revisited
Earlier we dealt with a data set where systolic and diastolic blood
pressure had been read into a single column, separated by a slash. We
learned how to extract the two blood pressures using the `strsplit`
function. Now that we have discussed creating functions and programming,
we will separate the two measurements for the entire data set. We begin
by clearing the workspace.
`> ls()`
`> rm(list=ls())`
`> dat = read.csv("linmod.csv", header=TRUE, na.strings=c("NA", 888, 999, "Not Reported"))`
Extract a blood pressure for practice.
`> b = dat$bp[1]`
Recall that the following commands successfully extracted the separated
measurements.
`> ## Change to character.`
`> b = as.character(b)`
`> sep = strsplit(b, split="/")`
`> ## Extract systolic blood pressure.`
`> s = sep[[1]][1]`
`> ## Extract diastolic blood pressure.`
`> d = sep[[1]][2]`
`> ## Convert to numeric.`
`> s = as.numeric(s)`
`> d = as.numeric(d)`
Now we enclose these commands in a function. We will name this
function `extract.bp`.
`> extract.bp = function(x) {`
`> x = as.character(x)`
`> sep = strsplit(x, split="/")`
`> s = as.numeric(sep[[1]][1])`
`> d = as.numeric(sep[[1]][2])`
`> return(c(s,d))`
`> } `
Type `extract.bp` at the command line to verify the creation of the
function was successful.
`> extract.bp`
Now let's experiment to see if our function works.
`> b`
`> extract.bp(b)`
The following code uses a loop to extract the blood pressures for each
row in the data set.
`> ## Number of rows in the data set.`
`> n = dim(dat)[1]`
`> ## Set up empty numeric vectors to store two blood pressures.`
`> systolic = numeric(n)`
`> diastolic = numeric(n)`
`> ## Loop over the rows and extract.`
`> for (j in 1:n){`
`> out = extract.bp(dat$bp[j])`
`> systolic[j] = out[1]`
`> diastolic[j] = out[2]`
`> } `
`> systolic`
`> diastolic`
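The same extraction can be sketched without an explicit loop by using `sapply`, a base **[R]{.sans-serif}** function that applies a function to each element of a vector. The vector `bp` below holds made-up readings standing in for `dat$bp`, since the contents of `linmod.csv` are not shown here. `extract.bp` is repeated so that the example runs on its own.

```r
# extract.bp as defined above, repeated so this example is self-contained.
extract.bp = function(x) {
  x = as.character(x)
  sep = strsplit(x, split = "/")
  s = as.numeric(sep[[1]][1])
  d = as.numeric(sep[[1]][2])
  return(c(s, d))
}

# Hypothetical readings standing in for dat$bp (not taken from linmod.csv).
bp = c("120/80", "135/90", NA)

# Each call returns two numbers, so sapply collects the results into a
# matrix with two rows (systolic, diastolic) and one column per reading.
out = sapply(bp, extract.bp, USE.NAMES = FALSE)
systolic = out[1, ]
diastolic = out[2, ]
```

A missing reading stays missing in both result vectors.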
## The `apply` Function
The `apply` function applies a function to each row or column of a
matrix. (There are other \*apply functions for other situations.) First
create a matrix to play with. In addition to the `matrix` function,
`rbind` (or `cbind`) can be used to create matrices. For example,
`> c1 = c(2, 9, 3)`
`> c2 = c(12, 1, 5)`
`> M2 = cbind(c1, c2)`
`> M2`
`> apply(M2, 2, mean)`
In the example above, `apply` took three arguments:
1. The first argument, `M2`, is the matrix.
2. The second argument, `2`, is an index indicating that the function is
applied separately to each column; a value of `1` would apply the
function to each row.
3. The last argument is the function that is applied.
If you were to just type `mean(M2)`, the mean would be computed over the
entire matrix.
`> mean(M2)`
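To make the role of the second argument concrete, the sketch below computes column means, row means, and the overall mean of the same matrix side by side. The object names `col.means`, `row.means`, and `overall` are assumptions chosen for illustration.

```r
c1 = c(2, 9, 3)
c2 = c(12, 1, 5)
M2 = cbind(c1, c2)

col.means = apply(M2, 2, mean)  # one mean per column (two values)
row.means = apply(M2, 1, mean)  # one mean per row (three values)
overall = mean(M2)              # a single mean over all six entries

col.means
row.means
overall
```
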
[^1]: $\beta_i$ is an additive effect, and $e^{\beta_i}$ is a
multiplicative effect. For example, if $\beta_i = 0.38$, we would
estimate that the log odds of success increase by 0.38 for every
unit increase in $X_i$. If $e^{\beta_i} = 1.46$, we would estimate
that the odds of success increase by 46 *percent* for every unit
increase in $X_i$.