---
title: "Optional: Comparing Across Students"
page-layout: full
reference-location: margin
citation-location: margin
bibliography: ref-1.bib
execute:
  echo: false
---
```{r}
#| include: false
library(readxl)
library(DT)
library(tidyverse)
# Read Student A's sheet of the code book and reshape it so that each row
# pairs one theme with the note written for that theme
studentA <- read_xlsx(
  here::here("data", "code_book.xlsx"),
  sheet = "studentA"
) %>%
  pivot_longer(
    cols = starts_with("note"),
    names_to = "note",
    values_to = "notes"
  ) %>%
  drop_na(notes) %>%
  pivot_longer(
    cols = starts_with("theme"),
    names_to = "theme",
    values_to = "theme_note"
  ) %>%
  drop_na(theme_note) %>%
  # Keep a note only when its index matches the index of the theme
  mutate(
    notes = case_when(
      note == "note1" & theme == "theme1" ~ notes,
      note == "note2" & theme == "theme2" ~ notes,
      note == "note3" & theme == "theme3" ~ notes,
      note == "note4" & theme == "theme4" ~ notes
    )
  ) %>%
  drop_na(notes) %>%
  select(code, descriptive_code, theme_note, notes) %>%
  mutate(student = "studentA")

# Apply the same reshaping to Student B's sheet
studentB <- read_xlsx(
  here::here("data", "code_book.xlsx"),
  sheet = "studentB"
) %>%
  pivot_longer(
    cols = starts_with("note"),
    names_to = "note",
    values_to = "notes"
  ) %>%
  drop_na(notes) %>%
  pivot_longer(
    cols = starts_with("theme"),
    names_to = "theme",
    values_to = "theme_note"
  ) %>%
  drop_na(theme_note) %>%
  # Keep a note only when its index matches the index of the theme
  mutate(
    notes = case_when(
      note == "note1" & theme == "theme1" ~ notes,
      note == "note2" & theme == "theme2" ~ notes,
      note == "note3" & theme == "theme3" ~ notes,
      note == "note4" & theme == "theme4" ~ notes
    )
  ) %>%
  drop_na(notes) %>%
  select(code, descriptive_code, theme_note, notes) %>%
  mutate(student = "studentB")

# Combine both students' coded statements into one data frame
students <- full_join(
  studentA,
  studentB,
  by = c("code", "descriptive_code", "theme_note", "notes", "student")
)
```
## Workflow
::: {.column-margin}
![](images/workflow.png)
:::
One of the most substantial differences between Student A's code and Student B's
was found in the theme of workflow. Student A's code had no obvious structure to
its workflow. Sporadic code comments described what the code below them was for,
yet it was unclear what some comments corresponded to (e.g.,
`#Tanner's code/help`). Student B, on the other hand, had a nearly
meticulous workflow, starting with sourcing in common functions, then loading
in the data, then cleaning the data, and finally analyzing the data. Moreover,
Student B used code comments to generate "sections" of code, describing the
overall context of the code (e.g., `#### Carboy D ####`), and "subsections" of
code, describing the process being taken (e.g., `# Estimate Initial
concentration of N15-NO3 relative to Ar`).
Interestingly, there were almost no instances in Student B's analysis where an
object was inspected. In contrast, Student A's code frequently inspected
objects (e.g., `summary(linearAnterior)`, `WeightChange`).
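
As a rough illustration of the practices described above, the sketch below orders a script as load, clean, analyze, marks each stage with a section comment, and inspects an object before moving on. The data, object names, and model are hypothetical and are not taken from either student's script.

```{r workflow-sketch}
#| eval: false
#| echo: true
#### Load data ####
# Hypothetical data standing in for a file that would be read from disk
growth <- data.frame(
  age = c(1, 2, 3, 4, NA),
  fork_length = c(120, 180, 230, 265, 290)
)

#### Clean data ####
# Drop fish with a missing age
growth_clean <- growth[!is.na(growth$age), ]

#### Analyze data ####
# Fit a linear model of fork length on age
length_model <- lm(fork_length ~ age, data = growth_clean)

# Inspect the fitted object before using it further
summary(length_model)
```

A real script would typically begin with an additional section that sources shared functions, which is omitted here to keep the sketch self-contained.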
</br>
```{r workflow}
students %>%
  filter(theme_note == "workflow") %>%
  distinct(code, .keep_all = TRUE) %>%
  select(code, descriptive_code, student) %>%
  datatable(
    class = 'row-border stripe',
    colnames = c("R Code", "Descriptive Code", "Student")
  )
```
### Readability
::: {.column-margin}
![](images/read.png)
:::
Aside from the students' use of code comments for organizing their workflow, I
noticed differences in their use of whitespace, returns for long lines of code,
and named arguments. Whereas Student B would consistently use whitespace
surrounding arithmetic and assignment operators (e.g., `+`, `-`, `*`, `=`),
relational operators (e.g., `==`, `<`, `>`), and commas, Student A's use of
whitespace was again sporadic. Most frequently, Student A's statements would have
some combination of present and absent whitespace
(e.g., `Early <- subset(RPMA2GrowthSub, StockYear<2004)`).
Similar to the use of whitespace, differences were found in how each student
handled long lines of code. In all but a few instances of Student B's code, she
used returns to break lines longer than 80 characters. Student A, however,
never used returns to break long lines of code. When paired with a lack of
whitespace, these long lines made Student A's code difficult to interpret (as
demonstrated in the code below).
```{r student-A-whitespace}
#| eval: false
#| echo: true
with(PADataNoOutlier, plot(Lipid~log(PSUA), las = 1, col = ifelse(PADataNoOutlier$`Fork Length`<260,"red","black")))
```
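
For comparison, one way the statement above might be rewritten with consistent whitespace and returns is sketched below. The data frame is a small hypothetical stand-in for the student's data, and pulling the colors into a separate `point_colors` object is an illustrative choice rather than something Student A did.

```{r whitespace-sketch}
#| eval: false
#| echo: true
# Hypothetical stand-in for the data used in the statement above
PADataNoOutlier <- data.frame(
  PSUA = c(1.2, 3.4, 5.6, 7.8),
  Lipid = c(2.1, 2.9, 3.8, 4.4),
  `Fork Length` = c(240, 255, 270, 300),
  check.names = FALSE
)

# Color fish with a fork length under 260 red and all others black
point_colors <- ifelse(
  PADataNoOutlier$`Fork Length` < 260,
  "red",
  "black"
)

# Same plot as above, with one argument per line
with(
  PADataNoOutlier,
  plot(
    Lipid ~ log(PSUA),
    las = 1,
    col = point_colors
  )
)
```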
Interestingly, Student B would habitually use named arguments for functions she
employed. Paired with her use of whitespace and returns, these named arguments
made her code more easily readable and digestible. Aside from the code used
to produce visualizations (e.g., `col = `, `las = `), however, Student A's code
did not contain named arguments. Combined with a sporadic use of whitespace and
returns, this lack of named arguments made it difficult to read Student A's code
and interpret the processes being enacted.
The two examples below contrast all three of these aspects of "readability":
**Student A**
```{r student-A-longline}
#| eval: false
#| echo: true
EarlyLengthAge <- ddply(Early, ~Age, summarise, meanLE=mean(ForkLength, na.rm = T))
```
**Student B**
```{r student-B-longline}
#| eval: false
#| echo: true
likelihoods <- apply(X = pMat,
                     MARGIN = 1,
                     FUN = nmle,
                     t = timeD,
                     y = obsD,
                     N15_NO3_O = fracDenD*(N15_NO3_O_D)
                     )
```
### Reproducibility
::: {.column-margin}
![](images/reproduce.png)
:::
As mentioned previously, at the beginning of Student B's code were explicit
references to the data being used for analysis. Specifically, Student B used the
`load()` function to source her data. Rather than writing statements of code,
Student A instead used the RStudio GUI to import her data into her workspace.
Thus, in Student A's code there are no lines of code which load in the data
she worked with. Not only does this make Student A's code not reproducible, but
references to dataframes named `PADataNoOutlier` become increasingly concerning.
When asked about how the "outliers" were removed from the `PADataNoOutlier`
dataset, Student A stated that she had used Excel to clean the data and then
loaded the cleaned data into RStudio (using the GUI).
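
A minimal sketch of a scripted alternative follows: the data are imported with code and the outlier rule is written into the script, so the cleaned dataset can be regenerated. The file, the values, and the cutoff are all hypothetical; the rule Student A actually applied in Excel is not recorded here.

```{r scripted-cleaning-sketch}
#| eval: false
#| echo: true
# Write hypothetical raw data to a temporary file so the sketch is self-contained
raw_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(
    PSUA = c(1.2, 3.4, 5.6, 90),
    Lipid = c(2.1, 2.9, 3.8, 4.0)
  ),
  file = raw_path,
  row.names = FALSE
)

# Import the data with a statement of code rather than through the GUI
PAData <- read.csv(file = raw_path)

# Record the outlier rule in the script (this cutoff is made up)
PADataNoOutlier <- subset(PAData, PSUA < 50)
```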
Student A's code had additional statements which raise concerns for
reproducibility. Specifically, there are statements which call on the `ddply()`
function *before* the plyr package has been loaded. In addition, Student A
had two instances of script fragments, code which would not execute or which
would not produce the desired result (displayed below). The first instance
(`plot(LengthAge$mean ~ LengthAge$Age)`) references a non-existent variable
(`LengthAge$mean`). The second instance attempts to create a dataframe of
previously created objects, but the `Growth` column is not correctly created, as
Student A neglects to use the `c()` function to combine these objects into a
vector.
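
The role of `c()` in the second fragment can be sketched with hypothetical objects, since several previously created values must be combined into one vector before they can form a single data frame column. The object names and values below are invented for illustration and are not Student A's.

```{r combine-sketch}
#| eval: false
#| echo: true
# Hypothetical values created earlier in a script
growth_early <- 41.5
growth_late <- 38.2

# c() combines the separate objects into one vector for the Growth column
growth_summary <- data.frame(
  Period = c("Early", "Late"),
  Growth = c(growth_early, growth_late)
)
```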
```{r reproducibility}
students %>%
  filter(theme_note == "reproducibility") %>%
  distinct(code, .keep_all = TRUE) %>%
  select(code, descriptive_code, notes) %>%
  datatable(
    class = 'row-border stripe',
    colnames = c("R Code", "Descriptive Code", "Notes")
  )
```
</br>
**A Note About Students' Script Files**
Both Student A and Student B interacted with R through R scripts created in
RStudio. While R Markdown [@rmarkdown] documents existed at the time of their
GLAS course, the instructor of the course did not demonstrate the use of these
dynamic documents. Thus, these students copied and pasted their results from
RStudio into a Word file.
Although it was noted that Student B used functions (e.g., `source()`, `load()`)
to load functions and data into her R script, these statements used a mix of
full and relative paths to access these materials. This mix of full and relative
paths also makes Student B's script limited in its reproducibility. It is,
however, worth noting that at the time of their GLAS course, RStudio projects
did not exist. Thus, the methods
### Summary
Viewing these differences through the lens of these students' computing
experiences helps us understand potential reasons for *why* these differences
occurred. As discussed in [Digging Deeper](digging-deeper.qmd), Student A entered
graduate school (and GLAS) with hardly any computing experiences. Student B,
however, entered graduate school having numerous experiences programming in
Matlab, and completed the Swirl tutorial [@swirl] before enrolling in GLAS.
Student B's prior programming experiences provided her with an appreciation for
structured programs, as well as an understanding of the importance of
reproducible code. Due to her limited programming experiences, Student A's
attention was pulled toward getting a working solution rather than writing
readable code, organizing her code, or ensuring her analysis was fully
reproducible.
## Efficiency / Inefficiency
::: {.column-margin}
![](images/efficient.png)
:::
The efficiency of each student's code was determined by whether a statement
adhered to the "don't repeat yourself" principle [@greg]. Student B's prior
programming experiences allowed her to see the importance of writing
efficient code, sourcing in functions she frequently used and utilizing
iteration for repeated operations (e.g., `apply()`). With her limited
programming experiences, Student A was unfamiliar with this programming
practice. Instead, Student A was focused on finding a working solution for the
task at hand. Thus, when a working solution was found, Student A would
copy, paste, and modify the code to suit a variety of situations.
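
The contrast between the two approaches can be sketched as follows. The data and object names are hypothetical, and `sapply()` stands in for the kind of iteration described above; neither half is taken from a student's script.

```{r dry-sketch}
#| eval: false
#| echo: true
# Hypothetical fork lengths for two stocking periods
growth <- data.frame(
  period = c("Early", "Early", "Late", "Late"),
  fork_length = c(210, 230, 250, NA)
)

# Copy, paste, and modify: one statement per group
mean_early <- mean(growth$fork_length[growth$period == "Early"], na.rm = TRUE)
mean_late <- mean(growth$fork_length[growth$period == "Late"], na.rm = TRUE)

# Iteration: a single statement covers every group
group_means <- sapply(
  X = split(growth$fork_length, growth$period),
  FUN = mean,
  na.rm = TRUE
)
```

Adding a third period would require another pasted statement in the first approach, but no change to the second.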
</br>
```{r efficiency}
students %>%
  filter(theme_note %in% c("inefficiency", "efficiency")) %>%
  distinct(code, .keep_all = TRUE) %>%
  select(code, descriptive_code, student) %>%
  datatable(
    class = 'row-border stripe',
    colnames = c("R Code", "Descriptive Code", "Student")
  )
```
## Data Visualization
Despite the considerable differences in Student A's and Student B's workflows and
programming efficiency, they had substantial similarities in the data
visualizations they produced. Both students primarily produced scatterplots,
often including a third variable by coloring points. Both students would
consistently modify their axis labels, rotate their axis tick mark labels
(`las`), and include a legend in their plot. Each of these similarities
arose from their experiences in the GLAS course, where these practices were
modeled by the instructor for the visualizations the class produced.
There are, however, notable differences within these similarities. Where Student
A paired the `plot()` and `lines()` functions, Student B used the built-in
`type` argument to produce a line plot. Additionally, Student B's
scatterplot had more polished axis labels through her use of the `title()`
function. Finally, although small in nature, each student used a different
method to declare the legend position, with Student A specifying x and y
coordinates and Student B using the `"bottomright"` string specification.
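
A sketch of how these two approaches might look in base R graphics is given below. The data, labels, and coordinates are hypothetical, and the statements are illustrations of each approach rather than the students' own code.

```{r plot-approaches-sketch}
#| eval: false
#| echo: true
# Hypothetical measurements over five days
days <- 1:5
concentration <- c(2.0, 2.6, 3.1, 3.3, 3.4)

# Approach 1: pair plot() with lines(), and place the legend by coordinates
plot(x = days, y = concentration, las = 1)
lines(x = days, y = concentration)
legend(x = 4, y = 2.4, legend = "Observed", lty = 1, pch = 1)

# Approach 2: draw the line with the type argument, and place the legend
# with a keyword
plot(x = days, y = concentration, type = "l", las = 1)
legend("bottomright", legend = "Observed", lty = 1)
```

The keyword keeps the legend in the same corner if the axis ranges change, whereas explicit coordinates have to be updated by hand.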
</br>
```{r visualization}
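students %>%
  # Assumption: the code book labels this theme with the word "visualization";
  # confirm the label against the theme columns before relying on this filter
  filter(str_detect(theme_note, "visualization")) %>%
  distinct(code, .keep_all = TRUE) %>%
  select(code, descriptive_code, student) %>%
  datatable(
    class = 'row-border stripe',
    colnames = c("R Code", "Descriptive Code", "Student")
  )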
```