-
Notifications
You must be signed in to change notification settings - Fork 0
/
day_04.Rmd
199 lines (162 loc) · 5.92 KB
/
day_04.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
# Passport Processing
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
options(crayon.enabled = NULL)
library(tidyverse)
START_TIME <- Sys.time()
```
This is my attempt to solve [Day 4](https://adventofcode.com/2020/day/4).
```{r load data}
sample <- read_lines("samples/day_04_sample.txt")
actual <- read_lines("inputs/day_04_input.txt")
```
## Part 1
The fields for todays puzzle are:
- byr (Birth Year)
- iyr (Issue Year)
- eyr (Expiration Year)
- hgt (Height)
- hcl (Hair Color)
- ecl (Eye Color)
- pid (Passport ID)
- cid (Country ID)
For a record to be treated as valid we must have all of the fields present, except for the country id which is optional.
Records are separated by blank new lines. I am going to first collapse the data into a single string, separated by ",".
The blank lines will then be ",,", so we can simply split our input there. This will give us one string per records. We
then need to sort out each record by splitting each string at either a space or a comma, then using the `unglue_data`
function to extra a data frame with a column "key" for the left hand part and a column "value" for the right hand part.
By using `map_dfr` we will be able to combine each record's key/value pairs into a single dataframe, but we add a
column "record" to keep track of which record we are dealing with.
```{r process data}
process_data <- function(input) {
input %>%
paste(collapse = ",") %>%
str_split(",,") %>%
# take just the first result returned by str_split, it will return a list
# with one item which contains the results
pluck(1) %>%
str_split("[, ]") %>%
map_dfr(unglue::unglue_data, "{key}:{value}", .id = "record")
}
process_data(sample)
```
```{r valid record}
get_valid_records <- function(input) {
input %>%
process_data() %>%
filter(key != "cid") %>%
group_by(record) %>%
filter(n() == 7) %>%
ungroup()
}
```
```{r part 1 functon}
part_1 <- function(input) {
input %>%
get_valid_records() %>%
distinct(record) %>%
nrow()
}
```
We can test our function on the sample:
```{r part 1 test sample}
part_1(sample) == 2
```
Now we can run our function on the actual data:
```{r part 1 actual}
part_1(actual)
```
## Part 2
We now need to validate the data in the passports:
- byr (Birth Year) - four digits; at least 1920 and at most 2002.
- iyr (Issue Year) - four digits; at least 2010 and at most 2020.
- eyr (Expiration Year) - four digits; at least 2020 and at most 2030.
- hgt (Height) - a number followed by either cm or in:
- If cm, the number must be at least 150 and at most 193.
- If in, the number must be at least 59 and at most 76.
- hcl (Hair Color) - a # followed by exactly six characters 0-9 or a-f.
- ecl (Eye Color) - exactly one of: amb blu brn gry grn hzl oth.
- pid (Passport ID) - a nine-digit number, including leading zeroes.
- cid (Country ID) - ignored, missing or not.
The approach I am going to take for part 2 is to build some helper validation functions for the years and the height
parts of the record, and then `filter` the rows to just retain valid records. To make it easier to do this I will first
pivot the data from long format to wide, so each passport is one row of data.
```{r part 2 function}
part_2 <- function(input) {
validate_years <- function(y, min, max) {
yi <- suppressWarnings(as.integer(y))
ifelse(is.na(yi), FALSE, min <= yi & yi <= max)
}
validate_height <- function(h) {
hv <- suppressWarnings(as.integer(str_sub(h, 1, -3)))
ht <- str_sub(h, -2, -1)
case_when(is.na(hv) ~ FALSE,
ht == "cm" ~ 150 <= hv & hv <= 193,
ht == "in" ~ 59 <= hv & hv <= 76,
TRUE ~ FALSE)
}
input %>%
get_valid_records() %>%
pivot_wider(names_from = key, values_from = value) %>%
filter(validate_years(byr, 1920, 2002),
validate_years(iyr, 2010, 2020),
validate_years(eyr, 2020, 2030),
validate_height(hgt),
str_detect(hcl, "^#[0-9a-f]{6}$"),
ecl %in% c("amb", "blu", "brn", "gry", "grn", "hzl", "oth"),
str_detect(pid, "^\\d{9}$"))
}
```
The provided test cases don't use the initial sample data, so let's just run the function and see if it does not error.
```{r part 2 check on sample}
part_2(sample)
```
It seems to work, so let's run on our actual data
```{r part 2 actual}
nrow(part_2(actual))
```
## Extra: Solving with regular expressions
This could be reduced to simply solving with regular expressions. First let's create a function to convert the input
into a single string.
```{r extra}
records_as_strings <- function(input) {
input %>%
str_replace("^$", "\n") %>%
paste(collapse = " ") %>%
str_split(" \n ") %>%
pluck(1)
}
records_as_strings(sample)
```
Now, for part 1 we just need to run a regular expression for each of the different fields on each record. Using map
gives us a list for each of the different fields, so we transpose to get the results of the regex's for each record. We
can then flatten these lists and run the `all` function to check to see if every regex was matched for that record.
```{r extra part 1}
actual %>%
records_as_strings() %>%
map(c("byr", "iyr", "eyr", "hgt", "hcl", "ecl", "pid"),
str_detect,
string = .) %>%
transpose() %>%
map_lgl(compose(all, flatten_lgl)) %>%
sum()
```
Part 2 is similar, but we need to match the value after the field name.
```{r extra part 2}
actual %>%
records_as_strings() %>%
map(c("byr:(19[2-9][0-9]|200[0-2])",
"iyr:20(1[0-9]|20)",
"eyr:20(2[0-9]|30)",
"hgt:(1([5-8][0-9]|9[0-3])cm|(59|6[0-9]|7[0-6])in)",
"hcl:#[0-9a-f]{6}",
"ecl:(amb|blu|brn|gry|grn|hzl|oth)",
"pid:\\d{9}(?!\\d)"), # negative lookahead: make sure the character that follows the 9th digit is not a digit
str_detect,
string = .) %>%
transpose() %>%
map_lgl(compose(all, flatten_lgl)) %>%
sum()
```
---
*Elapsed Time: `r round(Sys.time() - START_TIME, 3)`s*