When pulling variables from multiple data source tables with get_acs(), I've noticed inconsistent behavior:
First, we can pull a list of variables from the same table ("DP") without issue:
library(tidycensus) # v1.6.3
library(dplyr) # v1.1.4
# Pulling from the same data source ---------------------------------------
# All "DP" variables
vars_dp <- c("DP04_0001", "DP02_0001")
acs_from_list_same_table <- get_acs(
  geography = "tract",
  variables = vars_dp,
  year = 2010,
  state = c("AL", "NY", "CA"),
  output = "wide",
  cache = FALSE) %>%
  bind_rows()
table(is.na(acs_from_list_same_table$DP04_0001E))
This has zero NA values in the variable DP04_0001E for the states I have pulled it for.
Next, I try to add in a variable from a different table ("S"):
# Pulling from different source tables ------------------------------------
# A mix of "DP" and "S" variables
vars_dps <- c("DP04_0001", "S0601_C01_001")
acs_from_list_different_table <- get_acs(
  geography = "tract",
  variables = vars_dps,
  year = 2010,
  state = c("AL", "NY", "CA"),
  output = "wide",
  cache = FALSE) %>%
  bind_rows()
table(is.na(acs_from_list_different_table$DP04_0001E))
As can be seen, we now have many NA values in the variable DP04_0001E.
When comparing the two, I see that the values pulled are the same where the "multiple source table" is not NA:
# Comparing ---------------------------------------------------------------
joined_data <- left_join(x = acs_from_list_same_table,
                         y = acs_from_list_different_table,
                         by = "GEOID") %>%
  select(starts_with("DP04"))
# In instances where "multiple data source" values were not NA, they match the
# "pulled from a single data source" version's values.
joined_data %>%
  print(n = 10)
joined_data %>%
  filter(complete.cases(.)) %>%
  print(n = 10)
This was working fine a couple of months ago, but unfortunately I don't have a record of the previous tidycensus version I used.
To combine variables from different tables, we join on GEOID and NAME to preserve both columns. The problem is that in the 2010 ACS, the NAME column is not consistent across datasets (some use commas, others use semicolons as separators). This does not appear to be an issue in later years, which is why we never noticed it.
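The mechanism described above can be reproduced with toy data. The following is a minimal sketch in base R, not the actual tidycensus internals: the data frames `dp` and `s`, their values, and the choice of which table uses commas versus semicolons are all made up for illustration. It shows how a full join keyed on both GEOID and NAME produces NA rows when the NAME strings differ only in their separators, and how keying on GEOID alone avoids that.

```r
# Hypothetical stand-ins for the per-table results (values are invented).
geoids <- c("01001020100", "01001020200")

dp <- data.frame(
  GEOID = geoids,
  # Assumed: this table separates NAME parts with commas
  NAME = c("Census Tract 201, Autauga County, Alabama",
           "Census Tract 202, Autauga County, Alabama"),
  DP04_0001E = c(100, 200),
  stringsAsFactors = FALSE
)

s <- data.frame(
  GEOID = geoids,
  # Assumed: this table separates NAME parts with semicolons
  NAME = c("Census Tract 201; Autauga County; Alabama",
           "Census Tract 202; Autauga County; Alabama"),
  S0601_C01_001E = c(1000, 2000),
  stringsAsFactors = FALSE
)

# Full join on GEOID and NAME: no NAME strings match, so every tract
# appears twice, each time with one of the two variables missing.
joined_both <- merge(dp, s, by = c("GEOID", "NAME"), all = TRUE)
print(table(is.na(joined_both$DP04_0001E)))

# Join on GEOID only, dropping the second NAME column first:
# one row per tract and no missing values.
joined_geoid <- merge(dp, s[, c("GEOID", "S0601_C01_001E")], by = "GEOID")
print(table(is.na(joined_geoid$DP04_0001E)))
```

Until a fix is released, the same idea may work as a workaround on the real data: call get_acs() once per source table, drop NAME from one of the results, and join the two on GEOID alone. That suggestion is untested here.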