Marginal controls when the full distribution is unknown #1
Comments
My intention was that this could already be accomplished, but when I tested it, I found a bug in the code. I have resolved it (see #2). So first, use the following to update your package (needed until the CRAN version is no longer 1.0.0):

library(devtools)
install_github("dkyleward/ipfr", build_vignettes = TRUE)

Once your package is updated, you can accomplish what you want with the existing tools. Importantly, it does require using the

library(ipfr)
library(dplyr)  # provides %>%, mutate() and tibble()

# standard ASU setup
result <- setup_arizona()
hh_seed <- result$hh_seed
hh_targets <- result$hh_targets
per_seed <- result$per_seed
per_targets <- result$per_targets

# Modify if only a regional person count is known:
# give every person record the same category label
per_seed <- per_seed %>%
  mutate(pertype = "any")
# and control only the total for that single category
per_targets$pertype <- tibble(
  any = 260
)
result <- ipu(hh_seed, hh_targets, per_seed, per_targets)

In English: I simply create a marginal target that applies to every person record and then set the control total to 260 people. You could do the same with autos. If you need an introduction to the
I sent you the email. If the auto control is set in the person table, obviously not every person has a car. What label should I give to a person who does not own a car?
That's a good point. This isn't as straightforward as I thought, particularly because a person could own 2 or more cars.

Currently, you are able to control the total number of households if your primary records represent households. You can control the number of people if your secondary records represent people. The attributes of people and households, while sometimes numeric (like the number of autos), are treated as a label/category. It may as well be "big" or "small". To control the number of vehicles, you would have to make your secondary seed records represent vehicles instead of people. This is fine if you don't also want to control the number of people by certain characteristics, but the package can't do both at the same time at the moment.

I'm going to re-open this issue. A general solution is not immediately apparent to me. I definitely don't want to add

If you only care to control households and regional autos, then you can use the

If you want to control households, persons, and autos, I'm open to suggestions.
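The vehicle-records option described above can be sketched in base R. This is only an illustration under stated assumptions: the household values, the `vehtype` column name, and the total of 500 vehicles are all hypothetical, and the final `ipu()` call is left as a comment because it has not been run against the package here.

```r
# Toy household seed (hypothetical values); `id` links the two tables.
hh_seed <- data.frame(
  id     = 1:4,
  hhsize = c("1", "2", "2", "3"),
  autos  = c(0, 1, 2, 3)
)

# One secondary record per vehicle: repeat each household id `autos` times.
# Household 1 owns no vehicle, so it contributes no secondary records.
veh_seed <- data.frame(
  id      = rep(hh_seed$id, times = hh_seed$autos),
  vehtype = "any"  # single catch-all category, so only the total is controlled
)

# Regional vehicle total (hypothetical), in the same list-of-tables shape
# used for per_targets earlier in this thread.
veh_targets <- list(vehtype = data.frame(any = 500))

# The vehicle tables would then take the place of the person tables:
# result <- ipu(hh_seed, hh_targets, veh_seed, veh_targets)
```

The trade-off is the one already noted: once the secondary table holds vehicles, person characteristics can no longer be controlled in the same run.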
My initial thought would be to set a special 'garbage can' category for each variable, say category '0', which is not controlled. Then autos, or any other 'number' variable, can be represented as "virtual persons", each owning 1 car, whose other values are all set to zero. Whether this could lead to inconsistent totals or create other problems, I don't know yet.
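A rough base-R sketch of how such a seed table might be laid out, to make the "virtual persons" idea concrete. All values, the `autos` column, and the label `"car"` are hypothetical. Whether `ipu()` accepts a seed category such as `"0"` that has no matching target column is an assumption not verified here, so the sketch stops at building the tables.

```r
# Real person records (hypothetical values). Their `autos` label is "0",
# the uncontrolled 'garbage can' category, since a real person is not a car.
persons <- data.frame(
  id      = c(1, 2, 2, 3, 3, 3),
  pertype = c("worker", "worker", "nonworker", "worker", "worker", "nonworker"),
  autos   = "0",
  stringsAsFactors = FALSE
)

# Vehicles owned by each household (hypothetical values).
hh_autos <- data.frame(id = c(1, 2, 3), n_autos = c(1, 0, 2))

# One "virtual person" per vehicle; every other attribute gets label "0".
virtual <- data.frame(
  id      = rep(hh_autos$id, times = hh_autos$n_autos),
  pertype = "0",
  autos   = "car",
  stringsAsFactors = FALSE
)

# Secondary seed: real persons followed by virtual persons.
per_seed <- rbind(persons, virtual)

# Targets list only the controlled categories; "0" gets no target column.
per_targets <- list(
  pertype = data.frame(worker = 150, nonworker = 110),
  autos   = data.frame(car = 200)
)
```

With this layout the person targets and the vehicle target apply to disjoint sets of records, which is why the totals of the two marginals do not need to agree.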
That's pretty clever. Let me know how it works out.
It seems to be working OK for my own data. I disabled target balancing (within primary and secondary) and recalculated the geo-level average weights.
From an email I received:
After an initial test of the ipu function on one of my current projects, I think maybe there could be more flexibility in how one can specify the marginal control values when the complete marginal distribution of a variable is unknown.
For example, say only the total number of vehicles by district is known for a study area, it might still make sense to include the variable as a control. Is there a way to do that in the current version?