Marginal controls when the full distribution is unknown #1

Open
dkyleward opened this issue Sep 9, 2019 · 6 comments

Comments

@dkyleward
Owner

From an email I received:

After an initial test of the ipu function on one of my current projects, I think maybe there could be more flexibility in how one can specify the marginal control values when the complete marginal distribution of a variable is unknown.

For example, if only the total number of vehicles by district is known for a study area, it might still make sense to include the variable as a control. Is there a way to do that in the current version?

@dkyleward
Owner Author

My intention was that this could already be accomplished, but when I tested it, there was a bug in the code. I have resolved it (see #2). So first, use the following to update your package (until the CRAN version is no longer 1.0.0):

library(devtools)
install_github("dkyleward/ipfr", build_vignettes = TRUE)

Once your package is updated, you can accomplish what you want to do with the existing tools. Importantly, it does require using the secondary_seed and secondary_targets arguments. I have modified the ASU example to show how:

library(ipfr)
library(dplyr)  # provides %>%, mutate(), and tibble()

# standard ASU setup
result <- setup_arizona()
hh_seed <- result$hh_seed
hh_targets <- result$hh_targets
per_seed <- result$per_seed
per_targets <- result$per_targets

# Modify if only a regional person count is known
per_seed <- per_seed %>%
  mutate(pertype = "any")
per_targets$pertype <- tibble(
  any = 260
)
# pass the person tables through the secondary arguments
result <- ipu(hh_seed, hh_targets,
              secondary_seed = per_seed, secondary_targets = per_targets)

In English: I simply create a marginal target that applies to every person record and then set the control total to be 260 people. You could do the same with autos.

If you need an introduction to the secondary arguments, see "Example: Add person targets" in the vignette here: https://cran.r-project.org/web/packages/ipfr/vignettes/using_ipfr.html. In your case, you would use them for total autos rather than total people.

@figo2002

I sent you the email. If the auto control is set in the person table, not every person has a car. What label should I give a person who does not own a car?

@dkyleward
Owner Author

That's a good point. This isn't as straightforward as I thought, particularly because a person could own 2 or more cars. Currently, you are able to control the total number of households if your primary records represent households. You can control the number of people if your secondary records represent people. The attributes of people and households, while sometimes numeric (like the number of autos), are treated as labels/categories. A value like "2" may as well be "big" or "small".
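To illustrate the distinction, here is a minimal sketch with made-up numbers (the `hh` table and its weights are hypothetical and not part of the package). A categorical control on `autos` constrains how many weighted households fall in each category, which is not the same thing as constraining the total number of vehicles.

```r
# Hypothetical household seed; `autos` is numeric but acts only as a label.
hh <- data.frame(
  id     = 1:4,
  autos  = c(0, 1, 2, 2),
  weight = c(10, 20, 5, 15)
)

# What a categorical control sees: weighted households per category.
hh_by_category <- tapply(hh$weight, hh$autos, sum)

# What a vehicle-count control would need: the weighted sum of the values.
total_vehicles <- sum(hh$weight * hh$autos)

print(hh_by_category)
print(total_vehicles)
```

Matching the per-category household counts says nothing directly about `total_vehicles`, which is why a vehicle total needs records that each represent one vehicle.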

To control the number of vehicles, you would have to make your secondary seed records represent vehicles instead of people. This is fine if you don't also want to control the number of people by certain characteristics, but the package can't do both at the same time at the moment.

I'm going to re-open this issue. A general solution is not immediately apparent to me. I definitely don't want to add tertiary_seed and tertiary_targets. I've put one potential solution in #3, but that is a big change to the package and won't happen anytime soon.

If you only care to control households and regional autos, then you can use the secondary_seed and secondary_targets parameters. Create a seed table where each row is an auto with a primary_id column linking it back to the household it belongs to. You can then create a type field filled with "any" and follow what I did in my previous post.
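A rough sketch of building such an auto seed table follows. All data here is hypothetical: the `autos` column on the household table, the `autotype` field name, and the regional total of 150 are assumptions made for illustration. The linking column is named `id` on the assumption that this is the primary id column in use. Base `data.frame()` is used to keep the sketch dependency-free; `tibble()` could be swapped in as in the earlier post.

```r
# Hypothetical household seed with a known auto count per household.
hh_seed <- data.frame(
  id     = 1:3,
  hhtype = c(1, 2, 2),
  autos  = c(0, 1, 2),
  weight = 1
)

# One row per auto: repeat each household id `autos` times.
auto_seed <- data.frame(
  id       = rep(hh_seed$id, times = hh_seed$autos),
  autotype = "any"
)

# Single regional control that applies to every auto record (made-up total).
auto_targets <- list(autotype = data.frame(any = 150))

print(auto_seed)

# Assumed call, mirroring the earlier post (not run here):
# result <- ipu(hh_seed, hh_targets,
#               secondary_seed = auto_seed, secondary_targets = auto_targets)
```

Households with zero autos contribute no rows to `auto_seed`, so they are only affected by the household-level targets.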

If you want to control households, persons, and autos, I'm open to suggestions.

@dkyleward dkyleward reopened this Sep 10, 2019
@figo2002

My initial thought would be to set a special 'garbage can' category for each variable, say category '0', which is not controlled. Then autos, or any other 'number' variable, can be represented as "virtual persons", each owning 1 car, whose other values are all set to zero. Whether this could lead to inconsistent totals or create other problems, I don't know yet.
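One way to picture the proposed table layout is sketched below. This is only an interpretation of the idea with hypothetical data and column names (`pertype`, `auto`). It builds the combined secondary seed but does not call `ipu()`, since the thread does not confirm that the released package accepts an uncontrolled category.

```r
# Hypothetical real persons: controlled by `pertype`, not counted as autos.
persons <- data.frame(
  id      = c(1, 1, 2, 3),
  pertype = c("1", "2", "1", "3"),
  auto    = "0"   # "0" is the uncontrolled garbage-can category
)

# Hypothetical auto counts per household.
hh_autos <- data.frame(id = c(1, 2, 3), autos = c(2, 0, 1))

# One virtual person per auto; its person attribute goes in the garbage can.
virtual <- data.frame(
  id      = rep(hh_autos$id, times = hh_autos$autos),
  pertype = "0",
  auto    = "1"
)

# Combined secondary seed: real persons plus virtual persons.
secondary_seed <- rbind(persons, virtual)
print(table(pertype = secondary_seed$pertype, auto = secondary_seed$auto))
```

In this layout, a target on `auto == "1"` would count vehicles and a target on `pertype` 1 to 3 would count real people, with category "0" left out of both sets of targets.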

@dkyleward
Owner Author

That's pretty clever. Let me know how it works out.

@figo2002

It seems to be working OK for my own data. I disabled target balancing (within primary and secondary) and recalculated the geo-level average weights.
I have not written any R packages before, but I'll try to create a pull request later.
