Marginal controls when the full distribution is unknown #1

dkyleward · 2019-09-09T13:50:26Z

From an email I received:

After an initial test of the ipu function on one of my current projects, I think maybe there could be more flexibility in how one can specify the marginal control values when the complete marginal distribution of a variable is unknown.

For example, if only the total number of vehicles by district is known for a study area, it might still make sense to include vehicles as a control. Is there a way to do that in the current version?

dkyleward · 2019-09-09T15:37:26Z

I intended this to be possible already, but when I tested it I found a bug, which I have fixed (see #2). First, update your package with the following (until the CRAN version moves past 1.0.0):

```r
library(devtools)
install_github("dkyleward/ipfr", build_vignettes = TRUE)
```

Once your package is updated, you can accomplish what you want to do with the existing tools. Importantly, it does require using the secondary_seed and secondary_targets arguments. I have modified the ASU example to show how:

```r
library(ipfr)
library(dplyr)

# standard ASU setup
result <- setup_arizona()
hh_seed <- result$hh_seed
hh_targets <- result$hh_targets
per_seed <- result$per_seed
per_targets <- result$per_targets

# Modify if only a regional person count is known:
# every person record gets the same catch-all "any" label...
per_seed <- per_seed %>%
  mutate(pertype = "any")
# ...and the pertype marginal becomes a single regional total
per_targets$pertype <- tibble(
  any = 260
)
result <- ipu(hh_seed, hh_targets, per_seed, per_targets)
```

In English: I simply create a marginal target that applies to every person record and then set its control total to 260 people. You could do the same with autos.
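Because every person carries the same "any" label, that marginal collapses to a single constraint on the weighted person total. The base-R sketch below shows the arithmetic of one IPU-style adjustment for such a constraint. It is not ipfr's internal code, and the household weights and person records are made up for illustration.

```r
# Hypothetical household seed with starting weights
hh <- data.frame(id = 1:4, weight = c(20, 30, 25, 25))
# Hypothetical person seed linked to households by id; every person is "any"
per <- data.frame(id = c(1, 1, 2, 3, 3, 3, 4), pertype = "any")

# Number of "any" persons in each household
n_any <- sapply(hh$id, function(h) sum(per$id == h & per$pertype == "any"))

# Weighted number of people implied by the current household weights
current <- sum(hh$weight * n_any)

# IPU-style update for this single constraint: scale the weight of every
# household that contains "any" persons by target / current
target <- 260
hh$weight <- ifelse(n_any > 0, hh$weight * target / current, hh$weight)

sum(hh$weight * n_any)
```

With only one category there is only one scaling factor, so the constraint is met after a single update and the relative weights between households are unchanged.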

If you need an introduction to the secondary arguments, see "Example: Add person targets" in the vignette here: https://cran.r-project.org/web/packages/ipfr/vignettes/using_ipfr.html. In your case, you would use them for total autos rather than total people.

figo2002 · 2019-09-10T04:43:32Z

I'm the one who sent you the email. If the auto control is set on the person table, not every person owns a car. What label should I give a person who doesn't own a car?

dkyleward · 2019-09-10T13:05:59Z

That's a good point. This isn't as straightforward as I thought, particularly because a person could own 2 or more cars. Currently, you can control the total number of households if your primary records represent households, and the number of people if your secondary records represent people. The attributes of people and households are treated as labels/categories, even when they are numeric (like the number of autos): a value of 2 may as well be "big" or "small".
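To make the "labels, not numbers" point concrete, here is a small base-R sketch with made-up household data. A marginal on an autos column controls how many weighted households fall in each category, which is a different quantity from the total number of vehicles, and relabeling the categories changes nothing.

```r
# Hypothetical household seed where autos is stored as a number
hh <- data.frame(id = 1:4, autos = c(0, 1, 2, 2), weight = c(10, 20, 30, 40))

# A marginal on "autos" treats each value as a category label, so it
# controls the weighted number of households in each category...
households_by_autos <- sapply(split(hh$weight, hh$autos), sum)

# ...which is not the same thing as the total number of vehicles
total_vehicles <- sum(hh$weight * hh$autos)

# Relabeling the categories leaves the controlled quantities unchanged
hh$autos_label <- c("none", "one", "two", "two")
households_by_label <- sapply(split(hh$weight, hh$autos_label), sum)
```

Here both `households_by_autos` and `households_by_label` hold 10, 20, and 70 households, while the fleet size implied by the same weights is 160 vehicles.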

To control the number of vehicles, you would have to make your secondary seed records represent vehicles instead of people. This is fine if you don't also want to control the number of people by certain characteristics, but the package can't do both at the same time at the moment.

I'm going to re-open this issue. A general solution is not immediately apparent to me. I definitely don't want to add tertiary_seed and tertiary_targets. I've put one potential solution in #3, but that is a big change to the package and won't happen anytime soon.

If you only care to control households and regional autos, you can use the secondary_seed and secondary_targets parameters. Create a seed table where each row is an auto, with a primary_id column linking it back to the household it belongs to. You can then create a type field filled with "any" and follow what I did in my previous post.
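A minimal base-R sketch of building that auto-level seed from a household table, assuming the households carry a vehicle count. The link column is named `id` here on the assumption that it should match what ipu's primary_id setting expects; `autotype` and the regional total of 500 are hypothetical.

```r
# Hypothetical household seed: one row per household with its vehicle count
hh_seed <- data.frame(id = 1:4, autos = c(0, 1, 2, 3))

# One row per vehicle, linked back to its household by id.
# Households with zero vehicles contribute no rows.
auto_seed <- data.frame(
  id = rep(hh_seed$id, times = hh_seed$autos),
  autotype = "any"
)

# Single regional vehicle control, mirroring the pertype = "any" example
# (500 is a made-up total)
auto_targets <- list(autotype = data.frame(any = 500))

# With the package installed and your household targets defined, this
# would then be passed as the secondary inputs, e.g.:
# result <- ipu(hh_seed, hh_targets, auto_seed, auto_targets)
```

Expanding with `rep(..., times = autos)` means each household's weight is counted once per vehicle it owns, which is what turns a "number of autos" attribute into a count that a single "any" target can control.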

If you want to control households, persons, and autos, I'm open to suggestions.

figo2002 · 2019-09-10T13:35:53Z

My initial thought would be to set a special "garbage can" category for each variable, say category '0', which is not controlled. Then autos, or any other count variable, can be represented as "virtual persons", each owning one car, with all of their other attributes set to '0'. Whether this could lead to inconsistent totals or create other problems, I don't know yet.
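A base-R sketch of how the combined secondary seed for this "virtual persons" idea might be assembled. All column names, categories, and totals are hypothetical, and whether ipfr accepts seed categories that have no matching target is not confirmed in this thread (figo2002 later mentions modifying the package).

```r
# Hypothetical household seed with vehicle counts
hh_seed <- data.frame(id = 1:3, autos = c(0, 1, 2))

# Real persons keep their person attributes; their auto attribute gets the
# uncontrolled "garbage can" category "0"
real_persons <- data.frame(
  id = c(1, 1, 2, 3, 3),
  pertype = c("adult", "child", "adult", "adult", "adult"),
  auto = "0"
)

# Virtual persons: one per vehicle, with every other attribute set to "0"
virtual_persons <- data.frame(
  id = rep(hh_seed$id, times = hh_seed$autos),
  pertype = "0",
  auto = "1"
)

per_seed <- rbind(real_persons, virtual_persons)

# Targets omit category "0", so real persons only count toward pertype
# and virtual persons only count toward auto (made-up totals)
per_targets <- list(
  pertype = data.frame(adult = 150, child = 40),
  auto = data.frame(`1` = 90, check.names = FALSE)
)
```

One consequence to keep in mind: any control on the total number of persons per household would now also count the virtual persons, so such a control would need its own "0" handling.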

dkyleward · 2019-09-10T13:49:43Z

That's pretty clever. Let me know how it works out.

figo2002 · 2019-09-12T03:54:44Z

It seems to be working OK on my own data. I disabled target balancing (within the primary and secondary targets) and recalculated the geo-level average weights.
I haven't written an R package before, but I'll try to create a pull request later.

dkyleward mentioned this issue Sep 9, 2019 in #2, "Error when a marginal has a single string category" (closed)

dkyleward closed this as completed Sep 9, 2019

dkyleward reopened this Sep 10, 2019
