Implement multi-level calibration #26

nikhilwoodruff · 2024-09-20T13:32:54Z

We're going to need to calibrate weights for the following areas:

UK (1)
UK regions (12)
Parliamentary constituencies (650)
Local authorities (382)

That is 1,045 areas × ~50,000 households ≈ 50 million weight values. In this issue, I'll outline how I suggest we implement this and where we should compromise for speed and usability.
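As a quick sanity check on that figure, here is a small sketch of the arithmetic. The area counts and the ~50,000 household figure come from the list above; the 4 bytes per value (float32 storage) is my assumption, not something we've decided.

```python
# Area counts as listed in this issue.
AREA_COUNTS = {
    "UK": 1,
    "UK regions": 12,
    "Parliamentary constituencies": 650,
    "Local authorities": 382,
}
N_HOUSEHOLDS = 50_000  # approximate household count from this issue


def weight_matrix_size(area_counts, n_households, bytes_per_value=4):
    """Return (number of areas, number of weight values, bytes needed).

    bytes_per_value=4 assumes float32 storage, which is an assumption.
    """
    n_areas = sum(area_counts.values())
    n_values = n_areas * n_households
    return n_areas, n_values, n_values * bytes_per_value


n_areas, n_values, n_bytes = weight_matrix_size(AREA_COUNTS, N_HOUSEHOLDS)
print(f"{n_areas:,} areas x {N_HOUSEHOLDS:,} households = {n_values:,} weights")
print(f"about {n_bytes / 1e6:.0f} MB per year if stored as float32")
```

This is per calibrated year, which is part of why the number of years we store local weights for matters.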

Firstly, and probably most importantly, I think we should derive the weights for containing areas by summing the weights of their component areas. We should calibrate the constituencies directly, with a loss function that includes both the local targets and the national targets (evaluated on the summed weights), and then add up the local area weights to get our UK region and UK weights. There are three reasons for this. First, lots of targets are only available at higher geographic levels, and this approach ensures that information at least reaches the local weights in some form. Second, internal consistency: region and national weights built by summation agree with the local weights by construction. Third, I think this will lead to performance gains. The more we can do in one PyTorch tensor, rather than in separate processes, the more we can make use of torch's built-in parallelisation.
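To make the idea concrete, here is a minimal sketch on toy data, not a proposed implementation. Everything in it is hypothetical: the sizes, the synthetic `metrics` columns, the area-to-region mapping, the log-weight parameterisation, the squared relative error loss and the Adam settings. It shows one weight tensor of shape (areas, households), a loss that combines local targets with a national target evaluated on the summed weights, and region and national weights obtained by summation.

```python
import torch

torch.manual_seed(0)

# Toy sizes (hypothetical); the real problem is ~650 areas x ~50,000 households.
n_areas, n_households = 5, 200

# Hypothetical household-level variables: a count, an income, a benefit amount.
ones = torch.ones(n_households)
income = torch.rand(n_households) * 50_000 + 10_000
benefit = torch.rand(n_households) * 5_000
metrics = torch.stack([ones, income, benefit], dim=1)  # (households, 3)

# Synthetic targets generated from made-up "true" weights so they are attainable.
true_weights = torch.rand(n_areas, n_households) * 2 + 0.5
local_targets = true_weights @ metrics[:, :2]         # local: count and income only
national_targets = true_weights.sum(dim=0) @ metrics  # national: all three metrics

# Optimise log-weights so the weights themselves stay positive.
log_weights = torch.zeros(n_areas, n_households, requires_grad=True)


def loss_fn(log_w):
    w = torch.exp(log_w)
    local_est = w @ metrics[:, :2]
    national_est = w.sum(dim=0) @ metrics  # national estimate from summed weights
    local_loss = ((local_est / local_targets - 1) ** 2).mean()
    national_loss = ((national_est / national_targets - 1) ** 2).mean()
    return local_loss + national_loss


optimiser = torch.optim.Adam([log_weights], lr=0.01)
initial_loss = loss_fn(log_weights).item()
for _ in range(500):
    optimiser.zero_grad()
    loss = loss_fn(log_weights)
    loss.backward()
    optimiser.step()
final_loss = loss_fn(log_weights).item()

# Derive containing-area weights by summation (hypothetical area -> region map).
area_weights = torch.exp(log_weights).detach()
area_region = torch.tensor([0, 0, 1, 1, 1])
region_matrix = torch.nn.functional.one_hot(area_region, num_classes=2).float()
region_weights = region_matrix.T @ area_weights  # (regions, households)
national_weights = area_weights.sum(dim=0)       # (households,)

print(f"loss: {initial_loss:.4f} -> {final_loss:.6f}")
```

Note that the benefit metric is only targeted nationally here, which is the case the summed-weights loss is meant to handle: the local weights still have to move to satisfy it.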

An issue I don't see a way around is that local authorities and Parliamentary constituencies cover the same territory and sit at the same level, so neither can be derived from the other. I think we do one calibration run that outputs constituency, region and national weights, then another that outputs local authority, region and national weights, and then just take the region and national weights from one of them. We'll have to run some checks that there's no large inconsistency between the two runs (which there shouldn't be if we're targeting the same national statistics), and I think we just have to be OK with that.
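One possible shape for that check, as a sketch only: compare the national aggregates implied by the two runs and look at the largest relative gap. The function name, the toy numbers and the choice of relative difference as the measure are all hypothetical; what tolerance counts as "large" is something we'd still need to decide.

```python
import torch


def max_relative_gap(weights_a, weights_b, metrics):
    """Largest relative difference between the aggregates implied by two
    national weight vectors, measured relative to the second run."""
    est_a = weights_a @ metrics
    est_b = weights_b @ metrics
    return ((est_a - est_b).abs() / est_b.abs()).max().item()


# Toy data: four households, two metrics (a count and an income-like variable).
metrics = torch.tensor([[1.0, 10.0], [1.0, 20.0], [1.0, 30.0], [1.0, 40.0]])

# Stand-ins for national weights from the constituency run and the
# local authority run; here the second is simply 2% larger.
constituency_run = torch.full((4,), 100.0)
local_authority_run = constituency_run * 1.02

gap = max_relative_gap(constituency_run, local_authority_run, metrics)
same = max_relative_gap(constituency_run, constituency_run, metrics)
print(f"max relative gap between runs: {gap:.4f}")
```

The same comparison could be run per region by passing each region's summed weights instead of the national ones.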

Another thing: we currently reweight the UK for each of seven years. I think we should continue to do that for national weights, but not for local areas, due to the size of the weight files. So we should reweight all areas in 2022, then run a separate calibration of national weights only for 2023 onwards. I don't think using different loss functions for the two should introduce any big inconsistencies if we're still targeting the same national statistics.

cc @MaxGhenis

nikhilwoodruff added the enhancement label (New feature or request) Sep 20, 2024

nikhilwoodruff self-assigned this Sep 20, 2024
