Skip to content

zimfv/rat

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

RAT - Restoring Aggregated Tables

What's the problem?

There are two aggregated tables with some concided columns:

Where

I want to "restore" (disaggregate) them somehow to get new table

Such that

Solving the problem

Of course there exists a lot of solves for that's system. So I make a guess that results are most independent.

Independence is not possible in general situation, so i will minimize N sums

Or we can just minimize

Preparing to use

The first you should install another packages: numpy, scipy, pandas and cvxpy.

pip install numpy
pip install scipy
pip install pandas
pip install cvxpy

Then move rat folder to a needed directory.

Usage examples

Restoring two tables

import numpy as np
import pandas as pd

There are two tables: employment with columns District, Agriculture, Industry, Production, Service;

df_employment = pd.read_csv('tables/employment.csv')
df_employment
District Agriculture Industry Production Service
0 East Forests 2063 3644 504 5562
1 North Mountains 1258 3807 862 11540

and environment with columns District, Urban, Suburban, Rural.

df_environment = pd.read_csv('tables/environment.csv')
df_environment
District Urban Suburban Rural
0 East Forests 3866 1510 6397
1 North Mountains 3438 5779 8250

So there is column District in both tables.

We want to "restore" that by making them more independence. That we can make with function restore_table from ratrestore module

from rat.ratrestore import restore_table
df_restored = restore_table(df_employment, df_environment, name_a='Employment', name_b='Environment', name_res='Count')
df_restored.head(6)
District Employment Environment Count
0 East Forests Agriculture Rural 1120.955661
1 East Forests Agriculture Suburban 264.599508
2 East Forests Agriculture Urban 677.444831
3 East Forests Industry Rural 1980.010872
4 East Forests Industry Suburban 467.377899
5 East Forests Industry Urban 1196.611229

If we want to "restore" that by making minimal square sums, we can change parameter obj_type from default 'dependences' to 'squares':

Warning: That can be noncorrect.

df_restored = restore_table(df_employment, df_environment, name_a='Employment', name_b='Environment', name_res='Count', obj_type='squares')
df_restored.head(6)
District Employment Environment Count
0 East Forests Agriculture Rural 1268.107791
1 East Forests Agriculture Suburban 244.740815
2 East Forests Agriculture Urban 611.218573
3 East Forests Industry Rural 2161.590402
4 East Forests Industry Suburban 129.786239
5 East Forests Industry Urban 1504.701185

Editing

Suppose we have DataFrame df_family

df_family = pd.read_csv('tables/family.csv')
df_family
District Sex Single Marriged Widower
0 East Forests Female 2545 2248 314
1 East Forests Male 1702 2920 2044
2 North Mountains Female 3059 3352 617
3 North Mountains Male 3207 4578 2654

But we want to make one column Family Status from columns Single, Marriged and Widower.

We can use function roll_weak from package ratedit

from rat.ratedit import roll_weak
df_family_weak = roll_weak(df_family, ['Single', 'Marriged', 'Widower'], value_name='Family Status', res_name='Persons')
df_family_weak
District Sex Family Status Persons
0 East Forests Female Single 2545
1 East Forests Female Marriged 2248
2 East Forests Female Widower 314
3 East Forests Male Single 1702
4 East Forests Male Marriged 2920
5 East Forests Male Widower 2044
6 North Mountains Female Single 3059
7 North Mountains Female Marriged 3352
8 North Mountains Female Widower 617
9 North Mountains Male Single 3207
10 North Mountains Male Marriged 4578
11 North Mountains Male Widower 2654

Another situation: we want to get df)family with unique values in column District by making columns from values of column Sex.

We can use function roll_strong from package ratedit to get two tables

from rat.ratedit import roll_strong
df_family_strong, cols_family_strong = roll_strong(df_family, ['District'], ['Sex'], ['Single', 'Marriged', 'Widower'], value_name='Family status')

The first is interesting table with renamed columns:

df_family_strong
District col_0 col_1 col_2 col_3 col_4 col_5
0 East Forests 2545 1702 2248 2920 314 2044
1 North Mountains 3059 3207 3352 4578 617 2654

And the second is explanation of new column names:

cols_family_strong
Sex Family status
col_0 Female Single
col_1 Female Marriged
col_2 Female Widower
col_3 Male Single
col_4 Male Marriged
col_5 Male Widower

You can change index_prefix parameter from default 'col_' to any you like.

Restoring alot of tables

If we want to restore many tables consistently (employment, enviroment and family), we should edit them to same names columns and different value columns.

In our case that is just change df_family to df_family_strong. So we have three tables (df_employment, df_enviroment and df_family_strong) with name-column District.

To restore them consistently, we can use restore_alot function from package ratrestore

from rat.ratrestore import restore_alot

df_alot = restore_alot([df_employment, df_environment, df_family_strong], name_cols=['District'], 
                       tab_names=['Employment', 'Environment', 'Family'], name_res='Count')
df_alot.head(5)
District Employment Environment Family Count
0 East Forests Agriculture Rural col_0 242.319898
1 East Forests Agriculture Rural col_1 162.054407
2 East Forests Agriculture Rural col_2 214.041309
3 East Forests Agriculture Rural col_3 278.025187
4 East Forests Agriculture Rural col_4 29.897229

But Family column is not beautiful. So let change it to values from cols_family_strong, by function add_indexed_cols from package ratedit:

from rat.ratedit import add_indexed_cols

df_alot = add_indexed_cols(df_alot, 'Family', cols_family_strong)
df_alot.head(5)
District Employment Environment Sex Family status Count
0 East Forests Agriculture Rural Female Single 242.319898
1 East Forests Agriculture Rural Female Marriged 162.054407
2 East Forests Agriculture Rural Female Widower 214.041309
3 East Forests Agriculture Rural Male Single 278.025187
4 East Forests Agriculture Rural Male Marriged 29.897229

About

RAT - Restoring Aggregated Tables

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published