Small utility that takes two identically shaped Pandas DataFrames and diffs them by a list of keys.

Zetifi/pandas-merge-diff

Pandas Merge Diff

Small utility that takes two identically shaped Pandas DataFrames and diffs them by a list of keys.

merge_diff(compare: pd.DataFrame, reference: pd.DataFrame, keys) -> pd.DataFrame:

Returns a new DataFrame with an added action column, set to one of new|deleted|changed|identical.

| Action | Explanation |
| --- | --- |
| new | The key(s) were not found in the reference rows. |
| deleted | The key(s) were not found in the compare rows. |
| changed | The key(s) were found in both the compare and reference rows, but one or more of the compared (non-key) values differ. |
| identical | The key(s) were found in both the compare and reference rows, and the compared values are identical. |
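
The classification above can be approximated with an outer merge in plain pandas. The following is a minimal sketch for illustration only: `sketch_merge_diff` is a hypothetical name, and it is not necessarily how `merge_diff` is implemented in this repository. It assumes both frames share the same columns and that key combinations are unique within each frame.

```python
import pandas as pd


def sketch_merge_diff(compare: pd.DataFrame, reference: pd.DataFrame, keys) -> pd.DataFrame:
    """Hypothetical illustration of the action rules; not the library's own code."""
    keys = list(keys)
    value_cols = [c for c in compare.columns if c not in keys]

    # Outer merge on the keys; the `_merge` column records which side each key came from.
    merged = compare.merge(
        reference, on=keys, how="outer", suffixes=("", "_ref"), indicator=True
    )

    # A row is unchanged only if every non-key value matches its reference value.
    # Note: NaN never compares equal to NaN here, so missing values count as a change.
    same = pd.Series(True, index=merged.index)
    for c in value_cols:
        same &= merged[c].eq(merged[f"{c}_ref"])

    merged["action"] = "changed"
    merged.loc[(merged["_merge"] == "both") & same, "action"] = "identical"
    merged.loc[merged["_merge"] == "left_only", "action"] = "new"
    merged.loc[merged["_merge"] == "right_only", "action"] = "deleted"

    # Deleted rows exist only in the reference, so take their values from there.
    deleted = merged["action"] == "deleted"
    for c in value_cols:
        merged[c] = merged[c].where(~deleted, merged[f"{c}_ref"])

    out = merged[keys + value_cols + ["action"]]
    return out.sort_values(keys).reset_index(drop=True)


# Tiny made-up frames, one row per action.
compare = pd.DataFrame([{"k": "a", "v": 1}, {"k": "b", "v": 2}, {"k": "c", "v": 3}])
reference = pd.DataFrame([{"k": "a", "v": 1}, {"k": "c", "v": 9}, {"k": "d", "v": 4}])

result = sketch_merge_diff(compare, reference, keys=["k"])
print(result["action"].tolist())  # ['identical', 'new', 'changed', 'deleted']
```

In this sketch, changed rows carry the values from the compare frame and deleted rows carry the values from the reference frame; whether the real utility makes the same choice should be checked against its source.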

Example Usage

Given two frames:

import pandas as pd
from pandas.testing import assert_frame_equal

import pandas_merge_diff

compare_df = pd.DataFrame(
    [
        {
            "key1": "AABB",
            "key2": "AABB",
            "email": "[email protected]",
            "name": "BL",
        },
        {
            "key1": "FFAA",
            "key2": "FFBB",
            "email": "[email protected]",
            "name": "CM",
        },
        {
            "key1": "BBCC",
            "key2": "BBCC",
            "email": "[email protected]",
            "name": "Mik Warnakulasuriya Patabendige Ushantha Joseph Chaminda Vaas",
        },
        {
            "key1": "KKOO",
            "key2": "KKOO",
            "email": "[email protected]",
            "name": "Mik Warnakulasuriya Patabendige Ushantha Joseph Chaminda Vaas",
        },
    ]
)

reference_df = pd.DataFrame(
    [
        {
            "key1": "AABB",
            "key2": "AABB",
            "email": "[email protected]",
            "name": "BL",
        },
        {
            "key1": "FFAA",
            "key2": "CCFF",
            "email": "[email protected]",
            "name": "CM",
        },
        {
            "key1": "KKOO",
            "key2": "KKOO",
            "email": "[email protected]",
            "name": "Mik Warnakulasuriya Patabendige Ushantha Joseph Chaminda Vaas",
        },
    ]
)

The diff on key1 and key2 should then equal:

assert_frame_equal(
    pandas_merge_diff.merge_diff(compare_df, reference_df, keys=["key1", "key2"]),
    pd.DataFrame(
        [
            {
                "key1": "AABB",
                "key2": "AABB",
                "email": "[email protected]",
                "name": "BL",
                "action": "identical",
            },
            {
                "key1": "BBCC",
                "key2": "BBCC",
                "email": "[email protected]",
                "name": "Mik Warnakulasuriya Patabendige Ushantha Joseph Chaminda Vaas",
                "action": "new",
            },
            {
                "key1": "FFAA",
                "key2": "CCFF",
                "email": "[email protected]",
                "name": "CM",
                "action": "deleted",
            },
            {
                "key1": "FFAA",
                "key2": "FFBB",
                "email": "[email protected]",
                "name": "CM",
                "action": "new",
            },
            {
                "key1": "KKOO",
                "key2": "KKOO",
                "email": "[email protected]",
                "name": "Mik Warnakulasuriya Patabendige Ushantha Joseph Chaminda Vaas",
                "action": "changed",
            },
        ]
    ),
)
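
Once a diff frame is available, the action column can drive downstream steps such as syncing a database table. The sketch below is one possible way to do that, not part of this utility: it uses a hand-built stand-in with the same shape as a `merge_diff` result (the `name` values are shortened, illustrative data) and splits it by action. The names `diff`, `to_insert`, `to_update`, and `to_delete` are hypothetical.

```python
import pandas as pd

# Hand-built stand-in for a merge_diff result (illustrative values only).
diff = pd.DataFrame(
    [
        {"key1": "AABB", "key2": "AABB", "name": "BL", "action": "identical"},
        {"key1": "BBCC", "key2": "BBCC", "name": "MV", "action": "new"},
        {"key1": "FFAA", "key2": "CCFF", "name": "CM", "action": "deleted"},
        {"key1": "FFAA", "key2": "FFBB", "name": "CM", "action": "new"},
        {"key1": "KKOO", "key2": "KKOO", "name": "MV", "action": "changed"},
    ]
)

# Summarise how many rows fall under each action.
counts = diff["action"].value_counts().to_dict()

# Split the diff into the rows that need work; identical rows need none.
to_insert = diff[diff["action"] == "new"].drop(columns="action")
to_update = diff[diff["action"] == "changed"].drop(columns="action")
to_delete = diff[diff["action"] == "deleted"].drop(columns="action")

print(counts)
print(to_insert)
```

Dropping the action column after filtering leaves frames with the same columns as the original inputs, which keeps them easy to pass on to whatever writes the changes.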
