How to Compare Two DataFrames in Python to Find Differences?

To compare two DataFrames in Python and find the differences between them, you can use the DataFrame.compare method from the pandas library (available since pandas 1.1.0). Here’s an example implementation:

import pandas as pd

def compare_dataframes(df1, df2):
    # Keep only the rows and columns where df1 and df2 hold different values;
    # both DataFrames must have identical row and column labels
    diff_df = df1.compare(df2)

    return diff_df

# Example DataFrames to compare
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [1, 2, 4], 'B': [4, 5, 7]})

# Compare the DataFrames
differences = compare_dataframes(df1, df2)

if differences.empty:
    print("The DataFrames are identical.")
else:
    print("Differences found:")
    print(differences)

In this example, the compare_dataframes function takes two DataFrames (df1 and df2) as input and compares them with the DataFrame.compare method from pandas.

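The compare method compares the two DataFrames element by element and returns a new DataFrame (diff_df) containing only the rows and columns in which at least one value differs. For each such column, it shows the value from df1 under self and the value from df2 under other. Both DataFrames must have the same row and column labels; otherwise compare raises a ValueError. If there are no differences, diff_df will be empty.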

Finally, the script checks whether diff_df is empty. If it is not, the script prints the differences; otherwise, it prints a message indicating that the DataFrames are identical.
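When only a yes/no answer is needed, DataFrame.equals can stand in for the emptiness check. The sketch below contrasts the two; the variable names are illustrative only. Note that equals also requires matching column dtypes, whereas compare looks at the values alone.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [1, 2, 4], 'B': [4, 5, 7]})

# equals returns a single boolean instead of a DataFrame of differences
same_as_copy = df1.equals(df1.copy())
same_as_df2 = df1.equals(df2)

# Same values, different dtype (int64 vs float64)
as_float = df1.astype(float)
equal_by_compare = df1.compare(as_float).empty  # values match, so no differences
equal_by_equals = df1.equals(as_float)          # dtypes differ, so not equal

print(same_as_copy, same_as_df2, equal_by_compare, equal_by_equals)
```

Which check fits depends on whether a dtype mismatch should count as a difference in your data.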

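In the example, df1 and df2 have different values in the third row (index 2) of column ‘A’ and column ‘B’, so the output will look like this (exact spacing may vary):

Differences found:
     A          B
  self other self other
2  3.0   4.0  6.0   7.0

The self columns hold the values from df1 and the other columns hold the values from df2. The integers appear as floats because compare masks the matching cells with NaN before dropping the rows that have no differences.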
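The compare method labels values from the calling DataFrame as self and values from the argument as other, and it accepts a few parameters that change the layout of the result. The following sketch reuses the example DataFrames to show align_axis, keep_shape, and keep_equal; the variable names are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [1, 2, 4], 'B': [4, 5, 7]})

# Default layout: one row per differing index label, with
# (column, 'self' / 'other') pairs as the column labels.
wide = df1.compare(df2)

# align_axis=0 stacks the 'self' and 'other' values as rows instead.
stacked = df1.compare(df2, align_axis=0)

# keep_shape=True keeps every row and column; equal cells become NaN.
full = df1.compare(df2, keep_shape=True)

# Adding keep_equal=True shows the original values instead of NaN.
full_with_values = df1.compare(df2, keep_shape=True, keep_equal=True)

print(stacked)
print(full)
```

The stacked layout tends to be easier to read when many columns differ, while keep_shape=True is useful when the result has to line up with the original DataFrames.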

You can replace df1 and df2 with your own DataFrames to compare.
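Keep in mind that compare only works when both DataFrames have the same row and column labels. If your own DataFrames differ in length, one possible workaround is an outer merge with indicator=True, which marks each row as present in the left DataFrame, the right one, or both. This is a minimal sketch under that assumption, not part of the original example: the helper row_differences and the DataFrame df3 are hypothetical names, and duplicate rows may need extra handling.

```python
import pandas as pd

def row_differences(left, right):
    """Return the rows that appear in only one of the two DataFrames.

    Hypothetical helper: merges on all shared columns and keeps the rows
    that the indicator column does not mark as 'both'.
    """
    merged = left.merge(right, how="outer", indicator=True)
    return merged[merged["_merge"] != "both"]

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df3 = pd.DataFrame({'A': [1, 2, 4, 5], 'B': [4, 5, 7, 8]})  # one extra row

# compare refuses DataFrames whose labels do not match
try:
    df1.compare(df3)
    compare_failed = False
except ValueError:
    compare_failed = True

only_one_side = row_differences(df1, df3)
print(only_one_side)
```

The _merge column in the result tells you whether each row came from the first DataFrame (left_only) or the second (right_only).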
