To compare two DataFrames in Python and find the differences between them, you can use the Pandas library. Here’s an example implementation:
import pandas as pd
def compare_dataframes(df1, df2):
    # Return the cells where df1 and df2 differ.
    # Both DataFrames must have identical index and column labels.
    diff_df = df1.compare(df2)
    return diff_df
# Example DataFrames to compare
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [1, 2, 4], 'B': [4, 5, 7]})
# Compare the DataFrames
differences = compare_dataframes(df1, df2)
if differences.empty:
    print("The DataFrames are identical.")
else:
    print("Differences found:")
    print(differences)
In this example, the compare_dataframes function takes two DataFrames (df1 and df2) as input and compares them using the compare method (DataFrame.compare, available in Pandas 1.1.0 and later).
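The compare method compares the two DataFrames element by element and returns a new DataFrame (diff_df) that keeps only the rows and columns containing at least one difference. Both DataFrames must have identical index and column labels; otherwise compare raises a ValueError. If there are no differences, diff_df will be empty.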
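The shape of the result can be adjusted through the keyword arguments of compare. The following is a minimal sketch, reusing the example DataFrames from above, of the align_axis, keep_shape, and keep_equal parameters; the variable names are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [1, 2, 4], 'B': [4, 5, 7]})

# Default: one 'self'/'other' column pair for each column that differs.
side_by_side = df1.compare(df2)

# align_axis=0 stacks 'self' and 'other' as rows instead of columns.
stacked = df1.compare(df2, align_axis=0)

# keep_shape=True keeps every row and column; keep_equal=True shows
# equal values instead of replacing them with NaN.
full = df1.compare(df2, keep_shape=True, keep_equal=True)

print(side_by_side)
print(stacked)
print(full)
```

Here 'self' refers to the DataFrame the method is called on (df1) and 'other' to the argument (df2).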
Finally, the script checks whether there are any differences by testing if diff_df is empty. If there are differences, it prints them; otherwise, it prints a message indicating that the DataFrames are identical.
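If only a yes/no answer is needed, or the two DataFrames might not share the same labels, a sketch along the following lines may help. It uses DataFrame.equals for a quick check and wraps compare in a try/except; safe_compare is a hypothetical helper name, not part of Pandas.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
same = df1.copy()
shorter = df1.iloc[:2]

# equals() returns a single bool for the whole DataFrame.
print(df1.equals(same))     # True
print(df1.equals(shorter))  # False


def safe_compare(left, right):
    """Hypothetical helper: return the diff, or None if the labels differ."""
    try:
        return left.compare(right)
    except ValueError:
        # compare() raises ValueError when index or column labels do not match.
        return None


print(safe_compare(df1, same).empty)  # True
print(safe_compare(df1, shorter))     # None
```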
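In the example, df1 and df2 have different values in the third row (index 2) of columns 'A' and 'B'. For each differing column, the result shows the value from df1 under "self" and the value from df2 under "other", so the output will be:
Differences found:
     A           B
  self other self other
2  3.0   4.0  6.0   7.0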
You can replace df1 and df2 with your own DataFrames to compare.
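If your own DataFrames do not have the same number of rows, compare will raise a ValueError. One possible alternative, sketched here under the assumption that rows should be matched on all shared columns and that neither DataFrame contains duplicate rows, is an outer merge with indicator=True. The names left, right, only_left, and only_right are illustrative.

```python
import pandas as pd

left = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
right = pd.DataFrame({'A': [1, 2, 4, 9], 'B': [4, 5, 7, 9]})

# Outer merge on all shared columns; the _merge column records
# whether each row came from left, right, or both.
merged = left.merge(right, how='outer', indicator=True)

# Rows that appear in only one of the two DataFrames.
only_left = merged[merged['_merge'] == 'left_only'].drop(columns='_merge')
only_right = merged[merged['_merge'] == 'right_only'].drop(columns='_merge')

print(only_left)
print(only_right)
```

Unlike compare, this approach ignores the index and reports whole rows rather than individual cells.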