How to Compare Two CSV Files and Get the Difference in Python?


To compare two CSV files and get the differences in Python, you can use the Pandas library. Here’s a step-by-step approach:

  1. Import the Pandas library:
import pandas as pd
  2. Read the CSV files into Pandas data frames:
df1 = pd.read_csv('file1.csv')
df2 = pd.read_csv('file2.csv')

Replace 'file1.csv' and 'file2.csv' with the paths or filenames of your CSV files.

  3. Compare the two data frames and get the differences:
diff_df = pd.concat([df1, df2]).drop_duplicates(keep=False)

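pd.concat() stacks the two data frames vertically, and drop_duplicates() then removes repeated rows. With keep=False, every copy of a repeated row is dropped, so a row that appears in both files disappears and only the rows found in one file remain. Two caveats: a row that is repeated within a single file is dropped as well, and both files need the same column names for their rows to match.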

  4. Optionally, save the differences to a new CSV file:
diff_df.to_csv('differences.csv', index=False)

Replace 'differences.csv' with the desired filename for the CSV file containing the differences.
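
The steps above can be tried end to end without any files on disk. The following is a minimal sketch that uses two small in-memory CSV strings as stand-ins for file1.csv and file2.csv; the sample columns (id, name) and their values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for file1.csv and file2.csv.
csv1 = io.StringIO("id,name\n1,Alice\n2,Bob\n3,Carol\n")
csv2 = io.StringIO("id,name\n1,Alice\n2,Robert\n4,Dave\n")

df1 = pd.read_csv(csv1)
df2 = pd.read_csv(csv2)

# Stack both frames, then drop every row that occurs more than once.
diff_df = pd.concat([df1, df2]).drop_duplicates(keep=False)

# The row (1, Alice) is in both inputs, so it is absent from the result.
print(diff_df.to_string(index=False))
```

Here the row for id 2 shows up twice in the result, once per file, because its name differs between the two inputs.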

The resulting diff_df data frame contains the rows that appear in only one of the two files. It does not record which file each row came from, and it keeps the original index values, so index labels can repeat. You can perform further analysis or manipulations on this data frame as needed.
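
If you also need to know which file each differing row came from, one alternative (a different technique from the concat() approach above) is an outer merge with indicator=True, which adds a _merge column to the result. This is a sketch under assumptions: the sample data is invented, and the names only_in_file1 and only_in_file2 are chosen here for illustration.

```python
import pandas as pd

# Hypothetical sample data standing in for the two CSV files.
df1 = pd.DataFrame({"id": [1, 2, 3], "name": ["Alice", "Bob", "Carol"]})
df2 = pd.DataFrame({"id": [1, 2, 4], "name": ["Alice", "Robert", "Dave"]})

# Outer merge on all shared columns; indicator=True adds a "_merge" column
# whose values are "left_only", "right_only", or "both".
merged = df1.merge(df2, how="outer", indicator=True)

# Split the differences by origin and drop the helper column.
only_in_file1 = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
only_in_file2 = merged[merged["_merge"] == "right_only"].drop(columns="_merge")

print(only_in_file1)
print(only_in_file2)
```

Unlike drop_duplicates(keep=False), this keeps a row that is repeated inside one file as long as it is missing from the other.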

Note that this method compares the entire rows of the two CSV files. If you want to compare specific columns or perform more advanced comparisons, you may need to customize the approach accordingly.
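
One way to compare only specific columns is the subset parameter of drop_duplicates(), which restricts the duplicate check to the columns you name. In this sketch the column names (id, name, updated) and the key_columns list are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample data: the "updated" column differs on every row.
df1 = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Carol"],
    "updated": ["2023-01-01", "2023-01-02", "2023-01-03"],
})
df2 = pd.DataFrame({
    "id": [1, 2, 4],
    "name": ["Alice", "Robert", "Dave"],
    "updated": ["2023-02-01", "2023-02-02", "2023-02-04"],
})

# Only these columns decide whether two rows count as the same.
key_columns = ["id", "name"]

diff_df = pd.concat([df1, df2]).drop_duplicates(subset=key_columns, keep=False)

# (1, Alice) matches on the key columns, so it is dropped even though
# its "updated" value differs between the two frames.
print(diff_df)
```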

Make sure you have the Pandas library installed (pip install pandas) before running this code, and adjust the file paths to match your own CSV files.

By following these steps, you can compare two CSV files in Python and obtain the rows that differ between them.
