How to Convert Text Files to Parquet Files in Python?


To convert text files to Parquet files in Python, you can read the file with the pandas library and write it out with the pyarrow library. Here’s an example of how to perform this conversion:

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# Read the text file into a pandas DataFrame
df = pd.read_csv('input.txt', delimiter='\t')

# Convert the DataFrame to an Arrow Table
table = pa.Table.from_pandas(df)

# Write the Arrow Table to a Parquet file
pq.write_table(table, 'output.parquet')

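In this example, we first use pd.read_csv() from pandas to read the text file 'input.txt' into a pandas DataFrame. The delimiter='\t' argument treats the file as tab-separated, and by default the first line is used as the column names. Adjust delimiter (for example, to ',' or ';') to match the separator your file actually uses, and pass header=None if the file has no header row.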
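If you are not sure which delimiter a file uses, pandas can try to detect it: passing sep=None together with engine="python" makes read_csv() sniff the delimiter from the first line using Python's csv.Sniffer. The sketch below writes a small comma-separated sample to a temporary directory (the file name sample.txt is hypothetical). For files you control, an explicit delimiter is more predictable than sniffing.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample file: comma-separated, with a header row.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "sample.txt")
with open(path, "w") as f:
    f.write("name,score\nalice,90\nbob,85\n")

# sep=None with engine="python" asks pandas to detect the delimiter
# (via csv.Sniffer) instead of assuming one.
df = pd.read_csv(path, sep=None, engine="python")
print(df.columns.tolist())  # ['name', 'score']
print(len(df))              # 2
```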

Next, we convert the DataFrame to an Arrow Table using pa.Table.from_pandas(). This step maps each pandas column to an Arrow column and infers the Arrow schema (column names and types) that will be stored in the Parquet file.

Finally, we use pq.write_table() from pyarrow.parquet to write the Arrow Table to a Parquet file named 'output.parquet'.

Make sure the pandas and pyarrow libraries are installed before running the code. You can install them using pip:

pip install pandas pyarrow

After running the code, the data from 'input.txt' is written to a new Parquet file, 'output.parquet'; the original text file is left unchanged.
