To convert text files to Parquet files in Python, you can use the
pandas library along with the
pyarrow library. Here’s an example of how to perform this conversion:
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
# Read the text file into a pandas DataFrame
df = pd.read_csv('input.txt', delimiter='\t')
# Convert the DataFrame to an Arrow Table
table = pa.Table.from_pandas(df)
# Write the Arrow Table to a Parquet file (the output path is a placeholder)
pq.write_table(table, 'output.parquet')
In this example, we first use
pandas to read the text file
'input.txt' into a pandas DataFrame. You might need to adjust the
delimiter parameter based on the actual delimiter used in your text file.
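If the delimiter is not known in advance, pandas can try to infer it. The following is a minimal sketch, assuming a small semicolon-delimited sample file created on the spot; the file contents and the helper name read_delimited are made up for illustration. Passing sep=None together with the Python parsing engine asks pandas to sniff the separator from the file.

```python
import os
import tempfile

import pandas as pd


def read_delimited(path, sep=None):
    """Read a delimited text file into a DataFrame.

    Hypothetical helper for illustration, not part of pandas.
    With sep=None, pandas tries to detect the delimiter itself.
    """
    # Delimiter sniffing is only available with the (slower) Python engine.
    return pd.read_csv(path, sep=sep, engine="python")


# Write a small semicolon-delimited sample so the example is self-contained.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample.txt")
with open(sample_path, "w", encoding="utf-8") as f:
    f.write("id;name;score\n1;alice;9.5\n2;bob;7.25\n")

sniffed = read_delimited(sample_path)            # delimiter inferred
explicit = read_delimited(sample_path, sep=";")  # delimiter given explicitly

print(sniffed)
```

Sniffing is convenient for one-off files, but passing the delimiter explicitly is more predictable when you already know the format.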
Next, we convert the DataFrame to an Arrow Table using
pa.Table.from_pandas(). The pyarrow library provides an efficient conversion between pandas DataFrames and Arrow Tables.
Finally, we use
pyarrow.parquet (pq.write_table) to write the Arrow Table to a Parquet file at the output path you choose.
Make sure to have the pandas and
pyarrow libraries installed before running the code. You can install them using pip:
pip install pandas pyarrow
After running the code, the text file will be converted to a Parquet file at the specified output path.