To convert text files to Parquet files in Python, you can use the pandas library along with the pyarrow library. Here’s an example of how to perform this conversion:
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
# Read the text file into a pandas DataFrame
df = pd.read_csv('input.txt', delimiter='\t')
# Convert the DataFrame to an Arrow Table
table = pa.Table.from_pandas(df)
# Write the Arrow Table to a Parquet file
pq.write_table(table, 'output.parquet')
In this example, we first use pd.read_csv() from pandas to read the text file 'input.txt' into a pandas DataFrame. The example assumes a tab-separated file with a header row; adjust the delimiter parameter to match the delimiter actually used in your text file.
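As a rough illustration of adjusting the delimiter, the sketch below parses the same small table written with three different separators. The sample data and variable names are made up for this example, and io.StringIO stands in for a real file path so the snippet runs on its own.

```python
import io

import pandas as pd

# Hypothetical sample data: the same two rows with three different separators.
tab_text = "name\tage\nAda\t36\nLinus\t54\n"
comma_text = "name,age\nAda,36\nLinus,54\n"
space_text = "name   age\nAda    36\nLinus  54\n"

# Tab-separated, as in the example above.
df_tab = pd.read_csv(io.StringIO(tab_text), sep="\t")

# Comma-separated (the pandas default separator).
df_comma = pd.read_csv(io.StringIO(comma_text), sep=",")

# Columns separated by runs of whitespace, matched with a regular expression.
df_space = pd.read_csv(io.StringIO(space_text), sep=r"\s+")

print(df_tab)
```

In pd.read_csv(), delimiter is an alias for sep, so either keyword can be used.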
Next, we convert the DataFrame to an Arrow Table using pa.Table.from_pandas(). The pyarrow library provides an efficient conversion between pandas DataFrames and Arrow Tables.
Finally, we use pq.write_table() from pyarrow.parquet to write the Arrow Table to a Parquet file named 'output.parquet'.
Make sure to have the pandas and pyarrow libraries installed before running the code. You can install them using pip:
pip install pandas pyarrow
After running the code, the text file will be converted to a Parquet file at the specified output path.