How to Convert a Dictionary to a Spark DataFrame in Python?

To convert a dictionary to a Spark DataFrame in Python, you can use the createDataFrame() method of a SparkSession. Note that createDataFrame() expects row-oriented data (a list of rows, a pandas DataFrame, or an RDD), so a column-oriented dictionary first needs to be reshaped into rows. Here’s an example:

from pyspark.sql import SparkSession

# Create a SparkSession
spark = SparkSession.builder.getOrCreate()

# Sample dictionary: keys are column names, values are column data
data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [30, 25, 35],
        'City': ['New York', 'London', 'Paris']}

# createDataFrame() expects row-oriented data, so reshape the
# column-oriented dictionary into a list of row tuples
rows = list(zip(*data.values()))

# Convert to a Spark DataFrame, using the dictionary keys as column names
df = spark.createDataFrame(rows, schema=list(data.keys()))

# Show the DataFrame
df.show()

Output:

+-----+---+--------+
| Name|Age|    City|
+-----+---+--------+
| John| 30|New York|
|Alice| 25|  London|
|  Bob| 35|   Paris|
+-----+---+--------+

In this example, we first import the SparkSession class from the pyspark.sql module.

We create a SparkSession using the SparkSession.builder.getOrCreate() method.
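In a standalone script you can also configure the builder before calling getOrCreate(); the application name below is just an illustrative choice, not something the example requires:

# Optional: name the application when creating the session
spark = (SparkSession.builder
         .appName('dict-to-dataframe-example')  # hypothetical name, any string works
         .getOrCreate())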

Next, we define a sample dictionary, data, whose keys are the column names and whose values are the lists of column values.

Because createDataFrame() works on row-oriented data, we first reshape the dictionary with zip(*data.values()), which turns the column lists into a list of row tuples. We then pass these rows to the createDataFrame() method of the SparkSession object spark, along with the dictionary keys as the column names.

The createDataFrame() function takes the column names from the dictionary keys and infers the data types from the values in each row (strings for Name and City, integers for Age), producing a DataFrame with the corresponding schema.
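If you prefer not to rely on type inference, createDataFrame() also accepts an explicit schema. The sketch below reuses the rows list from the example above and spells out the column types with pyspark.sql.types:

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Explicit schema: state each column's name, type, and nullability
schema = StructType([
    StructField('Name', StringType(), True),
    StructField('Age', IntegerType(), True),
    StructField('City', StringType(), True),
])

df_typed = spark.createDataFrame(rows, schema=schema)
df_typed.printSchema()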

Finally, we use the show() method on the DataFrame to display its contents.
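As an alternative, if pandas is available, a column-oriented dictionary can be loaded into a pandas DataFrame first, and createDataFrame() accepts that pandas DataFrame directly; a minimal sketch:

import pandas as pd

# pandas accepts the column-oriented dictionary as-is
pdf = pd.DataFrame(data)

# createDataFrame() converts a pandas DataFrame directly
df_from_pandas = spark.createDataFrame(pdf)
df_from_pandas.show()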

Note that the dictionary should contain lists of equal length for all keys, so that every row is complete. If the lists have different lengths, you may need to preprocess the data to ensure a consistent structure before creating the DataFrame, as shown in the sketch below.
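One simple way to preprocess uneven columns is to pad the shorter lists with None so every row is complete; the sketch below uses a small hypothetical dictionary and represents the missing entries as nulls:

# Hypothetical dictionary with lists of different lengths
uneven = {'Name': ['John', 'Alice', 'Bob'],
          'Age': [30, 25],
          'City': ['New York']}

# Pad every list with None up to the length of the longest column
max_len = max(len(values) for values in uneven.values())
padded = {key: values + [None] * (max_len - len(values))
          for key, values in uneven.items()}

# Build the DataFrame from the padded, row-oriented data
rows = list(zip(*padded.values()))
df_uneven = spark.createDataFrame(rows, schema=list(padded.keys()))
df_uneven.show()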
