How to Shape Data in Python Using Pandas


In Python, you can use the pandas library to manipulate and shape data. Here are some common ways to shape data using pandas:

  1. Reshaping data with pivot_table and melt: The pivot_table function reshapes data from long to wide format (aggregating duplicate entries, by default with the mean), while the melt function does the reverse, unpivoting data from wide to long format. These functions are especially useful for organizing data for analysis or visualization.
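  2. Handling missing data with fillna and dropna: The fillna function replaces missing values with a specified value (a scalar, or a dict or Series of per-column values), while the dropna function removes rows or columns that contain missing values. To propagate the previous or next valid value instead, use the ffill and bfill methods.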
  3. Combining data with concat and merge: The concat function concatenates multiple dataframes along a specified axis, while the merge function joins two dataframes on one or more common columns or index levels, similar to a SQL join.
  4. Grouping data with groupby: The groupby method groups rows by the values in one or more columns so you can apply aggregate functions such as sum, mean, or count to each group.
  5. Reshaping data with stack and unstack: The stack function allows you to pivot a level of column labels into the row index, while the unstack function allows you to pivot a level of row index into the column labels.
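Because pivot_table only goes from long to wide, a round trip needs melt for the way back. The sketch below assumes a small made-up sales table (the store, month, and sales columns are hypothetical) and uses aggfunc="sum" so that each wide cell holds the total for that store and month.

```python
import pandas as pd

# Hypothetical long-format data: one row per (store, month) observation
long_df = pd.DataFrame({
    "store": ["s1", "s1", "s2", "s2"],
    "month": ["jan", "feb", "jan", "feb"],
    "sales": [10, 12, 7, 9],
})

# Long -> wide: one row per store, one column per month
wide_df = long_df.pivot_table(index="store", columns="month", values="sales", aggfunc="sum")

# Wide -> long: reset_index() turns "store" back into a column,
# then melt turns each month column back into rows
back_to_long = (
    wide_df.reset_index()
    .melt(id_vars="store", var_name="month", value_name="sales")
)

print(wide_df)
print(back_to_long)
```

If your data never has duplicate index/column pairs, DataFrame.pivot performs the same long-to-wide reshape without aggregating (it raises an error on duplicates instead).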
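When no values are missing, stack and unstack are inverses: stacking moves the column labels into an inner row-index level (turning a DataFrame into a Series), and unstacking that level restores the original frame. A minimal sketch using a small hand-built wide table with row index A and column labels B (the values are made up):

```python
import pandas as pd

# Hypothetical wide table: rows labelled by A, columns labelled by B
wide = pd.DataFrame(
    {"one": [1, 3], "two": [2, 4]},
    index=pd.Index(["bar", "foo"], name="A"),
)
wide.columns.name = "B"

# stack: the column labels ("one", "two") become an inner index level named B
stacked = wide.stack()

# unstack: move the B level back out into the columns
restored = stacked.unstack("B")

print(stacked)
print(restored)
```

When the frame does contain missing values, check the stacked result before relying on an exact round trip, since stack() has historically dropped missing values by default.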

Here’s some example code that demonstrates these techniques:

import pandas as pd
import numpy as np

# Create sample dataframe
df = pd.DataFrame({
    'A': ['foo', 'foo', 'bar', 'bar'],
    'B': ['one', 'two', 'one', 'two'],
    'C': [1, 2, 3, 4],
    'D': [5, 6, np.nan, 8]
})

# Pivot table to reshape data
pivot_df = pd.pivot_table(df, values='C', index='A', columns='B')

# Melt table to unpivot data
melt_df = pd.melt(df, id_vars=['A', 'B'], value_vars=['C', 'D'])

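# Fill missing values with the mean of each numeric column
# (numeric_only=True skips the string columns A and B; without it, pandas 2.x raises a TypeError)
filled_df = df.fillna(df.mean(numeric_only=True))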

# Drop rows with missing values
dropped_df = df.dropna()

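# Concatenate dataframes that share the same columns (stacking rows on top of each other)
extra_df = pd.DataFrame({'A': ['baz'], 'B': ['one'], 'C': [5], 'D': [9.0]})
concat_df = pd.concat([df, extra_df], ignore_index=True)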

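# Merge dataframes on column A
# (pivot_df keeps A as its index, so reset_index() turns it back into a column for the join)
merge_df = pd.merge(df, pivot_df.reset_index(), on='A')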

# Group data by column A and get mean of column C
grouped_df = df.groupby('A')['C'].mean()

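# Stack: move the column labels (one, two) into the row index, producing a Series
stacked_df = pivot_df.stack()

# Unstack: move the B level of the row index back into the columns
unstacked_df = stacked_df.unstack(level='B')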

In this example, we create a sample dataframe df with four columns: A, B, C, and D. We then use various pandas functions to reshape and manipulate the data in different ways.
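The example fills every numeric column with its own mean and drops any row containing a NaN. fillna and dropna also accept more targeted arguments; the sketch below uses a variant of the sample frame with an extra missing value (made-up data) to show a per-column fill dictionary, forward filling, and restricting dropna with subset and thresh.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the sample frame with gaps in both C and D
df = pd.DataFrame({
    "A": ["foo", "foo", "bar", "bar"],
    "C": [1.0, np.nan, 3.0, 4.0],
    "D": [5.0, 6.0, np.nan, 8.0],
})

# Use a different fill value for each column
per_column = df.fillna({"C": 0, "D": df["D"].mean()})

# Carry the last valid observation forward down each column
forward = df.ffill()

# Drop a row only when column D is missing
only_d = df.dropna(subset=["D"])

# Keep only rows with at least 3 non-missing values (across A, C, and D)
at_least_3 = df.dropna(thresh=3)
```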
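The merge in the example uses the default inner join. A small sketch with two hypothetical key/value tables shows how the how= parameter changes which keys survive, how indicator=True records where each row came from, and how concat with ignore_index=True stacks frames that share columns:

```python
import pandas as pd

# Hypothetical tables that overlap on keys "b" and "c"
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

# Inner join (the default): only keys present in both tables
inner = pd.merge(left, right, on="key")

# Outer join: keys from either table; the _merge column says
# whether each row came from the left, the right, or both
outer = pd.merge(left, right, on="key", how="outer", indicator=True)

# concat: append rows from a frame with the same columns,
# renumbering the index 0..n-1 instead of keeping duplicate labels
more = pd.DataFrame({"key": ["e"], "x": [5]})
stacked_rows = pd.concat([left, more], ignore_index=True)
```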
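The example computes a single aggregate per group. groupby(...).agg also supports named aggregation (available since pandas 0.25), where each keyword becomes an output column defined by an (input column, function) pair. The sketch below reuses columns A, B, and C from the sample dataframe:

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "bar", "bar"],
    "B": ["one", "two", "one", "two"],
    "C": [1, 2, 3, 4],
})

# Several aggregates per group, each written to a named output column
summary = df.groupby("A").agg(
    c_sum=("C", "sum"),
    c_mean=("C", "mean"),
    n_rows=("C", "count"),
)

# Grouping by two columns gives a result with a two-level (A, B) index
by_ab = df.groupby(["A", "B"])["C"].sum()

print(summary)
print(by_ab)
```

Calling by_ab.unstack() would then spread the B level into columns, which is another way to reach a wide table like the one pivot_table produces.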

Note that there are many other functions and techniques available in pandas for shaping and manipulating data, depending on your specific needs and goals.
