How to fill missing values by mean in each group with Python pandas?

To fill missing values with the mean of each group in pandas, combine groupby with transform to compute a group-wise mean for every row, then pass that result to fillna. Here’s an example:

import pandas as pd

# Create a sample DataFrame
data = {'Group': ['A', 'A', 'B', 'B', 'B'],
        'Value': [1, 2, 3, None, 5]}
df = pd.DataFrame(data)

# Calculate the group-wise mean
group_means = df.groupby('Group')['Value'].transform('mean')

# Fill missing values with the group-wise mean
df['Value'] = df['Value'].fillna(group_means)

# Display the updated DataFrame
print(df)

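In this example, we have a DataFrame df with two columns: ‘Group’ and ‘Value’. The ‘Value’ column contains one missing value, written as None in the input list. pandas stores it as NaN and makes the column floating point, which is why the values print as 1.0, 2.0, and so on.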

We start by calculating the group-wise mean using df.groupby('Group')['Value'].transform('mean'). This operation groups the DataFrame by the ‘Group’ column and calculates the mean of the ‘Value’ column within each group. The resulting series will have the same length as the original DataFrame and hold the mean values corresponding to each group.
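To make the "same length as the original DataFrame" point concrete, the short sketch below compares a plain aggregation with transform on the same sample data. The variable names per_group and per_row are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Group': ['A', 'A', 'B', 'B', 'B'],
                   'Value': [1, 2, 3, None, 5]})

# Aggregation: one value per group, indexed by the group label
per_group = df.groupby('Group')['Value'].mean()

# Transform: one value per original row, indexed like df
per_row = df.groupby('Group')['Value'].transform('mean')

print(per_group)
print(per_row)
```

The aggregated result has two entries (one for ‘A’, one for ‘B’), while the transformed result has five and shares the index of df, which is what lets it be passed straight to fillna.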

Next, we fill the missing values in the ‘Value’ column with the group-wise mean using df['Value'].fillna(group_means). Because fillna matches the two series by index, each missing entry receives the mean of its own group, and values that are already present are left unchanged.
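The same two steps can be wrapped in a small helper when several columns need filling. This is a minimal sketch, not a pandas built-in: the function name fill_with_group_mean and the extra ‘Score’ column are assumptions made up for illustration.

```python
import pandas as pd

def fill_with_group_mean(df, group_col, value_cols):
    """Return a copy of df with NaNs in value_cols replaced by group means.

    Hypothetical helper for illustration; not part of pandas.
    """
    out = df.copy()
    # One mean per row and per column, aligned with out's index
    means = out.groupby(group_col)[value_cols].transform('mean')
    # fillna aligns on both index and column labels
    out[value_cols] = out[value_cols].fillna(means)
    return out

# 'Score' is an invented second column to show multi-column filling
sample = pd.DataFrame({'Group': ['A', 'A', 'B', 'B', 'B'],
                       'Value': [1, 2, 3, None, 5],
                       'Score': [None, 10, 20, 30, None]})

result = fill_with_group_mean(sample, 'Group', ['Value', 'Score'])
print(result)
```

Working on a copy keeps the original DataFrame untouched, which makes it easier to compare the data before and after filling.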

Finally, we display the updated DataFrame using print(df).

After running this code, the missing value in group ‘B’ is filled with 4.0, the mean of that group's non-missing values (3 and 5), resulting in the following output:

  Group  Value
0     A    1.0
1     A    2.0
2     B    3.0
3     B    4.0
4     B    5.0

Now, all missing values in each group have been filled with the group-wise mean.
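One case worth checking is a group with no observed values at all: its mean is NaN, so the group-wise fill leaves those rows missing. The sketch below uses a different sample with an all-missing group ‘C’ and falls back to the overall column mean. The fallback is an assumption about what you might want, not the only reasonable choice.

```python
import pandas as pd

# Group 'C' has no observed values in this illustrative sample
df = pd.DataFrame({'Group': ['A', 'A', 'B', 'B', 'C'],
                   'Value': [1.0, None, 3.0, 5.0, None]})

group_means = df.groupby('Group')['Value'].transform('mean')

# First pass: fill from the group mean; the 'C' row stays NaN
filled = df['Value'].fillna(group_means)
still_missing = int(filled.isna().sum())

# Second pass (assumed fallback): use the overall mean of observed values
filled = filled.fillna(df['Value'].mean())

print(still_missing)
print(filled)
```

Counting the remaining NaNs after the first pass is a quick way to detect such groups before deciding how to handle them.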
