How to Convert a Categorical Variable to Binary in Python?

To convert a categorical variable to binary (dummy variables) in Python, you can use the pd.get_dummies() function from the pandas library. Here’s an example:

import pandas as pd

# Create a DataFrame with a categorical variable
data = {'Category': ['A', 'B', 'A', 'C', 'B']}
df = pd.DataFrame(data)

# Convert the categorical variable to binary
# (dtype=int yields 0/1 columns; pandas 2.0+ returns True/False by default)
binary_df = pd.get_dummies(df['Category'], dtype=int)

# Concatenate the original DataFrame with the binary representation
df_binary = pd.concat([df, binary_df], axis=1)

# Print the result
print(df_binary)

In this example, we have a DataFrame df with a single categorical variable “Category”. The variable has three unique categories: ‘A’, ‘B’, and ‘C’.

To convert the categorical variable to binary, we use the pd.get_dummies() function, passing the column containing the categorical variable (df['Category']) as an argument. This function returns a new DataFrame (binary_df) with one binary column per category.

We then use the pd.concat() function to concatenate the original DataFrame df with the binary representation (binary_df) along the columns (axis=1).

Finally, we print the resulting DataFrame (df_binary) that contains the original categorical variable and its binary representation.

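The output will be (on pandas 2.0 or later, the dummy columns show True/False instead of 1/0 unless dtype=int is passed to pd.get_dummies()):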

  Category  A  B  C
0        A  1  0  0
1        B  0  1  0
2        A  1  0  0
3        C  0  0  1
4        B  0  1  0

Here, the categorical variable “Category” has been converted to dummy variables. Each unique category gets its own column, named after the category (‘A’, ‘B’, ‘C’), which marks the rows where “Category” equals that value and is 0 elsewhere.
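
If the dummy columns are going into a linear model, one of them is redundant, because it can be inferred from the others. A minimal sketch of dropping it, assuming the same example data and using the drop_first parameter of pd.get_dummies() (dtype=int is passed so the columns hold 0/1 regardless of pandas version):

```python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'B', 'A', 'C', 'B']})

# drop_first=True omits the first category ('A');
# a row with 0 in every remaining column then stands for 'A'
reduced = pd.get_dummies(df['Category'], drop_first=True, dtype=int)
print(reduced)
```

Only columns ‘B’ and ‘C’ remain. Whether to drop a column depends on the model you plan to use, so treat this as an option rather than a default.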

You can adjust the column name and the DataFrame according to your specific use case.
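
For a DataFrame with several columns, pd.get_dummies() can also take the whole DataFrame and encode only the columns listed in its columns parameter, which avoids the separate pd.concat() step. The sketch below is illustrative only: the ‘Price’ column and its values are invented for the example.

```python
import pandas as pd

# Hypothetical data: 'Price' is a made-up numeric column for illustration
df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'C', 'B'],
    'Price': [10, 20, 15, 30, 25],
})

# Encode only 'Category'; other columns pass through unchanged.
# 'Category' is replaced by 'Category_A', 'Category_B', and 'Category_C'.
encoded = pd.get_dummies(df, columns=['Category'], dtype=int)
print(encoded)
```

Unlike the pd.concat() approach, this removes the original ‘Category’ column from the result, so keep a copy of df if you still need it.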
