How to Convert Strings to Categories or Integers in a Python Dataframe?

To convert strings to categories or integers in a Python DataFrame, you can use the pandas library. Here’s how you can do it:

  1. Convert Strings to Categories:
import pandas as pd

# Create a DataFrame with a column containing strings
data = {'Category': ['A', 'B', 'A', 'C', 'B']}
df = pd.DataFrame(data)

# Convert the column to a categorical data type
df['Category'] = df['Category'].astype('category')

# Print the DataFrame
print(df)

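In this example, the Category column is converted to the categorical data type by calling the astype() method with the argument 'category'. The printed values look the same as before; the change shows up in df.dtypes, which now reports category for that column.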

  2. Convert Strings to Integers:
import pandas as pd

# Create a DataFrame with a column containing strings
data = {'Numeric': ['10', '20', '30', '40', '50']}
df = pd.DataFrame(data)

# Convert the column to integers
df['Numeric'] = df['Numeric'].astype(int)

# Print the DataFrame
print(df)

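In this example, the Numeric column is converted to integers by calling the astype() method with the argument int. This only works when every value is a valid integer string such as '10'; a value like 'abc' makes the conversion raise an error.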
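If the strings are labels rather than numbers (like 'A', 'B', 'C' in the first example), astype(int) cannot parse them. A common alternative is to map each distinct label to an integer code. The sketch below uses made-up labels and shows two standard pandas routes: the codes of a categorical column (categories sorted alphabetically by default) and pd.factorize (codes assigned in order of first appearance).

```python
import pandas as pd

# Hypothetical labels; astype(int) would raise an error on these
labels = pd.Series(['B', 'A', 'B', 'C', 'A'])

# Option 1: categorical codes. Categories are sorted by default,
# so A -> 0, B -> 1, C -> 2
codes_from_category = labels.astype('category').cat.codes

# Option 2: pd.factorize numbers values in order of first appearance
# and returns the unique labels, so the mapping can be reversed later
codes_from_factorize, uniques = pd.factorize(labels)

print(codes_from_category.tolist())   # [1, 0, 1, 2, 0]
print(codes_from_factorize.tolist())  # [0, 1, 0, 2, 1]
print(list(uniques))                  # ['B', 'A', 'C']
```

The two approaches give different numbers for the same label, so pick one and use it consistently, for example when the same encoding must be applied to a second dataset.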

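Converting strings to categories can save memory and often speeds up operations such as grouping when a large column holds only a small set of distinct values: pandas stores each distinct string once and keeps a small integer code for every row. Converting numeric strings to integers is necessary when you want to perform arithmetic or numeric comparisons, because strings compare character by character (for example, '100' < '20' is True).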
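Both points can be checked directly. The sketch below builds a hypothetical column of 300,000 rows with only three distinct values and compares its memory footprint as plain Python strings (object dtype) and as a category, then shows why numeric strings need converting before doing arithmetic. Exact byte counts depend on the pandas version and platform, so they are printed rather than stated here.

```python
import pandas as pd

# Hypothetical data: many rows, only three distinct labels
as_object = pd.Series(['red', 'green', 'blue'] * 100_000, dtype=object)
as_category = as_object.astype('category')

# deep=True includes the size of the Python string objects themselves
object_bytes = as_object.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(f"object:   {object_bytes:,} bytes")
print(f"category: {category_bytes:,} bytes")

# On strings, sum() concatenates instead of adding
numbers_as_text = pd.Series(['10', '20', '30'], dtype=object)
print(numbers_as_text.sum())              # 102030
print(numbers_as_text.astype(int).sum())  # 60
```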

Before converting, make sure every value in the column can actually be converted. With astype(int), a single non-numeric or missing value raises an error, so invalid or missing entries need to be cleaned, replaced, or coerced first. Converting to categories is more forgiving: missing values simply stay missing and are not added as a category.
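One way to handle invalid or missing values is sketched below, assuming a hypothetical messy column. pd.to_numeric with errors='coerce' turns anything it cannot parse into NaN; from there you can either keep the gaps with pandas' nullable 'Int64' dtype or fill them with a default value chosen for illustration (0 here) before converting to a plain integer column.

```python
import pandas as pd

# Hypothetical messy column: valid numbers, a typo, a blank and a missing value
df = pd.DataFrame({'Numeric': ['10', '20', 'abc', '', None, '50']})

# astype(int) would raise here; errors='coerce' turns unparseable values into NaN
parsed = pd.to_numeric(df['Numeric'], errors='coerce')

# Inspect the rows that failed to parse before deciding how to treat them
bad_rows = df.loc[parsed.isna(), 'Numeric']

# Option A: keep the gaps as <NA> using the nullable integer dtype
df['Numeric_nullable'] = parsed.astype('Int64')

# Option B: replace the gaps with a default (0 is an arbitrary choice)
df['Numeric_filled'] = parsed.fillna(0).astype(int)

print(df)
```

Option A keeps the information that a value was missing, while Option B is simpler for downstream code that expects a plain NumPy integer column but treats bad input and a genuine 0 the same way.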