pandas dataframe convert column type to string or categorical

Question

How do I convert a single column of a pandas dataframe to type string? In the df of housing data below, I need to convert zipcode to string so that when I run linear regression, zipcode is treated as categorical and not numeric. Thanks!

    df = pd.DataFrame({
        'zipcode': {17384: 98125, 2680: 98107, 722: 98005, 18754: 98109, 14554: 98155},
        'bathrooms': {17384: 1.5, 2680: 0.75, 722: 3.25, 18754: 1.0, 14554: 2.5},
        'sqft_lot': {17384: 1650, 2680: 3700, 722: 51836, 18754: 2640, 14554: 9603},
        'bedrooms': {17384: 2, 2680: 2, 722: 4, 18754: 2, 14554: 4},
        'sqft_living': {17384: 1430, 2680: 1440, 722: 4670, 18754: 1130, 14554: 3180},
        'floors': {17384: 3.0, 2680: 1.0, 722: 2.0, 18754: 1.0, 14554: 2.0},
    })

    print(df)

           bathrooms  bedrooms  floors  sqft_living  sqft_lot  zipcode
    722         3.25         4     2.0         4670     51836    98005
    2680        0.75         2     1.0         1440      3700    98107
    14554       2.50         4     2.0         3180      9603    98155
    17384       1.50         2     3.0         1430      1650    98125
    18754       1.00         2     1.0         1130      2640    98109

User · Answer

To convert a column to a string type (which pandas stores as an object column), use astype:

    df.zipcode = df.zipcode.astype(str)

If you want a Categorical column instead, pass 'category' to the same method:

    df.zipcode = df.zipcode.astype('category')
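Since the question is about regression, here is a minimal sketch of how the converted column might be used, assuming a small stand-in frame rather than the full housing data. The variable names (`as_str`, `as_cat`, `design`) and the choice of `pd.get_dummies` with `drop_first=True` are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Small stand-in for the housing data in the question (assumed values).
df = pd.DataFrame({
    "zipcode": [98125, 98107, 98005, 98107],
    "sqft_living": [1430, 1440, 4670, 1130],
})

# As strings: each value becomes a Python str.
as_str = df["zipcode"].astype(str)

# As a categorical: the distinct values become the categories.
as_cat = df["zipcode"].astype("category")
print(list(as_cat.cat.categories))

# One-hot encode the categorical column so a regression sees indicator
# columns instead of a numeric zipcode. drop_first=True drops one level
# to avoid perfect collinearity with an intercept term.
design = pd.get_dummies(df.assign(zipcode=as_cat), columns=["zipcode"], drop_first=True)
print(design.columns.tolist())
```

Some modelling libraries handle a category column directly, so the explicit one-hot step may not be needed in every workflow.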

User · Answer

Prior answers focused on nominal (i.e. unordered) data. If there is a reason to impose an order, as with an ordinal variable, then one would use:

    # Transform to category
    df['zipcode_category'] = df['zipcode_category'].astype('category')

    # Add ordered category
    df['zipcode_ordered'] = df['zipcode_category']

    # Set up the ordering. Assign the result back: the inplace argument
    # is no longer available in recent pandas releases.
    df['zipcode_ordered'] = df['zipcode_ordered'].cat.set_categories(
        new_categories=[90211, 90210], ordered=True
    )

    # Output IDs
    df['zipcode_ordered_id'] = df['zipcode_ordered'].cat.codes
    print(df)

    #  zipcode_category zipcode_ordered  zipcode_ordered_id
    #            90210           90210                   1
    #            90211           90211                   0

More details on setting ordered categories can be found at the pandas website:

https://pandas.pydata.org/pandas-docs/stable/user_guide/categorical.html#sorting-and-order
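As an alternative sketch, the order can also be declared up front with `CategoricalDtype` and applied in a single `astype` call. The series `s` and the name `zip_order` are made up for illustration, and the ordering itself (90211 ranked below 90210) is arbitrary and simply mirrors the answer above.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

s = pd.Series([90210, 90211, 90210])

# Declare the category order explicitly: 90211 ranks below 90210 here.
zip_order = CategoricalDtype(categories=[90211, 90210], ordered=True)
ordered = s.astype(zip_order)

# cat.codes gives each value's position in the declared category list.
codes = ordered.cat.codes.tolist()
print(codes)

# With ordered=True, comparisons, min/max and sorting follow the declared
# order rather than the numeric order of the values.
above = (ordered > 90211).tolist()
print(ordered.min(), ordered.max())
print(ordered.sort_values().tolist())
```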

User · Answer

You need astype:

    df['zipcode'] = df.zipcode.astype(str)
    # equivalently: df.zipcode = df.zipcode.astype(str)

For converting to categorical:

    df['zipcode'] = df.zipcode.astype('category')
    # equivalently: df.zipcode = df.zipcode.astype('category')

Another solution is Categorical:

    df['zipcode'] = pd.Categorical(df.zipcode)

Sample with data:

    import pandas as pd

    df = pd.DataFrame({
        'zipcode': {17384: 98125, 2680: 98107, 722: 98005, 18754: 98109, 14554: 98155},
        'bathrooms': {17384: 1.5, 2680: 0.75, 722: 3.25, 18754: 1.0, 14554: 2.5},
        'sqft_lot': {17384: 1650, 2680: 3700, 722: 51836, 18754: 2640, 14554: 9603},
        'bedrooms': {17384: 2, 2680: 2, 722: 4, 18754: 2, 14554: 4},
        'sqft_living': {17384: 1430, 2680: 1440, 722: 4670, 18754: 1130, 14554: 3180},
        'floors': {17384: 3.0, 2680: 1.0, 722: 2.0, 18754: 1.0, 14554: 2.0},
    })

    print(df)

           bathrooms  bedrooms  floors  sqft_living  sqft_lot  zipcode
    722         3.25         4     2.0         4670     51836    98005
    2680        0.75         2     1.0         1440      3700    98107
    14554       2.50         4     2.0         3180      9603    98155
    17384       1.50         2     3.0         1430      1650    98125
    18754       1.00         2     1.0         1130      2640    98109

    print(df.dtypes)

    bathrooms      float64
    bedrooms         int64
    floors         float64
    sqft_living      int64
    sqft_lot         int64
    zipcode          int64
    dtype: object

    df['zipcode'] = df.zipcode.astype('category')

    print(df)

           bathrooms  bedrooms  floors  sqft_living  sqft_lot zipcode
    722         3.25         4     2.0         4670     51836   98005
    2680        0.75         2     1.0         1440      3700   98107
    14554       2.50         4     2.0         3180      9603   98155
    17384       1.50         2     3.0         1430      1650   98125
    18754       1.00         2     1.0         1130      2640   98109

    print(df.dtypes)

    bathrooms       float64
    bedrooms          int64
    floors          float64
    sqft_living       int64
    sqft_lot          int64
    zipcode        category
    dtype: object
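A short sketch comparing the two routes to a categorical column, assuming the same five zipcodes as the sample. The names `via_astype`, `via_constructor` and `restricted` are made up for illustration. The last part shows one case where the constructor differs in practice: passing an explicit `categories` list, where values outside the list become missing.

```python
import pandas as pd

df = pd.DataFrame({"zipcode": [98125, 98107, 98005, 98109, 98155]})

# Route 1: astype on the existing column.
via_astype = df["zipcode"].astype("category")

# Route 2: build the Categorical directly and wrap it in a Series.
via_constructor = pd.Series(pd.Categorical(df["zipcode"]), index=df.index)

# Both routes infer the same categories and the same integer codes.
print(via_astype.dtype == via_constructor.dtype)
print(via_astype.cat.codes.tolist() == via_constructor.cat.codes.tolist())

# With an explicit category list, values not in the list are treated
# as missing and get the code -1.
restricted = pd.Categorical(df["zipcode"], categories=[98005, 98107])
print(restricted.codes.tolist())
print(int(restricted.isna().sum()))
```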

User · Answer

With pandas >= 1.0 there is now a dedicated string datatype:

1) You can convert your column to this pandas string datatype using .astype('string'):

    df['zipcode'] = df['zipcode'].astype('string')

2) This is different from using str, which sets the pandas object datatype:

    df['zipcode'] = df['zipcode'].astype(str)

3) For changing into the categorical datatype use:

    df['zipcode'] = df['zipcode'].astype('category')

You can see this difference in datatypes when you look at the info of the dataframe:

    df = pd.DataFrame({
        'zipcode_str': [90210, 90211],
        'zipcode_string': [90210, 90211],
        'zipcode_category': [90210, 90211],
    })

    df['zipcode_str'] = df['zipcode_str'].astype(str)
    df['zipcode_string'] = df['zipcode_str'].astype('string')
    df['zipcode_category'] = df['zipcode_category'].astype('category')

    df.info()

    # you can see that the first column has dtype object
    # while the second column has the new dtype string
    # the third column has dtype category
     #   Column            Non-Null Count  Dtype
    ---  ------            --------------  -----
     0   zipcode_str       2 non-null      object
     1   zipcode_string    2 non-null      string
     2   zipcode_category  2 non-null      category
    dtypes: category(1), object(1), string(1)

From the docs:

The 'string' extension type solves several issues with object-dtype NumPy arrays:

1. You can accidentally store a mixture of strings and non-strings in an object dtype array. A StringArray can only store strings.
2. object dtype breaks dtype-specific operations like DataFrame.select_dtypes(). There isn't a clear way to select just text while excluding non-text, but still object-dtype columns.
3. When reading code, the contents of an object dtype array is less clear than string.

More info on working with the new string datatype can be found here: https://pandas.pydata.org/pandas-docs/stable/user_guide/text.html
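A minimal sketch of how the dedicated string dtype behaves, assuming pandas >= 1.0 and a made-up series with one missing zipcode. The names `s` and `lengths` are illustrative only.

```python
import pandas as pd

# Zipcodes stored with the dedicated string dtype; None marks a missing value.
s = pd.Series(["90210", "90211", None], dtype="string")

# The dtype compares equal to its "string" alias.
print(s.dtype == "string")

# The missing entry is tracked as a missing value, not as the text "None".
print(s.isna().tolist())

# .str methods on a string column return nullable results, so the missing
# value propagates instead of forcing a float conversion.
lengths = s.str.len()
print(lengths.dtype)
```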
