How to check the dtype of a column in Python pandas

Question

I need to use different functions to treat numeric columns and string columns. What I am doing now is really dumb:

    allc = list((agg.loc[:, (agg.dtypes==np.float64)|(agg.dtypes==np.int)]).columns)
    for y in allc:
        treat_numeric(agg[y])

    allc = list((agg.loc[:, (agg.dtypes!=np.float64)&(agg.dtypes!=np.int)]).columns)
    for y in allc:
        treat_str(agg[y])

Is there a more elegant way to do this? E.g.

    for y in agg.columns:
        if(dtype(agg[y]) == 'string'):
              treat_str(agg[y])
        elif(dtype(agg[y]) != 'string'):
              treat_numeric(agg[y])

User · Accepted Answer

You can access the data-type of a column with dtype:

    for y in agg.columns:
        if(agg[y].dtype == np.float64 or agg[y].dtype == np.int64):
              treat_numeric(agg[y])
        else:
              treat_str(agg[y])
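A minimal runnable sketch of the comparison above, assuming a small made-up DataFrame in place of the asker's `agg` (the column names and values are hypothetical). It also shows that the comparison is exact: an int32 column is not matched by `np.int64` and falls through to the non-numeric branch.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's `agg` DataFrame.
agg = pd.DataFrame({
    "price": np.array([1.5, 2.5], dtype=np.float64),
    "count": np.array([3, 4], dtype=np.int64),
    "small": np.array([5, 6], dtype=np.int32),
    "name": ["a", "b"],
})

numeric_cols, other_cols = [], []
for y in agg.columns:
    # Exact dtype comparison, as in the answer above.
    if agg[y].dtype == np.float64 or agg[y].dtype == np.int64:
        numeric_cols.append(y)
    else:
        other_cols.append(y)

print("numeric:", numeric_cols)
print("other:", other_cols)
```

If narrower integer or float columns should count as numeric too, each dtype has to be listed explicitly with this approach.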

User · Answer

To pretty print the column data types

To check the data types after, for example, an import from a file:

    def printColumnInfo(df):
        template = "%-8s %-30s %s"
        print(template % ("Type", "Column Name", "Example Value"))
        print("-" * 53)
        for c in df.columns:
            print(template % (df[c].dtype, c, df[c].iloc[1]))

Illustrative output:

    Type     Column Name                    Example Value
    -----------------------------------------------------
    int64    Age                            49
    object   Attrition                      No
    object   BusinessTravel                 Travel_Frequently
    float64  DailyRate                      279.0
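As a possible alternative to the hand-written loop, the same overview can be sketched with `df.dtypes`. This is only an illustration: the column names follow the illustrative output above, while the second-row values are made up.

```python
import pandas as pd

# Column names taken from the illustrative output; second-row values are hypothetical.
df = pd.DataFrame({
    "Age": [49, 37],
    "Attrition": ["No", "Yes"],
    "DailyRate": [279.0, 1373.0],
})

# One row per column: its dtype (as text) and the value from the first data row.
summary = pd.DataFrame({
    "Type": df.dtypes.astype(str),
    "Example Value": df.iloc[0],
})
print(summary)
```

Note that this sketch takes the example value from the first row (`iloc[0]`), so it also works on a one-row frame.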

User · Answer

If you want to mark the type of a dataframe column as a string, you can do:

    df['A'].dtype.kind

An example:

    In [8]: df = pd.DataFrame([[1,'a',1.2],[2,'b',2.3]])
    In [9]: df[0].dtype.kind, df[1].dtype.kind, df[2].dtype.kind
    Out[9]: ('i', 'O', 'f')

The answer for your code:

    for y in agg.columns:
        if(agg[y].dtype.kind == 'f' or agg[y].dtype.kind == 'i'):
              treat_numeric(agg[y])
        else:
              treat_str(agg[y])

Note:

- uint and UInt are of kind u, not kind i.
- Consider the dtype introspection utility functions, e.g. pd.api.types.is_integer_dtype.
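To illustrate the note about unsigned integers, here is a small sketch that treats kinds `i`, `u` and `f` as numeric. The DataFrame and its column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with signed, unsigned, float and text columns.
df = pd.DataFrame({
    "signed": np.array([1, 2], dtype=np.int64),
    "unsigned": np.array([1, 2], dtype=np.uint8),
    "real": np.array([1.2, 2.3], dtype=np.float64),
    "text": ["a", "b"],
})

kinds = {col: df[col].dtype.kind for col in df.columns}

# Checking membership in "iuf" covers signed ints, unsigned ints and floats.
numeric = [col for col, kind in kinds.items() if kind in "iuf"]

# The introspection helper accepts both signed and unsigned integer columns.
unsigned_is_integer = pd.api.types.is_integer_dtype(df["unsigned"])

print(kinds)
print(numeric, unsigned_is_integer)
```

A check against `'i'` alone would have sent the `unsigned` column to the string branch.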

User · Answer

I know this is a bit of an old thread, but with pandas 0.19.2 you can do:

    df.select_dtypes(include=['float64']).apply(your_function)
    df.select_dtypes(exclude=['string','object']).apply(your_other_function)

http://pandas.pydata.org/pandas-docs/version/0.19.2/generated/pandas.DataFrame.select_dtypes.html
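A small sketch of the same idea on a made-up DataFrame (the column names and the doubling function are hypothetical). I am not sure how the 'string' selector behaves in every pandas version, so this sketch excludes 'number' to get the non-numeric columns instead.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one float, one integer and one text column.
df = pd.DataFrame({
    "x": np.array([1.0, 2.0], dtype=np.float64),
    "n": np.array([1, 2], dtype=np.int64),
    "s": ["a", "b"],
})

# Select only float64 columns and apply a function column by column.
floats = df.select_dtypes(include=["float64"])
doubled = floats.apply(lambda col: col * 2)

# Everything that is not numeric (here: the text column).
non_numeric = df.select_dtypes(exclude=["number"])

print(doubled)
print(non_numeric)
```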

User · Answer

The question title is general, but the author's use case stated in the body of the question is specific, so any of the other answers may be used.

But to fully answer the title question, it should be clarified that all of the approaches may fail in some cases and require some rework. I reviewed all of them (and some additional ones), ordered from least to most reliable (in my opinion):

1. Comparing types directly via == (accepted answer).

Despite the fact that this is the accepted answer and has the most upvotes, I think this method should not be used at all, because this approach is discouraged in Python, as mentioned several times here. But if one still wants to use it, one should be aware of some pandas-specific dtypes like pd.CategoricalDtype, pd.PeriodDtype, or pd.IntervalDtype. Here one has to use an extra type() call in order to recognize the dtype correctly:

    s = pd.Series([pd.Period('2002-03','D'), pd.Period('2012-02-01', 'D')])
    s
    s.dtype == pd.PeriodDtype   # Not working
    type(s.dtype) == pd.PeriodDtype # working

    >>> 0    2002-03-01
    >>> 1    2012-02-01
    >>> dtype: period[D]
    >>> False
    >>> True

Another caveat here is that the type has to be specified precisely:

    s = pd.Series([1,2])
    s
    s.dtype == np.int64 # Working
    s.dtype == np.int32 # Not working

    >>> 0    1
    >>> 1    2
    >>> dtype: int64
    >>> True
    >>> False

2. isinstance() approach.

This method has not been mentioned in the answers so far.

So if comparing types directly is not a good idea, let's try the built-in Python function for this purpose, namely isinstance(). It fails right at the beginning, because it assumes that we have some objects, but a pd.Series or pd.DataFrame may be used as just an empty container with a predefined dtype and no objects in it:

    s = pd.Series([], dtype=bool)
    s

    >>> Series([], dtype: bool)

But suppose one somehow overcomes this issue and accesses each object, for example in the first row, and checks its type with something like this:

    df = pd.DataFrame({'int': [12, 2], 'dt': [pd.Timestamp('2013-01-02'), pd.Timestamp('2016-10-20')]},
                      index = ['A', 'B'])
    for col in df.columns:
        df[col].dtype, 'is_int64 = %s' % isinstance(df.loc['A', col], np.int64)

    >>> (dtype('int64'), 'is_int64 = True')
    >>> (dtype('<M8[ns]'), 'is_int64 = False')

It will be misleading in the case of mixed types of data in a single column:

    df2 = pd.DataFrame({'data': [12, pd.Timestamp('2013-01-02')]},
                      index = ['A', 'B'])
    for col in df2.columns:
        df2[col].dtype, 'is_int64 = %s' % isinstance(df2.loc['A', col], np.int64)

    >>> (dtype('O'), 'is_int64 = False')

And last but not least, this method cannot directly recognize the Category dtype. As stated in the docs:

    "Returning a single item from categorical data will also return the value, not a categorical of length "1"."

    df['int'] = df['int'].astype('category')
    for col in df.columns:
        df[col].dtype, 'is_int64 = %s' % isinstance(df.loc['A', col], np.int64)

    >>> (CategoricalDtype(categories=[2, 12], ordered=False), 'is_int64 = True')
    >>> (dtype('<M8[ns]'), 'is_int64 = False')

So this method is also almost inapplicable.

3. df.dtype.kind approach.

This method may work with an empty pd.Series or pd.DataFrame, but it has other problems.

First, it is unable to distinguish some dtypes:

    df = pd.DataFrame({'prd'  :[pd.Period('2002-03','D'), pd.Period('2012-02-01', 'D')],
                       'str'  :['s1', 's2'],
                       'cat'  :[1, -1]})
    df['cat'] = df['cat'].astype('category')
    for col in df:
        # kind will define all columns as 'Object'
        print (df[col].dtype, df[col].dtype.kind)

    >>> period[D] O
    >>> object O
    >>> category O

Second, and this is still unclear to me, it even returns None on some dtypes.

4. df.select_dtypes approach.

This is almost what we want. This method is designed inside pandas, so it handles most of the corner cases mentioned earlier: empty DataFrames, and telling numpy dtypes apart from pandas-specific ones. It works well with a single dtype like .select_dtypes('bool'). It may even be used for selecting groups of columns based on dtype:

    test = pd.DataFrame({'bool' :[False, True], 'int64':[-1,2], 'int32':[-1,2],'float': [-2.5, 3.4],
                         'compl':np.array([1-1j, 5]),
                         'dt'   :[pd.Timestamp('2013-01-02'), pd.Timestamp('2016-10-20')],
                         'td'   :[pd.Timestamp('2012-03-02')- pd.Timestamp('2016-10-20'),
                                  pd.Timestamp('2010-07-12')- pd.Timestamp('2000-11-10')],
                         'prd'  :[pd.Period('2002-03','D'), pd.Period('2012-02-01', 'D')],
                         'intrv':pd.arrays.IntervalArray([pd.Interval(0, 0.1), pd.Interval(1, 5)]),
                         'str'  :['s1', 's2'],
                         'cat'  :[1, -1],
                         'obj'  :[[1,2,3], [5435,35,-52,14]]
                        })
    test['int32'] = test['int32'].astype(np.int32)
    test['cat'] = test['cat'].astype('category')

Like so, as stated in the docs:

    test.select_dtypes('number')

    >>>     int64   int32   float   compl   td
    >>> 0      -1      -1   -2.5    (1-1j)  -1693 days
    >>> 1       2       2    3.4    (5+0j)   3531 days

One may think that here we see the first unexpected result (as it used to be for me: question): timedelta is included in the output DataFrame. But as answered to the contrary, it should be so; one just has to be aware of it. Note that the bool dtype is skipped, which may also be undesired for someone, but this is because bool and number are in different "subtrees" of the numpy dtypes. In the bool case, we may use test.select_dtypes(['bool']) here.

The next restriction of this method is that for the current version of pandas (0.24.2), the code test.select_dtypes('period') will raise NotImplementedError.

Another thing is that it is unable to distinguish strings from other objects:

    test.select_dtypes('object')

    >>>     str     obj
    >>> 0    s1     [1, 2, 3]
    >>> 1    s2     [5435, 35, -52, 14]

But this is, first, already mentioned in the docs. And second, it is not a problem of this method but rather of the way strings are stored in a DataFrame. Either way, this case needs some post-processing.

5. pd.api.types.is_XXX_dtype approach.

This one is intended to be the most robust and native way to achieve dtype recognition (the path of the module where the functions reside speaks for itself), as I suppose. And it works almost perfectly, but it still has at least one caveat, and one still has to somehow distinguish string columns.

Besides, this may be subjective, but this approach also has more "human-understandable" processing of the number dtypes group compared with .select_dtypes('number'):

    for col in test.columns:
        if pd.api.types.is_numeric_dtype(test[col]):
            print (test[col].dtype)

    >>> bool
    >>> int64
    >>> int32
    >>> float64
    >>> complex128

No timedelta, and bool is included. Perfect.

My pipeline exploits exactly this functionality at this moment, plus a bit of manual post-processing.

I hope I was able to argue the main point: all of the discussed approaches may be used, but only pd.DataFrame.select_dtypes() and pd.api.types.is_XXX_dtype should really be considered as the applicable ones.
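One possible shape for the "post-processing" mentioned above, as a rough sketch only. It combines `is_bool_dtype` and `is_numeric_dtype` with `pandas.api.types.infer_dtype` to tell string columns apart from other non-numeric ones. The `classify` helper and its labels are my own hypothetical names, and the frame is a trimmed-down version of `test`.

```python
import numpy as np
import pandas as pd
from pandas.api.types import infer_dtype, is_bool_dtype, is_numeric_dtype

# Trimmed-down, hypothetical version of the `test` frame from the answer.
test = pd.DataFrame({
    "bool": [False, True],
    "int64": np.array([-1, 2], dtype=np.int64),
    "float": [-2.5, 3.4],
    "td": [pd.Timedelta(days=1), pd.Timedelta(days=2)],
    "str": ["s1", "s2"],
    "obj": [[1, 2, 3], [5435, 35, -52, 14]],
})


def classify(col):
    """Hypothetical helper: return a coarse label for a column."""
    # Check bool first, because is_numeric_dtype also accepts bool.
    if is_bool_dtype(col):
        return "bool"
    if is_numeric_dtype(col):
        return "numeric"
    # infer_dtype inspects the values, so it can separate real strings
    # from other Python objects stored in the same kind of column.
    if infer_dtype(col, skipna=True) == "string":
        return "string"
    return "other"


labels = {name: classify(test[name]) for name in test.columns}
print(labels)
```

Because `infer_dtype` looks at the actual values, it can be slow on very large object columns, so it is only used here as the last resort.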

User · Answer

In pandas 0.20.2 you can do:

    from pandas.api.types import is_string_dtype
    from pandas.api.types import is_numeric_dtype

    is_string_dtype(df['A'])
    >>>> True

    is_numeric_dtype(df['B'])
    >>>> True

So your code becomes:

    for y in agg.columns:
        if (is_string_dtype(agg[y])):
            treat_str(agg[y])
        elif (is_numeric_dtype(agg[y])):
            treat_numeric(agg[y])
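A self-contained sketch of that loop, assuming a tiny made-up `agg` frame. The bodies of `treat_str` and `treat_numeric` are hypothetical placeholders, since the question does not say what they do.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype, is_string_dtype


def treat_str(col):
    # Hypothetical string handler: upper-case every value.
    return col.str.upper()


def treat_numeric(col):
    # Hypothetical numeric handler: centre the column on its mean.
    return col - col.mean()


# Hypothetical stand-in for the asker's `agg` DataFrame.
agg = pd.DataFrame({"A": ["x", "y"], "B": [1.0, 3.0]})

result = {}
for y in agg.columns:
    if is_string_dtype(agg[y]):
        result[y] = treat_str(agg[y])
    elif is_numeric_dtype(agg[y]):
        result[y] = treat_numeric(agg[y])

print(result["A"].tolist())
print(result["B"].tolist())
```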
