Can pandas automatically recognize dates?

Question

Today I was positively surprised by the fact that while reading data from a data file (for example) pandas is able to recognize types of values:

    df = pandas.read_csv('test.dat', delimiter=r"\s+", names=['col1', 'col2', 'col3'])

For example, it can be checked in this way:

    for i, r in df.iterrows():
        print(type(r['col1']), type(r['col2']), type(r['col3']))

In particular, integers, floats and strings were recognized correctly. However, I have a column that has dates in the following format: 2013-6-4. These dates were recognized as strings (not as Python date objects). Is there a way to "teach" pandas to recognize dates?

User · Answer

The pandas.read_csv() method is great for parsing dates. Complete documentation at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.parsers.read_csv.html

You can even have the different date parts in different columns and pass the parameter:

    parse_dates : boolean, list of ints or names, list of lists, or dict
        If True -> try parsing the index.
        If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.
        If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column.
        {'foo' : [1, 3]} -> parse columns 1, 3 as date and call result 'foo'

The default sensing of dates works great, but it seems to be biased towards North American date formats. If you live elsewhere you might occasionally be caught out by the results. As far as I can remember, 1/6/2000 means 6 January in the USA, as opposed to 1 June where I live. It is smart enough to swing them around if dates like 23/6/2000 are used. It is probably safer to stay with YYYYMMDD variations of dates, though. Apologies to the pandas developers here, but I have not tested it with local dates recently.

You can use the date_parser parameter to pass a function to convert your format:

    date_parser : function
        Function to use for converting a sequence of string columns to an array of datetime instances. The default uses dateutil.parser.parser to do the conversion.
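A minimal sketch of the day/month ambiguity described above, assuming a small in-memory CSV (the column name `when` and the sample values are made up for illustration). By default the string 01/06/2000 is read month-first; the `dayfirst=True` option of `read_csv` flips that. Zero-padded values are used so that the format inference in newer pandas versions has an unambiguous pattern to work with.

```python
import io

import pandas as pd

# Hypothetical sample data: the same strings read US-style and day-first.
csv_text = "when,value\n01/06/2000,1\n02/06/2000,2\n"

# Default: month first, so 01/06/2000 becomes 6 January 2000.
us_style = pd.read_csv(io.StringIO(csv_text), parse_dates=["when"])

# dayfirst=True: 01/06/2000 becomes 1 June 2000.
day_first = pd.read_csv(io.StringIO(csv_text), parse_dates=["when"], dayfirst=True)

print(us_style["when"].tolist())
print(day_first["when"].tolist())
```

If your data is day-first, passing `dayfirst=True` (or an explicit format) is more predictable than relying on the default guess.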

User · Answer

Yes, according to the pandas.read_csv documentation: "Note: A fast-path exists for iso8601-formatted dates."

So if your CSV has a column named datetime and the dates look like 2013-01-01T01:01, for example, running this will make pandas (I'm on v0.19.2) pick up the date and time automatically:

    df = pd.read_csv('test.csv', parse_dates=['datetime'])

Note that you need to explicitly pass parse_dates; it doesn't work without it.

Verify with:

    df.dtypes

You should see that the datatype of the column is datetime64[ns].
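A small sketch of the point about passing parse_dates explicitly, assuming an in-memory CSV with a hypothetical `datetime` column of ISO 8601 strings. The exact resolution shown by `dtypes` (datetime64[ns] in older versions) may differ in newer pandas releases, so the check only asks whether the column is some datetime64 type.

```python
import io

import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Hypothetical CSV with ISO 8601 timestamps in a column named "datetime".
csv_text = "datetime,value\n2013-01-01T01:01,1\n2013-01-02T13:45,2\n"

# Without parse_dates the column is left as text.
plain = pd.read_csv(io.StringIO(csv_text))

# With parse_dates the ISO strings become a datetime64 column.
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["datetime"])

print(plain.dtypes)
print(parsed.dtypes)
```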

User · Answer

In addition to what the other replies said, if you have to parse very large files with hundreds of thousands of timestamps, date_parser can prove to be a huge performance bottleneck, as it's a Python function called once per row. You can get a sizeable performance improvement by instead keeping the dates as text while parsing the CSV file and then converting the entire column into dates in one go:

    # For a data column
    df = pd.read_csv(infile, parse_dates={'mydatetime': ['date', 'time']})

    df['mydatetime'] = pd.to_datetime(df['mydatetime'], exact=True, cache=True, format='%Y-%m-%d %H:%M:%S')

    # For a DatetimeIndex
    df = pd.read_csv(infile, parse_dates={'mydatetime': ['date', 'time']}, index_col='mydatetime')

    df.index = pd.to_datetime(df.index, exact=True, cache=True, format='%Y-%m-%d %H:%M:%S')

    # For a MultiIndex
    df = pd.read_csv(infile, parse_dates={'mydatetime': ['date', 'time']}, index_col=['mydatetime', 'num'])

    idx_mydatetime = df.index.get_level_values(0)
    idx_num = df.index.get_level_values(1)
    idx_mydatetime = pd.to_datetime(idx_mydatetime, exact=True, cache=True, format='%Y-%m-%d %H:%M:%S')
    df.index = pd.MultiIndex.from_arrays([idx_mydatetime, idx_num])

For my use case on a file with 200k rows (one timestamp per row), that cut down processing time from about a minute to less than a second.
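A self-contained sketch contrasting the two patterns described above, assuming a tiny hypothetical CSV in place of a large file: the timestamps are read as plain strings, then converted either row by row with `datetime.strptime` or in one vectorized `pd.to_datetime` call. Both give the same values; the speed difference only shows up on large inputs, so time it on your own data.

```python
from datetime import datetime
import io

import pandas as pd

FMT = "%Y-%m-%d %H:%M:%S"

# Hypothetical input; in practice this would be a large file.
csv_text = (
    "mydatetime,value\n"
    "2016-10-15 00:00:43,1\n"
    "2016-10-15 00:00:56,2\n"
    "2016-10-15 00:01:12,3\n"
)

# Read the timestamps as plain text: no date parsing during read_csv.
df = pd.read_csv(io.StringIO(csv_text), dtype={"mydatetime": str})

# Slow pattern: one Python strptime call per row.
per_row = df["mydatetime"].map(lambda s: datetime.strptime(s, FMT))

# Fast pattern: convert the whole column in one vectorized call.
df["mydatetime"] = pd.to_datetime(df["mydatetime"], format=FMT, exact=True, cache=True)

print(df.dtypes)
```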

User · Answer

When merging two columns into a single datetime column, the accepted answer generates an error (pandas version 0.20.3), since the columns are sent to the date_parser function separately.

The following works:

    from datetime import datetime

    def dateparse(d, t):
        dt = d + " " + t
        return datetime.strptime(dt, '%d/%m/%Y %H:%M:%S')

    df = pd.read_csv(infile, parse_dates={'datetime': ['date', 'time']}, date_parser=dateparse)
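Another way to combine separate date and time columns, sketched under the assumption that both are read as text: concatenate them yourself and convert once with `pd.to_datetime`. This avoids a per-row date_parser, and since newer pandas releases (2.x) deprecate both date_parser and the list-of-lists/dict forms of parse_dates, it should keep working there. The column names `date` and `time` and the day-first format mirror the answer above; the sample rows are made up.

```python
import io

import pandas as pd

# Hypothetical file with separate day-first date and time columns.
csv_text = (
    "date,time,value\n"
    "04/06/2013,09:30:00,1\n"
    "05/06/2013,17:45:10,2\n"
)

# Keep both columns as strings while reading.
df = pd.read_csv(io.StringIO(csv_text), dtype={"date": str, "time": str})

# Join them with a space, then parse the whole column in one go.
df["datetime"] = pd.to_datetime(df["date"] + " " + df["time"], format="%d/%m/%Y %H:%M:%S")
df = df.drop(columns=["date", "time"])

print(df)
```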

User · Answer

You could use pandas.to_datetime(), as recommended in the documentation for pandas.read_csv(): "If a column or index contains an unparseable date, the entire column or index will be returned unaltered as an object data type. For non-standard datetime parsing, use pd.to_datetime after pd.read_csv."

Demo:

    >>> D = {'date': '2013-6-4'}
    >>> df = pd.DataFrame(D, index=[0])
    >>> df
           date
    0  2013-6-4
    >>> df.dtypes
    date    object
    dtype: object
    >>> df['date'] = pd.to_datetime(df.date, format='%Y-%m-%d')
    >>> df
            date
    0 2013-06-04
    >>> df.dtypes
    date    datetime64[ns]
    dtype: object
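The quoted documentation says a single unparseable value leaves the whole column as text. When converting afterwards with `pd.to_datetime`, the default `errors="raise"` would fail on such a value, whereas `errors="coerce"` turns only the bad entries into NaT. A small sketch with made-up values:

```python
import pandas as pd

# Hypothetical column: two valid non-padded dates and one bad value.
s = pd.Series(["2013-6-4", "not a date", "2013-12-31"])

# errors="coerce" converts what it can and marks the rest as NaT.
dates = pd.to_datetime(s, format="%Y-%m-%d", errors="coerce")

print(dates)
```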

User · Answer

You should add parse_dates=True, or parse_dates=['column name'] when reading; that's usually enough to magically parse it. But there are always weird formats which need to be defined manually. In such a case you can also add a date parser function, which is the most flexible way possible.

Suppose you have a column 'datetime' with your string, then:

    from datetime import datetime
    dateparse = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S')

    df = pd.read_csv(infile, parse_dates=['datetime'], date_parser=dateparse)

This way you can even combine multiple columns into a single datetime column. This merges a 'date' and a 'time' column into a single 'datetime' column:

    dateparse = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S')

    df = pd.read_csv(infile, parse_dates={'datetime': ['date', 'time']}, date_parser=dateparse)

You can find directives (i.e. the letters to be used for different formats) for strptime and strftime on this page.
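pandas 2.0 added a `date_format` parameter to `read_csv` and deprecated `date_parser`. A sketch that handles a non-standard format on either side of that change, assuming a hypothetical format `%Y%m%d-%H%M` and made-up data; the version check is an illustrative assumption about how you might support both old and new pandas, not something the answer above prescribes.

```python
import io

import pandas as pd

FMT = "%Y%m%d-%H%M"  # hypothetical "weird" format, e.g. 20130604-1405

csv_text = "datetime,value\n20130604-1405,1\n20130605-0830,2\n"

# pandas >= 2.0: date_format; older versions: a date_parser callable.
major = int(pd.__version__.split(".")[0])
if major >= 2:
    kwargs = {"date_format": FMT}
else:
    # Vectorized parser: pandas passes the column values, to_datetime converts them at once.
    kwargs = {"date_parser": lambda values: pd.to_datetime(values, format=FMT)}

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["datetime"], **kwargs)
print(df.dtypes)
```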

User · Answer

While loading a CSV file that contains a date column, there are two approaches to make pandas recognize the date column:

1. Pandas recognizes the format explicitly, via the argument date_parser=mydateparser
2. Pandas recognizes the format implicitly, via the argument infer_datetime_format=True

Some of the date column data:

    01/01/18
    01/02/18

Here we don't know which of the first two fields is the month and which is the day. So in this case we have to use Method 1 and pass the format explicitly.

Method 1 - Explicitly pass the format:

    from datetime import datetime

    mydateparser = lambda x: datetime.strptime(x, "%m/%d/%y")
    df = pd.read_csv(file_name, parse_dates=['date_col_name'], date_parser=mydateparser)

Method 2 - Implicitly or automatically recognize the format:

    df = pd.read_csv(file_name, parse_dates=['date_col_name'], infer_datetime_format=True)

User · Answer

If performance matters to you, make sure you time it:

    import sys
    import timeit
    import pandas as pd

    print('Python %s on %s' % (sys.version, sys.platform))
    print('Pandas version %s' % pd.__version__)

    repeat = 3
    numbers = 100

    def time(statement, _setup=None):
        # Best of `repeat` runs, each executing the statement `numbers` times
        print(min(
            timeit.Timer(statement, setup=_setup or setup).repeat(
                repeat, numbers)))

    print("Format %m/%d/%y")
    setup = """import pandas as pd
    import io
    from datetime import datetime

    data = io.StringIO('''\
    ProductCode,Date
    ''' + '''\
    x1,07/29/15
    x2,07/29/15
    x3,07/29/15
    x4,07/30/15
    x5,07/29/15
    x6,07/29/15
    x7,07/29/15
    y7,08/05/15
    x8,08/05/15
    z3,08/05/15
    ''' * 100)"""

    time('pd.read_csv(data); data.seek(0)')
    time('pd.read_csv(data, parse_dates=["Date"]); data.seek(0)')
    time('pd.read_csv(data, parse_dates=["Date"],'
         'infer_datetime_format=True); data.seek(0)')
    time('pd.read_csv(data, parse_dates=["Date"],'
         'date_parser=lambda x: datetime.strptime(x, "%m/%d/%y")); data.seek(0)')

    print("Format %Y-%m-%d %H:%M:%S")
    setup = """import pandas as pd
    import io
    from datetime import datetime

    data = io.StringIO('''\
    ProductCode,Date
    ''' + '''\
    x1,2016-10-15 00:00:43
    x2,2016-10-15 00:00:56
    x3,2016-10-15 00:00:56
    x4,2016-10-15 00:00:12
    x5,2016-10-15 00:00:34
    x6,2016-10-15 00:00:55
    x7,2016-10-15 00:00:06
    y7,2016-10-15 00:00:01
    x8,2016-10-15 00:00:00
    z3,2016-10-15 00:00:02
    ''' * 1000)"""

    time('pd.read_csv(data); data.seek(0)')
    time('pd.read_csv(data, parse_dates=["Date"]); data.seek(0)')
    time('pd.read_csv(data, parse_dates=["Date"],'
         'infer_datetime_format=True); data.seek(0)')
    time('pd.read_csv(data, parse_dates=["Date"],'
         'date_parser=lambda x: datetime.strptime(x, "%Y-%m-%d %H:%M:%S")); data.seek(0)')

prints:

    Python 3.7.1 (v3.7.1:260ec2c36a, Oct 20 2018, 03:13:28)
    [Clang 6.0 (clang-600.0.57)] on darwin
    Pandas version 0.23.4
    Format %m/%d/%y
    0.19123052499999993
    8.20691274
    8.143124389
    1.2384357139999977
    Format %Y-%m-%d %H:%M:%S
    0.5238807110000039
    0.9202787830000005
    0.9832778819999959
    12.002349824999996

So with ISO 8601-formatted dates (%Y-%m-%d %H:%M:%S is apparently treated as ISO 8601; I guess the T can be dropped and replaced by a space), you should not specify infer_datetime_format (which apparently does not make a difference with more common formats either), and passing your own parser in just cripples performance. On the other hand, date_parser does make a difference with less standard day formats. Be sure to time before you optimize, as usual.

User · Answer

Perhaps the pandas interface has changed since Rutger answered, but in the version I'm using (0.15.2), the date_parser function receives a list of dates instead of a single value. In this case, his code should be updated like so:

    from datetime import datetime

    dateparse = lambda dates: [datetime.strptime(d, '%Y-%m-%d %H:%M:%S') for d in dates]

    df = pd.read_csv(infile, parse_dates=['datetime'], date_parser=dateparse)
