[csv] Convert a dta file to csv without Stata software

In Python, one can use statsmodels.iolib.foreign.genfromdta to read Stata datasets. In addition, there is also a wrapper of the aforementioned function which can be used to read a Stata file directly from the web: statsmodels.datasets.webuse.

Nevertheless, both of the above rely on the use of the pandas.io.stata.StataReader.data, which is now a legacy function and has been deprecated. As such, the new pandas.read_stata function should now always be used instead.

According to the source file of stata.py, as of version 0.23.0, the following are supported:

Stata data file versions:

  • 104
  • 105
  • 108
  • 111
  • 113
  • 114
  • 115
  • 117
  • 118

Valid encodings:

  • ascii
  • us-ascii
  • latin-1
  • latin_1
  • iso-8859-1
  • iso8859-1
  • 8859
  • cp819
  • latin
  • latin1
  • L1

As others have noted, the pandas.to_csv function can then be used to save the file into disk. A related function numpy.savetxt can also save the data as a text file.


EDIT:

The following details come from help dtaversion in Stata 15.1:

        Stata version     .dta file format
        ----------------------------------------
               1               102
            2, 3               103
               4               104
               5               105
               6               108
               7            110 and 111
            8, 9            112 and 113
          10, 11               114
              12               115
              13               117
              14 and 15        118 (# of variables <= 32,767)
              15               119 (# of variables > 32,767, Stata/MP only)
        ----------------------------------------
        file formats 103, 106, 107, 109, and 116
        were never used in any official release.