How to iterate over columns of pandas dataframe to run regression

Question

I'm sure this is simple, but as a complete newbie to Python, I'm having trouble figuring out how to iterate over variables in a pandas DataFrame and run a regression with each.

Here's what I'm doing:

    all_data = {}
    for ticker in ['FIUIX', 'FSAIX', 'FSAVX', 'FSTMX']:
        all_data[ticker] = web.get_data_yahoo(ticker, '1/1/2010', '1/1/2015')

    prices = DataFrame({tic: data['Adj Close'] for tic, data in all_data.iteritems()})
    returns = prices.pct_change()

I know I can run a regression like this:

    regs = sm.OLS(returns.FIUIX, returns.FSTMX).fit()

but suppose I want to do this for each column in the DataFrame. In particular, I want to regress FIUIX on FSTMX, and then FSAIX on FSTMX, and then FSAVX on FSTMX. After each regression I want to store the residuals.

I've tried various versions of the following, but I must be getting the syntax wrong:

    resids = {}
    for k in returns.keys():
        reg = sm.OLS(returns[k], returns.FSTMX).fit()
        resids[k] = reg.resid

I think the problem is that I don't know how to refer to the returns column by key, so returns[k] is probably wrong.

Any guidance on the best way to do this would be much appreciated. Perhaps there's a common pandas approach I'm missing.

User · Answer

A workaround is to transpose the DataFrame and iterate over the rows:

    for column_name, column in df.transpose().iterrows():
        print(column_name)
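A minimal sketch of the transpose approach, using a small made-up DataFrame (the column names and values are assumptions for illustration). It also shows one thing worth watching for: transposing a frame with mixed numeric dtypes converts the values to a common dtype.

```python
import pandas as pd

# Hypothetical frame with one integer and one float column.
df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})

# Each "row" of the transposed frame is one original column.
for column_name, column in df.transpose().iterrows():
    print(column_name, column.tolist())

# Transposing puts all values into shared columns, so the integer
# column "a" comes back as float here.
first_name, first_col = next(df.transpose().iterrows())
```

If the original dtypes matter, iterating with df.items() avoids the conversion.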

User · Answer

This answer shows how to iterate over selected columns as well as all columns in a DataFrame.

df.columns gives an Index containing all the column names in the DataFrame. That isn't very helpful if you want to iterate over all the columns, but it comes in handy when you want to iterate over columns of your choosing only.

We can use Python's slicing to slice df.columns according to our needs. For example, to iterate over all columns but the first one, we can do:

    for column in df.columns[1:]:
        print(df[column])

Similarly, to iterate over all the columns in reversed order, we can do:

    for column in df.columns[::-1]:
        print(df[column])

We can iterate over the columns in a lot of ways using this technique. Also remember that you can get the positions of all columns easily using:

    for ind, column in enumerate(df.columns):
        print(ind, column)
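As a small illustration of slicing df.columns, the sketch below uses a made-up three-column DataFrame (names and values are assumptions) and computes something per selected column instead of printing it.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

all_but_first = list(df.columns[1:])    # ['b', 'c']
reversed_cols = list(df.columns[::-1])  # ['c', 'b', 'a']

# Sum of each selected column, keyed by column name.
sums = {column: int(df[column].sum()) for column in df.columns[1:]}
```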

User · Answer

I landed on this question as I was looking for a clean iterator of columns only (Series, no names).

Unless I am mistaken, there is no such thing, which, if true, is a bit annoying. In particular, one would sometimes like to assign a few individual columns (Series) to variables, e.g.:

    x, y = df[['x', 'y']]  # does not work: assigns the column names, not the Series

There is df.items(), which gets close, but it gives an iterator of tuples (column_name, column_series). Interestingly, there is a corresponding df.keys(), which returns df.columns, i.e. the column names as an Index, so a, b = df[['x', 'y']].keys() properly assigns a = 'x' and b = 'y'. But there is no corresponding df.values(), and for good reason: df.values is a property that returns the underlying NumPy array.

One (inelegant) way is to do:

    x, y = (v for _, v in df[['x', 'y']].items())

but it's less Pythonic than I'd like.
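The difference between unpacking a DataFrame directly and unpacking via items() can be checked on a toy frame. This is only a sketch; the data is made up for illustration.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"x": [1, 2], "y": [3, 4], "z": [5, 6]})

# Iterating a DataFrame yields column labels, so this unpacks names.
a, b = df[["x", "y"]]

# Unpacking the Series themselves via items().
x, y = (series for _, series in df[["x", "y"]].items())
```

For just two or three columns, plain indexing such as x, y = df["x"], df["y"] does the same thing and may read more clearly.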

User · Answer

You can use items() (called iteritems() in older pandas versions; iteritems() was removed in pandas 2.0):

    for name, values in df.items():
        print('{name}: {value}'.format(name=name, value=values[0]))
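A runnable sketch of iterating over (name, Series) pairs, on made-up data. It uses .iloc[0] to take the first value by position, which is an assumption about intent: values[0] looks the value up by index label and only coincides with the first row when the index starts at 0.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"a": [10, 20], "b": [30, 40]})

first_values = {}
for name, values in df.items():
    # values is the column as a Series; name is its label.
    first_values[name] = int(values.iloc[0])
    print(f"{name}: {values.iloc[0]}")
```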

User · Answer

You can index DataFrame columns by position using iloc (older pandas versions also offered ix for this, but ix was removed in pandas 1.0).

    df1.iloc[:, 1]

This returns the column at position 1 (position 0 is the first column).

    df1.iloc[0, :]

This returns the first row.

    df1.iloc[0, 1]

This is the value at the intersection of row 0 and column 1, and so on. So you can enumerate() returns.keys() and use the number to index the DataFrame.
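A short sketch of positional indexing combined with enumerate(), using iloc on a made-up frame (df1's contents and the row labels are assumptions). String row labels are used to make it clear that the integers are positions, not labels.

```python
import pandas as pd

# Hypothetical example data with non-integer row labels.
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["r0", "r1"])

second_column = df1.iloc[:, 1]  # column at position 1
first_row = df1.iloc[0, :]      # row at position 0
cell = df1.iloc[0, 1]           # row 0, column 1

# Use the enumerate() counter as the positional column index.
by_position = {i: df1.iloc[:, i].tolist() for i, _ in enumerate(df1.keys())}
```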

User · Answer

Using a list comprehension, you can get all the column names (the header):

    [column for column in df]

User · Answer

Based on the accepted answer, if an index corresponding to each column is also desired:

    for i, column in enumerate(df):
        print(i, df[column])

Here df[column] is a Series, which can simply be converted into a NumPy ndarray:

    import numpy as np

    for i, column in enumerate(df):
        print(i, np.asarray(df[column]))
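The same loop can collect the arrays instead of printing them. The sketch below uses made-up data and Series.to_numpy(), which for a plain numeric column gives the same ndarray as np.asarray.

```python
import numpy as np
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})

arrays = []
for i, column in enumerate(df):
    # i is the column position, column is the label.
    arr = df[column].to_numpy()
    arrays.append((i, column, arr))
```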

User · Answer

I'm a bit late, but here's how I did this. The steps:

1. Create a list of all columns.
2. Use itertools to take x combinations.
3. Append each resulting R-squared value to a result DataFrame, along with the list of excluded columns.
4. Sort the result DataFrame in descending order of R-squared to see which is the best fit.

This is the code I used on a DataFrame called aft_tmt. Feel free to extrapolate to your use case.

    import itertools

    import pandas as pd
    import statsmodels.formula.api as smf

    # Set options to print without truncating output.
    pd.set_option("display.max_columns", None)
    pd.set_option("display.max_colwidth", None)

    # Get the column names of the DataFrame and remove the columns
    # that should not be used as predictors.
    itercols = aft_tmt.columns.tolist()
    itercols.remove("sc97")
    itercols.remove("sc")
    itercols.remove("grc")
    itercols.remove("grc97")
    print(itercols)
    print(len(itercols))

    # Collect one result row per combination.
    rows = []

    # Change 9 to the number of columns you want to combine from N columns.
    # Possibly run an outer loop from 0 to N/2?
    for x in itertools.combinations(itercols, 9):
        lmstr = "+".join(x)
        f = smf.ols(formula="sc ~ " + lmstr, data=aft_tmt).fit()
        excluded = "+".join(y for y in itercols if y not in x)
        rows.append([f.rsquared, lmstr, excluded])

    # Results DataFrame, best fit first.
    regression_res = pd.DataFrame(rows, columns=["Rsq", "predictors", "excluded"])
    regression_res = regression_res.sort_values(by="Rsq", ascending=False)
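Since aft_tmt is not available here, the following is a self-contained sketch of the same combination search on synthetic data. The frame, its column names (x1 to x4), the coefficients, and the combination size of 2 are all assumptions made for illustration; only the response name sc is borrowed from the answer.

```python
import itertools

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for aft_tmt: "sc" depends on x1 and x2 only.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(200, 4)), columns=["x1", "x2", "x3", "x4"])
data["sc"] = 2.0 * data["x1"] - 1.0 * data["x2"] + rng.normal(scale=0.1, size=200)

predictors = [c for c in data.columns if c != "sc"]

rows = []
for combo in itertools.combinations(predictors, 2):
    # Fit sc on this pair of predictors and record its R-squared.
    formula = "sc ~ " + " + ".join(combo)
    fit = smf.ols(formula=formula, data=data).fit()
    excluded = "+".join(c for c in predictors if c not in combo)
    rows.append({"Rsq": fit.rsquared, "predictors": "+".join(combo), "excluded": excluded})

# Best fit first.
results = pd.DataFrame(rows).sort_values(by="Rsq", ascending=False).reset_index(drop=True)
```

Collecting rows in a list and building the DataFrame once avoids growing a DataFrame inside the loop. Note that the number of combinations grows quickly with the number of columns, so an exhaustive search like this is only practical for small column counts.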

User · Answer

    for column in df:
        print(df[column])
