Adding calculated column s to a dataframe in pandas

Question

I have an OHLC price data set  that I have parsed from CSV into a Pandas dataframe and resampled to 15 min bars    lt class  pandas core frame DataFrame  gt  DatetimeIndex  500047 entries  1998-05-04 04 45 00 to 2012-08-07 00 15 00 Freq  15T Data columns  Close    363152  non-null values High     363152  non-null values Low      363152  non-null values Open     363152  non-null values dtypes  float64 4    I would like to add various calculated columns  starting with simple ones such as period Range  H-L  and then booleans to indicate the occurrence of price patterns that I will define - e g  a hammer candle pattern  for which a sample definition   def closed in top half of range h l c       return c  gt  l    h-l  2  def lower wick o l c       return min o c -l  def real body o c       return abs c-o   def lower wick at least twice real body o l c       return lower wick o l c   gt   2   real body o c   def is hammer row       return lower wick at least twice real body row  Open   row  Low   row  Close          and closed in top half of range row  High   row  Low   row  Close      Basic problem  how do I map the function to the column  specifically where I would like to reference more than one other column or the whole row or whatever    This post deals with adding two calculated columns off of a single source column  which is close  but not quite it   And slightly more advanced  for price patterns that are determined with reference to more than a single bar  T   how can I reference different rows  e g  T-1  T-2 etc   from within the function definition

User · Answer

The first four functions you list will work on vectors as well, with the exception that lower_wick needs to be adapted. Something like this,

def lower_wick_vec(o, l, c):
    min_oc = numpy.where(o > c, c, o)
    return min_oc - l

where o, l and c are vectors. You could do it this way instead which just takes the df as input and avoid using numpy, although it will be much slower:

def lower_wick_df(df):
    min_oc = df[['Open', 'Close']].min(axis=1)
    return min_oc - l

The other three will work on columns or vectors just as they are. Then you can finish off with

def is_hammer(df):
    lw = lower_wick_at_least_twice_real_body(df["Open"], df["Low"], df["Close"]) 
    cl = closed_in_top_half_of_range(df["High"], df["Low"], df["Close"])
    return cl & lw

Bit operators can perform set logic on boolean vectors, & for and, | for or etc. This is enough to completely vectorize the sample calculations you gave and should be relatively fast. You could probably speed up even more by temporarily working with the numpy arrays underlying the data while performing these calculations.

For the second part, I would recommend introducing a column indicating the pattern for each row and writing a family of functions which deal with each pattern. Then groupby the pattern and apply the appropriate function to each group.

User · Answer

The exact code will vary for each of the columns you want to do  but it s likely you ll want to use the map and apply functions   In some cases you can just compute using the existing columns directly  since the columns are Pandas Series objects  which also work as Numpy arrays  which automatically work element-wise for usual mathematical operations    gt  gt  gt  d     A   B  C 0  11  13  5 1   6   7  4 2   8   3  6 3   4   8  7 4   0   1  7  gt  gt  gt   d A   d B    d C 0    4 800000 1    3 250000 2    1 833333 3    1 714286 4    0 142857  gt  gt  gt  d A  gt  d C 0     True 1     True 2     True 3    False 4    False   If you need to use operations like max and min within a row  you can use apply with axis 1 to apply any function you like to each row   Here s an example that computes min A  B -C  which seems to be like your  lower wick     gt  gt  gt  d apply lambda row  min  row  A    row  B    -row  C    axis 1  0    6 1    2 2   -3 3   -3 4   -7   Hopefully that gives you some idea of how to proceed   Edit  to compare rows against neighboring rows  the simplest approach is to slice the columns you want to compare  leaving off the beginning end  and then compare the resulting slices   For instance  this will tell you for which rows the element in column A is less than the next row s element in column C   d  A    -1   lt  d  C   1     and this does it the other way  telling you which rows have A less than the preceding row s C   d  A   1    lt  d  C    -1    Doing   A    -1  slices off the last element of column A  and doing   C   1   slices off the first element of column C  so when you line these two up and compare them  you re comparing each element in A with the C from the following row

User · Answer

For the second part of your question  you can also use shift  for example   df  t-1     df  t   shift 1    t-1 would then contain the values from t one row above   http   pandas pydata org pandas-docs stable generated pandas DataFrame shift html

User · Answer

You could have is hammer in terms of row  Open   etc  as follows  def is hammer rOpen rLow rClose rHigh       return lower wick at least twice real body rOpen rLow rClose           and closed in top half of range rHigh rLow rClose    Then you can use map   df  isHammer     map is hammer  df  Open    df  Low    df  Close    df  High

[python] Adding calculated column(s) to a dataframe in pandas

Examples related to python

Examples related to pandas