Normalize columns of pandas data frame

Question

I have a dataframe in pandas where each column has a different value range. For example:

df:

    A     B   C
    1000  10  0.5
    765   5   0.35
    800   7   0.09

Any idea how I can normalize the columns of this dataframe so that each value is between 0 and 1?

My desired output is:

    A      B    C
    1      1    1
    0.765  0.5  0.7
    0.8    0.7  0.18   (which is 0.09/0.5)

User · Answer

One easy way is to use pandas directly. Here I use mean normalization:

    normalized_df = (df - df.mean()) / df.std()

To use min-max normalization instead:

    normalized_df = (df - df.min()) / (df.max() - df.min())

Edit: To address some concerns, note that pandas automatically applies these functions column-wise in the code above.
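To make the difference between the two formulas concrete, here is a minimal sketch that applies both to the sample data from the question. The variable names `z_scored` and `min_maxed` are just illustrative.

```python
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({
    "A": [1000, 765, 800],
    "B": [10, 5, 7],
    "C": [0.5, 0.35, 0.09],
})

# Standardization (z-score): each column ends up with mean 0 and sample std 1.
z_scored = (df - df.mean()) / df.std()

# Min-max scaling: each column is mapped onto the interval [0, 1].
min_maxed = (df - df.min()) / (df.max() - df.min())

print(z_scored)
print(min_maxed)
```

Only the min-max version guarantees values between 0 and 1. The z-score version produces negative values for anything below the column mean.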

User · Answer

You can do this in one line:

    DF_test = DF_test.sub(DF_test.mean(axis=0), axis=1) / DF_test.mean(axis=0)

This takes the mean of each column, subtracts it from every value in that column (each column mean is subtracted only from its own column), and then divides by that same column mean. What we get is the normalized data set.
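As a quick check of what this one-liner produces, the sketch below runs it on two columns from the question. The name `relative` is only illustrative. The results are relative deviations from the column mean, so they are centred on 0 and are not confined to the 0 to 1 range the question asks for.

```python
import pandas as pd

df = pd.DataFrame({"A": [1000, 765, 800], "B": [10, 5, 7]})

# Subtract each column's mean, then divide by that same mean.
relative = df.sub(df.mean(axis=0), axis=1) / df.mean(axis=0)

print(relative)
print(relative.mean())  # approximately 0 for every column
```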

User · Answer

You can simply use the pandas.DataFrame.transform function in this way:

    df.transform(lambda x: x / x.max())
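Dividing by the column maximum is what the desired output in the question appears to use. A small sketch on the question's data, assuming all values are non-negative (the name `scaled` is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1000, 765, 800],
    "B": [10, 5, 7],
    "C": [0.5, 0.35, 0.09],
})

# Divide every column by its own maximum, so the largest value becomes 1.
scaled = df.transform(lambda col: col / col.max())

print(scaled.round(3))
```

Unlike min-max scaling, this does not force the smallest value in each column to 0.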

User · Answer

This is how you do it column-wise using a list comprehension:

    [df[col].update((df[col] - df[col].min()) / (df[col].max() - df[col].min())) for col in df.columns]

User · Answer

You might want to normalize some columns and leave the others unchanged, as in regression tasks where the data labels or categorical columns must stay as they are. So I suggest this pythonic way (it's a combination of @shg's and @Cina's answers):

    features_to_normalize = ['A', 'B', 'C']
    # could be ['A', 'B']

    df[features_to_normalize] = df[features_to_normalize].apply(lambda x: (x - x.min()) / (x.max() - x.min()))

User · Answer

If your data is positively skewed, a log transformation is often a good way to normalize it:

    df = np.log10(df)
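Two caveats are worth a sketch: a logarithm is undefined at 0, and its output is not limited to the 0 to 1 range. The example below works around both, assuming non-negative data, by adding 1 before taking the log and applying min-max scaling afterwards. The column name `income` and the +1 shift are illustrative choices, not part of the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical positively skewed column that also contains a zero.
df = pd.DataFrame({"income": [0.0, 9.0, 99.0, 999.0]})

# Shift by 1 so that 0 maps to log10(1) = 0 instead of -inf.
logged = np.log10(df + 1)

# Rescale the logged values onto [0, 1].
rescaled = (logged - logged.min()) / (logged.max() - logged.min())

print(rescaled)
```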

User · Answer

Detailed Example of Normalization Methods

- Pandas normalization (unbiased)
- Sklearn normalization (biased)
- Does biased-vs-unbiased affect machine learning?
- Min-max scaling

References: Wikipedia: Unbiased Estimation of Standard Deviation

Example Data

    import pandas as pd
    df = pd.DataFrame({
        'A': [1, 2, 3],
        'B': [100, 300, 500],
        'C': list('abc')
    })
    print(df)

       A    B  C
    0  1  100  a
    1  2  300  b
    2  3  500  c

Normalization using pandas (gives unbiased estimates)

When normalizing, we simply subtract the mean and divide by the standard deviation.

    df.iloc[:, 0:-1] = df.iloc[:, 0:-1].apply(lambda x: (x - x.mean()) / x.std(), axis=0)
    print(df)

         A    B  C
    0 -1.0 -1.0  a
    1  0.0  0.0  b
    2  1.0  1.0  c

Normalization using sklearn (gives biased estimates, different from pandas)

If you do the same thing with sklearn you will get DIFFERENT output!

    import pandas as pd
    from sklearn.preprocessing import StandardScaler

    scaler = StandardScaler()

    df = pd.DataFrame({
        'A': [1, 2, 3],
        'B': [100, 300, 500],
        'C': list('abc')
    })
    df.iloc[:, 0:-1] = scaler.fit_transform(df.iloc[:, 0:-1].to_numpy())
    print(df)

              A         B  C
    0 -1.224745 -1.224745  a
    1  0.000000  0.000000  b
    2  1.224745  1.224745  c

Do sklearn's biased estimates make machine learning less powerful?

NO.

The official documentation of sklearn.preprocessing.scale states that using a biased estimator is UNLIKELY to affect the performance of machine learning algorithms, so we can safely use it.

From the official documentation:

    We use a biased estimator for the standard deviation, equivalent to numpy.std(x, ddof=0). Note that the choice of ddof is unlikely to affect model performance.

What about MinMax scaling?

There is no standard deviation calculation in MinMax scaling, so the result is the same in both pandas and scikit-learn.

    import pandas as pd
    df = pd.DataFrame({
        'A': [1, 2, 3],
        'B': [100, 300, 500],
    })
    (df - df.min()) / (df.max() - df.min())

         A    B
    0  0.0  0.0
    1  0.5  0.5
    2  1.0  1.0

    # Using sklearn
    from sklearn.preprocessing import MinMaxScaler

    scaler = MinMaxScaler()
    arr_scaled = scaler.fit_transform(df)

    print(arr_scaled)
    [[0.  0. ]
     [0.5 0.5]
     [1.  1. ]]

    df_scaled = pd.DataFrame(arr_scaled, columns=df.columns, index=df.index)
    print(df_scaled)

         A    B
    0  0.0  0.0
    1  0.5  0.5
    2  1.0  1.0
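The difference between the two standardization results comes down to the `ddof` argument. The sketch below reproduces both sets of numbers with pandas alone, by switching `DataFrame.std` from its default `ddof=1` to the `ddof=0` mentioned in the quoted documentation. The names `z_sample` and `z_population` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [100, 300, 500]})

# pandas default: sample standard deviation (ddof=1).
z_sample = (df - df.mean()) / df.std()

# Population standard deviation (ddof=0), the estimator described
# in the quoted scikit-learn documentation.
z_population = (df - df.mean()) / df.std(ddof=0)

print(z_sample)      # -1.0, 0.0, 1.0 in each column
print(z_population)  # about -1.224745, 0.0, 1.224745 in each column
```

With n rows the two results differ by a constant factor of sqrt(n / (n - 1)), which shrinks toward 1 as the dataset grows.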

User · Answer

You can use the package sklearn and its associated preprocessing utilities to normalize the data.

    import pandas as pd
    from sklearn import preprocessing

    x = df.values  # returns a numpy array
    min_max_scaler = preprocessing.MinMaxScaler()
    x_scaled = min_max_scaler.fit_transform(x)
    df = pd.DataFrame(x_scaled)

For more information, look at the scikit-learn documentation on preprocessing data: scaling features to a range.

User · Answer

Simple is Beautiful:

    df["A"] = df["A"] / df["A"].max()
    df["B"] = df["B"] / df["B"].max()
    df["C"] = df["C"] / df["C"].max()

User · Answer

Your problem is actually a simple transform acting on the columns:

    def f(s):
        return s / s.max()

    frame.apply(f, axis=0)

Or even more terse:

    frame.apply(lambda x: x / x.max(), axis=0)

User · Answer

I think that a better way to do that in pandas is just:

    df = df / df.max().astype(np.float64)

Edit: If negative numbers are present in your data frame, you should use instead:

    df = df / df.loc[df.abs().idxmax()].astype(np.float64)
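For data containing negative numbers, one alternative is to divide each column by its largest absolute value, which maps every column into [-1, 1] while preserving signs. The sketch below shows this max-abs scaling on made-up data. It is a swapped-in technique for illustration, not the exact expression from the answer.

```python
import pandas as pd

# Made-up data with negative values.
df = pd.DataFrame({"A": [-4.0, 2.0, 1.0], "B": [10.0, -5.0, 20.0]})

# Divide each column by its largest magnitude; signs are preserved.
scaled = df / df.abs().max()

print(scaled)
```

If the result must lie in [0, 1] rather than [-1, 1], min-max scaling is the usual choice.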

User · Answer

Based on this post: https://stats.stackexchange.com/questions/70801/how-to-normalize-data-to-0-1-range

You can do the following:

    def normalize(df):
        result = df.copy()
        for feature_name in df.columns:
            max_value = df[feature_name].max()
            min_value = df[feature_name].min()
            result[feature_name] = (df[feature_name] - min_value) / (max_value - min_value)
        return result

You don't need to worry about whether your values are negative or positive, and the values should be nicely spread out between 0 and 1.
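One edge case the min-max formula does not cover is a constant column: max minus min is 0, so the division yields NaN. The sketch below shows one possible guard, mapping constant columns to 0. The helper name `normalize_safe` and that mapping are assumptions made for illustration.

```python
import pandas as pd

def normalize_safe(df):
    """Min-max scale each column; constant columns become 0 (an assumed convention)."""
    col_min = df.min()
    col_range = df.max() - col_min
    # Replace a zero range with 1 so constant columns give 0 instead of NaN.
    col_range = col_range.replace(0, 1)
    return (df - col_min) / col_range

# Column A mixes negative and positive values; column B is constant.
df = pd.DataFrame({"A": [-5.0, 0.0, 5.0], "B": [3.0, 3.0, 3.0]})
result = normalize_safe(df)
print(result)
```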

User · Answer

    def normalize(x):
        try:
            x = x / np.linalg.norm(x, ord=1)
            return x
        except:
            raise

    data = pd.DataFrame.apply(data, normalize)

According to the pandas documentation, a DataFrame can apply an operation (function) to itself:

    DataFrame.apply(func, axis=0, broadcast=False, raw=False, reduce=None, args=(), **kwds)

    Applies function along input axis of DataFrame.

    Objects passed to functions are Series objects having index either the DataFrame's index (axis=0) or the columns (axis=1). Return type depends on whether passed function aggregates, or the reduce argument if the DataFrame is empty.

You can apply a custom function to operate on the DataFrame.
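To show what dividing by the L1 norm does, here is a small self-contained sketch on made-up data. After scaling, the absolute values in each column sum to 1. The individual values need not span the whole 0 to 1 range, and they stay negative where the input was negative.

```python
import numpy as np
import pandas as pd

# Made-up data; column A contains a negative value.
data = pd.DataFrame({"A": [1.0, -2.0, 1.0], "B": [10.0, 30.0, 60.0]})

# Divide each column by its L1 norm (the sum of absolute values).
l1_scaled = data.apply(lambda col: col / np.linalg.norm(col, ord=1))

print(l1_scaled)
print(l1_scaled.abs().sum())  # approximately 1 for every column
```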

User · Answer

The solution given by Sandman and Praveen works very well. The only problem is that if you have categorical variables in other columns of your data frame, this method will need some adjustments.

My solution to this type of issue is the following:

    from sklearn import preprocessing

    # Put the numerical columns side by side (axis=1) before scaling.
    x = pd.concat([df.Numerical1, df.Numerical2, df.Numerical3], axis=1)
    min_max_scaler = preprocessing.MinMaxScaler()
    x_scaled = min_max_scaler.fit_transform(x)
    x_new = pd.DataFrame(x_scaled)
    # Re-attach the categorical column next to the scaled columns.
    df = pd.concat([df.Categoricals, x_new], axis=1)

User · Answer

The following function calculates the Z score:

    def standardization(dataset):
        """ Standardization of numeric fields, where all values will have mean of zero
        and standard deviation of one. (z-score)

        Args:
          dataset: A `Pandas.Dataframe`
        """
        dtypes = list(zip(dataset.dtypes.index, map(str, dataset.dtypes)))
        # Normalize numeric columns.
        for column, dtype in dtypes:
            if dtype == 'float32':
                dataset[column] -= dataset[column].mean()
                dataset[column] /= dataset[column].std()
        return dataset
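The function above only touches columns whose dtype is exactly float32, so the default float64 and integer columns pass through unchanged. A possible variant, sketched below, selects every numeric column with `select_dtypes` and works on a copy. The helper name `standardize_numeric` is hypothetical.

```python
import pandas as pd

def standardize_numeric(dataset):
    """Return a copy with every numeric column z-scored (sample std, ddof=1)."""
    result = dataset.copy()
    numeric_cols = result.select_dtypes(include="number").columns
    result[numeric_cols] = (
        result[numeric_cols] - result[numeric_cols].mean()
    ) / result[numeric_cols].std()
    return result

df = pd.DataFrame({"A": [1, 2, 3], "B": [100.0, 300.0, 500.0], "C": list("abc")})
standardized = standardize_numeric(df)
print(standardized)
```

The non-numeric column C is left as it is, and the input frame is not modified.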

User · Answer

If you like using the sklearn package, you can keep the column and index names by using pandas loc like so:

    from sklearn.preprocessing import MinMaxScaler

    scaler = MinMaxScaler()
    scaled_values = scaler.fit_transform(df)
    df.loc[:, :] = scaled_values

User · Answer

It is only simple mathematics. The answer should be as simple as below:

    normed_df = (df - df.min()) / (df.max() - df.min())

User · Answer

Pandas does column-wise normalization by default. Try the code below:

    X = pd.read_csv('data.csv')
    X = (X - X.min()) / (X.max() - X.min())

The output values will be in the range 0 to 1.

User · Answer

    df_normalized = df / df.max(axis=0)

User · Answer

You can create a list of the columns that you want to normalize:

    from sklearn import preprocessing

    min_max_scaler = preprocessing.MinMaxScaler()  # the scaler used below

    column_names_to_normalize = ['A', 'E', 'G', 'sadasdsd', 'lol']
    x = df[column_names_to_normalize].values
    x_scaled = min_max_scaler.fit_transform(x)
    df_temp = pd.DataFrame(x_scaled, columns=column_names_to_normalize, index=df.index)
    df[column_names_to_normalize] = df_temp

Your pandas DataFrame is now normalized only in the columns you want.

However, if you want the opposite, that is, to select a list of columns that you DON'T want to normalize, you can simply create a list of all columns and remove the undesired ones:

    column_names_to_not_normalize = ['B', 'J', 'K']
    column_names_to_normalize = [x for x in list(df) if x not in column_names_to_not_normalize]
