pandas dataframe columns scaling with sklearn

Question

I have a pandas dataframe with mixed type columns, and I'd like to apply sklearn's min_max_scaler to some of the columns. Ideally, I'd like to do these transformations in place, but haven't figured out a way to do that yet. I've written the following code that works:

    import pandas as pd
    import numpy as np
    from sklearn import preprocessing

    scaler = preprocessing.MinMaxScaler()

    dfTest = pd.DataFrame({'A': [14.00, 90.20, 90.95, 96.27, 91.21],
                           'B': [103.02, 107.26, 110.35, 114.23, 114.68],
                           'C': ['big', 'small', 'big', 'small', 'small']})
    min_max_scaler = preprocessing.MinMaxScaler()

    def scaleColumns(df, cols_to_scale):
        for col in cols_to_scale:
            df[col] = pd.DataFrame(min_max_scaler.fit_transform(pd.DataFrame(dfTest[col])), columns=[col])
        return df

    dfTest

           A       B      C
    0  14.00  103.02    big
    1  90.20  107.26  small
    2  90.95  110.35    big
    3  96.27  114.23  small
    4  91.21  114.68  small

    scaled_df = scaleColumns(dfTest, ['A', 'B'])
    scaled_df

              A         B      C
    0  0.000000  0.000000    big
    1  0.926219  0.363636  small
    2  0.935335  0.628645    big
    3  1.000000  0.961407  small
    4  0.938495  1.000000  small

I'm curious if this is the preferred/most efficient way to do this transformation. Is there a way I could use df.apply that would be better?

I'm also surprised I can't get the following code to work:

    bad_output = min_max_scaler.fit_transform(dfTest['A'])

If I pass an entire dataframe to the scaler it works:

    dfTest2 = dfTest.drop('C', axis=1)
    good_output = min_max_scaler.fit_transform(dfTest2)
    good_output

I'm confused why passing a series to the scaler fails. In my full working code above I had hoped to just pass a series to the scaler and then set the dataframe column equal to the scaled series. I've seen this question asked in a few other places, but haven't found a good answer. Any help understanding what's going on here would be greatly appreciated!

User · Answer

Like this?

    dfTest = pd.DataFrame({
               'A': [14.00, 90.20, 90.95, 96.27, 91.21],
               'B': [103.02, 107.26, 110.35, 114.23, 114.68],
               'C': ['big', 'small', 'big', 'small', 'small']
             })
    dfTest[['A', 'B']] = dfTest[['A', 'B']].apply(
                               lambda x: MinMaxScaler().fit_transform(x))
    dfTest

              A         B      C
    0  0.000000  0.000000    big
    1  0.926219  0.363636  small
    2  0.935335  0.628645    big
    3  1.000000  0.961407  small
    4  0.938495  1.000000  small
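With `apply`, each column reaches the lambda as a 1-D Series, and recent scikit-learn releases reject 1-D input with a ValueError. A minimal sketch of an `apply` variant that should still run, assuming a current pandas and scikit-learn; the helper name `scale_column` is hypothetical and not part of either library:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

dfTest = pd.DataFrame({
    "A": [14.00, 90.20, 90.95, 96.27, 91.21],
    "B": [103.02, 107.26, 110.35, 114.23, 114.68],
    "C": ["big", "small", "big", "small", "small"],
})

def scale_column(col):
    """Hypothetical helper: min-max scale one Series and return a Series."""
    # to_frame() turns the 1-D Series into a 2-D single-column DataFrame,
    # which is the shape the scaler expects.
    scaled = MinMaxScaler().fit_transform(col.to_frame())
    # Flatten the (n, 1) result back to 1-D and keep the original index.
    return pd.Series(scaled.ravel(), index=col.index)

dfTest[["A", "B"]] = dfTest[["A", "B"]].apply(scale_column)
print(dfTest)
```

This fits one scaler per column, which is simple but does more work than fitting a single scaler on both columns at once.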

User · Answer

You can do it using pandas only:

    dfTest = pd.DataFrame({'A': [14.00, 90.20, 90.95, 96.27, 91.21],
                           'B': [103.02, 107.26, 110.35, 114.23, 114.68],
                           'C': ['big', 'small', 'big', 'small', 'small']})
    df = dfTest[['A', 'B']]
    df_norm = (df - df.min()) / (df.max() - df.min())
    print(df_norm)
    print(pd.concat((df_norm, dfTest.C), axis=1))

              A         B
    0  0.000000  0.000000
    1  0.926219  0.363636
    2  0.935335  0.628645
    3  1.000000  0.961407
    4  0.938495  1.000000

              A         B      C
    0  0.000000  0.000000    big
    1  0.926219  0.363636  small
    2  0.935335  0.628645    big
    3  1.000000  0.961407  small
    4  0.938495  1.000000  small
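The pandas-only expression is the usual min-max formula, (x - min) / (max - min), applied column by column. A minimal sketch that wraps it in a helper and leaves the other columns alone; the name `min_max_scale` is hypothetical, and the helper works on a copy rather than in place:

```python
import pandas as pd

def min_max_scale(df, cols):
    """Hypothetical helper: return a copy of df with cols rescaled to [0, 1]."""
    out = df.copy()
    sub = out[cols]
    # Column-wise min-max formula; min() and max() are computed per column.
    out[cols] = (sub - sub.min()) / (sub.max() - sub.min())
    return out

dfTest = pd.DataFrame({
    "A": [14.00, 90.20, 90.95, 96.27, 91.21],
    "B": [103.02, 107.26, 110.35, 114.23, 114.68],
    "C": ["big", "small", "big", "small", "small"],
})

scaled = min_max_scale(dfTest, ["A", "B"])
print(scaled)
```

One caveat of this sketch: a constant column has max equal to min, so the division yields NaN for that column.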

User · Answer

Tested for pandas 1.0.5. Based on @athlonshi's answer (it had ValueError: could not convert string to float: 'big', on the C column), here is a full working example without warnings:

    import pandas as pd
    from sklearn.preprocessing import MinMaxScaler

    scale = MinMaxScaler()

    df = pd.DataFrame({
               'A': [14.00, 90.20, 90.95, 96.27, 91.21],
               'B': [103.02, 107.26, 110.35, 114.23, 114.68],
               'C': ['big', 'small', 'big', 'small', 'small']
             })
    print(df)
    df[["A", "B"]] = pd.DataFrame(scale.fit_transform(df[["A", "B"]].values),
                                  columns=["A", "B"], index=df.index)
    print(df)

           A       B      C
    0  14.00  103.02    big
    1  90.20  107.26  small
    2  90.95  110.35    big
    3  96.27  114.23  small
    4  91.21  114.68  small

              A         B      C
    0  0.000000  0.000000    big
    1  0.926219  0.363636  small
    2  0.935335  0.628645    big
    3  1.000000  0.961407  small
    4  0.938495  1.000000  small

User · Answer

As mentioned in pir's comment, the .apply(lambda el: scale.fit_transform(el)) method will produce the following warning:

    DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17
    and will raise ValueError in 0.19. Reshape your data either using
    X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1)
    if it contains a single sample.

Converting your columns to numpy arrays should do the job (I prefer StandardScaler):

    from sklearn.preprocessing import StandardScaler
    scale = StandardScaler()

    dfTest[['A', 'B']] = scale.fit_transform(dfTest[['A', 'B']].as_matrix())

-- Edit Nov 2018 (Tested for pandas 0.23.4) --

As Rob Murray mentions in the comments, in the current (v0.23.4) version of pandas .as_matrix() returns a FutureWarning. Therefore, it should be replaced by .values:

    from sklearn.preprocessing import StandardScaler
    scaler = StandardScaler()

    scaler.fit_transform(dfTest[['A', 'B']].values)

-- Edit May 2019 (Tested for pandas 0.24.2) --

As joelostblom mentions in the comments, "Since 0.24.0, it is recommended to use .to_numpy() instead of .values."

Updated example:

    import pandas as pd
    from sklearn.preprocessing import StandardScaler
    scaler = StandardScaler()
    dfTest = pd.DataFrame({
                   'A': [14.00, 90.20, 90.95, 96.27, 91.21],
                   'B': [103.02, 107.26, 110.35, 114.23, 114.68],
                   'C': ['big', 'small', 'big', 'small', 'small']
                 })
    dfTest[['A', 'B']] = scaler.fit_transform(dfTest[['A', 'B']].to_numpy())
    dfTest

              A         B      C
    0 -1.995290 -1.571117    big
    1  0.436356 -0.603995  small
    2  0.460289  0.100818    big
    3  0.630058  0.985826  small
    4  0.468586  1.088469  small
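StandardScaler subtracts each column's mean and divides by its standard deviation, using the population standard deviation (ddof=0), whereas pandas' `std()` defaults to ddof=1. A minimal sketch, assuming a current pandas, NumPy, and scikit-learn, that compares the scaler with a pandas-only equivalent; the variable names are illustrative only:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

dfTest = pd.DataFrame({
    "A": [14.00, 90.20, 90.95, 96.27, 91.21],
    "B": [103.02, 107.26, 110.35, 114.23, 114.68],
    "C": ["big", "small", "big", "small", "small"],
})
cols = ["A", "B"]

# scikit-learn: returns a plain 2-D NumPy array of z-scores.
sk_scaled = StandardScaler().fit_transform(dfTest[cols].to_numpy())

# pandas only: ddof=0 is needed to match StandardScaler.
pd_scaled = (dfTest[cols] - dfTest[cols].mean()) / dfTest[cols].std(ddof=0)

print(np.allclose(sk_scaled, pd_scaled.to_numpy()))
```

Leaving `std()` at its default ddof=1 would give slightly smaller absolute values than the table in the answer.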

User · Answer

    df = pd.DataFrame(scale.fit_transform(df.values), columns=df.columns, index=df.index)

This should work without deprecation warnings.
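That one-liner hands every column to the scaler, so it only suits an all-numeric frame; with a string column such as C from the question, the scaler cannot convert the values to float. A minimal sketch, assuming a current pandas and scikit-learn, that restricts the call to numeric columns with `select_dtypes`; the variable name `num_cols` is illustrative:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({
    "A": [14.00, 90.20, 90.95, 96.27, 91.21],
    "B": [103.02, 107.26, 110.35, 114.23, 114.68],
    "C": ["big", "small", "big", "small", "small"],
})

# Pick the numeric columns only; column C (strings) is left out.
num_cols = df.select_dtypes(include="number").columns

# Scale just those columns and write the result back into the frame.
df[num_cols] = MinMaxScaler().fit_transform(df[num_cols])
print(df)
```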

User · Answer

I am not sure if previous versions of pandas prevented this, but now the following snippet works perfectly for me and produces exactly what you want without having to use apply:

    >>> import pandas as pd
    >>> from sklearn.preprocessing import MinMaxScaler

    >>> scaler = MinMaxScaler()

    >>> dfTest = pd.DataFrame({'A': [14.00, 90.20, 90.95, 96.27, 91.21],
    ...                        'B': [103.02, 107.26, 110.35, 114.23, 114.68],
    ...                        'C': ['big', 'small', 'big', 'small', 'small']})

    >>> dfTest[['A', 'B']] = scaler.fit_transform(dfTest[['A', 'B']])

    >>> dfTest
              A         B      C
    0  0.000000  0.000000    big
    1  0.926219  0.363636  small
    2  0.935335  0.628645    big
    3  1.000000  0.961407  small
    4  0.938495  1.000000  small

User · Answer

I know it's a very old comment, but still:

Instead of using single brackets (dfTest['A']), use double brackets (dfTest[['A']]),

i.e. min_max_scaler.fit_transform(dfTest[['A']]).

I believe this will give the desired result.
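The bracket difference matters because single brackets return a 1-D Series while double brackets return a 2-D single-column DataFrame, and the scaler expects 2-D input of shape (n_samples, n_features). A minimal sketch, assuming a current pandas, NumPy, and scikit-learn, that shows both shapes along with the equivalent `reshape(-1, 1)` route; the variable names are illustrative:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

dfTest = pd.DataFrame({
    "A": [14.00, 90.20, 90.95, 96.27, 91.21],
    "B": [103.02, 107.26, 110.35, 114.23, 114.68],
    "C": ["big", "small", "big", "small", "small"],
})

series_shape = dfTest["A"].shape    # 1-D: a Series
frame_shape = dfTest[["A"]].shape   # 2-D: a single-column DataFrame
print(series_shape, frame_shape)

# Double brackets give the scaler the 2-D input it expects.
scaled_2d = MinMaxScaler().fit_transform(dfTest[["A"]])

# Equivalent route: reshape the 1-D values into a single feature column.
reshaped = MinMaxScaler().fit_transform(dfTest["A"].to_numpy().reshape(-1, 1))

# The result has shape (n, 1); take its only column to assign it back.
dfTest["A"] = scaled_2d[:, 0]
print(dfTest)
```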
