pandas: best way to select all columns whose names start with X

Question

I have a DataFrame:

    import pandas as pd
    import numpy as np

    df = pd.DataFrame({'foo.aa': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
                       'foo.fighters': [0, 1, np.nan, 0, 0, 0],
                       'foo.bars': [0, 0, 0, 0, 0, 1],
                       'bar.baz': [5, 5, 6, 5, 5.6, 6.8],
                       'foo.fox': [2, 4, 1, 0, 0, 5],
                       'nas.foo': ['NA', 0, 1, 0, 0, 0],
                       'foo.manchu': ['NA', 0, 0, 0, 0, 0]})

I want to select the rows that have a value of 1 in any column whose name starts with foo. Is there a better way to do it than this?

    df2 = df[(df['foo.aa'] == 1) |
             (df['foo.fighters'] == 1) |
             (df['foo.bars'] == 1) |
             (df['foo.fox'] == 1) |
             (df['foo.manchu'] == 1)]

Something similar to writing:

    df2 = df[df.STARTS_WITH_FOO == 1]

The answer should print out a DataFrame like this:

       bar.baz  foo.aa  foo.bars  foo.fighters  foo.fox foo.manchu nas.foo
    0      5.0     1.0         0             0        2         NA      NA
    1      5.0     2.1         0             1        4          0       0
    2      6.0     NaN         0           NaN        1          0       1
    5      6.8     6.8         1             0        5          0       0

    [4 rows x 7 columns]

User · Answer

Based on @EdChum's answer, you can try the following solution:

    df[df.columns[pd.Series(df.columns).str.contains("foo")]]

This is helpful when not all the columns you want to select start with foo. It selects every column whose name contains the substring foo, wherever in the name it occurs.

In essence, I replaced .startswith() with .contains().
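To make the difference between the two string methods concrete, here is a minimal sketch on an empty frame whose column names mirror the question's. The frame itself is made up for illustration, and `regex=False` is passed only so that the pattern is treated as a plain substring.

```python
import pandas as pd

# Hypothetical empty frame; only the column names matter here.
df = pd.DataFrame(columns=["foo.aa", "bar.baz", "nas.foo"])

# Names that begin with "foo".
starts = df.columns[df.columns.str.startswith("foo")]

# Names that contain "foo" anywhere (plain substring match).
contains = df.columns[df.columns.str.contains("foo", regex=False)]

print(list(starts))
print(list(contains))
```

With these names, `contains` also picks up `nas.foo`, which `starts` leaves out.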

User · Answer

You can use a regex to select the columns whose names start with "foo":

    df.filter(regex='^foo')

If foo may appear anywhere in the column name, then

    df.filter(regex='foo')

would be appropriate.

For the next step, you can use

    df[(df.filter(regex='^foo') == 1).any(axis=1)]

to keep the rows where at least one of the foo columns has the value 1.
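Because `df.filter(regex=...)` matches labels with a regular-expression search, the exact pattern matters. The sketch below assumes a small made-up frame (the column `fo.x` is hypothetical and exists only to show the difference): a trailing `*` makes the last `o` optional, whereas `^foo\.` anchors a literal `foo.` prefix.

```python
import pandas as pd

# Hypothetical data; "fo.x" is included only to show how the patterns differ.
df = pd.DataFrame(
    {
        "foo.aa": [1.0, 2.1, 4.7],
        "foo.bars": [0, 0, 1],
        "fo.x": [1, 0, 0],
        "bar.baz": [5.0, 5.0, 6.8],
    }
)

# "o*" means zero or more "o", so "fo.x" matches as well.
loose = df.filter(regex=r"^foo*").columns.tolist()

# Escaping the dot requires the literal prefix "foo.".
strict = df.filter(regex=r"^foo\.").columns.tolist()

# Keep the rows where any of the strictly matched columns equals 1.
foo_cols = df.filter(regex=r"^foo\.")
rows = df[(foo_cols == 1).any(axis=1)]

print(loose)
print(strict)
print(rows)
```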

User · Answer

In my case I needed a list of prefixes:

    colsToScale = ["production", "test", "development"]
    dc[dc.columns[dc.columns.str.startswith(tuple(colsToScale))]]
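A runnable sketch of the same idea, assuming a pandas version whose `str.startswith` accepts a tuple of prefixes. The frame and its column names are invented for illustration.

```python
import pandas as pd

# Hypothetical frame; the column names are made up for this example.
dc = pd.DataFrame(
    {
        "production_cpu": [1, 2],
        "test_cpu": [3, 4],
        "staging_cpu": [5, 6],
        "development_mem": [7, 8],
    }
)

cols_to_scale = ["production", "test", "development"]

# The prefixes are passed as a tuple, so the list is converted first.
mask = dc.columns.str.startswith(tuple(cols_to_scale))
selected = dc.loc[:, mask]

print(list(selected.columns))
```

Here `staging_cpu` is dropped because it matches none of the prefixes.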

User · Answer

Another option for the selection of the desired entries is to use map:

    df.loc[(df == 1).any(axis=1), df.columns.map(lambda x: x.startswith('foo'))]

which gives you the foo columns for the rows that contain a 1:

       foo.aa  foo.bars  foo.fighters  foo.fox foo.manchu
    0     1.0         0             0        2         NA
    1     2.1         0             1        4          0
    2     NaN         0           NaN        1          0
    5     6.8         1             0        5          0

The row selection is done by

    (df == 1).any(axis=1)

as in @ajcr's answer, which gives you:

    0     True
    1     True
    2     True
    3    False
    4    False
    5     True
    dtype: bool

meaning that rows 3 and 4 do not contain a 1 and won't be selected.

The selection of the columns is done using Boolean indexing like this:

    df.columns.map(lambda x: x.startswith('foo'))

In the example above this returns

    array([False,  True,  True,  True,  True,  True, False], dtype=bool)

So, if a column does not start with foo, False is returned and the column is therefore not selected.

If you just want to return all rows that contain a 1, as your desired output suggests, you can simply do

    df.loc[(df == 1).any(axis=1)]

which returns

       bar.baz  foo.aa  foo.bars  foo.fighters  foo.fox foo.manchu nas.foo
    0      5.0     1.0         0             0        2         NA      NA
    1      5.0     2.1         0             1        4          0       0
    2      6.0     NaN         0           NaN        1          0       1
    5      6.8     6.8         1             0        5          0       0

User · Answer

The simplest way is to use str directly on the column names; there is no need for pd.Series:

    df.loc[:, df.columns.str.startswith("foo")]
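For completeness, the column mask can also drive the row selection the question asks for. This is only a sketch on the question's data: it tests for 1 in the foo columns alone, which is an assumption about the intent, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "foo.aa": [1, 2.1, np.nan, 4.7, 5.6, 6.8],
        "foo.fighters": [0, 1, np.nan, 0, 0, 0],
        "foo.bars": [0, 0, 0, 0, 0, 1],
        "bar.baz": [5, 5, 6, 5, 5.6, 6.8],
        "foo.fox": [2, 4, 1, 0, 0, 5],
        "nas.foo": ["NA", 0, 1, 0, 0, 0],
        "foo.manchu": ["NA", 0, 0, 0, 0, 0],
    }
)

# Boolean mask over the column labels.
foo_mask = df.columns.str.startswith("foo")
foo_only = df.loc[:, foo_mask]

# Rows where at least one foo column equals 1.
row_mask = (foo_only == 1).any(axis=1)
result = df.loc[row_mask]

print(result)
```

Using `df.loc[row_mask, foo_mask]` instead would return only the foo columns for those rows.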

User · Answer

My solution. It may be slower:

    a = pd.concat(df[df[c] == 1] for c in df.columns if c.startswith('foo'))
    a.sort_index()

       bar.baz  foo.aa  foo.bars  foo.fighters  foo.fox foo.manchu nas.foo
    0      5.0     1.0         0             0        2         NA      NA
    1      5.0     2.1         0             1        4          0       0
    2      6.0     NaN         0           NaN        1          0       1
    5      6.8     6.8         1             0        5          0       0
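One thing to watch with a concat-based approach: a row that has a 1 in more than one matching column is collected once per column. A minimal sketch on made-up data (the frame and the de-duplication step are assumptions, not part of the answer above):

```python
import pandas as pd

# Hypothetical frame in which row 0 has a 1 in two foo columns.
df = pd.DataFrame(
    {
        "foo.a": [1, 0, 0],
        "foo.b": [1, 1, 0],
        "bar": [9, 8, 7],
    }
)

# One filtered frame per foo column, stacked and sorted by index.
parts = [df[df[c] == 1] for c in df.columns if c.startswith("foo")]
a = pd.concat(parts).sort_index()

# Row 0 appears twice, so keep only the first occurrence of each index label.
deduped = a[~a.index.duplicated(keep="first")]

print(a)
print(deduped)
```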

User · Answer

Now that pandas' indexes support string operations, arguably the simplest and best way to select columns beginning with 'foo' is just:

    df.loc[:, df.columns.str.startswith('foo')]

Alternatively, you can filter column (or row) labels with df.filter(). To specify a regular expression to match the names beginning with foo.:

    >>> df.filter(regex=r'^foo\.', axis=1)
       foo.aa  foo.bars  foo.fighters  foo.fox foo.manchu
    0     1.0         0             0        2         NA
    1     2.1         0             1        4          0
    2     NaN         0           NaN        1          0
    3     4.7         0             0        0          0
    4     5.6         0             0        0          0
    5     6.8         1             0        5          0

To select only the required rows (containing a 1) and the columns, you can use loc, selecting the columns using filter (or any other method) and the rows using any:

    >>> df.loc[(df == 1).any(axis=1), df.filter(regex=r'^foo\.', axis=1).columns]
       foo.aa  foo.bars  foo.fighters  foo.fox foo.manchu
    0     1.0         0             0        2         NA
    1     2.1         0             1        4          0
    2     NaN         0           NaN        1          0
    5     6.8         1             0        5          0

User · Answer

Just perform a list comprehension to create your columns:

    In [28]:

    filter_col = [col for col in df if col.startswith('foo')]
    filter_col
    Out[28]:
    ['foo.aa', 'foo.bars', 'foo.fighters', 'foo.fox', 'foo.manchu']

    In [29]:

    df[filter_col]
    Out[29]:
       foo.aa  foo.bars  foo.fighters  foo.fox foo.manchu
    0     1.0         0             0        2         NA
    1     2.1         0             1        4          0
    2     NaN         0           NaN        1          0
    3     4.7         0             0        0          0
    4     5.6         0             0        0          0
    5     6.8         1             0        5          0

Another method is to create a series from the columns and use the vectorised str method startswith:

    In [33]:

    df[df.columns[pd.Series(df.columns).str.startswith('foo')]]
    Out[33]:
       foo.aa  foo.bars  foo.fighters  foo.fox foo.manchu
    0     1.0         0             0        2         NA
    1     2.1         0             1        4          0
    2     NaN         0           NaN        1          0
    3     4.7         0             0        0          0
    4     5.6         0             0        0          0
    5     6.8         1             0        5          0

In order to achieve what you want, you need to add the following to filter out the values that don't meet your == 1 criterion:

    In [36]:

    df[df[df.columns[pd.Series(df.columns).str.startswith('foo')]] == 1]
    Out[36]:
       bar.baz  foo.aa  foo.bars  foo.fighters  foo.fox foo.manchu nas.foo
    0      NaN       1       NaN           NaN      NaN        NaN     NaN
    1      NaN     NaN       NaN             1      NaN        NaN     NaN
    2      NaN     NaN       NaN           NaN        1        NaN     NaN
    3      NaN     NaN       NaN           NaN      NaN        NaN     NaN
    4      NaN     NaN       NaN           NaN      NaN        NaN     NaN
    5      NaN     NaN         1           NaN      NaN        NaN     NaN

EDIT: OK, after seeing what you want, the convoluted answer is this:

    In [72]:

    df.loc[df[df[df.columns[pd.Series(df.columns).str.startswith('foo')]] == 1].dropna(how='all', axis=0).index]
    Out[72]:
       bar.baz  foo.aa  foo.bars  foo.fighters  foo.fox foo.manchu nas.foo
    0      5.0     1.0         0             0        2         NA      NA
    1      5.0     2.1         0             1        4          0       0
    2      6.0     NaN         0           NaN        1          0       1
    5      6.8     6.8         1             0        5          0       0
