[pyspark] How to get name of dataframe column in pyspark?

In pandas, this can be done by column.name.

But how do you do the same when it's a column of a Spark dataframe?

e.g. the calling program has a Spark dataframe, spark_df:

>>> spark_df.columns
['admit', 'gre', 'gpa', 'rank']

This program calls my function: my_function(spark_df['rank']). Inside my_function, I need the name of the column, i.e. 'rank'.

If it were a pandas dataframe, we could use this inside my_function:

>>> pandas_df['rank'].name
'rank'


The answer is


One way is to go down a level to the underlying JVM:

df.col._jc.toString().encode('utf8')

This is also how a column is converted to a str in the PySpark source itself.

From pyspark/sql/column.py:

def __repr__(self):
    return 'Column<%s>' % self._jc.toString().encode('utf8')
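Applied to the question's my_function, a minimal sketch could look like this (note that on Python 3 the .encode('utf8') call above would return bytes, so it is dropped here):

def my_function(col):
    # col is a pyspark.sql.Column; _jc is the underlying JVM Column object
    return col._jc.toString()

>>> my_function(spark_df['rank'])
'rank'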

You can get the names from the schema by doing

spark_df.schema.names

Printing the schema can be useful to visualize it as well

spark_df.printSchema()
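For the question's spark_df, that would look roughly like this (the exact types depend on how the data was loaded):

>>> spark_df.schema.names
['admit', 'gre', 'gpa', 'rank']
>>> spark_df.printSchema()
root
 |-- admit: long (nullable = true)
 |-- gre: long (nullable = true)
 |-- gpa: double (nullable = true)
 |-- rank: long (nullable = true)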


As @numeral correctly said, column._jc.toString() works fine in the case of unaliased columns.

In the case of aliased columns (i.e. column.alias("whatever")), the alias can be extracted even without using regular expressions: str(column).split(" AS ")[1].split("`")[1].
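A short sketch of that trick (the Column repr varies across Spark versions; this assumes one where the alias is rendered backtick-quoted, e.g. Column<b'rank AS `whatever`'>):

col = spark_df['rank'].alias("whatever")
# str(col) contains "rank AS `whatever`" on such versions, so splitting
# on " AS " and then on the backticks isolates the alias
alias = str(col).split(" AS ")[1].split("`")[1]
print(alias)  # whatever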

I don't know the Scala syntax, but I'm sure the same can be done there.


If you want the column names of your dataframe, you can use the pyspark.sql DataFrame API. I'm not sure whether it supports explicitly indexing a DataFrame by column name; I received this traceback:

>>> df.columns['High']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list indices must be integers, not str

However, the columns attribute on your dataframe, which you have already used, will return a list of column names:

>>> df.columns
['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']

If you want the column datatypes, you can use the dtypes attribute:

>>> df.dtypes
[('Date', 'timestamp'), ('Open', 'double'), ('High', 'double'), ('Low', 'double'), ('Close', 'double'), ('Volume', 'int'), ('Adj Close', 'double')]

If you want a particular column, you'll need to access it by index:

>>> df.columns[2]
'High'
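Going the other way, from a name to its position, is just a list lookup; a small sketch with the same hypothetical df:

>>> df.columns.index('High')
2
>>> df.dtypes[df.columns.index('High')]
('High', 'double')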


I found the answer to be very, very simple...

// This is Java, but the same idea applies in PySpark
Column col = ds.col("colName"); // the Column object
String theNameOftheCol = col.toString();

The variable theNameOftheCol is then "colName".
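One caveat when porting this to PySpark (a sketch using the question's spark_df): plain str() on a Column includes the Column<...> wrapper, so to get the bare name you still need the JVM toString() from the accepted answer:

col = spark_df['rank']
str(col)            # "Column<'rank'>" -- repr includes the wrapper (format varies by version)
col._jc.toString()  # 'rank' -- the bare name, matching the Java snippet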