pyspark.sql.DataFrame.select

DataFrame.select(*cols)

Projects a set of expressions and returns a new DataFrame.

New in version 1.3.0.

Parameters:
cols : str, Column, or list

Column names (strings) or expressions (Column). If one of the column names is '*', it is expanded to include all columns in the current DataFrame.

Examples

>>> df.select('*').collect()
[Row(age=2, name='Alice'), Row(age=5, name='Bob')]
>>> df.select('name', 'age').collect()
[Row(name='Alice', age=2), Row(name='Bob', age=5)]
>>> df.select(df.name, (df.age + 10).alias('age')).collect()
[Row(name='Alice', age=12), Row(name='Bob', age=15)]