pyspark.sql.functions.split
pyspark.sql.functions.split(str: ColumnOrName, pattern: str, limit: int = -1) → pyspark.sql.column.Column
Splits str around matches of the given pattern.
New in version 1.5.0.
Changed in version 3.4.0: Supports Spark Connect.
- Parameters
  - str : Column or str
    a string expression to split
  - pattern : str
    a string representing a regular expression. The regex string should be a Java regular expression.
  - limit : int, optional
    an integer which controls the number of times pattern is applied.
    limit > 0: The resulting array's length will not be more than limit, and the resulting array's last entry will contain all input beyond the last matched pattern.
    limit <= 0: pattern will be applied as many times as possible, and the resulting array can be of any size.
    Changed in version 3.0: split now takes an optional limit field. If not provided, default limit value is -1.
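Because pattern is treated as a regular expression, delimiters such as `.` or `|` must be escaped to match literally. The sketch below illustrates the effect using Python's `re` module as a local stand-in. It does not call Spark, and Java and Python regex dialects differ in places, so treat it as an approximation. The helper name `literal_pattern` is hypothetical.

```python
import re


def literal_pattern(delimiter: str) -> str:
    """Hypothetical helper: escape regex metacharacters in a delimiter."""
    return re.escape(delimiter)


# An unescaped "." matches every character, so only empty strings remain.
unescaped = re.split(".", "a.b.c")

# An escaped "." matches only the literal dot.
escaped = re.split(literal_pattern("."), "a.b.c")

print(unescaped)
print(escaped)
```

With split itself, the same idea should apply by passing an escaped pattern such as `r'\.'` as the pattern argument.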
- Returns
Column
array of separated strings.
Examples
>>> from pyspark.sql.functions import split
>>> df = spark.createDataFrame([('oneAtwoBthreeC',)], ['s',])
>>> df.select(split(df.s, '[ABC]', 2).alias('s')).collect()
[Row(s=['one', 'twoBthreeC'])]
>>> df.select(split(df.s, '[ABC]', -1).alias('s')).collect()
[Row(s=['one', 'two', 'three', ''])]
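The limit rules can also be explored without a Spark session. The following is a rough local emulation of the documented semantics built on Python's `re.split`, not Spark's implementation. The function name `emulate_split` is hypothetical, and results may differ from Spark for patterns where Java and Python regex behavior diverges (for example, patterns that can match the empty string).

```python
import re


def emulate_split(s: str, pattern: str, limit: int = -1) -> list:
    """Rough emulation of the documented limit semantics (not Spark)."""
    if limit > 0:
        # At most `limit` entries means at most `limit - 1` splits.
        # re.split treats maxsplit=0 as "unlimited", so limit == 1
        # (no splitting at all) is handled explicitly.
        if limit == 1:
            return [s]
        return re.split(pattern, s, maxsplit=limit - 1)
    # limit <= 0: apply the pattern as many times as possible.
    return re.split(pattern, s)


two = emulate_split('oneAtwoBthreeC', '[ABC]', 2)
everything = emulate_split('oneAtwoBthreeC', '[ABC]', -1)
print(two)
print(everything)
```

For this input the emulation reproduces both results shown in the Examples above, including the trailing empty string when limit is -1.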