pyspark.pandas.DataFrame.count
- DataFrame.count(axis=None, numeric_only=False)
Count non-NA cells for each column or row.
The values None and NaN are considered NA.
- Parameters
- axis: {0 or ‘index’, 1 or ‘columns’}, default 0
If 0 or ‘index’, counts are generated for each column. If 1 or ‘columns’, counts are generated for each row.
- numeric_only: bool, default False
If True, include only float, int, and boolean columns. This parameter is mainly for pandas compatibility.
- Returns
- count: a scalar for a Series, and a Series for a DataFrame.
See also
DataFrame.shape
Number of DataFrame rows and columns (including NA elements).
DataFrame.isna
Boolean same-sized DataFrame showing places of NA elements.
Examples
Constructing DataFrame from a dictionary:
>>> df = ps.DataFrame({"Person":
...                    ["John", "Myla", "Lewis", "John", "Myla"],
...                    "Age": [24., np.nan, 21., 33, 26],
...                    "Single": [False, True, True, True, False]},
...                   columns=["Person", "Age", "Single"])
>>> df
  Person   Age  Single
0   John  24.0   False
1   Myla   NaN    True
2  Lewis  21.0    True
3   John  33.0    True
4   Myla  26.0   False
Notice the uncounted NA values:
>>> df.count()
Person    5
Age       4
Single    5
dtype: int64
>>> df.count(axis=1)
0    3
1    2
2    3
3    3
4    3
dtype: int64
On a Series:
>>> df['Person'].count()
5
>>> df['Age'].count()
4
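The counting rule itself (None and NaN are NA, counted per column for axis 0 and per row for axis 1) can be sketched in plain Python. This is only an illustration of the semantics, not how pandas-on-Spark computes the result; the helper names `is_na` and `count_non_na` and the dict-of-lists stand-in for a DataFrame are assumptions made for this sketch.

```python
import math


def is_na(value):
    """Return True for the two values treated as NA here: None and NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def count_non_na(columns, axis=0):
    """Count non-NA cells per column (axis=0) or per row (axis=1).

    `columns` is a hypothetical stand-in for a DataFrame: a dict mapping
    each column name to a list of cell values, all of equal length.
    """
    if axis in (0, "index"):
        # One count per column, keyed by column name.
        return {name: sum(not is_na(v) for v in values)
                for name, values in columns.items()}
    if axis in (1, "columns"):
        # Transpose the columns into rows, then produce one count per row.
        rows = zip(*columns.values())
        return [sum(not is_na(v) for v in row) for row in rows]
    raise ValueError(f"unsupported axis: {axis!r}")


# Same data as the DataFrame in the examples above.
data = {
    "Person": ["John", "Myla", "Lewis", "John", "Myla"],
    "Age": [24.0, float("nan"), 21.0, 33, 26],
    "Single": [False, True, True, True, False],
}

per_column = count_non_na(data)        # {'Person': 5, 'Age': 4, 'Single': 5}
per_row = count_non_na(data, axis=1)   # [3, 2, 3, 3, 3]
```

The results match the `df.count()` and `df.count(axis=1)` outputs shown above: only the missing `Age` value in row 1 is excluded, and boolean `False` still counts as a non-NA cell.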