pyspark.pandas.DataFrame.count

DataFrame.count(axis=None, numeric_only=False)

Count non-NA cells for each column or row.

The values None and NaN are considered NA.

Parameters

axis: {0 or ‘index’, 1 or ‘columns’}, default 0: If 0 or ‘index’, counts are generated for each column. If 1 or ‘columns’, counts are generated for each row.
numeric_only: bool, default False: If True, include only float, int, and boolean columns. This parameter is mainly for pandas compatibility.
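
The NA rule and the two axis values described above can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: `is_na` and `count_non_na` are hypothetical helper names introduced only for this illustration, the data lives in a plain dict of lists rather than a DataFrame, and treating `axis=None` the same as 0 is an assumption based on the documented default.

```python
import math


def is_na(value):
    """Return True for the values the text treats as NA: None and float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def count_non_na(columns, axis=0):
    """Count non-NA cells in a dict mapping column name -> list of values.

    axis=0 or "index" gives one count per column;
    axis=1 or "columns" gives one count per row.
    Assumption: axis=None behaves like 0, matching the documented default.
    """
    if axis in (0, "index", None):
        return {name: sum(not is_na(v) for v in values)
                for name, values in columns.items()}
    if axis in (1, "columns"):
        # zip(*...) walks the columns in parallel, yielding one tuple per row.
        rows = zip(*columns.values())
        return [sum(not is_na(v) for v in row) for row in rows]
    raise ValueError(f"unsupported axis: {axis!r}")


data = {
    "Person": ["John", "Myla", "Lewis", "John", "Myla"],
    "Age": [24.0, float("nan"), 21.0, 33.0, 26.0],
    "Single": [False, True, True, True, False],
}

per_column = count_non_na(data, axis=0)
per_row = count_non_na(data, axis="columns")
print(per_column)
print(per_row)
```

Only the second row contains a NaN, so it is the only row whose count falls below the number of columns, and "Age" is the only column whose count falls below the number of rows.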

Returns

count: scalar for a Series, and a Series for a DataFrame.

See also

DataFrame.shape: Number of DataFrame rows and columns (including NA elements).
DataFrame.isna: Boolean same-sized DataFrame showing places of NA elements.

Examples

Constructing DataFrame from a dictionary:

>>> import numpy as np
>>> import pyspark.pandas as ps
>>> df = ps.DataFrame({"Person":
...                    ["John", "Myla", "Lewis", "John", "Myla"],
...                    "Age": [24., np.nan, 21., 33, 26],
...                    "Single": [False, True, True, True, False]},
...                   columns=["Person", "Age", "Single"])
>>> df
  Person   Age  Single
0   John  24.0   False
1   Myla   NaN    True
2  Lewis  21.0    True
3   John  33.0    True
4   Myla  26.0   False

Notice the uncounted NA values:

>>> df.count()
Person    5
Age       4
Single    5
dtype: int64

>>> df.count(axis=1)
0    3
1    2
2    3
3    3
4    3
dtype: int64

On a Series:

>>> df['Person'].count()
5

>>> df['Age'].count()
4