pyspark.pandas.Series.replace

Series.replace(to_replace=None, value=None, regex=False)

Replace values given in to_replace with value. Values of the Series are replaced with other values dynamically.

Note

For partial pattern matching, the whole string is replaced rather than only the matched portion, which differs from pandas. This is a consequence of the underlying Spark API.
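The difference described in this note can be illustrated without a Spark session. The sketch below is a plain-Python emulation built on the standard `re` module, not the pandas-on-Spark implementation; the helper names `pandas_style` and `spark_style` are hypothetical and exist only for this illustration.

```python
import re


def pandas_style(values, pattern, repl):
    # pandas substitutes only the portion of each string that matches.
    return [re.sub(pattern, repl, v) for v in values]


def spark_style(values, pattern, repl):
    # pandas-on-Spark replaces the entire string whenever the pattern
    # matches anywhere inside it (emulated here with re.search).
    return [repl if re.search(pattern, v) else v for v in values]


data = ['bat', 'foo', 'bait', 'abc', 'bar', 'zoo']

print(pandas_style(data, 'ba', 'xx'))  # ['xxt', 'foo', 'xxit', 'abc', 'xxr', 'zoo']
print(spark_style(data, 'ba', 'xx'))   # ['xx', 'foo', 'xx', 'abc', 'xx', 'zoo']
```

Anchoring the pattern (for example `^ba.$`) makes both styles agree, because the match then covers the whole string.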

Parameters
to_replace: str, list, tuple, dict, Series, int, float, or None

How to find the values that will be replaced.

  • numeric: numeric values equal to to_replace will be replaced with value

  • str: string exactly matching to_replace will be replaced with value

  • list of str or numeric:

    • if to_replace and value are both lists or tuples, they must be the same length.

    • str and numeric rules apply as above.

  • dict:

    • Dicts can be used to specify different replacement values for different existing values. For example, {'a': 'b', 'y': 'z'} replaces the value 'a' with 'b' and 'y' with 'z'. To use a dict in this way, the value parameter should be None.

    • For a DataFrame, a dict can specify that different values should be replaced in different columns. For example, {'a': 1, 'b': 'z'} looks for the value 1 in column 'a' and the value 'z' in column 'b' and replaces these values with whatever is specified in value. The value parameter should not be None in this case. You can treat this as a special case of passing two lists except that you are specifying the column to search in.

See the examples section for examples of each of these.
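As a rough mental model of the scalar, list and dict forms listed above, each can be thought of as an old-to-new mapping applied element by element. The following is a minimal plain-Python sketch under that assumption; `build_mapping` and `replace_values` are hypothetical helpers, not part of pyspark, and the exception type raised for mismatched lengths is an assumption.

```python
def build_mapping(to_replace, value=None):
    """Normalise the accepted argument forms into one {old: new} dict."""
    if isinstance(to_replace, dict):
        # Dict form: the dict already pairs old values with new ones,
        # so `value` is expected to be None and is ignored here.
        return dict(to_replace)
    if isinstance(to_replace, (list, tuple)):
        if isinstance(value, (list, tuple)):
            # Two sequences are paired positionally and must match in length.
            if len(to_replace) != len(value):
                # Assumption: the real API may raise a different error.
                raise ValueError("to_replace and value must be the same length")
            return dict(zip(to_replace, value))
        # A sequence with a scalar value maps every listed item to that value.
        return {old: value for old in to_replace}
    # Scalar form: a single exact match.
    return {to_replace: value}


def replace_values(values, to_replace, value=None):
    mapping = build_mapping(to_replace, value)
    # Elements without an entry in the mapping are left unchanged.
    return [mapping.get(v, v) for v in values]


data = [0, 1, 2, 3, 4]
print(replace_values(data, 0, 5))                         # [5, 1, 2, 3, 4]
print(replace_values(data, [0, 4], 5000))                 # [5000, 1, 2, 3, 5000]
print(replace_values(data, [1, 2, 3], [10, 20, 30]))      # [0, 10, 20, 30, 4]
print(replace_values(data, {1: 1000, 2: 2000, 3: 3000}))  # [0, 1000, 2000, 3000, 4]
```

The sketch covers only exact matching on a flat list of values; the per-column dict form for DataFrames and regular expressions are out of its scope.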

value: scalar, dict, list, tuple, or str, default None

Value to replace any values matching to_replace with. For a DataFrame a dict of values can be used to specify which value to use for each column (columns not in the dict will not be filled). Regular expressions, strings and lists or dicts of such objects are also allowed.

regex: bool or str, default False

Whether to interpret to_replace and/or value as regular expressions. If this is True then to_replace must be a string. Alternatively, this could be a regular expression in which case to_replace must be None.
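The two ways of supplying a pattern described above (regex=True with the pattern in to_replace, or the pattern passed as regex with to_replace left as None) can be sketched as follows. This is a plain-Python emulation using the standard `re` module, not the library's code; `regex_replace` is a hypothetical helper and the exception types are assumptions.

```python
import re


def regex_replace(values, to_replace=None, value=None, regex=False):
    if regex is True:
        # Form 1: regex=True, so the pattern is carried by to_replace.
        if not isinstance(to_replace, str):
            # Assumption: the real API may raise a different error type.
            raise TypeError("to_replace must be a string when regex is True")
        pattern = to_replace
    elif isinstance(regex, str):
        # Form 2: regex itself is the pattern, so to_replace must be None.
        if to_replace is not None:
            raise ValueError("to_replace must be None when regex is a pattern")
        pattern = regex
    else:
        raise ValueError("this sketch only covers regular-expression replacement")
    # Whole-string replacement whenever the pattern matches, mirroring the
    # pandas-on-Spark behaviour described in the note near the top.
    return [value if re.search(pattern, v) else v for v in values]


data = ['bat', 'foo', 'bait', 'abc', 'bar', 'zoo']
print(regex_replace(data, to_replace=r'^ba.$', value='new', regex=True))
# ['new', 'foo', 'bait', 'abc', 'new', 'zoo']
print(regex_replace(data, value='new', regex=r'^.oo$'))
# ['bat', 'new', 'bait', 'abc', 'bar', 'new']
```

Both forms resolve to a single pattern before any matching happens, so choosing between them is a matter of call-site readability.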

Returns
Series

Object after replacement.

Examples

Scalar to_replace and value

>>> s = ps.Series([0, 1, 2, 3, 4])
>>> s
0    0
1    1
2    2
3    3
4    4
dtype: int64
>>> s.replace(0, 5)
0    5
1    1
2    2
3    3
4    4
dtype: int64

List-like to_replace

>>> s.replace([0, 4], 5000)
0    5000
1       1
2       2
3       3
4    5000
dtype: int64
>>> s.replace([1, 2, 3], [10, 20, 30])
0     0
1    10
2    20
3    30
4     4
dtype: int64

Dict-like to_replace

>>> s.replace({1: 1000, 2: 2000, 3: 3000, 4: 4000})
0       0
1    1000
2    2000
3    3000
4    4000
dtype: int64

A Series with a MultiIndex is also supported

>>> midx = pd.MultiIndex([['lama', 'cow', 'falcon'],
...                       ['speed', 'weight', 'length']],
...                      [[0, 0, 0, 1, 1, 1, 2, 2, 2],
...                       [0, 1, 2, 0, 1, 2, 0, 1, 2]])
>>> s = ps.Series([45, 200, 1.2, 30, 250, 1.5, 320, 1, 0.3],
...               index=midx)
>>> s
lama    speed      45.0
        weight    200.0
        length      1.2
cow     speed      30.0
        weight    250.0
        length      1.5
falcon  speed     320.0
        weight      1.0
        length      0.3
dtype: float64
>>> s.replace(45, 450)
lama    speed     450.0
        weight    200.0
        length      1.2
cow     speed      30.0
        weight    250.0
        length      1.5
falcon  speed     320.0
        weight      1.0
        length      0.3
dtype: float64
>>> s.replace([45, 30, 320], 500)
lama    speed     500.0
        weight    200.0
        length      1.2
cow     speed     500.0
        weight    250.0
        length      1.5
falcon  speed     500.0
        weight      1.0
        length      0.3
dtype: float64
>>> s.replace({45: 450, 30: 300})
lama    speed     450.0
        weight    200.0
        length      1.2
cow     speed     300.0
        weight    250.0
        length      1.5
falcon  speed     320.0
        weight      1.0
        length      0.3
dtype: float64

Regular expression to_replace

>>> psser = ps.Series(['bat', 'foo', 'bait', 'abc', 'bar', 'zoo'])
>>> psser.replace(to_replace=r'^ba.$', value='new', regex=True)
0     new
1     foo
2    bait
3     abc
4     new
5     zoo
dtype: object
>>> psser.replace(value='new', regex=r'^.oo$')
0     bat
1     new
2    bait
3     abc
4     bar
5     new
dtype: object

For partial pattern matching, the replacement is against the whole string

>>> psser.replace('ba', 'xx', regex=True)
0     xx
1    foo
2     xx
3    abc
4     xx
5    zoo
dtype: object