Mastering DataFrame – 3 tips for DataFrame.rolling function

pandas.DataFrame.rolling is a useful function that slides a window over a fixed number of rows. The rolling function itself does not calculate anything; it returns a window.Rolling object, on which we then call other functions to compute values. In the previous post I showed how to calculate the daily percent change, the standard deviation and the correlations of financial assets with the rolling function. In this article I'd like to delve a little further into pandas.DataFrame.rolling.

Please note that pandas.Series also has a similar rolling function.

  • How to control NaN with the parameters center and min_periods
  • How to change the weight of window with win_type
  • How to apply your function for a window object

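The window size is the number of observations used for each calculation. If the window is given as an integer, every window covers that fixed number of rows; if it is given as a time offset, each window is variable-sized, depending on the observations that fall within the time period. The basic use of the rolling function is to pass an integer window size and call an aggregation on the result. As a first example, take a DataFrame holding the consecutive integers 0 to 9 and compute df.rolling(3).sum().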
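The basic case can be sketched as follows. The sample data (the integers 0 to 9) and the column name "value" are my assumptions for illustration.

```python
import pandas as pd

# Assumed sample data: the consecutive integers 0..9 in a column named "value"
df = pd.DataFrame({"value": range(10)})

# rolling(3) only builds the window object; sum() performs the calculation
rolled = df.rolling(3).sum()
print(rolled)
```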

Please note that the first valid window appears at index [2] when the window size is 3. The first window takes the values at index [0] [1] [2], and their sum is stored at [2]. The sum of [0] [1] [2] is 3. Index [0] and [1] of df.rolling(3).sum() are NaN because those windows do not yet hold three values. We can use the dropna() function to drop missing values such as NaN, or the fillna() function to replace them.
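Here is a small sketch of both ways of handling the leading NaN values, again assuming a DataFrame of the integers 0 to 9 in a hypothetical column "value".

```python
import pandas as pd

# Assumed sample data, as in the basic example
df = pd.DataFrame({"value": range(10)})
rolled = df.rolling(3).sum()

# Drop the rows whose window was incomplete (index 0 and 1)
dropped = rolled.dropna()

# Or keep every row and replace NaN with a chosen value, here 0
filled = rolled.fillna(0)
```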

How to control NaN with the parameters center and min_periods

We can change the window behavior with the parameters center and min_periods, as follows.

center (bool, default False)
Set the labels at the center of the window.

In the previous example, the result of the first window [0] [1] [2] was set at index [2], the right edge of the window. If you set center to True with a window size of 3, the window's center [1] is used as the label instead. If the window size is an even number, the later of the two middle positions is used as the label. For example, index [2] is used for the window [0] [1] [2] [3] with center set to True.

As a result, with a window size of 3 and center set to True, the values at index [0] and [9] are NaN for the same DataFrame.
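The effect of center can be checked with a short sketch like this one, using the same assumed DataFrame of the integers 0 to 9 (the column name "value" is hypothetical).

```python
import pandas as pd

# Assumed sample data: the integers 0..9
df = pd.DataFrame({"value": range(10)})

# Odd window: the label sits on the middle row, so index 0 and 9 are NaN
centered3 = df.rolling(3, center=True).sum()

# Even window: the label sits on the later of the two middle rows,
# so the window [0] [1] [2] [3] is stored at index 2
centered4 = df.rolling(4, center=True).sum()

print(centered3)
print(centered4)
```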

min_periods (int, default None)
Minimum number of observations in window required to have a value (otherwise result is NA). For a window that is specified by an offset, min_periods will default to 1. Otherwise, min_periods will default to the size of the window.

If you specify an integer window size, min_periods defaults to that same value, so min_periods was 3 in the previous example. Let's change it to 1 while keeping the window size at 3. A window is then computed as soon as it holds at least min_periods observations, so the incomplete windows at the start of the data return a value instead of NaN. To some degree, setting min_periods to 1 can be more flexible than patching the result with the fillna() function.
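A minimal sketch of min_periods=1, again on an assumed DataFrame of the integers 0 to 9 with a hypothetical column "value":

```python
import pandas as pd

# Assumed sample data: the integers 0..9
df = pd.DataFrame({"value": range(10)})

# With min_periods=1 the first two rows are computed from the
# one or two values that are available instead of returning NaN
relaxed = df.rolling(3, min_periods=1).sum()
print(relaxed)
```

Index [0] now holds the sum of one value and index [1] the sum of two, while every later row still uses the full window of three.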

How to change the weight of window with win_type

All points in a window are weighted evenly if the parameter win_type is None, which is the default. If you change this parameter, each data point is weighted according to its position in the window.

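The pandas documentation does not explain each win_type in detail, but the names are those of the window functions in scipy.signal.windows (so SciPy must be installed), and each name describes the shape of the weights, for example triang for a triangular window.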
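As a rough sketch of what a weighted window does, the following compares an evenly weighted mean with a triangular one. The sample values are my own; the hand-written weights [0.5, 1.0, 0.5] are what I understand SciPy's triangular window of length 3 to be, and the try/except is only there so the sketch still runs where SciPy is missing.

```python
import numpy as np
import pandas as pd

# Assumed sample data: squares, so that weighting visibly changes the result
df = pd.DataFrame({"x": [0.0, 1.0, 4.0, 9.0, 16.0]})

# Evenly weighted rolling mean (win_type=None)
plain = df.rolling(3).mean()

# Triangular weights; win_type relies on SciPy being installed
try:
    triang = df.rolling(3, win_type="triang").mean()
except ImportError:
    triang = None  # SciPy not available in this environment

# The same weighting written out by hand (assumed weights for a length-3 triangle)
weights = np.array([0.5, 1.0, 0.5])
manual = df["x"].rolling(3).apply(
    lambda w: np.dot(w, weights) / weights.sum(), raw=True
)

print(plain)
print(triang)
print(manual)
```

The middle point of each window counts twice as much as its neighbours, so the weighted mean is pulled toward the center of the window.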

How to apply your function for a window object

The last tip is about the functions we call on a window.Rolling object. There are a lot of built-in functions we can call on window.Rolling, such as min, max, std and sum. There are also the apply and aggregate functions, which let us run our own function on each window.

window.Rolling

The apply and aggregate functions are callable on a window.Rolling object, and you can pass them a defined function or a lambda. The aggregate function is useful when you want to apply multiple operations to the same window.Rolling object. For example, you can add new columns holding the rolling sum, the rolling min, and the result of a lambda that computes sum minus min.
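One way this could look is sketched below. The sample data and the column name "sum_minus_min" are hypothetical, and I pass the lambda through apply rather than inside the aggregate list to keep the resulting column name explicit.

```python
import pandas as pd

# Assumed sample data: the integers 0..9
df = pd.DataFrame({"value": range(10)})
roll = df["value"].rolling(3)

# aggregate (alias agg) runs several operations on the same windows
# and returns one column per operation
stats = roll.agg(["sum", "min"])
df = df.join(stats)

# apply runs a custom function on each window; raw=True passes a NumPy array
df["sum_minus_min"] = roll.apply(lambda w: w.sum() - w.min(), raw=True)
print(df)
```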

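Another useful parameter is axis. It defaults to 0, which moves the window down the rows of each column. If you change it to 1, the window moves across the columns of each row instead. Note that recent pandas releases (2.1 and later) deprecate this parameter, so transposing the DataFrame before rolling is the safer route there.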
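Rolling across columns can also be sketched without the axis keyword by transposing the DataFrame, rolling down the rows, and transposing back. This is a substitute I chose so the example does not depend on axis being available; the sample data and column names are made up.

```python
import pandas as pd

# Assumed sample data: two rows, four columns
wide = pd.DataFrame([[1, 2, 3, 4], [10, 20, 30, 40]], columns=list("abcd"))

# Transpose, roll over pairs of (former) columns, transpose back:
# each row is now summed over a window of 2 neighbouring columns
row_wise = wide.T.rolling(2).sum().T
print(row_wise)
```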
