Window framing

For non-ranking aggregate OLAP functions, you can define a window frame with a window frame clause, which specifies the beginning and end of the window relative to the current row.

<WINDOW FRAME CLAUSE> ::=
  <WINDOW FRAME UNIT>
  <WINDOW FRAME EXTENT>

This OLAP function is computed with respect to the contents of a moving frame rather than the fixed contents of the whole partition. Depending on its definition, the partition has a start row and an end row, and the window frame slides from the starting point to the end of the partition.

Figure 4-3: Three-row moving window with partitioned input

shows current row as part of sliding window grouped by partition A, current and C.

UNBOUNDED PRECEEDING and FOLLOWING

Window frames can be defined by an unbounded aggregation group that either extends back to the beginning of the partition (UNBOUNDED PRECEDING) or extends to the end of the partition (UNBOUNDED FOLLOWING), or both.

UNBOUNDED PRECEDING includes all rows within the partition preceding the current row, which can be specified with either ROWS or RANGE. UNBOUNDED FOLLOWING includes all rows within the partition following the current row, which can be specified with either ROWS or RANGE. See “ROWS” and “RANGE”.

The value FOLLOWING specifies either the range or number of rows following the current row. If ROWS is specified, then the value is a positive integer indicating a number of rows. If RANGE is specified, the window includes any rows that are less than the current row plus the specified numeric value. For the RANGE case, the data type of the windowed value must be comparable to the type of the sort key expression of the ORDER BY clause. There can be only one sort key expression, and the data type of the sort key expression must allow addition.

The value PREDCEEDING specifies either the range or number of rows preceding the current row. If ROWS is specified, then the value is a positive integer indicating a number of rows. If RANGE is specified, the window includes any rows that are less than the current row minus the specified numeric value. For the RANGE case, the data type of the windowed value must be comparable to the type of the sort key expression of the ORDER BY clause. There can be only one sort key expression, and the data type of the sort key expression must allow subtraction. This clause cannot be specified in second bound group if the first bound group is CURRENT ROW or value FOLLOWING.

The combination BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING provides an aggregate over an entire partition, without the need to construct a join to a grouped query. An aggregate over an entire partition is also known as a reporting aggregate.

CURRENT ROW concept

In physical aggregation groups, rows are included or excluded based on their position relative to the current row, by counting adjacent rows. The current row is simply a reference to the next row in a query’s intermediate results. As the current row advances, the window is reevaluated based on the new set of rows that lie within the window. There is no requirement that the current row be included in a window.

If a window frame clause is not specified, the default window frame depends on whether or not a window order clause is specified:

You can also define a window by specifying a window frame unit that is row-based (rows specification) or value-based (range specification).

<WINDOW FRAME UNIT> ::= ROWS | RANGE
<WINDOW FRAME EXTENT> ::= <WINDOW FRAME START> | <WINDOW FRAME BETWEEN>

When a window frame extent specifies BETWEEN, it explicitly provides the beginning and end of a window frame.

If the window frame extent specifies only one of these two values then the other value defaults to CURRENT ROW.


ROWS

The window frame unit, ROWS, defines a window in the specified number of rows before or after the current row, which serves as the reference point that determines the start and end of a window. Each analytical calculation is based on the current row within a partition. To produce determinative results for a window expressed in rows, the ordering expression should be unique.

The reference point for all window frames is the current row. The SQL/OLAP syntax provides mechanisms for defining a row-based window frame as any number of rows preceding or following the current row or preceding and following the current row.

The following list illustrates common examples of a window frame unit:

Row-based window frames In the example in Figure 4-4, rows [1] through [5] represent a partition; each row becomes the current row as the OLAP window frame slides forward. The frame is defined as Between Current Row And 2 Following, so each frame includes a maximum of three rows and a minimum of one row. When the frame reaches the end of the partition, only the current row is included. The shaded areas indicate which rows are excluded from the frame at each step in Figure 4-4.

Figure 4-4: Row-based window frames

Shows current row moving in row-based window frame

The window frame in Figure 4-4 imposes the following rules:

The following diagram applies these rules to a specific set of values, showing the OLAP AVG function that would be calculated for each row. The sliding calculations produce a moving average with an interval of three rows or fewer, depending on which row is the current row:

Shows how sliding calculations produce a moving average with an interval of three rows or fewer, depending on which row is the current row

The following example demonstrates a sliding window:

SELECT dimension, measure,
  AVG(measure) OVER(partition BY dimension
    ORDER BY measure 
    ROWS BETWEEN CURRENT ROW and 2 FOLLOWING) 
    AS olap_avg
FROM ...

The averages are computed as follows:

Similar calculations would be computed for all subsequent partitions in the result set (such as, B, C, and so on).

If there are no rows in the current window, the result is NULL, except for COUNT.


RANGE

Range-based window frames The previous example, Row-based window frames, demonstrates one among many row-based window frame definitions. The SQL/OLAP syntax also supports another kind of window frame whose limits are defined in terms of a value-based—or range-based—set of rows, rather than a specific sequence of rows.

Value-based window frames define rows within a window partition that contain a specific range of numeric values. The OLAP function’s ORDER BY clause defines the numeric column to which the range specification is applied, relative to the current row’s value for that column. The range specification uses the same syntax as the rows specification, but the syntax is interpreted in a different way.

The window frame unit, RANGE, defines a window frame whose contents are determined by finding rows in which the ordering column has values within the specified range of value relative to the current row. This is called a logical offset of a window frame, which you can specify with constants, such as “3 preceding,” or any expression that can be evaluated to a numeric constant. When using a window defined with RANGE, there can be only a single numeric expression in the ORDER BY clause.

For example, a frame could be defined as the set of rows with year values some number of years preceding or following the current row’s year:

ORDER BY year ASC range BETWEEN CURRENT ROW and 1 PRECEDING

In the above example query, 1 preceding means the current row’s year value minus 1.

This kind of range specification is inclusive. If the current row’s year value is 2000, all rows in the window partition with year values 2000 and 1999 qualify for the frame, regardless of the physical position of those rows in the partition. The rules for including and excluding value-based rows are quite different from the rules applied to row-based frames, which depend entirely on the physical sequence of rows.

Put in the context of an OLAP AVG() calculation, the following partial result set further demonstrates the concept of a value-based window frame. Again, the frame consists of rows that:

Shows range based window frame with row, dimension, year, measure and OLAP average

The following query demonstrates a range-based window definition:

SELECT dimension, year, measure,
  AVG(measure) OVER(PARTITION BY dimension
    ORDER BY year ASC 
    range BETWEEN CURRENT ROW and 1 PRECEDING) 
    as olap_avg
FROM ...

The averages are computed as follows:

Ascending and descending order for value-based frames The ORDER BY clause for an OLAP function with a value-based window frame not only identifies the numeric column on which the range specification is based; it also declares the sort order for the ORDER BY values. The following specification is subject to the sort order that precedes it (ASC or DESC):

RANGE BETWEEN CURRENT ROW AND n FOLLOWING

The specification n FOLLOWING means:

For example, assume that the year column contains four distinct values, from 1999 to 2002. The following table shows the default ascending order of these values on the left and the descending order on the right:

Shows two columns: order by year ascending (1999 to 2002) and order by year descending (2002 to 1999)

If the current row is 1999 and the frame is specified as follows, rows that contain the values 1999 and 1998 (which doesn’t exist in the table) are included in the frame:

ORDER BY year ASC range BETWEEN CURRENT ROW and 1 FOLLOWING

NoteThe sort order of the ORDER BY values is a critical part of the test for qualifying rows in a value-based frame; the numeric values alone do not determine exclusion or inclusion.

Using an unbounded window The following query produces a result set consisting of all of the products accompanied by the total quantity of all products:

SELECT id, description, quantity,
  SUM(quantity) OVER () AS total
FROM product;

Computing deltas between adjacent rows Using two windows—one over the current row and the other over the previous row—provides a direct way of computing deltas, or changes, between adjacent rows. See the following query example and results.

SELECT emp_id, emp_lname, SUM(salary) OVER (ORDER BY
  birth_date rows between current row and current row)
  AS curr, SUM(salary) OVER (ORDER BY birth_date rows
  between 1 preceding and 1 preceding) AS prev, (curr
  -prev) as delta 
FROM employee WHERE state IN ('MA', 'AZ') AND dept_id
  =100 
ORDER BY emp_id, emp_lname;

The following are the results from the query:

emp_id  emp_lname    curr       prev        delta
------  ---------    ---------  ----------  ----------
 102    Whitney      45700.000  64500.000   -18800.000
 105    Cobb         62000.000  68400.000    -6400.000
 160    Breault      57490.000  96300.000   -38810.000
 243    Shishov      72995.000  59840.000    13155.000
 247    Driscoll     48023.690  87900.000   -39876.310
 249    Guevara      42998.000  48023.690    -5025.690
 266    Gowda        59840.000  57490.000     2350.000
 278    Melkisetian  48500.000  74500.000   -26000.000
 316    Pastor       74500.000  62000.000    12500.000
 445    Lull         87900.000  67890.000    20010.000
 453    Rabkin       64500.000  42998.000    21502.000
 479    Siperstein   39875.500  42500.000    -2624.500
 501    Scott        96300.000  54900.000    41400.000
 529    Sullivan     67890.000  72995.000    -5105.000
 582    Samuels      37400.000  39875.500    -2475.500
 604    Wang         68400.000  45700.000    22700.000
 839    Marshall     42500.000  48500.000    -6000.000
1157    Soo          39075.000  37400.000     1675.000
1250    Diaz         54900.000

Although the window function SUM() is used, the sum contains only the salary value of either the current or previous row because of how the window is specified. Also, the prev value of the first row in the result is NULL because it has no predecessor; therefore, the delta is NULL as well.

In each of the examples above, the function used with the OVER() clause is the SUM() aggregate function.