For non-ranking aggregate OLAP functions, you can define a window frame with a window frame clause, which specifies the beginning and end of the window relative to the current row.
<WINDOW FRAME CLAUSE> ::= <WINDOW FRAME UNIT> <WINDOW FRAME EXTENT>
This OLAP function is computed with respect to the contents of a moving frame rather than the fixed contents of the whole partition. Depending on its definition, the partition has a start row and an end row, and the window frame slides from the starting point to the end of the partition.
Figure 4-3: Three-row moving window with partitioned input
Window frames can be defined by an unbounded aggregation group that either extends back to the beginning of the partition (UNBOUNDED PRECEDING) or extends to the end of the partition (UNBOUNDED FOLLOWING), or both.
UNBOUNDED PRECEDING includes all rows within the partition preceding the current row, which can be specified with either ROWS or RANGE. UNBOUNDED FOLLOWING includes all rows within the partition following the current row, which can be specified with either ROWS or RANGE. See “ROWS” and “RANGE”.
The value FOLLOWING specifies either the range or number of rows following the current row. If ROWS is specified, then the value is a positive integer indicating a number of rows. If RANGE is specified, the window includes any rows that are less than the current row plus the specified numeric value. For the RANGE case, the data type of the windowed value must be comparable to the type of the sort key expression of the ORDER BY clause. There can be only one sort key expression, and the data type of the sort key expression must allow addition.
The value PREDCEEDING specifies either the range or number of rows preceding the current row. If ROWS is specified, then the value is a positive integer indicating a number of rows. If RANGE is specified, the window includes any rows that are less than the current row minus the specified numeric value. For the RANGE case, the data type of the windowed value must be comparable to the type of the sort key expression of the ORDER BY clause. There can be only one sort key expression, and the data type of the sort key expression must allow subtraction. This clause cannot be specified in second bound group if the first bound group is CURRENT ROW or value FOLLOWING.
The combination BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING provides an aggregate over an entire partition, without the need to construct a join to a grouped query. An aggregate over an entire partition is also known as a reporting aggregate.
In physical aggregation groups, rows are included or excluded based on their position relative to the current row, by counting adjacent rows. The current row is simply a reference to the next row in a query’s intermediate results. As the current row advances, the window is reevaluated based on the new set of rows that lie within the window. There is no requirement that the current row be included in a window.
If a window frame clause is not specified, the default window frame depends on whether or not a window order clause is specified:
If the window specification contains a window order clause, the window’s start point is UNBOUNDED PRECEDING, and the end point is CURRENT ROW, thus defining a varying-size window suitable for computing cumulative values.
If the window specification does not contain a window order clause, the window’s start point is UNBOUNDED PRECEDING, and the end point is UNBOUNDED FOLLOWING, thus defining a window of fixed size, regardless of the current row.
A window frame clause cannot be used with a ranking function.
You can also define a window by specifying a window frame unit that is row-based (rows specification) or value-based (range specification).
<WINDOW FRAME UNIT> ::= ROWS | RANGE
<WINDOW FRAME EXTENT> ::= <WINDOW FRAME START> | <WINDOW FRAME BETWEEN>
When a window frame extent specifies BETWEEN, it explicitly provides the beginning and end of a window frame.
If the window frame extent specifies only one of these two values then the other value defaults to CURRENT ROW.
The window frame unit, ROWS, defines a window in the specified number of rows before or after the current row, which serves as the reference point that determines the start and end of a window. Each analytical calculation is based on the current row within a partition. To produce determinative results for a window expressed in rows, the ordering expression should be unique.
The reference point for all window frames is the current row. The SQL/OLAP syntax provides mechanisms for defining a row-based window frame as any number of rows preceding or following the current row or preceding and following the current row.
The following list illustrates common examples of a window frame unit:
Rows Between Unbounded Preceding and Current Row – specifies a window whose start point is the beginning of each partition and the end point is the current row and is often used to construct windows that compute cumulative results, such as cumulative sums.
Rows Between Unbounded Preceding and Unbounded Following – specifies a fixed window, regardless of the current row, over the entire partition. The value of a window aggregate function is, therefore, identical in each row of the partition.
Rows Between 1 Preceding and 1 Following – specifies a fixed-sized moving window over three adjacent rows, one each before and after the current row. You can use this window frame unit to compute, for example, a 3-day or 3-month moving average. See Figure 4-3.
Be aware of non-meaningful results that could be generated using ROWS due to gaps in the windowed values. If the set of values is not continuous, consider using RANGE instead of ROWS, because a window definition based on RANGE automatically handle adjacent rows with duplicate values and will not include other rows when there are gaps in the range.
In the case of a moving window, it is assumed that rows containing NULL values exist before the first row, and after the last row, in the input. What this means is that with a 3-row moving window, the computation for the last row in the input—the current row—will include the immediately preceding row and a NULL value.
Rows Between Current Row and Current Row – restricts the window to the current row only.
Rows Between 1 Preceding and 1 Preceding – specifies a single row window consisting only of the preceding row, with respect to the current row. In combination with another window function that computes a value based on the current row only, this construction makes it possible to easily compute deltas, or differences in value, between adjacent rows. See “Computing deltas between adjacent rows”.
Row-based window frames In the example in Figure 4-4, rows [1] through [5] represent a partition; each row becomes the current row as the OLAP window frame slides forward. The frame is defined as Between Current Row And 2 Following, so each frame includes a maximum of three rows and a minimum of one row. When the frame reaches the end of the partition, only the current row is included. The shaded areas indicate which rows are excluded from the frame at each step in Figure 4-4.
Figure 4-4: Row-based window frames
The window frame in Figure 4-4 imposes the following rules:
When row [1] is the current row, rows [4] and [5] are excluded.
When row [2] is the current row, rows [5] and [1] are excluded.
When row [3] is the current row, rows [1] and [2] are excluded.
When row [4] is the current row, rows [1], [2], and [3] are excluded.
When row [5] is the current row, rows [1], [2], [3], and [4] are excluded.
The following diagram applies these rules to a specific set of values, showing the OLAP AVG function that would be calculated for each row. The sliding calculations produce a moving average with an interval of three rows or fewer, depending on which row is the current row:
The following example demonstrates a sliding window:
SELECT dimension, measure, AVG(measure) OVER(partition BY dimension ORDER BY measure ROWS BETWEEN CURRENT ROW and 2 FOLLOWING) AS olap_avg FROM ...
The averages are computed as follows:
Row [1] = (10 + 50 + 100)/3
Row [2] = (50+ 100 + 120)/3
Row [3] = (100 + 120 + 500)/3
Row [4] = (120 + 500 + NULL)/3
Row [5] = (500 + NULL + NULL)/3
Similar calculations would be computed for all subsequent partitions in the result set (such as, B, C, and so on).
If there are no rows in the current window, the result is NULL, except for COUNT.
Range-based window frames The previous example, Row-based window frames, demonstrates one among many row-based window frame definitions. The SQL/OLAP syntax also supports another kind of window frame whose limits are defined in terms of a value-based—or range-based—set of rows, rather than a specific sequence of rows.
Value-based window frames define rows within a window partition that contain a specific range of numeric values. The OLAP function’s ORDER BY clause defines the numeric column to which the range specification is applied, relative to the current row’s value for that column. The range specification uses the same syntax as the rows specification, but the syntax is interpreted in a different way.
The window frame unit, RANGE, defines a window frame whose contents are determined by finding rows in which the ordering column has values within the specified range of value relative to the current row. This is called a logical offset of a window frame, which you can specify with constants, such as “3 preceding,” or any expression that can be evaluated to a numeric constant. When using a window defined with RANGE, there can be only a single numeric expression in the ORDER BY clause.
For example, a frame could be defined as the set of rows with year values some number of years preceding or following the current row’s year:
ORDER BY year ASC range BETWEEN CURRENT ROW and 1 PRECEDING
In the above example query, 1 preceding means the current row’s year value minus 1.
This kind of range specification is inclusive. If the current row’s year value is 2000, all rows in the window partition with year values 2000 and 1999 qualify for the frame, regardless of the physical position of those rows in the partition. The rules for including and excluding value-based rows are quite different from the rules applied to row-based frames, which depend entirely on the physical sequence of rows.
Put in the context of an OLAP AVG() calculation, the following partial result set further demonstrates the concept of a value-based window frame. Again, the frame consists of rows that:
Have the same year as the current row
Have the same year as the current row minus 1
The following query demonstrates a range-based window definition:
SELECT dimension, year, measure, AVG(measure) OVER(PARTITION BY dimension ORDER BY year ASC range BETWEEN CURRENT ROW and 1 PRECEDING) as olap_avg FROM ...
The averages are computed as follows:
Row [1] = 1999; rows [2] through [5] are excluded; AVG = 10,000/1
Row [2] = 2001; rows [1], [4], and [5] are excluded; AVG = 6,000/2
Row [3] = 2001; rows [1], [4], and [5] are excluded; AVG = 6,000/2
Row [4] = 2002; row [1] is excluded; AVG = 21,000/4
Row [5] = 2002; row [1] is excluded; AVG = 21,000/4
Ascending and descending order for value-based frames The ORDER BY clause for an OLAP function with a value-based window frame not only identifies the numeric column on which the range specification is based; it also declares the sort order for the ORDER BY values. The following specification is subject to the sort order that precedes it (ASC or DESC):
RANGE BETWEEN CURRENT ROW AND n FOLLOWING
The specification n FOLLOWING means:
Plus n if the partition is sorted in default ascending order (ASC)
Minus n if the partition is sorted in descending order (DESC)
For example, assume that the year column contains four distinct values, from 1999 to 2002. The following table shows the default ascending order of these values on the left and the descending order on the right:
If the current row is 1999 and the frame is specified as follows, rows that contain the values 1999 and 1998 (which doesn’t exist in the table) are included in the frame:
ORDER BY year ASC range BETWEEN CURRENT ROW and 1 FOLLOWING
The sort order of the ORDER BY values is a critical part of the test for qualifying rows in a value-based frame; the numeric values alone do not determine exclusion or inclusion.
Using an unbounded window The following query produces a result set consisting of all of the products accompanied by the total quantity of all products:
SELECT id, description, quantity, SUM(quantity) OVER () AS total FROM product;
Computing deltas between adjacent rows Using two windows—one over the current row and the other over the previous row—provides a direct way of computing deltas, or changes, between adjacent rows. See the following query example and results.
SELECT emp_id, emp_lname, SUM(salary) OVER (ORDER BY birth_date rows between current row and current row) AS curr, SUM(salary) OVER (ORDER BY birth_date rows between 1 preceding and 1 preceding) AS prev, (curr -prev) as delta FROM employee WHERE state IN ('MA', 'AZ') AND dept_id =100 ORDER BY emp_id, emp_lname;
The following are the results from the query:
emp_id emp_lname curr prev delta ------ --------- --------- ---------- ---------- 102 Whitney 45700.000 64500.000 -18800.000 105 Cobb 62000.000 68400.000 -6400.000 160 Breault 57490.000 96300.000 -38810.000 243 Shishov 72995.000 59840.000 13155.000 247 Driscoll 48023.690 87900.000 -39876.310 249 Guevara 42998.000 48023.690 -5025.690 266 Gowda 59840.000 57490.000 2350.000 278 Melkisetian 48500.000 74500.000 -26000.000 316 Pastor 74500.000 62000.000 12500.000 445 Lull 87900.000 67890.000 20010.000 453 Rabkin 64500.000 42998.000 21502.000 479 Siperstein 39875.500 42500.000 -2624.500 501 Scott 96300.000 54900.000 41400.000 529 Sullivan 67890.000 72995.000 -5105.000 582 Samuels 37400.000 39875.500 -2475.500 604 Wang 68400.000 45700.000 22700.000 839 Marshall 42500.000 48500.000 -6000.000 1157 Soo 39075.000 37400.000 1675.000 1250 Diaz 54900.000
Although the window function SUM() is used, the sum contains only the salary value of either the current or previous row because of how the window is specified. Also, the prev value of the first row in the result is NULL because it has no predecessor; therefore, the delta is NULL as well.
In each of the examples above, the function used with the OVER() clause is the SUM() aggregate function.