Group by CUBE

The CUBE operator in the GROUP BY clause analyzes data by forming the data into groups in more than one dimension (grouping expression). CUBE requires an ordered list of dimensions as arguments and enables the SELECT statement to calculate subtotals for all possible combinations of the group of dimensions that you specify in the query and generates a result set that shows aggregates for all combinations of values in selected columns.

CUBE syntax:

SELECT … [ GROUPING (column-name) … ] …
GROUP BY [ expression [,…]
| CUBE ( expression [,…] ) ]

GROUPING takes a column name as a parameter and returns a Boolean value as listed in Table 4-2.

**Table 4-2: Values returned by GROUPING with the CUBE operator**
If the value of the result is	GROUPING returns
NULL created by a CUBE operation	1 (TRUE)
NULL indicating the row is a subtotal	1 (TRUE)
Not created by a CUBE operation	0 (FALSE)
A stored NULL	0 (FALSE)

CUBE is particularly useful when your dimensions are not a part of the same hierarchy.

This SQL syntax...	Defines the following sets...
`GROUP BY CUBE (A, B, C);`	(A, B, C) (A, B) (A, C) (A) (B, C) (B) (C) ( )

This SQL syntax...

Defines the following sets...

GROUP BY CUBE (A, B, C);

(A, B, C)

(A, B)

(A, C)

(A)

(B, C)

(B)

(C)

( )

Restrictions on the CUBE operator are:

The CUBE operator supports all of the aggregate functions available to the GROUP BY clause, but CUBE is currently not supported with COUNT DISTINCT or SUM DISTINCT.
CUBE is currently not supported with the inverse distribution analytical functions, PERCENTILE_CONT and PERCENTILE_DISC.
CUBE can only be used in the SELECT statement; you cannot use CUBE in a SELECT subquery.
A GROUPING specification that combines ROLLUP, CUBE, and GROUP BY columns in the same GROUP BY clause is not currently supported.
Constant expressions as GROUP BY keys are not supported.

Performance of CUBE will diminish if the size of the cube exceeds the size of the temp cache.

GROUPING can be used with the CUBE operator to distinguish between stored NULL values and “NULL” values in query results created by CUBE.

See the examples in the description of the ROLLUP operator for illustrations of the use of the GROUPING function to interpret results.

All CUBE operations return result sets with at least one row where NULL appears in each column except for the aggregate columns. This row represents the summary of each column to the aggregate function.

CUBE example 1 The following queries use data from a census, including the state (geographic location), gender, education level, and income of people. The first query contains a GROUP BY clause that organizes the results of the query into groups of rows, according to the values of the columns state, gender, and education in the table census and computes the average income and the total counts of each group. This query uses only the GROUP BY clause without the CUBE operator to group the rows.

SELECT state, sex as gender, dept_id, COUNT(*),
  CAST(ROUND(AVG(salary),2) AS NUMERIC(18,2)) 
  AS average 
FROM employee WHERE state IN ('MA' , 'CA') 
GROUP BY state, sex, dept_id 
ORDER BY 1,2;

The following are the results from the above query:

state  gender  dept_id  count(*)  avg salary
-----  ------  -------  --------  ----------
CA       F       200      2         58650.00
CA       M       200      1         39300.00
MA       F       500      4         29950.00
MA       F       400      8         41959.88
MA       F       300      7         59685.71
MA       F       200      3         60451.00
MA       F       100      6         58243.42
MA       M       300      2         58850.00
MA       M       500      5         36793.96
MA       M       400      8         45321.47
MA       M       100     13         58563.59
MA       M       200      8         46810.63

Use the CUBE extension of the GROUP BY clause, if you want to compute the average income in the entire census of state, gender, and education and compute the average income in all possible combinations of the columns state, gender, and education, while making only a single pass through the census data. For example, use the CUBE operator if you want to compute the average income of all females in all states, or compute the average income of all people in the census according to their education and geographic location.

When CUBE calculates a group, a NULL value is generated for the columns whose group is calculated. The GROUPING function must be used to distinguish whether a NULL is a NULL stored in the database or a NULL resulting from CUBE. The GROUPING function returns 1 if the designated column has been merged to a higher level group.

CUBE example 2 The following query illustrates the use of the GROUPING function with GROUP BY CUBE.

SELECT case grouping(state) WHEN 1 THEN 'ALL' ELSE state
  END AS c_state, case grouping(sex) WHEN 1 THEN 'ALL'
  ELSE sex end AS c_gender, case grouping(dept_id)
  WHEN 1 THEN 'ALL' ELSE cast(dept_id as char(4)) end
  AS c_dept, COUNT(*), CAST(ROUND(AVG(salary),2) AS
  NUMERIC(18,2))AS AVERAGE 
FROM employee WHERE state IN ('MA' , 'CA') 
GROUP BY CUBE(state, sex, dept_id) 
ORDER BY 1,2,3;

The results of this query are shown below. Note that the NULLs generated by CUBE to indicate a subtotal row are replaced with ALL in the subtotal rows, as specified in the query.

state     sex     dept_id    count   avg salary
-----     ---     -------    -----   ----------
ALL       ALL     100         19       58462.48
ALL       ALL     200         14       50888.43
ALL       ALL     300          9       59500.00
ALL       ALL     400         16       43640.67
ALL       ALL     500          9       33752.20
ALL       ALL     ALL         67       50160.38
ALL       F       100          6       58243.42
ALL       F       200          5       59730.60
ALL       F       300          7       59685.71
ALL       F       400          8       41959.88
ALL       F       500          4       29950.00
ALL       F       ALL         30       50713.08
ALL       M       100         13       58563.59
ALL       M       200          9       45976.11
ALL       M       300          2       58850.00
ALL       M       400          8       45321.47
ALL       M       500          5       36793.96
ALL       M       ALL         37       49712.25
CA        ALL     200          3       52200.00
CA        ALL     ALL          3       52200.00
CA        F       200          2       58650.00
CA        F       ALL          2       58650.00
CA        M       200          1       39300.00
CA        M       ALL          1       39300.00
MA        ALL     100         19       58462.48
MA        ALL     200         11       50530.73
MA        ALL     300          9       59500.00
MA        ALL     400         16       43640.67
MA        ALL     500          9       33752.20
MA        ALL     ALL         64       50064.78
MA        F       100          6       58243.42
MA        F       200          3       60451.00
MA        F       300          7       59685.71
MA        F       400          8       41959.88
MA        F       500          4       29950.00
MA        F       ALL         28       50146.16
MA        M       100         13       58563.59
MA        M       200          8       46810.63
MA        M       300          2       58850.00
MA        M       400          8       45321.47
MA        M       500          5       36793.96
MA        M       ALL         36       50001.48

CUBE example 3 In this example, the query returns a result set that summarizes the total number of orders and then calculates subtotals for the number of orders by year and quarter.

As the number of variables that you want to compare increases, the cost of computing the cube increases exponentially.

SELECT year(order_date) AS Year, quarter(order_date)
  AS Quarter, COUNT(*) AS Orders 
FROM alt_sales_order 
GROUP BY CUBE(Year, Quarter) 
ORDER BY Year, Quarter

The figure that follows represents the result set from the query. The subtotal rows are highlighted in the result set. Each subtotal row has a NULL in the column or columns over which the subtotal is computed.

three column table lists year, quarter and orders

The first highlighted row [1] represents the total number of orders across both years and all quarters. The value in the Orders column is the sum of the values in each of the rows marked [3]. It is also the sum of the four values in the rows marked [2].

The next set of highlighted rows [2] represents the total number of orders by quarter across both years. The two rows marked by [3] represent the total number of orders across all quarters for the years 2000 and 2001, respectively.