If you have data that varies across dimensions such as date or place, you may need to determine how the data varies in each dimension. You can use the ROLLUP and CUBE operators to create multiple levels of subtotals and a grand total from a list of references to grouping columns. The subtotals “roll up” from the most detailed level to the grand total. For example, if you are analyzing sales data, you can compute an overall average and the average sales by year using the same query.
To select total car sales by year, model and color:
SELECT year, model, color, sum(sales) FROM sales_tab GROUP BY ROLLUP (year, model, color);
year |
model |
color |
sales |
---|---|---|---|
1990 |
Chevrolet |
red |
5 |
1990 |
Chevrolet |
white |
87 |
1990 |
Chevrolet |
blue |
62 |
1990 |
Chevrolet |
NULL |
154 |
1990 |
Ford |
blue |
64 |
1990 |
Ford |
red |
62 |
1990 |
Ford |
white |
63 |
1990 |
Ford |
NULL |
189 |
1990 |
NULL |
NULL |
343 |
1991 |
Chevrolet |
blue |
54 |
1991 |
Chevrolet |
red |
95 |
1991 |
Chevrolet |
white |
49 |
1991 |
Chevrolet |
NULL |
198 |
1991 |
Ford |
blue |
52 |
1991 |
Ford |
red |
55 |
1991 |
Ford |
white |
9 |
1991 |
Ford |
NULL |
116 |
1991 |
NULL |
NULL |
314 |
NULL |
NULL |
NULL |
657 |
When processing this query, Sybase IQ groups the data first by all three specified grouping expressions (year, model, color), then for all grouping expressions except the last one (color). In the fifth row, NULL indicates the ROLLUP value for the color column, in other words, the total number of sales of that model in all colors. 343 represents the total sales of all models and colors in 1990 and 314 is the total for 1991. The last row represents total sales on all years, all models and all colors.
ROLLUP requires an ordered list of grouping expressions as arguments. When listing groups that contain other groups, list the larger group first (such as state before city.)
You can use ROLLUP with the aggregate functions: SUM, COUNT, AVG, MIN, MAX, STDDEV, and VARIANCE. ROLLUP does not support COUNT DISTINCT and SUM DISTINCT, however.
The following query uses data from a census, including the state (geographic location), gender, education level, and income of people. You can use the CUBE extension of the GROUP BY clause, if you want to compute the average income in the entire census of state, gender, and education and compute the average income in all possible combinations of the columns state, gender, and education, while making only a single pass through the census data in the table census. For example, use the CUBE operator if you want to compute the average income of all females in all states, or compute the average income of all people in the census according to their education and geographic location.
When CUBE calculates a group, CUBE puts a NULL value in the column(s) whose group is calculated. The distinction is difficult between the type of group each row represents and whether the NULL is a NULL stored in the database or a NULL resulting from CUBE. The GROUPING function solves this problem by returning 1, if the designated column has been merged to a higher level group.
The following query illustrates the use of the GROUPING function with GROUP BY CUBE.
SELECT CASE GROUPING ( state ) WHEN 1 THEN 'ALL' ELSE state END AS c_state, CASE GROUPING ( gender ) WHEN 1 THEN 'ALL' ELSE gender END AS c_gender, CASE GROUPING ( education ) WHEN 1 THEN 'ALL' ELSE education END AS c_education, COUNT(*), CAST (ROUND ( AVG ( income ), 2 ) AS NUMERIC (18,2)) AS average FROM census GROUP BY CUBE (state, gender, education);
The results of this query are shown below. Note that the NULLs generated by CUBE to indicate a subtotal row are replaced with ALL in the subtotal rows, as specified in the query.
c_state |
c_gender |
c_education |
count(*) |
average |
---|---|---|---|---|
MA |
f |
BA |
3 |
48333.33 |
MA |
f |
HS |
2 |
40000.00 |
MA |
f |
MS |
1 |
45000.00 |
MA |
f |
ALL |
6 |
45000.00 |
MA |
m |
BA |
4 |
55000.00 |
MA |
m |
HS |
1 |
55000.00 |
MA |
m |
MS |
3 |
85000.00 |
MA |
m |
ALL |
8 |
66250.00 |
MA |
ALL |
ALL |
14 |
57142.86 |
NH |
f |
HS |
2 |
50000.00 |
NH |
f |
MS |
1 |
85000.00 |
NH |
f |
ALL |
3 |
61666.67 |
NH |
m |
BA |
3 |
55000.00 |
NH |
m |
MS |
1 |
49000.00 |
NH |
m |
ALL |
4 |
53500.00 |
NH |
ALL |
ALL |
7 |
57000.00 |
ALL |
ALL |
ALL |
21 |
57095.24 |
ALL |
ALL |
BA |
10 |
53000.00 |
ALL |
ALL |
MS |
6 |
72333.33 |
ALL |
ALL |
HS |
5 |
47000.00 |
ALL |
f |
ALL |
9 |
50555.56 |
ALL |
m |
ALL |
12 |
62000.00 |
ALL |
f |
BA |
3 |
48333.33 |
ALL |
m |
HS |
1 |
55000.00 |
ALL |
m |
MS |
4 |
76000.00 |
ALL |
m |
BA |
7 |
55000.00 |
ALL |
f |
MS |
2 |
65000.00 |
ALL |
f |
HS |
4 |
45000.00 |
NH |
ALL |
HS |
2 |
50000.00 |
NH |
ALL |
MS |
2 |
67000.00 |
MA |
ALL |
MS |
4 |
75000.00 |
MA |
ALL |
HS |
3 |
45000.00 |
MA |
ALL |
BA |
7 |
52142.86 |
NH |
ALL |
BA |
3 |
55000.00 |
Data warehouse administrators find ROLLUP and CUBE particularly useful for operations like:
Subtotaling on a hierarchical dimension like geography or time, for example year/month/day or country/state/city
Populating summary tables
ROLLUP and CUBE allow you to use one query to compute data using multiple levels of grouping, instead of a separate query for each level.
See the SELECT statement in Chapter 6, “SQL Statements,”Sybase IQ Reference Manual, for more information on the ROLLUP and CUBE operators.