Impact on query performance of GROUP BY over a UNION ALL

To improve performance, very large tables are often segmented into several small tables and accessed using a UNION ALL in a view. For certain very specific queries using such a view with a GROUP BY, the Sybase IQ optimizer is able to enhance performance by pushing some GROUP BY operations into each arm of such a UNION ALL, performing the operations in parallel, then combining the results. This method, referred to as split GROUP BY, reduces the amount of data that is processed by the top level GROUP BY, and consequently reduces query processing time.

Only certain queries with a GROUP BY over a UNION ALL show a performance improvement. The following simple query, for example, benefits from the split GROUP BY:

CREATE VIEW vtable (v1 int, v2 char(4)) AS
SELECT a1, a2 FROM tableA
UNION ALL
SELECT b1, b2 FROM tableB;

SELECT COUNT(*), SUM(v1) FROM vtable GROUP BY v2;

When analyzing this query, the optimizer first performs COUNT(*) GROUP BY on tableA and COUNT(*) GROUP BY on tableB, then passes these results to the top level GROUP BY. The top level GROUP BY performs a SUM of the two COUNT(*) results, to produce the final query result. Note that the role of the top level GROUP BY changes: the aggregation used by the top level GROUP BY is SUM instead of COUNT.

Restrictions on split GROUP BY

There are some restrictions on the situations and queries that benefit from the split GROUP BY.

The query may benefit from the split GROUP BY, if the query uses UNION ALL, rather than UNION. The following query uses GROUP BY with UNION, so it does not take advantage of the GROUP BY split:
```
CREATE VIEW viewA (va1 int, va2 int, va3 int,
va4 int) AS
SELECT b1, b2, b3, b4 FROM tableB
UNION
SELECT c1, c2, c3, c4 FROM tableC;

SELECT SUM(va1) FROM viewA GROUP BY va3;
```
The query may benefit from the split GROUP BY, if an aggregation in the query does not contain DISTINCT. The following query uses SUM DISTINCT, so it does not take advantage of the split GROUP BY:
```
CREATE VIEW viewA (va1 int, va2 int, va3 int,
va4 int) AS
SELECT b1, b2, b3, b4 FROM tableB
UNION ALL
SELECT c1, c2, c3, c4 FROM tableC;

SELECT SUM(DISTINCT va1) FROM viewA GROUP BY va3;
```
In order for the query to benefit from the split GROUP BY, you need enough memory in the temporary shared buffer cache to store the aggregation information and data used for processing the additional GROUP BY operators.
```
CREATE VIEW viewA (va1 int, va2 int, va3 int,
va4 int) AS
SELECT b1, b2, b3, b4 FROM tableB
UNION ALL
SELECT c1, c2, c3, c4 FROM tableC
UNION ALL
SELECT d1, d2, d3, d4 FROM tableD
UNION ALL
SELECT e1, e2, e3, e4 FROM tableE
UNION ALL
SELECT f1, f2, f3, f4 FROM tableF
UNION ALL
SELECT g1, g2, g3, g4 FROM tableG;

SELECT SUM(va1) FROM viewA GROUP BY va3;
```
In this example, the IQ optimizer splits the GROUP BY and inserts six GROUP BY operators into the query plan. Consequently, the query requires more temporary cache to store aggregation information and data. If the system cannot allocate enough cache, the optimizer does not split the GROUP BY.You can use the TEMP_CACHE_MEMORY_MB database option to increase the size of the temporary cache, if memory is available. For information on setting buffer cache sizes, see the sections “Determining the sizes of the buffer caches” on page 515 and “TEMP_CACHE_MEMORY_MB option” in the chapter “Database Options” of the Sybase IQ Reference Manual.
In order for the query to benefit from split GROUP BY, the AGGREGATION_PREFERENCE database option should be set to its default value of 0. This value allows the IQ optimizer to determine the best algorithm to apply to the GROUP BY. The query does not benefit from split GROUP BY, if the value of AGGREGATION_PREFERENCE forces the IQ optimizer to choose a sort algorithm to process the GROUP BY. The option AGGREGATION_PREFERENCE can be used to override the optimizer's choice of algorithm for processing the GROUP BY and should not be set to 1 or 2 in this case.

Examples of split GROUP BY

In this example, a large table named tableA is segmented into four smaller tables: tabA1, tabA2, tabA3, and tabA4. The view unionTab is created using the four smaller tables and UNION ALL:

CREATE VIEW unionTab (v1 int, v2 int, v3 int, v4 int) AS
SELECT a, b, c, d FROM tabA1
UNION ALL
SELECT a, b, c, d FROM tabA2
UNION ALL
SELECT a, b, c, d FROM tabA3
UNION ALL
SELECT a, b, c, d FROM tabA4;

The IQ optimizer splits the GROUP BY operation in the following queries and improves query performance:

SELECT v1, v2, SUM(v3), COUNT(*) FROM unionTab
GROUP BY v1, v2;

SELECT v3, SUM(v1*v2) FROM unionTab
GROUP BY v3;