To improve performance, very large tables are often segmented into several small tables and accessed using a UNION ALL in a view. For certain very specific queries using such a view with a GROUP BY, the Sybase IQ optimizer is able to enhance performance by pushing some GROUP BY operations into each arm of such a UNION ALL, performing the operations in parallel, then combining the results. This method, referred to as split GROUP BY, reduces the amount of data that is processed by the top level GROUP BY, and consequently reduces query processing time.
Only certain queries with a GROUP BY over a UNION ALL show a performance improvement. The following simple query, for example, benefits from the split GROUP BY:
CREATE VIEW vtable (v1 int, v2 char(4)) AS SELECT a1, a2 FROM tableA UNION ALL SELECT b1, b2 FROM tableB; SELECT COUNT(*), SUM(v1) FROM vtable GROUP BY v2;
When analyzing this query, the optimizer first performs COUNT(*) GROUP BY on tableA and COUNT(*) GROUP BY on tableB, then passes these results to the top level GROUP BY. The top level GROUP BY performs a SUM of the two COUNT(*) results, to produce the final query result. Note that the role of the top level GROUP BY changes: the aggregation used by the top level GROUP BY is SUM instead of COUNT.
There are some restrictions on the situations and queries that benefit from the split GROUP BY.
The query may benefit from the split GROUP BY, if the query uses UNION ALL, rather than UNION. The following query uses GROUP BY with UNION, so it does not take advantage of the GROUP BY split:
CREATE VIEW viewA (va1 int, va2 int, va3 int, va4 int) AS SELECT b1, b2, b3, b4 FROM tableB UNION SELECT c1, c2, c3, c4 FROM tableC; SELECT SUM(va1) FROM viewA GROUP BY va3;
The query may benefit from the split GROUP BY, if an aggregation in the query does not contain DISTINCT. The following query uses SUM DISTINCT, so it does not take advantage of the split GROUP BY:
CREATE VIEW viewA (va1 int, va2 int, va3 int, va4 int) AS SELECT b1, b2, b3, b4 FROM tableB UNION ALL SELECT c1, c2, c3, c4 FROM tableC; SELECT SUM(DISTINCT va1) FROM viewA GROUP BY va3;
In order for the query to benefit from the split GROUP BY, you need enough memory in the temporary shared buffer cache to store the aggregation information and data used for processing the additional GROUP BY operators.
CREATE VIEW viewA (va1 int, va2 int, va3 int, va4 int) AS SELECT b1, b2, b3, b4 FROM tableB UNION ALL SELECT c1, c2, c3, c4 FROM tableC UNION ALL SELECT d1, d2, d3, d4 FROM tableD UNION ALL SELECT e1, e2, e3, e4 FROM tableE UNION ALL SELECT f1, f2, f3, f4 FROM tableF UNION ALL SELECT g1, g2, g3, g4 FROM tableG; SELECT SUM(va1) FROM viewA GROUP BY va3;
In this example, the IQ optimizer splits the GROUP BY and inserts six GROUP BY operators into the query plan. Consequently, the query requires more temporary cache to store aggregation information and data. If the system cannot allocate enough cache, the optimizer does not split the GROUP BY.You can use the TEMP_CACHE_MEMORY_MB database option to increase the size of the temporary cache, if memory is available. For information on setting buffer cache sizes, see the sections “Determining the sizes of the buffer caches” on page 515 and “TEMP_CACHE_MEMORY_MB option” in the chapter “Database Options” of the Sybase IQ Reference Manual.
In order for the query to benefit from split GROUP BY, the AGGREGATION_PREFERENCE database option should be set to its default value of 0. This value allows the IQ optimizer to determine the best algorithm to apply to the GROUP BY. The query does not benefit from split GROUP BY, if the value of AGGREGATION_PREFERENCE forces the IQ optimizer to choose a sort algorithm to process the GROUP BY. The option AGGREGATION_PREFERENCE can be used to override the optimizer's choice of algorithm for processing the GROUP BY and should not be set to 1 or 2 in this case.
In this example, a large table named tableA is segmented into four smaller tables: tabA1, tabA2, tabA3, and tabA4. The view unionTab is created using the four smaller tables and UNION ALL:
CREATE VIEW unionTab (v1 int, v2 int, v3 int, v4 int) AS SELECT a, b, c, d FROM tabA1 UNION ALL SELECT a, b, c, d FROM tabA2 UNION ALL SELECT a, b, c, d FROM tabA3 UNION ALL SELECT a, b, c, d FROM tabA4;
The IQ optimizer splits the GROUP BY operation in the following queries and improves query performance:
SELECT v1, v2, SUM(v3), COUNT(*) FROM unionTab GROUP BY v1, v2; SELECT v3, SUM(v1*v2) FROM unionTab GROUP BY v3;