There
are three unary operators used for vector aggregation. They are
the GROUP SORTED COUNT AGGREGATE
, the HASH
VECTOR AGGREGATE
, and the GROUP
INSERTING
operators.
See Table 1-3 for a list and description of all query processor operators.
The GROUP SORTED COUNT AGGREGATE
nonblocking
operator is a variant of the GROUP SORTED Distinct
operator
described in “GROUP SORTED Distinct operator”. The GROUP
SORTED COUNT AGGREGATE
operator requires that
input set to be sorted on the group by columns.
The algorithm is very similar to that of GROUP
SORTED Distinct
.
A row is read from the child operator. If the row is the start of a new vector, its grouping columns are cached and the aggregation results are initialized.
If the row belongs to the current group being processed, the
aggregate functions are applied to the aggregate results. When the
child operator returns a row that starts a new group or End
Of Scan
, the current vector and its aggregated
values are returned to the parent operator.
The first row in the GROUP SORTED COUNT
AGGREGATE
operator is returned after an entire
group is processed, where the first row in the GROUP
SORTED Distinct
operator is returned at the
start of a new group. This example collects a list of all cities
with the number of authors that live in each city.
select city, total_authors = count(*) from authors group by city plan "(group_sorted (sort (scan authors)) )" QUERY PLAN FOR STATEMENT 1 (at line 1). Optimized using the Abstract Plan in the PLAN clause. STEP 1 The type of query is SELECT. 3 operator(s) under root ROOT:EMIT Operator (VA = 3) |GROUP SORTED Operator (VA = 2) | Evaluate Grouped COUNT AGGREGATE. | | |SORT Operator (VA = 1) | | Using Worktable1 for internal storage. | | | | | SCAN Operator (VA = 0) | | | FROM TABLE | | | authors | | | Table Scan. | | | Forward Scan. | | | Positioning at start of table. | | | Using I/O Size 2 Kbytes for data pages. | | | With LRU Buffer Replacement Strategy for data pages.
In this query plan, the scan of authors does
not return rows in grouping order. A SORT
operator
is applied to order the stream based on the grouping column city.
At this point, a GROUP SORTED COUNT AGGREGATE
operator
can be applied to evaluate the count aggregate.
The GROUP SORTED COUNT AGGREGATE
operator showplan output
reports the aggregate functions being applied as:
| Evaluate Grouped COUNT AGGREGATE.
The HASH
VECTOR AGGREGATE
operator is a blocking operator.
All rows from the child operator must be processed before the first
row from the HASH VECTOR AGGREGATE
operator
can be returned to its parent operator. Other than this, the algorithm
is similar to the HASH Distinct
operator’s
algorithm.
Rows are fetched from the child operator. Each row is hashed on the query’s grouping columns. The bucket that is hashed is searched to see if the vector already exists.
If the group by values do not exist, the vector is added and the aggregate values are initialized using this first row. If the group by values do exist, the current row is aggregated to the existing values. This example collects a list of all cities with the number of authors that live in each city.
select city, total_authors = count(*) from authors group by city QUERY PLAN FOR STATEMENT 1 (at line 1). STEP 1 The type of query is SELECT. 2 operator(s) under root ROOT:EMIT Operator (VA = 2) |HASH VECTOR AGGREGATE Operator (VA = 1) | GROUP BY | Evaluate Grouped COUNT AGGREGATE. | Using Worktable1 for internal storage. | Key Count: 1 | | |SCAN Operator (VA = 0) | | FROM TABLE | | authors | | Table Scan. | | Forward Scan. | | Using I/O Size 2 Kbytes for data pages. | | With LRU Buffer Replacement Strategy for data pages.
In this query plan, the HASH VECTOR AGGREGATE
operator
reads all of the rows from its child operator, which is scanning
the authors table. Each row is checked to see
if there is already an entry bucket entry for the current city value. If
there is not, a hash entry row is added with the new city grouping
value and the count result is initialized to 1. If there is already
a hash entry for the new row’s city value,
the aggregation function is applied. In this case, the count result
is incriminated.
The showplan output prints a group
by message specifically for the HASH VECTOR
AGGREGATE
operator, then prints the grouped
aggregation messages:
| Evaluate Grouped COUNT AGGREGATE.
The showplan output reports used to store spilled groups and unprocessed rows:
| Using Worktable1 for internal storage.
GROUP INSERTING
is
a blocking operator. All rows from the child operator must be processed
before the first row can be returned from the GROUP INSERTING
.
GROUP INSERTING
is
limited to 31 or fewer columns in the group by clause. The
operator starts by creating a worktable with a clustered index of
the grouping columns. As each row is fetched from the child, a lookup
into the work table is done based on the grouping columns. If no
row is found, then the row is inserted. This effectively creates
a new group and initializes its aggregate values. If a row is found,
the new aggregate values are updated based on evaluating the new
values. The GROUP INSERTING
operator
returns rows ordered by the grouping columns.
select city, total_authors = count(*) from authors group by city plan '(group_inserting (i_scan auidind authors ))'
QUERY PLAN FOR STATEMENT 1 (at line 1). Optimized using the Abstract Plan in the PLAN clause. STEP 1 The type of query is SELECT. 2 operator(s) under root |ROOT:EMIT Operator (VA = 2) | | |GROUP INSERTING Operator (VA = 1) | | GROUP BY | | Evaluate Grouped COUNT AGGREGATE | | Using Worktable1 for internal storage. | | | | |SCAN Operator (VA = 0) | | | FROM TABLE | | | authors | | | Table Scan. | | | Forward Scan. | | | Positioning at start of table. | | | Using I/O Size 2 Kbytes for data pages. | | | With LRU Buffer Replacement Strategy for data pages.
In this example, the group inserting operator starts by building
a worktable with a clustered index keyed on the city column.
The group inserting operator proceeds to drain the authors table.
For each row, a lookup is done on the city value.
If there is no row in the aggregation worktable with the current city value,
then the row is inserted. This creates a new group for the current city value
with an initialized count value. If the row
for the current city value is found, then an
evaluation is done to increment the COUNT AGGREGATE
value.