GroupSorted (Distinct) operator

The GroupSorted (Distinct) operator can be used to apply distinctness. It requires that the input stream is already sorted on the distinct columns. It reads a row from its child operator and initializes the current distinct columns’ values to be filtered. The row is returned to the parent operator. When the group sorted operator is called again to fetch another row, it fetches another row from its child and compares the values to the current cached values. If the value is a duplicate, then the row is discarded and the child is called again to fetch a new row. This process continues until a new distinct row is found. The distinct columns’ values for this row are cached and will be used later to eliminate nondistinct rows. The current row is returned to the parent operator for further processing.

The GroupSorted (Distinct) operator returns a sorted stream. The fact that it returns a sorted and distinct data stream are properties that the optimizer can exploit to improve performance in additional upstream processing. The GroupSorted (Distinct) operator is a nonblocking operator. It returns a distinct row to its parent as soon as it is fetched. It does not require that the entire input stream is processed before it can start returning rows. The following query collects distinct last and first author’s names.

1> select distinct au_lname, au_fname
2> from authors
3> where au_lname = "Bloom"

QUERY PLAN FOR STATEMENT 1 (at line 2).

2 operator(s) under root

The type of query is SELECT.

ROOT:EMIT Operator

	|GROUP SORTED Operator
	|Distinct
	|
	|   |SCAN Operator
	|   |  FROM TABLE
	|   |  authors
	|   |  Index : aunmind
	|   |  Forward Scan.
	|   |  Positioning by key.
	|   |  Index contains all needed columns. Base table will not be read.
	|   |  Keys are:
	|   |    au_lname ASC
	|   |  Using I/O Size 2 Kbytes for index leaf pages.
	|   |  With LRU Buffer Replacement Strategy for index leaf pages.

The SortOp (Distinct) operator is chosen in this query plan to apply the distinct property because the scan operator is returning rows in sorted order for the distinct columns au_lname and au_fname. By using the GroupSorted operator here, there is no I/O and minimal CPU overhead.

The GroupSorted (Distinct) operator can also be used to implement vector aggregation. See “Vector aggregation operators” for more information. The showplan output prints the line Distinct to indicate that this GroupSorted (Distinct) operator is implementing the distinct property.