You
can use the GROUP SORTED Distinct
operator to
apply distinctness. GROUP SORTED Distinct
requires
that the input stream is already sorted on the distinct columns.
It reads a row from its child operator and initializes the current
distinct columns’ values to be filtered.
The row is returned to the parent operator. When the GROUP
SORTED
operator is called again to fetch another
row, it fetches another row from its child and compares the values
to the current cached values. If the value is a duplicate, the row
is discarded and the child is called again to fetch a new row.
This process continues until a new distinct row is found. The distinct columns’ values for this row are cached and are used later to eliminate nondistinct rows. The current row is returned to the parent operator for further processing.
The GROUP SORTED Distinct
operator
returns a sorted stream. The fact that it returns a sorted and distinct
data stream are properties that the optimizer can use to improve
performance in additional upstream processing. The GROUP SORTED
Distinct
operator is a nonblocking operator.
It returns a distinct row to its parent as soon as it is fetched.
It does not require the entire input stream to be processed before
it can start returning rows. The following query collects distinct
last and first author’s names:
select distinct au_lname, au_fname from authors where au_lname = "Bloom" QUERY PLAN FOR STATEMENT 1 (at line 2). STEP 1 The type of query is SELECT. 2 operator(s) under root ROOT:EMIT Operator (VA = 2) |GROUP SORTED Operator (VA = 1) |Distinct | | |SCAN Operator (VA = 0) | | FROM TABLE | | authors | | Index : aunmind | | Forward Scan. | | Positioning by key. | | Index contains all needed columns. Base table will not be read. | | Keys are: | | au_lname ASC | | Using I/O Size 2 Kbytes for index leaf pages. | | With LRU Buffer Replacement Strategy for index leaf pages.
The GROUP SORTED Distinct
operator
is chosen in this query plan to apply the distinct property because
the scan operator is returning rows in sorted order for the distinct
columns au_lname and au_fname. GROUP
SORTED
incurs no I/O and minimal CPU
overhead.
You can use the GROUP SORTED Distinct
operator to
implement vector aggregation. See “Vector aggregation operators”. The showplan output
prints the line Distinct
to
indicate that this GROUP SORTED Distinct
operator
is implementing the distinct property.