MergeJoin

The MergeJoin operator is a binary operator. The left and right children are the outer and inner data streams, respectively. Both data streams must be sorted on the MergeJoin’s key values. First, a row from the outer stream is fetched. This initializes the MergeJoin’s join key values. Then, rows from the inner stream are fetched until a row with key values that match or are greater than (less than if key column is descending) is encountered. If the join key matches, the qualifying row is passed on for additional processing, and a subsequent next call to the MergeJoin operator continues fetching from the currently active stream. If the new values are greater than the current comparison key, these values are used as the new comparison join key while fetching rows from the other stream. This process continues until one of the data streams is exhausted.

Generally, the MergeJoin strategy is effective when a scan of the data streams requires that most of the rows must be processed, and that, if any of the input streams are large, they are already sorted on the join keys.

1> -- Collect all of the title ids for books written by "Bloom".
2> select ta.title_id
3>       from titleauthor ta, authors a
4> where a.au_id = ta.au_id
5>       and au_lname = "Bloom"
6> go

QUERY PLAN FOR STATEMENT 1 (at line 2).

    STEP 1
        The type of query is EXECUTE.
        Executing a newly cached statement.

QUERY PLAN FOR STATEMENT 1 (at line 2).

4 operator(s) under root

The type of query is SELECT.

ROOT:EMIT Operator

	|MERGE JOIN Operator (Join Type: Inner Join)
	| Using Worktable2 for internal storage.
	|  Key Count: 1
	|  Key Ordering: ASC
	|
	|   |SORT Operator
	|   | Using Worktable1 for internal storage.
	|   |
	|   |   |SCAN Operator
	|   |   |  FROM TABLE
	|   |   |  authors
	|   |   |  a
	|   |   |  Index : aunmind
	|   |   |  Forward Scan.
	|   |   |  Positioning by key.
	|   |   |  Keys are:
	|   |   |    au_lname ASC
	|   |   |  Using I/O Size 2 Kbytes for index leaf pages.
	|   |   |  With LRU Buffer Replacement Strategy for index leaf pages.
	|   |   |  Using I/O Size 2 Kbytes for data pages.
	|   |   |  With LRU Buffer Replacement Strategy for data pages.
	|
	|   |SCAN Operator
	|   |  FROM TABLE
	|   |  titleauthor
	|   |  ta
	|   |  Index : auidind
	|   |  Forward Scan.
	|   |  Positioning at index start.
	|   |  Using I/O Size 2 Kbytes for index leaf pages.
	|   |  With LRU Buffer Replacement Strategy for index leaf pages.
	|   |  Using I/O Size 2 Kbytes for data pages.
	|   |  With LRU Buffer Replacement Strategy for data pages.

In this example, a sort operator is the left child or outer stream. The data source for the sort operator is the authors table. The sort operator is required because the authors table has no index on au_id that would otherwise provide the necessary sorted order. A scan of the titleauthor table is the right child/inner stream. The scan uses the auidind index, which provides the necessary ordering for the MergeJoin strategy.

A row is fetched from the outer stream (the authors table is the original source) to establish an initial join key comparison value. Then rows are fetched from the titleauthor table until a row with a join key equal to or greater than the comparison key is found.

Inner stream rows with matching keys are stored in a cache in case they need to be refetched. These rows are refetched when the outer stream contains duplicate keys. When a titleauthor.au_id value that is greater than the current join key comparison value is fetched, then the MergeJoin operator starts fetching from the outer stream until a join key value equal to or greater than the current titleauthor.au_id value is found. The scan of the inner stream resumes at that point.

The MergeJoin operator’s showplan output contains a message indicating what worktable will be used for the inner stream's backing store. The worktable is written to if the inner rows with duplicate join keys no longer fits in cached memory. The width of a cached row is limited to 64KB.