The HASH JOIN operator is a binary operator. The left child generates the build input stream; the right child generates the probe input stream. The build set is generated by completely draining the build input stream when the first row is requested from the HASH JOIN operator. Every row is read from the input stream and hashed into the appropriate bucket using the hash key.
If there is not enough memory to hold the entire build set, then a portion of it spills to disk. This portion is referred to as a hash partition and should not be confused with table partitions. A hash partition consists of a collection of hash buckets. After the entire left child’s stream has been drained, the probe input is read.
Each row from the probe set is hashed. If the corresponding build bucket is memory resident, a lookup is done in that bucket to check for rows with matching hash keys. If the bucket has been spilled, the probe row is written to the corresponding spilled probe partition. When a probe row's key matches a build row's key, the necessary projection of the two rows' columns is passed up for additional processing.
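The build and probe phases described above can be sketched as follows. This is a minimal in-memory illustration with hypothetical row and key names; it omits spilling and assumes the entire build set fits in memory:

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Minimal hash join sketch: fully drain the build input,
    then stream the probe input against the hash table."""
    # Build phase: hash every build row into a bucket on its join key.
    buckets = defaultdict(list)
    for row in build_rows:
        buckets[hash(row[build_key])].append(row)

    # Probe phase: hash each probe row, then compare actual key values
    # within the matching bucket (hash equality alone is not enough).
    results = []
    for row in probe_rows:
        for build_row in buckets.get(hash(row[probe_key]), []):
            if build_row[build_key] == row[probe_key]:
                # Project the needed columns from both rows.
                results.append({**build_row, **row})
    return results

authors = [{"au_id": 1, "au_lname": "Bloom"}]
titleauthor = [{"au_id": 1, "title_id": "T1"}, {"au_id": 2, "title_id": "T2"}]
print(hash_join(authors, titleauthor, "au_id", "au_id"))
```

Note the two-step check in the probe phase: the hash value only selects the bucket; the join keys themselves must still be compared for equality.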
Spilled partitions are processed in subsequent recursive passes of the HASH JOIN algorithm. New hash seeds are used in each pass so that the data is redistributed across different hash buckets. This recursive processing continues until the last spilled partition is completely memory resident. When a hash partition from the build set contains many duplicates, the HASH JOIN operator reverts to NESTED LOOP JOIN processing.
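The recursive spill handling can be illustrated schematically. In this sketch, a row-count budget stands in for available memory, in-memory lists stand in for spilled partitions on disk, and the partition count and recursion limit are arbitrary; the fallback branch mirrors the revert to nested-loop processing described above:

```python
def partition(rows, key, seed, n_parts=4):
    """Redistribute rows into partitions with a seeded hash, so each
    recursive pass spreads the data across different buckets."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[hash((seed, row[key])) % n_parts].append(row)
    return parts

def recursive_hash_join(build, probe, key, seed=0, max_rows=1000, max_depth=8):
    # Base case: the build side fits in the memory budget -> plain hash join.
    if len(build) <= max_rows:
        table = {}
        for b in build:
            table.setdefault(b[key], []).append(b)
        return [{**b, **p} for p in probe for b in table.get(p[key], [])]
    # A partition dominated by duplicate keys never shrinks, no matter how
    # often it is rehashed; fall back to nested-loop processing.
    if seed >= max_depth:
        return [{**b, **p} for b in build for p in probe if b[key] == p[key]]
    # Recursive case: repartition both inputs with a new hash seed and
    # join the matching build/probe partitions pairwise.
    out = []
    for b_part, p_part in zip(partition(build, key, seed),
                              partition(probe, key, seed)):
        out.extend(recursive_hash_join(b_part, p_part, key,
                                       seed + 1, max_rows, max_depth))
    return out
```

Because both inputs are partitioned with the same key and seed, matching rows always land in the same partition pair, so each pair can be joined independently.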
Generally, the HASH JOIN strategy is good in cases where most of the rows from the source sets must be processed and there are no inherently useful orderings on the join keys, or no interesting orderings that can be promoted to calling operators (for example, an order by clause on the join key). HASH JOINs perform particularly well if one of the data sets is small enough to be memory resident; in this case, no spilling occurs and no I/O is needed to perform the HASH JOIN.
    select ta.title_id
    from titleauthor ta, authors a
    where a.au_id = ta.au_id
    and au_lname = "Bloom"

    QUERY PLAN FOR STATEMENT 1 (at line 2).

    3 operator(s) under root

    The type of query is SELECT.

    ROOT:EMIT Operator

    |HASH JOIN Operator (Join Type: Inner Join)
    | Using Worktable1 for internal storage.
    |
    |   |SCAN Operator
    |   |  FROM TABLE
    |   |  authors
    |   |  a
    |   |  Index : aunmind
    |   |  Forward Scan.
    |   |  Positioning by key.
    |   |  Keys are:
    |   |    au_lname ASC
    |   |  Using I/O Size 2 Kbytes for index leaf pages.
    |   |  With LRU Buffer Replacement Strategy for index leaf pages.
    |   |  Using I/O Size 2 Kbytes for data pages.
    |   |  With LRU Buffer Replacement Strategy for data pages.
    |
    |   |SCAN Operator
    |   |  FROM TABLE
    |   |  titleauthor
    |   |  ta
    |   |  Index : auidind
    |   |  Forward Scan.
    |   |  Positioning at index start.
    |   |  Using I/O Size 2 Kbytes for index leaf pages.
    |   |  With LRU Buffer Replacement Strategy for index leaf pages.
    |   |  Using I/O Size 2 Kbytes for data pages.
    |   |  With LRU Buffer Replacement Strategy for data pages.
In this example, the source of the build input stream is an index scan of authors.aunmind. Only rows with an au_lname value of “Bloom” are returned from this scan. These rows are then hashed on their au_id value and placed into their corresponding hash buckets. After the initial build phase is completed, the probe stream is opened and scanned. Each row from the source index, titleauthor.auidind, is hashed on the au_id column. The resulting hash value is used to determine which bucket in the build set should be searched for matching hash keys. Each row from the build set's hash bucket is compared to the probe row's key for equality. If the rows match, the titleauthor.title_id column is returned to the EMIT operator.
The HASH JOIN operator's showplan output contains a message indicating the worktable to be used for the spilled partition's backing store. The input row width is limited to 64 kilobytes.