The HASH JOIN operator is a binary operator. The left child generates the build input stream; the right child generates the probe input stream. The build set is generated by completely draining the build input stream when the first row is requested from the HASH JOIN operator. Every row is read from the input stream and hashed into the appropriate bucket using the hash key.
If there is not enough memory to hold the entire build set, then a portion of it spills to disk. This portion is referred to as a hash partition and should not be confused with table partitions. A hash partition consists of a collection of hash buckets. After the entire left child’s stream has been drained, the probe input is read.
Each row from the probe set is hashed. If the corresponding build bucket is memory resident, a lookup is done in that bucket to check for rows with matching hash keys. If the bucket has been spilled, the probe row is written to the corresponding spilled probe partition. When a probe row's key matches a build row's key, the necessary projection of the two rows' columns is passed up for additional processing.
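The build and probe phases described above can be sketched as follows. This is a minimal in-memory illustration with hypothetical row and key names; it omits spilling and assumes the entire build set fits in memory:

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Minimal hash join sketch: fully drain the build input,
    then stream the probe input against the hash table."""
    # Build phase: hash every build row into a bucket on its join key.
    buckets = defaultdict(list)
    for row in build_rows:
        buckets[hash(row[build_key])].append(row)

    # Probe phase: hash each probe row, then compare actual key values
    # within the matching bucket (hash equality alone is not enough).
    results = []
    for row in probe_rows:
        for build_row in buckets.get(hash(row[probe_key]), []):
            if build_row[build_key] == row[probe_key]:
                # Project the needed columns from both rows.
                results.append({**build_row, **row})
    return results

authors = [{"au_id": 1, "au_lname": "Bloom"}]
titleauthor = [{"au_id": 1, "title_id": "T1"}, {"au_id": 2, "title_id": "T2"}]
print(hash_join(authors, titleauthor, "au_id", "au_id"))
```

Note the two-step check in the probe phase: the hash value only selects the bucket; the join keys themselves must still be compared for equality.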
Spilled partitions are processed in subsequent recursive passes of the HASH JOIN algorithm. New hash seeds are used in each pass so that the data is redistributed across different hash buckets. This recursive processing continues until the last spilled partition is completely memory resident. When a hash partition from the build set contains many duplicates, the HASH JOIN operator reverts to NESTED LOOP JOIN processing.
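The recursive spill handling can be illustrated schematically. In this sketch, a row-count budget stands in for available memory, in-memory lists stand in for spilled partitions on disk, and the partition count and recursion limit are arbitrary; the fallback branch mirrors the revert to nested-loop processing described above:

```python
def partition(rows, key, seed, n_parts=4):
    """Redistribute rows into partitions with a seeded hash, so each
    recursive pass spreads the data across different buckets."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[hash((seed, row[key])) % n_parts].append(row)
    return parts

def recursive_hash_join(build, probe, key, seed=0, max_rows=1000, max_depth=8):
    # Base case: the build side fits in the memory budget -> plain hash join.
    if len(build) <= max_rows:
        table = {}
        for b in build:
            table.setdefault(b[key], []).append(b)
        return [{**b, **p} for p in probe for b in table.get(p[key], [])]
    # A partition dominated by duplicate keys never shrinks, no matter how
    # often it is rehashed; fall back to nested-loop processing.
    if seed >= max_depth:
        return [{**b, **p} for b in build for p in probe if b[key] == p[key]]
    # Recursive case: repartition both inputs with a new hash seed and
    # join the matching build/probe partitions pairwise.
    out = []
    for b_part, p_part in zip(partition(build, key, seed),
                              partition(probe, key, seed)):
        out.extend(recursive_hash_join(b_part, p_part, key,
                                       seed + 1, max_rows, max_depth))
    return out
```

Because both inputs are partitioned with the same key and seed, matching rows always land in the same partition pair, so each pair can be joined independently.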
Generally, the HASH JOIN strategy is good in cases where most of the rows from the source sets must be processed and there are no inherently useful orderings on the join keys, or no interesting orderings that can be promoted to calling operators (for example, an order by clause on the join key). HASH JOINs perform particularly well if one of the data sets is small enough to be memory resident; in this case, no spilling occurs and no I/O is needed to perform the HASH JOIN.
    select ta.title_id
    from titleauthor ta, authors a
    where a.au_id = ta.au_id
    and au_lname = "Bloom"

    QUERY PLAN FOR STATEMENT 1 (at line 2).

    3 operator(s) under root

    The type of query is SELECT.

    ROOT:EMIT Operator

    |HASH JOIN Operator (Join Type: Inner Join)
    | Using Worktable1 for internal storage.
    |
    |   |SCAN Operator
    |   |  FROM TABLE
    |   |  authors
    |   |  a
    |   |  Index : aunmind
    |   |  Forward Scan.
    |   |  Positioning by key.
    |   |  Keys are:
    |   |    au_lname ASC
    |   |  Using I/O Size 2 Kbytes for index leaf pages.
    |   |  With LRU Buffer Replacement Strategy for index leaf pages.
    |   |  Using I/O Size 2 Kbytes for data pages.
    |   |  With LRU Buffer Replacement Strategy for data pages.
    |
    |   |SCAN Operator
    |   |  FROM TABLE
    |   |  titleauthor
    |   |  ta
    |   |  Index : auidind
    |   |  Forward Scan.
    |   |  Positioning at index start.
    |   |  Using I/O Size 2 Kbytes for index leaf pages.
    |   |  With LRU Buffer Replacement Strategy for index leaf pages.
    |   |  Using I/O Size 2 Kbytes for data pages.
    |   |  With LRU Buffer Replacement Strategy for data pages.
In this example, the source of the build input stream is an index scan of authors.aunmind. Only rows with an au_lname value of “Bloom” are returned from this scan. These rows are then hashed on their au_id value and placed into their corresponding hash buckets. After the initial build phase is completed, the probe stream is opened and scanned. Each row from the source index, titleauthor.auidind, is hashed on the au_id column. The resulting hash value is used to determine which bucket in the build set should be searched for matching hash keys. Each row from the build set's hash bucket is compared to the probe row's key for equality. If the rows match, the titleauthor.title_id column is returned to the EMIT operator.
The HASH JOIN operator's showplan output contains a message indicating the worktable to be used for the spilled partition's backing store. The input row width is limited to 64 kilobytes.