What is the difference between a hash join and a merge join (Oracle RDBMS)?

Question

What are the performance gains/losses between hash joins and merge joins, specifically in Oracle RDBMS?

User · Accepted Answer

A "sort merge" join is performed by sorting the two data sets to be joined according to the join keys and then merging them together. The merge is very cheap, but the sort can be prohibitively expensive, especially if the sort spills to disk. The cost of the sort can be lowered if one of the data sets can be accessed in sorted order via an index, although accessing a high proportion of blocks of a table via an index scan can also be very expensive in comparison to a full table scan.

A hash join is performed by hashing one data set into memory based on the join columns, then reading the other one and probing the hash table for matches. The hash join is very low cost when the hash table can be held entirely in memory, with the total cost amounting to very little more than the cost of reading the data sets. The cost rises if the hash table has to be spilled to disk in a one-pass operation, and rises considerably for a multipass operation.

In pre-10g, outer joins from a large to a small table were problematic performance-wise, as the optimiser could not resolve the need to access the smaller table first for a hash join, but the larger table first for an outer join. Consequently hash joins were not available in this situation.

The cost of a hash join can be reduced by partitioning both tables on the join key(s). This allows the optimiser to infer that rows from a partition in one table will only find a match in a particular partition of the other table, and for tables having n partitions the hash join is executed as n independent hash joins. This has the following effects:

The size of each hash table is reduced, hence reducing the maximum amount of memory required and potentially removing the need for the operation to require temporary disk space.
For parallel query operations the amount of inter-process messaging is vastly reduced, reducing CPU usage and improving performance, as each hash join can be performed by one pair of PQ processes.
For non-parallel query operations the memory requirement is reduced by a factor of n, and the first rows are projected from the query earlier.

You should note that hash joins can only be used for equi-joins, but merge joins are more flexible.

In general, if you are joining large amounts of data in an equi-join then a hash join is going to be a better bet.

This topic is very well covered in the documentation: http://download.oracle.com/docs/cd/B28359_01/server.111/b28274/optimops.htm#i51523

12.1 docs: https://docs.oracle.com/database/121/TGSQL/tgsql_join.htm
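To make the mechanics concrete, here is a rough Python sketch of the join methods described in this answer. It is only an illustration, not how Oracle implements them: rows are assumed to be (key, value) tuples held in memory, the function names are made up for this example, and the partition-wise variant splits the inputs at join time, whereas in Oracle the tables would already be partitioned on the join key.

```python
from collections import defaultdict
from operator import itemgetter


def hash_join(build, probe):
    """Equi-join two lists of (key, value) rows by hashing the build side."""
    table = defaultdict(list)
    for key, value in build:  # build phase: hash one input into memory
        table[key].append(value)
    # probe phase: a single pass over the other input, one lookup per row
    return [(key, b, p) for key, p in probe for b in table.get(key, ())]


def sort_merge_join(left, right):
    """Equi-join two lists of (key, value) rows by sorting, then merging."""
    left = sorted(left, key=itemgetter(0))    # the potentially expensive step
    right = sorted(right, key=itemgetter(0))
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):   # the cheap merge step
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # find the run of equal keys on the right side
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # pair every left row with that key against the whole run
            while i < len(left) and left[i][0] == lk:
                out.extend((lk, left[i][1], right[k][1]) for k in range(j, j_end))
                i += 1
            j = j_end
    return out


def partition_wise_hash_join(build, probe, n):
    """Split both inputs into n partitions on the join key, then join pairwise."""
    build_parts = [[] for _ in range(n)]
    probe_parts = [[] for _ in range(n)]
    for row in build:
        build_parts[hash(row[0]) % n].append(row)
    for row in probe:
        probe_parts[hash(row[0]) % n].append(row)
    out = []
    for b_part, p_part in zip(build_parts, probe_parts):
        # n independent hash joins, each with a smaller hash table
        out.extend(hash_join(b_part, p_part))
    return out
```

All three functions return the same set of joined rows for an equi-join; they differ in where the work goes (building a hash table versus sorting both inputs) and in how much has to be held in memory at once.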

User · Answer

I just want to edit this for posterity that the tags for oracle weren't added when I answered this question. My response was more applicable to MS SQL.

Merge join is the best possible as it exploits the ordering, resulting in a single pass down the tables to do the join. If you have two tables (or covering indexes) that have their ordering the same, such as a primary key and an index of a table on that key, then a merge join would result if you performed that action.

Hash join is the next best, as it's usually done when one table has a small number (relatively) of items. It's effectively creating a temp table with hashes for each row, which is then searched continuously to create the join.

Worst case is nested loop, which is order (n * m), which means there is no ordering or size to exploit and the join is simply: for each row in table x, search table y for joins to do.
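As a rough illustration of the order (n * m) point, here is a minimal Python sketch of a nested loop join with no index or ordering to exploit. The function name and the (key, value) row shape are assumptions for the example, and the comparison counter is only there to show the cost.

```python
def nested_loop_join(outer, inner):
    """Equi-join by comparing every outer row with every inner row."""
    out, comparisons = [], 0
    for o_key, o_val in outer:        # for each row in table x ...
        for i_key, i_val in inner:    # ... search all of table y
            comparisons += 1
            if o_key == i_key:
                out.append((o_key, o_val, i_val))
    return out, comparisons
```

With n outer rows and m inner rows the counter always ends at n * m, regardless of how many rows actually match.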
