Survey of DB Join Algorithm

A tour of Join

Join is a popular operation among Database Systems- A standard join concept is to combine multiple columns from one or more tables in relational database- ANSI-standard SQL specifies five types of JOIN: INNER, LEFT OUTER, RIGHT OUTER, FULL OUTER and CROSS- The join operation will produce a temporary table which containing the results meeting the conditions- In the followings, we will first introduce different join operation types- We assume that have two tables named R and S-

1- CROSS JOIN: CROSS JOIN can be seen as the Cartesian products of two tables- The sum of items is |R| * |S|-
2- Equi-join & Natural Join: Natural Join is a special case of Equi-join- We can write as this via SQL sentences:

1
2
3
SELECT *
FROM employee JOIN department
ON employee-DepartmentID = department-DepartmentID;

- We need to use several comparators (one in mostly cases) to make the choosing rules- Natural Join uses equality comparisons between `R` and `S`-

3- Outer Join: Outer Join includes left outer joins, right outer joins, and full outer joins (depending on which table’s rows are retained (left, right, or both))-
4- Self Join: A table join with itself-

Join Algorithms

We can divide join algorithms to three main types: loop join, sort merge join and hash join

1- Nested loop join: The most naive one with a O(n^2) complexity (like a Bruce-force algorithm)-

1
2
3
4
For each tuple r in R do
For each tuple s in S do
If r and s satisfy the join condition
Then output the tuple <r,s>

2- Block-nested loop (BNL) & Hash Join: Load R to memory as a hash table, and scan S

  • Raw hash join
  • Radix join
    • Radix Partition: Translation look-aside buffers (TLBs) => multi-pass radix partitioning join
      Foreach input tuple t do
        k = hash(t)
        p[k][pos[k]] = t
        pos[k] ++
      
  • No-partition hash join: All worker threads populate a shared hash table with all tuples of R-
    3- Sort merge join: Sort R and S by a same order, and then use two-points algorithm to distinct and produce the results-
  • Raw sort merge join
    • Multi-way method
    • Multi-pass method
  • Massive parallel sort merge join (MPSM)

Sort vs- Hash

Distributed Hash Join