A tour of Join
Join is a popular operation among Database Systems- A standard join concept is to combine multiple columns from one or more tables in relational database- ANSI-standard SQL specifies five types of JOIN: INNER, LEFT OUTER, RIGHT OUTER, FULL OUTER and CROSS- The join operation will produce a temporary table which containing the results meeting the conditions- In the followings, we will first introduce different join operation types- We assume that have two tables named R
and S
-
1- CROSS JOIN: CROSS JOIN can be seen as the Cartesian products of two tables- The sum of items is |R| * |S|
-
2- Equi-join & Natural Join: Natural Join is a special case of Equi-join- We can write as this via SQL sentences:
- We need to use several comparators (one in mostly cases) to make the choosing rules- Natural Join uses equality comparisons between `R` and `S`-
3- Outer Join: Outer Join includes left outer joins, right outer joins, and full outer joins (depending on which table’s rows are retained (left, right, or both))-
4- Self Join: A table join with itself-
Join Algorithms
We can divide join algorithms to three main types: loop join, sort merge join and hash join
1- Nested loop join: The most naive one with a O(n^2)
complexity (like a Bruce-force algorithm)-
2- Block-nested loop (BNL) & Hash Join: Load R
to memory as a hash table, and scan S
- Raw hash join
- Radix join
- Radix Partition: Translation look-aside buffers (TLBs) => multi-pass radix partitioning join
Foreach input tuple t do k = hash(t) p[k][pos[k]] = t pos[k] ++
- Radix Partition: Translation look-aside buffers (TLBs) => multi-pass radix partitioning join
- No-partition hash join: All worker threads populate a shared hash table with all tuples of R-
3- Sort merge join: SortR
andS
by a same order, and then use two-points algorithm to distinct and produce the results- - Raw sort merge join
- Multi-way method
- Multi-pass method
- Massive parallel sort merge join (MPSM)