[sql] Difference between a theta join, equijoin and natural join

I'm having trouble understanding relational algebra when it comes to theta joins, equijoins and natural joins. Could someone please help me better understand it? If I use the = sign on a theta join is it exactly the same as just using a natural join?

This question is related to sql database relational-database relational-algebra

The answer is


@outis's answer is good: concise and correct as regards relations.

However, the situation is slightly more complicated as regards SQL.

Consider the usual suppliers and parts database but implemented in SQL:

SELECT * FROM S NATURAL JOIN SP;

would return a resultset** with columns

SNO, SNAME, STATUS, CITY, PNO, QTY

The join is performed on the column with the same name in both tables, SNO. Note that the resultset has six columns and only contains one column for SNO.

Now consider a theta eqijoin, where the column names for the join must be explicitly specified (plus range variables S and SP are required):

SELECT * FROM S JOIN SP ON S.SNO = SP.SNO;

The resultset will have seven columns, including two columns for SNO. The names of the resultset are what the SQL Standard refers to as "implementation dependent" but could look like this:

SNO, SNAME, STATUS, CITY, SNO, PNO, QTY

or perhaps this

S.SNO, SNAME, STATUS, CITY, SP.SNO, PNO, QTY

In other words, NATURAL JOIN in SQL can be considered to remove columns with duplicated names from the resultset (but alas will not remove duplicate rows - you must remember to change SELECT to SELECT DISTINCT yourself).


** I don't quite know what the result of SELECT * FROM table_expression; is. I know it is not a relation because, among other reasons, it can have columns with duplicate names or a column with no name. I know it is not a set because, among other reasons, the column order is significant. It's not even a SQL table or SQL table expression. I call it a resultset.


Theta Join: When you make a query for join using any operator,(e.g., =, <, >, >= etc.), then that join query comes under Theta join.

Equi Join: When you make a query for join using equality operator only, then that join query comes under Equi join.

Example:

> SELECT * FROM Emp JOIN Dept ON Emp.DeptID = Dept.DeptID;
> SELECT * FROM Emp INNER JOIN Dept USING(DeptID)
This will show:
 _________________________________________________
| Emp.Name | Emp.DeptID | Dept.Name | Dept.DeptID |
|          |            |           |             |

Note: Equi join is also a theta join!

Natural Join: a type of Equi Join which occurs implicitly by comparing all the same names columns in both tables.

Note: here, the join result has only one column for each pair of same named columns.

Example

 SELECT * FROM Emp NATURAL JOIN Dept
This will show:
 _______________________________
| DeptID | Emp.Name | Dept.Name |
|        |          |           |

A theta join allows for arbitrary comparison relationships (such as ≥).

An equijoin is a theta join using the equality operator.

A natural join is an equijoin on attributes that have the same name in each relationship.

Additionally, a natural join removes the duplicate columns involved in the equality comparison so only 1 of each compared column remains; in rough relational algebraic terms: ? = pR,S-as ? ?aR=aS


While the answers explaining the exact differences are fine, I want to show how the relational algebra is transformed to SQL and what the actual value of the 3 concepts is.

The key concept in your question is the idea of a join. To understand a join you need to understand a Cartesian Product (the example is based on SQL where the equivalent is called a cross join as onedaywhen points out);

This isn't very useful in practice. Consider this example.

Product(PName, Price)
====================
Laptop,   1500
Car,      20000
Airplane, 3000000


Component(PName, CName, Cost)
=============================
Laptop, CPU,    500
Laptop, hdd,    300
Laptop, case,   700
Car,    wheels, 1000

The Cartesian product Product x Component will be - bellow or sql fiddle. You can see there are 12 rows = 3 x 4. Obviously, rows like "Laptop" with "wheels" have no meaning, this is why in practice the Cartesian product is rarely used.

|    PNAME |   PRICE |  CNAME | COST |
--------------------------------------
|   Laptop |    1500 |    CPU |  500 |
|   Laptop |    1500 |    hdd |  300 |
|   Laptop |    1500 |   case |  700 |
|   Laptop |    1500 | wheels | 1000 |
|      Car |   20000 |    CPU |  500 |
|      Car |   20000 |    hdd |  300 |
|      Car |   20000 |   case |  700 |
|      Car |   20000 | wheels | 1000 |
| Airplane | 3000000 |    CPU |  500 |
| Airplane | 3000000 |    hdd |  300 |
| Airplane | 3000000 |   case |  700 |
| Airplane | 3000000 | wheels | 1000 |

JOINs are here to add more value to these products. What we really want is to "join" the product with its associated components, because each component belongs to a product. The way to do this is with a join:

Product JOIN Component ON Pname

The associated SQL query would be like this (you can play with all the examples here)

SELECT *
FROM Product
JOIN Component
  ON Product.Pname = Component.Pname

and the result:

|  PNAME | PRICE |  CNAME | COST |
----------------------------------
| Laptop |  1500 |    CPU |  500 |
| Laptop |  1500 |    hdd |  300 |
| Laptop |  1500 |   case |  700 |
|    Car | 20000 | wheels | 1000 |

Notice that the result has only 4 rows, because the Laptop has 3 components, the Car has 1 and the Airplane none. This is much more useful.

Getting back to your questions, all the joins you ask about are variations of the JOIN I just showed:

Natural Join = the join (the ON clause) is made on all columns with the same name; it removes duplicate columns from the result, as opposed to all other joins; most DBMS (database systems created by various vendors such as Microsoft's SQL Server, Oracle's MySQL etc. ) don't even bother supporting this, it is just bad practice (or purposely chose not to implement it). Imagine that a developer comes and changes the name of the second column in Product from Price to Cost. Then all the natural joins would be done on PName AND on Cost, resulting in 0 rows since no numbers match.

Theta Join = this is the general join everybody uses because it allows you to specify the condition (the ON clause in SQL). You can join on pretty much any condition you like, for example on Products that have the first 2 letters similar, or that have a different price. In practice, this is rarely the case - in 95% of the cases you will join on an equality condition, which leads us to:

Equi Join = the most common one used in practice. The example above is an equi join. Databases are optimized for this type of joins! The oposite of an equi join is a non-equi join, i.e. when you join on a condition other than "=". Databases are not optimized for this! Both of them are subsets of the general theta join. The natural join is also a theta join but the condition (the theta) is implicit.

Source of information: university + certified SQL Server developer + recently completed the MOO "Introduction to databases" from Stanford so I dare say I have relational algebra fresh in mind.


Cartesian product of two tables gives all the possible combinations of tuples like the example in mathematics the cross product of two sets . since many a times there are some junk values which occupy unnecessary space in the memory too so here joins comes to rescue which give the combination of only those attribute values which are required and are meaningful.

inner join gives the repeated field in the table twice whereas natural join here solves the problem by just filtering the repeated columns and displaying it only once.else, both works the same. natural join is more efficient since it preserves the memory .Also , redundancies are removed in natural join .

equi join of two tables are such that they display only those tuples which matches the value in other table . for example : let new1 and new2 be two tables . if sql query select * from new1 join new2 on new1.id = new.id (id is the same column in two tables) then start from new2 table and join which matches the id in second table . besides , non equi join do not have equality operator they have <,>,and between operator .

theta join consists of all the comparison operator including equality and others < , > comparison operator. when it uses equality(=) operator it is known as equi join .


Natural Join: Natural join can be possible when there is at least one common attribute in two relations.

Theta Join: Theta join can be possible when two act on particular condition.

Equi Join: Equi can be possible when two act on equity condition. It is one type of theta join.


Natural is a subset of Equi which is a subset of Theta.

If I use the = sign on a theta join is it exactly the same as just using a natural join???

Not necessarily, but it would be an Equi. Natural means you are matching on all similarly named columns, Equi just means you are using '=' exclusively (and not 'less than', like, etc)

This is pure academia though, you could work with relational databases for years and never hear anyone use these terms.


Examples related to sql

Passing multiple values for same variable in stored procedure SQL permissions for roles Generic XSLT Search and Replace template Access And/Or exclusions Pyspark: Filter dataframe based on multiple conditions Subtracting 1 day from a timestamp date PYODBC--Data source name not found and no default driver specified select rows in sql with latest date for each ID repeated multiple times ALTER TABLE DROP COLUMN failed because one or more objects access this column Create Local SQL Server database

Examples related to database

Implement specialization in ER diagram phpMyAdmin - Error > Incorrect format parameter? Authentication plugin 'caching_sha2_password' cannot be loaded Room - Schema export directory is not provided to the annotation processor so we cannot export the schema SQL Query Where Date = Today Minus 7 Days MySQL Error: : 'Access denied for user 'root'@'localhost' SQL Server date format yyyymmdd How to create a foreign key in phpmyadmin WooCommerce: Finding the products in database TypeError: tuple indices must be integers, not str

Examples related to relational-database

Laravel - Eloquent "Has", "With", "WhereHas" - What do they mean? What is the difference between a candidate key and a primary key? Does the join order matter in SQL? Difference between 3NF and BCNF in simple terms (must be able to explain to an 8-year old) How to perform a LEFT JOIN in SQL Server between two SELECT statements? Difference between a theta join, equijoin and natural join Foreign Key to multiple tables What is the difference between a Relational and Non-Relational Database? Difference between one-to-many and many-to-one relationship NoSql vs Relational database

Examples related to relational-algebra

Difference between a theta join, equijoin and natural join What is the difference between Select and Project Operations What are projection and selection?