[sql] SQL left join vs multiple tables on FROM line?

Most SQL dialects accept both the following queries:

SELECT a.foo, b.foo
FROM a, b
WHERE a.x = b.x

SELECT a.foo, b.foo
FROM a
LEFT JOIN b ON a.x = b.x

Now obviously when you need an outer join, the second syntax is required. But when doing an inner join why should I prefer the second syntax to the first (or vice versa)?

This question is related to sql syntax join

The answer is


The JOIN syntax keeps conditions near the table they apply to. This is especially useful when you join a large amount of tables.

By the way, you can do an outer join with the first syntax too:

WHERE a.x = b.x(+)

Or

WHERE a.x *= b.x

Or

WHERE a.x = b.x or a.x not in (select x from b)

To the database, they end up being the same. For you, though, you'll have to use that second syntax in some situations. For the sake of editing queries that end up having to use it (finding out you needed a left join where you had a straight join), and for consistency, I'd pattern only on the 2nd method. It'll make reading queries easier.


When you need an outer join the second syntax is not always required:

Oracle:

SELECT a.foo, b.foo
  FROM a, b
 WHERE a.x = b.x(+)

MSSQLServer (although it's been deprecated in 2000 version)/Sybase:

SELECT a.foo, b.foo
  FROM a, b
 WHERE a.x *= b.x

But returning to your question. I don't know the answer, but it is probably related to the fact that a join is more natural (syntactically, at least) than adding an expression to a where clause when you are doing exactly that: joining.


The first way is the older standard. The second method was introduced in SQL-92, http://en.wikipedia.org/wiki/SQL. The complete standard can be viewed at http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt .

It took many years before database companies adopted the SQL-92 standard.

So the reason why the second method is preferred, it is the SQL standard according the ANSI and ISO standards committee.


The second is preferred because it is far less likely to result in an accidental cross join by forgetting to put inthe where clause. A join with no on clause will fail the syntax check, an old style join with no where clause will not fail, it will do a cross join.

Additionally when you later have to a left join, it is helpful for maintenance that they all be in the same structure. And the old syntax has been out of date since 1992, it is well past time to stop using it.

Plus I have found that many people who exclusively use the first syntax don't really understand joins and understanding joins is critical to getting correct results when querying.


The SELECT * FROM table1, table2, ... syntax is ok for a couple of tables, but it becomes exponentially (not necessarily a mathematically accurate statement) harder and harder to read as the number of tables increases.

The JOIN syntax is harder to write (at the beginning), but it makes it explicit what criteria affects which tables. This makes it much harder to make a mistake.

Also, if all the joins are INNER, then both versions are equivalent. However, the moment you have an OUTER join anywhere in the statement, things get much more complicated and it's virtually guarantee that what you write won't be querying what you think you wrote.


I hear a lot of people complain the first one is too difficult to understand and that it is unclear. I don't see a problem with it, but after having that discussion, I use the second one even on INNER JOINS for clarity.


I think there are some good reasons on this page to adopt the second method -using explicit JOINs. The clincher though is that when the JOIN criteria are removed from the WHERE clause it becomes much easier to see the remaining selection criteria in the WHERE clause.

In really complex SELECT statements it becomes much easier for a reader to understand what is going on.


Well the first and second queries may yield different results because a LEFT JOIN includes all records from the first table, even if there are no corresponding records in the right table.


Basically, when your FROM clause lists tables like so:

SELECT * FROM
  tableA, tableB, tableC

the result is a cross product of all the rows in tables A, B, C. Then you apply the restriction WHERE tableA.id = tableB.a_id which will throw away a huge number of rows, then further ... AND tableB.id = tableC.b_id and you should then get only those rows you are really interested in.

DBMSs know how to optimise this SQL so that the performance difference to writing this using JOINs is negligible (if any). Using the JOIN notation makes the SQL statement more readable (IMHO, not using joins turns the statement into a mess). Using the cross product, you need to provide join criteria in the WHERE clause, and that's the problem with the notation. You are crowding your WHERE clause with stuff like

    tableA.id = tableB.a_id 
AND tableB.id = tableC.b_id 

which is only used to restrict the cross product. WHERE clause should only contain RESTRICTIONS to the resultset. If you mix table join criteria with resultset restrictions, you (and others) will find your query harder to read. You should definitely use JOINs and keep the FROM clause a FROM clause, and the WHERE clause a WHERE clause.


Examples related to sql

Passing multiple values for same variable in stored procedure SQL permissions for roles Generic XSLT Search and Replace template Access And/Or exclusions Pyspark: Filter dataframe based on multiple conditions Subtracting 1 day from a timestamp date PYODBC--Data source name not found and no default driver specified select rows in sql with latest date for each ID repeated multiple times ALTER TABLE DROP COLUMN failed because one or more objects access this column Create Local SQL Server database

Examples related to syntax

What is the 'open' keyword in Swift? Check if returned value is not null and if so assign it, in one line, with one method call Inline for loop What does %>% function mean in R? R - " missing value where TRUE/FALSE needed " Printing variables in Python 3.4 How to replace multiple patterns at once with sed? What's the meaning of "=>" (an arrow formed from equals & greater than) in JavaScript? How can I fix MySQL error #1064? What do >> and << mean in Python?

Examples related to join

Pandas Merging 101 pandas: merge (join) two data frames on multiple columns How to use the COLLATE in a JOIN in SQL Server? How to join multiple collections with $lookup in mongodb How to join on multiple columns in Pyspark? Pandas join issue: columns overlap but no suffix specified MySQL select rows where left join is null How to return rows from left table not found in right table? Why do multiple-table joins produce duplicate rows? pandas three-way joining multiple dataframes on columns