[mysql] SQL query return data from multiple tables

I would like to know the following:

  • how to get data from multiple tables in my database?
  • what types of methods are there to do this?
  • what are joins and unions and how are they different from one another?
  • When should I use each one compared to the others?

I am planning to use this in my (for example - PHP) application, but don't want to run multiple queries against the database, what options do I have to get data from multiple tables in a single query?

Note: I am writing this as I would like to be able to link to a well written guide on the numerous questions that I constantly come across in the PHP queue, so I can link to this for further detail when I post an answer.

The answers cover off the following:

  1. Part 1 - Joins and Unions
  2. Part 2 - Subqueries
  3. Part 3 - Tricks and Efficient Code
  4. Part 4 - Subqueries in the From Clause
  5. Part 5 - Mixed Bag of John's Tricks

This question is related to mysql sql select

The answer is


Ok, I found this post very interesting and I would like to share some of my knowledge on creating a query. Thanks for this Fluffeh. Others who may read this and may feel that I'm wrong are 101% free to edit and criticise my answer. (Honestly, I feel very thankful for correcting my mistake(s).)

I'll be posting some of the frequently asked questions in MySQL tag.


Trick No. 1 (rows that matches to multiple conditions)

Given this schema

CREATE TABLE MovieList
(
    ID INT,
    MovieName VARCHAR(25),
    CONSTRAINT ml_pk PRIMARY KEY (ID),
    CONSTRAINT ml_uq UNIQUE (MovieName)
);

INSERT INTO MovieList VALUES (1, 'American Pie');
INSERT INTO MovieList VALUES (2, 'The Notebook');
INSERT INTO MovieList VALUES (3, 'Discovery Channel: Africa');
INSERT INTO MovieList VALUES (4, 'Mr. Bean');
INSERT INTO MovieList VALUES (5, 'Expendables 2');

CREATE TABLE CategoryList
(
    MovieID INT,
    CategoryName VARCHAR(25),
    CONSTRAINT cl_uq UNIQUE(MovieID, CategoryName),
    CONSTRAINT cl_fk FOREIGN KEY (MovieID) REFERENCES MovieList(ID)
);

INSERT INTO CategoryList VALUES (1, 'Comedy');
INSERT INTO CategoryList VALUES (1, 'Romance');
INSERT INTO CategoryList VALUES (2, 'Romance');
INSERT INTO CategoryList VALUES (2, 'Drama');
INSERT INTO CategoryList VALUES (3, 'Documentary');
INSERT INTO CategoryList VALUES (4, 'Comedy');
INSERT INTO CategoryList VALUES (5, 'Comedy');
INSERT INTO CategoryList VALUES (5, 'Action');

QUESTION

Find all movies that belong to at least both Comedy and Romance categories.

Solution

This question can be very tricky sometimes. It may seem that a query like this will be the answer:-

SELECT  DISTINCT a.MovieName
FROM    MovieList a
        INNER JOIN CategoryList b
            ON a.ID = b.MovieID
WHERE   b.CategoryName = 'Comedy' AND
        b.CategoryName = 'Romance'

SQLFiddle Demo

which is definitely very wrong because it produces no result. The explanation of this is that there is only one valid value of CategoryName on each row. For instance, the first condition returns true, the second condition is always false. Thus, by using AND operator, both condition should be true; otherwise, it will be false. Another query is like this,

SELECT  DISTINCT a.MovieName
FROM    MovieList a
        INNER JOIN CategoryList b
            ON a.ID = b.MovieID
WHERE   b.CategoryName IN ('Comedy','Romance')

SQLFiddle Demo

and the result is still incorrect because it matches to record that has at least one match on the categoryName. The real solution would be by counting the number of record instances per movie. The number of instance should match to the total number of the values supplied in the condition.

SELECT  a.MovieName
FROM    MovieList a
        INNER JOIN CategoryList b
            ON a.ID = b.MovieID
WHERE   b.CategoryName IN ('Comedy','Romance')
GROUP BY a.MovieName
HAVING COUNT(*) = 2

SQLFiddle Demo (the answer)


Trick No. 2 (maximum record for each entry)

Given schema,

CREATE TABLE Software
(
    ID INT,
    SoftwareName VARCHAR(25),
    Descriptions VARCHAR(150),
    CONSTRAINT sw_pk PRIMARY KEY (ID),
    CONSTRAINT sw_uq UNIQUE (SoftwareName)  
);

INSERT INTO Software VALUES (1,'PaintMe','used for photo editing');
INSERT INTO Software VALUES (2,'World Map','contains map of different places of the world');
INSERT INTO Software VALUES (3,'Dictionary','contains description, synonym, antonym of the words');

CREATE TABLE VersionList
(
    SoftwareID INT,
    VersionNo INT,
    DateReleased DATE,
    CONSTRAINT sw_uq UNIQUE (SoftwareID, VersionNo),
    CONSTRAINT sw_fk FOREIGN KEY (SOftwareID) REFERENCES Software(ID)
);

INSERT INTO VersionList VALUES (3, 2, '2009-12-01');
INSERT INTO VersionList VALUES (3, 1, '2009-11-01');
INSERT INTO VersionList VALUES (3, 3, '2010-01-01');
INSERT INTO VersionList VALUES (2, 2, '2010-12-01');
INSERT INTO VersionList VALUES (2, 1, '2009-12-01');
INSERT INTO VersionList VALUES (1, 3, '2011-12-01');
INSERT INTO VersionList VALUES (1, 2, '2010-12-01');
INSERT INTO VersionList VALUES (1, 1, '2009-12-01');
INSERT INTO VersionList VALUES (1, 4, '2012-12-01');

QUESTION

Find the latest version on each software. Display the following columns: SoftwareName,Descriptions,LatestVersion (from VersionNo column),DateReleased

Solution

Some SQL developers mistakenly use MAX() aggregate function. They tend to create like this,

SELECT  a.SoftwareName, a.Descriptions,
        MAX(b.VersionNo) AS LatestVersion, b.DateReleased
FROM    Software a
        INNER JOIN VersionList b
            ON a.ID = b.SoftwareID
GROUP BY a.ID
ORDER BY a.ID

SQLFiddle Demo

(most RDBMS generates a syntax error on this because of not specifying some of the non-aggregated columns on the group by clause) the result produces the correct LatestVersion on each software but obviously the DateReleased are incorrect. MySQL doesn't support Window Functions and Common Table Expression yet as some RDBMS do already. The workaround on this problem is to create a subquery which gets the individual maximum versionNo on each software and later on be joined on the other tables.

SELECT  a.SoftwareName, a.Descriptions,
        b.LatestVersion, c.DateReleased
FROM    Software a
        INNER JOIN
        (
            SELECT  SoftwareID, MAX(VersionNO) LatestVersion
            FROM    VersionList
            GROUP BY SoftwareID
        ) b ON a.ID = b.SoftwareID
        INNER JOIN VersionList c
            ON  c.SoftwareID = b.SoftwareID AND
                c.VersionNO = b.LatestVersion
GROUP BY a.ID
ORDER BY a.ID

SQLFiddle Demo (the answer)


So that was it. I'll be posting another soon as I recall any other FAQ on MySQL tag. Thank you for reading this little article. I hope that you have atleast get even a little knowledge from this.

UPDATE 1


Trick No. 3 (Finding the latest record between two IDs)

Given Schema

CREATE TABLE userList
(
    ID INT,
    NAME VARCHAR(20),
    CONSTRAINT us_pk PRIMARY KEY (ID),
    CONSTRAINT us_uq UNIQUE (NAME)  
);

INSERT INTO userList VALUES (1, 'Fluffeh');
INSERT INTO userList VALUES (2, 'John Woo');
INSERT INTO userList VALUES (3, 'hims056');

CREATE TABLE CONVERSATION
(
    ID INT,
    FROM_ID INT,
    TO_ID INT,
    MESSAGE VARCHAR(250),
    DeliveryDate DATE
);

INSERT INTO CONVERSATION VALUES (1, 1, 2, 'hi john', '2012-01-01');
INSERT INTO CONVERSATION VALUES (2, 2, 1, 'hello fluff', '2012-01-02');
INSERT INTO CONVERSATION VALUES (3, 1, 3, 'hey hims', '2012-01-03');
INSERT INTO CONVERSATION VALUES (4, 1, 3, 'please reply', '2012-01-04');
INSERT INTO CONVERSATION VALUES (5, 3, 1, 'how are you?', '2012-01-05');
INSERT INTO CONVERSATION VALUES (6, 3, 2, 'sample message!', '2012-01-05');

QUESTION

Find the latest conversation between two users.

Solution

SELECT    b.Name SenderName,
          c.Name RecipientName,
          a.Message,
          a.DeliveryDate
FROM      Conversation a
          INNER JOIN userList b
            ON a.From_ID = b.ID
          INNER JOIN userList c
            ON a.To_ID = c.ID
WHERE     (LEAST(a.FROM_ID, a.TO_ID), GREATEST(a.FROM_ID, a.TO_ID), DeliveryDate)
IN
(
    SELECT  LEAST(FROM_ID, TO_ID) minFROM,
            GREATEST(FROM_ID, TO_ID) maxTo,
            MAX(DeliveryDate) maxDate
    FROM    Conversation
    GROUP BY minFROM, maxTo
)

SQLFiddle Demo


Part 3 - Tricks and Efficient Code

MySQL in() efficiency

I thought I would add some extra bits, for tips and tricks that have come up.

One question I see come up a fair bit, is How do I get non-matching rows from two tables and I see the answer most commonly accepted as something like the following (based on our cars and brands table - which has Holden listed as a brand, but does not appear in the cars table):

select
    a.ID,
    a.brand
from
    brands a
where
    a.ID not in(select brand from cars)

And yes it will work.

+----+--------+
| ID | brand  |
+----+--------+
|  6 | Holden |
+----+--------+
1 row in set (0.00 sec)

However it is not efficient in some database. Here is a link to a Stack Overflow question asking about it, and here is an excellent in depth article if you want to get into the nitty gritty.

The short answer is, if the optimiser doesn't handle it efficiently, it may be much better to use a query like the following to get non matched rows:

select
    a.brand
from
    brands a
        left join cars b
            on a.id=b.brand
where
    b.brand is null

+--------+
| brand  |
+--------+
| Holden |
+--------+
1 row in set (0.00 sec)

Update Table with same table in subquery

Ahhh, another oldie but goodie - the old You can't specify target table 'brands' for update in FROM clause.

MySQL will not allow you to run an update... query with a subselect on the same table. Now, you might be thinking, why not just slap it into the where clause right? But what if you want to update only the row with the max() date amoung a bunch of other rows? You can't exactly do that in a where clause.

update 
    brands 
set 
    brand='Holden' 
where 
    id=
        (select 
            id 
        from 
            brands 
        where 
            id=6);
ERROR 1093 (HY000): You can't specify target table 'brands' 
for update in FROM clause

So, we can't do that eh? Well, not exactly. There is a sneaky workaround that a surprisingly large number of users don't know about - though it does include some hackery that you will need to pay attention to.

You can stick the subquery within another subquery, which puts enough of a gap between the two queries so that it will work. However, note that it might be safest to stick the query within a transaction - this will prevent any other changes being made to the tables while the query is running.

update 
    brands 
set 
    brand='Holden' 
where id=
    (select 
        id 
    from 
        (select 
            id 
        from 
            brands 
        where 
            id=6
        ) 
    as updateTable);

Query OK, 0 rows affected (0.02 sec)
Rows matched: 1  Changed: 0  Warnings: 0

Part 2 - Subqueries

Okay, now the boss has burst in again - I want a list of all of our cars with the brand and a total of how many of that brand we have!

This is a great opportunity to use the next trick in our bag of SQL goodies - the subquery. If you are unfamiliar with the term, a subquery is a query that runs inside another query. There are many different ways to use them.

For our request, lets first put a simple query together that will list each car and the brand:

select
    a.ID,
    b.brand
from
    cars a
        join brands b
            on a.brand=b.ID

Now, if we wanted to simply get a count of cars sorted by brand, we could of course write this:

select
    b.brand,
    count(a.ID) as countCars
from
    cars a
        join brands b
            on a.brand=b.ID
group by
    b.brand

+--------+-----------+
| brand  | countCars |
+--------+-----------+
| BMW    |         2 |
| Ford   |         2 |
| Nissan |         1 |
| Smart  |         1 |
| Toyota |         5 |
+--------+-----------+

So, we should be able to simply add in the count function to our original query right?

select
    a.ID,
    b.brand,
    count(a.ID) as countCars
from
    cars a
        join brands b
            on a.brand=b.ID
group by
    a.ID,
    b.brand

+----+--------+-----------+
| ID | brand  | countCars |
+----+--------+-----------+
|  1 | Toyota |         1 |
|  2 | Ford   |         1 |
|  3 | Nissan |         1 |
|  4 | Smart  |         1 |
|  5 | Toyota |         1 |
|  6 | BMW    |         1 |
|  7 | Ford   |         1 |
|  8 | Toyota |         1 |
|  9 | Toyota |         1 |
| 10 | BMW    |         1 |
| 11 | Toyota |         1 |
+----+--------+-----------+
11 rows in set (0.00 sec)

Sadly, no, we can't do that. The reason is that when we add in the car ID (column a.ID) we have to add it into the group by - so now, when the count function works, there is only one ID matched per ID.

This is where we can however use a subquery - in fact we can do two completely different types of subquery that will return the same results that we need for this. The first is to simply put the subquery in the select clause. This means each time we get a row of data, the subquery will run off, get a column of data and then pop it into our row of data.

select
    a.ID,
    b.brand,
    (
    select
        count(c.ID)
    from
        cars c
    where
        a.brand=c.brand
    ) as countCars
from
    cars a
        join brands b
            on a.brand=b.ID

+----+--------+-----------+
| ID | brand  | countCars |
+----+--------+-----------+
|  2 | Ford   |         2 |
|  7 | Ford   |         2 |
|  1 | Toyota |         5 |
|  5 | Toyota |         5 |
|  8 | Toyota |         5 |
|  9 | Toyota |         5 |
| 11 | Toyota |         5 |
|  3 | Nissan |         1 |
|  4 | Smart  |         1 |
|  6 | BMW    |         2 |
| 10 | BMW    |         2 |
+----+--------+-----------+
11 rows in set (0.00 sec)

And Bam!, this would do us. If you noticed though, this sub query will have to run for each and every single row of data we return. Even in this little example, we only have five different Brands of car, but the subquery ran eleven times as we have eleven rows of data that we are returning. So, in this case, it doesn't seem like the most efficient way to write code.

For a different approach, lets run a subquery and pretend it is a table:

select
    a.ID,
    b.brand,
    d.countCars
from
    cars a
        join brands b
            on a.brand=b.ID
        join
            (
            select
                c.brand,
                count(c.ID) as countCars
            from
                cars c
            group by
                c.brand
            ) d
            on a.brand=d.brand

+----+--------+-----------+
| ID | brand  | countCars |
+----+--------+-----------+
|  1 | Toyota |         5 |
|  2 | Ford   |         2 |
|  3 | Nissan |         1 |
|  4 | Smart  |         1 |
|  5 | Toyota |         5 |
|  6 | BMW    |         2 |
|  7 | Ford   |         2 |
|  8 | Toyota |         5 |
|  9 | Toyota |         5 |
| 10 | BMW    |         2 |
| 11 | Toyota |         5 |
+----+--------+-----------+
11 rows in set (0.00 sec)

Okay, so we have the same results (ordered slightly different - it seems the database wanted to return results ordered by the first column we picked this time) - but the same right numbers.

So, what's the difference between the two - and when should we use each type of subquery? First, lets make sure we understand how that second query works. We selected two tables in the from clause of our query, and then wrote a query and told the database that it was in fact a table instead - which the database is perfectly happy with. There can be some benefits to using this method (as well as some limitations). Foremost is that this subquery ran once. If our database contained a large volume of data, there could well be a massive improvement over the first method. However, as we are using this as a table, we have to bring in extra rows of data - so that they can actually be joined back to our rows of data. We also have to be sure that there are enough rows of data if we are going to use a simple join like in the query above. If you recall, the join will only pull back rows that have matching data on both sides of the join. If we aren't careful, this could result in valid data not being returned from our cars table if there wasn't a matching row in this subquery.

Now, looking back at the first subquery, there are some limitations as well. because we are pulling data back into a single row, we can ONLY pull back one row of data. Subqueries used in the select clause of a query very often use only an aggregate function such as sum, count, max or another similar aggregate function. They don't have to, but that is often how they are written.

So, before we move on, lets have a quick look at where else we can use a subquery. We can use it in the where clause - now, this example is a little contrived as in our database, there are better ways of getting the following data, but seeing as it is only for an example, lets have a look:

select
    ID,
    brand
from
    brands
where
    brand like '%o%'

+----+--------+
| ID | brand  |
+----+--------+
|  1 | Ford   |
|  2 | Toyota |
|  6 | Holden |
+----+--------+
3 rows in set (0.00 sec)

This returns us a list of brand IDs and Brand names (the second column is only added to show us the brands) that contain the letter o in the name.

Now, we could use the results of this query in a where clause this:

select
    a.ID,
    b.brand
from
    cars a
        join brands b
            on a.brand=b.ID
where
    a.brand in
        (
        select
            ID
        from
            brands
        where
            brand like '%o%'
        )

+----+--------+
| ID | brand  |
+----+--------+
|  2 | Ford   |
|  7 | Ford   |
|  1 | Toyota |
|  5 | Toyota |
|  8 | Toyota |
|  9 | Toyota |
| 11 | Toyota |
+----+--------+
7 rows in set (0.00 sec)

As you can see, even though the subquery was returning the three brand IDs, our cars table only had entries for two of them.

In this case, for further detail, the subquery is working as if we wrote the following code:

select
    a.ID,
    b.brand
from
    cars a
        join brands b
            on a.brand=b.ID
where
    a.brand in (1,2,6)

+----+--------+
| ID | brand  |
+----+--------+
|  1 | Toyota |
|  2 | Ford   |
|  5 | Toyota |
|  7 | Ford   |
|  8 | Toyota |
|  9 | Toyota |
| 11 | Toyota |
+----+--------+
7 rows in set (0.00 sec)

Again, you can see how a subquery vs manual inputs has changed the order of the rows when returning from the database.

While we are discussing subqueries, lets see what else we can do with a subquery:

  • You can place a subquery within another subquery, and so on and so on. There is a limit which depends on your database, but short of recursive functions of some insane and maniacal programmer, most folks will never hit that limit.
  • You can place a number of subqueries into a single query, a few in the select clause, some in the from clause and a couple more in the where clause - just remember that each one you put in is making your query more complex and likely to take longer to execute.

If you need to write some efficient code, it can be beneficial to write the query a number of ways and see (either by timing it or by using an explain plan) which is the optimal query to get your results. The first way that works may not always be the best way of doing it.


Hopes this makes it find the tables as you're reading through the thing:

jsfiddle

mysql> show columns from colors;                                                         
+-------+-------------+------+-----+---------+----------------+
| Field | Type        | Null | Key | Default | Extra          |
+-------+-------------+------+-----+---------+----------------+           
| id    | int(3)      | NO   | PRI | NULL    | auto_increment |
| color | varchar(15) | YES  |     | NULL    |                |
| paint | varchar(10) | YES  |     | NULL    |                |
+-------+-------------+------+-----+---------+----------------+

You can use the concept of multiple queries in the FROM keyword. Let me show you one example:

SELECT DISTINCT e.id,e.name,d.name,lap.lappy LAPTOP_MAKE,c_loc.cnty COUNTY    
FROM  (
          SELECT c.id cnty,l.name
          FROM   county c, location l
          WHERE  c.id=l.county_id AND l.end_Date IS NOT NULL
      ) c_loc, emp e 
      INNER JOIN dept d ON e.deptno =d.id
      LEFT JOIN 
      ( 
         SELECT l.id lappy, c.name cmpy
         FROM   laptop l, company c
         WHERE l.make = c.name
      ) lap ON e.cmpy_id=lap.cmpy

You can use as many tables as you want to. Use outer joins and union where ever it's necessary, even inside table subqueries.

That's a very easy method to involve as many as tables and fields.


Examples related to mysql

Implement specialization in ER diagram How to post query parameters with Axios? PHP with MySQL 8.0+ error: The server requested authentication method unknown to the client Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver' phpMyAdmin - Error > Incorrect format parameter? Authentication plugin 'caching_sha2_password' is not supported How to resolve Unable to load authentication plugin 'caching_sha2_password' issue Connection Java-MySql : Public Key Retrieval is not allowed How to grant all privileges to root user in MySQL 8.0 MySQL 8.0 - Client does not support authentication protocol requested by server; consider upgrading MySQL client

Examples related to sql

Passing multiple values for same variable in stored procedure SQL permissions for roles Generic XSLT Search and Replace template Access And/Or exclusions Pyspark: Filter dataframe based on multiple conditions Subtracting 1 day from a timestamp date PYODBC--Data source name not found and no default driver specified select rows in sql with latest date for each ID repeated multiple times ALTER TABLE DROP COLUMN failed because one or more objects access this column Create Local SQL Server database

Examples related to select

Warning: Use the 'defaultValue' or 'value' props on <select> instead of setting 'selected' on <option> SQL query to check if a name begins and ends with a vowel Angular2 *ngFor in select list, set active based on string from object SQL: Two select statements in one query How to get selected value of a dropdown menu in ReactJS DATEDIFF function in Oracle How to filter an array of objects based on values in an inner array with jq? Select unique values with 'select' function in 'dplyr' library how to set select element as readonly ('disabled' doesnt pass select value on server) Trying to use INNER JOIN and GROUP BY SQL with SUM Function, Not Working