[sql] Left Join without duplicate rows from left table

Please look at the following query:

tbl_Contents

Content_Id  Content_Title    Content_Text
10002   New case Study   New case Study
10003   New case Study   New case Study
10004   New case Study   New case Study
10005   New case Study   New case Study
10006   New case Study   New case Study
10007   New case Study   New case Study
10008   New case Study   New case Study
10009   New case Study   New case Study
10010   SEO News Title   SEO News Text
10011   SEO News Title   SEO News Text
10012   Publish Contents SEO News Text

tbl_Media

Media_Id    Media_Title  Content_Id
1000    New case Study   10012
1001    SEO News Title   10010
1002    SEO News Title   10011
1003    Publish Contents 10012

QUERY

SELECT 
C.Content_ID,
C.Content_Title,
M.Media_Id

FROM tbl_Contents C
LEFT JOIN tbl_Media M ON M.Content_Id = C.Content_Id 
ORDER BY C.Content_DatePublished ASC

RESULT

10002   New case Study  2014-03-31 13:39:29.280 NULL
10003   New case Study  2014-03-31 14:23:06.727 NULL
10004   New case Study  2014-03-31 14:25:53.143 NULL
10005   New case Study  2014-03-31 14:26:06.993 NULL
10006   New case Study  2014-03-31 14:30:18.153 NULL
10007   New case Study  2014-03-31 14:30:42.513 NULL
10008   New case Study  2014-03-31 14:31:56.830 NULL
10009   New case Study  2014-03-31 14:35:18.040 NULL
10010   SEO News Title  2014-03-31 15:22:15.983 1001
10011   SEO News Title  2014-03-31 15:22:30.333 1002
10012   Publish         2014-03-31 15:25:11.753 1000
10012   Publish         2014-03-31 15:25:11.753 1003

10012 are coming twice...!

My query is returning duplicate rows from tbl_Contents (left table in the join)

Some rows in tbl_Contents has more than 1 associated rows in tbl_Media. I need all rows from tbl_Contents even if there are Null values exists in the tbl_Media BUT NO DUPLICATE RECORDS.

This question is related to sql join duplicates

The answer is


You can do this using generic SQL with group by:

SELECT C.Content_ID, C.Content_Title, MAX(M.Media_Id)
FROM tbl_Contents C LEFT JOIN
     tbl_Media M
     ON M.Content_Id = C.Content_Id 
GROUP BY C.Content_ID, C.Content_Title
ORDER BY MAX(C.Content_DatePublished) ASC;

Or with a correlated subquery:

SELECT C.Content_ID, C.Contt_Title,
       (SELECT M.Media_Id
        FROM tbl_Media M
        WHERE M.Content_Id = C.Content_Id
        ORDER BY M.MEDIA_ID DESC
        LIMIT 1
       ) as Media_Id
FROM tbl_Contents C 
ORDER BY C.Content_DatePublished ASC;

Of course, the syntax for limit 1 varies between databases. Could be top. Or rownum = 1. Or fetch first 1 rows. Or something like that.


Using the DISTINCT flag will remove duplicate rows.

SELECT DISTINCT
C.Content_ID,
C.Content_Title,
M.Media_Id

FROM tbl_Contents C
LEFT JOIN tbl_Media M ON M.Content_Id = C.Content_Id 
ORDER BY C.Content_DatePublished ASC

Examples related to sql

Passing multiple values for same variable in stored procedure SQL permissions for roles Generic XSLT Search and Replace template Access And/Or exclusions Pyspark: Filter dataframe based on multiple conditions Subtracting 1 day from a timestamp date PYODBC--Data source name not found and no default driver specified select rows in sql with latest date for each ID repeated multiple times ALTER TABLE DROP COLUMN failed because one or more objects access this column Create Local SQL Server database

Examples related to join

Pandas Merging 101 pandas: merge (join) two data frames on multiple columns How to use the COLLATE in a JOIN in SQL Server? How to join multiple collections with $lookup in mongodb How to join on multiple columns in Pyspark? Pandas join issue: columns overlap but no suffix specified MySQL select rows where left join is null How to return rows from left table not found in right table? Why do multiple-table joins produce duplicate rows? pandas three-way joining multiple dataframes on columns

Examples related to duplicates

Remove duplicates from dataframe, based on two columns A,B, keeping row with max value in another column C Remove duplicates from a dataframe in PySpark How to "select distinct" across multiple data frame columns in pandas? How to find duplicate records in PostgreSQL Drop all duplicate rows across multiple columns in Python Pandas Left Join without duplicate rows from left table Finding duplicate integers in an array and display how many times they occurred How do I use SELECT GROUP BY in DataTable.Select(Expression)? How to delete duplicate rows in SQL Server? Python copy files to a new directory and rename if file name already exists