[sql] How to return only 1 row if multiple duplicate rows and still return rows that are not duplicates?

I have a temptable that looks like this:

RequestID   | CreatedDate          | HistoryStatus           
CF-0000001  | 8/26/2009 1:07:01 PM | For Review   
CF-0000001  | 8/26/2009 1:07:01 PM | Completed  
CF-0000112  | 8/26/2009 1:07:01 PM | For Review   
CF-0000113  | 8/26/2009 1:07:01 PM | For Review  
CF-0000114  | 8/26/2009 1:07:01 PM | Completed  
CF-0000115  | 8/26/2009 1:07:01 PM | Completed   

And how I'd like the table to look at the end is like this:

RequestID   | CreatedDate          | HistoryStatus           
CF-0000001  | 8/26/2009 1:07:01 PM | Completed  
CF-0000112  | 8/26/2009 1:07:01 PM | For Review  
CF-0000113  | 8/26/2009 1:07:01 PM | For Review  
CF-0000114  | 8/26/2009 1:07:01 PM | Completed  
CF-0000115  | 8/26/2009 1:07:01 PM | Completed

I.e. the duplicate CF-0000001 should be removed.

How can I return or should i say choose only ONE row if there are multiple duplicate rows and still return rows that are not duplicates?

This question is related to sql

The answer is


From the title I'm guessing you only need one result per unique row? If this is the case, take a look at the GROUP BY clause (or SELECT DISTINCT).


select * from temptable
where rnum --unique key
 in 

( 
 SELECT RNUM --unique key
  FROM temptable
 WHERE (  HistoryStatus
) IN (SELECT                HistoryStatus

                             FROM temptable
                            GROUP BY                
HistoryStatus 
                           HAVING COUNT(*) <= 1));

I have not tested this code. I have used similar code and it works. The syntax is in Oracle.


If this is a SQL question, and I understand what you are asking, (it's not entirely clear), just add distinct to the query

   Select Distinct * From TempTable

Try this if you want to display one of duplicate rows based on RequestID and CreatedDate and show the latest HistoryStatus.

with t as (select row_number()over(partition by RequestID,CreatedDate order by RequestID) as rnum,* from tbltmp)
Select RequestID,CreatedDate,HistoryStatus from t a where  rnum in (SELECT Max(rnum) FROM t GROUP BY RequestID,CreatedDate having t.RequestID=a.RequestID)

or if you want to select one of duplicate rows considering CreatedDate only and show the latest HistoryStatus then try the query below.

with t as (select row_number()over(partition by CreatedDate order by RequestID) as rnum,* from tbltmp)
Select RequestID,CreatedDate,HistoryStatus from t  where  rnum = (SELECT Max(rnum) FROM t)

Or if you want to select one of duplicate rows considering Request ID only and show the latest HistoryStatus then use the query below

with t as (select row_number()over(partition by RequestID order by RequestID) as rnum,* from tbltmp)
Select RequestID,CreatedDate,HistoryStatus from t a where  rnum in (SELECT Max(rnum) FROM t GROUP BY RequestID,CreatedDate having t.RequestID=a.RequestID)

All the above queries I have written in sql server 2005.


To fetch only one distinct record from duplicate column of two rows you can use "rowid" column which is maintained by oracle itself as Primary key,so first try

"select rowid,RequestID,CreatedDate,HistoryStatus  from temptable;"

and then you can fetch second row only by it's value of 'rowid' column by using in SELECT statement.


select t.*
from (
    select RequestID, max(CreatedDate) as MaxCreatedDate
    from table1
    group by RequestID
) tm
inner join table1 t on tm.RequestID = t.RequestID and tm.MaxCreatedDate = t.CreatedDate

If you have a one to many relationship in your query, duplicate rows may occurs on one side.

Suppose the following

TABLE TEAM
ID       TEAM_NAME
0        BULLS
1        LAKERS


TABLE PLAYER
ID       TEAM_ID     PLAYER_NAME
0        0           JORDAN
1        0           PIPPEN

And you execute a query like

SELECT 
    TEAM.TEAM_NAME, 
    PLAYER.PLAYER_NAME 
FROM TEAM
INNER JOIN PLAYER

You will get

TEAM_NAME   PLAYER_NAME
BULLS       JORDAN
BULLS       PIPPEN

So you will have duplicate TEAM NAME. Even using DISTINCT clause, your result set will contain duplicate TEAM NAME

So if you do not want duplicate TEAM_NAME in your query, do the following

SELECT ID, TEAM_NAME FROM TEAM

And for each team ID encountered executes

SELECT PLAYER_NAME FROM PLAYER WHERE TEAM_ID = <PUT_TEAM_ID_RIGHT_HERE>

So this way you will not get duplicates references on one side

regards,


using namespaces and subqueries You can do it:

declare @data table (RequestID varchar(20), CreatedDate datetime, HistoryStatus varchar(20))
insert into @data values ('CF-0000001','8/26/2009 1:07:01 PM','For Review');
insert into @data values ('CF-0000001','8/26/2009 1:07:01 PM','Completed');  
insert into @data values ('CF-0000112','8/26/2009 1:07:01 PM','For Review');   
insert into @data values ('CF-0000113','8/26/2009 1:07:01 PM','For Review');  
insert into @data values ('CF-0000114','8/26/2009 1:07:01 PM','Completed');  
insert into @data values ('CF-0000115','8/26/2009 1:07:01 PM','Completed');

select d1.RequestID,d1.CreatedDate,d1.HistoryStatus 
from @data d1 
where d1.HistoryStatus = 'Completed'
union all 
select d2.RequestID,d2.CreatedDate,d2.HistoryStatus 
from @data d2 
where d2.HistoryStatus = 'For Review' 
    and d2.RequestID not in (
        select RequestID 
        from @data 
        where HistoryStatus = 'Completed' 
            and CreatedDate = d2.CreatedDate
    )

Above query returns

CF-0000001, 2009-08-26 13:07:01.000,    Completed
CF-0000114, 2009-08-26 13:07:01.000,    Completed
CF-0000115, 2009-08-26 13:07:01.000,    Completed
CF-0000112, 2009-08-26 13:07:01.000,    For Review
CF-0000113, 2009-08-26 13:07:01.000,    For Review