SQL Left Join first match only

Question

I have a query against a large number of big tables (rows and columns) with a number of joins; however, one of the tables has some duplicate rows of data, causing issues for my query. Since this is a read-only realtime feed from another department, I can't fix that data; however, I am trying to prevent issues in my query from it.

Given that, I need to add this crap data as a left join to my good query. The data set looks like:

    IDNo    FirstName   LastName    ...
    -------------------------------------------
    uqx     bob         smith
    abc     john        willis
    ABC     john        willis
    aBc     john        willis
    WTF     jeff        bridges
    sss     bill        doe
    ere     sally       abby
    wtf     jeff        bridges
    ...

(about 2 dozen columns, and 100K rows)

My first instinct was to perform a distinct, which gave me about 80K rows:

    SELECT DISTINCT P.IDNo
    FROM people P

But when I try the following, I get all the rows back:

    SELECT DISTINCT P.*
    FROM people P

OR

    SELECT
        DISTINCT(P.IDNo) AS IDNoUnq
        ,P.FirstName
        ,P.LastName
        ...etc.
    FROM people P

I then thought I would do a FIRST() aggregate function on all the columns; however, that feels wrong too. Syntactically, am I doing something wrong here?

Update: Just wanted to note: these records are duplicates based on a non-key / non-indexed field of ID listed above. The ID is a text field which, although it has the same value, is in a different case than the other data, causing the issue.

User · Answer

distinct is not a function. It always operates on all columns of the select list.

Your problem is a typical "greatest N per group" problem which can easily be solved using a window function:

    select ...
    from (
      select IDNo,
             FirstName,
             LastName,
             ....,
             row_number() over (partition by lower(idno) order by firstname) as rn
      from people
    ) t
    where rn = 1;

Using the order by clause you can select which of the duplicates you want to pick.

The above can be used in a left join, see below:

    select ...
    from x
      left join (
        select IDNo,
               FirstName,
               LastName,
               ....,
               row_number() over (partition by lower(idno) order by firstname) as rn
        from people
      ) p on p.idno = x.idno and p.rn = 1
    where ...
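As a rough, runnable sketch of the left-join variant above, the snippet below uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption; it needs SQLite 3.25 or newer for window functions). The `orders` table is hypothetical and plays the role of `x`, and the `entry` column from the question's play data is used in the `order by` so that the picked duplicate is deterministic. Because SQLite compares text case-sensitively by default, the join condition also applies `lower()` on both sides.

```python
import sqlite3

# In-memory database; `people` holds a subset of the question's sample rows.
# `orders` is a hypothetical "good" table standing in for x in the answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (entry INTEGER, IDNo TEXT, FirstName TEXT, LastName TEXT);
    INSERT INTO people VALUES
        (1,  'uqx', 'bob',  'smith'),
        (2,  'abc', 'john', 'willis'),
        (3,  'ABC', 'john', 'willis'),
        (4,  'aBc', 'john', 'willis'),
        (5,  'WTF', 'jeff', 'bridges'),
        (10, 'wtf', 'jeff', 'bridges');
    CREATE TABLE orders (order_id INTEGER, IDNo TEXT);
    INSERT INTO orders VALUES (100, 'abc'), (101, 'wtf'), (102, 'zzz');
""")

# Number the duplicates inside each case-insensitive IDNo group and
# join only the first one (rn = 1) to each row of the left-hand table.
rows = conn.execute("""
    SELECT o.order_id, p.IDNo, p.FirstName, p.LastName
    FROM orders o
    LEFT JOIN (
        SELECT IDNo, FirstName, LastName,
               ROW_NUMBER() OVER (PARTITION BY lower(IDNo) ORDER BY entry) AS rn
        FROM people
    ) p ON lower(p.IDNo) = lower(o.IDNo) AND p.rn = 1
    ORDER BY o.order_id
""").fetchall()

for row in rows:
    print(row)
```

Keeping `p.rn = 1` in the ON clause rather than the WHERE clause matters here: moving it to WHERE would filter out left-hand rows that have no match at all, such as order 102.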

User · Answer

Try this:

    SELECT *
    FROM people P
    where P.IDNo in (SELECT DISTINCT IDNo
                     FROM people)

User · Answer

Add an identity column (PeopleID) and then use a correlated subquery to return the first value for each value.

    SELECT *
    FROM People p
    WHERE PeopleID = (
        SELECT MIN(PeopleID)
        FROM People
        WHERE IDNo = p.IDNo
    )
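A minimal sketch of the correlated-subquery idea, again using Python's sqlite3 module as a stand-in for SQL Server (an assumption). Here `PeopleID` is an auto-assigned integer primary key filled in insertion order, and the comparison is wrapped in `lower()` because SQLite's default text comparison is case-sensitive, unlike a case-insensitive SQL Server collation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- PeopleID is assigned automatically (1, 2, 3, ...) in insertion order.
    CREATE TABLE people (
        PeopleID INTEGER PRIMARY KEY,
        IDNo TEXT, FirstName TEXT, LastName TEXT
    );
    INSERT INTO people (IDNo, FirstName, LastName) VALUES
        ('uqx', 'bob',   'smith'),
        ('abc', 'john',  'willis'),
        ('ABC', 'john',  'willis'),
        ('aBc', 'john',  'willis'),
        ('WTF', 'jeff',  'bridges'),
        ('Sss', 'bill',  'doe'),
        ('sSs', 'bill',  'doe'),
        ('ssS', 'bill',  'doe'),
        ('ere', 'sally', 'abby'),
        ('wtf', 'jeff',  'bridges');
""")

# Keep a row only if its PeopleID is the smallest one among all rows
# sharing the same IDNo (compared case-insensitively).
rows = conn.execute("""
    SELECT p.PeopleID, p.IDNo, p.FirstName, p.LastName
    FROM people p
    WHERE p.PeopleID = (
        SELECT MIN(p2.PeopleID)
        FROM people p2
        WHERE lower(p2.IDNo) = lower(p.IDNo)
    )
    ORDER BY p.PeopleID
""").fetchall()

for row in rows:
    print(row)
```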

User · Answer

After careful consideration, this dilemma has a few different solutions:

Aggregate Everything

Use an aggregate on each column to get the biggest or smallest field value. This is what I am doing, since it takes 2 partially filled out records and "merges" the data.

http://sqlfiddle.com/#!3/59cde/1

    SELECT
      UPPER(IDNo) AS user_id
      , MAX(FirstName) AS name_first
      , MAX(LastName) AS name_last
      , MAX(entry) AS row_num
    FROM people P
    GROUP BY
      IDNo

Get First (or Last record)

http://sqlfiddle.com/#!3/59cde/23

    -- ------------------------------------------------------
    -- Notes
    -- entry: Auto-Number primary key; some sort of unique PK is required for this method
    -- IDNo:  Should be primary key in feed, but is not; we are making an upper case version
    -- This gets the first entry; to get the last entry, change MIN() to MAX()
    -- ------------------------------------------------------

    SELECT
       PC.user_id
      ,PData.FirstName
      ,PData.LastName
      ,PData.entry
    FROM (
      SELECT
          P2.user_id
         ,MIN(P2.entry) AS rownum
      FROM (
        SELECT
            UPPER(P.IDNo) AS user_id
          , P.entry
        FROM people P
      ) AS P2
      GROUP BY
        P2.user_id
    ) AS PC
    LEFT JOIN people PData
    ON PData.entry = PC.rownum
    ORDER BY
       PData.entry
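To illustrate the "merge" effect of the aggregate approach, here is a small sketch using Python's sqlite3 module as a stand-in for SQL Server (an assumption). The two partially filled rows for `abc` are made-up sample data, not taken from the feed, and the grouping uses `UPPER(IDNo)` explicitly because SQLite groups text case-sensitively by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (entry INTEGER, IDNo TEXT, FirstName TEXT, LastName TEXT);
    -- Hypothetical data: the two 'abc' rows are each only partially filled in.
    INSERT INTO people VALUES
        (1, 'uqx', 'bob',  'smith'),
        (2, 'abc', 'john', NULL),
        (3, 'ABC', NULL,   'willis');
""")

# MAX() skips NULLs, so each output column takes whichever duplicate
# row actually has a value for it.
rows = conn.execute("""
    SELECT UPPER(IDNo)    AS user_id,
           MAX(FirstName) AS name_first,
           MAX(LastName)  AS name_last,
           MAX(entry)     AS row_num
    FROM people
    GROUP BY UPPER(IDNo)
    ORDER BY user_id
""").fetchall()

for row in rows:
    print(row)
```

One consequence of this design is that the values in a merged row can come from different source rows, so it suits duplicates that are meant to describe the same person rather than conflicting records.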

User · Answer

Depending on the nature of the duplicate rows, it looks like all you want is a case-insensitive comparison on those columns. Setting the collation on these columns should be what you're after:

    SELECT DISTINCT p.IDNO COLLATE SQL_Latin1_General_CP1_CI_AS,
           p.FirstName COLLATE SQL_Latin1_General_CP1_CI_AS,
           p.LastName COLLATE SQL_Latin1_General_CP1_CI_AS
    FROM people P

http://msdn.microsoft.com/en-us/library/ms184391.aspx
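The collation idea can be tried out with Python's sqlite3 module, with the caveat that this is only an analogy: SQLite has no `SQL_Latin1_General_CP1_CI_AS`, so the sketch below substitutes SQLite's built-in `NOCASE` collation (an assumption, and `NOCASE` only folds ASCII letters). Which spelling of each ID survives the DISTINCT is not something to rely on, so the snippet only inspects the lowercased values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (entry INTEGER, IDNo TEXT, FirstName TEXT, LastName TEXT);
    INSERT INTO people VALUES
        (1,  'uqx', 'bob',   'smith'),
        (2,  'abc', 'john',  'willis'),
        (3,  'ABC', 'john',  'willis'),
        (4,  'aBc', 'john',  'willis'),
        (5,  'WTF', 'jeff',  'bridges'),
        (6,  'Sss', 'bill',  'doe'),
        (7,  'sSs', 'bill',  'doe'),
        (8,  'ssS', 'bill',  'doe'),
        (9,  'ere', 'sally', 'abby'),
        (10, 'wtf', 'jeff',  'bridges');
""")

# DISTINCT compares each column using its collation, so case variants
# of the same IDNo collapse into a single row under NOCASE.
rows = conn.execute("""
    SELECT DISTINCT IDNo COLLATE NOCASE,
                    FirstName COLLATE NOCASE,
                    LastName COLLATE NOCASE
    FROM people
""").fetchall()

# Normalise the surviving IDs for display, since the kept spelling may vary.
unique_ids = sorted(idno.lower() for idno, _, _ in rows)
print(len(rows), unique_ids)
```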

User · Answer

Turns out I was doing it wrong. I needed to perform a nested select first of just the important columns, and do a distinct select off that to prevent trash columns of "unique" data from corrupting my good data. The following appears to have resolved the issue, but I will try on the full dataset later:

    SELECT DISTINCT P2.*
    FROM (
      SELECT
          IDNo
        , FirstName
        , LastName
      FROM people P
    ) P2

Here is some play data as requested: http://sqlfiddle.com/#!3/050e0d/3

    CREATE TABLE people
    (
           [entry] int
         , [IDNo] varchar(3)
         , [FirstName] varchar(5)
         , [LastName] varchar(7)
    );

    INSERT INTO people
        (entry, [IDNo], [FirstName], [LastName])
    VALUES
        (1, 'uqx', 'bob', 'smith'),
        (2, 'abc', 'john', 'willis'),
        (3, 'ABC', 'john', 'willis'),
        (4, 'aBc', 'john', 'willis'),
        (5, 'WTF', 'jeff', 'bridges'),
        (6, 'Sss', 'bill', 'doe'),
        (7, 'sSs', 'bill', 'doe'),
        (8, 'ssS', 'bill', 'doe'),
        (9, 'ere', 'sally', 'abby'),
        (10, 'wtf', 'jeff', 'bridges')
    ;
