[c#] How do you perform a left outer join using linq extension methods

Assuming I have a left outer join as such:

from f in Foo
join b in Bar on f.Foo_Id equals b.Foo_Id into g
from result in g.DefaultIfEmpty()
select new { Foo = f, Bar = result }

How would I express the same task using extension methods? E.g.

Foo.GroupJoin(Bar, f => f.Foo_Id, b => b.Foo_Id, (f,b) => ???)
    .Select(???)

This question is related to c# linq-to-sql lambda

The answer is


Whilst the accepted answer works and is good for Linq to Objects it bugged me that the SQL query isn't just a straight Left Outer Join.

The following code relies on the LinkKit Project that allows you to pass expressions and invoke them to your query.

static IQueryable<TResult> LeftOuterJoin<TSource,TInner, TKey, TResult>(
     this IQueryable<TSource> source, 
     IQueryable<TInner> inner, 
     Expression<Func<TSource,TKey>> sourceKey, 
     Expression<Func<TInner,TKey>> innerKey, 
     Expression<Func<TSource, TInner, TResult>> result
    ) {
    return from a in source.AsExpandable()
            join b in inner on sourceKey.Invoke(a) equals innerKey.Invoke(b) into c
            from d in c.DefaultIfEmpty()
            select result.Invoke(a,d);
}

It can be used as follows

Table1.LeftOuterJoin(Table2, x => x.Key1, x => x.Key2, (x,y) => new { x,y});

Group Join method is unnecessary to achieve joining of two data sets.

Inner Join:

var qry = Foos.SelectMany
            (
                foo => Bars.Where (bar => foo.Foo_id == bar.Foo_id),
                (foo, bar) => new
                    {
                    Foo = foo,
                    Bar = bar
                    }
            );

For Left Join just add DefaultIfEmpty()

var qry = Foos.SelectMany
            (
                foo => Bars.Where (bar => foo.Foo_id == bar.Foo_id).DefaultIfEmpty(),
                (foo, bar) => new
                    {
                    Foo = foo,
                    Bar = bar
                    }
            );

EF and LINQ to SQL correctly transform to SQL. For LINQ to Objects it is beter to join using GroupJoin as it internally uses Lookup. But if you are querying DB then skipping of GroupJoin is AFAIK as performant.

Personlay for me this way is more readable compared to GroupJoin().SelectMany()


You can create extension method like:

public static IEnumerable<TResult> LeftOuterJoin<TSource, TInner, TKey, TResult>(this IEnumerable<TSource> source, IEnumerable<TInner> other, Func<TSource, TKey> func, Func<TInner, TKey> innerkey, Func<TSource, TInner, TResult> res)
    {
        return from f in source
               join b in other on func.Invoke(f) equals innerkey.Invoke(b) into g
               from result in g.DefaultIfEmpty()
               select res.Invoke(f, result);
    }

Improving on Ocelot20's answer, if you have a table you're left outer joining with where you just want 0 or 1 rows out of it, but it could have multiple, you need to Order your joined table:

var qry = Foos.GroupJoin(
      Bars.OrderByDescending(b => b.Id),
      foo => foo.Foo_Id,
      bar => bar.Foo_Id,
      (f, bs) => new { Foo = f, Bar = bs.FirstOrDefault() });

Otherwise which row you get in the join is going to be random (or more specifically, whichever the db happens to find first).


Since this seems to be the de facto SO question for left outer joins using the method (extension) syntax, I thought I would add an alternative to the currently selected answer that (in my experience at least) has been more commonly what I'm after

// Option 1: Expecting either 0 or 1 matches from the "Right"
// table (Bars in this case):
var qry = Foos.GroupJoin(
          Bars,
          foo => foo.Foo_Id,
          bar => bar.Foo_Id,
          (f,bs) => new { Foo = f, Bar = bs.SingleOrDefault() });

// Option 2: Expecting either 0 or more matches from the "Right" table
// (courtesy of currently selected answer):
var qry = Foos.GroupJoin(
                  Bars, 
                  foo => foo.Foo_Id,
                  bar => bar.Foo_Id,
                  (f,bs) => new { Foo = f, Bars = bs })
              .SelectMany(
                  fooBars => fooBars.Bars.DefaultIfEmpty(),
                  (x,y) => new { Foo = x.Foo, Bar = y });

To display the difference using a simple data set (assuming we're joining on the values themselves):

List<int> tableA = new List<int> { 1, 2, 3 };
List<int?> tableB = new List<int?> { 3, 4, 5 };

// Result using both Option 1 and 2. Option 1 would be a better choice
// if we didn't expect multiple matches in tableB.
{ A = 1, B = null }
{ A = 2, B = null }
{ A = 3, B = 3    }

List<int> tableA = new List<int> { 1, 2, 3 };
List<int?> tableB = new List<int?> { 3, 3, 4 };

// Result using Option 1 would be that an exception gets thrown on
// SingleOrDefault(), but if we use FirstOrDefault() instead to illustrate:
{ A = 1, B = null }
{ A = 2, B = null }
{ A = 3, B = 3    } // Misleading, we had multiple matches.
                    // Which 3 should get selected (not arbitrarily the first)?.

// Result using Option 2:
{ A = 1, B = null }
{ A = 2, B = null }
{ A = 3, B = 3    }
{ A = 3, B = 3    }    

Option 2 is true to the typical left outer join definition, but as I mentioned earlier is often unnecessarily complex depending on the data set.


Turning Marc Gravell's answer into an extension method, I made the following.

internal static IEnumerable<Tuple<TLeft, TRight>> LeftJoin<TLeft, TRight, TKey>(
    this IEnumerable<TLeft> left,
    IEnumerable<TRight> right,
    Func<TLeft, TKey> selectKeyLeft,
    Func<TRight, TKey> selectKeyRight,
    TRight defaultRight = default(TRight),
    IEqualityComparer<TKey> cmp = null)
{
    return left.GroupJoin(
            right,
            selectKeyLeft,
            selectKeyRight,
            (x, y) => new Tuple<TLeft, IEnumerable<TRight>>(x, y),
            cmp ?? EqualityComparer<TKey>.Default)
        .SelectMany(
            x => x.Item2.DefaultIfEmpty(defaultRight),
            (x, y) => new Tuple<TLeft, TRight>(x.Item1, y));
}

Examples related to c#

How can I convert this one line of ActionScript to C#? Microsoft Advertising SDK doesn't deliverer ads How to use a global array in C#? How to correctly write async method? C# - insert values from file into two arrays Uploading into folder in FTP? Are these methods thread safe? dotnet ef not found in .NET Core 3 HTTP Error 500.30 - ANCM In-Process Start Failure Best way to "push" into C# array

Examples related to linq-to-sql

Understanding SQL Server LOCKS on SELECT queries how to update the multiple rows at a time using linq to sql? LINQ: combining join and group by LINQ to SQL: Multiple joins ON multiple Columns. Is this possible? How to Select Min and Max date values in Linq Query How to do a LIKE query with linq? How to do a join in linq to sql with method syntax? How to store a list in a column of a database table Returning IEnumerable<T> vs. IQueryable<T> Entity Framework VS LINQ to SQL VS ADO.NET with stored procedures?

Examples related to lambda

Java 8 optional: ifPresent return object orElseThrow exception How to properly apply a lambda function into a pandas data frame column What are functional interfaces used for in Java 8? Java 8 lambda get and remove element from list Variable used in lambda expression should be final or effectively final Filter values only if not null using lambda in Java8 forEach loop Java 8 for Map entry set Java 8 Lambda Stream forEach with multiple statements Java 8 stream map to list of keys sorted by values Task.Run with Parameter(s)?