[c#] LINQ - Full Outer Join

I have a list of people's ID and their first name, and a list of people's ID and their surname. Some people don't have a first name and some don't have a surname; I'd like to do a full outer join on the two lists.

So the following lists:

ID  FirstName
--  ---------
 1  John
 2  Sue

ID  LastName
--  --------
 1  Doe
 3  Smith

Should produce:

ID  FirstName  LastName
--  ---------  --------
 1  John       Doe
 2  Sue
 3             Smith

I'm new to LINQ (so forgive me if I'm being lame) and have found quite a few solutions for 'LINQ Outer Joins' which all look quite similar, but really seem to be left outer joins.

My attempts so far go something like this:

private void OuterJoinTest()
{
    List<FirstName> firstNames = new List<FirstName>();
    firstNames.Add(new FirstName { ID = 1, Name = "John" });
    firstNames.Add(new FirstName { ID = 2, Name = "Sue" });

    List<LastName> lastNames = new List<LastName>();
    lastNames.Add(new LastName { ID = 1, Name = "Doe" });
    lastNames.Add(new LastName { ID = 3, Name = "Smith" });

    var outerJoin = from first in firstNames
        join last in lastNames
        on first.ID equals last.ID
        into temp
        from last in temp.DefaultIfEmpty()
        select new
        {
            id = first != null ? first.ID : last.ID,
            firstname = first != null ? first.Name : string.Empty,
            surname = last != null ? last.Name : string.Empty
        };
    }
}

public class FirstName
{
    public int ID;

    public string Name;
}

public class LastName
{
    public int ID;

    public string Name;
}

But this returns:

ID  FirstName  LastName
--  ---------  --------
 1  John       Doe
 2  Sue

What am I doing wrong?

This question is related to c# .net linq outer-join full-outer-join

The answer is


I think there are problems with most of these, including the accepted answer, because they don't work well with Linq over IQueryable either due to doing too many server round trips and too much data returns, or doing too much client execution.

For IEnumerable I don't like Sehe's answer or similar because it has excessive memory use (a simple 10000000 two list test ran Linqpad out of memory on my 32GB machine).

Also, most of the others don't actually implement a proper Full Outer Join because they are using a Union with a Right Join instead of Concat with a Right Anti Semi Join, which not only eliminates the duplicate inner join rows from the result, but any proper duplicates that existed originally in the left or right data.

So here are my extensions that handle all of these issues, generate SQL as well as implementing the join in LINQ to SQL directly, executing on the server, and is faster and with less memory than others on Enumerables:

public static class Ext {
    public static IEnumerable<TResult> LeftOuterJoin<TLeft, TRight, TKey, TResult>(
        this IEnumerable<TLeft> leftItems,
        IEnumerable<TRight> rightItems,
        Func<TLeft, TKey> leftKeySelector,
        Func<TRight, TKey> rightKeySelector,
        Func<TLeft, TRight, TResult> resultSelector) {

        return from left in leftItems
               join right in rightItems on leftKeySelector(left) equals rightKeySelector(right) into temp
               from right in temp.DefaultIfEmpty()
               select resultSelector(left, right);
    }

    public static IEnumerable<TResult> RightOuterJoin<TLeft, TRight, TKey, TResult>(
        this IEnumerable<TLeft> leftItems,
        IEnumerable<TRight> rightItems,
        Func<TLeft, TKey> leftKeySelector,
        Func<TRight, TKey> rightKeySelector,
        Func<TLeft, TRight, TResult> resultSelector) {

        return from right in rightItems
               join left in leftItems on rightKeySelector(right) equals leftKeySelector(left) into temp
               from left in temp.DefaultIfEmpty()
               select resultSelector(left, right);
    }

    public static IEnumerable<TResult> FullOuterJoinDistinct<TLeft, TRight, TKey, TResult>(
        this IEnumerable<TLeft> leftItems,
        IEnumerable<TRight> rightItems,
        Func<TLeft, TKey> leftKeySelector,
        Func<TRight, TKey> rightKeySelector,
        Func<TLeft, TRight, TResult> resultSelector) {

        return leftItems.LeftOuterJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector).Union(leftItems.RightOuterJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector));
    }

    public static IEnumerable<TResult> RightAntiSemiJoin<TLeft, TRight, TKey, TResult>(
        this IEnumerable<TLeft> leftItems,
        IEnumerable<TRight> rightItems,
        Func<TLeft, TKey> leftKeySelector,
        Func<TRight, TKey> rightKeySelector,
        Func<TLeft, TRight, TResult> resultSelector) {

        var hashLK = new HashSet<TKey>(from l in leftItems select leftKeySelector(l));
        return rightItems.Where(r => !hashLK.Contains(rightKeySelector(r))).Select(r => resultSelector(default(TLeft),r));
    }

    public static IEnumerable<TResult> FullOuterJoin<TLeft, TRight, TKey, TResult>(
        this IEnumerable<TLeft> leftItems,
        IEnumerable<TRight> rightItems,
        Func<TLeft, TKey> leftKeySelector,
        Func<TRight, TKey> rightKeySelector,
        Func<TLeft, TRight, TResult> resultSelector)  where TLeft : class {

        return leftItems.LeftOuterJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector).Concat(leftItems.RightAntiSemiJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector));
    }

    private static Expression<Func<TP, TC, TResult>> CastSMBody<TP, TC, TResult>(LambdaExpression ex, TP unusedP, TC unusedC, TResult unusedRes) => (Expression<Func<TP, TC, TResult>>)ex;

    public static IQueryable<TResult> LeftOuterJoin<TLeft, TRight, TKey, TResult>(
        this IQueryable<TLeft> leftItems,
        IQueryable<TRight> rightItems,
        Expression<Func<TLeft, TKey>> leftKeySelector,
        Expression<Func<TRight, TKey>> rightKeySelector,
        Expression<Func<TLeft, TRight, TResult>> resultSelector) {

        var sampleAnonLR = new { left = default(TLeft), rightg = default(IEnumerable<TRight>) };
        var parmP = Expression.Parameter(sampleAnonLR.GetType(), "p");
        var parmC = Expression.Parameter(typeof(TRight), "c");
        var argLeft = Expression.PropertyOrField(parmP, "left");
        var newleftrs = CastSMBody(Expression.Lambda(Expression.Invoke(resultSelector, argLeft, parmC), parmP, parmC), sampleAnonLR, default(TRight), default(TResult));

        return leftItems.AsQueryable().GroupJoin(rightItems, leftKeySelector, rightKeySelector, (left, rightg) => new { left, rightg }).SelectMany(r => r.rightg.DefaultIfEmpty(), newleftrs);
    }

    public static IQueryable<TResult> RightOuterJoin<TLeft, TRight, TKey, TResult>(
        this IQueryable<TLeft> leftItems,
        IQueryable<TRight> rightItems,
        Expression<Func<TLeft, TKey>> leftKeySelector,
        Expression<Func<TRight, TKey>> rightKeySelector,
        Expression<Func<TLeft, TRight, TResult>> resultSelector) {

        var sampleAnonLR = new { leftg = default(IEnumerable<TLeft>), right = default(TRight) };
        var parmP = Expression.Parameter(sampleAnonLR.GetType(), "p");
        var parmC = Expression.Parameter(typeof(TLeft), "c");
        var argRight = Expression.PropertyOrField(parmP, "right");
        var newrightrs = CastSMBody(Expression.Lambda(Expression.Invoke(resultSelector, parmC, argRight), parmP, parmC), sampleAnonLR, default(TLeft), default(TResult));

        return rightItems.GroupJoin(leftItems, rightKeySelector, leftKeySelector, (right, leftg) => new { leftg, right }).SelectMany(l => l.leftg.DefaultIfEmpty(), newrightrs);
    }

    public static IQueryable<TResult> FullOuterJoinDistinct<TLeft, TRight, TKey, TResult>(
        this IQueryable<TLeft> leftItems,
        IQueryable<TRight> rightItems,
        Expression<Func<TLeft, TKey>> leftKeySelector,
        Expression<Func<TRight, TKey>> rightKeySelector,
        Expression<Func<TLeft, TRight, TResult>> resultSelector) {

        return leftItems.LeftOuterJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector).Union(leftItems.RightOuterJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector));
    }

    private static Expression<Func<TP, TResult>> CastSBody<TP, TResult>(LambdaExpression ex, TP unusedP, TResult unusedRes) => (Expression<Func<TP, TResult>>)ex;

    public static IQueryable<TResult> RightAntiSemiJoin<TLeft, TRight, TKey, TResult>(
        this IQueryable<TLeft> leftItems,
        IQueryable<TRight> rightItems,
        Expression<Func<TLeft, TKey>> leftKeySelector,
        Expression<Func<TRight, TKey>> rightKeySelector,
        Expression<Func<TLeft, TRight, TResult>> resultSelector) {

        var sampleAnonLgR = new { leftg = default(IEnumerable<TLeft>), right = default(TRight) };
        var parmLgR = Expression.Parameter(sampleAnonLgR.GetType(), "lgr");
        var argLeft = Expression.Constant(default(TLeft), typeof(TLeft));
        var argRight = Expression.PropertyOrField(parmLgR, "right");
        var newrightrs = CastSBody(Expression.Lambda(Expression.Invoke(resultSelector, argLeft, argRight), parmLgR), sampleAnonLgR, default(TResult));

        return rightItems.GroupJoin(leftItems, rightKeySelector, leftKeySelector, (right, leftg) => new { leftg, right }).Where(lgr => !lgr.leftg.Any()).Select(newrightrs);
    }

    public static IQueryable<TResult> FullOuterJoin<TLeft, TRight, TKey, TResult>(
        this IQueryable<TLeft> leftItems,
        IQueryable<TRight> rightItems,
        Expression<Func<TLeft, TKey>> leftKeySelector,
        Expression<Func<TRight, TKey>> rightKeySelector,
        Expression<Func<TLeft, TRight, TResult>> resultSelector) {

        return leftItems.LeftOuterJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector).Concat(leftItems.RightAntiSemiJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector));
    }
}

The difference between a Right Anti-Semi-Join is mostly moot with Linq to Objects or in the source, but makes a difference on the server (SQL) side in the final answer, removing an unnecessary JOIN.

The hand coding of Expression to handle merging an Expression<Func<>> into a lambda could be improved with LinqKit, but it would be nice if the language/compiler had added some help for that. The FullOuterJoinDistinct and RightOuterJoin functions are included for completeness, but I did not re-implement FullOuterGroupJoin yet.

I wrote another version of a full outer join for IEnumerable for cases where the key is orderable, which is about 50% faster than combining the left outer join with the right anti semi join, at least on small collections. It goes through each collection after sorting just once.

I also added another answer for a version that works with EF by replacing the Invoke with a custom expansion.


Here is an extension method doing that:

public static IEnumerable<KeyValuePair<TLeft, TRight>> FullOuterJoin<TLeft, TRight>(this IEnumerable<TLeft> leftItems, Func<TLeft, object> leftIdSelector, IEnumerable<TRight> rightItems, Func<TRight, object> rightIdSelector)
{
    var leftOuterJoin = from left in leftItems
        join right in rightItems on leftIdSelector(left) equals rightIdSelector(right) into temp
        from right in temp.DefaultIfEmpty()
        select new { left, right };

    var rightOuterJoin = from right in rightItems
        join left in leftItems on rightIdSelector(right) equals leftIdSelector(left) into temp
        from left in temp.DefaultIfEmpty()
        select new { left, right };

    var fullOuterJoin = leftOuterJoin.Union(rightOuterJoin);

    return fullOuterJoin.Select(x => new KeyValuePair<TLeft, TRight>(x.left, x.right));
}

Performs a in-memory streaming enumeration over both inputs and invokes the selector for each row. If there is no correlation at the current iteration, one of the selector arguments will be null.

Example:

   var result = left.FullOuterJoin(
         right, 
         x=>left.Key, 
         x=>right.Key, 
         (l,r) => new { LeftKey = l?.Key, RightKey=r?.Key });
  • Requires an IComparer for the correlation type, uses the Comparer.Default if not provided.

  • Requires that 'OrderBy' is applied to the input enumerables

    /// <summary>
    /// Performs a full outer join on two <see cref="IEnumerable{T}" />.
    /// </summary>
    /// <typeparam name="TLeft"></typeparam>
    /// <typeparam name="TValue"></typeparam>
    /// <typeparam name="TRight"></typeparam>
    /// <typeparam name="TResult"></typeparam>
    /// <param name="left"></param>
    /// <param name="right"></param>
    /// <param name="leftKeySelector"></param>
    /// <param name="rightKeySelector"></param>
    /// <param name="selector">Expression defining result type</param>
    /// <param name="keyComparer">A comparer if there is no default for the type</param>
    /// <returns></returns>
    [System.Diagnostics.DebuggerStepThrough]
    public static IEnumerable<TResult> FullOuterJoin<TLeft, TRight, TValue, TResult>(
        this IEnumerable<TLeft> left,
        IEnumerable<TRight> right,
        Func<TLeft, TValue> leftKeySelector,
        Func<TRight, TValue> rightKeySelector,
        Func<TLeft, TRight, TResult> selector,
        IComparer<TValue> keyComparer = null)
        where TLeft: class
        where TRight: class
        where TValue : IComparable
    {
    
        keyComparer = keyComparer ?? Comparer<TValue>.Default;
    
        using (var enumLeft = left.OrderBy(leftKeySelector).GetEnumerator())
        using (var enumRight = right.OrderBy(rightKeySelector).GetEnumerator())
        {
    
            var hasLeft = enumLeft.MoveNext();
            var hasRight = enumRight.MoveNext();
            while (hasLeft || hasRight)
            {
    
                var currentLeft = enumLeft.Current;
                var valueLeft = hasLeft ? leftKeySelector(currentLeft) : default(TValue);
    
                var currentRight = enumRight.Current;
                var valueRight = hasRight ? rightKeySelector(currentRight) : default(TValue);
    
                int compare =
                    !hasLeft ? 1
                    : !hasRight ? -1
                    : keyComparer.Compare(valueLeft, valueRight);
    
                switch (compare)
                {
                    case 0:
                        // The selector matches. An inner join is achieved
                        yield return selector(currentLeft, currentRight);
                        hasLeft = enumLeft.MoveNext();
                        hasRight = enumRight.MoveNext();
                        break;
                    case -1:
                        yield return selector(currentLeft, default(TRight));
                        hasLeft = enumLeft.MoveNext();
                        break;
                    case 1:
                        yield return selector(default(TLeft), currentRight);
                        hasRight = enumRight.MoveNext();
                        break;
                }
            }
    
        }
    
    }
    

I'm guessing @sehe's approach is stronger, but until I understand it better, I find myself leap-frogging off of @MichaelSander's extension. I modified it to match the syntax and return type of the built-in Enumerable.Join() method described here. I appended the "distinct" suffix in respect to @cadrell0's comment under @JeffMercado's solution.

public static class MyExtensions {

    public static IEnumerable<TResult> FullJoinDistinct<TLeft, TRight, TKey, TResult> (
        this IEnumerable<TLeft> leftItems, 
        IEnumerable<TRight> rightItems, 
        Func<TLeft, TKey> leftKeySelector, 
        Func<TRight, TKey> rightKeySelector,
        Func<TLeft, TRight, TResult> resultSelector
    ) {

        var leftJoin = 
            from left in leftItems
            join right in rightItems 
              on leftKeySelector(left) equals rightKeySelector(right) into temp
            from right in temp.DefaultIfEmpty()
            select resultSelector(left, right);

        var rightJoin = 
            from right in rightItems
            join left in leftItems 
              on rightKeySelector(right) equals leftKeySelector(left) into temp
            from left in temp.DefaultIfEmpty()
            select resultSelector(left, right);

        return leftJoin.Union(rightJoin);
    }

}

In the example, you would use it like this:

var test = 
    firstNames
    .FullJoinDistinct(
        lastNames,
        f=> f.ID,
        j=> j.ID,
        (f,j)=> new {
            ID = f == null ? j.ID : f.ID, 
            leftName = f == null ? null : f.Name,
            rightName = j == null ? null : j.Name
        }
    );

In the future, as I learn more, I have a feeling I'll be migrating to @sehe's logic given it's popularity. But even then I'll have to be careful, because I feel it is important to have at least one overload that matches the syntax of the existing ".Join()" method if feasible, for two reasons:

  1. Consistency in methods helps save time, avoid errors, and avoid unintended behavior.
  2. If there ever is an out-of-the-box ".FullJoin()" method in the future, I would imagine it will try to keep to the syntax of the currently existing ".Join()" method if it can. If it does, then if you want to migrate to it, you can simply rename your functions without changing the parameters or worrying about different return types breaking your code.

I'm still new with generics, extensions, Func statements, and other features, so feedback is certainly welcome.

EDIT: Didn't take me long to realize there was a problem with my code. I was doing a .Dump() in LINQPad and looking at the return type. It was just IEnumerable, so I tried to match it. But when I actually did a .Where() or .Select() on my extension I got an error: "'System Collections.IEnumerable' does not contain a definition for 'Select' and ...". So in the end I was able to match the input syntax of .Join(), but not the return behavior.

EDIT: Added "TResult" to the return type for the function. Missed that when reading the Microsoft article, and of course it makes sense. With this fix, it now seems the return behavior is in line with my goals after all.


I decided to add this as a separate answer as I am not positive it is tested enough. This is a re-implementation of the FullOuterJoin method using essentially a simplified, customized version of LINQKit Invoke/Expand for Expression so that it should work the Entity Framework. There's not much explanation as it is pretty much the same as my previous answer.

public static class Ext {
    private static Expression<Func<TP, TC, TResult>> CastSMBody<TP, TC, TResult>(LambdaExpression ex, TP unusedP, TC unusedC, TResult unusedRes) => (Expression<Func<TP, TC, TResult>>)ex;

    public static IQueryable<TResult> LeftOuterJoin<TLeft, TRight, TKey, TResult>(
        this IQueryable<TLeft> leftItems,
        IQueryable<TRight> rightItems,
        Expression<Func<TLeft, TKey>> leftKeySelector,
        Expression<Func<TRight, TKey>> rightKeySelector,
        Expression<Func<TLeft, TRight, TResult>> resultSelector) {

        // (lrg,r) => resultSelector(lrg.left, r)
        var sampleAnonLR = new { left = default(TLeft), rightg = default(IEnumerable<TRight>) };
        var parmP = Expression.Parameter(sampleAnonLR.GetType(), "lrg");
        var parmC = Expression.Parameter(typeof(TRight), "r");
        var argLeft = Expression.PropertyOrField(parmP, "left");
        var newleftrs = CastSMBody(Expression.Lambda(resultSelector.Apply(argLeft, parmC), parmP, parmC), sampleAnonLR, default(TRight), default(TResult));

        return leftItems.GroupJoin(rightItems, leftKeySelector, rightKeySelector, (left, rightg) => new { left, rightg }).SelectMany(r => r.rightg.DefaultIfEmpty(), newleftrs);
    }

    public static IQueryable<TResult> RightOuterJoin<TLeft, TRight, TKey, TResult>(
        this IQueryable<TLeft> leftItems,
        IQueryable<TRight> rightItems,
        Expression<Func<TLeft, TKey>> leftKeySelector,
        Expression<Func<TRight, TKey>> rightKeySelector,
        Expression<Func<TLeft, TRight, TResult>> resultSelector) {

        // (lgr,l) => resultSelector(l, lgr.right)
        var sampleAnonLR = new { leftg = default(IEnumerable<TLeft>), right = default(TRight) };
        var parmP = Expression.Parameter(sampleAnonLR.GetType(), "lgr");
        var parmC = Expression.Parameter(typeof(TLeft), "l");
        var argRight = Expression.PropertyOrField(parmP, "right");
        var newrightrs = CastSMBody(Expression.Lambda(resultSelector.Apply(parmC, argRight), parmP, parmC), sampleAnonLR, default(TLeft), default(TResult));

        return rightItems.GroupJoin(leftItems, rightKeySelector, leftKeySelector, (right, leftg) => new { leftg, right })
                         .SelectMany(l => l.leftg.DefaultIfEmpty(), newrightrs);
    }

    private static Expression<Func<TParm, TResult>> CastSBody<TParm, TResult>(LambdaExpression ex, TParm unusedP, TResult unusedRes) => (Expression<Func<TParm, TResult>>)ex;

    public static IQueryable<TResult> RightAntiSemiJoin<TLeft, TRight, TKey, TResult>(
        this IQueryable<TLeft> leftItems,
        IQueryable<TRight> rightItems,
        Expression<Func<TLeft, TKey>> leftKeySelector,
        Expression<Func<TRight, TKey>> rightKeySelector,
        Expression<Func<TLeft, TRight, TResult>> resultSelector) where TLeft : class where TRight : class where TResult : class {

        // newrightrs = lgr => resultSelector(default(TLeft), lgr.right)
        var sampleAnonLgR = new { leftg = (IEnumerable<TLeft>)null, right = default(TRight) };
        var parmLgR = Expression.Parameter(sampleAnonLgR.GetType(), "lgr");
        var argLeft = Expression.Constant(default(TLeft), typeof(TLeft));
        var argRight = Expression.PropertyOrField(parmLgR, "right");
        var newrightrs = CastSBody(Expression.Lambda(resultSelector.Apply(argLeft, argRight), parmLgR), sampleAnonLgR, default(TResult));

        return rightItems.GroupJoin(leftItems, rightKeySelector, leftKeySelector, (right, leftg) => new { leftg, right }).Where(lgr => !lgr.leftg.Any()).Select(newrightrs);
    }

    public static IQueryable<TResult> FullOuterJoin<TLeft, TRight, TKey, TResult>(
        this IQueryable<TLeft> leftItems,
        IQueryable<TRight> rightItems,
        Expression<Func<TLeft, TKey>> leftKeySelector,
        Expression<Func<TRight, TKey>> rightKeySelector,
        Expression<Func<TLeft, TRight, TResult>> resultSelector)  where TLeft : class where TRight : class where TResult : class {

        return leftItems.LeftOuterJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector).Concat(leftItems.RightAntiSemiJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector));
    }

    public static Expression Apply(this LambdaExpression e, params Expression[] args) {
        var b = e.Body;

        foreach (var pa in e.Parameters.Cast<ParameterExpression>().Zip(args, (p, a) => (p, a))) {
            b = b.Replace(pa.p, pa.a);
        }

        return b.PropagateNull();
    }

    public static Expression Replace(this Expression orig, Expression from, Expression to) => new ReplaceVisitor(from, to).Visit(orig);
    public class ReplaceVisitor : System.Linq.Expressions.ExpressionVisitor {
        public readonly Expression from;
        public readonly Expression to;

        public ReplaceVisitor(Expression _from, Expression _to) {
            from = _from;
            to = _to;
        }

        public override Expression Visit(Expression node) => node == from ? to : base.Visit(node);
    }

    public static Expression PropagateNull(this Expression orig) => new NullVisitor().Visit(orig);
    public class NullVisitor : System.Linq.Expressions.ExpressionVisitor {
        public override Expression Visit(Expression node) {
            if (node is MemberExpression nme && nme.Expression is ConstantExpression nce && nce.Value == null)
                return Expression.Constant(null, nce.Type.GetMember(nme.Member.Name).Single().GetMemberType());
            else
                return base.Visit(node);
        }
    }

    public static Type GetMemberType(this MemberInfo member) {
        switch (member) {
            case FieldInfo mfi:
                return mfi.FieldType;
            case PropertyInfo mpi:
                return mpi.PropertyType;
            case EventInfo mei:
                return mei.EventHandlerType;
            default:
                throw new ArgumentException("MemberInfo must be if type FieldInfo, PropertyInfo or EventInfo", nameof(member));
        }
    }
}

I like sehe's answer, but it does not use deferred execution (the input sequences are eagerly enumerated by the calls to ToLookup). So after looking at the .NET sources for LINQ-to-objects, I came up with this:

public static class LinqExtensions
{
    public static IEnumerable<TResult> FullOuterJoin<TLeft, TRight, TKey, TResult>(
        this IEnumerable<TLeft> left,
        IEnumerable<TRight> right,
        Func<TLeft, TKey> leftKeySelector,
        Func<TRight, TKey> rightKeySelector,
        Func<TLeft, TRight, TKey, TResult> resultSelector,
        IEqualityComparer<TKey> comparator = null,
        TLeft defaultLeft = default(TLeft),
        TRight defaultRight = default(TRight))
    {
        if (left == null) throw new ArgumentNullException("left");
        if (right == null) throw new ArgumentNullException("right");
        if (leftKeySelector == null) throw new ArgumentNullException("leftKeySelector");
        if (rightKeySelector == null) throw new ArgumentNullException("rightKeySelector");
        if (resultSelector == null) throw new ArgumentNullException("resultSelector");

        comparator = comparator ?? EqualityComparer<TKey>.Default;
        return FullOuterJoinIterator(left, right, leftKeySelector, rightKeySelector, resultSelector, comparator, defaultLeft, defaultRight);
    }

    internal static IEnumerable<TResult> FullOuterJoinIterator<TLeft, TRight, TKey, TResult>(
        this IEnumerable<TLeft> left,
        IEnumerable<TRight> right,
        Func<TLeft, TKey> leftKeySelector,
        Func<TRight, TKey> rightKeySelector,
        Func<TLeft, TRight, TKey, TResult> resultSelector,
        IEqualityComparer<TKey> comparator,
        TLeft defaultLeft,
        TRight defaultRight)
    {
        var leftLookup = left.ToLookup(leftKeySelector, comparator);
        var rightLookup = right.ToLookup(rightKeySelector, comparator);
        var keys = leftLookup.Select(g => g.Key).Union(rightLookup.Select(g => g.Key), comparator);

        foreach (var key in keys)
            foreach (var leftValue in leftLookup[key].DefaultIfEmpty(defaultLeft))
                foreach (var rightValue in rightLookup[key].DefaultIfEmpty(defaultRight))
                    yield return resultSelector(leftValue, rightValue, key);
    }
}

This implementation has the following important properties:

  • Deferred execution, input sequences will not be enumerated before the output sequence is enumerated.
  • Only enumerates the input sequences once each.
  • Preserves order of input sequences, in the sense that it will yield tuples in the order of the left sequence and then the right (for the keys not present in left sequence).

These properties are important, because they are what someone new to FullOuterJoin but experienced with LINQ will expect.


Full outer join for two or more tables: First extract the column that you want to join on.

var DatesA = from A in db.T1 select A.Date; 
var DatesB = from B in db.T2 select B.Date; 
var DatesC = from C in db.T3 select C.Date;            

var Dates = DatesA.Union(DatesB).Union(DatesC); 

Then use left outer join between the extracted column and main tables.

var Full_Outer_Join =

(from A in Dates
join B in db.T1
on A equals B.Date into AB 

from ab in AB.DefaultIfEmpty()
join C in db.T2
on A equals C.Date into ABC 

from abc in ABC.DefaultIfEmpty()
join D in db.T3
on A equals D.Date into ABCD

from abcd in ABCD.DefaultIfEmpty() 
select new { A, ab, abc, abcd })
.AsEnumerable();

I've written this extensions class for an app perhaps 6 years ago, and have been using it ever since in many solutions without issues. Hope it helps.

edit: I noticed some might not know how to use an extension class.

To use this extension class, just reference its namespace in your class by adding the following line using joinext;

^ this should allow you to to see the intellisense of extension functions on any IEnumerable object collection you happen to use.

Hope this helps. Let me know if it's still not clear, and I'll hopefully write a sample example on how to use it.

Now here is the class:

namespace joinext
{    
public static class JoinExtensions
    {
        public static IEnumerable<TResult> FullOuterJoin<TOuter, TInner, TKey, TResult>(
            this IEnumerable<TOuter> outer,
            IEnumerable<TInner> inner,
            Func<TOuter, TKey> outerKeySelector,
            Func<TInner, TKey> innerKeySelector,
            Func<TOuter, TInner, TResult> resultSelector)
            where TInner : class
            where TOuter : class
        {
            var innerLookup = inner.ToLookup(innerKeySelector);
            var outerLookup = outer.ToLookup(outerKeySelector);

            var innerJoinItems = inner
                .Where(innerItem => !outerLookup.Contains(innerKeySelector(innerItem)))
                .Select(innerItem => resultSelector(null, innerItem));

            return outer
                .SelectMany(outerItem =>
                {
                    var innerItems = innerLookup[outerKeySelector(outerItem)];

                    return innerItems.Any() ? innerItems : new TInner[] { null };
                }, resultSelector)
                .Concat(innerJoinItems);
        }


        public static IEnumerable<TResult> LeftJoin<TOuter, TInner, TKey, TResult>(
            this IEnumerable<TOuter> outer,
            IEnumerable<TInner> inner,
            Func<TOuter, TKey> outerKeySelector,
            Func<TInner, TKey> innerKeySelector,
            Func<TOuter, TInner, TResult> resultSelector)
        {
            return outer.GroupJoin(
                inner,
                outerKeySelector,
                innerKeySelector,
                (o, i) =>
                    new { o = o, i = i.DefaultIfEmpty() })
                    .SelectMany(m => m.i.Select(inn =>
                        resultSelector(m.o, inn)
                        ));

        }



        public static IEnumerable<TResult> RightJoin<TOuter, TInner, TKey, TResult>(
            this IEnumerable<TOuter> outer,
            IEnumerable<TInner> inner,
            Func<TOuter, TKey> outerKeySelector,
            Func<TInner, TKey> innerKeySelector,
            Func<TOuter, TInner, TResult> resultSelector)
        {
            return inner.GroupJoin(
                outer,
                innerKeySelector,
                outerKeySelector,
                (i, o) =>
                    new { i = i, o = o.DefaultIfEmpty() })
                    .SelectMany(m => m.o.Select(outt =>
                        resultSelector(outt, m.i)
                        ));

        }

    }
}

My clean solution for situation that key is unique in both enumerables:

 private static IEnumerable<TResult> FullOuterJoin<Ta, Tb, TKey, TResult>(
            IEnumerable<Ta> a, IEnumerable<Tb> b,
            Func<Ta, TKey> key_a, Func<Tb, TKey> key_b,
            Func<Ta, Tb, TResult> selector)
        {
            var alookup = a.ToLookup(key_a);
            var blookup = b.ToLookup(key_b);
            var keys = new HashSet<TKey>(alookup.Select(p => p.Key));
            keys.UnionWith(blookup.Select(p => p.Key));
            return keys.Select(key => selector(alookup[key].FirstOrDefault(), blookup[key].FirstOrDefault()));
        }

so

    var ax = new[] {
        new { id = 1, first_name = "ali" },
        new { id = 2, first_name = "mohammad" } };
    var bx = new[] {
        new { id = 1, last_name = "rezaei" },
        new { id = 3, last_name = "kazemi" } };

    var list = FullOuterJoin(ax, bx, a => a.id, b => b.id, (a, b) => "f: " + a?.first_name + " l: " + b?.last_name).ToArray();

outputs:

f: ali l: rezaei
f: mohammad l:
f:  l: kazemi

Update 1: providing a truly generalized extension method FullOuterJoin
Update 2: optionally accepting a custom IEqualityComparer for the key type
Update 3: this implementation has recently become part of MoreLinq - Thanks guys!

Edit Added FullOuterGroupJoin (ideone). I reused the GetOuter<> implementation, making this a fraction less performant than it could be, but I'm aiming for 'highlevel' code, not bleeding-edge optimized, right now.

See it live on http://ideone.com/O36nWc

static void Main(string[] args)
{
    var ax = new[] { 
        new { id = 1, name = "John" },
        new { id = 2, name = "Sue" } };
    var bx = new[] { 
        new { id = 1, surname = "Doe" },
        new { id = 3, surname = "Smith" } };

    ax.FullOuterJoin(bx, a => a.id, b => b.id, (a, b, id) => new {a, b})
        .ToList().ForEach(Console.WriteLine);
}

Prints the output:

{ a = { id = 1, name = John }, b = { id = 1, surname = Doe } }
{ a = { id = 2, name = Sue }, b =  }
{ a = , b = { id = 3, surname = Smith } }

You could also supply defaults: http://ideone.com/kG4kqO

    ax.FullOuterJoin(
            bx, a => a.id, b => b.id, 
            (a, b, id) => new { a.name, b.surname },
            new { id = -1, name    = "(no firstname)" },
            new { id = -2, surname = "(no surname)" }
        )

Printing:

{ name = John, surname = Doe }
{ name = Sue, surname = (no surname) }
{ name = (no firstname), surname = Smith }

Explanation of terms used:

Joining is a term borrowed from relational database design:

  • A join will repeat elements from a as many times as there are elements in b with corresponding key (i.e.: nothing if b were empty). Database lingo calls this inner (equi)join.
  • An outer join includes elements from a for which no corresponding element exists in b. (i.e.: even results if b were empty). This is usually referred to as left join.
  • A full outer join includes records from a as well as b if no corresponding element exists in the other. (i.e. even results if a were empty)

Something not usually seen in RDBMS is a group join[1]:

  • A group join, does the same as described above, but instead of repeating elements from a for multiple corresponding b, it groups the records with corresponding keys. This is often more convenient when you wish to enumerate through 'joined' records, based on a common key.

See also GroupJoin which contains some general background explanations as well.


[1] (I believe Oracle and MSSQL have proprietary extensions for this)

Full code

A generalized 'drop-in' Extension class for this

internal static class MyExtensions
{
    internal static IEnumerable<TResult> FullOuterGroupJoin<TA, TB, TKey, TResult>(
        this IEnumerable<TA> a,
        IEnumerable<TB> b,
        Func<TA, TKey> selectKeyA, 
        Func<TB, TKey> selectKeyB,
        Func<IEnumerable<TA>, IEnumerable<TB>, TKey, TResult> projection,
        IEqualityComparer<TKey> cmp = null)
    {
        cmp = cmp?? EqualityComparer<TKey>.Default;
        var alookup = a.ToLookup(selectKeyA, cmp);
        var blookup = b.ToLookup(selectKeyB, cmp);

        var keys = new HashSet<TKey>(alookup.Select(p => p.Key), cmp);
        keys.UnionWith(blookup.Select(p => p.Key));

        var join = from key in keys
                   let xa = alookup[key]
                   let xb = blookup[key]
                   select projection(xa, xb, key);

        return join;
    }

    internal static IEnumerable<TResult> FullOuterJoin<TA, TB, TKey, TResult>(
        this IEnumerable<TA> a,
        IEnumerable<TB> b,
        Func<TA, TKey> selectKeyA, 
        Func<TB, TKey> selectKeyB,
        Func<TA, TB, TKey, TResult> projection,
        TA defaultA = default(TA), 
        TB defaultB = default(TB),
        IEqualityComparer<TKey> cmp = null)
    {
        cmp = cmp?? EqualityComparer<TKey>.Default;
        var alookup = a.ToLookup(selectKeyA, cmp);
        var blookup = b.ToLookup(selectKeyB, cmp);

        var keys = new HashSet<TKey>(alookup.Select(p => p.Key), cmp);
        keys.UnionWith(blookup.Select(p => p.Key));

        var join = from key in keys
                   from xa in alookup[key].DefaultIfEmpty(defaultA)
                   from xb in blookup[key].DefaultIfEmpty(defaultB)
                   select projection(xa, xb, key);

        return join;
    }
}

Yet another full outer join

As was not that happy with the simplicity and the readability of the other propositions, I ended up with this :

It does not have the pretension to be fast ( about 800 ms to join 1000 * 1000 on a 2020m CPU : 2.4ghz / 2cores). To me, it is just a compact and casual full outer join.

It works the same as a SQL FULL OUTER JOIN (duplicates conservation)

Cheers ;-)

using System;
using System.Collections.Generic;
using System.Linq;
namespace NS
{
public static class DataReunion
{
    public static List<Tuple<T1, T2>> FullJoin<T1, T2, TKey>(List<T1> List1, Func<T1, TKey> KeyFunc1, List<T2> List2, Func<T2, TKey> KeyFunc2)
    {
        List<Tuple<T1, T2>> result = new List<Tuple<T1, T2>>();

        Tuple<TKey, T1>[] identifiedList1 = List1.Select(_ => Tuple.Create(KeyFunc1(_), _)).OrderBy(_ => _.Item1).ToArray();
        Tuple<TKey, T2>[] identifiedList2 = List2.Select(_ => Tuple.Create(KeyFunc2(_), _)).OrderBy(_ => _.Item1).ToArray();

        identifiedList1.Where(_ => !identifiedList2.Select(__ => __.Item1).Contains(_.Item1)).ToList().ForEach(_ => {
            result.Add(Tuple.Create<T1, T2>(_.Item2, default(T2)));
        });

        result.AddRange(
            identifiedList1.Join(identifiedList2, left => left.Item1, right => right.Item1, (left, right) => Tuple.Create<T1, T2>(left.Item2, right.Item2)).ToList()
        );

        identifiedList2.Where(_ => !identifiedList1.Select(__ => __.Item1).Contains(_.Item1)).ToList().ForEach(_ => {
            result.Add(Tuple.Create<T1, T2>(default(T1), _.Item2));
        });

        return result;
    }
}
}

The idea is to

  1. Build Ids based on provided key function builders
  2. Process left only items
  3. Process inner join
  4. Process right only items

Here is a succinct test that goes with it :

Place a break point at the end to manually verify that it behaves as expected

using System;
using System.Collections.Generic;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
using NS;

namespace Tests
{
[TestClass]
public class DataReunionTest
{
    [TestMethod]
    public void Test()
    {
        List<Tuple<Int32, Int32, String>> A = new List<Tuple<Int32, Int32, String>>();
        List<Tuple<Int32, Int32, String>> B = new List<Tuple<Int32, Int32, String>>();

        Random rnd = new Random();

        /* Comment the testing block you do not want to run
        /* Solution to test a wide range of keys*/

        for (int i = 0; i < 500; i += 1)
        {
            A.Add(Tuple.Create(rnd.Next(1, 101), rnd.Next(1, 101), "A"));
            B.Add(Tuple.Create(rnd.Next(1, 101), rnd.Next(1, 101), "B"));
        }

        /* Solution for essential testing*/

        A.Add(Tuple.Create(1, 2, "B11"));
        A.Add(Tuple.Create(1, 2, "B12"));
        A.Add(Tuple.Create(1, 3, "C11"));
        A.Add(Tuple.Create(1, 3, "C12"));
        A.Add(Tuple.Create(1, 3, "C13"));
        A.Add(Tuple.Create(1, 4, "D1"));

        B.Add(Tuple.Create(1, 1, "A21"));
        B.Add(Tuple.Create(1, 1, "A22"));
        B.Add(Tuple.Create(1, 1, "A23"));
        B.Add(Tuple.Create(1, 2, "B21"));
        B.Add(Tuple.Create(1, 2, "B22"));
        B.Add(Tuple.Create(1, 2, "B23"));
        B.Add(Tuple.Create(1, 3, "C2"));
        B.Add(Tuple.Create(1, 5, "E2"));

        Func<Tuple<Int32, Int32, String>, Tuple<Int32, Int32>> key = (_) => Tuple.Create(_.Item1, _.Item2);

        var watch = System.Diagnostics.Stopwatch.StartNew();
        var res = DataReunion.FullJoin(A, key, B, key);
        watch.Stop();
        var elapsedMs = watch.ElapsedMilliseconds;
        String aser = JToken.FromObject(res).ToString(Formatting.Indented);
        Console.Write(elapsedMs);
    }
}

}


Thank You everybody for the interesting posts!

I modified the code because in my case I needed

  • a personalized join predicate
  • a personalized union distinct comparer

For the ones interested this is my modified code (in VB, sorry)

    Module MyExtensions
        <Extension()>
        Friend Function FullOuterJoin(Of TA, TB, TResult)(ByVal a As IEnumerable(Of TA), ByVal b As IEnumerable(Of TB), ByVal joinPredicate As Func(Of TA, TB, Boolean), ByVal projection As Func(Of TA, TB, TResult), ByVal comparer As IEqualityComparer(Of TResult)) As IEnumerable(Of TResult)
            Dim joinL =
                From xa In a
                From xb In b.Where(Function(x) joinPredicate(xa, x)).DefaultIfEmpty()
                Select projection(xa, xb)
            Dim joinR =
                From xb In b
                From xa In a.Where(Function(x) joinPredicate(x, xb)).DefaultIfEmpty()
                Select projection(xa, xb)
            Return joinL.Union(joinR, comparer)
        End Function
    End Module

    Dim fullOuterJoin = lefts.FullOuterJoin(
        rights,
        Function(left, right) left.Code = right.Code And (left.Amount [...] Or left.Description.Contains [...]),
        Function(left, right) New CompareResult(left, right),
        New MyEqualityComparer
    )

    Public Class MyEqualityComparer
        Implements IEqualityComparer(Of CompareResult)

        Private Function GetMsg(obj As CompareResult) As String
            Dim msg As String = ""
            msg &= obj.Code & "_"
            [...]
            Return msg
        End Function

        Public Overloads Function Equals(x As CompareResult, y As CompareResult) As Boolean Implements IEqualityComparer(Of CompareResult).Equals
            Return Me.GetMsg(x) = Me.GetMsg(y)
        End Function

        Public Overloads Function GetHashCode(obj As CompareResult) As Integer Implements IEqualityComparer(Of CompareResult).GetHashCode
            Return Me.GetMsg(obj).GetHashCode
        End Function
    End Class

I think that LINQ join clause isn't the correct solution to this problem, because of join clause purpose isn't to accumulate data in such way as required for this task solution. The code to merge created separate collections becomes too complicated, maybe it is OK for learning purposes, but not for real applications. One of the ways how to solve this problem is in the code below:

class Program
{
    static void Main(string[] args)
    {
        List<FirstName> firstNames = new List<FirstName>();
        firstNames.Add(new FirstName { ID = 1, Name = "John" });
        firstNames.Add(new FirstName { ID = 2, Name = "Sue" });

        List<LastName> lastNames = new List<LastName>();
        lastNames.Add(new LastName { ID = 1, Name = "Doe" });
        lastNames.Add(new LastName { ID = 3, Name = "Smith" });

        HashSet<int> ids = new HashSet<int>();
        foreach (var name in firstNames)
        {
            ids.Add(name.ID);
        }
        foreach (var name in lastNames)
        {
            ids.Add(name.ID);
        }
        List<FullName> fullNames = new List<FullName>();
        foreach (int id in ids)
        {
            FullName fullName = new FullName();
            fullName.ID = id;
            FirstName firstName = firstNames.Find(f => f.ID == id);
            fullName.FirstName = firstName != null ? firstName.Name : string.Empty;
            LastName lastName = lastNames.Find(l => l.ID == id);
            fullName.LastName = lastName != null ? lastName.Name : string.Empty;
            fullNames.Add(fullName);
        }
    }
}
public class FirstName
{
    public int ID;

    public string Name;
}

public class LastName
{
    public int ID;

    public string Name;
}
class FullName
{
    public int ID;

    public string FirstName;

    public string LastName;
}

If real collections are large for HashSet formation instead foreach loops can be used the code below:

List<int> firstIds = firstNames.Select(f => f.ID).ToList();
List<int> LastIds = lastNames.Select(l => l.ID).ToList();
HashSet<int> ids = new HashSet<int>(firstIds.Union(LastIds));//Only unique IDs will be included in HashSet

As you've found, Linq doesn't have an "outer join" construct. The closest you can get is a left outer join using the query you stated. To this, you can add any elements of the lastname list that aren't represented in the join:

outerJoin = outerJoin.Concat(lastNames.Select(l=>new
                            {
                                id = l.ID,
                                firstname = String.Empty,
                                surname = l.Name
                            }).Where(l=>!outerJoin.Any(o=>o.id == l.id)));

I really hate these linq expressions, this is why SQL exists:

select isnull(fn.id, ln.id) as id, fn.firstname, ln.lastname
   from firstnames fn
   full join lastnames ln on ln.id=fn.id

Create this as sql view in database and import it as entity.

Of course, (distinct) union of left and right joins will make it too, but it is stupid.


Examples related to c#

How can I convert this one line of ActionScript to C#? Microsoft Advertising SDK doesn't deliverer ads How to use a global array in C#? How to correctly write async method? C# - insert values from file into two arrays Uploading into folder in FTP? Are these methods thread safe? dotnet ef not found in .NET Core 3 HTTP Error 500.30 - ANCM In-Process Start Failure Best way to "push" into C# array

Examples related to .net

You must add a reference to assembly 'netstandard, Version=2.0.0.0 How to use Bootstrap 4 in ASP.NET Core No authenticationScheme was specified, and there was no DefaultChallengeScheme found with default authentification and custom authorization .net Core 2.0 - Package was restored using .NetFramework 4.6.1 instead of target framework .netCore 2.0. The package may not be fully compatible Update .NET web service to use TLS 1.2 EF Core add-migration Build Failed What is the difference between .NET Core and .NET Standard Class Library project types? Visual Studio 2017 - Could not load file or assembly 'System.Runtime, Version=4.1.0.0' or one of its dependencies Nuget connection attempt failed "Unable to load the service index for source" Token based authentication in Web API without any user interface

Examples related to linq

Async await in linq select How to resolve Value cannot be null. Parameter name: source in linq? What does Include() do in LINQ? Selecting multiple columns with linq query and lambda expression System.Collections.Generic.List does not contain a definition for 'Select' lambda expression join multiple tables with select and where clause LINQ select one field from list of DTO objects to array The model backing the 'ApplicationDbContext' context has changed since the database was created Check if two lists are equal Why is this error, 'Sequence contains no elements', happening?

Examples related to outer-join

How to return rows from left table not found in right table? Rewrite left outer join involving multiple tables from Informix to Oracle What's the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN and FULL JOIN? LINQ - Full Outer Join How to do a FULL OUTER JOIN in MySQL? Oracle "(+)" Operator Top 1 with a left join SQL SELECT from multiple tables LINQ to SQL - Left Outer Join with multiple join conditions LEFT JOIN vs. LEFT OUTER JOIN in SQL Server

Examples related to full-outer-join

LINQ - Full Outer Join How to do a FULL OUTER JOIN in MySQL? SQL Server: What is the difference between CROSS JOIN and FULL OUTER JOIN?