[c#] LINQ Aggregate algorithm explained

This might sound lame, but I have not been able to find a really good explanation of Aggregate.

Good means short, descriptive, comprehensive with a small and clear example.

This question is related to c# .net linq aggregate

The answer is


Aggregate is basically used to Group or Sum up data.

According to MSDN "Aggregate Function Applies an accumulator function over a sequence."

Example 1: Add all the numbers in a array.

int[] numbers = new int[] { 1,2,3,4,5 };
int aggregatedValue = numbers.Aggregate((total, nextValue) => total + nextValue);

*important: The initial aggregate value by default is the 1 element in the sequence of collection. i.e: the total variable initial value will be 1 by default.

variable explanation

total: it will hold the sum up value(aggregated value) returned by the func.

nextValue: it is the next value in the array sequence. This value is than added to the aggregated value i.e total.

Example 2: Add all items in an array. Also set the initial accumulator value to start adding with from 10.

int[] numbers = new int[] { 1,2,3,4,5 };
int aggregatedValue = numbers.Aggregate(10, (total, nextValue) => total + nextValue);

arguments explanation:

the first argument is the initial(starting value i.e seed value) which will be used to start addition with the next value in the array.

the second argument is a func which is a func that takes 2 int.

1.total: this will hold same as before the sum up value(aggregated value) returned by the func after the calculation.

2.nextValue: : it is the next value in the array sequence. This value is than added to the aggregated value i.e total.

Also debugging this code will give you a better understanding of how aggregate work.


Learned a lot from Jamiec's answer.

If the only need is to generate CSV string, you may try this.

var csv3 = string.Join(",",chars);

Here is a test with 1 million strings

0.28 seconds = Aggregate w/ String Builder 
0.30 seconds = String.Join 

Source code is here


It partly depends on which overload you're talking about, but the basic idea is:

  • Start with a seed as the "current value"
  • Iterate over the sequence. For each value in the sequence:
    • Apply a user-specified function to transform (currentValue, sequenceValue) into (nextValue)
    • Set currentValue = nextValue
  • Return the final currentValue

You may find the Aggregate post in my Edulinq series useful - it includes a more detailed description (including the various overloads) and implementations.

One simple example is using Aggregate as an alternative to Count:

// 0 is the seed, and for each item, we effectively increment the current value.
// In this case we can ignore "item" itself.
int count = sequence.Aggregate(0, (current, item) => current + 1);

Or perhaps summing all the lengths of strings in a sequence of strings:

int total = sequence.Aggregate(0, (current, item) => current + item.Length);

Personally I rarely find Aggregate useful - the "tailored" aggregation methods are usually good enough for me.


Aggregate used to sum columns in a multi dimensional integer array

        int[][] nonMagicSquare =
        {
            new int[] {  3,  1,  7,  8 },
            new int[] {  2,  4, 16,  5 },
            new int[] { 11,  6, 12, 15 },
            new int[] {  9, 13, 10, 14 }
        };

        IEnumerable<int> rowSums = nonMagicSquare
            .Select(row => row.Sum());
        IEnumerable<int> colSums = nonMagicSquare
            .Aggregate(
                (priorSums, currentRow) =>
                    priorSums.Select((priorSum, index) => priorSum + currentRow[index]).ToArray()
                );

Select with index is used within the Aggregate func to sum the matching columns and return a new Array; { 3 + 2 = 5, 1 + 4 = 5, 7 + 16 = 23, 8 + 5 = 13 }.

        Console.WriteLine("rowSums: " + string.Join(", ", rowSums)); // rowSums: 19, 27, 44, 46
        Console.WriteLine("colSums: " + string.Join(", ", colSums)); // colSums: 25, 24, 45, 42

But counting the number of trues in a Boolean array is more difficult since the accumulated type (int) differs from the source type (bool); here a seed is necessary in order to use the second overload.

        bool[][] booleanTable =
        {
            new bool[] { true, true, true, false },
            new bool[] { false, false, false, true },
            new bool[] { true, false, false, true },
            new bool[] { true, true, false, false }
        };

        IEnumerable<int> rowCounts = booleanTable
            .Select(row => row.Select(value => value ? 1 : 0).Sum());
        IEnumerable<int> seed = new int[booleanTable.First().Length];
        IEnumerable<int> colCounts = booleanTable
            .Aggregate(seed,
                (priorSums, currentRow) =>
                    priorSums.Select((priorSum, index) => priorSum + (currentRow[index] ? 1 : 0)).ToArray()
                );

        Console.WriteLine("rowCounts: " + string.Join(", ", rowCounts)); // rowCounts: 3, 1, 2, 2
        Console.WriteLine("colCounts: " + string.Join(", ", colCounts)); // colCounts: 3, 2, 1, 2

Everyone has given his explanation. My explanation is like that.

Aggregate method applies a function to each item of a collection. For example, let's have collection { 6, 2, 8, 3 } and the function Add (operator +) it does (((6+2)+8)+3) and returns 19

var numbers = new List<int> { 6, 2, 8, 3 };
int sum = numbers.Aggregate(func: (result, item) => result + item);
// sum: (((6+2)+8)+3) = 19

In this example there is passed named method Add instead of lambda expression.

var numbers = new List<int> { 6, 2, 8, 3 };
int sum = numbers.Aggregate(func: Add);
// sum: (((6+2)+8)+3) = 19

private static int Add(int x, int y) { return x + y; }

In addition to all the great answers here already, I've also used it to walk an item through a series of transformation steps.

If a transformation is implemented as a Func<T,T>, you can add several transformations to a List<Func<T,T>> and use Aggregate to walk an instance of T through each step.

A more concrete example

You want to take a string value, and walk it through a series of text transformations that could be built programatically.

var transformationPipeLine = new List<Func<string, string>>();
transformationPipeLine.Add((input) => input.Trim());
transformationPipeLine.Add((input) => input.Substring(1));
transformationPipeLine.Add((input) => input.Substring(0, input.Length - 1));
transformationPipeLine.Add((input) => input.ToUpper());

var text = "    cat   ";
var output = transformationPipeLine.Aggregate(text, (input, transform)=> transform(input));
Console.WriteLine(output);

This will create a chain of transformations: Remove leading and trailing spaces -> remove first character -> remove last character -> convert to upper-case. Steps in this chain can be added, removed, or reordered as needed, to create whatever kind of transformation pipeline is required.

The end result of this specific pipeline, is that " cat " becomes "A".


This can become very powerful once you realize that T can be anything. This could be used for image transformations, like filters, using BitMap as an example;


A picture is worth a thousand words

Reminder:
Func<X, Y, R> is a function with two inputs of type X and Y, that returns a result of type R.

Enumerable.Aggregate has three overloads:


Overload 1:

A Aggregate<A>(IEnumerable<A> a, Func<A, A, A> f)

Aggregate1

Example:

new[]{1,2,3,4}.Aggregate((x, y) => x + y);  // 10


This overload is simple, but it has the following limitations:

  • the sequence must contain at least one element,
    otherwise the function will throw an InvalidOperationException.
  • elements and result must be of the same type.



Overload 2:

B Aggregate<A, B>(IEnumerable<A> a, B bIn, Func<B, A, B> f)

Aggregate2

Example:

var hayStack = new[] {"straw", "needle", "straw", "straw", "needle"};
var nNeedles = hayStack.Aggregate(0, (n, e) => e == "needle" ? n+1 : n);  // 2


This overload is more general:

  • a seed value must be provided (bIn).
  • the collection can be empty,
    in this case, the function will yield the seed value as result.
  • elements and result can have different types.



Overload 3:

C Aggregate<A,B,C>(IEnumerable<A> a, B bIn, Func<B,A,B> f, Func<B,C> f2)


The third overload is not very useful IMO.
The same can be written more succinctly by using overload 2 followed by a function that transforms its result.


The illustrations are adapted from this excellent blogpost.


A short and essential definition might be this: Linq Aggregate extension method allows to declare a sort of recursive function applied on the elements of a list, the operands of whom are two: the elements in the order in which they are present into the list, one element at a time, and the result of the previous recursive iteration or nothing if not yet recursion.

In this way you can compute the factorial of numbers, or concatenate strings.


This is an explanation about using Aggregate on a Fluent API such as Linq Sorting.

var list = new List<Student>();
var sorted = list
    .OrderBy(s => s.LastName)
    .ThenBy(s => s.FirstName)
    .ThenBy(s => s.Age)
    .ThenBy(s => s.Grading)
    .ThenBy(s => s.TotalCourses);

and lets see we want to implement a sort function that take a set of fields, this is very easy using Aggregate instead of a for-loop, like this:

public static IOrderedEnumerable<Student> MySort(
    this List<Student> list,
    params Func<Student, object>[] fields)
{
    var firstField = fields.First();
    var otherFields = fields.Skip(1);

    var init = list.OrderBy(firstField);
    return otherFields.Skip(1).Aggregate(init, (resultList, current) => resultList.ThenBy(current));
}

And we can use it like this:

var sorted = list.MySort(
    s => s.LastName,
    s => s.FirstName,
    s => s.Age,
    s => s.Grading,
    s => s.TotalCourses);

Super short Aggregate works like fold in Haskell/ML/F#.

Slightly longer .Max(), .Min(), .Sum(), .Average() all iterates over the elements in a sequence and aggregates them using the respective aggregate function. .Aggregate () is generalized aggregator in that it allows the developer to specify the start state (aka seed) and the aggregate function.

I know you asked for a short explaination but I figured as others gave a couple of short answers I figured you would perhaps be interested in a slightly longer one

Long version with code One way to illustrate what does it could be show how you implement Sample Standard Deviation once using foreach and once using .Aggregate. Note: I haven't prioritized performance here so I iterate several times over the colleciton unnecessarily

First a helper function used to create a sum of quadratic distances:

static double SumOfQuadraticDistance (double average, int value, double state)
{
    var diff = (value - average);
    return state + diff * diff;
}

Then Sample Standard Deviation using ForEach:

static double SampleStandardDeviation_ForEach (
    this IEnumerable<int> ints)
{
    var length = ints.Count ();
    if (length < 2)
    {
        return 0.0;
    }

    const double seed = 0.0;
    var average = ints.Average ();

    var state = seed;
    foreach (var value in ints)
    {
        state = SumOfQuadraticDistance (average, value, state);
    }
    var sumOfQuadraticDistance = state;

    return Math.Sqrt (sumOfQuadraticDistance / (length - 1));
}

Then once using .Aggregate:

static double SampleStandardDeviation_Aggregate (
    this IEnumerable<int> ints)
{
    var length = ints.Count ();
    if (length < 2)
    {
        return 0.0;
    }

    const double seed = 0.0;
    var average = ints.Average ();

    var sumOfQuadraticDistance = ints
        .Aggregate (
            seed,
            (state, value) => SumOfQuadraticDistance (average, value, state)
            );

    return Math.Sqrt (sumOfQuadraticDistance / (length - 1));
}

Note that these functions are identical except for how sumOfQuadraticDistance is calculated:

var state = seed;
foreach (var value in ints)
{
    state = SumOfQuadraticDistance (average, value, state);
}
var sumOfQuadraticDistance = state;

Versus:

var sumOfQuadraticDistance = ints
    .Aggregate (
        seed,
        (state, value) => SumOfQuadraticDistance (average, value, state)
        );

So what .Aggregate does is that it encapsulates this aggregator pattern and I expect that the implementation of .Aggregate would look something like this:

public static TAggregate Aggregate<TAggregate, TValue> (
    this IEnumerable<TValue> values,
    TAggregate seed,
    Func<TAggregate, TValue, TAggregate> aggregator
    )
{
    var state = seed;

    foreach (var value in values)
    {
        state = aggregator (state, value);
    }

    return state;
}

Using the Standard deviation functions would look something like this:

var ints = new[] {3, 1, 4, 1, 5, 9, 2, 6, 5, 4};
var average = ints.Average ();
var sampleStandardDeviation = ints.SampleStandardDeviation_Aggregate ();
var sampleStandardDeviation2 = ints.SampleStandardDeviation_ForEach ();

Console.WriteLine (average);
Console.WriteLine (sampleStandardDeviation);
Console.WriteLine (sampleStandardDeviation2);

IMHO

So does .Aggregate help readability? In general I love LINQ because I think .Where, .Select, .OrderBy and so on greatly helps readability (if you avoid inlined hierarhical .Selects). Aggregate has to be in Linq for completeness reasons but personally I am not so convinced that .Aggregate adds readability compared to a well written foreach.


Definition

Aggregate method is an extension method for generic collections. Aggregate method applies a function to each item of a collection. Not just only applies a function, but takes its result as initial value for the next iteration. So, as a result, we will get a computed value (min, max, avg, or other statistical value) from a collection.

Therefore, Aggregate method is a form of safe implementation of a recursive function.

Safe, because the recursion will iterate over each item of a collection and we can’t get any infinite loop suspension by wrong exit condition. Recursive, because the current function’s result is used as a parameter for the next function call.

Syntax:

collection.Aggregate(seed, func, resultSelector);
  • seed - initial value by default;
  • func - our recursive function. It can be a lambda-expression, a Func delegate or a function type T F(T result, T nextValue);
  • resultSelector - it can be a function like func or an expression to compute, transform, change, convert the final result.

How it works:

var nums = new[]{1, 2};
var result = nums.Aggregate(1, (result, n) => result + n); //result = (1 + 1) + 2 = 4
var result2 = nums.Aggregate(0, (result, n) => result + n, response => (decimal)response/2.0); //result2 = ((0 + 1) + 2)*1.0/2.0 = 3*1.0/2.0 = 3.0/2.0 = 1.5

Practical usage:

  1. Find Factorial from a number n:

int n = 7;
var numbers = Enumerable.Range(1, n);
var factorial = numbers.Aggregate((result, x) => result * x);

which is doing the same thing as this function:

public static int Factorial(int n)
{
   if (n < 1) return 1;

   return n * Factorial(n - 1);
}
  1. Aggregate() is one of the most powerful LINQ extension method, like Select() and Where(). We can use it to replace the Sum(), Min(). Max(), Avg() functionality, or to change it by implementing addition context:
    var numbers = new[]{3, 2, 6, 4, 9, 5, 7};
    var avg = numbers.Aggregate(0.0, (result, x) => result + x, response => (double)response/(double)numbers.Count());
    var min = numbers.Aggregate((result, x) => (result < x)? result: x);
  1. More complex usage of extension methods:
    var path = @“c:\path-to-folder”;

    string[] txtFiles = Directory.GetFiles(path).Where(f => f.EndsWith(“.txt”)).ToArray<string>();
    var output = txtFiles.Select(f => File.ReadAllText(f, Encoding.Default)).Aggregate<string>((result, content) => result + content);

    File.WriteAllText(path + “summary.txt”, output, Encoding.Default);

    Console.WriteLine(“Text files merged into: {0}”, output); //or other log info

Examples related to c#

How can I convert this one line of ActionScript to C#? Microsoft Advertising SDK doesn't deliverer ads How to use a global array in C#? How to correctly write async method? C# - insert values from file into two arrays Uploading into folder in FTP? Are these methods thread safe? dotnet ef not found in .NET Core 3 HTTP Error 500.30 - ANCM In-Process Start Failure Best way to "push" into C# array

Examples related to .net

You must add a reference to assembly 'netstandard, Version=2.0.0.0 How to use Bootstrap 4 in ASP.NET Core No authenticationScheme was specified, and there was no DefaultChallengeScheme found with default authentification and custom authorization .net Core 2.0 - Package was restored using .NetFramework 4.6.1 instead of target framework .netCore 2.0. The package may not be fully compatible Update .NET web service to use TLS 1.2 EF Core add-migration Build Failed What is the difference between .NET Core and .NET Standard Class Library project types? Visual Studio 2017 - Could not load file or assembly 'System.Runtime, Version=4.1.0.0' or one of its dependencies Nuget connection attempt failed "Unable to load the service index for source" Token based authentication in Web API without any user interface

Examples related to linq

Async await in linq select How to resolve Value cannot be null. Parameter name: source in linq? What does Include() do in LINQ? Selecting multiple columns with linq query and lambda expression System.Collections.Generic.List does not contain a definition for 'Select' lambda expression join multiple tables with select and where clause LINQ select one field from list of DTO objects to array The model backing the 'ApplicationDbContext' context has changed since the database was created Check if two lists are equal Why is this error, 'Sequence contains no elements', happening?

Examples related to aggregate

Pandas group-by and sum SELECT list is not in GROUP BY clause and contains nonaggregated column Aggregate multiple columns at once Pandas sum by groupby, but exclude certain columns Extract the maximum value within each group in a dataframe How to group dataframe rows into list in pandas groupby Mean per group in a data.frame Summarizing multiple columns with dplyr? data.frame Group By column Compute mean and standard deviation by group for multiple variables in a data.frame