Whilst the most upvoted answer is correct but it lacks usage of multi-core processing. In my case, having 12 cores I use PLink:
Parallel.ForEach(
File.ReadLines(filename), //returns IEumberable<string>: lazy-loading
new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
(line, state, index) =>
{
//process line value
}
);
Worth mentioning, I got that as an interview question asking return Top 10 most occurrences:
var result = new ConcurrentDictionary<string, int>(StringComparer.InvariantCultureIgnoreCase);
Parallel.ForEach(
File.ReadLines(filename),
new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
(line, state, index) =>
{
result.AddOrUpdate(line, 1, (key, val) => val + 1);
}
);
return result
.OrderByDescending(x => x.Value)
.Take(10)
.Select(x => x.Value);
Benchmarking:
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-8700K CPU 3.70GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
[Host] : .NET Framework 4.8 (4.8.4250.0), X64 RyuJIT
DefaultJob : .NET Framework 4.8 (4.8.4250.0), X64 RyuJIT
Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|
GetTopWordsSync | 33.03 s | 0.175 s | 0.155 s | 1194000 | 314000 | 7000 | 7.06 GB |
GetTopWordsParallel | 10.89 s | 0.121 s | 0.113 s | 1225000 | 354000 | 8000 | 7.18 GB |
And as you can see it's 75% performance improvement.