[c#] Proper use of 'yield return'

The yield keyword is one of those keywords in C# that continues to mystify me, and I've never been confident that I'm using it correctly.

Of the following two pieces of code, which is the preferred and why?

Version 1: Using yield return

public static IEnumerable<Product> GetAllProducts()
{
    using (AdventureWorksEntities db = new AdventureWorksEntities())
    {
        var products = from product in db.Product
                       select product;

        foreach (Product product in products)
        {
            yield return product;
        }
    }
}

Version 2: Return the list

public static IEnumerable<Product> GetAllProducts()
{
    using (AdventureWorksEntities db = new AdventureWorksEntities())
    {
        var products = from product in db.Product
                       select product;

        return products.ToList<Product>();
    }
}

This question is related to c# yield-return

The answer is


The usage of yield is similar to the keyword return, except that it will return a generator. And the generator object will only traverse once.

yield has two benefits:

  1. You do not need to read these values twice;
  2. You can get many child nodes but do not have to put them all in memory.

There is another clear explanation maybe help you.


Yield return can be very powerful for algorithms where you need to iterate through millions of objects. Consider the following example where you need to calculate possible trips for ride sharing. First we generate possible trips:

    static IEnumerable<Trip> CreatePossibleTrips()
    {
        for (int i = 0; i < 1000000; i++)
        {
            yield return new Trip
            {
                Id = i.ToString(),
                Driver = new Driver { Id = i.ToString() }
            };
        }
    }

Then iterate through each trip:

    static void Main(string[] args)
    {
        foreach (var trip in CreatePossibleTrips())
        {
            // possible trip is actually calculated only at this point, because of yield
            if (IsTripGood(trip))
            {
                // match good trip
            }
        }
    }

If you use List instead of yield, you will need to allocation 1 million objects to memory (~190mb) and this simple example will take ~1400ms to run. However, if you use yield, you don't need to put all these temp objects to memory and you will get significantly faster algorithm speed: this example will take only ~400ms to run with no memory consumption at all.


Assuming your products LINQ class uses a similar yield for enumerating/iterating, the first version is more efficient because its only yielding one value each time its iterated over.

The second example is converting the enumerator/iterator to a list with the ToList() method. This means it manually iterates over all the items in the enumerator and then returns a flat list.


Populating a temporary list is like downloading the whole video, whereas using yield is like streaming that video.


I would have used version 2 of the code in this case. Since you have the full-list of products available and that's what expected by the "consumer" of this method call, it would be required to send the complete information back to the caller.

If caller of this method requires "one" information at a time and the consumption of the next information is on-demand basis, then it would be beneficial to use yield return which will make sure the command of execution will be returned to the caller when a unit of information is available.

Some examples where one could use yield return is:

  1. Complex, step-by-step calculation where caller is waiting for data of a step at a time
  2. Paging in GUI - where user might never reach to the last page and only sub-set of information is required to be disclosed on current page

To answer your questions, I would have used the version 2.


This is kinda besides the point, but since the question is tagged best-practices I'll go ahead and throw in my two cents. For this type of thing I greatly prefer to make it into a property:

public static IEnumerable<Product> AllProducts
{
    get {
        using (AdventureWorksEntities db = new AdventureWorksEntities()) {
            var products = from product in db.Product
                           select product;

            return products;
        }
    }
}

Sure, it's a little more boiler-plate, but the code that uses this will look much cleaner:

prices = Whatever.AllProducts.Select (product => product.price);

vs

prices = Whatever.GetAllProducts().Select (product => product.price);

Note: I wouldn't do this for any methods that may take a while to do their work.


Assuming your products LINQ class uses a similar yield for enumerating/iterating, the first version is more efficient because its only yielding one value each time its iterated over.

The second example is converting the enumerator/iterator to a list with the ToList() method. This means it manually iterates over all the items in the enumerator and then returns a flat list.


Yield has two great uses

It helps to provide custom iteration with out creating temp collections. ( loading all data and looping)

It helps to do stateful iteration. ( streaming)

Below is a simple video which i have created with full demonstration in order to support the above two points

http://www.youtube.com/watch?v=4fju3xcm21M


And what about this?

public static IEnumerable<Product> GetAllProducts()
{
    using (AdventureWorksEntities db = new AdventureWorksEntities())
    {
        var products = from product in db.Product
                       select product;

        return products.ToList();
    }
}

I guess this is much cleaner. I do not have VS2008 at hand to check, though. In any case, if Products implements IEnumerable (as it seems to - it is used in a foreach statement), I'd return it directly.


Assuming your products LINQ class uses a similar yield for enumerating/iterating, the first version is more efficient because its only yielding one value each time its iterated over.

The second example is converting the enumerator/iterator to a list with the ToList() method. This means it manually iterates over all the items in the enumerator and then returns a flat list.


As a conceptual example for understanding when you ought to use yield, let's say the method ConsumeLoop() processes the items returned/yielded by ProduceList():

void ConsumeLoop() {
    foreach (Consumable item in ProduceList())        // might have to wait here
        item.Consume();
}

IEnumerable<Consumable> ProduceList() {
    while (KeepProducing())
        yield return ProduceExpensiveConsumable();    // expensive
}

Without yield, the call to ProduceList() might take a long time because you have to complete the list before returning:

//pseudo-assembly
Produce consumable[0]                   // expensive operation, e.g. disk I/O
Produce consumable[1]                   // waiting...
Produce consumable[2]                   // waiting...
Produce consumable[3]                   // completed the consumable list
Consume consumable[0]                   // start consuming
Consume consumable[1]
Consume consumable[2]
Consume consumable[3]

Using yield, it becomes rearranged, sort of interleaved:

//pseudo-assembly
Produce consumable[0]
Consume consumable[0]                   // immediately yield & Consume
Produce consumable[1]                   // ConsumeLoop iterates, requesting next item
Consume consumable[1]                   // consume next
Produce consumable[2]
Consume consumable[2]                   // consume next
Produce consumable[3]
Consume consumable[3]                   // consume next

And lastly, as many before have already suggested, you should use Version 2 because you already have the completed list anyway.


And what about this?

public static IEnumerable<Product> GetAllProducts()
{
    using (AdventureWorksEntities db = new AdventureWorksEntities())
    {
        var products = from product in db.Product
                       select product;

        return products.ToList();
    }
}

I guess this is much cleaner. I do not have VS2008 at hand to check, though. In any case, if Products implements IEnumerable (as it seems to - it is used in a foreach statement), I'd return it directly.


As a conceptual example for understanding when you ought to use yield, let's say the method ConsumeLoop() processes the items returned/yielded by ProduceList():

void ConsumeLoop() {
    foreach (Consumable item in ProduceList())        // might have to wait here
        item.Consume();
}

IEnumerable<Consumable> ProduceList() {
    while (KeepProducing())
        yield return ProduceExpensiveConsumable();    // expensive
}

Without yield, the call to ProduceList() might take a long time because you have to complete the list before returning:

//pseudo-assembly
Produce consumable[0]                   // expensive operation, e.g. disk I/O
Produce consumable[1]                   // waiting...
Produce consumable[2]                   // waiting...
Produce consumable[3]                   // completed the consumable list
Consume consumable[0]                   // start consuming
Consume consumable[1]
Consume consumable[2]
Consume consumable[3]

Using yield, it becomes rearranged, sort of interleaved:

//pseudo-assembly
Produce consumable[0]
Consume consumable[0]                   // immediately yield & Consume
Produce consumable[1]                   // ConsumeLoop iterates, requesting next item
Consume consumable[1]                   // consume next
Produce consumable[2]
Consume consumable[2]                   // consume next
Produce consumable[3]
Consume consumable[3]                   // consume next

And lastly, as many before have already suggested, you should use Version 2 because you already have the completed list anyway.


This is what Chris Sells tells about those statements in The C# Programming Language;

I sometimes forget that yield return is not the same as return , in that the code after a yield return can be executed. For example, the code after the first return here can never be executed:

int F() {
    return 1;
    return 2; // Can never be executed
}

In contrast, the code after the first yield return here can be executed:

IEnumerable<int> F() {
    yield return 1;
    yield return 2; // Can be executed
}

This often bites me in an if statement:

IEnumerable<int> F() {
    if(...) {
        yield return 1; // I mean this to be the only thing returned
    }
    yield return 2; // Oops!
}

In these cases, remembering that yield return is not “final” like return is helpful.


The two pieces of code are really doing two different things. The first version will pull members as you need them. The second version will load all the results into memory before you start to do anything with it.

There's no right or wrong answer to this one. Which one is preferable just depends on the situation. For example, if there's a limit of time that you have to complete your query and you need to do something semi-complicated with the results, the second version could be preferable. But beware large resultsets, especially if you're running this code in 32-bit mode. I've been bitten by OutOfMemory exceptions several times when doing this method.

The key thing to keep in mind is this though: the differences are in efficiency. Thus, you should probably go with whichever one makes your code simpler and change it only after profiling.


I know this is an old question, but I'd like to offer one example of how the yield keyword can be creatively used. I have really benefited from this technique. Hopefully this will be of assistance to anyone else who stumbles upon this question.

Note: Don't think about the yield keyword as merely being another way to build a collection. A big part of the power of yield comes in the fact that execution is paused in your method or property until the calling code iterates over the next value. Here's my example:

Using the yield keyword (alongside Rob Eisenburg's Caliburn.Micro coroutines implementation) allows me to express an asynchronous call to a web service like this:

public IEnumerable<IResult> HandleButtonClick() {
    yield return Show.Busy();

    var loginCall = new LoginResult(wsClient, Username, Password);
    yield return loginCall;
    this.IsLoggedIn = loginCall.Success;

    yield return Show.NotBusy();
}

What this will do is turn my BusyIndicator on, call the Login method on my web service, set my IsLoggedIn flag to the return value, and then turn the BusyIndicator back off.

Here's how this works: IResult has an Execute method and a Completed event. Caliburn.Micro grabs the IEnumerator from the call to HandleButtonClick() and passes it into a Coroutine.BeginExecute method. The BeginExecute method starts iterating through the IResults. When the first IResult is returned, execution is paused inside HandleButtonClick(), and BeginExecute() attaches an event handler to the Completed event and calls Execute(). IResult.Execute() can perform either a synchronous or an asynchronous task and fires the Completed event when it's done.

LoginResult looks something like this:

public LoginResult : IResult {
    // Constructor to set private members...

    public void Execute(ActionExecutionContext context) {
        wsClient.LoginCompleted += (sender, e) => {
            this.Success = e.Result;
            Completed(this, new ResultCompletionEventArgs());
        };
        wsClient.Login(username, password);
    }

    public event EventHandler<ResultCompletionEventArgs> Completed = delegate { };
    public bool Success { get; private set; }
}

It may help to set up something like this and step through the execution to watch what's going on.

Hope this helps someone out! I've really enjoyed exploring the different ways yield can be used.


This is kinda besides the point, but since the question is tagged best-practices I'll go ahead and throw in my two cents. For this type of thing I greatly prefer to make it into a property:

public static IEnumerable<Product> AllProducts
{
    get {
        using (AdventureWorksEntities db = new AdventureWorksEntities()) {
            var products = from product in db.Product
                           select product;

            return products;
        }
    }
}

Sure, it's a little more boiler-plate, but the code that uses this will look much cleaner:

prices = Whatever.AllProducts.Select (product => product.price);

vs

prices = Whatever.GetAllProducts().Select (product => product.price);

Note: I wouldn't do this for any methods that may take a while to do their work.


Return the list directly. Benefits:

  • It's more clear
  • The list is reusable. (the iterator is not) not actually true, Thanks Jon

You should use the iterator (yield) from when you think you probably won't have to iterate all the way to the end of the list, or when it has no end. For example, the client calling is going to be searching for the first product that satisfies some predicate, you might consider using the iterator, although that's a contrived example, and there are probably better ways to accomplish it. Basically, if you know in advance that the whole list will need to be calculated, just do it up front. If you think that it won't, then consider using the iterator version.


The two pieces of code are really doing two different things. The first version will pull members as you need them. The second version will load all the results into memory before you start to do anything with it.

There's no right or wrong answer to this one. Which one is preferable just depends on the situation. For example, if there's a limit of time that you have to complete your query and you need to do something semi-complicated with the results, the second version could be preferable. But beware large resultsets, especially if you're running this code in 32-bit mode. I've been bitten by OutOfMemory exceptions several times when doing this method.

The key thing to keep in mind is this though: the differences are in efficiency. Thus, you should probably go with whichever one makes your code simpler and change it only after profiling.


This is kinda besides the point, but since the question is tagged best-practices I'll go ahead and throw in my two cents. For this type of thing I greatly prefer to make it into a property:

public static IEnumerable<Product> AllProducts
{
    get {
        using (AdventureWorksEntities db = new AdventureWorksEntities()) {
            var products = from product in db.Product
                           select product;

            return products;
        }
    }
}

Sure, it's a little more boiler-plate, but the code that uses this will look much cleaner:

prices = Whatever.AllProducts.Select (product => product.price);

vs

prices = Whatever.GetAllProducts().Select (product => product.price);

Note: I wouldn't do this for any methods that may take a while to do their work.


And what about this?

public static IEnumerable<Product> GetAllProducts()
{
    using (AdventureWorksEntities db = new AdventureWorksEntities())
    {
        var products = from product in db.Product
                       select product;

        return products.ToList();
    }
}

I guess this is much cleaner. I do not have VS2008 at hand to check, though. In any case, if Products implements IEnumerable (as it seems to - it is used in a foreach statement), I'd return it directly.


And what about this?

public static IEnumerable<Product> GetAllProducts()
{
    using (AdventureWorksEntities db = new AdventureWorksEntities())
    {
        var products = from product in db.Product
                       select product;

        return products.ToList();
    }
}

I guess this is much cleaner. I do not have VS2008 at hand to check, though. In any case, if Products implements IEnumerable (as it seems to - it is used in a foreach statement), I'd return it directly.


Yield has two great uses

It helps to provide custom iteration with out creating temp collections. ( loading all data and looping)

It helps to do stateful iteration. ( streaming)

Below is a simple video which i have created with full demonstration in order to support the above two points

http://www.youtube.com/watch?v=4fju3xcm21M


Assuming your products LINQ class uses a similar yield for enumerating/iterating, the first version is more efficient because its only yielding one value each time its iterated over.

The second example is converting the enumerator/iterator to a list with the ToList() method. This means it manually iterates over all the items in the enumerator and then returns a flat list.


Return the list directly. Benefits:

  • It's more clear
  • The list is reusable. (the iterator is not) not actually true, Thanks Jon

You should use the iterator (yield) from when you think you probably won't have to iterate all the way to the end of the list, or when it has no end. For example, the client calling is going to be searching for the first product that satisfies some predicate, you might consider using the iterator, although that's a contrived example, and there are probably better ways to accomplish it. Basically, if you know in advance that the whole list will need to be calculated, just do it up front. If you think that it won't, then consider using the iterator version.


Yield return can be very powerful for algorithms where you need to iterate through millions of objects. Consider the following example where you need to calculate possible trips for ride sharing. First we generate possible trips:

    static IEnumerable<Trip> CreatePossibleTrips()
    {
        for (int i = 0; i < 1000000; i++)
        {
            yield return new Trip
            {
                Id = i.ToString(),
                Driver = new Driver { Id = i.ToString() }
            };
        }
    }

Then iterate through each trip:

    static void Main(string[] args)
    {
        foreach (var trip in CreatePossibleTrips())
        {
            // possible trip is actually calculated only at this point, because of yield
            if (IsTripGood(trip))
            {
                // match good trip
            }
        }
    }

If you use List instead of yield, you will need to allocation 1 million objects to memory (~190mb) and this simple example will take ~1400ms to run. However, if you use yield, you don't need to put all these temp objects to memory and you will get significantly faster algorithm speed: this example will take only ~400ms to run with no memory consumption at all.


I know this is an old question, but I'd like to offer one example of how the yield keyword can be creatively used. I have really benefited from this technique. Hopefully this will be of assistance to anyone else who stumbles upon this question.

Note: Don't think about the yield keyword as merely being another way to build a collection. A big part of the power of yield comes in the fact that execution is paused in your method or property until the calling code iterates over the next value. Here's my example:

Using the yield keyword (alongside Rob Eisenburg's Caliburn.Micro coroutines implementation) allows me to express an asynchronous call to a web service like this:

public IEnumerable<IResult> HandleButtonClick() {
    yield return Show.Busy();

    var loginCall = new LoginResult(wsClient, Username, Password);
    yield return loginCall;
    this.IsLoggedIn = loginCall.Success;

    yield return Show.NotBusy();
}

What this will do is turn my BusyIndicator on, call the Login method on my web service, set my IsLoggedIn flag to the return value, and then turn the BusyIndicator back off.

Here's how this works: IResult has an Execute method and a Completed event. Caliburn.Micro grabs the IEnumerator from the call to HandleButtonClick() and passes it into a Coroutine.BeginExecute method. The BeginExecute method starts iterating through the IResults. When the first IResult is returned, execution is paused inside HandleButtonClick(), and BeginExecute() attaches an event handler to the Completed event and calls Execute(). IResult.Execute() can perform either a synchronous or an asynchronous task and fires the Completed event when it's done.

LoginResult looks something like this:

public LoginResult : IResult {
    // Constructor to set private members...

    public void Execute(ActionExecutionContext context) {
        wsClient.LoginCompleted += (sender, e) => {
            this.Success = e.Result;
            Completed(this, new ResultCompletionEventArgs());
        };
        wsClient.Login(username, password);
    }

    public event EventHandler<ResultCompletionEventArgs> Completed = delegate { };
    public bool Success { get; private set; }
}

It may help to set up something like this and step through the execution to watch what's going on.

Hope this helps someone out! I've really enjoyed exploring the different ways yield can be used.


The usage of yield is similar to the keyword return, except that it will return a generator. And the generator object will only traverse once.

yield has two benefits:

  1. You do not need to read these values twice;
  2. You can get many child nodes but do not have to put them all in memory.

There is another clear explanation maybe help you.


The two pieces of code are really doing two different things. The first version will pull members as you need them. The second version will load all the results into memory before you start to do anything with it.

There's no right or wrong answer to this one. Which one is preferable just depends on the situation. For example, if there's a limit of time that you have to complete your query and you need to do something semi-complicated with the results, the second version could be preferable. But beware large resultsets, especially if you're running this code in 32-bit mode. I've been bitten by OutOfMemory exceptions several times when doing this method.

The key thing to keep in mind is this though: the differences are in efficiency. Thus, you should probably go with whichever one makes your code simpler and change it only after profiling.


I would have used version 2 of the code in this case. Since you have the full-list of products available and that's what expected by the "consumer" of this method call, it would be required to send the complete information back to the caller.

If caller of this method requires "one" information at a time and the consumption of the next information is on-demand basis, then it would be beneficial to use yield return which will make sure the command of execution will be returned to the caller when a unit of information is available.

Some examples where one could use yield return is:

  1. Complex, step-by-step calculation where caller is waiting for data of a step at a time
  2. Paging in GUI - where user might never reach to the last page and only sub-set of information is required to be disclosed on current page

To answer your questions, I would have used the version 2.


Return the list directly. Benefits:

  • It's more clear
  • The list is reusable. (the iterator is not) not actually true, Thanks Jon

You should use the iterator (yield) from when you think you probably won't have to iterate all the way to the end of the list, or when it has no end. For example, the client calling is going to be searching for the first product that satisfies some predicate, you might consider using the iterator, although that's a contrived example, and there are probably better ways to accomplish it. Basically, if you know in advance that the whole list will need to be calculated, just do it up front. If you think that it won't, then consider using the iterator version.


Populating a temporary list is like downloading the whole video, whereas using yield is like streaming that video.


The two pieces of code are really doing two different things. The first version will pull members as you need them. The second version will load all the results into memory before you start to do anything with it.

There's no right or wrong answer to this one. Which one is preferable just depends on the situation. For example, if there's a limit of time that you have to complete your query and you need to do something semi-complicated with the results, the second version could be preferable. But beware large resultsets, especially if you're running this code in 32-bit mode. I've been bitten by OutOfMemory exceptions several times when doing this method.

The key thing to keep in mind is this though: the differences are in efficiency. Thus, you should probably go with whichever one makes your code simpler and change it only after profiling.