Tuesday, July 27, 2010

Iterate, damn you!

by skeet via Jon Skeet: Coding Blog on 7/27/2010 6:52:23 PM

Do you know the hardest thing about presenting code with surprising results? It's hard to do so without effectively inviting readers to look for the trick. Not that that's always enough - I failed the last of Neal and Eric's C# puzzlers at NDC, for example. (If you haven't already watched the video, please do so now. It's far more entertaining than this blog post.) Anyway, this one may be obvious to some of you, but there are some interesting aspects even when you've got the twist, as it were.

What does the following code print?

using System;
using System.Collections.Generic;

public class WeirdIterators
{
    static void ShowNext(IEnumerator<int> iterator)
    {
        if (iterator.MoveNext())
        {
            Console.WriteLine(iterator.Current);
        }
        else
        {
            Console.WriteLine("Done");
        }
    }
    
    static void Main()
    {
        List<int> values = new List<int> { 1, 2 };
        using (var iterator = values.GetEnumerator())
        {
            ShowNext(iterator);
            ShowNext(iterator);
            ShowNext(iterator);
        }
    }
}

If you guessed "1, 2, Done" despite the title of the post and the hints that it was surprising, then you're at least brave and firm in your beliefs. I suspect most readers will correctly guess that it prints "1, 1, 1" - but I also suspect some of you won't have worked out why.

Let's look at the signature of List<T>.GetEnumerator(). We'd expect it to be

public IEnumerator<T> GetEnumerator()

right? That's what IEnumerable<T> says it'll look like. Well, no. List<T> uses explicit interface implementation for IEnumerable<T>. The signature actually looks like this:

public List<T>.Enumerator GetEnumerator()

Hmm... that's an unfamiliar type. Let's have another look in MSDN...

[SerializableAttribute]
public struct Enumerator : IEnumerator<T>, 
    IDisposable, IEnumerator

(It's nested in List<T> of course.) Now that's odd... it's a struct. You don't see many custom structs around, beyond the familiar ones in the System namespace. And hang on, don't iterators fundamentally have to be mutable?

Ah. "Mutable value type" - a phrase to strike terror into the hearts of right-headed .NET developers everywhere.

So what's going on? If we're losing all the changes to the value, why is it printing "1, 1, 1" instead of throwing an exception due to printing out Current without first moving?

Well, we're fetching the iterator into a variable of type List<int>.Enumerator, and then calling ShowNext() three times. On each call, the value is boxed (creating a copy), and the reference to the box is passed to ShowNext().

Within ShowNext(), the value within the box changes when we call MoveNext() - which is how it's able to get the real first element with Current. So that mutation isn't lost... until we return from the method. The box is now eligible for garbage collection, and no change has been made to the iterator variable's value. On the next call to ShowNext(), a new box is created and we see the first item again...
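
The same effect can be reproduced without List<T> in the picture. The following is a minimal sketch, assuming a hypothetical mutable struct Counter and a hypothetical interface ICounter (neither appears in the original code). Passing the struct where the interface is expected boxes a fresh copy on each call, so the caller's variable never changes; calling the method directly on the variable does mutate it.

```csharp
using System;

// Hypothetical interface and struct, invented for this illustration only.
public interface ICounter
{
    int Increment();
}

public struct Counter : ICounter
{
    private int count;

    // Mutates the struct in place and returns the new value.
    public int Increment()
    {
        count++;
        return count;
    }
}

public static class BoxingCopyDemo
{
    // The parameter is an interface, so a struct argument is boxed at the call site.
    public static int IncrementViaInterface(ICounter counter)
    {
        return counter.Increment();
    }

    public static string Run()
    {
        Counter counter = new Counter();

        // Each call boxes a fresh copy of 'counter'; the variable itself stays at 0.
        int first = IncrementViaInterface(counter);
        int second = IncrementViaInterface(counter);

        // Calling directly on the variable mutates it, with no boxing.
        int direct = counter.Increment();
        int directAgain = counter.Increment();

        return first + ", " + second + ", " + direct + ", " + directAgain;
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // 1, 1, 1, 2
    }
}
```

The first two values come from throwaway boxes, which is the situation the original ShowNext() calls are in. The third value is still 1 because the variable had never been touched until that point.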

How can we fix it?

There are various things we can do to fix the code - or at least, to make it display "1, 2, Done". We can then find other ways of breaking it again :)

Change the type of the values variable

How does the compiler work out the type of the iterator variable? Why, it looks at the return type of values.GetEnumerator(). And how does it find that? It looks at the type of the values variable, and then finds the GetEnumerator() method. In this case it finds List<int>.GetEnumerator(), so it makes the iterator variable type List<int>.Enumerator.

Suppose we just change values to be of type IList<int> (or IEnumerable<int>, or ICollection<int>):

IList<int> values = new List<int> { 1, 2 };

The compiler uses the interface implementation of GetEnumerator() on List<T>. Now that could return a different type entirely - but it actually returns a boxed List<T>.Enumerator. We can see that by just printing out iterator.GetType().

So if it's just returning the same value as before, why does it work?

Well, this time we're boxing once - the iterator gets boxed on its way out of the GetEnumerator() method, and the same box is used for all three calls to ShowNext(). No extra copies are created, and the changes within the box don't get lost.
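
As a rough check of the two claims above, here is a small sketch that declares values as IList<int>, inspects the runtime type of the iterator, and then advances it three times. The helper Next() is a hypothetical variant of ShowNext() that returns a string instead of printing. It assumes a non-empty list; some runtimes may hand back a different enumerator object for an empty one.

```csharp
using System;
using System.Collections.Generic;

public static class BoxOnceDemo
{
    // Hypothetical variant of ShowNext() that returns the text instead of printing it.
    static string Next(IEnumerator<int> iterator)
    {
        return iterator.MoveNext() ? iterator.Current.ToString() : "Done";
    }

    public static string Run()
    {
        IList<int> values = new List<int> { 1, 2 };

        // 'var' is inferred as IEnumerator<int> here, so the variable holds a reference to a single box.
        using (var iterator = values.GetEnumerator())
        {
            // The object inside the box is still the struct List<int>.Enumerator
            // (assumption: non-empty list, as in this example).
            bool isListEnumerator = iterator.GetType() == typeof(List<int>.Enumerator);

            string first = Next(iterator);
            string second = Next(iterator);
            string third = Next(iterator);
            return isListEnumerator + ": " + first + ", " + second + ", " + third;
        }
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // True: 1, 2, Done
    }
}
```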

Change the type of the iterator variable

This is exactly the same as the previous fix - except we don't need to change the type of values. We can just explicitly state the type of iterator:

using (IEnumerator<int> iterator = values.GetEnumerator())

The reason this works is the same as before - we box once, and the changes within the box don't get lost.
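
One way to see "box once" versus "box every time" directly is to compare references. This sketch (the class and variable names are made up for illustration) converts each kind of variable to object twice and checks whether both conversions yield the same object.

```csharp
using System;
using System.Collections.Generic;

public static class BoxIdentityDemo
{
    public static string Run()
    {
        List<int> values = new List<int> { 1, 2 };

        // Declared as the struct type: every conversion to object boxes a new copy.
        List<int>.Enumerator unboxed = values.GetEnumerator();
        object firstBox = unboxed;
        object secondBox = unboxed;
        bool structSharesBox = ReferenceEquals(firstBox, secondBox);

        // Declared as the interface: boxed once on assignment, after which only the reference is copied.
        IEnumerator<int> boxed = values.GetEnumerator();
        object firstRef = boxed;
        object secondRef = boxed;
        bool interfaceSharesBox = ReferenceEquals(firstRef, secondRef);

        boxed.Dispose();
        unboxed.Dispose();

        return structSharesBox + ", " + interfaceSharesBox;
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // False, True
    }
}
```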

Pass the iterator variable by reference

The initial problem was due to the mutations involved in ShowNext() getting lost due to repeated boxing. We've seen how to solve it by reducing the number of boxing operations down to one, but can we remove them entirely?

Well yes, we can. If we want changes to the value of the parameter in ShowNext() to be propagated back to the caller, we just need to pass the variable by reference. When passing by reference, the parameter and argument types have to match exactly, so we can't keep the iterator variable as type List<int>.Enumerator without changing the parameter type. Now we could explicitly change the type of the parameter to List<int>.Enumerator - but that would tie our implementation down rather a lot, wouldn't it? Let's use generics instead:

static void ShowNext<T>(ref T iterator)
    where T : IEnumerator<int>

Now we can pass iterator by reference and the compiler will infer the type. The interface members (MoveNext() and Current) will be called using constrained calls, so there's no boxing involved...

... except that when you try to just change the method calls to use ref, it doesn't work - because apparently you can't pass a "using variable" by reference. I'd never come across that rule before. Interesting. Fortunately, we can (roughly) expand out the using statement ourselves, like this:

var iterator = values.GetEnumerator();
try
{
    ShowNext(ref iterator);
    ShowNext(ref iterator);
    ShowNext(ref iterator);
}
finally
{
    iterator.Dispose();
}

Again, this fixes the problem - and this time there's no boxing involved.
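
Putting the generic signature and the expanded using statement together gives something like the following complete program. It is a sketch rather than the post's exact code: Next() is a hypothetical variant of ShowNext() that returns a string, so the result is easy to inspect.

```csharp
using System;
using System.Collections.Generic;

public static class RefIteratorDemo
{
    // Generic over the iterator type, so a struct iterator is passed by reference without boxing.
    static string Next<T>(ref T iterator) where T : IEnumerator<int>
    {
        return iterator.MoveNext() ? iterator.Current.ToString() : "Done";
    }

    public static string Run()
    {
        List<int> values = new List<int> { 1, 2 };

        // Inferred as List<int>.Enumerator; declared outside a using statement
        // so that it can be passed by reference.
        var iterator = values.GetEnumerator();
        try
        {
            string first = Next(ref iterator);
            string second = Next(ref iterator);
            string third = Next(ref iterator);
            return first + ", " + second + ", " + third;
        }
        finally
        {
            iterator.Dispose();
        }
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // 1, 2, Done
    }
}
```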

Let's quickly look at one more example of it not working, before I finish...

Dynamic typing to the anti-rescue

What happens if we change the type of iterator to dynamic (and set everything else back the way it was)? I'll readily admit, I really didn't know what was going to happen here. There are two competing forces:

  • The dynamic type is often really just object behind the scenes... so it will be boxed once, right? That means the changes within the box won't get lost. (This would give "1, 2, Done")
  • The dynamic type is in many ways meant to act as if you'd declared a variable of the type which it actually turns out to be at execution time - so in this case it should work as if the variable was of type List<int>.Enumerator, just like our original code. (This would give "1, 1, 1")

What actually happens? I believe it boxes the value returned from GetEnumerator() - and then the C# binder/DLR makes sure that the value type behaviour is preserved by copying the box before passing it to ShowNext(). In other words, both bits of intuition are right, but the second effectively overrules the first. Wow. (See the comment below from Chris Burrows for more information about this. I'm sure he's right that it's the only design that makes sense. This is a pretty pathological example in various ways.)

Conclusion

Just say "no" to mutable value types. They do weird things.

(Fortunately the vast majority of the time this particular one won't be a problem - it's rare to use iterators explicitly in the first place, and when you do you very rarely pass them to another method.)
