[c#] Why do we need boxing and unboxing in C#?

Why do we need boxing and unboxing in C#?

I know what boxing and unboxing is, but I can't comprehend the real use of it. Why and where should I use it?

short s = 25;

object objshort = s;  //Boxing

short anothershort = (short)objshort;  //Unboxing

This question is related to c# .net boxing

The answer is


The best way to understand this is to look at lower-level programming languages C# builds on.

In the lowest-level languages like C, all variables go one place: The Stack. Each time you declare a variable it goes on the Stack. They can only be primitive values, like a bool, a byte, a 32-bit int, a 32-bit uint, etc. The Stack is both simple and fast. As variables are added they just go one on top of another, so the first you declare sits at say, 0x00, the next at 0x01, the next at 0x02 in RAM, etc. In addition, variables are often pre-addressed at compile-time, so their address is known before you even run the program.

In the next level up, like C++, a second memory structure called the Heap is introduced. You still mostly live in the Stack, but special ints called Pointers can be added to the Stack, that store the memory address for the first byte of an Object, and that Object lives in the Heap. The Heap is kind of a mess and somewhat expensive to maintain, because unlike Stack variables they don't pile linearly up and then down as a program executes. They can come and go in no particular sequence, and they can grow and shrink.

Dealing with pointers is hard. They're the cause of memory leaks, buffer overruns, and frustration. C# to the rescue.

At a higher level, C#, you don't need to think about pointers - the .Net framework (written in C++) thinks about these for you and presents them to you as References to Objects, and for performance, lets you store simpler values like bools, bytes and ints as Value Types. Underneath the hood, Objects and stuff that instantiates a Class go on the expensive, Memory-Managed Heap, while Value Types go in that same Stack you had in low-level C - super-fast.

For the sake of keeping the interaction between these 2 fundamentally different concepts of memory (and strategies for storage) simple from a coder's perspective, Value Types can be Boxed at any time. Boxing causes the value to be copied from the Stack, put in an Object, and placed on the Heap - more expensive, but, fluid interaction with the Reference world. As other answers point out, this will occur when you for example say:

bool b = false; // Cheap, on Stack
object o = b; // Legal, easy to code, but complex - Boxing!
bool b2 = (bool)o; // Unboxing!

A strong illustration of the advantage of Boxing is a check for null:

if (b == null) // Will not compile - bools can't be null
if (o == null) // Will compile and always return false

Our object o is technically an address in the Stack that points to a copy of our bool b, which has been copied to the Heap. We can check o for null because the bool's been Boxed and put there.

In general you should avoid Boxing unless you need it, for example to pass an int/bool/whatever as an object to an argument. There are some basic structures in .Net that still demand passing Value Types as object (and so require Boxing), but for the most part you should never need to Box.

A non-exhaustive list of historical C# structures that require Boxing, that you should avoid:

  • The Event system turns out to have a Race Condition in naive use of it, and it doesn't support async. Add in the Boxing problem and it should probably be avoided. (You could replace it for example with an async event system that uses Generics.)

  • The old Threading and Timer models forced a Box on their parameters but have been replaced by async/await which are far cleaner and more efficient.

  • The .Net 1.1 Collections relied entirely on Boxing, because they came before Generics. These are still kicking around in System.Collections. In any new code you should be using the Collections from System.Collections.Generic, which in addition to avoiding Boxing also provide you with stronger type-safety.

You should avoid declaring or passing your Value Types as objects, unless you have to deal with the above historical problems that force Boxing, and you want to avoid the performance hit of Boxing it later when you know it's going to be Boxed anyway.

Per Mikael's suggestion below:

Do This

using System.Collections.Generic;

var employeeCount = 5;
var list = new List<int>(10);

Not This

using System.Collections;

Int32 employeeCount = 5;
var list = new ArrayList(10);

Update

This answer originally suggested Int32, Bool etc cause boxing, when in fact they are simple aliases for Value Types. That is, .Net has types like Bool, Int32, String, and C# aliases them to bool, int, string, without any functional difference.


In .net, every instance of Object, or any type derived therefrom, includes a data structure which contains information about its type. "Real" value types in .net do not contain any such information. To allow data in value types to be manipulated by routines that expect to receive types derived from object, the system automatically defines for each value type a corresponding class type with the same members and fields. Boxing creates a new instances of this class type, copying the fields from a value type instance. Unboxing copies the fields from an instance of the class type to an instance of the value type. All of the class types which are created from value types are derived from the ironically named class ValueType (which, despite its name, is actually a reference type).


When a method only takes a reference type as a parameter (say a generic method constrained to be a class via the new constraint), you will not be able to pass a reference type to it and have to box it.

This is also true for any methods that take object as a parameter - this will have to be a reference type.


The last place I had to unbox something was when writing some code that retrieved some data from a database (I wasn't using LINQ to SQL, just plain old ADO.NET):

int myIntValue = (int)reader["MyIntValue"];

Basically, if you're working with older APIs before generics, you'll encounter boxing. Other than that, it isn't that common.


Boxing isn't really something that you use - it is something the runtime uses so that you can handle reference and value types in the same way when necessary. For example, if you used an ArrayList to hold a list of integers, the integers got boxed to fit in the object-type slots in the ArrayList.

Using generic collections now, this pretty much goes away. If you create a List<int>, there is no boxing done - the List<int> can hold the integers directly.


Boxing is required, when we have a function that needs object as a parameter, but we have different value types that need to be passed, in that case we need to first convert value types to object data types before passing it to the function.

I don't think that is true, try this instead:

class Program
    {
        static void Main(string[] args)
        {
            int x = 4;
            test(x);
        }

        static void test(object o)
        {
            Console.WriteLine(o.ToString());
        }
    }

That runs just fine, I didn't use boxing/unboxing. (Unless the compiler does that behind the scenes?)


In the .NET framework, there are two species of types--value types and reference types. This is relatively common in OO languages.

One of the important features of object oriented languages is the ability to handle instances in a type-agnostic manner. This is referred to as polymorphism. Since we want to take advantage of polymorphism, but we have two different species of types, there has to be some way to bring them together so we can handle one or the other the same way.

Now, back in the olden days (1.0 of Microsoft.NET), there weren't this newfangled generics hullabaloo. You couldn't write a method that had a single argument that could service a value type and a reference type. That's a violation of polymorphism. So boxing was adopted as a means to coerce a value type into an object.

If this wasn't possible, the framework would be littered with methods and classes whose only purpose was to accept the other species of type. Not only that, but since value types don't truly share a common type ancestor, you'd have to have a different method overload for each value type (bit, byte, int16, int32, etc etc etc).

Boxing prevented this from happening. And that's why the British celebrate Boxing Day.


Boxing and Unboxing are specifically used to treat value-type objects as reference-type; moving their actual value to the managed heap and accessing their value by reference.

Without boxing and unboxing you could never pass value-types by reference; and that means you could not pass value-types as instances of Object.


Boxing is the conversion of a value to a reference type with the data at some offset in an object on the heap.

As for what boxing actually does. Here are some examples

Mono C++

void* mono_object_unbox (MonoObject *obj)
 {    
MONO_EXTERNAL_ONLY_GC_UNSAFE (void*, mono_object_unbox_internal (obj));
 }

#define MONO_EXTERNAL_ONLY_GC_UNSAFE(t, expr) \
    t result;       \
    MONO_ENTER_GC_UNSAFE;   \
    result = expr;      \
    MONO_EXIT_GC_UNSAFE;    \
    return result;

static inline gpointer
mono_object_unbox_internal (MonoObject *obj)
{
    /* add assert for valuetypes? */
    g_assert (m_class_is_valuetype (mono_object_class (obj)));
    return mono_object_get_data (obj);
}

static inline gpointer
mono_object_get_data (MonoObject *o)
{
    return (guint8*)o + MONO_ABI_SIZEOF (MonoObject);
}

#define MONO_ABI_SIZEOF(type) (MONO_STRUCT_SIZE (type))
#define MONO_STRUCT_SIZE(struct) MONO_SIZEOF_ ## struct
#define MONO_SIZEOF_MonoObject (2 * MONO_SIZEOF_gpointer)

typedef struct {
    MonoVTable *vtable;
    MonoThreadsSync *synchronisation;
} MonoObject;

Unboxing an object in Mono is a process of casting a pointer at an offset of 2 gpointers in the object (e.g. 16 bytes). A gpointer is a void*. This makes sense when looking at the definition of MonoObject as it's clearly just a header for the data.

C++

To box a value in C++ you could do something like:

#include <iostream>
#define Object void*

template<class T> Object box(T j){
  return new T(j);
}

template<class T> T unbox(Object j){
  T temp = *(T*)j;
  delete j;
  return temp;
}

int main() {
  int j=2;
  Object o = box(j);
  int k = unbox<int>(o);
  std::cout << k;
}

In general, you typically will want to avoid boxing your value types.

However, there are rare occurances where this is useful. If you need to target the 1.1 framework, for example, you will not have access to the generic collections. Any use of the collections in .NET 1.1 would require treating your value type as a System.Object, which causes boxing/unboxing.

There are still cases for this to be useful in .NET 2.0+. Any time you want to take advantage of the fact that all types, including value types, can be treated as an object directly, you may need to use boxing/unboxing. This can be handy at times, since it allows you to save any type in a collection (by using object instead of T in a generic collection), but in general, it is better to avoid this, as you're losing type safety. The one case where boxing frequently occurs, though, is when you're using Reflection - many of the calls in reflection will require boxing/unboxing when working with value types, since the type is not known in advance.