Monday 24 December 2012

When is a bool not a bool?

The answer: it never was in any modern computer. It's just an integer that holds one of two special values. Memory is addressed in bytes, so a computer cannot store a single bit by itself. Most programming languages therefore store a boolean value as one or more bytes behind the scenes, even though it would theoretically fit into one bit. But enough theory. Let's examine the implementation details of a bool in C#.

First of all, let's figure out how to read and write the internal integer value of a bool. There are at least two ways to do so. The safe way looks like this:

[StructLayout(LayoutKind.Explicit)]
struct BoolIntUnion
{
    [FieldOffset(0)]
    public bool Bool;
    [FieldOffset(0)]
    public int Int;
}

If you're wondering what that is, it's the C# version of the C/C++ union. Both the bool and the int are in the same memory location, and thus share the same value. What this means is that changes to one variable are reflected in the other. What it means to us, right now, is that we can change the internal value of the bool at will. Allow me to demonstrate.

BoolIntUnion u = new BoolIntUnion();
u.Bool = true;
Console.WriteLine(u.Int);

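Can you guess what this prints? It's not too hard. It simply outputs 1. So the internal value of true is actually 1. Similarly, false turns out to be 0. (By the way, a bool in C# occupies a single byte: sizeof(bool) is 1, and the 4 you may have seen quoted is only its marshalled size. Pairing it with an int still works here because new BoolIntUnion() zeroes all four bytes and, on a little-endian machine such as x86, the bool overlaps the int's lowest byte.)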
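The size question is easy to check for yourself. Here is a minimal sketch; the BoolByteUnion and SizeDemo names are mine and purely illustrative. It prints the size the runtime uses for a bool, the size Marshal reports for interop, and the raw byte behind true and false.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical variant of the union above, pairing the bool with a byte.
[StructLayout(LayoutKind.Explicit)]
public struct BoolByteUnion
{
    [FieldOffset(0)]
    public bool Bool;
    [FieldOffset(0)]
    public byte Byte;
}

public static class SizeDemo
{
    public static void Main()
    {
        // Size of a bool as laid out in managed memory.
        Console.WriteLine(sizeof(bool));
        // Size of a bool once marshalled to unmanaged code.
        Console.WriteLine(Marshal.SizeOf(typeof(bool)));

        BoolByteUnion u = new BoolByteUnion();
        u.Bool = true;
        Console.WriteLine(u.Byte);   // raw byte behind true
        u.Bool = false;
        Console.WriteLine(u.Byte);   // raw byte behind false
    }
}
```

Pairing the bool with a byte keeps the two fields the same width, so the result does not depend on byte order.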

I did mention that there is another way to convert between bool and int. This uses unsafe code.

bool b = true;
bool* pB = &b;
byte* pRaw = (byte*)pB;   // a bool is one byte, so read exactly one byte
int i = *pRaw;

From now on, I'm not going to write out this conversion code every time. Instead, I'll use these utility methods:

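static unsafe int Int(bool value)
{
    // Read exactly one byte; an int-sized read would run past the bool.
    return *(byte*)(&value);
}

static unsafe bool Bool(int value)
{
    // Only the low byte of the int ends up in the bool.
    byte raw = (byte)value;
    return *(bool*)(&raw);
}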

Now, let's start our investigation. If true is 1 and false is 0, what about every other bit pattern the underlying storage can hold? Are they true, false, or neither? As it turns out, none of those options fully describes the behavior of bool.

First, what happens if we convert an abnormal bool value to a string? Let's see.

Console.WriteLine(Bool(42));

Did you expect "True", "False", empty string, or maybe even "FileNotFound"? Well, the answer is "True". Even though the internal value is 42 instead of 1, it still results in "True".

Now let's try it as a condition in an if.

if (Bool(42))
{
    Console.WriteLine("if");
}
else
{
    Console.WriteLine("else");
}

Not surprisingly, this prints "if".
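If you'd rather try several raw values at once without unsafe code, here is a rough sketch. RawBool, TruthinessDemo, FromRaw and Branch are names I made up for illustration. It builds each bool through an explicit-layout struct and reports how the value behaves in ToString and as a condition.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper: a bool and a byte sharing the same storage.
[StructLayout(LayoutKind.Explicit)]
public struct RawBool
{
    [FieldOffset(0)]
    public bool Value;
    [FieldOffset(0)]
    public byte Raw;
}

public static class TruthinessDemo
{
    // Builds a bool whose underlying byte is exactly `raw`.
    public static bool FromRaw(byte raw)
    {
        RawBool u = new RawBool();
        u.Raw = raw;
        return u.Value;
    }

    // Reports which branch a condition on `b` takes.
    public static string Branch(bool b)
    {
        return b ? "if" : "else";
    }

    public static void Main()
    {
        foreach (byte raw in new byte[] { 0, 1, 2, 42, 255 })
        {
            bool b = FromRaw(raw);
            Console.WriteLine("{0,3}: ToString={1}, branch={2}",
                raw, b.ToString(), Branch(b));
        }
    }
}
```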

So far, it seems to be acting just as if it were the value true itself. Let's try comparing it directly to true.

Console.WriteLine(Bool(42) == true);

And we get... "True". So, any non-zero integer value stored in a bool is identical to true. That was easy. End of story.

Nope.

Your compiler lied to you.

Even if you compiled with optimizations disabled, your compiler performed a small optimization that happened to give us the wrong result.

Have a look at the generated IL for Main, pulled out of ILSpy (sorry, no syntax highlighting):

.method private hidebysig static 
 void Main (
  string[] args
 ) cil managed 
{
 // Method begins at RVA 0x2050
 // Code size 15 (0xf)
 .maxstack 8
 .entrypoint

 IL_0000: nop
 IL_0001: ldc.i4.s 42
 IL_0003: call bool booltest.Program::Bool(int32)
 IL_0008: call void [mscorlib]System.Console::WriteLine(bool)
 IL_000d: nop
 IL_000e: ret
} // end of method Program::Main

Wait a second. Isn't there supposed to be a comparison between the two calls? There should be, but the compiler removed it, because it treats == true as a no-op.

What can we do to fix this? We can avoid hard-coding the value, and instead calculate it at runtime. For example:

Console.WriteLine(Bool(42) == bool.Parse(bool.TrueString));

Now it prints "False". Obviously the compiler wasn't expecting what we did, so we got the wrong result initially.
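If a bool might have come from unsafe code or interop, one way to sidestep this kind of surprise is to rebuild it from an integer comparison before comparing. This is only a sketch under my own assumptions; BoolBits, CanonicalBool, Normalize and SameTruth are hypothetical names, not anything from the framework.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper: a bool and a byte sharing the same storage.
[StructLayout(LayoutKind.Explicit)]
public struct BoolBits
{
    [FieldOffset(0)]
    public bool Value;
    [FieldOffset(0)]
    public byte Raw;
}

public static class CanonicalBool
{
    public static bool FromRaw(byte raw)
    {
        BoolBits bits = new BoolBits();
        bits.Raw = raw;
        return bits.Value;
    }

    public static byte ToRaw(bool value)
    {
        BoolBits bits = new BoolBits();
        bits.Value = value;
        return bits.Raw;
    }

    // The result comes from an integer comparison, so it holds a plain 0 or 1.
    public static bool Normalize(bool value)
    {
        return ToRaw(value) != 0;
    }

    // Compares two bools by truthiness rather than by raw value.
    public static bool SameTruth(bool a, bool b)
    {
        return Normalize(a) == Normalize(b);
    }

    public static void Main()
    {
        bool odd = FromRaw(42);
        Console.WriteLine(ToRaw(odd));             // raw value before normalizing
        Console.WriteLine(ToRaw(Normalize(odd)));  // raw value after normalizing
        Console.WriteLine(SameTruth(odd, true));
    }
}
```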

We can see now that not all non-zero values are identical. In fact, internally, bools are treated just like ints: == on two bools is a plain integer comparison, so 42 and 1 compare as unequal even though both are 'truthy'. As in C/C++, a non-zero integer is a 'truthy' value when used as a boolean. In C#, however, the language goes to great lengths to ensure a bool can only contain 0 or 1, and that an int can never be treated as a bool. As you have seen, there are ways to circumvent this, and when we do, we end up with behavior similar to C/C++.
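Another place where the int-like behavior can show through is the non-short-circuiting & operator. As far as I can tell, & on two bools is compiled to a bitwise AND of the underlying values, so two operands that each print as True need not produce a true result when their raw values share no bits. A small sketch to try it (BoolSlot, BitwiseDemo, FromRaw and And are illustrative names of my own):

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper: a bool and a byte sharing the same storage.
[StructLayout(LayoutKind.Explicit)]
public struct BoolSlot
{
    [FieldOffset(0)]
    public bool Value;
    [FieldOffset(0)]
    public byte Raw;
}

public static class BitwiseDemo
{
    // Builds a bool whose underlying byte is exactly `raw`.
    public static bool FromRaw(byte raw)
    {
        BoolSlot slot = new BoolSlot();
        slot.Raw = raw;
        return slot.Value;
    }

    // Non-short-circuiting AND on two bools.
    public static bool And(bool a, bool b)
    {
        return a & b;
    }

    public static void Main()
    {
        bool one = FromRaw(1);   // raw bits 0000 0001
        bool two = FromRaw(2);   // raw bits 0000 0010
        Console.WriteLine(one);
        Console.WriteLine(two);
        Console.WriteLine(And(one, two));
    }
}
```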

I hope you learned something by reading this, as I certainly didn't know all this until recently. In reality, bool is just an integer and some syntactic sugar. This might be completely expected by C++ programmers, but for people who have only used C#, it may come as a surprise. At the very least, I hope everyone who read this learned something they didn't know before.

Class dismissed.