C#’s two NULLs
In my work time i do a lot of C#. Inevitably i also run into a lot of nullable data. This is a short story of how i tried to make dealing with nullable data cleaner, coming from languages with monadic Option types. Did it go well? No.
Method Chaining & Pipelining
If you ever worked with a programming language that is at least inspired by functional ones, you may be familiar with the programming style of »method chaining« or »pipelining«. The idea is simple: You write your program as a series of transformations on your data, building a kind of assembly-line. And you keep this assembly-line going for as long as you can, not interrupting it with such… frivolous concepts as… control flow structures.
In my spare time i do a lot of Rust. In Rust you’ll find method chaining and monadic types almost everywhere. Here’s what such method chaining code looks like in actual production code:
// Excerpt from `bevy_ecs`, file `src/query/filter.rs`,
// line 495 onwards in an almost 1 year old version i
// happened to have on disk.
fetch
    .table_ticks
    .debug_checked_unwrap()
    .get(…)
    .deref()
    .is_newer_than(…)
Just a clean sequence of x.this().that().whatever(). That programming style has made it so far that even a control flow keyword was adjusted in its design to be chained: The infamous .await.
What if there is no data?
Consider what happens if you only sometimes have data to toss through your assembly-line. (Yes, i’m sticking with that naming here.) We either need to bypass the assembly-line when we have no data, or need to come up with a reasonable default value to throw around. Thankfully, in Rust we can do all that without breaking the flow with explicit ifs or matches. We’ll just use the Option<T> monad to continue our method chaining as usual:
// First, acquire the data… if it exists at all.
// We now either get `Option::Some(T)` or `Option::None`.
let the_stuff: Option<T> = …;
// Now toss it into our assembly-line.
let the_result = the_stuff
    .and_then(…)
    // We henceforth require an instance of `T`, no more
    // `None`s allowed! So let’s just return `the_stuff`
    // if it’s `Some`, or else return `some_default`.
    .unwrap_or(some_default)
    // All this code is mandatory? No problem, we now
    // definitely have an instance of `T` for the assembly-line.
    .mandatory_process();
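To make the shape of such a chain concrete, here is a minimal runnable sketch. The function names half_if_even and assembly_line are made up for illustration, and i32 stands in for the generic T; only and_then, unwrap_or and pow are real standard library methods.

```rust
/// Hypothetical optional step: halve the input, but only if it is even.
fn half_if_even(n: i32) -> Option<i32> {
    if n % 2 == 0 { Some(n / 2) } else { None }
}

/// The whole assembly line: optional step, fallback, mandatory step.
/// `pow(2)` plays the role of the mandatory processing here.
fn assembly_line(the_stuff: Option<i32>, some_default: i32) -> i32 {
    the_stuff
        // May turn `Some` into `None`, never the other way around.
        .and_then(half_if_even)
        // From here on we hold a plain `i32`, no more `None`s.
        .unwrap_or(some_default)
        .pow(2)
}
```

Calling assembly_line(Some(8), 3) runs the full chain on real data, while Some(7) and None both end up on the fallback value before the mandatory step.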
In my spare time i do a lot of C++. And even C++ can do this now, with the very, uhh, efficiently implemented std::optional<T> and std::expected<T, E> monads:
std::expected<void, Error>
Notice the .and_then and .transform calls. They almost look like the Rust examples. In Rust we just say .map instead of .transform.
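For comparison, a small Rust sketch of the same two operations side by side. The helper names parse_number and parse_and_double are invented for this example; the point is only that .and_then takes a step that may itself come up empty, while .map (C++: .transform) takes a step that always succeeds.

```rust
/// Hypothetical fallible step: parsing may fail, so it returns an `Option`.
fn parse_number(text: &str) -> Option<i32> {
    text.trim().parse::<i32>().ok()
}

/// `.and_then` chains the fallible step, `.map` chains the infallible one.
fn parse_and_double(text: Option<&str>) -> Option<i32> {
    text.and_then(parse_number).map(|n| n * 2)
}
```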
Smells like NULL
This Option<T> or std::optional<T> monad may feel familiar in purpose. Aren’t nullable types kinda the same?
In my work time i do a lot of C#. In C#, we have explicitly nullable types, called T?. While a T only holds instances of T, a T? may instead hold a null, meaning it currently holds nothing. C# has also acquired some syntax sugar to deal with nullables in a nicer way than breaking out to that… unclean… nasty explicit control flow.
// We can provide defaults!
T? maybeSomething = … ;
var alwaysSomething = maybeSomething ?? someDefault;
// We can call methods, only if the value isn’t `null`!
// Else we just forward the `null`.
string? maybeText = maybeSomething?.ToString();
// And of course, we can always break the flow:
if (maybeSomething is {} something) { … }
Notice all the question marks. ?? offers a fallback value, x?.F() calls F only if x ain’t null. Even if what F() returns is never null, with the ?. method call operator it now can be null.
Let’s try and take one of our previous Rust examples and translate it into C# with pure assembly-lining:
// First, acquire the data… if it exists at all.
// We now either get `T` or `null`.
T? theStuff = … ;
// Now toss it into our assembly-line.
var theResult
    = (theStuff?.… ?? someDefault)
    .MandatoryProcess();
That looks… almost not terrible. Did you notice the () around the ??, however? We can’t just call .MandatoryProcess() without them, or we’d incorrectly call it on someDefault only. But that’s not all. In the Rust example, we were able to pass our in-flight data into an arbitrary closure, inside of which we may even be calling freestanding functions or methods on other objects. In C#, we can only do that by either falling back onto if (x is {} xNotNull), or wrapping that quick snippet of special-purpose code into an extension method.
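As an illustration of what the closure-based style buys us on the Rust side, here is a hedged sketch in which the in-flight value passes through a method of a different object (a HashMap lookup) and a freestanding function. The names shout and greeting and the nickname table are assumptions made up for this example.

```rust
use std::collections::HashMap;

/// Hypothetical freestanding function, not a method of the in-flight data.
fn shout(text: &str) -> String {
    text.to_uppercase()
}

/// Chains a lookup on `nicknames` (another object entirely) and a free
/// function, without ever leaving the method chain.
fn greeting(user: Option<&str>, nicknames: &HashMap<&str, &str>) -> String {
    user
        // Method call on a different object than the one in flight.
        .and_then(|name| nicknames.get(name).copied())
        // Freestanding function call inside a closure.
        .map(|nick| shout(nick))
        .unwrap_or_else(|| String::from("WHO?"))
}
```

Neither step is a method of the value travelling down the chain, which is exactly the case that ?. cannot express.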
C# AndThen NULL
It would be quite nice if, with the power of these extension methods, we could turn any T? in C# into an assembly-line-friendly monad just like Option<T> or std::optional<T>. To do that, we’d want to at least add the methods .Map, .AndThen, and .OrDefault. We can be clever here and merge .Map and .AndThen into one by utilising the fact that you can assign any value of type T to any variable of type T?. .Map is just a .AndThen where the result of our closure shouldn’t be null. So let’s discard .Map and focus on .AndThen.
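The same relationship holds in Rust, where it has to be spelled out because a T does not silently convert to an Option<T>. A minimal sketch, with map_via_and_then being a name invented for this example:

```rust
/// `.map` written in terms of `.and_then`: wrap the closure's result in
/// `Some`, so the "may come up empty" step never actually comes up empty.
fn map_via_and_then<T, R>(value: Option<T>, f: impl FnOnce(T) -> R) -> Option<R> {
    value.and_then(|inner| Some(f(inner)))
}
```

In C# the wrapping in Some is the implicit T to T? conversion, which is why one method can serve both purposes there.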
public static R? AndThen<T, R>(this T? x, Func<T, R?> f) {
    if (x is {} something) {
        return f(something);
    }
    else {
        // ERROR: Cannot convert null to type parameter 'R'
        // because it could be a non-nullable value type.
        // Consider using 'default(R)' instead.
        return null;
    }
}
Well that’s strange. Our return type clearly states that R? is either something or null, and we return null. It looks like our ? annotation is completely ignored here, and C# only bothers with whether R is a »value type«. Let’s see what happens if we return default(R); instead:
string? a_s = null;
string? b_s = "wat";
string? x_s = a_s.AndThen(s => s + "…"); // null
string? y_s = b_s.AndThen(s => s + "…"); // "wat…"
int? a_i = null;
int? b_i = 7;
int? x_i = a_i.AndThen(i => i + 2); // null
int? y_i = b_i.AndThen(i => i + 2); // 9
That looks promising! Let’s mix it up a bit, literally:
string? a_s = null;
string? b_s = "7";
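int? x_s = a_s.AndThen(s => int.Parse(s)); // 0
int? y_s = b_s.AndThen(s => int.Parse(s)); // 7
int? a_i = null;
int? b_i = 7;
string? x_i = a_i.AndThen(i => i.ToString()); // null
string? y_i = b_i.AndThen(i => i.ToString()); // "7"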
What? Did you see that? x_s should’ve been null, but instead it’s 0! And there is one more nasty trap in this code. Let’s add some type annotations to make a more subtle bug show itself:
// ERROR: 'int?' does not contain a definition for 'AndThen'
// and the best extension method overload
// 'Ext.AndThen<int, string>(int, Func<int, string?>)'
// requires a receiver of type 'int'
string? x_i = a_i.AndThen<int, string>(i => i.ToString());
What C# is telling us here is that it treats a_i as an int, not as an int? when we remind it of the types it should be using. What C# actually inferred, however, is this incorrect signature:
// Now it works! ... kinda?
string? x_i = a_i.AndThen<int?, string>(i => i.ToString());
It’s a subtle thing and may not bother you in practice, but the i in our function is now typed int?, i.e. »null allowed«. The whole point of this method is to statically rule null out, however. Well, that, and to be assembly-lining-friendly. And what was with that 0? That’s an actual bug, and it’s caused by default(R) not being what we wanted. The default of R, here int, is 0, not null.
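The difference between "the default of the payload type" and "the empty state of the wrapper" can be shown in Rust terms, where the two are distinct types. This is only an analogy to the C# bug above, not a translation of it; all three function names are invented for the sketch.

```rust
/// What `default(R)` amounts to for a plain integer: zero, not "nothing".
fn default_of_value() -> i32 {
    i32::default()
}

/// What we actually wanted: the empty state of the wrapper type.
fn default_of_optional() -> Option<i32> {
    Option::<i32>::default()
}

/// The mixed string-to-int case again: parsing nothing yields nothing,
/// never a `Some(0)` in disguise.
fn parse_or_nothing(text: Option<&str>) -> Option<i32> {
    text.and_then(|s| s.parse::<i32>().ok())
}
```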
Something’s off…
C#’s Two NULLs
I did indeed mix something up, and that is »value types« and »reference types«. int is our value type, living happily on our program stack, while string is our reference type, where the actual text data lives on the GC heap and but a pointer to it lives on our program stack. Pointer? That sounds null-y. Before C# 8 added nullable annotations for reference types, all values of reference types were nullable, which is implemented by just storing the null pointer in them. Explicit nullability was a privilege of the value types.
When you write int?, it’s syntax sugar for Nullable<int>. Looks an awful lot like std::optional<int>, don’t it? It’s just another value type with a bool hasValue inside. You store null, that bool is false. Later with the nullable syntax for reference types, like string?, our above code looks the same, but it’s only sugar on top of this old design. We have null references, and we have null Nullable<T>s. This is why C# complained about return null and asked for return default(R) instead.
int? and string? have two different nulls!
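To picture the "flag plus value" layout described above, here is a rough Rust sketch of such a wrapper. It is an illustration of the idea only: the field and method names are chosen for this example and are not the actual .NET definition.

```rust
/// Rough sketch of a `Nullable<T>`-style value type: a flag plus a slot
/// for the payload. Names are illustrative, not the real .NET ones.
#[derive(Debug)]
struct Nullable<T> {
    has_value: bool,
    value: T,
}

impl<T: Default> Nullable<T> {
    /// The "null" of a value type: flag cleared, slot holds a dummy value.
    fn null() -> Self {
        Nullable { has_value: false, value: T::default() }
    }

    /// A wrapper that actually holds something.
    fn some(value: T) -> Self {
        Nullable { has_value: true, value }
    }

    /// Converts into Rust's own optional type by checking the flag.
    fn into_option(self) -> Option<T> {
        if self.has_value { Some(self.value) } else { None }
    }
}
```

A null reference, by contrast, needs no flag at all: the pointer itself is the flag.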
Let’s fix our code such that we return null again. To do that, we have to tell C# whether our types are reference or value types. First, start with the old-fashioned reference types only.
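public static R? AndThen<T, R>(this T? x, Func<T, R?> f)
    where T : class where R : class
{
    if (x is {} something) {
        return f(something);
    }
    else {
        return null;
    }
}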
With this, our old string → string code works just fine again. How about int → int? Well, we just have to… duplicate the entirety of that function and replace class with struct. Easy, right?
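public static R? AndThen<T, R>(this T? x, Func<T, R?> f)
    where T : struct where R : struct { /* same body as above */ }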
Now C# complains:
// ERROR: The type 'int?' must be a reference type in order to
// use it as parameter 'T' in the generic type or method
// 'Ext.AndThen<T, R>(T?, Func<T, R?>)'
int? x_i = a_i.AndThen(i => i + 2);
What happened is that C# is too stupid to figure out to use the where T: struct overload instead of the where T: class one. Amazing. We can make this code work, at a price:
// Yes, C# is too stupid to figure that one out itself.
int? x_i = a_i.AndThen<int, int>(i => i + 2);
I hope our code doesn’t have any bugs, lest we have to fix it in two identical places. Nothing a bit of code generation can’t help with, though, right? So let’s just accept this as a fact of life and move on. int → int works-ish, huzzah, and it didn’t break string → string. Now what could possibly go wrong for, say, int → string?
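public static R? AndThen<T, R>(this T? x, Func<T, R?> f)
    where T : class where R : struct { /* same body as above */ }
public static R? AndThen<T, R>(this T? x, Func<T, R?> f)
    where T : struct where R : class { /* same body as above */ }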
Yes, we have 2 types involved, each one being either class or struct. So in order to fully cover all cases, we have to copy-paste the almost exact same junk 2²=4 times. If we had 3 types involved, we’d be looking at 2³=8 redundant implementations! Good thing C# doesn’t have variadic generics, right?
Of course, to rub some salt into this wound, C#’s type inference proved itself subpar once again:
// ERROR!
int? x_s = a_s.AndThen(s => int.Parse(s));
// GOOD!
int? x_s = a_s.AndThen<string, int>(s => int.Parse(s));
Great, we can now do method chaining on any T? as if ’twas a monad. We saw the implementation of .AndThen, we leave the implementation of .OrDefault as an exercise to you, the reader.
What were we doing again?
Here’s a reminder: All of this started with method chaining, structuring code like an assembly line. We’ve seen that C# has some nice syntax sugar for dealing with nulls:
- x ?? default is Rust’s .unwrap_or, but it chains poorly.
- x?.DoStuff() is Rust’s .map(T::do_stuff), but it can only deal with methods on x.
And we’ve already seen examples of wanting to deal with not-methods-of-x: int.Parse(string), or just i + 2.
Well, with .AndThen we finally can! At the price that sometimes our code will not compile, unless we babysit C# and explicitly annotate the generic types. And write 4 almost identical implementations of .AndThen, for as we discovered, C#’s type system leaks implementation details of value and reference types, which show up whenever we have to deal with null. It’s easier to just break the flow.
What could have been…
So, how did Rust implement its Option<T>, combatting this 2ⁿ explosion of redundant implementations for values vs. references?
In my spare time i don’t do a lot of C#.
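For the curious, a hedged sketch of why the explosion never happens on the Rust side: one generic enum and one generic method cover integers, strings and heap pointers alike. Maybe is a hand-rolled stand-in for Option, deliberately named differently, and boxed_option_is_pointer_sized is a helper invented for this example.

```rust
use std::mem::size_of;

/// Hand-rolled stand-in for `Option<T>`: one definition for every payload.
#[derive(Debug, PartialEq)]
enum Maybe<T> {
    Just(T),
    Nothing,
}

impl<T> Maybe<T> {
    /// A single `and_then`, with no value-type or reference-type variants.
    fn and_then<R>(self, f: impl FnOnce(T) -> Maybe<R>) -> Maybe<R> {
        match self {
            Maybe::Just(inner) => f(inner),
            Maybe::Nothing => Maybe::Nothing,
        }
    }
}

/// The standard library documents that `Option<Box<T>>` has the same size
/// as `Box<T>`: the null pointer is reused to represent `None`.
fn boxed_option_is_pointer_sized() -> bool {
    size_of::<Option<Box<i32>>>() == size_of::<Box<i32>>()
}
```

The null pointer trick still exists, but it is a layout optimisation the compiler applies behind the scenes rather than something the type system makes you spell out.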
Terms used
- Monad: A wrapper type around control flow structures. if…else? Option. loop…break? Iterator. await…return? Future. try…catch? Result.