📅︎ 25.02.2024

✏︎ by Evy

⏱︎ 9 kg⋅eV ±12σ

C#’s two NULLs

In my work time i do a lot of C#. Inevitably i also run into a lot of nullable data. This is a short story of how i tried to make dealing with nullable data cleaner, coming from languages with monadic Option types. Did it go well? No.

Method Chaining & Pipelining

If you’ve ever worked with a programming language that is at least inspired by functional ones, you may be familiar with the programming style of »method chaining« or »pipelining«. The idea is simple: You write your program as a series of transformations on your data, building a kind of assembly-line. And you keep this assembly-line going for as long as you can, not interrupting it with such… frivolous concepts as… control flow structures.

In my spare time i do a lot of Rust. In Rust you’ll find method chaining and monadic types almost everywhere. Here’s what such method chaining code looks like in actual production code:

// Excerpt from `bevy_ecs`, file `src/query/filter.rs`,
// line 495 onwards in an almost 1 year old version i
// happened to have on disk.
fetch
    .table_ticks
    .debug_checked_unwrap()
    .get(table_row.index())
    .deref()
    .is_newer_than(fetch.last_run, fetch.this_run)

Just a clean sequence of x.this().that().whatever(). That programming style has made it so far that even a control flow keyword was adjusted in its design to be chained: The infamous .await.

What if there is no data?

Consider what happens if you only sometimes have data to toss through your assembly-line. (Yes, i’m sticking with that naming here.) We either need to bypass the assembly-line when we have no data, or need to come up with a reasonable default value to throw around. Thankfully, in Rust we can do all that without breaking the flow with explicit ifs or matches. We’ll just use the Option<T> monad to continue our method chaining as usual:

// First, acquire the data… if it exists at all.
// We now either get `Option::Some(T)` or `Option::None`.
let the_stuff : Option<T> = … ;

// Now toss it into our assembly-line.
the_stuff
    .and_then(|something: T|
        // All this is skipped if we got `None`.
        something
            .this()
            .that()
            .whatever()
    )
    // We henceforth require an instance of `T`, no more
    // `None`s allowed! So let’s just return `the_stuff`
    // if it’s `Some`, or else return `some_default`.
    .unwrap_or(some_default)
    // All this code is mandatory? No problem, we now
    // definitely have an instance of `T` for the assembly-line.
    .mandatory_process()
    .let the_result;

I’m kidding, by the way. Unfortunately, .let, .for, .if, .match, .return, and .yield are not valid chained Rust keywords yet. I’m serious, by the way. I want them.
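For a variant of the chain above that compiles as written, here is a minimal sketch. Everything concrete in it is made up for illustration: pipeline is a hypothetical function, the trimming, parsing and doubling stand in for .this(), .that() and .whatever(), and saturating_add plays the role of .mandatory_process().

```rust
// Hypothetical stand-ins for the steps in the chain above;
// none of these names come from a real crate.
fn pipeline(the_stuff: Option<&str>, some_default: i32) -> i32 {
    the_stuff
        // Skipped entirely if we got `None`.
        .map(str::trim)
        // This step may itself come up empty, hence `and_then`
        // instead of `map`.
        .and_then(|s| s.parse::<i32>().ok())
        .map(|n| n * 2)
        // From here on we definitely hold an `i32`.
        .unwrap_or(some_default)
        // The "mandatory" step runs on real data and on the default alike.
        .saturating_add(1)
}

fn main() {
    println!("{}", pipeline(Some(" 20 "), 0));
    println!("{}", pipeline(None, 0));
    println!("{}", pipeline(Some("nope"), 5));
}
```

Note that the default only kicks in at .unwrap_or: both a missing input and a failed parse end up there, and neither needs an explicit if or match.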

In my spare time i do a lot of C++. And even C++ can do this now, with the very, uhh, efficiently implemented std::optional<T> and std::expected<T, E> monads:

  std::expected<void, Error> Init() noexcept {
    return
      R(sqlite3_initialize())
      // Takes an `expected`, and only if it holds a value,
      // we run code that itself returns an `expected`.
      .and_then([](auto r) {
        (void) r;
        return R(sqlite3_auto_extension(&sqlite3_dbdata_init));
      })
      // Takes an expected, and only if it holds a value,
      // we run code that returns a new value.
      .transform([](auto r) {
        (void) r;
        // Not interested in the result here. This is allowed to fail.
        (void) sqlite3_soft_heap_limit64(…);
      });
  }

I made some adjustments to make this code blog friendly. In the real code, the R() function has a better name; it converts SQLite result codes to a std::expected<T, E> monad. For all these SQLite result monads, the real code uses the shorter type aliases Result and NoResult, and i removed a bunch of comments irrelevant to the topic.

Notice the .and_then and .transform calls. They almost look like the Rust examples. In Rust we just say .map instead of .transform.
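To make the comparison concrete, here is a rough Rust counterpart of the same shape, using Result<T, E> where the C++ used std::expected<T, E>. It is only a sketch: initialize, register_extension and init are hypothetical functions standing in for the SQLite calls, and the String error type is an arbitrary choice.

```rust
// Hypothetical steps, standing in for the SQLite calls above.
fn initialize(ok: bool) -> Result<u32, String> {
    if ok {
        Ok(1)
    } else {
        Err("initialize failed".to_string())
    }
}

fn register_extension(handle: u32) -> Result<u32, String> {
    if handle > 0 {
        Ok(handle + 1)
    } else {
        Err("bad handle".to_string())
    }
}

fn init(ok: bool) -> Result<String, String> {
    initialize(ok)
        // `and_then`: the next step returns a `Result` itself.
        .and_then(register_extension)
        // `map` (C++: `transform`): the closure returns a plain value,
        // which gets wrapped back into `Ok`.
        .map(|handle| format!("ready, handle {handle}"))
}

fn main() {
    println!("{:?}", init(true));
    println!("{:?}", init(false));
}
```

If the first step fails, neither of the later steps runs and the error is passed through untouched.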

Smells like NULL

This Option<T> or std::optional<T> monad may feel familiar in purpose. Aren’t nullable types kinda the same?

In my work time i do a lot of C#. In C#, we have explicitly nullable types, called T?. While a T only holds instances of T, a T? may instead hold a null, meaning it currently holds nothing. C# has also acquired some syntax sugar to deal with nullables in a nicer way than breaking out to that… unclean… nasty explicit control flow.

// We can provide defaults!
T?   maybeSomething = … ;
var alwaysSomething = maybeSomething ?? someDefault;

// We can call methods, only if the value isn’t `null`!
// Else we just forward the `null`.
string? maybeText = maybeSomething?.ToString();

// And of course, we can always break the flow:
if (maybeSomething is {} something) {
    var definitelyText = something.ToString();
}

Notice all the question marks. ?? offers a fallback value, x?.F() calls F only if x ain’t null. Even if what F() returns is never null, with the ?. method call operator it now can be null.

Let’s try and take one of our previous Rust examples and translate it into C# with pure assembly-lining:

// First, acquire the data… if it exists at all.
// We now either get `T` or `null`.
T? theStuff = … ;

// Now toss it into our assembly-line.
var theResult
    = (theStuff
    ?.This()
    ?.That()
    ?.Whatever()
    ?? someDefault)
    .MandatoryProcess();

That looks… almost not terrible. Did you notice the () around the ??, however? We can’t just call .MandatoryProcess() without them, or we’d incorrectly call it on someDefault only. But that’s not all. In the Rust example, we were able to pass our in-flight data into an arbitrary closure, inside of which we may even be calling freestanding functions or methods on other objects. In C#, we can only do that by either falling back onto if (x is {} xNotNull), or wrapping that quick snippet of special-purpose code into an extension method.

C# AndThen NULL

It would be quite nice, if — with the power of these extension methods — we could turn any T? in C# into an assembly-line-friendly monad just like Option<T> or std::optional<T>. To do that, we’d want to at least add the methods .Map, .AndThen, and .OrDefault. We can be clever here and merge .Map and .AndThen into one by utilising the fact that you can assign any value of type T to any variable of type T?. .Map is just a .AndThen where the result of our closure shouldn’t be null. So let’s discard .Map and focus on .AndThen.
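As an aside, the same trick can be spelled out in Rust, where the wrapping has to be written by hand because there is no implicit conversion from T to Option<T>. The helper name map_via_and_then is made up for this sketch.

```rust
// `map` expressed through `and_then`: the closure's plain result
// is wrapped in `Some` before being handed back.
fn map_via_and_then<T, R>(value: Option<T>, f: impl FnOnce(T) -> R) -> Option<R> {
    value.and_then(|x| Some(f(x)))
}

fn main() {
    let some_number: Option<i32> = Some(7);
    let no_number: Option<i32> = None;

    println!("{:?}", map_via_and_then(some_number, |i| i + 2));
    println!("{:?}", map_via_and_then(no_number, |i| i + 2));
}
```

In C#, the Some(…) step is what the implicit T to T? assignment would do for us. Now, the C# attempt at .AndThen: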

public static R? AndThen<T, R>(this T? self, Func<T, R?> doThat) {
  if (self is {} something) {
    return doThat.Invoke(something);
  }
  else {
    // ERROR: Cannot convert null to type parameter 'R'
    // because it could be a non-nullable value type.
    // Consider using 'default(R)' instead.
    return null;
  }
}

Well that’s strange. Our return type clearly states that R? is either something or null, and we return null. It looks like our ? annotation is completely ignored here, and C# only bothers with whether R is a »value type«. Let’s see what happens if we return default(R); instead:

string? a_s = null;
string? b_s = "wat";

string? x_s = a_s.AndThen(s => s + "…"); // null
string? y_s = b_s.AndThen(s => s + "…"); // "wat…"

int? a_i = null;
int? b_i = 7;

int? x_i = a_i.AndThen(i => i + 2); // null
int? y_i = b_i.AndThen(i => i + 2); // 9

That looks promising! Let’s mix it up a bit, literally:

string? a_s = null;
string? b_s = "7";

int? x_s = a_s.AndThen(s => int.Parse(s)); // 0
int? y_s = b_s.AndThen(s => int.Parse(s)); // 7

int? a_i = null;
int? b_i = 7;

string? x_i = a_i.AndThen(i => i.ToString()); // null
string? y_i = b_i.AndThen(i => i.ToString()); // "7"

What? Did you see that? x_s should’ve been null, but instead it’s 0! And there is one more nasty trap in this code. Let’s add some type annotations to make a more subtle bug show itself:

// ERROR: 'int?' does not contain a definition for 'AndThen'
// and the best extension method overload
// 'Ext.AndThen<int, string>(int, Func<int, string?>)'
// requires a receiver of type 'int'
string? x_i = a_i.AndThen<int, string>(i => i.ToString());

What C# is telling us here is that, once we remind it of the types it should be using, it treats the T? in our signature as a plain int, not as an int?. What C# actually inferred before, however, is this incorrect signature:

// Now it works!    ... kinda?
string? x_i = a_i.AndThen<int?, string>(i => i.ToString());

It’s a subtle thing and may not bother you in practice, but the i in our function is now typed int?, i.e. »null allowed«. The whole point of this method is to statically rule null out, however. Well, that, and to be assembly-lining-friendly. And what was with that 0? That’s an actual bug, and it’s caused by default(R) not being what we wanted. The default of R, here int, is 0, not null.
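The 0 can be reproduced outside of C#. The following Rust sketch is only a loose analogy, not a description of how C# works internally: and_then_or_default is a made-up helper that falls back to R::default() the way our method fell back to default(R).

```rust
// Loose model of the buggy version: on "no data" it hands back
// `R::default()`, whatever that happens to be for `R`.
fn and_then_or_default<T, R: Default>(value: Option<T>, f: impl FnOnce(T) -> R) -> R {
    match value {
        Some(x) => f(x),
        None => R::default(),
    }
}

fn main() {
    let missing_text: Option<&str> = None;
    let missing_number: Option<i32> = None;

    // R = i32: "nothing" silently turns into 0, like `default(int)`.
    let parsed: i32 = and_then_or_default(missing_text, |s: &str| s.len() as i32);
    // R = Option<i32>: "nothing" stays nothing, like `default(int?)`.
    let bumped: Option<i32> = and_then_or_default(missing_number, |i: i32| Some(i + 2));

    println!("{parsed:?} {bumped:?}");
}
```

Whether »no data« survives depends entirely on which R got picked, which is exactly the trap the lambda returning a plain int walked into.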

Something’s off…

C#’s Two NULLs

I did indeed mix something up, and that is »value types« and »reference types«. int is our value type, living happily on our program stack, while string is our reference type, where the actual text data lives on the GC heap and but a pointer to it lives in our program stack. Pointer? That sounds null-y. All values of reference types were nullable, which is implemented by just storing the null pointer in them. Explicit nullability was a privilege of the value types.

When you write int?, it’s syntax sugar for Nullable<int>. Looks an awful lot like std::optional<int>, don’t it? It’s just another value type with a bool hasValue inside. You store null, that bool is false. Later, C# 8 added the nullable syntax for reference types, like string?. It makes our above code look the same for both, but it’s only sugar on top of this old design. We have null references, and we have null Nullable<T>s. This is why C# complained about return null and asked for return default(R) instead.

int? and string? have two different nulls!
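The two representations can be made visible with a small Rust sketch. The Nullable<T> struct below is a hand-rolled, hypothetical imitation of the C# type (a flag next to the value), written only for illustration. The size checks show that Rust uses both tricks as well, a flag for plain values and the null pointer for Box<T>, but hides them behind the single Option<T> type.

```rust
use std::mem::size_of;

// Hypothetical imitation of C#'s `Nullable<T>`: a flag next to the value.
// Rust's real `Option<T>` is an enum, not this struct.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Nullable<T> {
    has_value: bool,
    value: T,
}

impl<T: Default> Nullable<T> {
    // "Storing null" just means clearing the flag.
    fn null() -> Self {
        Nullable { has_value: false, value: T::default() }
    }

    fn some(value: T) -> Self {
        Nullable { has_value: true, value }
    }

    fn to_option(self) -> Option<T> {
        if self.has_value { Some(self.value) } else { None }
    }
}

fn main() {
    // "Value null": the flag needs room of its own.
    println!("i32: {}", size_of::<i32>());
    println!("Nullable<i32>: {}", size_of::<Nullable<i32>>());
    println!("Option<i32>: {}", size_of::<Option<i32>>());

    // "Reference null": the null pointer itself encodes `None`.
    println!("Box<i32>: {}", size_of::<Box<i32>>());
    println!("Option<Box<i32>>: {}", size_of::<Option<Box<i32>>>());

    println!("{:?}", Nullable::some(7).to_option());
    println!("{:?}", Nullable::<i32>::null().to_option());
}
```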

Let’s fix our code such that we return null again. To do that, we have to tell C#, whether our types are reference or value types. First, start with the old-fashioned reference types only.

public static R? AndThen<T, R>(this T? self, Func<T, R?> doThat)
where T: class
where R: class {
    if (self is {} something) {
        return doThat.Invoke(something);
    }
    else {
        return null;
    }
}

With this, our old string → string code works just fine again. How about int → int? Well, we just have to… duplicate the entirety of that function and replace class with struct. Easy, right?

public static R? AndThen<T, R>(this T? self, Func<T, R?> doThat)
where T: struct
where R: struct {
    … same …
}

Now C# complains:

// ERROR: The type 'int?' must be a reference type in order to
// use it as parameter 'T' in the generic type or method
// 'Ext.AndThen<T, R>(T?, Func<T, R?>)'
int? x_i = a_i.AndThen(i => i + 2);

What happened is that C# is too stupid to figure out to use the where T: struct overload instead of the where T: class one. Amazing. We can make this code work, at a price:

// Yes, C# is too stupid to figure that one out itself.
int? x_i = a_i.AndThen<int, int>(i => i + 2);

I hope our code doesn’t have any bugs, lest we have to fix it in two identical places. Nothing a bit of code generation can’t help with, though, right? So let’s just accept this as a fact of life and move on. int → int works-ish, huzzah, and it didn’t break string → string. Now what could possibly go wrong for, say, int → string?

public static R? AndThen<T, R>(this T? self, Func<T, R?> doThat)
where T: class
where R: struct {
    … same …
}
public static R? AndThen<T, R>(this T? self, Func<T, R?> doThat)
where T: struct
where R: class {
    … same …
}

Yes, we have 2 types involved, each one being either class or struct. So in order to fully cover all cases, we have to ~~copy-paste~~ implement the almost exact same junk 2²=4 times. If we had 3 types involved, we’d be looking at 2³=8 redundant implementations! Good thing C# doesn’t have variadic generics, right?

Of course, to rub some salt into this wound, C#’s type inference proved itself subpar once again:

// ERROR!
int? x_s = a_s.AndThen(s => int.Parse(s));

// GOOD!
int? x_s = a_s.AndThen<string, int>(s => int.Parse(s));

Great, we can now do method chaining on any T? as if ’twas a monad. We saw the implementation of .AndThen, we leave the implementation of .OrDefault as an exercise to you, the reader.

What were we doing again?

Here’s a reminder: All of this started with method chaining, structuring code like an assembly line. We’ve seen that C# has some nice syntax sugar for dealing with nulls:

x ?? default is Rust’s .unwrap_or, but it chains poorly.
x?.DoStuff() is Rust’s .map(T::do_stuff), but it can only deal with methods on x.

And we’ve already seen examples of wanting to deal with not-methods-of-x: int.Parse(string), or just i + 2.
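For reference, here is roughly what the Rust side of that comparison looks like. The function names are invented for this sketch; the methods on Option are the standard ones.

```rust
// Roughly `x?.Length ?? 0`: a method of `x`, then a fallback.
fn length_or_zero(x: Option<&str>) -> usize {
    x.map(str::len).unwrap_or(0)
}

// The not-a-method-of-`x` cases: parsing and `i + 2`.
// Closures take anything, so the chain just keeps going.
fn parsed_plus_two(x: Option<&str>) -> Option<i32> {
    x.and_then(|s| s.parse::<i32>().ok())
        .map(|i| i + 2)
}

fn main() {
    println!("{}", length_or_zero(Some("wat")));
    println!("{}", length_or_zero(None));
    println!("{:?}", parsed_plus_two(Some("7")));
    println!("{:?}", parsed_plus_two(Some("wat")));
}
```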

Well, with .AndThen we finally can! At the price that sometimes our code will not compile, unless we babysit C# and explicitly annotate the generic types. And write 4 almost identical implementations of .AndThen, for as we discovered, C#’s type system leaks implementation details of value and reference types, which show up whenever we have to deal with null. It’s easier to just break the flow.

What could have been…

So, how did Rust implement its Option<T>, combatting this 2ⁿ explosion of redundant implementations for values vs. references?

#[derive(Copy, PartialOrd, Eq, Ord, Debug, Hash)]
pub enum Option<T> {
    None,
    Some(T),
}

impl<T> Option<T> {
    pub fn and_then<U, F>(self, f: F) -> Option<U>
    where
        F: FnOnce(T) -> Option<U>,
    {
        match self {
            Some(x) => f(x),
            None => None,
        }
    }
}
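To see that this one implementation is enough, here are the four combinations from the C# section run through the standard library’s Option::and_then. The helper names are made up for this sketch, and String and i32 stand in for C#’s string and int.

```rust
// string → string
fn append_dots(s: Option<String>) -> Option<String> {
    s.and_then(|s| Some(s + "…"))
}

// int → int
fn add_two(i: Option<i32>) -> Option<i32> {
    i.and_then(|i| i.checked_add(2))
}

// string → int
fn parse(s: Option<String>) -> Option<i32> {
    s.and_then(|s| s.parse::<i32>().ok())
}

// int → string
fn show(i: Option<i32>) -> Option<String> {
    i.and_then(|i| Some(i.to_string()))
}

fn main() {
    println!("{:?}", append_dots(Some("wat".to_string())));
    println!("{:?}", add_two(Some(7)));
    println!("{:?}", parse(None));
    println!("{:?}", show(Some(7)));
}
```

No constraints on value vs. reference types, no type annotations on the calls, and parse(None) is None rather than 0.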

In my spare time i don’t do a lot of C#.

Terms used

Monad

A wrapper type around control flow structures.

if…else? Option.
loop…break? Iterator.
await…return? Future.
try…catch? Result.

Or, you know, that endofunctor thing.
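A small Rust sketch of three of those pairings, with invented function names. Future is left out because it would need an async runtime to run.

```rust
// if…else as an `Option`: either there is a result, or there isn't.
fn half_if_even(n: u32) -> Option<u32> {
    (n % 2 == 0).then(|| n / 2)
}

// loop…break as an `Iterator`: `find` stops at the first match.
fn first_square_over(limit: u32) -> Option<u32> {
    (1u32..).map(|n| n * n).find(|&square| square > limit)
}

// try…catch as a `Result`: the "catch" is just another method call.
fn parse_or_zero(text: &str) -> i32 {
    text.parse::<i32>().unwrap_or(0)
}

fn main() {
    println!("{:?}", half_if_even(8));
    println!("{:?}", first_square_over(50));
    println!("{}", parse_or_zero("twelve"));
}
```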


Evyl Blue