
Negative Overhead Abstractions

Today i’d like to coin a new term: The Negative Overhead Abstraction™. In this post we’ll briefly explore C++’s use of them, their relation to zero overhead abstractions, why they suck, and how we could do better.


The observant among you might’ve noticed me tagging this post with »memory-safety«, which is a flavour of program correctness guarantees. If you look for a definition on Wikipedia, you get:

Memory safety is the state of being protected from various software bugs and security vulnerabilities when dealing with memory access, such as buffer overflows and dangling pointers.

Memory access, eh? Let’s have a look at a common example of that: array indexing.

The overhead of indexing an array

Array index operators are a popular abstraction over manually adding offsets to addresses and then dereferencing the resulting pointers. Here’s how you do that in C++:

int touch_mem_1(const std::vector<int>& data, size_t index) {
  // Semantically equivalent to `*(data.data() + index)`,
  // the manual pointer magic we’re abstracting away.
  return data[index];
}

The above C++ code is not memory safe. We accept an arbitrary index and blindly trust that it is inside data’s bounds. But it doesn’t add any runtime overhead via, say, bounds checking, so people usually think of C++’s index operator implementation as being »zero overhead«.
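For contrast, here is a hedged sketch (the function name is invented for this post) of what those unchecked semantics look like when spelled out in today’s Rust, where the `unsafe` block makes the missing bounds check explicit:

fn touch_mem_1_unchecked(data: &[i32], index: usize) -> i32 {
  // No bounds check, no safety net: calling this with
  // `index >= data.len()` is undefined behaviour, just like
  // the C++ version above.
  unsafe { *data.get_unchecked(index) }
}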

int touch_mem_2(const std::vector<int>& data, size_t index) {
  // This performs a bounds check and throws an exception if the check
  // fails. The exception mechanism adds unwinding tables to our
  // generated code. In case you were screaming at `touch_mem_1` from
  // before: This is why i passed in a `std::vector<T>&`, not a C array.
  // Also, i haven’t written much C++ since before ranges came about,
  // so i didn’t want to risk any API mistakes in using them for this
  // blog post.
  return data.at(index);
}

This variant, on the other hand, is considered »positive overhead«. Both because adding bounds checks is a positive thing[citation needed], and because it adds overhead in code to be executed and in data added to the program binary. Before, we didn’t have to deal with exceptions and stack unwinding; now we do, even though exceptions aren’t related to the problem at hand. We assume the index is in range, that there’s data at that index to fetch for us. We didn’t ask to be able to handle the case where there is no data, so even semantically, the exception machinery adds overhead†.

Now consider this semantically equivalent code snippet in Rust:

fn touch_mem_3(data: &[i32], index: usize) -> i32 {
  data[index]
}

If we were to compile code like that with panic="abort", there would be no additional unwinding tables. This code performs a bounds check just like the C++ version using std::vector<T>::at(size_t). Unlike C++, however, it doesn’t construct an exception object to toss around; Rust instead crashes our application in a controlled manner with a panic!. Now where would this Rust code sit on the sliding scale of overheadedness? Definitely somewhere in between touch_mem_1 and touch_mem_2.
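To make that concrete, here is a rough sketch (a hypothetical helper, not what rustc literally emits) of what the indexing in touch_mem_3 boils down to under panic="abort": a bounds check, then either the element or an aborting panic, with no exception object and no unwinding machinery involved.

fn touch_mem_3_spelled_out(data: &[i32], index: usize) -> i32 {
  match data.get(index) {
    Some(&value) => value,
    // With panic="abort" this prints a message and aborts; nothing is
    // thrown, nothing is unwound.
    None => panic!(
      "index out of bounds: the len is {} but the index is {index}",
      data.len()
    ),
  }
}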

The 3 kinds of overhead

Taking the three examples above, i propose categorising them as follows:

  1. touch_mem_1 has negative overhead.
  2. touch_mem_3 has zero overhead.
  3. touch_mem_2 has positive overhead.

I think we all agree on ③ here.

I did point out that under the premise of »memory safety«, touch_mem_1 is incorrect. There’s probably an old software developer joke coming to your mind:

I optimised my software so it does the incorrect thing faster!

And really, what is runtime execution speed, or runtime memory use, or program binary size worth, if that highly efficient thing barely does, or doesn’t do at all, the thing we need it to do? Our assumption is that the index is within the bounds of data. We cannot guarantee that assumption at compile time, so we have to be sceptical. We have to verify our assumption via a bounds check. As such, i claim that having that bounds check is the real overhead baseline for a correct program. This is »zero overhead«. How we then decide to deal with the error case makes the difference between zero and positive.
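For illustration, if we did ask to handle the missing-data case, today’s Rust lets us pay for that decision explicitly through the return type instead of through exception machinery. A small sketch (the function name is invented here):

fn touch_mem_try(data: &[i32], index: usize) -> Option<i32> {
  // The error case becomes part of the signature: the caller decides
  // what a missing element means, and no unwinding tables are needed.
  data.get(index).copied()
}

Whether to pay that price is exactly the zero-versus-positive decision from above.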

So touch_mem_2 has positive overhead, because its unwinding tables, if not optimised away, are an added cost on top of the baseline logic implemented in touch_mem_3.

Now it probably also makes sense how touch_mem_1 has »negative overhead«. It removes stuff, compared to our shifted baseline touch_mem_3, though at the cost of an incorrect program. And we have many such negative overhead abstractions all around us, beyond just C++’s poor default for array indexing. Another fun example is compiling with -ffast-math, which among other things introduces unpredictable rounding errors for the sake of making your floating-point number crunching a few CPU cycles faster. Another silly example of negative overhead abstractions is feature removal: parsing a CSV file faster by using an inexact floating-point number parser, or speeding up SQLite by disabling all mutexes.

Is bounds checking always zero?

No. This time, allow me to show you some compiler output from Godbolt. I didn’t do that earlier, for without a calling context, we wouldn’t see any of the claimed unwinding table overheads. Now, we do get a calling context to consider:

pub fn magic(x: [i32; 20]) -> i32 {
  touch_mem_3(&x[..], 7)
}

Compiles to:

example::magic::h03dc881a95d1f454:
        mov     eax, dword ptr [rdi + 28]
        ret

A direct memory access, just like touch_mem_1 would’ve compiled to! So the baseline of what really is »zero overhead« can shift depending on context. In this context, the compiler could trivially prove that we never index out of bounds when calling touch_mem_3, so it was memory-safe to omit the bounds check. Our program is now as efficient as touch_mem_1, but unlike touch_mem_1, this one is correct.

Disappointingly, this trick doesn’t always work. Sometimes our index calculations are just too hard for the compiler to predict, in which case bounds checks cannot be omitted. There are programming languages, however, which allow us to guarantee, at compile time, that we never index out of bounds, regardless of which arguments the function is called with.
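As a counter-example, consider this hedged sketch (a hypothetical wrapper around the earlier magic setup): the moment the index comes from a runtime argument, the compiler can no longer prove it in bounds, and the check, plus the panic path, stays in the generated code.

pub fn magic_runtime(x: [i32; 20], index: usize) -> i32 {
  // `index` is only known at runtime, so the bounds check inside
  // `touch_mem_3` cannot be optimised away here.
  touch_mem_3(&x[..], index)
}

To get rid of that check without giving up correctness, we would have to promise something about index in its type.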

Cue »bounded integers«, a.k.a.:

Magic

I lied. This post isn’t just about coining the term »Negative Overhead Abstractions™«. It’s also about advocating for modern systems programming languages to add bounded integers.

Assume Rust got a new kind of type annotation, a special kind of refinement. For each integer type, we are now also able to specify a range in which a variable of that type is considered valid.

let x
  : i32      // base number type
  @ -3 ..= 3 // our bounds refinement
  = 9;       // uh-oh!

In this hypothetical future Rust, the above code snippet would not compile. 9_i32 has the range 9..=9, which is not a subrange of the specified -3 ..= 3, and that mismatch is a type error. And yes, this means that partially overlapping ranges are also a no-go:

let a: i32 @ 0..8 = …;
let b: i32 @ 0..3 = a;

The assignment b = a is also a type error, because part of a’s possible range, namely 3..=7, falls outside the required range 0..3. The trick now is that every operation on refined integers produces integers with different refinements, and we can track these range changes at compile time during type checking.

let a: i32 @ 0..=7 = …;
let b = a + 2;

If we were to inspect the refined type of b, we’d be told it’s i32 @ 2..=9. As expected, adding 2 also shifted our bounds up by 2, because in Rust, at least conceptually, the + operator does not wrap on overflow for i32. Wait… then what would be the type of…?

let a: u8 @ 50 ..= 250 = …;
let b = a + 100;

That’s a type error! Our expected range is now 150 ..= 350, but that’s not a subrange of what the underlying u8 type is capable of representing: 0 ..= 255! We now have a compile-time guarantee that the conceptually non-wrapping + operator does indeed never wrap. Can we make the above code work, somehow? Sure! Comparisons create »proofs« about the checked values. Inside code that »uses« such a proof, the value gets a narrower range which always satisfies the check. It sounds more complicated than it is:

let a: u8 @ 50 ..= 250 = …;

if a < 156 {
  // Within the context of this `if`, we »use« the newly created
  // »proof« that `a < 156`. Only within this `if {}` does `a` have
  // the new bounds `50 .. 156`.
  let b = a + 100;
}

Now, b has the bounds 150 .. 256, which is in range for a u8, and thus the above code type checks while still guaranteeing that + cannot overflow.
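Today’s Rust has no such refinements, but its optimiser already exploits the same idea on a best-effort basis: after an explicit comparison, bounds checks that are provably redundant are usually dropped. A hedged sketch (hypothetical function; »usually«, because nothing guarantees the optimisation):

fn sum_first_two(data: &[i32]) -> i32 {
  if data.len() >= 2 {
    // Both indices are provably below `data.len()` inside this branch,
    // so the generated code typically contains no further checks.
    data[0] + data[1]
  } else {
    0
  }
}

Bounded integers would turn that best-effort optimisation into a guarantee enforced by the type system.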

So, about arrays and slices…

fn touch_mem_4<const L: usize>(data: &[i32; L..], index: usize @ ..L) -> i32 {
  data[index]
}

I sneakily added yet more new syntax. We have fixed-length arrays [T; N] of length N. We have dynamic-length slices [T], whose actual length is known only at runtime. Now we also have ranged slices [T; MinN ..= MaxN], where the actual length is still known only at runtime, but we have a static guarantee that this runtime length is within a certain range. In that sense, the »legacy« slice syntax is sugar for [T; ..], while array syntax is equivalent to [T; N..=N].

By requiring a minimum length L known at compile time, and by restricting index to always be less than L, we now have the static guarantee that we never index out of bounds, regardless of the runtime length. Calling this function thus never emits a bounds check. The burden of bounds checking is shifted onto the caller, via »proofs« like the ones shown before.

In today’s actual Rust, i’ve established touch_mem_3 as the baseline for »zero overhead«, under the premise that you cannot have less than that in a correct (read: memory-safe) program. However, if we imagine this hypothetical »bounded Rust« a bit longer, we can do better than a runtime bounds check while still guaranteeing memory safety. In that »bounded Rust«, touch_mem_3 has positive overhead and touch_mem_4 is the new baseline zero.
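For what it’s worth, a weaker cousin of touch_mem_4 can be sketched in today’s stable Rust by pushing the bound into a type. Everything below is invented for this post; it covers only the »index below a compile-time length« half of the idea and still needs one runtime check at construction:

/// An index that is statically guaranteed to be `< N`.
#[derive(Clone, Copy)]
struct IndexBelow<const N: usize>(usize);

impl<const N: usize> IndexBelow<N> {
  /// The one and only bounds check, paid when the »proof« is created.
  fn new(i: usize) -> Option<Self> {
    (i < N).then_some(Self(i))
  }
}

/// No bounds check at the use site: the type of `index` is the proof.
fn touch_mem_4_ish<const N: usize>(data: &[i32; N], index: IndexBelow<N>) -> i32 {
  // SAFETY: `IndexBelow::<N>::new` only ever produces values `< N`,
  // which is exactly `data`’s length.
  unsafe { *data.get_unchecked(index.0) }
}

Unlike the hypothetical touch_mem_4, the check has not disappeared, it has merely moved to wherever an IndexBelow is constructed; the indexing itself is check-free and still memory-safe.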

Conclusion

touch_mem_1 still sucks.

int touch_mem_5(const std::vector<int>& data, size_t index) {
  assert(data.size() > index);
  return data[index];
}

Better.


Update: Cap’n Proto

I’ve been pointed at a blog post from 2015 in which the developers of the Cap’n Proto format wrote about their use of bounded integers to make security vulnerabilities based on overflowing pointer arithmetic impossible. To quote the article:

I want my compiler to refuse to compile arithmetic which might overflow.

They desire this ability to refuse out-of-bounds arithmetic so much that they ended up using C++ template magic to implement bounded integers. With enough dedication and nightly features enabled, you can do the same in generic Rust. Walls of complicated and verbose code later, you do arrive at arithmetic code that… is horrendous to use, but at least it’s free of bugs.

With that said, it does not seem practical to change every integer throughout the Cap’n Proto codebase to use Guarded [their templated half-bounded integer type]. Using it in the API would create too much confusion and cognitive overhead for users, and would force application code to be more verbose. Therefore, this approach unfortunately will not be able to find all integer overflows throughout the entire library. […]

We absolutely need bounded integers in modern systems programming languages. We have, right here, an example where real-life program correctness is willingly sacrificed, for the tools we’re given today are just that awkward to use. We should strive to make unchecked arithmetic a negative overhead abstraction in systems programming.

Need more convincing? Then how about that one time an uncaught integer division by zero turned a US Navy warship into a ghost ship…

Just steal from Wuffs, k thx bai. ☺
