Negative Overhead Abstractions
Today i’d like to coin a new term: The Negative Overhead Abstraction™. In this post we’ll briefly explore C++’s use of such abstractions, their relation to zero overhead abstractions, why they suck, and how we could do better.
The observant among you might’ve noticed me tagging this post with »memory-safety«, which is a flavour of program correctness guarantees. If you look for a definition on Wikipedia, you get:
Memory access, eh? Let’s have a look at a common example of that: array indexing.
The overhead of indexing an array
Array index operators are a popular abstraction over manually adding offsets to addresses to then dereference some pointers. Here’s how you do that in C++:
```cpp
int touch_mem_1(std::vector<int> const& data, std::size_t index)
{
    return data[index]; // no bounds check: blindly trusts the index
}
```
The above C++ code is not memory safe. We accept any arbitrary index and blindly trust it to be inside `data`’s bounds. But it doesn’t add any runtime overhead via, say, bounds checking, so people usually think of C++’s index operator implementation as being »zero overhead«.
```cpp
int touch_mem_2(std::vector<int> const& data, std::size_t index)
{
    return data.at(index); // bounds-checked: throws std::out_of_range
}
```
This variant, on the other hand, is considered »positive overhead«. Both because adding bounds checks is a positive thing[citation needed], and because it adds overhead in the code to be executed and in the data added to the program binary. Before, we didn’t have to deal with exceptions and stack unwinding; now we do, even though exceptions aren’t related to the problem at hand. We assume the index is in range, that there’s data at that index to fetch for us. We didn’t ask to be able to handle the case where there is no data, so even semantically, the exception machinery adds overhead†.
Now consider this semantically equivalent code snippet in Rust:
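```rust
// Semantically the same as the C++ versions above: take a slice and an
// index, and fetch the element. The [] operator bounds-checks at runtime
// and panics if the index is out of range. (The exact signature is an
// assumption here, chosen to mirror touch_mem_1 and touch_mem_2.)
pub fn touch_mem_3(data: &[i32], index: usize) -> i32 {
    data[index]
}
```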
If we were to compile code like that with `panic="abort"`, there would be no additional unwinding tables. This code performs a bounds check just like C++ using `std::vector<T>::at(size_t)`. Unlike C++, however, it doesn’t construct an exception object to toss around. Rust instead crashes our application in a controlled manner with a `panic!`. Now where would this Rust code sit on the sliding scale of overheadedness? Definitely somewhere in between `touch_mem_1` and `touch_mem_2`.
The 3 kinds of overhead
Taking these three examples above, i propose categorising them as such:
1. `touch_mem_1` has negative overhead.
2. `touch_mem_3` has zero overhead.
3. `touch_mem_2` has positive overhead.
I think we all agree on ③ here.
I did point out that under the premise of »memory safety«, `touch_mem_1` is incorrect. There’s probably an old software developer joke coming to your mind:
> I optimised my software so it does the incorrect thing faster!
And really, what are runtime execution speed, runtime memory use, or program binary size worth, if that highly efficient thing is only efficient at barely doing, or not doing, the thing we need it to? Our assumption is that the index is inside `data`’s bounds. We cannot guarantee that assumption at compile time, so we have to be sceptical. We have to verify our assumption via a bounds check. As such, i claim that having that bounds check is the real overhead baseline for a correct program. This is »zero overhead«. How we then decide to deal with the error case makes the difference between zero and positive.
So `touch_mem_2` has positive overhead: its unwind tables, if not optimised away, are an added cost on top of the baseline logic implemented in `touch_mem_3`.
Now it probably also makes sense why `touch_mem_1` has »negative overhead«. It removes stuff, compared to our shifted baseline `touch_mem_3`, though at the cost of an incorrect program. And we have many such negative overhead abstractions all around us, beyond just C++’s poor default for array indexing. Another fun example is compiling with `-ffast-math`, which among other things introduces unpredictable rounding errors for the sake of making your floating-point number crunching a few CPU cycles faster. Another silly example of negative overhead abstractions is feature removal: parsing a CSV file faster by using an inexact floating-point number parser, or speeding up SQLite by disabling all mutexes.
Is bounds checking always zero?
No. This time, allow me to show you some compiler output from Godbolt. I didn’t do that earlier, for without a calling context, we wouldn’t see any of the claimed unwinding table overheads. Now, we do get a calling context to consider:
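```rust
// A calling context with a constant, provably in-bounds index: 7 into an
// array of 8 elements, matching the 28-byte offset (7 × 4) in the assembly
// below. The exact numbers are an assumption; any statically in-bounds
// index would do.
pub fn magic(data: &[i32; 8]) -> i32 {
    touch_mem_3(data, 7)
}
```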
Compiles to:
```asm
example::magic::h03dc881a95d1f454:
        mov     eax, dword ptr [rdi + 28]
        ret
```
A direct memory access, like `touch_mem_1` would’ve compiled to! So the baseline of what really is »zero overhead« can shift depending on context. In this context, the compiler could trivially prove that we never index out of bounds when calling `touch_mem_3`, so it was memory-safe to omit the bounds check. Our program is now as efficient as `touch_mem_1`, but unlike `touch_mem_1`, it is also correct.
Disappointingly, this trick doesn’t always work. Sometimes our index calculations are just too hard for the compiler to predict, in which case bounds checks cannot be omitted. There are programming languages, however, which allow us to guarantee, at compile time, that we never index out of bounds, regardless of which arguments the function is called with.
Cue »bounded integers«, a.k.a.:
Magic
I lied. This post isn’t just about coining the term »Negative Overhead Abstractions™«. It’s also about advocating for modern systems programming languages to add bounded integers.
Assume Rust got a new kind of type annotation, a special kind of refinement. For each integer type, we are now also capable of specifying a range in which a variable of that type is considered valid.
```rust
let x
  : i32        // base number type
  @ -3 ..= 3   // our bounds refinement
  = 9;         // uh-oh!
```
In this hypothetical future Rust, the above code snippet would not compile. `9_i32` has the range `9..=9`, which is not a subrange of the specified `-3 ..= 3`, so it’s a type error. And yes, this means that partially overlapping ranges are also a no-go:
```rust
let a: i32 @ 0..8 = …;
let b: i32 @ 0..3 = a;
```
The assignment `b = a` is also a type error, because part of `a`’s possible range, `3..=7`, falls outside the required range `0..3`. The trick now is that every operation on refined integers produces integers with different refinements, and we can track these range changes at compile time during type checking.
```rust
let a: i32 @ 0..=7 = …;
let b = a + 2;
```
If we were to inspect the refined type of `b`, we’d be told it’s `i32 @ 2..=9`. As expected, adding 2 also shifted our bounds up by 2, because in Rust, at least conceptually, the `+` operator is not wrapping on overflow for `i32`. Wait… then what would be the type of…?
```rust
let a: u8 @ 50 ..= 250 = …;
let b = a + 100;
```
That’s a type error! The expected range is now `150 ..= 350`, but that’s not a subrange of what the underlying `u8` type is capable of representing: `0 ..= 255`! We now have a compile-time guarantee that the conceptually non-wrapping `+` operator does indeed never wrap. Can we make the above code work, somehow? Sure! Checks by comparison create »proofs« about the checked values. If we then use such a »proof«, we’re considering a different range, one which always satisfies our initial check. Sounds more complicated than it is:
```rust
let a: u8 @ 50 ..= 250 = …;
if a < 156 {
    let b = a + 100; // in this branch, a is refined to 50 ..= 155
}
```
Now, `b` has the bounds `150 .. 256`, which is in range for a `u8`, and thus the above code type checks while still guaranteeing that `+` cannot overflow.
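For contrast, today’s real Rust has to establish the same guarantee at runtime, and we have to handle the error case ourselves. A rough sketch of the equivalent (the helper name `add_100` is made up):

```rust
// Today's Rust: the type of `a` records nothing about its range, so the
// comparison happens at runtime and the error case needs explicit handling.
fn add_100(a: u8) -> Option<u8> {
    if a < 156 {
        Some(a + 100) // cannot overflow here, but only we know why
    } else {
        None
    }
}
```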
So, about arrays and slices…
I sneakily added yet more new syntax. We have fixed-length arrays `[T; N]` of length `N`. We have dynamic-length slices `[T]`, whose actual length is known only at runtime. Now we also have ranged slices `[T; MinN ..= MaxN]`, where the actual length is still known only at runtime, but we now have the static guarantee that this runtime length is within a certain range. In that sense, the »legacy« slice syntax is sugar for `[T; ..]`, while array syntax is equivalent to `[T; N..=N]`.
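With that, we can write an indexing function that is fully checked at compile time. The exact spelling below is of course made up, but it combines the ranged slices and bounded integers from above; let’s call it `touch_mem_4`:

```rust
// Hypothetical »bounded Rust«: data is at least L elements long, and
// index is statically guaranteed to be less than L.
fn touch_mem_4<const L: usize>(data: &[i32; L ..], index: usize @ .. L) -> i32 {
    data[index] // provably in bounds, so no runtime check is needed
}
```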
By requiring a minimum length `L` known at compile time, and by restricting `index` to always be less than `L`, we now have the static guarantee that we never index out of bounds, regardless of the runtime length. Calling this function thus never emits a bounds check. The burden of bounds-checking is put onto you, through »proofs« like the one shown before.
In today’s actual Rust, i’ve established `touch_mem_3` as the baseline for »zero overhead«, under the premise that you cannot have less than that in a correct (read: memory-safe) program. However, if we imagine this hypothetical »bounded Rust« for a bit longer, we can do better than a runtime bounds check while still guaranteeing memory safety. In that »bounded Rust«, `touch_mem_3` has positive overhead and `touch_mem_4` is the new baseline zero.
Conclusion
`touch_mem_1` still sucks.

```cpp
int touch_mem_2(std::vector<int> const& data, std::size_t index)
{
    return data.at(index); // bounds-checked, hence correct
}
```

Better.
Update: Cap’n Proto
I’ve been pointed at a blog post from 2015 in which the developers of the Cap’n Proto format wrote about their use of bounded integers to make security vulnerabilities based on overflowing pointer arithmetic impossible. To quote the article:
They desire this capability of refusing out-of-bounds arithmetic so much that they ended up using C++ template magic as an implementation of bounded integers. With enough dedication and enabled nightly features, you can do the same in generic Rust. Walls of complicated and verbose code later, you do arrive at arithmetic code that… is horrendous to use, but at least it’s free of bugs.
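To give a taste of the idea, here is a rough sketch of such a wrapper in today’s stable Rust; all names are made up, and tracking how bounds change through arithmetic, like Cap’n Proto’s templates do, additionally needs nightly features such as `generic_const_exprs` and gets verbose fast:

```rust
// A bounded integer as a const-generic wrapper: construction is the only
// place a runtime check happens, and widening the bounds is verified at
// compile time, costing nothing at runtime.
#[derive(Clone, Copy, Debug)]
struct Bounded<const MIN: u64, const MAX: u64>(u64);

impl<const MIN: u64, const MAX: u64> Bounded<MIN, MAX> {
    fn new(value: u64) -> Option<Self> {
        (MIN..=MAX).contains(&value).then_some(Self(value))
    }

    fn get(self) -> u64 {
        self.0
    }

    /// Loosening the bounds is always sound, so this conversion is free.
    fn widen<const MIN2: u64, const MAX2: u64>(self) -> Bounded<MIN2, MAX2> {
        const { assert!(MIN2 <= MIN && MAX <= MAX2) }; // checked per instantiation
        Bounded(self.0)
    }
}
```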
We absolutely need bounded integers in modern systems programming languages. We have, right here, an example where real-life program correctness is willingly sacrificed, for the tools we’re given today are just that awkward to use. We should strive to make unchecked arithmetic a negative overhead abstraction in systems programming.
Need more convincing? Then how about that one time an uncaught integer division by zero turned a US battleship into a ghost ship…
Just steal from Wuffs, k thx bai. ☺