MetalRaft: When Zero-Cost Abstractions Become Expensive

“In this project, we ONLY use generic parameters. No dynamic dispatching! ’Cause we care a lot about performance!”, said the architect.

At first glance, this position appears entirely defensible. Zero-cost abstractions are one of Rust’s signature features, and avoiding unnecessary indirection is often the correct engineering choice. Yet I knew from the start that such blanket statements always hide surprises. In particular, the statement does not address whether “zero-cost” applies beyond runtime performance, nor how the trade-offs evolve as a system grows in scope and abstraction depth.

The Experiment Setup

Months later, when I deliberately set out to explore multiple architectural variations of a well-known and sufficiently complex system in Rust, the statement came back to mind, and I decided to build this project in full adherence to it. The intent was clear: show that once a project reaches a certain size, the principle of diminishing returns kicks in hard, and that by the time a second threshold is crossed, the approach reaches its natural limits and shows its true colours. I intended to find these two thresholds.

The system in question is Raft; the inspiration came from a famous MIT course, but instead of stopping at a minimal implementation, I intentionally wanted to build a fully-fledged Raft node, targeting no-std and Embassy, to validate correctness, portability, and architectural sustainability. I also intended to hit additional targets, such as:

  • infrastructure-free validation

  • adversarial and imperative testing

  • no-std viability

  • shared core implementation

  • plug-in architecture for the storage and transport layers

Additionally, I wanted to be able to build two radically different test applications, both based on the same shared core implementation of the algorithm:

  1. A Raft mini cluster made up of 5 nodes relying on the following infrastructure: strict no-std, Embassy executor, UDP as transport mechanism, semihosting-based storage, and QEMU simulator

  2. A fully-fledged cluster running in the cloud, implemented with an infrastructure to be defined later

The very first fundamental block of this experiment was the implementation of the core algorithm in a strictly controlled environment, providing almost no infrastructural facilities. In Rust, it is possible to set up such an environment in a clear-cut way by NOT adding any dependencies, and by making extensive use of traits with associated types. This makes the system abstract, infrastructure-agnostic, and ultimately bare.
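As a rough illustration of the technique (the trait and type names below are invented for this example, not the project’s actual API), a storage trait with an associated payload type lets the core be written against capabilities alone:

```rust
// Illustrative sketch: an infrastructure-agnostic storage trait with an
// associated Payload type. The core names only the trait, never a
// concrete technology.
pub trait Storage {
    type Payload: Clone;
    /// Append an entry to the persistent log.
    fn append(&mut self, entry: Self::Payload);
    /// Number of entries currently stored.
    fn len(&self) -> usize;
}

// The core is generic over Storage and stays technology-agnostic.
pub struct Core<S: Storage> {
    storage: S,
}

impl<S: Storage> Core<S> {
    pub fn new(storage: S) -> Self {
        Self { storage }
    }
    pub fn record(&mut self, entry: S::Payload) {
        self.storage.append(entry);
    }
    pub fn log_len(&self) -> usize {
        self.storage.len()
    }
}

// A dependency-free, Vec-backed instantiation, usable in tests.
pub struct VecStorage<P>(pub Vec<P>);

impl<P: Clone> Storage for VecStorage<P> {
    type Payload = P;
    fn append(&mut self, entry: P) {
        self.0.push(entry);
    }
    fn len(&self) -> usize {
        self.0.len()
    }
}
```

Any technology-specific implementation, from semihosting to flash, then slots in as a type parameter with no change to the core.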

The second pillar of the approach was to create the most convenient test bed for such a core implementation. Given the nature of the core algorithm, the most convenient and technology-free instantiation of the traits consisted of memory-based stubs: in-memory storage and cache built with hash maps, in-memory transport built with channels, collections built with simple vectors and sorted hash maps. Raft also involves time-triggered events; to have full control over them, the time service had to be mocked as well.
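A mocked time service can be as small as a manually advanced clock. The sketch below is illustrative (the `Clock` trait and `MockClock` names are assumptions, not the project’s API): the test owns the clock and advances it deterministically, so timeouts fire exactly when the test says so.

```rust
// Illustrative mock of a time service: time only moves when the test
// advances it, making timeout-driven behaviour fully deterministic.
pub trait Clock {
    fn now_ms(&self) -> u64;
}

#[derive(Default)]
pub struct MockClock {
    now: std::cell::Cell<u64>,
}

impl MockClock {
    /// Advance mocked time; in a test, this is what "fires" timeouts.
    pub fn advance(&self, delta_ms: u64) {
        self.now.set(self.now.get() + delta_ms);
    }
}

impl Clock for MockClock {
    fn now_ms(&self) -> u64 {
        self.now.get()
    }
}

/// Example consumer: has the election timeout elapsed?
pub fn election_timeout_elapsed<C: Clock>(clock: &C, started_ms: u64, timeout_ms: u64) -> bool {
    clock.now_ms().saturating_sub(started_ms) >= timeout_ms
}
```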

With this in place, the two-pronged attack could start, alternating between adding features to the core and creating imperative tests for them. The first target was the election functionality, which could be implemented in a reasonable amount of time and without too many surprises. The obvious second target was the so-called log replication. This required a significantly higher number of iterations and more complexity, especially to programmatically test error scenarios such as network partitioning.
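To give an idea of how error scenarios such as partitions can be tested programmatically, here is a hypothetical in-memory transport that silently drops traffic to and from partitioned nodes (names and shapes invented for the example, not the project’s code):

```rust
use std::collections::HashSet;

// Hypothetical partitionable in-memory transport: a test cuts a node off,
// observes the cluster misbehave, then heals the partition.
pub struct PartitionableTransport {
    partitioned: HashSet<u64>,
    pub delivered: Vec<(u64, u64, String)>, // (from, to, message)
}

impl PartitionableTransport {
    pub fn new() -> Self {
        Self { partitioned: HashSet::new(), delivered: Vec::new() }
    }

    /// Cut a node off from the rest of the cluster.
    pub fn partition(&mut self, node: u64) {
        self.partitioned.insert(node);
    }

    /// Reconnect a previously partitioned node.
    pub fn heal(&mut self, node: u64) {
        self.partitioned.remove(&node);
    }

    /// Deliver unless either endpoint is behind the partition.
    pub fn send(&mut self, from: u64, to: u64, msg: String) {
        if !self.partitioned.contains(&from) && !self.partitioned.contains(&to) {
            self.delivered.push((from, to, msg));
        }
    }
}
```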

First Threshold: 6 Generic Parameters

By the time this was completed, my code base looked compact, well-organized, and clean; moreover, I had a consistent body of stable integration tests that could be run in less than half a second. Nonetheless, my “raft_node” already had 6 generic parameters, its instantiation was verbose, and the methods were somewhat burdened by trait bounds. 

pub struct RaftNode<T, S, P, SM, TS, O>
where
    P: Clone,
    T: Transport<Payload = P>,
    S: Storage<Payload = P>,
    SM: StateMachine<Payload = P>,
    TS: TimerService,
    O: Observer<Payload = P>,
{...}

In summary, by the time I had implemented the basic Raft features, the architecture was starting to display the first cracks. That is why I would put the number of generic parameters that can reasonably be handled in a single struct at 6. This is the first threshold I was looking for.

The next step was to incarnate the basic core Raft implementation into an unforgiving, strict, and unusual environment: embedded, no-std with Embassy as runtime. I decided to implement this in three steps:

  1. volatile storage and channel-based transport

  2. volatile storage and UDP

  3. semihosting and UDP

My oath was that, should I find any issue during development, I would first reproduce it programmatically in my test harness, then fix the core algorithm accordingly, and only then resume the implementation.

As expected, I uncovered a few issues, fixed them as promised, and managed to get a mini cluster made up of 5 Raft nodes up and running on a microcontroller simulator with reasonably real storage and transport layer implementations, all while keeping the definition of the algorithm abstract, technology-agnostic, and “unaware” of any platform constraint or trade-off. In other words, the invariants held, the portability was real, and the approach worked.

For the next phase, I was presented with multiple options:

  1. Incarnate the algorithm in another platform, for example, a cloud-based one

  2. Proceed with the implementation of (some of) the so-called advanced features: log compaction, snapshots, dynamic membership, linearizable reads, pre-vote, joint consensus, and leadership transfer

Given that, from a software engineering standpoint, while being prototype-centered, I am more an abstractionist than a technologist, I opted for the second option. So, from that moment on, my development cycle for each new feature would be three-pronged: add the feature to the core, expand the body of tests to establish correctness as before, and double-check that the Embassy app could still run.

Over the course of this phase, reality hit, and it hit hard. Each new advanced feature led to the expansion of the generic argument array and to significant refactoring of the existing tests due to the widening signatures. Some one-line methods had a ten-line definition due to the extensive trait-bound list. The Embassy application needed constant tweaking after every new generic parameter was added. The validation still held, the protocol was intact, all the tests could be run in less than a second, more and more error scenarios could be tested programmatically, and the overall structure was still in place and reliable. Yet something was off.
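To make the burden concrete, here is a hypothetical, cut-down illustration (not the project’s code) of a one-line method whose definition is dwarfed by the bounds it must restate:

```rust
use std::marker::PhantomData;

// Minimal trait stand-ins for the illustration.
pub trait Transport { type Payload; fn node_id(&self) -> u64; }
pub trait Storage { type Payload; }
pub trait StateMachine { type Payload; }
pub trait TimerService {}

pub struct RaftNode<T, S, P, SM, TS> {
    transport: T,
    storage: S,
    state_machine: SM,
    timers: TS,
    _payload: PhantomData<P>,
}

impl<T, S, P, SM, TS> RaftNode<T, S, P, SM, TS>
where
    P: Clone,
    T: Transport<Payload = P>,
    S: Storage<Payload = P>,
    SM: StateMachine<Payload = P>,
    TS: TimerService,
{
    pub fn new(transport: T, storage: S, state_machine: SM, timers: TS) -> Self {
        Self { transport, storage, state_machine, timers, _payload: PhantomData }
    }

    // One line of logic behind six lines of trait bounds.
    pub fn id(&self) -> u64 {
        self.transport.node_id()
    }
}

// Dummy instantiations, just to show how verbose call sites become.
pub struct StubTransport;
impl Transport for StubTransport { type Payload = u32; fn node_id(&self) -> u64 { 7 } }
pub struct StubStorage;
impl Storage for StubStorage { type Payload = u32; }
pub struct StubSm;
impl StateMachine for StubSm { type Payload = u32; }
pub struct StubTimers;
impl TimerService for StubTimers {}
```

With 5 parameters this is tolerable; the text above describes what happens when the same ceremony is repeated for 12.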

Hitting the Wall: Advanced Features & 12 Generics

By the time I reached approximately 70% of the advanced features, my “raft_node” had ballooned to 12 generic parameters and was barely under control. The code was clean and decently structured, but cryptic and no longer really readable. My development velocity suffered a double hit from the generic parameter explosion and the growing complexity of the features. Some would have said that the cognitive load had outweighed the (possible) runtime performance benefits. The zero-cost abstractions had become expensive, to the point of being liabilities, and the bill had to be paid in different currencies: development speed, code readability, adaptability, and overall ease of use of the abstractions. I was losing control, the supposedly pure approach was now clearly showing its sins, and guilt was flowing through my veins.

pub struct RaftNode<T, S, P, SM, C, L, CC, M, TS, O, CCC, CLK>
where
    P: Clone,
    T: Transport<Payload = P, LogEntries = L, ChunkCollection = CC>,
    S: Storage<Payload = P>,
    SM: StateMachine<Payload = P>,
    C: NodeCollection,
    L: LogEntryCollection<Payload = P>,
    CC: ChunkCollection + Clone,
    M: MapCollection,
    TS: TimerService,
    O: Observer<Payload = P, LogEntries = L, ChunkCollection = CC>,
    CCC: ConfigChangeCollection,
    CLK: Clock,
{...}

There was more.

The Missing Piece of the Puzzle: The Execution Layer

My gut feeling had been whispering in my ear that something else was wrong, missing, or implicit in my code base. The complexity was growing along another dimension too, and my refactoring attempts had only superficial and inconclusive effects. A few discussions with ChatGPT, Claude, and Grok, along with a long walk in the woods, allowed me to expose and formalize the issue: a fundamental abstraction was missing. The glue between the algorithm, the infrastructure, and the outside world: the execution layer. It was implicit, smeared across multiple locations, spreading through the infrastructural abstractions, and yelling at me that it deserved its rightful place in the code.

impl<C, L, N> RaftAlgorithm<C, L, N> {
    pub fn on_message(&mut self, msg: Message) -> Vec<Action> {
        // ...
    }
}

impl<S, T, TS> ExecutionLayer<S, T, TS> {
    pub fn execute(&mut self, actions: Vec<Action>) {
        for action in actions {
            match action {
                Action::SendMessage(to, msg) => self.transport.send(to, msg),
                Action::StartTimer(duration) => self.timers.start(duration),
                Action::WriteLog(entry) => self.storage.append(entry),
            }
        }
    }
}
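The value of the split is that the algorithm becomes a pure function from inputs to actions, testable without touching any I/O. Here is a self-contained, hypothetical sketch of the pattern (all names invented for the example):

```rust
// The algorithm emits Actions; it never touches storage, transport, or timers.
#[derive(Debug, PartialEq)]
pub enum Action {
    SendMessage(u64, String),
    StartTimer(u64),
    WriteLog(String),
}

pub struct Algorithm;

impl Algorithm {
    // Pure decision logic: a heartbeat timeout becomes a batch of actions.
    pub fn on_heartbeat_timeout(&mut self, peers: &[u64]) -> Vec<Action> {
        let mut actions: Vec<Action> = peers
            .iter()
            .map(|&p| Action::SendMessage(p, "AppendEntries".into()))
            .collect();
        actions.push(Action::StartTimer(50));
        actions
    }
}

// The execution layer owns the infrastructure; here, recording stand-ins.
pub struct ExecutionLayer {
    pub sent: Vec<(u64, String)>,
    pub timers_started: Vec<u64>,
}

impl ExecutionLayer {
    pub fn execute(&mut self, actions: Vec<Action>) {
        for action in actions {
            match action {
                Action::SendMessage(to, msg) => self.sent.push((to, msg)),
                Action::StartTimer(d) => self.timers_started.push(d),
                Action::WriteLog(_) => {}
            }
        }
    }
}
```

Swapping the recording stand-ins for UDP, semihosting, or cloud services changes the execution layer only; the algorithm remains byte-for-byte identical.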

This was the pivotal moment of the project. Although all the targets had been hit, there were four possible courses of action on the table:

  1. Extract the execution layer, refactor the codebase accordingly, add the dreaded additional generic parameters, and reassess the situation

  2. Set aside my abstractionist tendencies and wear the hat of the technologist to implement the cloud-based incarnation of the system

  3. Acknowledge that the limit of the religiously pure generic approach had been reached and crossed and that the codebase needed to repent; design a more flexible version of the system with a well-thought-out mix of generics and dynamic dispatch, and refactor the code accordingly

  4. 100% muscle. 0% brain: Bulldoze through the issues and finalize the implementation, no matter what

I chose option 5: I decided to stop. In my mind, there was no more knowledge to harvest from this POC. Not, however, before documenting all the positive findings, how to fix the open issues, and which features were still missing. As for a cloud deployment, it would have added surface area but no new insight; at that point, the work would have been largely mechanical rather than exploratory.
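For completeness, the hybrid design of option 3 could look roughly like this (a speculative sketch, not code from the repository): keep generics on the hot, payload-carrying path and put coarse-grained, rarely-called services behind `dyn Trait`:

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Coarse-grained services: called rarely, so a vtable hop is cheap.
pub trait TimerService {
    fn start(&mut self, duration_ms: u64);
}
pub trait Observer {
    fn on_event(&mut self, event: &str);
}

// Hypothetical hybrid node: storage stays monomorphized (hot path),
// timers and observer are boxed, trimming the generic parameter list.
pub struct HybridNode<S> {
    storage: S,
    timers: Box<dyn TimerService>,
    observer: Box<dyn Observer>,
}

impl<S> HybridNode<S> {
    pub fn new(storage: S, timers: Box<dyn TimerService>, observer: Box<dyn Observer>) -> Self {
        Self { storage, timers, observer }
    }

    pub fn on_election_timeout(&mut self) {
        self.observer.on_event("election started");
        self.timers.start(150);
    }
}

// Recording stubs that share their state with the test via Rc<RefCell<..>>.
pub struct RecTimers(pub Rc<RefCell<Vec<u64>>>);
impl TimerService for RecTimers {
    fn start(&mut self, d: u64) { self.0.borrow_mut().push(d); }
}
pub struct RecObserver(pub Rc<RefCell<Vec<String>>>);
impl Observer for RecObserver {
    fn on_event(&mut self, e: &str) { self.0.borrow_mut().push(e.to_string()); }
}
```

Each boxed service removes one generic parameter from every signature that mentions the node, at the cost of one indirect call on a path that is not performance-critical.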

Despite the limits reached, the project delivered a genuinely correct, portable, no_std-compatible Raft core with adversarial testing — knowledge that will accelerate the next greenfield effort already taking shape.

Conclusion: No Free Lunch

Zero-cost abstractions are one of Rust’s signature strengths, and they enable remarkable performance and correctness guarantees. But they are not free. When pushed far enough, the cost shifts from runtime to cognitive load, architectural rigidity, and team scalability. You have been warned!

“Absolutely NOT! In this project, we ONLY use dynamic dispatching. No generics! ‘Cause we care a lot about testability!”, yelled the other architect.

Links & Credits:

The Raft paper, “In Search of an Understandable Consensus Algorithm” (Ongaro & Ousterhout): https://raft.github.io/raft.pdf

Code: https://github.com/umbgtt10/metal_raft
