If a decision can be made before the application even starts, it must never be evaluated at runtime. In ultra-low-latency HFT, runtime polymorphism (the cornerstone of object-oriented generic design) introduces a hidden tax through dynamic dispatch. Let's force the C++ compiler to evaluate everything it can beforehand.
Consider a classic trading engine architecture supporting multiple exchanges (CME, NASDAQ, EUREX). The standard OOP approach is to create a generic interface:
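#include <cstdint>
class IExchangeHandler {
public:
    virtual ~IExchangeHandler() = default;  // required when deleting through a base pointer
    virtual void on_order_filled(std::uint64_t id, double price) = 0;
};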
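Each exchange-specific handler implements this interface. When a fill arrives, the engine calls handler->on_order_filled(...) through an IExchangeHandler pointer, without knowing which concrete exchange sits behind it.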
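As a baseline, here is a minimal runnable sketch of that virtual design. `CmeHandler`, `route_fill`, and `virtual_dispatch_demo` are hypothetical names introduced only for illustration; the handler merely records the last fill it saw.

```cpp
#include <cassert>
#include <cstdint>

// The interface from the text: one pure virtual fill callback.
class IExchangeHandler {
public:
    virtual ~IExchangeHandler() = default;
    virtual void on_order_filled(std::uint64_t id, double price) = 0;
};

// Hypothetical concrete handler; it only records the fill.
class CmeHandler final : public IExchangeHandler {
public:
    void on_order_filled(std::uint64_t id, double price) override {
        last_id = id;
        last_price = price;
        ++fills;
    }
    std::uint64_t last_id = 0;
    double last_price = 0.0;
    std::uint64_t fills = 0;
};

// The engine only sees the interface, so this call goes through the vtable
// unless the optimizer can prove the object's dynamic type.
void route_fill(IExchangeHandler& handler, std::uint64_t id, double price) {
    handler.on_order_filled(id, price);
}

// Usage sketch: the concrete type is hidden behind a base-class reference.
void virtual_dispatch_demo() {
    CmeHandler cme;
    IExchangeHandler& generic = cme;
    route_fill(generic, 12345, 150.25);
    assert(cme.fills == 1 && cme.last_id == 12345);
}
```

In a toy program like this, the optimizer may devirtualize the call because it can see the concrete type. In a real engine, where the handler is chosen from configuration at startup, it generally cannot, and every fill goes through the vtable.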
The Penalty of the vptr
Whenever a class declares a `virtual` function, the compiler injects a hidden pointer (`vptr`) into every object of that class. It points to the class's Virtual Method Table (`vtable`), a per-class array of function pointers kept in read-only data. At runtime, each call through the interface makes the CPU:
1. Load the `vptr` from the object's memory.
2. Load the function pointer for `on_order_filled` from its fixed slot in the `vtable`.
3. Execute an indirect call to the address it just loaded.
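Those three steps can be modeled by hand. This is a conceptual sketch, not the literal layout any compiler emits (real layouts are ABI-specific, e.g. the Itanium C++ ABI used by GCC and Clang on Linux); `VTable`, `Handler`, `kNasdaqVTable`, and `dispatch` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

struct Handler;  // forward declaration so the table can name it

// Hypothetical vtable: one function-pointer slot per virtual function.
struct VTable {
    void (*on_order_filled)(Handler* self, std::uint64_t id, double price);
};

// Hypothetical object: the first member plays the role of the hidden vptr.
struct Handler {
    const VTable* vptr;
    std::uint64_t fills = 0;
};

void nasdaq_on_order_filled(Handler* self, std::uint64_t /*id*/, double /*price*/) {
    ++self->fills;
}

// One table per "class", shared by every instance and never modified.
constexpr VTable kNasdaqVTable{&nasdaq_on_order_filled};

void dispatch(Handler* h, std::uint64_t id, double price) {
    const VTable* table = h->vptr;       // step 1: load the vptr from the object
    auto fn = table->on_order_filled;    // step 2: load the function pointer from its slot
    fn(h, id, price);                    // step 3: indirect call through that pointer
}

// Usage sketch: an object wired to the Nasdaq table.
void vtable_demo() {
    Handler h{&kNasdaqVTable};
    dispatch(&h, 12345, 150.25);
    assert(h.fills == 1);
}
```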
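An indirect call is hostile to your CPU's instruction pipeline. The target address is not known until the `vtable` load completes, so the core must guess it with its indirect branch predictor and keep executing speculatively. When the guess is wrong (for example, when one call site alternates between CME and NASDAQ handlers), the pipeline is flushed, typically costing on the order of 15 to 20 cycles, and a cold `vtable` or object cache line adds a memory stall on top. Just as important, the compiler cannot inline through a virtual call, so every optimization that inlining would unlock is lost. On a hot path measured in nanoseconds, you cannot afford this tax.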
We want the structural benefits of generic interfaces but refuse to pay the virtual dispatch cost. The solution is the Curiously Recurring Template Pattern (CRTP).
CRTP resolves polymorphism entirely at compile time, replacing virtual dispatch with static dispatch through template inheritance. The base class takes its own derived class as a template parameter!
#include <cstdint>
// The Base interface is templated on the specific Derived class
template <typename Derived>
class ExchangeHandlerBase {
public:
    inline void dispatch_order_fill(std::uint64_t id, double price) noexcept {
        // static_cast: the compiler resolves this downcast from the template
        // parameter at compile time. Zero runtime cost!
        static_cast<Derived*>(this)->on_order_filled_impl(id, price);
    }
};
// Nasdaq explicitly inherits from the Base, passing ITSELF as the template argument!
class NasdaqHandler : public ExchangeHandlerBase<NasdaqHandler> {
public:
    // Note: NOT virtual! A normal, inlinable member function.
    inline void on_order_filled_impl(std::uint64_t id, double price) noexcept {
        // Execute Nasdaq-specific binary logic here
    }
};
// Hot-path logic compiled strictly against the template.
template <typename HandlerType>
inline void engine_event_loop(ExchangeHandlerBase<HandlerType>& handler) {
    // When instantiated with HandlerType = NasdaqHandler, this is a direct call
    // to NasdaqHandler::on_order_filled_impl, which the optimizer can inline here.
    handler.dispatch_order_fill(12345, 150.25);
}
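To see static dispatch across more than one venue, here is a self-contained sketch with a second, hypothetical `CmeHandler` and a `crtp_demo` driver; both handlers only count fills. Each call to `engine_event_loop` instantiates a separate function, so every call site resolves to a direct call.

```cpp
#include <cassert>
#include <cstdint>

template <typename Derived>
class ExchangeHandlerBase {
public:
    void dispatch_order_fill(std::uint64_t id, double price) noexcept {
        // Resolved at compile time: no vptr, no vtable, no indirect call.
        static_cast<Derived*>(this)->on_order_filled_impl(id, price);
    }
};

// Hypothetical handlers for illustration; each only tracks its fills.
class NasdaqHandler : public ExchangeHandlerBase<NasdaqHandler> {
public:
    void on_order_filled_impl(std::uint64_t, double price) noexcept {
        ++fills;
        notional += price;
    }
    std::uint64_t fills = 0;
    double notional = 0.0;
};

class CmeHandler : public ExchangeHandlerBase<CmeHandler> {
public:
    void on_order_filled_impl(std::uint64_t, double) noexcept { ++fills; }
    std::uint64_t fills = 0;
};

// HandlerType is deduced from the derived object's CRTP base.
template <typename HandlerType>
void engine_event_loop(ExchangeHandlerBase<HandlerType>& handler) {
    handler.dispatch_order_fill(12345, 150.25);
}

void crtp_demo() {
    NasdaqHandler nasdaq;
    CmeHandler cme;
    engine_event_loop(nasdaq);  // instantiates engine_event_loop<NasdaqHandler>
    engine_event_loop(cme);     // instantiates engine_event_loop<CmeHandler>
    assert(nasdaq.fills == 1 && cme.fills == 1);
}
```

The trade-off of this design: `NasdaqHandler` and `CmeHandler` no longer share a common runtime base type, so they cannot sit in one container of base pointers. The engine must know which handler it is driving at compile time.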
Because the compiler knows the concrete Derived type at build time (e.g. g++ -O3), it emits a direct CALL to a fixed address for the Nasdaq execution logic, with no vtable lookup, and it will usually inline that logic outright. To force the issue, GCC and Clang accept __attribute__((always_inline)), which makes the compiler copy the child's logic straight into the loop. The indirect-branch penalty vanishes.
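A sketch of applying that attribute portably, assuming C++17. `HFT_ALWAYS_INLINE` is a hypothetical macro name; `std::is_polymorphic_v` is used here as a compile-time check that the CRTP handler really carries no vptr, while a hypothetical `VirtualNasdaqHandler` does.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical helper macro: always_inline is a GCC/Clang extension, so fall
// back to plain inline elsewhere (MSVC's equivalent is __forceinline).
#if defined(__GNUC__) || defined(__clang__)
#define HFT_ALWAYS_INLINE inline __attribute__((always_inline))
#else
#define HFT_ALWAYS_INLINE inline
#endif

template <typename Derived>
class ExchangeHandlerBase {
public:
    HFT_ALWAYS_INLINE void dispatch_order_fill(std::uint64_t id, double price) noexcept {
        static_cast<Derived*>(this)->on_order_filled_impl(id, price);
    }
};

class NasdaqHandler : public ExchangeHandlerBase<NasdaqHandler> {
public:
    HFT_ALWAYS_INLINE void on_order_filled_impl(std::uint64_t, double) noexcept { ++fills; }
    std::uint64_t fills = 0;
};

// Hypothetical virtual counterpart, used only for comparison.
class VirtualNasdaqHandler {
public:
    virtual ~VirtualNasdaqHandler() = default;
    virtual void on_order_filled(std::uint64_t, double) {}
};

// Compile-time checks: only the virtual version carries a vptr.
static_assert(!std::is_polymorphic_v<NasdaqHandler>, "CRTP handler has no vptr");
static_assert(std::is_polymorphic_v<VirtualNasdaqHandler>, "virtual handler has a vptr");

void inline_demo() {
    NasdaqHandler h;
    h.dispatch_order_fill(12345, 150.25);
    assert(h.fills == 1);
}
```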
The other major form of compile-time logic is constexpr. Suppose an execution framework must convert a decimal price into an integer tick price using the exchange's predefined tick multiplier. A traditional runtime snippet looks like:
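#include <cmath>
#include <string>
long parse_price(double raw_price, const std::string& exchange) {
    if (exchange == "CME") return std::lround(raw_price * 1000);
    else if (exchange == "NDX") return std::lround(raw_price * 100);
    // ... every additional exchange adds another runtime string compare and branch
    return -1;  // unknown exchange (illustrative sentinel)
}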
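Every call pays for string comparisons and conditional branches, even though the exchange never changes while the process runs. By selecting the exchange through a template argument and computing its multiplier in a constexpr (here C++20 consteval) function, we hard-wire the logic into constants evaluated by the compiler before the executable is even minted.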
#include <cmath>
enum class Exchange { CME, NDX };
// Evaluated strictly ahead of time (consteval requires C++20)!
template <Exchange Ex>
consteval long get_tick_multiplier() {
    if constexpr (Ex == Exchange::CME) return 1000;
    else if constexpr (Ex == Exchange::NDX) return 100;
    else { static_assert(Ex == Exchange::CME || Ex == Exchange::NDX, "unsupported exchange"); }
}
template <Exchange Ex>
inline long parse_price(double raw_price) noexcept {
    // For Ex == Exchange::CME the compiler generates the equivalent of
    // 'return std::lround(raw_price * 1000);'
    // There are ZERO exchange-selection branches here at runtime.
    return std::lround(raw_price * get_tick_multiplier<Ex>());
}
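The same compile-time selection can also be written as a classic traits class. This is a sketch assuming C++17, with `ExchangeTraits` and `traits_demo` as hypothetical names. Worked through, `parse_price<Exchange::NDX>(150.25)` gives 150.25 × 100 = 15025 ticks and `parse_price<Exchange::CME>(150.25)` gives 150250. Rounding rather than truncating matters: on IEEE-754 doubles, `1.15 * 100` evaluates to 114.99999999999999, which a plain cast to `long` would truncate to 114 ticks instead of 115.

```cpp
#include <cassert>
#include <cmath>

enum class Exchange { CME, NDX };

// Primary template deliberately left undefined: using an exchange without a
// specialization is a compile error rather than a runtime surprise.
template <Exchange Ex>
struct ExchangeTraits;

template <>
struct ExchangeTraits<Exchange::CME> {
    static constexpr long tick_multiplier = 1000;
};

template <>
struct ExchangeTraits<Exchange::NDX> {
    static constexpr long tick_multiplier = 100;
};

// Verified by the compiler, not at runtime.
static_assert(ExchangeTraits<Exchange::CME>::tick_multiplier == 1000);
static_assert(ExchangeTraits<Exchange::NDX>::tick_multiplier == 100);

template <Exchange Ex>
inline long parse_price(double raw_price) noexcept {
    // The multiplier is a compile-time constant in the generated code;
    // lround rounds to the nearest tick instead of truncating.
    return std::lround(raw_price * ExchangeTraits<Ex>::tick_multiplier);
}

void traits_demo() {
    assert(parse_price<Exchange::NDX>(150.25) == 15025);
    assert(parse_price<Exchange::CME>(150.25) == 150250);
    assert(parse_price<Exchange::NDX>(1.15) == 115);
}
```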
The paradigm shift in modern HFT (driven particularly by C++17's if constexpr and C++20's consteval) represents a total rejection of traditional enterprise-level design patterns on the hot path. You must reject virtual functions there. You must reject runtime if/else branches on configuration that never changes while the process runs. By moving those decisions into the compiler using CRTP and constexpr, you shift the computational latency onto your build server, so your production hot paths contain little beyond direct calls and register math on constants.
This concludes the original 5-part C++ architecture core. We now continue with operational low-latency engineering guides, starting with NUMA, CPU pinning, and jitter control.