Rust Patterns & Engineering How-Tos
Speaker Intro
- Principal Firmware Architect in Microsoft SCHIE (Silicon and Cloud Hardware Infrastructure Engineering) team
- Industry veteran with expertise in security, systems programming (firmware, operating systems, hypervisors), CPU and platform architecture, and C++ systems
- Started programming in Rust in 2017 (@AWS EC2), and have been in love with the language ever since
A practical guide to intermediate-and-above Rust patterns that arise in real codebases. This is not a language tutorial — it assumes you can write basic Rust and want to level up. Each chapter isolates one concept, explains when and why to use it, and provides compilable examples with inline exercises.
Who This Is For
- Developers who have finished The Rust Programming Language but struggle with “how do I actually design this?”
- C++/C# engineers translating production systems into Rust
- Anyone who has hit a wall with generics, trait bounds, or lifetime errors and wants a systematic toolkit
Prerequisites
Before starting, you should be comfortable with:
- Ownership, borrowing, and lifetimes (basic level)
- Enums, pattern matching, and `Option`/`Result`
- Structs, methods, and basic traits (`Display`, `Debug`, `Clone`)
- Cargo basics: `cargo build`, `cargo test`, `cargo run`
How to Use This Book
Difficulty Legend
Each chapter is tagged with a difficulty level:
| Symbol | Level | Meaning |
|---|---|---|
| 🟢 | Fundamentals | Core concepts every Rust developer needs |
| 🟡 | Intermediate | Patterns used in production codebases |
| 🔴 | Advanced | Deep language mechanics — revisit as needed |
Pacing Guide
| Chapters | Topic | Suggested Time | Checkpoint |
|---|---|---|---|
| Part I: Type-Level Patterns | |||
| 1. Generics 🟢 | Monomorphization, const generics, const fn | 1–2 hours | Can explain when dyn Trait beats generics |
| 2. Traits 🟡 | Associated types, GATs, blanket impls, vtables | 3–4 hours | Can design a trait with associated types |
| 3. Newtype & Type-State 🟡 | Zero-cost safety, compile-time FSMs | 2–3 hours | Can build a type-state builder pattern |
| 4. PhantomData 🔴 | Lifetime branding, variance, drop check | 2–3 hours | Can explain why PhantomData<fn(T)> differs from PhantomData<T> |
| Part II: Concurrency & Runtime | |||
| 5. Channels 🟢 | mpsc, crossbeam, select!, actors | 1–2 hours | Can implement a channel-based worker pool |
| 6. Concurrency 🟡 | Threads, rayon, Mutex, RwLock, atomics | 2–3 hours | Can pick the right sync primitive for a scenario |
| 7. Closures 🟢 | Fn/FnMut/FnOnce, combinators | 1–2 hours | Can write a higher-order function that accepts closures |
| 8. Smart Pointers 🟡 | Box, Rc, Arc, RefCell, Cow, Pin | 2–3 hours | Can explain when to use each smart pointer |
| Part III: Systems & Production | |||
| 9. Error Handling 🟢 | thiserror, anyhow, ? operator | 1–2 hours | Can design an error type hierarchy |
| 10. Serialization 🟡 | serde, zero-copy, binary data | 2–3 hours | Can write a custom serde deserializer |
| 11. Unsafe 🔴 | Superpowers, FFI, UB pitfalls, allocators | 2–3 hours | Can wrap unsafe code in a sound safe API |
| 12. Macros 🟡 | macro_rules!, proc macros, syn/quote | 2–3 hours | Can write a declarative macro with tt munching |
| 13. Testing 🟢 | Unit/integration/doc tests, proptest, criterion | 1–2 hours | Can set up property-based tests |
| 14. API Design 🟡 | Module layout, ergonomic APIs, feature flags | 2–3 hours | Can apply the “parse, don’t validate” pattern |
| 15. Async 🔴 | Futures, Tokio, common pitfalls | 1–2 hours | Can identify async anti-patterns |
| Appendices | |||
| Reference Card | Quick-look trait bounds, lifetimes, patterns | As needed | — |
| Capstone Project | Type-safe task scheduler | 4–6 hours | Submit a working implementation |
Total estimated time: 30–45 hours for thorough study with exercises.
Working Through Exercises
Every chapter ends with a hands-on exercise. For maximum learning:
- Try it yourself first — spend at least 15 minutes before opening the solution
- Type the code — don’t copy-paste; typing builds muscle memory
- Modify the solution — add a feature, change a constraint, break something on purpose
- Check cross-references — most exercises combine patterns from multiple chapters
The capstone project (Appendix) ties together patterns from across the book into a single, production-quality system.
Table of Contents
Part I: Type-Level Patterns
1. Generics — The Full Picture 🟢
Monomorphization, code bloat trade-offs, generics vs enums vs trait objects, const generics, const fn.
2. Traits In Depth 🟡 Associated types, GATs, blanket impls, marker traits, vtables, HRTBs, extension traits, enum dispatch.
3. The Newtype and Type-State Patterns 🟡 Zero-cost type safety, compile-time state machines, builder patterns, config traits.
4. PhantomData — Types That Carry No Data 🔴 Lifetime branding, unit-of-measure pattern, drop check, variance.
Part II: Concurrency & Runtime
5. Channels and Message Passing 🟢
std::sync::mpsc, crossbeam, select!, backpressure, actor pattern.
6. Concurrency vs Parallelism vs Threads 🟡 OS threads, scoped threads, rayon, Mutex/RwLock/Atomics, Condvar, OnceLock, lock-free patterns.
7. Closures and Higher-Order Functions 🟢
Fn/FnMut/FnOnce, closures as parameters/return values, combinators, higher-order APIs.
8. Smart Pointers and Interior Mutability 🟡 Box, Rc, Arc, Weak, Cell/RefCell, Cow, Pin, ManuallyDrop.
Part III: Systems & Production
9. Error Handling Patterns 🟢
thiserror vs anyhow, #[from], .context(), ? operator, panics.
10. Serialization, Zero-Copy, and Binary Data 🟡
serde fundamentals, enum representations, zero-copy deserialization, repr(C), bytes::Bytes.
11. Unsafe Rust — Controlled Danger 🔴 Five superpowers, sound abstractions, FFI, UB pitfalls, arena/slab allocators.
12. Macros — Code That Writes Code 🟡
macro_rules!, when (not) to use macros, proc macros, derive macros, syn/quote.
13. Testing and Benchmarking Patterns 🟢 Unit/integration/doc tests, proptest, criterion, mocking strategies.
14. Crate Architecture and API Design 🟡 Module layout, API design checklist, ergonomic parameters, feature flags, workspaces.
15. Async/Await Essentials 🔴 Futures, Tokio quick-start, common pitfalls. (For deep async coverage, see our Async Rust Training.)
Appendices
Summary and Reference Card Pattern decision guide, trait bounds cheat sheet, lifetime elision rules, further reading.
Capstone Project: Type-Safe Task Scheduler Integrate generics, traits, typestate, channels, error handling, and testing into a complete system.
1. Generics — The Full Picture 🟢
What you’ll learn:
- How monomorphization gives zero-cost generics — and when it causes code bloat
- The decision framework: generics vs enums vs trait objects
- Const generics for compile-time array sizes and `const fn` for compile-time evaluation
- When to trade static dispatch for dynamic dispatch on cold paths
Monomorphization and Zero Cost
Generics in Rust are monomorphized — the compiler generates a specialized copy of each generic function for every concrete type it’s used with. This is the opposite of Java/C# where generics are erased at runtime.
fn max_of<T: PartialOrd>(a: T, b: T) -> T {
if a >= b { a } else { b }
}
fn main() {
max_of(3_i32, 5_i32); // Compiler generates max_of_i32
max_of(2.0_f64, 7.0_f64); // Compiler generates max_of_f64
max_of("a", "z"); // Compiler generates max_of_str
}
What the compiler actually produces (conceptually):
#![allow(unused)]
fn main() {
// Three separate functions — no runtime dispatch, no vtable:
fn max_of_i32(a: i32, b: i32) -> i32 { if a >= b { a } else { b } }
fn max_of_f64(a: f64, b: f64) -> f64 { if a >= b { a } else { b } }
fn max_of_str<'a>(a: &'a str, b: &'a str) -> &'a str { if a >= b { a } else { b } }
}
Why does `max_of_str` need `<'a>` but `max_of_i32` doesn’t? `i32` and `f64` are `Copy` types — the function returns an owned value. But `&str` is a reference, so the compiler must know the returned reference’s lifetime. The `<'a>` annotation says “the returned `&str` lives at least as long as both inputs.”
Advantages: Zero runtime cost — identical to hand-written specialized code. The optimizer can inline, vectorize, and specialize each copy independently.
Comparison with C++: Rust generics work like C++ templates but with one crucial difference — trait bounds are checked at the definition site, not at instantiation. In C++, a template body is type-checked only when instantiated with a specific type, leading to cryptic error messages deep in library code. In Rust, `T: PartialOrd` is checked when you define the function, so errors are caught early and messages are clear.
#![allow(unused)]
fn main() {
// Rust: error at definition site — "T doesn't implement Display"
fn broken<T>(val: T) {
println!("{val}"); // ❌ Error: T doesn't implement Display
}
// Fix: add the bound
fn fixed<T: std::fmt::Display>(val: T) {
println!("{val}"); // ✅
}
}
When Generics Hurt: Code Bloat
Monomorphization has a cost — binary size. Each unique instantiation duplicates the function body:
#![allow(unused)]
fn main() {
// This innocent function...
fn serialize<T: serde::Serialize>(value: &T) -> Vec<u8> {
serde_json::to_vec(value).unwrap()
}
// ...used with 50 different types → 50 copies in the binary.
}
Mitigation strategies:
#![allow(unused)]
fn main() {
// 1. Extract the non-generic core ("outline" pattern)
fn serialize<T: serde::Serialize>(value: &T) -> Result<Vec<u8>, serde_json::Error> {
// Generic part: only the serialization call
let json_value = serde_json::to_value(value)?;
// Non-generic part: extracted into a separate function
serialize_value(json_value)
}
fn serialize_value(value: serde_json::Value) -> Result<Vec<u8>, serde_json::Error> {
// This function exists only ONCE in the binary
serde_json::to_vec(&value)
}
// 2. Use trait objects (dynamic dispatch) when inlining isn't critical
fn log_item(item: &dyn std::fmt::Display) {
// One copy — uses vtable for dispatch
println!("[LOG] {item}");
}
}
Rule of thumb: Use generics for hot paths where inlining matters. Use `dyn Trait` for cold paths (error handling, logging, configuration) where a vtable call is negligible.
Generics vs Enums vs Trait Objects — Decision Guide
Three ways to handle “different types, same interface” in Rust:
| Approach | Dispatch | Known at | Extensible? | Overhead |
|---|---|---|---|---|
| Generics (`impl Trait` / `<T: Trait>`) | Static (monomorphized) | Compile time | ✅ (open set) | Zero — inlined |
| Enum | Match arm | Compile time | ❌ (closed set) | Zero — no vtable |
| Trait object (`dyn Trait`) | Dynamic (vtable) | Runtime | ✅ (open set) | Vtable pointer + indirect call |
#![allow(unused)]
fn main() {
// --- GENERICS: Open set, zero cost, compile-time ---
fn process<H: Handler>(handler: H, request: Request) -> Response {
handler.handle(request) // Monomorphized — one copy per H
}
// --- ENUM: Closed set, zero cost, exhaustive matching ---
enum Shape {
Circle(f64),
Rect(f64, f64),
Triangle(f64, f64, f64),
}
impl Shape {
fn area(&self) -> f64 {
match self {
Shape::Circle(r) => std::f64::consts::PI * r * r,
Shape::Rect(w, h) => w * h,
Shape::Triangle(a, b, c) => {
let s = (a + b + c) / 2.0;
(s * (s - a) * (s - b) * (s - c)).sqrt()
}
}
}
}
// Adding a new variant forces updating ALL match arms — the compiler
// enforces exhaustiveness. Great for "I control all the variants."
// --- TRAIT OBJECT: Open set, runtime cost, extensible ---
fn log_all(items: &[Box<dyn std::fmt::Display>]) {
for item in items {
println!("{item}"); // vtable dispatch
}
}
}
Decision flowchart:
flowchart TD
A["Do you know ALL<br>possible types at<br>compile time?"]
A -->|"Yes, small<br>closed set"| B["Enum"]
A -->|"Yes, but set<br>is open"| C["Generics<br>(monomorphized)"]
A -->|"No — types<br>determined at runtime"| D["dyn Trait"]
C --> E{"Hot path?<br>(millions of calls)"}
E -->|Yes| F["Generics<br>(inlineable)"]
E -->|No| G["dyn Trait<br>is fine"]
D --> H{"Need mixed types<br>in one collection?"}
H -->|Yes| I["Vec<Box<dyn Trait>>"]
H -->|No| C
style A fill:#e8f4f8,stroke:#2980b9,color:#000
style B fill:#d4efdf,stroke:#27ae60,color:#000
style C fill:#d4efdf,stroke:#27ae60,color:#000
style D fill:#fdebd0,stroke:#e67e22,color:#000
style F fill:#d4efdf,stroke:#27ae60,color:#000
style G fill:#fdebd0,stroke:#e67e22,color:#000
style I fill:#fdebd0,stroke:#e67e22,color:#000
style E fill:#fef9e7,stroke:#f1c40f,color:#000
style H fill:#fef9e7,stroke:#f1c40f,color:#000
Const Generics
Since Rust 1.51, you can parameterize types and functions over constant values, not just types:
#![allow(unused)]
fn main() {
// Array wrapper parameterized over size
struct Matrix<const ROWS: usize, const COLS: usize> {
data: [[f64; COLS]; ROWS],
}
impl<const ROWS: usize, const COLS: usize> Matrix<ROWS, COLS> {
fn new() -> Self {
Matrix { data: [[0.0; COLS]; ROWS] }
}
fn transpose(&self) -> Matrix<COLS, ROWS> {
let mut result = Matrix::<COLS, ROWS>::new();
for r in 0..ROWS {
for c in 0..COLS {
result.data[c][r] = self.data[r][c];
}
}
result
}
}
// The compiler enforces dimensional correctness:
fn multiply<const M: usize, const N: usize, const P: usize>(
a: &Matrix<M, N>,
b: &Matrix<N, P>, // N must match!
) -> Matrix<M, P> {
let mut result = Matrix::<M, P>::new();
for i in 0..M {
for j in 0..P {
for k in 0..N {
result.data[i][j] += a.data[i][k] * b.data[k][j];
}
}
}
result
}
// Usage:
let a = Matrix::<2, 3>::new(); // 2×3
let b = Matrix::<3, 4>::new(); // 3×4
let c = multiply(&a, &b); // 2×4 ✅
// let d = Matrix::<5, 5>::new();
// multiply(&a, &d); // ❌ Compile error: expected Matrix<3, _>, got Matrix<5, 5>
}
C++ comparison: This is similar to `template<int N>` in C++, but Rust const generics are type-checked eagerly and don’t suffer from SFINAE complexity.
Const Functions (const fn)
`const fn` marks a function as evaluable at compile time — Rust’s equivalent of C++ `constexpr`. The result can be used in `const` and `static` contexts:
#![allow(unused)]
fn main() {
// Basic const fn — evaluated at compile time when used in const context
const fn celsius_to_fahrenheit(c: f64) -> f64 {
c * 9.0 / 5.0 + 32.0
}
const BOILING_F: f64 = celsius_to_fahrenheit(100.0); // Computed at compile time
const FREEZING_F: f64 = celsius_to_fahrenheit(0.0); // 32.0
// Const constructors — create statics without lazy_static!
struct BitMask(u32);
impl BitMask {
const fn new(bit: u32) -> Self {
BitMask(1 << bit)
}
const fn or(self, other: BitMask) -> Self {
BitMask(self.0 | other.0)
}
const fn contains(&self, bit: u32) -> bool {
self.0 & (1 << bit) != 0
}
}
// Static lookup table — no runtime cost, no lazy initialization
const GPIO_INPUT: BitMask = BitMask::new(0);
const GPIO_OUTPUT: BitMask = BitMask::new(1);
const GPIO_IRQ: BitMask = BitMask::new(2);
const GPIO_IO: BitMask = GPIO_INPUT.or(GPIO_OUTPUT);
// Register maps as const arrays:
const SENSOR_THRESHOLDS: [u16; 4] = {
let mut table = [0u16; 4];
table[0] = 50; // Warning
table[1] = 70; // High
table[2] = 85; // Critical
table[3] = 100; // Shutdown
table
};
// The entire table exists in the binary — no heap, no runtime init.
}
What you CAN do in `const fn` (as of Rust 1.79+):
- Arithmetic, bit operations, comparisons
- `if`/`else`, `match`, `loop`, `while` (control flow)
- Creating and modifying local variables (`let mut`)
- Calling other `const fn`s
- References (`&`, `&mut` — within the const context)
- `panic!()` (becomes a compile error if reached at compile time)
What you CANNOT do (yet):
- Heap allocation (`Box`, `Vec`, `String`)
- Trait method calls (only inherent methods)
- Floating-point in some contexts (stabilized for basic ops)
- I/O or side effects
#![allow(unused)]
fn main() {
// const fn with panic — becomes a compile-time error:
const fn checked_div(a: u32, b: u32) -> u32 {
if b == 0 {
panic!("division by zero"); // Compile error if b is 0 at const time
}
a / b
}
const RESULT: u32 = checked_div(100, 4); // ✅ 25
// const BAD: u32 = checked_div(100, 0); // ❌ Compile error: "division by zero"
}
C++ comparison: `const fn` is Rust’s `constexpr`. The key difference: Rust’s version is opt-in and the compiler rigorously verifies that only const-compatible operations are used. In C++, `constexpr` functions can silently fall back to runtime evaluation — in Rust, a `const` context requires compile-time evaluation or it’s a hard error.
Practical advice: Make constructors and simple utility functions `const fn` whenever possible — it costs nothing and enables callers to use them in const contexts. For hardware diagnostic code, `const fn` is ideal for register definitions, bitmask construction, and threshold tables.
Key Takeaways — Generics
- Monomorphization gives zero-cost abstractions but can cause code bloat — use `dyn Trait` for cold paths
- Const generics (`[T; N]`) replace C++ template tricks with compile-time–checked array sizes
- `const fn` eliminates `lazy_static!` for compile-time–computable values
See also: Ch 2 — Traits In Depth for trait bounds, associated types, and trait objects. Ch 4 — PhantomData for zero-sized generic markers.
Exercise: Generic Cache with Eviction ★★ (~30 min)
Build a generic Cache<K, V> struct that stores key-value pairs with a configurable maximum capacity. When full, the oldest entry is evicted (FIFO). Requirements:
- `fn new(capacity: usize) -> Self`
- `fn insert(&mut self, key: K, value: V)` — evicts the oldest if at capacity
- `fn get(&self, key: &K) -> Option<&V>`
- `fn len(&self) -> usize`
- Constrain `K: Eq + Hash + Clone`
🔑 Solution
use std::collections::{HashMap, VecDeque};
use std::hash::Hash;
struct Cache<K, V> {
map: HashMap<K, V>,
order: VecDeque<K>,
capacity: usize,
}
impl<K: Eq + Hash + Clone, V> Cache<K, V> {
fn new(capacity: usize) -> Self {
Cache {
map: HashMap::with_capacity(capacity),
order: VecDeque::with_capacity(capacity),
capacity,
}
}
fn insert(&mut self, key: K, value: V) {
if self.map.contains_key(&key) {
self.map.insert(key, value);
return;
}
if self.map.len() >= self.capacity {
if let Some(oldest) = self.order.pop_front() {
self.map.remove(&oldest);
}
}
self.order.push_back(key.clone());
self.map.insert(key, value);
}
fn get(&self, key: &K) -> Option<&V> {
self.map.get(key)
}
fn len(&self) -> usize {
self.map.len()
}
}
fn main() {
let mut cache = Cache::new(3);
cache.insert("a", 1);
cache.insert("b", 2);
cache.insert("c", 3);
assert_eq!(cache.len(), 3);
cache.insert("d", 4); // Evicts "a"
assert_eq!(cache.get(&"a"), None);
assert_eq!(cache.get(&"d"), Some(&4));
println!("Cache works! len = {}", cache.len());
}
2. Traits In Depth 🟡
What you’ll learn:
- Associated types vs generic parameters — and when to use each
- GATs, blanket impls, marker traits, and trait object safety rules
- How vtables and fat pointers work under the hood
- Extension traits, enum dispatch, and typed command patterns
Associated Types vs Generic Parameters
Both let a trait work with different types, but they serve different purposes:
#![allow(unused)]
fn main() {
// --- ASSOCIATED TYPE: One implementation per type ---
trait Iterator {
type Item; // Each iterator produces exactly ONE kind of item
fn next(&mut self) -> Option<Self::Item>;
}
// A custom iterator that always yields i32 — there's no choice
struct Counter { max: i32, current: i32 }
impl Iterator for Counter {
type Item = i32; // Exactly one Item type per implementation
fn next(&mut self) -> Option<i32> {
if self.current < self.max {
self.current += 1;
Some(self.current)
} else {
None
}
}
}
// --- GENERIC PARAMETER: Multiple implementations per type ---
trait Convert<T> {
fn convert(&self) -> T;
}
// A single type can implement Convert for MANY target types:
impl Convert<f64> for i32 {
fn convert(&self) -> f64 { *self as f64 }
}
impl Convert<String> for i32 {
fn convert(&self) -> String { self.to_string() }
}
}
When to use which:
| Use | When |
|---|---|
| Associated type | There’s exactly ONE natural output/result per implementing type. Iterator::Item, Deref::Target, Add::Output |
| Generic parameter | A type can meaningfully implement the trait for MANY different types. From<T>, AsRef<T>, PartialEq<Rhs> |
Intuition: If it makes sense to ask “what is the Item of this iterator?”, use an associated type. If it makes sense to ask “can this convert to f64? to String? to bool?”, use a generic parameter.
#![allow(unused)]
fn main() {
// Real-world example: std::ops::Add
trait Add<Rhs = Self> {
type Output; // Associated type — addition has ONE result type
fn add(self, rhs: Rhs) -> Self::Output;
}
// Rhs is a generic parameter — you can add different types to Meters:
struct Meters(f64);
struct Centimeters(f64);
impl Add<Meters> for Meters {
type Output = Meters;
fn add(self, rhs: Meters) -> Meters { Meters(self.0 + rhs.0) }
}
impl Add<Centimeters> for Meters {
type Output = Meters;
fn add(self, rhs: Centimeters) -> Meters { Meters(self.0 + rhs.0 / 100.0) }
}
}
Generic Associated Types (GATs)
Since Rust 1.65, associated types can have generic parameters of their own. This enables lending iterators — iterators that return references tied to the iterator rather than to the underlying collection:
#![allow(unused)]
fn main() {
// Without GATs — impossible to express a lending iterator:
// trait LendingIterator {
// type Item<'a>; // ← This was rejected before 1.65
// }
// With GATs (Rust 1.65+):
trait LendingIterator {
type Item<'a> where Self: 'a;
fn next(&mut self) -> Option<Self::Item<'_>>;
}
// Example: an iterator that yields overlapping windows
struct WindowIter<'data> {
data: &'data [u8],
pos: usize,
window_size: usize,
}
impl<'data> LendingIterator for WindowIter<'data> {
type Item<'a> = &'a [u8] where Self: 'a;
fn next(&mut self) -> Option<&[u8]> {
if self.pos + self.window_size <= self.data.len() {
let window = &self.data[self.pos..self.pos + self.window_size];
self.pos += 1;
Some(window)
} else {
None
}
}
}
}
When you need GATs: Lending iterators, streaming parsers, or any trait where the associated type’s lifetime depends on the `&self` borrow. For most code, plain associated types are sufficient.
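Note that a lending iterator can’t be driven by a `for` loop — it isn’t `std::iter::Iterator` — so the caller walks it manually. A minimal sketch, repeating the trait and `WindowIter` from above so the example is self-contained:

```rust
trait LendingIterator {
    type Item<'a> where Self: 'a;
    fn next(&mut self) -> Option<Self::Item<'_>>;
}

struct WindowIter<'data> {
    data: &'data [u8],
    pos: usize,
    window_size: usize,
}

impl<'data> LendingIterator for WindowIter<'data> {
    type Item<'a> = &'a [u8] where Self: 'a;
    fn next(&mut self) -> Option<&[u8]> {
        if self.pos + self.window_size <= self.data.len() {
            let window = &self.data[self.pos..self.pos + self.window_size];
            self.pos += 1;
            Some(window)
        } else {
            None
        }
    }
}

fn main() {
    let data = [1u8, 2, 3, 4];
    let mut iter = WindowIter { data: &data, pos: 0, window_size: 2 };
    // Drive it with `while let` — each yielded window borrows from `iter`
    // only for the duration of the loop body, which is exactly what the
    // GAT lifetime expresses.
    while let Some(window) = iter.next() {
        println!("{window:?}");
    }
}
```

Each iteration re-borrows the iterator, which is why overlapping windows are legal here but would be rejected if `Item` were a plain (non-generic) associated type.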
Supertraits and Trait Hierarchies
Traits can require other traits as prerequisites, forming hierarchies:
graph BT
Display["Display"]
Debug["Debug"]
Error["Error"]
Clone["Clone"]
Copy["Copy"]
PartialEq["PartialEq"]
Eq["Eq"]
PartialOrd["PartialOrd"]
Ord["Ord"]
Error --> Display
Error --> Debug
Copy --> Clone
Eq --> PartialEq
Ord --> Eq
Ord --> PartialOrd
PartialOrd --> PartialEq
style Display fill:#e8f4f8,stroke:#2980b9,color:#000
style Debug fill:#e8f4f8,stroke:#2980b9,color:#000
style Error fill:#fdebd0,stroke:#e67e22,color:#000
style Clone fill:#d4efdf,stroke:#27ae60,color:#000
style Copy fill:#d4efdf,stroke:#27ae60,color:#000
style PartialEq fill:#fef9e7,stroke:#f1c40f,color:#000
style Eq fill:#fef9e7,stroke:#f1c40f,color:#000
style PartialOrd fill:#fef9e7,stroke:#f1c40f,color:#000
style Ord fill:#fef9e7,stroke:#f1c40f,color:#000
Arrows point from subtrait to supertrait: implementing `Error` requires `Display` + `Debug`.
A trait can require that implementors also implement other traits:
#![allow(unused)]
fn main() {
use std::fmt;
// Display is a supertrait of Error
trait Error: fmt::Display + fmt::Debug {
fn source(&self) -> Option<&(dyn Error + 'static)> { None }
}
// Any type implementing Error MUST also implement Display and Debug
// Build your own hierarchies:
trait Identifiable {
fn id(&self) -> u64;
}
trait Timestamped {
fn created_at(&self) -> chrono::DateTime<chrono::Utc>;
}
// Entity requires both:
trait Entity: Identifiable + Timestamped {
fn is_active(&self) -> bool;
}
// Implementing Entity forces you to implement all three:
struct User { id: u64, name: String, created: chrono::DateTime<chrono::Utc> }
impl Identifiable for User {
fn id(&self) -> u64 { self.id }
}
impl Timestamped for User {
fn created_at(&self) -> chrono::DateTime<chrono::Utc> { self.created }
}
impl Entity for User {
fn is_active(&self) -> bool { true }
}
}
Blanket Implementations
Implement a trait for ALL types that satisfy some bound:
#![allow(unused)]
fn main() {
// std does this: any type implementing Display automatically gets ToString.
// (Shown for illustration — this exact impl already lives in std, so
// compiling your own copy would be a coherence error.)
// impl<T: std::fmt::Display> ToString for T {
//     fn to_string(&self) -> String {
//         format!("{self}")
//     }
// }
// Now i32, &str, your custom types — anything with Display — gets to_string() for free.
// Your own blanket impl:
trait Loggable {
fn log(&self);
}
// Every Debug type is automatically Loggable:
impl<T: std::fmt::Debug> Loggable for T {
fn log(&self) {
eprintln!("[LOG] {self:?}");
}
}
// Now ANY Debug type has .log():
// 42.log(); // [LOG] 42
// "hello".log(); // [LOG] "hello"
// vec![1, 2, 3].log(); // [LOG] [1, 2, 3]
}
Caution: Blanket impls are powerful but irreversible — coherence forbids overlapping impls, so you can’t later add a more specific impl for a type the blanket already covers. Design them carefully.
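A minimal sketch of that coherence conflict, using the `Loggable` blanket impl from above (the `Metric` type is hypothetical):

```rust
use std::fmt::Debug;

trait Loggable {
    fn log(&self);
}

// Blanket impl: every Debug type is Loggable.
impl<T: Debug> Loggable for T {
    fn log(&self) {
        eprintln!("[LOG] {self:?}");
    }
}

#[derive(Debug)]
struct Metric(u64);

// ❌ Rejected: Metric is Debug, so the blanket impl already covers it.
// impl Loggable for Metric {
//     fn log(&self) { eprintln!("[METRIC] {}", self.0); }
// }
// error[E0119]: conflicting implementations of trait `Loggable` for type `Metric`

fn main() {
    Metric(42).log(); // Uses the blanket impl — there is no way to override it
}
```

If you anticipate needing per-type overrides, prefer a derive macro or an explicit impl per type over a blanket impl.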
Marker Traits
Traits with no methods — they mark a type as having some property:
#![allow(unused)]
fn main() {
// Standard library marker traits:
// Send — safe to transfer between threads
// Sync — safe to share (&T) between threads
// Unpin — safe to move after pinning
// Sized — has a known size at compile time
// Copy — can be duplicated with memcpy
// Your own marker trait:
/// Marker: this sensor has been factory-calibrated
trait Calibrated {}
struct RawSensor { reading: f64 }
struct CalibratedSensor { reading: f64 }
impl Calibrated for CalibratedSensor {}
// Only calibrated sensors can be used in production:
fn record_measurement<S: Calibrated>(sensor: &S) {
// ...
}
// record_measurement(&RawSensor { reading: 0.0 }); // ❌ Compile error
// record_measurement(&CalibratedSensor { reading: 0.0 }); // ✅
}
This connects directly to the type-state pattern in Chapter 3.
Trait Object Safety Rules
Not every trait can be used as dyn Trait. A trait is object-safe only if:
- No `Self: Sized` bound on the trait itself
- No generic type parameters on methods
- No use of `Self` in return position (workaround: return `Box<dyn Trait>` instead)
- No associated functions without a receiver (methods must take `&self`, `&mut self`, or `self`)
#![allow(unused)]
fn main() {
// ✅ Object-safe — can be used as dyn Drawable
trait Drawable {
fn draw(&self);
fn bounding_box(&self) -> (f64, f64, f64, f64);
}
let shapes: Vec<Box<dyn Drawable>> = vec![/* ... */]; // ✅ Works
// ❌ NOT object-safe — uses Self in return position
trait Cloneable {
fn clone_self(&self) -> Self;
// ^^^^ Can't know the concrete size at runtime
}
// let items: Vec<Box<dyn Cloneable>> = ...; // ❌ Compile error
// ❌ NOT object-safe — generic method
trait Converter {
fn convert<T>(&self) -> T;
// ^^^ The vtable can't contain infinite monomorphizations
}
// ❌ NOT object-safe — associated function (no self)
trait Factory {
fn create() -> Self;
// No &self — how would you call this through a trait object?
}
}
Workarounds:
#![allow(unused)]
fn main() {
// Add `where Self: Sized` to exclude a method from the vtable:
trait MyTrait {
fn regular_method(&self); // Included in vtable
fn generic_method<T>(&self) -> T
where
Self: Sized; // Excluded from vtable — can't be called via dyn MyTrait
}
// Now dyn MyTrait is valid, but generic_method can only be called
// when the concrete type is known.
}
Rule of thumb: If you plan to use `dyn Trait`, keep methods simple — no generics, no `Self` in return types, no `Sized` bounds. When in doubt, try `let _: Box<dyn YourTrait>;` and let the compiler tell you.
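`Clone` itself illustrates the problem: `clone` returns `Self`, so `Box<dyn Trait>` collections can’t be cloned directly. One well-known workaround — sketched here with hypothetical names — is a helper trait whose method returns `Box<dyn Trait>` instead of `Self`:

```rust
// Shape requires ShapeClone as a supertrait, so every dyn Shape
// carries clone_box in its vtable.
trait Shape: ShapeClone {
    fn area(&self) -> f64;
}

trait ShapeClone {
    // Object-safe: returns Box<dyn Shape>, not Self.
    fn clone_box(&self) -> Box<dyn Shape>;
}

// Blanket impl: any Clone shape gets clone_box for free.
impl<T: Shape + Clone + 'static> ShapeClone for T {
    fn clone_box(&self) -> Box<dyn Shape> {
        Box::new(self.clone())
    }
}

// Now boxed trait objects are themselves Clone:
impl Clone for Box<dyn Shape> {
    fn clone(&self) -> Box<dyn Shape> {
        self.clone_box()
    }
}

#[derive(Clone)]
struct Circle { radius: f64 }

impl Shape for Circle {
    fn area(&self) -> f64 { std::f64::consts::PI * self.radius * self.radius }
}

fn main() {
    let shapes: Vec<Box<dyn Shape>> = vec![Box::new(Circle { radius: 1.0 })];
    let copy = shapes.clone(); // Cloned through the vtable
    println!("area = {:.2}", copy[0].area());
}
```

The `dyn-clone` crate packages this pattern for reuse.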
Trait Objects Under the Hood — vtables and Fat Pointers
A &dyn Trait (or Box<dyn Trait>) is a fat pointer — two machine words:
┌──────────────────────────────────────────────────┐
│ &dyn Drawable (on 64-bit: 16 bytes total) │
├──────────────┬───────────────────────────────────┤
│ data_ptr │ vtable_ptr │
│ (8 bytes) │ (8 bytes) │
│ ↓ │ ↓ │
│ ┌─────────┐ │ ┌──────────────────────────────┐ │
│ │ Circle │ │ │ vtable for <Circle as │ │
│ │ { │ │ │ Drawable> │ │
│ │ r: 5.0 │ │ │ │ │
│ │ } │ │ │ drop_in_place: 0x7f...a0 │ │
│ └─────────┘ │ │ size: 8 │ │
│ │ │ align: 8 │ │
│ │ │ draw: 0x7f...b4 │ │
│ │ │ bounding_box: 0x7f...c8 │ │
│ │ └──────────────────────────────┘ │
└──────────────┴───────────────────────────────────┘
How a vtable call works (e.g., `shape.draw()`):
1. Load `vtable_ptr` from the fat pointer (second word)
2. Index into the vtable to find the `draw` function pointer
3. Call it, passing `data_ptr` as the `self` argument
This is similar to C++ virtual dispatch in cost (one pointer indirection
per call), but Rust stores the vtable pointer in the fat pointer rather
than inside the object — so a plain Circle on the stack carries no
vtable pointer at all.
trait Drawable {
fn draw(&self);
fn area(&self) -> f64;
}
struct Circle { radius: f64 }
impl Drawable for Circle {
fn draw(&self) { println!("Drawing circle r={}", self.radius); }
fn area(&self) -> f64 { std::f64::consts::PI * self.radius * self.radius }
}
struct Square { side: f64 }
impl Drawable for Square {
fn draw(&self) { println!("Drawing square s={}", self.side); }
fn area(&self) -> f64 { self.side * self.side }
}
fn main() {
let shapes: Vec<Box<dyn Drawable>> = vec![
Box::new(Circle { radius: 5.0 }),
Box::new(Square { side: 3.0 }),
];
// Each element is a fat pointer: (data_ptr, vtable_ptr)
// The vtable for Circle and Square are DIFFERENT
for shape in &shapes {
shape.draw(); // vtable dispatch → Circle::draw or Square::draw
println!(" area = {:.2}", shape.area());
}
// Size comparison:
println!("size_of::<&Circle>() = {}", std::mem::size_of::<&Circle>());
// → 8 bytes (one pointer — the compiler knows the type)
println!("size_of::<&dyn Drawable>() = {}", std::mem::size_of::<&dyn Drawable>());
// → 16 bytes (data_ptr + vtable_ptr)
}
Performance cost model:
| Aspect | Static dispatch (impl Trait / generics) | Dynamic dispatch (dyn Trait) |
|---|---|---|
| Call overhead | Zero — inlined by LLVM | One pointer indirection per call |
| Inlining | ✅ Compiler can inline | ❌ Opaque function pointer |
| Binary size | Larger (one copy per type) | Smaller (one shared function) |
| Pointer size | Thin (1 word) | Fat (2 words) |
| Heterogeneous collections | ❌ | ✅ Vec<Box<dyn Trait>> |
When vtable cost matters: In tight loops calling a trait method millions of times, the indirection and inability to inline can be significant (2–10× slower). For cold paths, configuration, or plugin architectures, the flexibility of `dyn Trait` is worth the small cost.
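The table leaves one middle ground implicit: when the set of types is closed, an enum gives you a heterogeneous collection with static dispatch — no fat pointers, no vtable, and each match arm can still inline. A sketch (crates like `enum_dispatch` automate the boilerplate):

```rust
struct Circle { radius: f64 }
struct Square { side: f64 }

// A closed set of shapes wrapped in one enum — the Vec stores them
// inline, with no Box and no vtable pointer.
enum AnyShape {
    Circle(Circle),
    Square(Square),
}

impl AnyShape {
    fn area(&self) -> f64 {
        match self {
            AnyShape::Circle(c) => std::f64::consts::PI * c.radius * c.radius,
            AnyShape::Square(s) => s.side * s.side,
        }
    }
}

fn main() {
    let shapes = vec![
        AnyShape::Circle(Circle { radius: 5.0 }),
        AnyShape::Square(Square { side: 3.0 }),
    ];
    // The match compiles to a jump on the discriminant — static dispatch.
    let total: f64 = shapes.iter().map(|s| s.area()).sum();
    println!("total area = {total:.2}");
}
```

The trade-off is extensibility: adding a variant means touching every match, which is exactly the enum column of the decision table in this chapter.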
Higher-Ranked Trait Bounds (HRTBs)
Sometimes you need a function that works with references of any lifetime, not a specific one. This is where for<'a> syntax appears:
// Problem: this function needs a closure that can process
// references with ANY lifetime, not just one specific lifetime.
// ❌ This is too restrictive — 'a is fixed by the caller:
// fn apply<'a, F: Fn(&'a str) -> &'a str>(f: F, data: &'a str) -> &'a str
// ✅ HRTB: F must work for ALL possible lifetimes:
fn apply<F>(f: F, data: &str) -> &str
where
F: for<'a> Fn(&'a str) -> &'a str,
{
f(data)
}
fn main() {
let result = apply(|s| s.trim(), " hello ");
println!("{result}"); // "hello"
}
When you encounter HRTBs:
- `Fn(&T) -> &U` bounds — the compiler infers `for<'a>` automatically in most cases
- Custom trait implementations that must work across different borrows
- Deserialization with serde: `for<'de> Deserialize<'de>`
// serde's DeserializeOwned is defined as:
// trait DeserializeOwned: for<'de> Deserialize<'de> {}
// Meaning: "can be deserialized from data with ANY lifetime"
// (i.e., the result doesn't borrow from the input)
use serde::de::DeserializeOwned;
fn parse_json<T: DeserializeOwned>(input: &str) -> T {
serde_json::from_str(input).unwrap()
}
Practical advice: You’ll rarely write `for<'a>` yourself. It mostly appears in trait bounds on closure parameters, where the compiler handles it implicitly. But recognizing it in error messages (“expected a `for<'a> Fn(&'a ...)` bound”) helps you understand what the compiler is asking for.
impl Trait — Argument Position vs Return Position
impl Trait appears in two positions with different semantics:
#![allow(unused)]
fn main() {
// --- Argument-Position impl Trait (APIT) ---
// "Caller chooses the type" — syntactic sugar for a generic parameter
fn print_all(items: impl Iterator<Item = i32>) {
for item in items { println!("{item}"); }
}
// Equivalent to:
fn print_all_verbose<I: Iterator<Item = i32>>(items: I) {
for item in items { println!("{item}"); }
}
// Caller decides: print_all(vec![1,2,3].into_iter())
// print_all(0..10)
// --- Return-Position impl Trait (RPIT) ---
// "Callee chooses the type" — the function picks one concrete type
fn evens(limit: i32) -> impl Iterator<Item = i32> {
(0..limit).filter(|x| x % 2 == 0)
// The concrete type is Filter<Range<i32>, Closure>
// but the caller only sees "some Iterator<Item = i32>"
}
}
Key difference:
| | APIT (fn foo(x: impl T)) | RPIT (fn foo() -> impl T) |
|---|---|---|
| Who picks the type? | Caller | Callee (function body) |
| Monomorphized? | Yes — one copy per type | Yes — one concrete type |
| Turbofish? | No (foo::<X>() not allowed) | N/A |
| Equivalent to | fn foo<X: T>(x: X) | Existential type |
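One consequence of "callee chooses one concrete type": under RPIT, a function cannot return different iterator types from different branches. A sketch of the standard escape hatch — boxing into a trait object:

```rust
// Sketch: RPIT means ONE concrete type — branches that would return
// different iterator types don't compile, so we box into dyn instead.
fn numbers(double: bool) -> Box<dyn Iterator<Item = i32>> {
    if double {
        Box::new((0..5).map(|x| x * 2)) // Map<Range<i32>, closure>
    } else {
        Box::new(0..5) // Range<i32> — a different type, hence dyn
    }
}

fn main() {
    let total: i32 = numbers(true).sum();
    assert_eq!(total, 20); // 0 + 2 + 4 + 6 + 8
}
```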
RPIT in Trait Definitions (RPITIT)
Since Rust 1.75, you can use -> impl Trait directly in trait definitions:
#![allow(unused)]
fn main() {
trait Container {
fn items(&self) -> impl Iterator<Item = &str>;
// ^^^^ Each implementor returns its own concrete type
}
struct CsvRow {
fields: Vec<String>,
}
impl Container for CsvRow {
fn items(&self) -> impl Iterator<Item = &str> {
self.fields.iter().map(String::as_str)
}
}
struct FixedFields;
impl Container for FixedFields {
fn items(&self) -> impl Iterator<Item = &str> {
["host", "port", "timeout"].into_iter()
}
}
}
Before Rust 1.75, you had to use `Box<dyn Iterator>` or an associated type to achieve this in traits. RPITIT removes the allocation.
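A minimal sketch (Rust 1.75+) of how a generic caller consumes such a trait — each implementor keeps its own concrete iterator type, and dispatch stays static:

```rust
// Sketch: generic code works uniformly over any Container, even though
// each implementor's items() returns a different concrete iterator.
trait Container {
    fn items(&self) -> impl Iterator<Item = &str>;
}

struct CsvRow {
    fields: Vec<String>,
}

impl Container for CsvRow {
    fn items(&self) -> impl Iterator<Item = &str> {
        self.fields.iter().map(String::as_str)
    }
}

// Statically dispatched — no Box, no vtable:
fn count_items<C: Container>(c: &C) -> usize {
    c.items().count()
}

fn main() {
    let row = CsvRow { fields: vec!["a".into(), "b".into()] };
    assert_eq!(count_items(&row), 2);
}
```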
impl Trait vs dyn Trait — Decision Guide
Do you know the concrete type at compile time?
├── YES → Use impl Trait or generics (zero cost, inlinable)
└── NO → Do you need a heterogeneous collection?
├── YES → Use dyn Trait (Box<dyn T>, &dyn T)
└── NO → Do you need the SAME trait object across an API boundary?
├── YES → Use dyn Trait
└── NO → Use generics / impl Trait
| Feature | impl Trait | dyn Trait |
|---|---|---|
| Dispatch | Static (monomorphized) | Dynamic (vtable) |
| Performance | Best — inlinable | One indirection per call |
| Heterogeneous collections | ❌ | ✅ |
| Binary size per type | One copy each | Shared code |
| Trait must be object-safe? | No | Yes |
| Works in trait definitions | ✅ (Rust 1.75+) | Always |
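The heterogeneous-collection row is the decisive one in practice — a quick sketch of what only `dyn Trait` can express:

```rust
// Sketch: a Vec whose elements are DIFFERENT concrete types behind one
// vtable'd interface — impossible with impl Trait or plain generics.
use std::fmt::Display;

fn main() {
    let items: Vec<Box<dyn Display>> = vec![
        Box::new(42),      // i32
        Box::new("hello"), // &str
        Box::new(3.5),     // f64
    ];
    // Each to_string() call goes through the vtable:
    let rendered: Vec<String> = items.iter().map(|d| d.to_string()).collect();
    assert_eq!(rendered, ["42", "hello", "3.5"]);
}
```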
Type Erasure with Any and TypeId
Sometimes you need to store values of unknown types and downcast them later — a pattern
familiar from void* in C or object in C#. Rust provides this through std::any::Any:
use std::any::Any;
// Store heterogeneous values:
fn log_value(value: &dyn Any) {
if let Some(s) = value.downcast_ref::<String>() {
println!("String: {s}");
} else if let Some(n) = value.downcast_ref::<i32>() {
println!("i32: {n}");
} else {
// TypeId lets you inspect the type at runtime:
println!("Unknown type: {:?}", value.type_id());
}
}
// Useful for plugin systems, event buses, or ECS-style architectures:
struct AnyMap(std::collections::HashMap<std::any::TypeId, Box<dyn Any + Send>>);
impl AnyMap {
fn new() -> Self { AnyMap(std::collections::HashMap::new()) }
fn insert<T: Any + Send + 'static>(&mut self, value: T) {
self.0.insert(std::any::TypeId::of::<T>(), Box::new(value));
}
fn get<T: Any + Send + 'static>(&self) -> Option<&T> {
self.0.get(&std::any::TypeId::of::<T>())?
.downcast_ref()
}
}
fn main() {
let mut map = AnyMap::new();
map.insert(42_i32);
map.insert(String::from("hello"));
assert_eq!(map.get::<i32>(), Some(&42));
assert_eq!(map.get::<String>().map(|s| s.as_str()), Some("hello"));
assert_eq!(map.get::<f64>(), None); // Never inserted
}
When to use `Any`: plugin/extension systems, type-indexed maps (typemap), error downcasting (`anyhow::Error::downcast_ref`). Prefer generics or trait objects when the set of types is known at compile time — `Any` is a last resort that trades compile-time safety for flexibility.
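Besides `downcast_ref`, a `Box<dyn Any>` supports a consuming `downcast` that returns ownership of the value — and hands the box back on a type mismatch:

```rust
use std::any::Any;

// Sketch: consuming downcast on Box<dyn Any> — recovers the owned value,
// or returns Err(original_box) when the type doesn't match.
fn main() {
    let boxed: Box<dyn Any> = Box::new(String::from("payload"));
    match boxed.downcast::<String>() {
        Ok(s) => assert_eq!(*s, "payload"), // we own the String again
        Err(_) => unreachable!("it really was a String"),
    }

    let boxed: Box<dyn Any> = Box::new(7_i32);
    // Wrong type — the box comes back untouched in the Err arm:
    assert!(boxed.downcast::<String>().is_err());
}
```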
Extension Traits — Adding Methods to Types You Don’t Own
Rust’s orphan rule prevents you from implementing a foreign trait on a foreign type. Extension traits are the standard workaround: define a new trait in your crate whose methods have a blanket implementation for any type that meets a bound. The caller imports the trait and the new methods appear on existing types.
This pattern is pervasive in the Rust ecosystem: itertools::Itertools, futures::StreamExt,
tokio::io::AsyncReadExt, tower::ServiceExt.
The Problem
#![allow(unused)]
fn main() {
// We want to add a .mean() method to all iterators that yield f64.
// But Iterator is defined in std and f64 is a primitive — orphan rule prevents:
//
// impl<I: Iterator<Item = f64>> I { // ❌ Cannot add inherent methods to a foreign type
// fn mean(self) -> f64 { ... }
// }
}
The Solution: An Extension Trait
#![allow(unused)]
fn main() {
/// Extension methods for iterators over numeric values.
pub trait IteratorExt: Iterator {
/// Computes the arithmetic mean. Returns `None` for empty iterators.
fn mean(self) -> Option<f64>
where
Self: Sized,
Self::Item: Into<f64>;
}
// Blanket implementation — automatically applies to ALL iterators
impl<I: Iterator> IteratorExt for I {
fn mean(self) -> Option<f64>
where
Self: Sized,
Self::Item: Into<f64>,
{
let mut sum: f64 = 0.0;
let mut count: u64 = 0;
for item in self {
sum += item.into();
count += 1;
}
if count == 0 { None } else { Some(sum / count as f64) }
}
}
// Usage — just import the trait:
// In downstream code: `use your_crate::IteratorExt;` — one import and the method appears on all iterators
fn analyze_temperatures(readings: &[f64]) -> Option<f64> {
readings.iter().copied().mean() // .mean() is now available!
}
fn analyze_sensor_data(data: &[i32]) -> Option<f64> {
data.iter().copied().mean() // Works on i32 too (i32: Into<f64>)
}
}
Real-World Example: Diagnostic Result Extensions
#![allow(unused)]
fn main() {
use std::collections::HashMap;
struct DiagResult {
component: String,
passed: bool,
message: String,
}
/// Extension trait for Vec<DiagResult> — adds domain-specific analysis methods.
pub trait DiagResultsExt {
fn passed_count(&self) -> usize;
fn failed_count(&self) -> usize;
fn overall_pass(&self) -> bool;
fn failures_by_component(&self) -> HashMap<String, Vec<&DiagResult>>;
}
impl DiagResultsExt for Vec<DiagResult> {
fn passed_count(&self) -> usize {
self.iter().filter(|r| r.passed).count()
}
fn failed_count(&self) -> usize {
self.iter().filter(|r| !r.passed).count()
}
fn overall_pass(&self) -> bool {
self.iter().all(|r| r.passed)
}
fn failures_by_component(&self) -> HashMap<String, Vec<&DiagResult>> {
let mut map = HashMap::new();
for r in self.iter().filter(|r| !r.passed) {
map.entry(r.component.clone()).or_default().push(r);
}
map
}
}
// Now any Vec<DiagResult> has these methods:
fn report(results: Vec<DiagResult>) {
if !results.overall_pass() {
let failures = results.failures_by_component();
for (component, fails) in &failures {
eprintln!("{component}: {} failures", fails.len());
}
}
}
}
Naming Convention
The Rust ecosystem uses a consistent Ext suffix:
| Crate | Extension Trait | Extends |
|---|---|---|
| itertools | Itertools | Iterator |
| futures | StreamExt, FutureExt | Stream, Future |
| tokio | AsyncReadExt, AsyncWriteExt | AsyncRead, AsyncWrite |
| tower | ServiceExt | Service |
| bytes | BufMut (partial) | &mut [u8] |
| Your crate | DiagResultsExt | Vec<DiagResult> |
When to Use
| Situation | Use Extension Trait? |
|---|---|
| Adding convenience methods to a foreign type | ✅ |
| Grouping domain-specific logic on generic collections | ✅ |
| The method needs access to private fields | ❌ (use a wrapper/newtype) |
| The method logically belongs on a new type you control | ❌ (just add it to your type) |
| You want the method available without any import | ❌ (inherent methods only) |
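Extension traits are not limited to iterators — `Result` is another common target. A sketch with hypothetical names (`ResultExt`, `ok_or_log`) showing a logging helper injected onto every `Result` whose error implements `Display`:

```rust
use std::fmt::Display;

/// Hypothetical extension trait: log-and-discard the error branch.
pub trait ResultExt<T> {
    /// Log the error (if any) to stderr with context, then convert to Option.
    fn ok_or_log(self, context: &str) -> Option<T>;
}

impl<T, E: Display> ResultExt<T> for Result<T, E> {
    fn ok_or_log(self, context: &str) -> Option<T> {
        match self {
            Ok(v) => Some(v),
            Err(e) => {
                eprintln!("{context}: {e}");
                None
            }
        }
    }
}

fn main() {
    let parsed = "42".parse::<i32>().ok_or_log("parsing port");
    assert_eq!(parsed, Some(42));
    let bad = "xyz".parse::<i32>().ok_or_log("parsing port"); // logs, returns None
    assert_eq!(bad, None);
}
```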
Enum Dispatch — Static Polymorphism Without dyn
When you have a closed set of types implementing a trait, you can replace dyn Trait
with an enum whose variants hold the concrete types. This eliminates the vtable indirection
and heap allocation while preserving the same caller-facing interface.
The Problem with dyn Trait
#![allow(unused)]
fn main() {
trait Sensor {
fn read(&self) -> f64;
fn name(&self) -> &str;
}
struct Gps { lat: f64, lon: f64 }
struct Thermometer { temp_c: f64 }
struct Accelerometer { g_force: f64 }
impl Sensor for Gps {
fn read(&self) -> f64 { self.lat }
fn name(&self) -> &str { "GPS" }
}
impl Sensor for Thermometer {
fn read(&self) -> f64 { self.temp_c }
fn name(&self) -> &str { "Thermometer" }
}
impl Sensor for Accelerometer {
fn read(&self) -> f64 { self.g_force }
fn name(&self) -> &str { "Accelerometer" }
}
// Heterogeneous collection with dyn — works, but has costs:
fn read_all_dyn(sensors: &[Box<dyn Sensor>]) -> Vec<f64> {
sensors.iter().map(|s| s.read()).collect()
// Each .read() goes through a vtable indirection
// Each Box allocates on the heap
}
}
The Enum Dispatch Solution
// Replace the trait object with an enum:
enum AnySensor {
Gps(Gps),
Thermometer(Thermometer),
Accelerometer(Accelerometer),
}
impl AnySensor {
fn read(&self) -> f64 {
match self {
AnySensor::Gps(s) => s.read(),
AnySensor::Thermometer(s) => s.read(),
AnySensor::Accelerometer(s) => s.read(),
}
}
fn name(&self) -> &str {
match self {
AnySensor::Gps(s) => s.name(),
AnySensor::Thermometer(s) => s.name(),
AnySensor::Accelerometer(s) => s.name(),
}
}
}
// Now: no heap allocation, no vtable, stored inline
fn read_all(sensors: &[AnySensor]) -> Vec<f64> {
sensors.iter().map(|s| s.read()).collect()
// Each .read() is a match branch — compiler can inline everything
}
fn main() {
let sensors = vec![
AnySensor::Gps(Gps { lat: 47.6, lon: -122.3 }),
AnySensor::Thermometer(Thermometer { temp_c: 72.5 }),
AnySensor::Accelerometer(Accelerometer { g_force: 1.02 }),
];
for sensor in &sensors {
println!("{}: {:.2}", sensor.name(), sensor.read());
}
}
Implement the Trait on the Enum
For interoperability, you can implement the original trait on the enum itself:
#![allow(unused)]
fn main() {
impl Sensor for AnySensor {
fn read(&self) -> f64 {
match self {
AnySensor::Gps(s) => s.read(),
AnySensor::Thermometer(s) => s.read(),
AnySensor::Accelerometer(s) => s.read(),
}
}
fn name(&self) -> &str {
match self {
AnySensor::Gps(s) => s.name(),
AnySensor::Thermometer(s) => s.name(),
AnySensor::Accelerometer(s) => s.name(),
}
}
}
// Now AnySensor works anywhere a Sensor is expected via generics:
fn report<S: Sensor>(s: &S) {
println!("{}: {:.2}", s.name(), s.read());
}
}
Reducing Boilerplate with a Macro
The match-arm delegation is repetitive. A macro eliminates it:
#![allow(unused)]
fn main() {
macro_rules! dispatch_sensor {
($self:expr, $method:ident $(, $arg:expr)*) => {
match $self {
AnySensor::Gps(s) => s.$method($($arg),*),
AnySensor::Thermometer(s) => s.$method($($arg),*),
AnySensor::Accelerometer(s) => s.$method($($arg),*),
}
};
}
impl Sensor for AnySensor {
fn read(&self) -> f64 { dispatch_sensor!(self, read) }
fn name(&self) -> &str { dispatch_sensor!(self, name) }
}
}
For larger projects, the enum_dispatch crate automates this entirely:
#![allow(unused)]
fn main() {
use enum_dispatch::enum_dispatch;
#[enum_dispatch]
trait Sensor {
fn read(&self) -> f64;
fn name(&self) -> &str;
}
#[enum_dispatch(Sensor)]
enum AnySensor {
Gps,
Thermometer,
Accelerometer,
}
// All delegation code is generated automatically.
}
dyn Trait vs Enum Dispatch — Decision Guide
Is the set of types closed (known at compile time)?
├── YES → Prefer enum dispatch (faster, no heap allocation)
│ ├── Few variants (< ~20)? → Manual enum
│ └── Many variants or growing? → enum_dispatch crate
└── NO → Must use dyn Trait (plugins, user-provided types)
| Property | dyn Trait | Enum Dispatch |
|---|---|---|
| Dispatch cost | Vtable indirection (~2ns) | Branch prediction (~0.3ns) |
| Heap allocation | Usually (Box) | None (inline) |
| Cache-friendly | No (pointer chasing) | Yes (contiguous) |
| Open to new types | ✅ (anyone can impl) | ❌ (closed set) |
| Code size | Shared | One copy per variant |
| Trait must be object-safe | Yes | No |
| Adding a variant | No code changes | Update enum + match arms |
When to Use Enum Dispatch
| Scenario | Recommendation |
|---|---|
| Diagnostic test types (CPU, GPU, NIC, Memory, …) | ✅ Enum dispatch — closed set, known at compile time |
| Bus protocols (SPI, I2C, UART, …) | ✅ Enum dispatch or Config trait |
| Plugin system (user loads .so at runtime) | ❌ Use dyn Trait |
| 2-3 variants | ✅ Manual enum dispatch |
| 10+ variants with many methods | ✅ enum_dispatch crate |
| Performance-critical inner loop | ✅ Enum dispatch (eliminates vtable) |
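A common ergonomic companion to enum dispatch is a set of `From` impls — which the `enum_dispatch` crate also generates — so call sites never name variants explicitly. A condensed sketch:

```rust
// Sketch: From impls let callers build the dispatch enum with .into(),
// keeping variant names out of call sites.
struct Gps { lat: f64 }
struct Thermometer { temp_c: f64 }

enum AnySensor {
    Gps(Gps),
    Thermometer(Thermometer),
}

impl From<Gps> for AnySensor {
    fn from(s: Gps) -> Self { AnySensor::Gps(s) }
}
impl From<Thermometer> for AnySensor {
    fn from(s: Thermometer) -> Self { AnySensor::Thermometer(s) }
}

impl AnySensor {
    fn read(&self) -> f64 {
        match self {
            AnySensor::Gps(s) => s.lat,
            AnySensor::Thermometer(s) => s.temp_c,
        }
    }
}

fn main() {
    // .into() picks the right variant from the concrete type:
    let sensors: Vec<AnySensor> = vec![
        Gps { lat: 47.6 }.into(),
        Thermometer { temp_c: 72.5 }.into(),
    ];
    assert_eq!(sensors[1].read(), 72.5);
}
```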
Capability Mixins — Associated Types as Zero-Cost Composition
Ruby developers compose behaviour with mixins — include SomeModule injects methods
into a class. Rust traits with associated types + default methods + blanket impls
produce the same result, except:
- Everything resolves at compile time — no method-missing surprises
- Each associated type is a knob that changes what the default methods produce
- The compiler monomorphises each combination — zero vtable overhead
The Problem: Cross-Cutting Bus Dependencies
Hardware diagnostic routines share common operations — read an IPMI sensor, toggle a GPIO rail, sample a temperature over SPI — but different diagnostics need different combinations. Inheritance hierarchies don’t exist in Rust. Passing every bus handle as a function argument creates unwieldy signatures. We need a way to mix in bus capabilities à la carte.
Step 1 — Define “Ingredient” Traits
Each ingredient provides one hardware capability via an associated type:
#![allow(unused)]
fn main() {
use std::io;
// ── Bus abstractions (traits the hardware team provides) ──────────
pub trait SpiBus {
fn spi_transfer(&self, tx: &[u8], rx: &mut [u8]) -> io::Result<()>;
}
pub trait I2cBus {
fn i2c_read(&self, addr: u8, reg: u8, buf: &mut [u8]) -> io::Result<()>;
fn i2c_write(&self, addr: u8, reg: u8, data: &[u8]) -> io::Result<()>;
}
pub trait GpioPin {
fn set_high(&self) -> io::Result<()>;
fn set_low(&self) -> io::Result<()>;
fn read_level(&self) -> io::Result<bool>;
}
pub trait IpmiBmc {
fn raw_command(&self, net_fn: u8, cmd: u8, data: &[u8]) -> io::Result<Vec<u8>>;
fn read_sensor(&self, sensor_id: u8) -> io::Result<f64>;
}
// ── Ingredient traits — one per bus, carries an associated type ───
pub trait HasSpi {
type Spi: SpiBus;
fn spi(&self) -> &Self::Spi;
}
pub trait HasI2c {
type I2c: I2cBus;
fn i2c(&self) -> &Self::I2c;
}
pub trait HasGpio {
type Gpio: GpioPin;
fn gpio(&self) -> &Self::Gpio;
}
pub trait HasIpmi {
type Ipmi: IpmiBmc;
fn ipmi(&self) -> &Self::Ipmi;
}
}
Each ingredient is tiny, generic, and testable in isolation.
Step 2 — Define “Mixin” Traits
A mixin trait declares its required ingredients as supertraits, then provides all its methods via defaults — implementors get them for free:
#![allow(unused)]
fn main() {
/// Mixin: fan diagnostics — needs I2C (tachometer) + GPIO (PWM enable)
pub trait FanDiagMixin: HasI2c + HasGpio {
/// Read fan RPM from the tachometer IC over I2C.
fn read_fan_rpm(&self, fan_id: u8) -> io::Result<u32> {
let mut buf = [0u8; 2];
self.i2c().i2c_read(0x48 + fan_id, 0x00, &mut buf)?;
Ok(u16::from_be_bytes(buf) as u32 * 60) // tach counts → RPM
}
/// Enable or disable the fan PWM output via GPIO.
fn set_fan_pwm(&self, enable: bool) -> io::Result<()> {
if enable { self.gpio().set_high() }
else { self.gpio().set_low() }
}
/// Full fan health check — read RPM + verify within threshold.
fn check_fan_health(&self, fan_id: u8, min_rpm: u32) -> io::Result<bool> {
let rpm = self.read_fan_rpm(fan_id)?;
Ok(rpm >= min_rpm)
}
}
/// Mixin: temperature monitoring — needs SPI (thermocouple ADC) + IPMI (BMC sensors)
pub trait TempMonitorMixin: HasSpi + HasIpmi {
/// Read a thermocouple via the SPI ADC (e.g. MAX31855).
fn read_thermocouple(&self) -> io::Result<f64> {
let mut rx = [0u8; 4];
self.spi().spi_transfer(&[0x00; 4], &mut rx)?;
let raw = i32::from_be_bytes(rx) >> 18; // 14-bit signed
Ok(raw as f64 * 0.25)
}
/// Read a BMC-managed temperature sensor via IPMI.
fn read_bmc_temp(&self, sensor_id: u8) -> io::Result<f64> {
self.ipmi().read_sensor(sensor_id)
}
/// Cross-validate: thermocouple vs BMC must agree within delta.
fn validate_temps(&self, sensor_id: u8, max_delta: f64) -> io::Result<bool> {
let tc = self.read_thermocouple()?;
let bmc = self.read_bmc_temp(sensor_id)?;
Ok((tc - bmc).abs() <= max_delta)
}
}
/// Mixin: power sequencing — needs GPIO (rail enable) + IPMI (event logging)
pub trait PowerSeqMixin: HasGpio + HasIpmi {
/// Assert the power-good GPIO and verify via IPMI sensor.
fn enable_power_rail(&self, sensor_id: u8) -> io::Result<bool> {
self.gpio().set_high()?;
std::thread::sleep(std::time::Duration::from_millis(50));
let voltage = self.ipmi().read_sensor(sensor_id)?;
Ok(voltage > 0.8) // above 80% nominal = good
}
/// De-assert power and log shutdown via IPMI OEM command.
fn disable_power_rail(&self) -> io::Result<()> {
self.gpio().set_low()?;
// Log OEM "power rail disabled" event to BMC
self.ipmi().raw_command(0x2E, 0x01, &[0x00, 0x01])?;
Ok(())
}
}
}
Step 3 — Blanket Impls Make It Truly “Mixin”
The magic line — provide the ingredients, get the methods:
#![allow(unused)]
fn main() {
impl<T: HasI2c + HasGpio> FanDiagMixin for T {}
impl<T: HasSpi + HasIpmi> TempMonitorMixin for T {}
impl<T: HasGpio + HasIpmi> PowerSeqMixin for T {}
}
Any struct that implements the right ingredient traits automatically gains every mixin method — no boilerplate, no forwarding, no inheritance.
Step 4 — Wire Up Production
#![allow(unused)]
fn main() {
// ── Concrete bus implementations (Linux platform) ────────────────
struct LinuxSpi { dev: String }
struct LinuxI2c { dev: String }
struct SysfsGpio { pin: u32 }
struct IpmiTool { timeout_secs: u32 }
impl SpiBus for LinuxSpi {
fn spi_transfer(&self, _tx: &[u8], _rx: &mut [u8]) -> io::Result<()> {
// spidev ioctl — omitted for brevity
Ok(())
}
}
impl I2cBus for LinuxI2c {
fn i2c_read(&self, _addr: u8, _reg: u8, _buf: &mut [u8]) -> io::Result<()> {
// i2c-dev ioctl — omitted for brevity
Ok(())
}
fn i2c_write(&self, _addr: u8, _reg: u8, _data: &[u8]) -> io::Result<()> { Ok(()) }
}
impl GpioPin for SysfsGpio {
fn set_high(&self) -> io::Result<()> { /* /sys/class/gpio */ Ok(()) }
fn set_low(&self) -> io::Result<()> { Ok(()) }
fn read_level(&self) -> io::Result<bool> { Ok(true) }
}
impl IpmiBmc for IpmiTool {
fn raw_command(&self, _nf: u8, _cmd: u8, _data: &[u8]) -> io::Result<Vec<u8>> {
// shells out to ipmitool — omitted for brevity
Ok(vec![])
}
fn read_sensor(&self, _id: u8) -> io::Result<f64> { Ok(25.0) }
}
// ── Production platform — all four buses ─────────────────────────
struct DiagPlatform {
spi: LinuxSpi,
i2c: LinuxI2c,
gpio: SysfsGpio,
ipmi: IpmiTool,
}
impl HasSpi for DiagPlatform { type Spi = LinuxSpi; fn spi(&self) -> &LinuxSpi { &self.spi } }
impl HasI2c for DiagPlatform { type I2c = LinuxI2c; fn i2c(&self) -> &LinuxI2c { &self.i2c } }
impl HasGpio for DiagPlatform { type Gpio = SysfsGpio; fn gpio(&self) -> &SysfsGpio { &self.gpio } }
impl HasIpmi for DiagPlatform { type Ipmi = IpmiTool; fn ipmi(&self) -> &IpmiTool { &self.ipmi } }
// DiagPlatform now has ALL mixin methods:
fn production_diagnostics(platform: &DiagPlatform) -> io::Result<()> {
let rpm = platform.read_fan_rpm(0)?; // from FanDiagMixin
let tc = platform.read_thermocouple()?; // from TempMonitorMixin
let ok = platform.enable_power_rail(42)?; // from PowerSeqMixin
println!("Fan: {rpm} RPM, Temp: {tc}°C, Power: {ok}");
Ok(())
}
}
Step 5 — Test With Mocks (No Hardware Required)
#![allow(unused)]
fn main() {
#[cfg(test)]
mod tests {
use super::*;
use std::cell::Cell;
struct MockSpi { temp: Cell<f64> }
struct MockI2c { rpm: Cell<u32> }
struct MockGpio { level: Cell<bool> }
struct MockIpmi { sensor_val: Cell<f64> }
impl SpiBus for MockSpi {
fn spi_transfer(&self, _tx: &[u8], rx: &mut [u8]) -> io::Result<()> {
// Encode mock temp as MAX31855 format
let raw = ((self.temp.get() / 0.25) as i32) << 18;
rx.copy_from_slice(&raw.to_be_bytes());
Ok(())
}
}
impl I2cBus for MockI2c {
fn i2c_read(&self, _addr: u8, _reg: u8, buf: &mut [u8]) -> io::Result<()> {
let tach = (self.rpm.get() / 60) as u16;
buf.copy_from_slice(&tach.to_be_bytes());
Ok(())
}
fn i2c_write(&self, _: u8, _: u8, _: &[u8]) -> io::Result<()> { Ok(()) }
}
impl GpioPin for MockGpio {
fn set_high(&self) -> io::Result<()> { self.level.set(true); Ok(()) }
fn set_low(&self) -> io::Result<()> { self.level.set(false); Ok(()) }
fn read_level(&self) -> io::Result<bool> { Ok(self.level.get()) }
}
impl IpmiBmc for MockIpmi {
fn raw_command(&self, _: u8, _: u8, _: &[u8]) -> io::Result<Vec<u8>> { Ok(vec![]) }
fn read_sensor(&self, _: u8) -> io::Result<f64> { Ok(self.sensor_val.get()) }
}
// ── Partial platform: only fan-related buses ─────────────────
struct FanTestRig {
i2c: MockI2c,
gpio: MockGpio,
}
impl HasI2c for FanTestRig { type I2c = MockI2c; fn i2c(&self) -> &MockI2c { &self.i2c } }
impl HasGpio for FanTestRig { type Gpio = MockGpio; fn gpio(&self) -> &MockGpio { &self.gpio } }
// FanTestRig gets FanDiagMixin but NOT TempMonitorMixin or PowerSeqMixin
#[test]
fn fan_health_check_passes_above_threshold() {
let rig = FanTestRig {
i2c: MockI2c { rpm: Cell::new(6000) },
gpio: MockGpio { level: Cell::new(false) },
};
assert!(rig.check_fan_health(0, 4000).unwrap());
}
#[test]
fn fan_health_check_fails_below_threshold() {
let rig = FanTestRig {
i2c: MockI2c { rpm: Cell::new(2000) },
gpio: MockGpio { level: Cell::new(false) },
};
assert!(!rig.check_fan_health(0, 4000).unwrap());
}
}
}
Notice that FanTestRig only implements HasI2c + HasGpio — it gets FanDiagMixin
automatically, but the compiler refuses rig.read_thermocouple() because HasSpi
is not satisfied. This is mixin scoping enforced at compile time.
Conditional Methods — Beyond What Ruby Can Do
Add where bounds to individual default methods. The method only exists when
the associated type satisfies the extra bound:
#![allow(unused)]
fn main() {
/// Marker trait for DMA-capable SPI controllers
pub trait DmaCapable: SpiBus {
fn dma_transfer(&self, tx: &[u8], rx: &mut [u8]) -> io::Result<()>;
}
/// Marker trait for interrupt-capable GPIO pins
pub trait InterruptCapable: GpioPin {
fn wait_for_edge(&self, timeout_ms: u32) -> io::Result<bool>;
}
pub trait AdvancedDiagMixin: HasSpi + HasGpio {
// Always available
fn basic_probe(&self) -> io::Result<bool> {
let mut rx = [0u8; 1];
self.spi().spi_transfer(&[0xFF], &mut rx)?;
Ok(rx[0] != 0x00)
}
// Only exists when the SPI controller supports DMA
fn bulk_sensor_read(&self, buf: &mut [u8]) -> io::Result<()>
where
Self::Spi: DmaCapable,
{
self.spi().dma_transfer(&vec![0x00; buf.len()], buf)
}
// Only exists when the GPIO pin supports interrupts
fn wait_for_fault_signal(&self, timeout_ms: u32) -> io::Result<bool>
where
Self::Gpio: InterruptCapable,
{
self.gpio().wait_for_edge(timeout_ms)
}
}
impl<T: HasSpi + HasGpio> AdvancedDiagMixin for T {}
}
If your platform’s SPI doesn’t support DMA, calling bulk_sensor_read() is a
compile error, not a runtime crash. Ruby’s respond_to? check is the closest
equivalent — but it happens at deploy time, not compile time.
Composability: Stacking Mixins
Multiple mixins can share the same ingredient — no diamond problem:
┌─────────────┐ ┌───────────┐ ┌──────────────┐
│ FanDiagMixin│ │TempMonitor│ │ PowerSeqMixin│
│ (I2C+GPIO) │ │ (SPI+IPMI)│ │ (GPIO+IPMI) │
└──────┬──────┘ └─────┬─────┘ └──────┬───────┘
│ │ │
│ ┌─────────────┴─────────────┐ │
└──►│ DiagPlatform │◄──┘
│ HasSpi+HasI2c+HasGpio │
│ +HasIpmi │
└───────────────────────────┘
DiagPlatform implements HasGpio once, and both FanDiagMixin and
PowerSeqMixin use the same self.gpio(). In Ruby, this would be two modules
both calling self.gpio_pin — but if they expected different pin numbers, you’d
discover the conflict at runtime. In Rust, you can disambiguate at the type level.
Comparison: Ruby Mixins vs Rust Capability Mixins
| Dimension | Ruby Mixins | Rust Capability Mixins |
|---|---|---|
| Dispatch | Runtime (method table lookup) | Compile-time (monomorphised) |
| Safe composition | MRO linearisation hides conflicts | Compiler rejects ambiguity |
| Conditional methods | respond_to? at runtime | where bounds at compile time |
| Overhead | Method dispatch + GC | Zero-cost (inlined) |
| Testability | Stub/mock via metaprogramming | Generic over mock types |
| Adding new buses | include at runtime | Add ingredient trait, recompile |
| Runtime flexibility | extend, prepend, open classes | None (fully static) |
When to Use Capability Mixins
| Scenario | Use Mixins? |
|---|---|
| Multiple diagnostics share bus-reading logic | ✅ |
| Test harness needs different bus subsets | ✅ (partial ingredient structs) |
| Methods only valid for certain bus capabilities (DMA, IRQ) | ✅ (conditional where bounds) |
| You need runtime module loading (plugins) | ❌ (use dyn Trait or enum dispatch) |
| Single struct with one bus — no sharing needed | ❌ (keep it simple) |
| Cross-crate ingredients with coherence issues | ⚠️ (use newtype wrappers) |
Key Takeaways — Capability Mixins
- Ingredient trait = associated type + accessor method (e.g., `HasSpi`)
- Mixin trait = supertrait bounds on ingredients + default method bodies
- Blanket impl = `impl<T: HasX + HasY> Mixin for T {}` — auto-injects methods
- Conditional methods = `where Self::Spi: DmaCapable` on individual defaults
- Partial platforms = test structs that only impl the needed ingredients
- No runtime cost — the compiler generates specialised code for each platform type
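The whole pattern condenses into one minimal, self-contained sketch (the `Clock`/`HasClock`/`UptimeMixin` names are hypothetical):

```rust
trait Clock {
    fn now_ms(&self) -> u64;
}

// Ingredient: associated type + accessor
trait HasClock {
    type C: Clock;
    fn clock(&self) -> &Self::C;
}

// Mixin: supertrait bound on the ingredient + a default method body
trait UptimeMixin: HasClock {
    fn uptime_secs(&self) -> u64 {
        self.clock().now_ms() / 1000
    }
}

// Blanket impl: provide the ingredient, receive the methods
impl<T: HasClock> UptimeMixin for T {}

struct FakeClock;
impl Clock for FakeClock {
    fn now_ms(&self) -> u64 { 5_000 }
}

struct Platform { clock: FakeClock }
impl HasClock for Platform {
    type C = FakeClock;
    fn clock(&self) -> &FakeClock { &self.clock }
}

fn main() {
    let p = Platform { clock: FakeClock };
    assert_eq!(p.uptime_secs(), 5); // mixin method, zero boilerplate
}
```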
Typed Commands — GADT-Style Return Type Safety
In Haskell, Generalised Algebraic Data Types (GADTs) let each constructor of a
data type refine the type parameter — so Expr Int and Expr Bool are enforced by
the type checker. Rust has no direct GADT syntax, but traits with associated types
achieve the same guarantee: the command type determines the response type, and
mixing them up is a compile error.
This pattern is particularly powerful for hardware diagnostics, where IPMI commands, register reads, and sensor queries each return different physical quantities that should never be confused.
The Problem: The Untyped Vec<u8> Swamp
Most C/C++ IPMI stacks — and naïve Rust ports — use raw bytes everywhere:
#![allow(unused)]
fn main() {
use std::io;
struct BmcConnectionUntyped { timeout_secs: u32 }
impl BmcConnectionUntyped {
fn raw_command(&self, net_fn: u8, cmd: u8, data: &[u8]) -> io::Result<Vec<u8>> {
// ... shells out to ipmitool ...
Ok(vec![0x00, 0x19, 0x00]) // stub
}
}
fn diagnose_thermal_untyped(bmc: &BmcConnectionUntyped) -> io::Result<()> {
// Read CPU temperature — sensor ID 0x20
let raw = bmc.raw_command(0x04, 0x2D, &[0x20])?;
let cpu_temp = raw[0] as f64; // 🤞 hope byte 0 is the reading
// Read fan speed — sensor ID 0x30
let raw = bmc.raw_command(0x04, 0x2D, &[0x30])?;
let fan_rpm = raw[0] as u32; // 🐛 BUG: fan speed is 2 bytes LE
// Read inlet voltage — sensor ID 0x40
let raw = bmc.raw_command(0x04, 0x2D, &[0x40])?;
let voltage = raw[0] as f64; // 🐛 BUG: need to divide by 1000
// 🐛 Comparing °C to RPM — compiles, but nonsensical
if cpu_temp > fan_rpm as f64 {
println!("uh oh");
}
// 🐛 Passing Volts as temperature — compiles fine
log_temp_untyped(voltage);
log_volts_untyped(cpu_temp);
Ok(())
}
fn log_temp_untyped(t: f64) { println!("Temp: {t}°C"); }
fn log_volts_untyped(v: f64) { println!("Voltage: {v}V"); }
}
Every reading is f64 — the compiler has no idea that one is a temperature, another
is RPM, another is voltage. Four distinct bugs compile without warning:
| # | Bug | Consequence | Discovered |
|---|---|---|---|
| 1 | Fan RPM parsed as 1 byte instead of 2 | Reads 25 RPM instead of 6400 | Production, 3 AM fan-failure flood |
| 2 | Voltage not divided by 1000 | 12000V instead of 12.0V | Threshold check flags every PSU |
| 3 | Comparing °C to RPM | Meaningless boolean | Possibly never |
| 4 | Voltage passed to log_temp_untyped() | Silent data corruption in logs | 6 months later, reading history |
The Solution: Typed Commands via Associated Types
Step 1 — Domain newtypes
#![allow(unused)]
fn main() {
#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
struct Celsius(f64);
#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
struct Rpm(u32);
#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
struct Volts(f64);
#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
struct Watts(f64);
}
Step 2 — The command trait (the GADT equivalent)
The associated type Response is the key — it binds each command to its return type:
#![allow(unused)]
fn main() {
trait IpmiCmd {
/// The GADT "index" — determines what execute() returns.
type Response;
fn net_fn(&self) -> u8;
fn cmd_byte(&self) -> u8;
fn payload(&self) -> Vec<u8>;
/// Parsing is encapsulated HERE — each command knows its own byte layout.
fn parse_response(&self, raw: &[u8]) -> io::Result<Self::Response>;
}
}
Step 3 — One struct per command, parsing written once
#![allow(unused)]
fn main() {
struct ReadTemp { sensor_id: u8 }
impl IpmiCmd for ReadTemp {
type Response = Celsius; // ← "this command returns a temperature"
fn net_fn(&self) -> u8 { 0x04 }
fn cmd_byte(&self) -> u8 { 0x2D }
fn payload(&self) -> Vec<u8> { vec![self.sensor_id] }
fn parse_response(&self, raw: &[u8]) -> io::Result<Celsius> {
// Signed byte per IPMI SDR — written once, tested once
Ok(Celsius(raw[0] as i8 as f64))
}
}
struct ReadFanSpeed { fan_id: u8 }
impl IpmiCmd for ReadFanSpeed {
type Response = Rpm; // ← "this command returns RPM"
fn net_fn(&self) -> u8 { 0x04 }
fn cmd_byte(&self) -> u8 { 0x2D }
fn payload(&self) -> Vec<u8> { vec![self.fan_id] }
fn parse_response(&self, raw: &[u8]) -> io::Result<Rpm> {
// 2-byte LE — the correct layout, encoded once
Ok(Rpm(u16::from_le_bytes([raw[0], raw[1]]) as u32))
}
}
struct ReadVoltage { rail: u8 }
impl IpmiCmd for ReadVoltage {
type Response = Volts; // ← "this command returns voltage"
fn net_fn(&self) -> u8 { 0x04 }
fn cmd_byte(&self) -> u8 { 0x2D }
fn payload(&self) -> Vec<u8> { vec![self.rail] }
fn parse_response(&self, raw: &[u8]) -> io::Result<Volts> {
// Millivolts → Volts, always correct
Ok(Volts(u16::from_le_bytes([raw[0], raw[1]]) as f64 / 1000.0))
}
}
struct ReadFru { fru_id: u8 }
impl IpmiCmd for ReadFru {
type Response = String;
fn net_fn(&self) -> u8 { 0x0A }
fn cmd_byte(&self) -> u8 { 0x11 }
fn payload(&self) -> Vec<u8> { vec![self.fru_id, 0x00, 0x00, 0xFF] }
fn parse_response(&self, raw: &[u8]) -> io::Result<String> {
Ok(String::from_utf8_lossy(raw).to_string())
}
}
}
Step 4 — The executor (zero dyn, monomorphised)
#![allow(unused)]
fn main() {
struct BmcConnection { timeout_secs: u32 }
impl BmcConnection {
/// Generic over any command — compiler generates one version per command type.
fn execute<C: IpmiCmd>(&self, cmd: &C) -> io::Result<C::Response> {
let raw = self.raw_send(cmd.net_fn(), cmd.cmd_byte(), &cmd.payload())?;
cmd.parse_response(&raw)
}
fn raw_send(&self, _nf: u8, _cmd: u8, _data: &[u8]) -> io::Result<Vec<u8>> {
Ok(vec![0x19, 0x00]) // stub — real impl calls ipmitool
}
}
}
Step 5 — Caller code: all four bugs become compile errors
#![allow(unused)]
fn main() {
fn diagnose_thermal(bmc: &BmcConnection) -> io::Result<()> {
let cpu_temp: Celsius = bmc.execute(&ReadTemp { sensor_id: 0x20 })?;
let fan_rpm: Rpm = bmc.execute(&ReadFanSpeed { fan_id: 0x30 })?;
let voltage: Volts = bmc.execute(&ReadVoltage { rail: 0x40 })?;
// Bug #1 — IMPOSSIBLE: parsing lives in ReadFanSpeed::parse_response
// Bug #2 — IMPOSSIBLE: scaling lives in ReadVoltage::parse_response
// Bug #3 — COMPILE ERROR:
// if cpu_temp > fan_rpm { }
// ^^^^^^^^ ^^^^^^^
// Celsius Rpm → "mismatched types" ❌
// Bug #4 — COMPILE ERROR:
// log_temperature(voltage);
// ^^^^^^^ Volts, expected Celsius ❌
// Only correct comparisons compile:
if cpu_temp > Celsius(85.0) {
println!("CPU overheating: {:?}", cpu_temp);
}
if fan_rpm < Rpm(4000) {
println!("Fan too slow: {:?}", fan_rpm);
}
Ok(())
}
fn log_temperature(t: Celsius) { println!("Temp: {:?}", t); }
fn log_voltage(v: Volts) { println!("Voltage: {:?}", v); }
}
Macro DSL for Diagnostic Scripts
For large diagnostic routines that run many commands in sequence, a macro gives concise declarative syntax while preserving full type safety:
#![allow(unused)]
fn main() {
/// Execute a series of typed IPMI commands, returning a tuple of results.
/// Each element of the tuple has the command's own Response type.
macro_rules! diag_script {
($bmc:expr; $($cmd:expr),+ $(,)?) => {{
( $( $bmc.execute(&$cmd)?, )+ )
}};
}
fn full_pre_flight(bmc: &BmcConnection) -> io::Result<()> {
// Expands to: (Celsius, Rpm, Volts, String) — every type tracked
let (temp, rpm, volts, board_pn) = diag_script!(bmc;
ReadTemp { sensor_id: 0x20 },
ReadFanSpeed { fan_id: 0x30 },
ReadVoltage { rail: 0x40 },
ReadFru { fru_id: 0x00 },
);
println!("Board: {:?}", board_pn);
println!("CPU: {:?}, Fan: {:?}, 12V: {:?}", temp, rpm, volts);
// Type-safe threshold checks:
assert!(temp < Celsius(95.0), "CPU too hot");
assert!(rpm > Rpm(3000), "Fan too slow");
assert!(volts > Volts(11.4), "12V rail sagging");
Ok(())
}
}
The macro is just syntactic sugar — the tuple type (Celsius, Rpm, Volts, String) is
fully inferred by the compiler. Swap two commands and the destructuring breaks at
compile time, not at runtime.
Enum Dispatch for Heterogeneous Command Lists
When you need a Vec of mixed commands (e.g., a configurable script loaded from JSON),
use enum dispatch to stay dyn-free:
#![allow(unused)]
fn main() {
enum AnyReading {
Temp(Celsius),
Rpm(Rpm),
Volt(Volts),
Text(String),
}
enum AnyCmd {
Temp(ReadTemp),
Fan(ReadFanSpeed),
Voltage(ReadVoltage),
Fru(ReadFru),
}
impl AnyCmd {
fn execute(&self, bmc: &BmcConnection) -> io::Result<AnyReading> {
match self {
AnyCmd::Temp(c) => Ok(AnyReading::Temp(bmc.execute(c)?)),
AnyCmd::Fan(c) => Ok(AnyReading::Rpm(bmc.execute(c)?)),
AnyCmd::Voltage(c) => Ok(AnyReading::Volt(bmc.execute(c)?)),
AnyCmd::Fru(c) => Ok(AnyReading::Text(bmc.execute(c)?)),
}
}
}
/// Dynamic diagnostic script — commands loaded at runtime
fn run_script(bmc: &BmcConnection, script: &[AnyCmd]) -> io::Result<Vec<AnyReading>> {
script.iter().map(|cmd| cmd.execute(bmc)).collect()
}
}
You lose per-element type tracking (everything is AnyReading), but you gain
runtime flexibility — and the parsing is still encapsulated in each IpmiCmd impl.
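Consuming those results means matching on AnyReading at runtime. A minimal self-contained sketch (with simplified stand-ins for the newtypes defined earlier in the chapter) shows what a threshold check looks like under enum dispatch:

```rust
// Simplified stand-ins for the chapter's newtypes — assumptions for this sketch only.
#[derive(Debug, PartialEq, PartialOrd)]
struct Celsius(f64);
#[derive(Debug, PartialEq, PartialOrd)]
struct Rpm(u16);

#[derive(Debug)]
enum AnyReading {
    Temp(Celsius),
    Fan(Rpm),
}

/// Runtime threshold check: each match arm recovers the typed value,
/// so unit-safe comparisons still work inside the arm.
fn check(reading: &AnyReading) -> Result<(), String> {
    match reading {
        AnyReading::Temp(t) if *t > Celsius(95.0) => Err(format!("too hot: {t:?}")),
        AnyReading::Fan(r) if *r < Rpm(3000) => Err(format!("fan too slow: {r:?}")),
        _ => Ok(()),
    }
}

fn main() {
    let readings = vec![AnyReading::Temp(Celsius(45.0)), AnyReading::Fan(Rpm(2500))];
    let failures: Vec<_> = readings.iter().filter_map(|r| check(r).err()).collect();
    assert_eq!(failures.len(), 1);
    println!("{failures:?}"); // prints ["fan too slow: Rpm(2500)"]
}
```

The type check moves from compile time to the match arm, but it is still a checked, exhaustive match rather than an unchecked byte interpretation.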
Testing Typed Commands
#![allow(unused)]
fn main() {
#[cfg(test)]
mod tests {
use super::*;
struct StubBmc {
responses: std::collections::HashMap<u8, Vec<u8>>,
}
impl StubBmc {
fn execute<C: IpmiCmd>(&self, cmd: &C) -> io::Result<C::Response> {
let key = cmd.payload()[0]; // sensor ID as key
let raw = self.responses.get(&key)
.ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, "no stub"))?;
cmd.parse_response(raw)
}
}
#[test]
fn read_temp_parses_signed_byte() {
let bmc = StubBmc {
responses: [( 0x20, vec![0xE7] )].into() // -25 as i8 = 0xE7
};
let temp = bmc.execute(&ReadTemp { sensor_id: 0x20 }).unwrap();
assert_eq!(temp, Celsius(-25.0));
}
#[test]
fn read_fan_parses_two_byte_le() {
let bmc = StubBmc {
responses: [( 0x30, vec![0x00, 0x19] )].into() // 0x1900 = 6400
};
let rpm = bmc.execute(&ReadFanSpeed { fan_id: 0x30 }).unwrap();
assert_eq!(rpm, Rpm(6400));
}
#[test]
fn read_voltage_scales_millivolts() {
let bmc = StubBmc {
responses: [( 0x40, vec![0xE8, 0x2E] )].into() // 0x2EE8 = 12008 mV
};
let v = bmc.execute(&ReadVoltage { rail: 0x40 }).unwrap();
assert!((v.0 - 12.008).abs() < 0.001);
}
}
}
Each command’s parsing is tested independently. If ReadFanSpeed changes from 2-byte
LE to 4-byte BE in a new IPMI spec revision, you update one parse_response and
the test catches regressions.
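As a sketch of what such a spec change looks like in isolation — note the 4-byte big-endian format here is hypothetical, not taken from any real IPMI revision:

```rust
use std::io;

#[derive(Debug, PartialEq)]
struct Rpm(u32); // widened from u16 to hold the hypothetical 4-byte reading

/// Hypothetical revised wire format: 4-byte big-endian RPM.
/// Only this one function changes; every call site keeps receiving Rpm.
fn parse_fan_response(raw: &[u8]) -> io::Result<Rpm> {
    let bytes: [u8; 4] = raw
        .try_into()
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidData, "expected 4 bytes"))?;
    Ok(Rpm(u32::from_be_bytes(bytes)))
}

fn main() {
    // 0x00001900 big-endian = 6400
    assert_eq!(parse_fan_response(&[0x00, 0x00, 0x19, 0x00]).unwrap(), Rpm(6400));
    // Old 2-byte frames are now rejected instead of silently misparsed:
    assert!(parse_fan_response(&[0x19, 0x00]).is_err());
    println!("ok");
}
```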
How This Maps to Haskell GADTs
| Haskell GADT | Rust Equivalent |
|---|---|
| data Cmd a where ReadTemp :: SensorId -> Cmd Temp; ReadFan :: FanId -> Cmd Rpm | trait IpmiCmd { type Response; ... } |
| eval :: Cmd a -> IO a | fn execute<C: IpmiCmd>(&self, cmd: &C) -> io::Result<C::Response> |
| Type refinement in case branches | Monomorphisation: compiler generates execute::<ReadTemp>() → Celsius, execute::<ReadFanSpeed>() → Rpm |
Both guarantee: the command determines the return type. Rust achieves it through generic monomorphisation instead of type-level case analysis — same safety, zero runtime cost.
Before vs After Summary
| Dimension | Untyped (Vec<u8>) | Typed Commands |
|---|---|---|
| Lines per sensor | ~3 (duplicated at every call site) | ~15 (written and tested once) |
| Parsing errors possible | At every call site | In one parse_response impl |
| Unit confusion bugs | Unlimited | Zero (compile error) |
| Adding a new sensor | Touch N files, copy-paste parsing | Add 1 struct + 1 impl |
| Runtime cost | — | Identical (monomorphised) |
| IDE autocomplete | f64 everywhere | Celsius, Rpm, Volts — self-documenting |
| Code review burden | Must verify every raw byte parse | Verify one parse_response per sensor |
| Macro DSL | N/A | diag_script!(bmc; ReadTemp{..}, ReadFan{..}) → (Celsius, Rpm) |
| Dynamic scripts | Manual dispatch | AnyCmd enum — still dyn-free |
When to Use Typed Commands
| Scenario | Recommendation |
|---|---|
| IPMI sensor reads with distinct physical units | ✅ Typed commands |
| Register map with different-width fields | ✅ Typed commands |
| Network protocol messages (request → response) | ✅ Typed commands |
| Single command type with one return format | ❌ Overkill — just return the type directly |
| Prototyping / exploring an unknown device | ❌ Raw bytes first, type later |
| Plugin system where commands aren’t known at compile time | ⚠️ Use AnyCmd enum dispatch |
Key Takeaways — Traits
- Associated types = one impl per type; generic parameters = many impls per type
- GATs unlock lending iterators and async-in-traits patterns
- Use enum dispatch for closed sets (fast); dyn Trait for open sets (flexible)
- Any + TypeId is the escape hatch when compile-time types are unknown
See also: Ch 1 — Generics for monomorphization and when generics cause code bloat. Ch 3 — Newtype & Type-State for using traits with the config trait pattern.
Exercise: Repository with Associated Types ★★★ (~40 min)
Design a Repository trait with associated Error, Id, and Item types. Implement it for an in-memory store and demonstrate compile-time type safety.
🔑 Solution
use std::collections::HashMap;
trait Repository {
type Item;
type Id;
type Error;
fn get(&self, id: &Self::Id) -> Result<Option<&Self::Item>, Self::Error>;
fn insert(&mut self, item: Self::Item) -> Result<Self::Id, Self::Error>;
fn delete(&mut self, id: &Self::Id) -> Result<bool, Self::Error>;
}
#[derive(Debug, Clone)]
struct User {
name: String,
email: String,
}
struct InMemoryUserRepo {
data: HashMap<u64, User>,
next_id: u64,
}
impl InMemoryUserRepo {
fn new() -> Self {
InMemoryUserRepo { data: HashMap::new(), next_id: 1 }
}
}
impl Repository for InMemoryUserRepo {
type Item = User;
type Id = u64;
type Error = std::convert::Infallible;
fn get(&self, id: &u64) -> Result<Option<&User>, Self::Error> {
Ok(self.data.get(id))
}
fn insert(&mut self, item: User) -> Result<u64, Self::Error> {
let id = self.next_id;
self.next_id += 1;
self.data.insert(id, item);
Ok(id)
}
fn delete(&mut self, id: &u64) -> Result<bool, Self::Error> {
Ok(self.data.remove(id).is_some())
}
}
fn create_and_fetch<R: Repository>(repo: &mut R, item: R::Item) -> Result<(), R::Error>
where
R::Item: std::fmt::Debug,
R::Id: std::fmt::Debug,
{
let id = repo.insert(item)?;
println!("Inserted with id: {id:?}");
let retrieved = repo.get(&id)?;
println!("Retrieved: {retrieved:?}");
Ok(())
}
fn main() {
let mut repo = InMemoryUserRepo::new();
create_and_fetch(&mut repo, User {
name: "Alice".into(),
email: "alice@example.com".into(),
}).unwrap();
}
3. The Newtype and Type-State Patterns 🟡
What you’ll learn:
- The newtype pattern for zero-cost compile-time type safety
- Type-state pattern: making illegal state transitions unrepresentable
- Builder pattern with type states for compile-time–enforced construction
- Config trait pattern for taming generic parameter explosion
Newtype: Zero-Cost Type Safety
The newtype pattern wraps an existing type in a single-field tuple struct to create a distinct type with zero runtime overhead:
#![allow(unused)]
fn main() {
// Without newtypes — easy to mix up:
fn create_user(name: String, email: String, age: u32, employee_id: u32) { }
// create_user(name, email, age, id); — but what if we swap age and id?
// create_user(name, email, id, age); — COMPILES FINE, BUG
// With newtypes — the compiler catches mistakes:
struct UserName(String);
struct Email(String);
struct Age(u32);
struct EmployeeId(u32);
fn create_user(name: UserName, email: Email, age: Age, id: EmployeeId) { }
// create_user(name, email, EmployeeId(42), Age(30));
// ❌ Compile error: expected Age, got EmployeeId
}
impl Deref for Newtypes — Power and Pitfalls
Implementing Deref on a newtype lets it auto-coerce to the inner type’s
reference, giving you all of the inner type’s methods “for free”:
#![allow(unused)]
fn main() {
use std::ops::Deref;
struct Email(String);
impl Email {
fn new(raw: &str) -> Result<Self, &'static str> {
if raw.contains('@') {
Ok(Email(raw.to_string()))
} else {
Err("invalid email: missing @")
}
}
}
impl Deref for Email {
type Target = str;
fn deref(&self) -> &str { &self.0 }
}
// Now Email auto-derefs to &str:
let email = Email::new("user@example.com").unwrap();
println!("Length: {}", email.len()); // Uses str::len via Deref
}
This is convenient — but it effectively punches a hole through your newtype’s abstraction boundary because every method on the target type becomes callable on your wrapper.
When Deref IS appropriate
| Scenario | Example | Why it’s fine |
|---|---|---|
| Smart-pointer wrappers | Box<T>, Arc<T>, MutexGuard<T> | The wrapper’s whole purpose is to behave like T |
| Transparent “thin” wrappers | String → str, PathBuf → Path, Vec<T> → [T] | The wrapper IS-A superset of the target |
| Your newtype genuinely IS the inner type | struct Hostname(String) where you always want full string ops | Restricting the API would add no value |
When Deref is an anti-pattern
| Scenario | Problem |
|---|---|
| Domain types with invariants | Email derefs to &str, so callers can call .split_at(), .trim(), etc. — none of which preserve the “must contain @” invariant. If someone stores the trimmed &str and reconstructs, the invariant is lost. |
| Types where you want a restricted API | struct Password(String) with Deref<Target = str> leaks .as_bytes(), .chars(), Debug output — exactly what you’re trying to hide. |
| Fake inheritance | Using Deref to make ManagerWidget auto-deref to Widget simulates OOP inheritance. This is explicitly discouraged — see the Rust API Guidelines (C-DEREF). |
Rule of thumb: If your newtype exists to add type safety or restrict the API, don't implement Deref. If it exists to add capabilities while keeping the inner type's full surface (like a smart pointer), Deref is the right choice.
DerefMut — doubles the risk
If you also implement DerefMut, callers can mutate the inner value
directly, bypassing any validation in your constructors:
#![allow(unused)]
fn main() {
use std::ops::{Deref, DerefMut};
struct PortNumber(u16);
impl Deref for PortNumber {
type Target = u16;
fn deref(&self) -> &u16 { &self.0 }
}
impl DerefMut for PortNumber {
fn deref_mut(&mut self) -> &mut u16 { &mut self.0 }
}
let mut port = PortNumber(443);
*port = 0; // Bypasses any validation — now an invalid port
}
Only implement DerefMut when the inner type has no invariants to protect.
Prefer explicit delegation instead
When you want only some of the inner type’s methods, delegate explicitly:
#![allow(unused)]
fn main() {
struct Email(String);
impl Email {
fn new(raw: &str) -> Result<Self, &'static str> {
if raw.contains('@') { Ok(Email(raw.to_string())) }
else { Err("missing @") }
}
// Expose only what makes sense:
pub fn as_str(&self) -> &str { &self.0 }
pub fn len(&self) -> usize { self.0.len() }
pub fn domain(&self) -> &str {
self.0.split('@').nth(1).unwrap_or("")
}
// .split_at(), .trim(), .replace() — NOT exposed
}
}
Clippy and the ecosystem
- clippy::wrong_self_convention can fire when Deref coercion makes method resolution surprising (e.g., is_empty() resolving to the inner type's version instead of one you intended to shadow).
- The Rust API Guidelines (C-DEREF) state: "only smart pointers should implement Deref." Treat this as a strong default; deviate only with clear justification.
- If you need trait compatibility (e.g., passing Email to functions expecting &str), consider implementing AsRef<str> and Borrow<str> instead — they're explicit conversions without auto-coercion surprises.
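A brief sketch of that last point: the same Email newtype with AsRef<str> instead of Deref, so the conversion stays explicit at the call site and no str methods leak onto the wrapper:

```rust
struct Email(String);

impl Email {
    fn new(raw: &str) -> Result<Self, &'static str> {
        if raw.contains('@') { Ok(Email(raw.to_string())) } else { Err("missing @") }
    }
}

// Explicit, opt-in conversion — no auto-coercion of method calls.
impl AsRef<str> for Email {
    fn as_ref(&self) -> &str { &self.0 }
}

// audit_log is a hypothetical consumer: any function taking `impl AsRef<str>`
// now accepts Email, &str, and String alike.
fn audit_log(entry: impl AsRef<str>) -> usize {
    entry.as_ref().len()
}

fn main() {
    let email = Email::new("user@example.com").unwrap();
    assert_eq!(audit_log(&email), 16); // caller opts in via AsRef
    // email.len();        // ❌ still a compile error — no Deref, no leaked API
    println!("ok");
}
```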
Decision matrix
Do you want ALL methods of the inner type to be callable?
├─ YES → Does your type enforce invariants or restrict the API?
│ ├─ NO → impl Deref ✅ (smart-pointer / transparent wrapper)
│ └─ YES → Don't impl Deref ❌ (invariant leaks)
└─ NO → Don't impl Deref ❌ (use AsRef / explicit delegation)
Type-State: Compile-Time Protocol Enforcement
The type-state pattern uses the type system to enforce that operations happen in the correct order. Invalid states become unrepresentable.
stateDiagram-v2
[*] --> Disconnected: new()
Disconnected --> Connected: connect()
Connected --> Authenticated: authenticate()
Authenticated --> Authenticated: request()
Authenticated --> [*]: drop
Disconnected --> Disconnected: ❌ request() won't compile
Connected --> Connected: ❌ request() won't compile
Each transition consumes self and returns a new type — the compiler enforces valid ordering.
// Problem: A network connection that must be:
// 1. Created
// 2. Connected
// 3. Authenticated
// 4. Then used for requests
// Calling request() before authenticate() should be a COMPILE error.
// --- Type-state markers (zero-sized types) ---
struct Disconnected;
struct Connected;
struct Authenticated;
// --- Connection parameterized by state ---
struct Connection<State> {
address: String,
_state: std::marker::PhantomData<State>,
}
// Only Disconnected connections can connect:
impl Connection<Disconnected> {
fn new(address: &str) -> Self {
Connection {
address: address.to_string(),
_state: std::marker::PhantomData,
}
}
fn connect(self) -> Connection<Connected> {
println!("Connecting to {}...", self.address);
Connection {
address: self.address,
_state: std::marker::PhantomData,
}
}
}
// Only Connected connections can authenticate:
impl Connection<Connected> {
fn authenticate(self, _token: &str) -> Connection<Authenticated> {
println!("Authenticating...");
Connection {
address: self.address,
_state: std::marker::PhantomData,
}
}
}
// Only Authenticated connections can make requests:
impl Connection<Authenticated> {
fn request(&self, path: &str) -> String {
format!("GET {} from {}", path, self.address)
}
}
fn main() {
let conn = Connection::new("api.example.com");
// conn.request("/data"); // ❌ Compile error: no method `request` on Connection<Disconnected>
let conn = conn.connect();
// conn.request("/data"); // ❌ Compile error: no method `request` on Connection<Connected>
let conn = conn.authenticate("secret-token");
let response = conn.request("/data"); // ✅ Only works after authentication
println!("{response}");
}
Key insight: Each state transition consumes self and returns a new type. You can't use the old state after transitioning — the compiler enforces it. Zero runtime cost — PhantomData is zero-sized, states are erased at compile time.
Comparison with C++/C#: In C++ or C#, you’d enforce this with runtime checks (if (!authenticated) throw ...). The Rust type-state pattern moves these checks to compile time — invalid states are literally unrepresentable in the type system.
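For contrast, here is a sketch of the runtime-checked design that type-state replaces — the C++/C# shape transliterated to Rust. Every call site pays a branch, and the error only surfaces when the bad path actually runs:

```rust
// Runtime-checked connection — the "before" picture (illustrative sketch).
struct RuntimeConnection {
    address: String,
    connected: bool,
    authenticated: bool,
}

impl RuntimeConnection {
    fn new(address: &str) -> Self {
        RuntimeConnection { address: address.to_string(), connected: false, authenticated: false }
    }
    fn connect(&mut self) { self.connected = true; }
    fn authenticate(&mut self, _token: &str) {
        assert!(self.connected, "must connect first"); // runtime check, panics if violated
        self.authenticated = true;
    }
    fn request(&self, path: &str) -> Result<String, &'static str> {
        if !self.authenticated {
            return Err("not authenticated"); // caught at runtime, not compile time
        }
        Ok(format!("GET {} from {}", path, self.address))
    }
}

fn main() {
    let mut conn = RuntimeConnection::new("api.example.com");
    // This COMPILES — the ordering bug ships and fails at runtime:
    assert_eq!(conn.request("/data"), Err("not authenticated"));
    conn.connect();
    conn.authenticate("secret-token");
    assert!(conn.request("/data").is_ok());
    println!("ok");
}
```

The type-state version deletes both the boolean fields and every `if !authenticated` branch; the invalid call simply does not type-check.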
Builder Pattern with Type States
A practical application — a builder that enforces required fields:
use std::marker::PhantomData;
// Marker types for required fields
struct NeedsName;
struct NeedsPort;
struct Ready;
struct ServerConfig<State> {
name: Option<String>,
port: Option<u16>,
max_connections: usize, // Optional, has default
_state: PhantomData<State>,
}
impl ServerConfig<NeedsName> {
fn new() -> Self {
ServerConfig {
name: None,
port: None,
max_connections: 100,
_state: PhantomData,
}
}
fn name(self, name: &str) -> ServerConfig<NeedsPort> {
ServerConfig {
name: Some(name.to_string()),
port: self.port,
max_connections: self.max_connections,
_state: PhantomData,
}
}
}
impl ServerConfig<NeedsPort> {
fn port(self, port: u16) -> ServerConfig<Ready> {
ServerConfig {
name: self.name,
port: Some(port),
max_connections: self.max_connections,
_state: PhantomData,
}
}
}
impl ServerConfig<Ready> {
fn max_connections(mut self, n: usize) -> Self {
self.max_connections = n;
self
}
fn build(self) -> Server {
Server {
name: self.name.unwrap(),
port: self.port.unwrap(),
max_connections: self.max_connections,
}
}
}
struct Server {
name: String,
port: u16,
max_connections: usize,
}
fn main() {
// Must provide name, then port, then can build:
let server = ServerConfig::new()
.name("my-server")
.port(8080)
.max_connections(500)
.build();
// ServerConfig::new().port(8080); // ❌ Compile error: no method `port` on NeedsName
// ServerConfig::new().name("x").build(); // ❌ Compile error: no method `build` on NeedsPort
}
Case Study: Type-Safe Connection Pool
Real-world systems need connection pools where connections move through well-defined states. Here’s how the typestate pattern enforces correctness in a production pool:
stateDiagram-v2
[*] --> Idle: pool.acquire()
Idle --> Active: conn.begin_transaction()
Active --> Active: conn.execute(query)
Active --> Idle: conn.commit() / conn.rollback()
Idle --> [*]: pool.release(conn)
Active --> [*]: ❌ cannot release mid-transaction
use std::marker::PhantomData;
// States
struct Idle;
struct InTransaction;
struct PooledConnection<State> {
id: u32,
_state: PhantomData<State>,
}
struct Pool {
next_id: u32,
}
impl Pool {
fn new() -> Self { Pool { next_id: 0 } }
fn acquire(&mut self) -> PooledConnection<Idle> {
self.next_id += 1;
println!("[pool] Acquired connection #{}", self.next_id);
PooledConnection { id: self.next_id, _state: PhantomData }
}
// Only idle connections can be released — prevents mid-transaction leaks
fn release(&self, conn: PooledConnection<Idle>) {
println!("[pool] Released connection #{}", conn.id);
}
}
impl PooledConnection<Idle> {
fn begin_transaction(self) -> PooledConnection<InTransaction> {
println!("[conn #{}] BEGIN", self.id);
PooledConnection { id: self.id, _state: PhantomData }
}
}
impl PooledConnection<InTransaction> {
fn execute(&self, query: &str) {
println!("[conn #{}] EXEC: {}", self.id, query);
}
fn commit(self) -> PooledConnection<Idle> {
println!("[conn #{}] COMMIT", self.id);
PooledConnection { id: self.id, _state: PhantomData }
}
fn rollback(self) -> PooledConnection<Idle> {
println!("[conn #{}] ROLLBACK", self.id);
PooledConnection { id: self.id, _state: PhantomData }
}
}
fn main() {
let mut pool = Pool::new();
let conn = pool.acquire();
let conn = conn.begin_transaction();
conn.execute("INSERT INTO users VALUES ('Alice')");
conn.execute("INSERT INTO orders VALUES (1, 42)");
let conn = conn.commit(); // Back to Idle
pool.release(conn); // ✅ Only works on Idle connections
// pool.release(conn_active); // ❌ Compile error: can't release InTransaction
}
Why this matters in production: A connection leaked mid-transaction holds database locks indefinitely. The typestate pattern makes this impossible — you literally cannot return a connection to the pool until the transaction is committed or rolled back.
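One ergonomic refinement worth sketching (not part of the pool code above; with_transaction is a hypothetical helper): bracket a closure between begin_transaction and commit, so callers can't forget the transition back to Idle. This minimal version omits rollback; a real one would return a Result and roll back on error.

```rust
use std::marker::PhantomData;

struct Idle;
struct InTransaction;

struct PooledConnection<State> {
    id: u32,
    _state: PhantomData<State>,
}

impl PooledConnection<Idle> {
    fn begin_transaction(self) -> PooledConnection<InTransaction> {
        PooledConnection { id: self.id, _state: PhantomData }
    }
}

impl PooledConnection<InTransaction> {
    fn execute(&self, query: &str) { println!("[conn #{}] EXEC: {}", self.id, query); }
    fn commit(self) -> PooledConnection<Idle> {
        PooledConnection { id: self.id, _state: PhantomData }
    }
}

/// Run `body` inside a transaction; the connection always comes back Idle.
fn with_transaction<F>(conn: PooledConnection<Idle>, body: F) -> PooledConnection<Idle>
where
    F: FnOnce(&PooledConnection<InTransaction>),
{
    let tx = conn.begin_transaction();
    body(&tx);
    tx.commit() // the only way out is through commit: the types enforce it
}

fn main() {
    let conn = PooledConnection::<Idle> { id: 1, _state: PhantomData };
    let conn = with_transaction(conn, |tx| {
        tx.execute("INSERT INTO users VALUES ('Alice')");
    });
    assert_eq!(conn.id, 1); // back in Idle — releasable
    println!("ok");
}
```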
Config Trait Pattern — Taming Generic Parameter Explosion
The Problem
As a struct takes on more responsibilities, each backed by a trait-constrained generic, the type signature grows unwieldy:
#![allow(unused)]
fn main() {
trait SpiBus { fn spi_transfer(&self, tx: &[u8], rx: &mut [u8]) -> Result<(), BusError>; }
trait ComPort { fn com_send(&self, data: &[u8]) -> Result<usize, BusError>; }
trait I3cBus { fn i3c_read(&self, addr: u8, buf: &mut [u8]) -> Result<(), BusError>; }
trait SmBus { fn smbus_read_byte(&self, addr: u8, cmd: u8) -> Result<u8, BusError>; }
trait GpioBus { fn gpio_set(&self, pin: u32, high: bool); }
// ❌ Every new bus trait adds another generic parameter
struct DiagController<S: SpiBus, C: ComPort, I: I3cBus, M: SmBus, G: GpioBus> {
spi: S,
com: C,
i3c: I,
smbus: M,
gpio: G,
}
// impl blocks, function signatures, and callers all repeat the full list.
// Adding a 6th bus means editing every mention of DiagController<S, C, I, M, G>.
}
This is often called “generic parameter explosion.” It compounds across impl blocks,
function parameters, and downstream consumers — each of which must repeat the full
parameter list.
The Solution: A Config Trait
Bundle all associated types into a single trait. The struct then has one generic parameter regardless of how many component types it contains:
#![allow(unused)]
fn main() {
#[derive(Debug)]
enum BusError {
Timeout,
NakReceived,
HardwareFault(String),
}
// --- Bus traits (unchanged) ---
trait SpiBus {
fn spi_transfer(&self, tx: &[u8], rx: &mut [u8]) -> Result<(), BusError>;
fn spi_write(&self, data: &[u8]) -> Result<(), BusError>;
}
trait ComPort {
fn com_send(&self, data: &[u8]) -> Result<usize, BusError>;
fn com_recv(&self, buf: &mut [u8], timeout_ms: u32) -> Result<usize, BusError>;
}
trait I3cBus {
fn i3c_read(&self, addr: u8, buf: &mut [u8]) -> Result<(), BusError>;
fn i3c_write(&self, addr: u8, data: &[u8]) -> Result<(), BusError>;
}
// --- The Config trait: one associated type per component ---
trait BoardConfig {
type Spi: SpiBus;
type Com: ComPort;
type I3c: I3cBus;
}
// --- DiagController has exactly ONE generic parameter ---
struct DiagController<Cfg: BoardConfig> {
spi: Cfg::Spi,
com: Cfg::Com,
i3c: Cfg::I3c,
}
}
DiagController<Cfg> will never gain another generic parameter.
Adding a 4th bus means adding one associated type to BoardConfig and one field
to DiagController — no downstream signature changes.
Implementing the Controller
#![allow(unused)]
fn main() {
impl<Cfg: BoardConfig> DiagController<Cfg> {
fn new(spi: Cfg::Spi, com: Cfg::Com, i3c: Cfg::I3c) -> Self {
DiagController { spi, com, i3c }
}
fn read_flash_id(&self) -> Result<u32, BusError> {
let cmd = [0x9F]; // JEDEC Read ID
let mut id = [0u8; 4];
self.spi.spi_transfer(&cmd, &mut id)?;
Ok(u32::from_be_bytes(id))
}
fn send_bmc_command(&self, cmd: &[u8]) -> Result<Vec<u8>, BusError> {
self.com.com_send(cmd)?;
let mut resp = vec![0u8; 256];
let n = self.com.com_recv(&mut resp, 1000)?;
resp.truncate(n);
Ok(resp)
}
fn read_sensor_temp(&self, sensor_addr: u8) -> Result<i16, BusError> {
let mut buf = [0u8; 2];
self.i3c.i3c_read(sensor_addr, &mut buf)?;
Ok(i16::from_be_bytes(buf))
}
fn run_full_diag(&self) -> Result<DiagReport, BusError> {
let flash_id = self.read_flash_id()?;
let bmc_resp = self.send_bmc_command(b"VERSION\n")?;
let cpu_temp = self.read_sensor_temp(0x48)?;
let gpu_temp = self.read_sensor_temp(0x49)?;
Ok(DiagReport {
flash_id,
bmc_version: String::from_utf8_lossy(&bmc_resp).to_string(),
cpu_temp_c: cpu_temp,
gpu_temp_c: gpu_temp,
})
}
}
#[derive(Debug)]
struct DiagReport {
flash_id: u32,
bmc_version: String,
cpu_temp_c: i16,
gpu_temp_c: i16,
}
}
Production Wiring
One impl BoardConfig selects the concrete hardware drivers:
struct PlatformSpi { dev: String, speed_hz: u32 }
struct UartCom { dev: String, baud: u32 }
struct LinuxI3c { dev: String }
impl SpiBus for PlatformSpi {
fn spi_transfer(&self, tx: &[u8], rx: &mut [u8]) -> Result<(), BusError> {
// ioctl(SPI_IOC_MESSAGE) in production
rx[0..4].copy_from_slice(&[0xEF, 0x40, 0x18, 0x00]);
Ok(())
}
fn spi_write(&self, _data: &[u8]) -> Result<(), BusError> { Ok(()) }
}
impl ComPort for UartCom {
fn com_send(&self, _data: &[u8]) -> Result<usize, BusError> { Ok(0) }
fn com_recv(&self, buf: &mut [u8], _timeout: u32) -> Result<usize, BusError> {
let resp = b"BMC v2.4.1\n";
buf[..resp.len()].copy_from_slice(resp);
Ok(resp.len())
}
}
impl I3cBus for LinuxI3c {
fn i3c_read(&self, _addr: u8, buf: &mut [u8]) -> Result<(), BusError> {
buf[0] = 0x00; buf[1] = 0x2D; // 45°C
Ok(())
}
fn i3c_write(&self, _addr: u8, _data: &[u8]) -> Result<(), BusError> { Ok(()) }
}
// ✅ One struct, one impl — all concrete types resolved here
struct ProductionBoard;
impl BoardConfig for ProductionBoard {
type Spi = PlatformSpi;
type Com = UartCom;
type I3c = LinuxI3c;
}
fn main() {
let ctrl = DiagController::<ProductionBoard>::new(
PlatformSpi { dev: "/dev/spidev0.0".into(), speed_hz: 10_000_000 },
UartCom { dev: "/dev/ttyS0".into(), baud: 115200 },
LinuxI3c { dev: "/dev/i3c-0".into() },
);
let report = ctrl.run_full_diag().unwrap();
println!("{report:#?}");
}
Test Wiring with Mocks
Swap the entire hardware layer by defining a different BoardConfig:
#![allow(unused)]
fn main() {
struct MockSpi { flash_id: [u8; 4] }
struct MockCom { response: Vec<u8> }
struct MockI3c { temps: std::collections::HashMap<u8, i16> }
impl SpiBus for MockSpi {
fn spi_transfer(&self, _tx: &[u8], rx: &mut [u8]) -> Result<(), BusError> {
rx[..4].copy_from_slice(&self.flash_id);
Ok(())
}
fn spi_write(&self, _data: &[u8]) -> Result<(), BusError> { Ok(()) }
}
impl ComPort for MockCom {
fn com_send(&self, _data: &[u8]) -> Result<usize, BusError> { Ok(0) }
fn com_recv(&self, buf: &mut [u8], _timeout: u32) -> Result<usize, BusError> {
let n = self.response.len().min(buf.len());
buf[..n].copy_from_slice(&self.response[..n]);
Ok(n)
}
}
impl I3cBus for MockI3c {
fn i3c_read(&self, addr: u8, buf: &mut [u8]) -> Result<(), BusError> {
let temp = self.temps.get(&addr).copied().unwrap_or(0);
buf[..2].copy_from_slice(&temp.to_be_bytes());
Ok(())
}
fn i3c_write(&self, _addr: u8, _data: &[u8]) -> Result<(), BusError> { Ok(()) }
}
struct TestBoard;
impl BoardConfig for TestBoard {
type Spi = MockSpi;
type Com = MockCom;
type I3c = MockI3c;
}
#[cfg(test)]
mod tests {
use super::*;
fn make_test_controller() -> DiagController<TestBoard> {
let mut temps = std::collections::HashMap::new();
temps.insert(0x48, 45i16);
temps.insert(0x49, 72i16);
DiagController::<TestBoard>::new(
MockSpi { flash_id: [0xEF, 0x40, 0x18, 0x00] },
MockCom { response: b"BMC v2.4.1\n".to_vec() },
MockI3c { temps },
)
}
#[test]
fn test_flash_id() {
let ctrl = make_test_controller();
assert_eq!(ctrl.read_flash_id().unwrap(), 0xEF401800);
}
#[test]
fn test_sensor_temps() {
let ctrl = make_test_controller();
assert_eq!(ctrl.read_sensor_temp(0x48).unwrap(), 45);
assert_eq!(ctrl.read_sensor_temp(0x49).unwrap(), 72);
}
#[test]
fn test_full_diag() {
let ctrl = make_test_controller();
let report = ctrl.run_full_diag().unwrap();
assert_eq!(report.flash_id, 0xEF401800);
assert_eq!(report.cpu_temp_c, 45);
assert_eq!(report.gpu_temp_c, 72);
assert!(report.bmc_version.contains("2.4.1"));
}
}
}
Adding a New Bus Later
When you need a 4th bus, only two things change — BoardConfig and DiagController.
No downstream signature changes. The generic parameter count stays at one:
#![allow(unused)]
fn main() {
trait SmBus {
fn smbus_read_byte(&self, addr: u8, cmd: u8) -> Result<u8, BusError>;
}
// 1. Add one associated type:
trait BoardConfig {
type Spi: SpiBus;
type Com: ComPort;
type I3c: I3cBus;
type Smb: SmBus; // ← new
}
// 2. Add one field:
struct DiagController<Cfg: BoardConfig> {
spi: Cfg::Spi,
com: Cfg::Com,
i3c: Cfg::I3c,
smb: Cfg::Smb, // ← new
}
// 3. Provide the concrete type in each config impl:
impl BoardConfig for ProductionBoard {
type Spi = PlatformSpi;
type Com = UartCom;
type I3c = LinuxI3c;
type Smb = LinuxSmbus; // ← new
}
}
When to Use This Pattern
| Situation | Use Config Trait? | Alternative |
|---|---|---|
| 3+ trait-constrained generics on a struct | ✅ Yes | — |
| Need to swap entire hardware/platform layer | ✅ Yes | — |
| Only 1-2 generics | ❌ Overkill | Direct generics |
| Need runtime polymorphism | ❌ | dyn Trait objects |
| Open-ended plugin system | ❌ | Type-map / Any |
| Component traits form a natural group (board, platform) | ✅ Yes | — |
Key Properties
- One generic parameter forever — DiagController<Cfg> never gains more <A, B, C, ...>
- Fully static dispatch — no vtables, no dyn, no heap allocation for trait objects
- Clean test swapping — define TestBoard with mock impls, zero conditional compilation
- Compile-time safety — forget an associated type → compile error, not runtime crash
- Battle-tested — this is the pattern used by Substrate/Polkadot's FRAME system to manage 20+ associated types through a single Config trait
Key Takeaways — Newtype & Type-State
- Newtypes give compile-time type safety at zero runtime cost
- Type-state makes illegal state transitions a compile error, not a runtime bug
- Config traits tame generic parameter explosion in large systems
See also: Ch 4 — PhantomData for the zero-sized markers that power type-state. Ch 2 — Traits In Depth for associated types used in the config trait pattern.
Case Study: Dual-Axis Typestate — Vendor × Protocol State
The patterns above handle one axis at a time: typestate enforces protocol order,
and trait abstraction handles multiple vendors. Real systems often need both
simultaneously: a wrapper Handle<Vendor, State> where available methods depend
on which vendor is plugged in and which state the handle is in.
This section shows the dual-axis conditional impl pattern — where impl
blocks are gated on both a vendor trait bound and a state marker trait.
The Two-Dimensional Problem
Consider a debug probe interface (JTAG/SWD). Multiple vendors make probes, and every probe must be unlocked before registers become accessible. Some vendors additionally support direct memory reads — but only after an extended unlock that configures the memory access port:
graph LR
subgraph "All vendors"
L["🔒 Locked"] -- "unlock()" --> U["🔓 Unlocked"]
end
subgraph "Memory-capable vendors only"
U -- "extended_unlock()" --> E["🔓🧠 ExtendedUnlocked"]
end
U -. "read_reg() / write_reg()" .-> U
E -. "read_reg() / write_reg()" .-> E
E -. "read_memory() / write_memory()" .-> E
style L fill:#fee,stroke:#c33
style U fill:#efe,stroke:#3a3
style E fill:#eef,stroke:#33c
The capability matrix — which methods exist for which (vendor, state) combination — is two-dimensional:
block-beta
columns 4
space header1["Locked"] header2["Unlocked"] header3["ExtendedUnlocked"]
basic["Basic Vendor"]:1 b1["unlock()"] b2["read_reg()\nwrite_reg()"] b3["— unreachable —"]
memory["Memory Vendor"]:1 m1["unlock()"] m2["read_reg()\nwrite_reg()\nextended_unlock()"] m3["read_reg()\nwrite_reg()\nread_memory()\nwrite_memory()"]
style b1 fill:#ffd,stroke:#aa0
style b2 fill:#efe,stroke:#3a3
style b3 fill:#eee,stroke:#999,stroke-dasharray: 5 5
style m1 fill:#ffd,stroke:#aa0
style m2 fill:#efe,stroke:#3a3
style m3 fill:#eef,stroke:#33c
The challenge: express this matrix entirely at compile time, with static
dispatch, so that calling extended_unlock() on a basic probe or
read_memory() on an unlocked-but-not-extended handle is a compile error.
The Solution: Jtag<V, S> with Marker Traits
Step 1 — State tokens and capability markers:
use std::marker::PhantomData;
// Zero-sized state tokens — no runtime cost
struct Locked;
struct Unlocked;
struct ExtendedUnlocked;
// Marker traits express which capabilities each state has
trait HasRegAccess {}
impl HasRegAccess for Unlocked {}
impl HasRegAccess for ExtendedUnlocked {}
trait HasMemAccess {}
impl HasMemAccess for ExtendedUnlocked {}
Why marker traits, not just concrete states? Writing impl<V, S: HasRegAccess> Jtag<V, S> means read_reg() works in any state with register access — today that's Unlocked and ExtendedUnlocked, but if you add DebugHalted tomorrow, you just add one line: impl HasRegAccess for DebugHalted {}. Every register function then works with it automatically — no other code changes.
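A compact, self-contained sketch of that extensibility claim — the vendor axis is dropped for brevity, and DebugHalted is hypothetical:

```rust
use std::marker::PhantomData;

// Zero-sized state tokens
struct Unlocked;
struct ExtendedUnlocked;
struct DebugHalted; // hypothetical new state added later

// Capability marker: which states allow register access
trait HasRegAccess {}
impl HasRegAccess for Unlocked {}
impl HasRegAccess for ExtendedUnlocked {}
impl HasRegAccess for DebugHalted {} // ← the one new line

struct Jtag<S> {
    reg: u32,
    _state: PhantomData<S>,
}

// Written once — covers all three states, including the one added later.
impl<S: HasRegAccess> Jtag<S> {
    fn read_reg(&self) -> u32 { self.reg }
}

fn main() {
    let halted = Jtag::<DebugHalted> { reg: 0xDEAD, _state: PhantomData };
    assert_eq!(halted.read_reg(), 0xDEAD); // works with zero changes to read_reg
    println!("ok");
}
```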
Step 2 — Vendor traits (raw operations):
// Every probe vendor implements these
trait JtagVendor {
fn raw_unlock(&mut self);
fn raw_read_reg(&self, addr: u32) -> u32;
fn raw_write_reg(&mut self, addr: u32, val: u32);
}
// Vendors with memory access also implement this super-trait
trait JtagMemoryVendor: JtagVendor {
fn raw_extended_unlock(&mut self);
fn raw_read_memory(&self, addr: u64, buf: &mut [u8]);
fn raw_write_memory(&mut self, addr: u64, data: &[u8]);
}
Step 3 — The wrapper with conditional impl blocks:
struct Jtag<V, S = Locked> {
vendor: V,
_state: PhantomData<S>,
}
// Construction — always starts Locked
impl<V: JtagVendor> Jtag<V, Locked> {
fn new(vendor: V) -> Self {
Jtag { vendor, _state: PhantomData }
}
fn unlock(mut self) -> Jtag<V, Unlocked> {
self.vendor.raw_unlock();
Jtag { vendor: self.vendor, _state: PhantomData }
}
}
// Register I/O — any vendor, any state with HasRegAccess
impl<V: JtagVendor, S: HasRegAccess> Jtag<V, S> {
fn read_reg(&self, addr: u32) -> u32 {
self.vendor.raw_read_reg(addr)
}
fn write_reg(&mut self, addr: u32, val: u32) {
self.vendor.raw_write_reg(addr, val);
}
}
// Extended unlock — only memory-capable vendors, only from Unlocked
impl<V: JtagMemoryVendor> Jtag<V, Unlocked> {
fn extended_unlock(mut self) -> Jtag<V, ExtendedUnlocked> {
self.vendor.raw_extended_unlock();
Jtag { vendor: self.vendor, _state: PhantomData }
}
}
// Memory I/O — only memory-capable vendors, only ExtendedUnlocked
impl<V: JtagMemoryVendor, S: HasMemAccess> Jtag<V, S> {
fn read_memory(&self, addr: u64, buf: &mut [u8]) {
self.vendor.raw_read_memory(addr, buf);
}
fn write_memory(&mut self, addr: u64, data: &[u8]) {
self.vendor.raw_write_memory(addr, data);
}
}
Each impl block encodes one cell (or row) of the capability matrix.
The compiler enforces the matrix — no runtime checks anywhere.
Vendor Implementations
Adding a vendor means implementing raw methods on one struct — no per-state struct duplication, no delegation boilerplate:
// Vendor A: basic probe — register access only
struct BasicProbe { port: u16 }
impl JtagVendor for BasicProbe {
fn raw_unlock(&mut self) { /* TAP reset sequence */ }
fn raw_read_reg(&self, addr: u32) -> u32 { /* DR scan */ 0 }
fn raw_write_reg(&mut self, addr: u32, val: u32) { /* DR scan */ }
}
// BasicProbe does NOT impl JtagMemoryVendor.
// extended_unlock() will not compile on Jtag<BasicProbe, _>.
// Vendor B: full-featured probe — registers + memory
struct DapProbe { serial: String }
impl JtagVendor for DapProbe {
fn raw_unlock(&mut self) { /* SWD switch, read DPIDR */ }
fn raw_read_reg(&self, addr: u32) -> u32 { /* AP register read */ 0 }
fn raw_write_reg(&mut self, addr: u32, val: u32) { /* AP register write */ }
}
impl JtagMemoryVendor for DapProbe {
fn raw_extended_unlock(&mut self) { /* select MEM-AP, power up */ }
fn raw_read_memory(&self, addr: u64, buf: &mut [u8]) { /* MEM-AP read */ }
fn raw_write_memory(&mut self, addr: u64, data: &[u8]) { /* MEM-AP write */ }
}
What the Compiler Prevents
| Attempt | Error | Why |
|---|---|---|
| `Jtag<_, Locked>::read_reg()` | no method `read_reg` | `Locked` doesn’t impl `HasRegAccess` |
| `Jtag<BasicProbe, _>::extended_unlock()` | no method `extended_unlock` | `BasicProbe` doesn’t impl `JtagMemoryVendor` |
| `Jtag<_, Unlocked>::read_memory()` | no method `read_memory` | `Unlocked` doesn’t impl `HasMemAccess` |
| Calling `unlock()` twice | value used after move | `unlock()` consumes `self` |
All four errors are caught at compile time. No panics, no Option, no runtime state enum.
Writing Generic Functions
Functions bind only the axes they care about:
/// Works with ANY vendor, ANY state that grants register access.
fn read_idcode<V: JtagVendor, S: HasRegAccess>(jtag: &Jtag<V, S>) -> u32 {
jtag.read_reg(0x00)
}
/// Only compiles for memory-capable vendors in ExtendedUnlocked state.
fn dump_firmware<V: JtagMemoryVendor, S: HasMemAccess>(jtag: &Jtag<V, S>) {
let mut buf = [0u8; 256];
jtag.read_memory(0x0800_0000, &mut buf);
}
read_idcode doesn’t care whether you’re in Unlocked or ExtendedUnlocked —
it only requires HasRegAccess. This is where marker traits pay off over
hardcoding specific states in signatures.
Same Pattern, Different Domain: Storage Backends
The dual-axis technique isn’t hardware-specific. Here’s the same structure for a storage layer where some backends support transactions:
// States
struct Closed;
struct Open;
struct InTransaction;
trait HasReadWrite {}
impl HasReadWrite for Open {}
impl HasReadWrite for InTransaction {}
// Vendor traits
trait StorageBackend {
fn raw_open(&mut self);
fn raw_read(&self, key: &[u8]) -> Option<Vec<u8>>;
fn raw_write(&mut self, key: &[u8], value: &[u8]);
}
trait TransactionalBackend: StorageBackend {
fn raw_begin(&mut self);
fn raw_commit(&mut self);
fn raw_rollback(&mut self);
}
// Wrapper
struct Store<B, S = Closed> { backend: B, _s: PhantomData<S> }
impl<B: StorageBackend> Store<B, Closed> {
    fn open(mut self) -> Store<B, Open> {
        self.backend.raw_open();
        Store { backend: self.backend, _s: PhantomData }
    }
}
impl<B: StorageBackend, S: HasReadWrite> Store<B, S> {
fn read(&self, key: &[u8]) -> Option<Vec<u8>> { self.backend.raw_read(key) }
fn write(&mut self, key: &[u8], val: &[u8]) { self.backend.raw_write(key, val) }
}
impl<B: TransactionalBackend> Store<B, Open> {
    fn begin(mut self) -> Store<B, InTransaction> {
        self.backend.raw_begin();
        Store { backend: self.backend, _s: PhantomData }
    }
}
impl<B: TransactionalBackend> Store<B, InTransaction> {
    fn commit(mut self) -> Store<B, Open> {
        self.backend.raw_commit();
        Store { backend: self.backend, _s: PhantomData }
    }
    fn rollback(mut self) -> Store<B, Open> {
        self.backend.raw_rollback();
        Store { backend: self.backend, _s: PhantomData }
    }
}
A flat-file backend implements StorageBackend only — begin() won’t
compile. A database backend adds TransactionalBackend — the full
Open → InTransaction → Open cycle becomes available.
When to Reach for This Pattern
| Signal | Why dual-axis fits |
|---|---|
| Two independent axes: “who provides it” and “what state is it in” | The impl block matrix directly encodes both |
| Some providers have strictly more capabilities than others | Super-trait (MemoryVendor: Vendor) + conditional impl |
| Misusing state or capability is a safety/correctness bug | Compile-time prevention > runtime checks |
| You want static dispatch (no vtables) | PhantomData + generics = zero-cost |
| Signal | Consider something simpler |
|---|---|
| Only one axis varies (state OR vendor, not both) | Single-axis typestate or plain trait objects |
| Three or more independent axes | Config Trait Pattern (above) bundles axes into associated types |
| Runtime polymorphism is acceptable | enum state + dyn dispatch is simpler |
When two axes become three or more: If you find yourself writing `Handle<V, S, D, T>` — vendor, state, debug level, transport — the generic parameter list is telling you something. Consider collapsing the vendor axis into an associated-type config trait (the Config Trait Pattern from earlier in this chapter), keeping only the state axis as a generic parameter: `Handle<Cfg, S>`. The config trait bundles `type Vendor`, `type Transport`, etc. into one parameter, and the state axis retains its compile-time transition guarantees. This is a natural evolution, not a rewrite — you lift vendor-related types into `Cfg` and leave the typestate machinery untouched.
Key Takeaway: The dual-axis pattern is the intersection of typestate and trait-based abstraction. Each `impl` block maps to one cell of the (vendor × state) matrix. The compiler enforces the entire matrix — no runtime state checks, no impossible-state panics, no cost.
Exercise: Type-Safe State Machine ★★ (~30 min)
Build a traffic light state machine using the type-state pattern. The light must transition Red → Green → Yellow → Red and no other order should be possible.
🔑 Solution
use std::marker::PhantomData;
struct Red;
struct Green;
struct Yellow;
struct TrafficLight<State> {
_state: PhantomData<State>,
}
impl TrafficLight<Red> {
fn new() -> Self {
println!("🔴 Red — STOP");
TrafficLight { _state: PhantomData }
}
fn go(self) -> TrafficLight<Green> {
println!("🟢 Green — GO");
TrafficLight { _state: PhantomData }
}
}
impl TrafficLight<Green> {
fn caution(self) -> TrafficLight<Yellow> {
println!("🟡 Yellow — CAUTION");
TrafficLight { _state: PhantomData }
}
}
impl TrafficLight<Yellow> {
fn stop(self) -> TrafficLight<Red> {
println!("🔴 Red — STOP");
TrafficLight { _state: PhantomData }
}
}
fn main() {
let light = TrafficLight::new(); // Red
let light = light.go(); // Green
let light = light.caution(); // Yellow
let _light = light.stop(); // Red
    // _light.caution(); // ❌ Compile error: no method `caution` on TrafficLight<Red>
// TrafficLight::new().stop(); // ❌ Compile error: no method `stop` on Red
}
Key takeaway: Invalid transitions are compile errors, not runtime panics.
4. PhantomData — Types That Carry No Data 🔴
What you’ll learn:
- Why `PhantomData<T>` exists and the three problems it solves
- Lifetime branding for compile-time scope enforcement
- The unit-of-measure pattern for dimension-safe arithmetic
- Variance (covariant, contravariant, invariant) and how PhantomData controls it
What PhantomData Solves
PhantomData<T> is a zero-sized type that tells the compiler “this struct is logically associated with T, even though it doesn’t contain a T.” It affects variance, drop checking, and auto-trait inference — without using any memory.
#![allow(unused)]
fn main() {
use std::marker::PhantomData;
// Without PhantomData, this version does NOT compile — the compiler
// rejects the unused lifetime outright:
//
//     struct Slice<'a, T> {       // error[E0392]: parameter `'a`
//         ptr: *const T,          // is never used
//         len: usize,
//     }
//
// With PhantomData:
struct Slice<'a, T> {
    ptr: *const T,
    len: usize,
    _marker: PhantomData<&'a T>,
    // Now the compiler knows:
    // 1. This struct borrows data with lifetime 'a
    // 2. It's covariant over 'a (lifetimes can shrink)
    // 3. Drop check considers T
}
}
The three jobs of PhantomData:
| Job | Example | What It Does |
|---|---|---|
| Lifetime binding | PhantomData<&'a T> | Struct is treated as borrowing 'a |
| Ownership simulation | PhantomData<T> | Drop check assumes struct owns a T |
| Variance control | PhantomData<fn(T)> | Makes struct contravariant over T |
Lifetime Branding
Use PhantomData to prevent mixing values from different “sessions” or “contexts”. Robust branding needs two ingredients: an invariant brand lifetime (so the compiler can never unify two brands by shrinking one into the other) and a generative closure (so every arena gets a fresh, unnameable lifetime). A covariant brand tied to an ordinary borrow is not enough — the compiler can shrink it and let handles cross arenas:
use std::marker::PhantomData;
/// A handle branded with an arena's unique 'id.
/// `fn(&'id ()) -> &'id ()` makes 'id INVARIANT (see the variance
/// cheat sheet below) — two different brands can never unify.
#[derive(Clone, Copy)]
struct ArenaHandle<'id> {
    index: usize,
    _brand: PhantomData<fn(&'id ()) -> &'id ()>,
}
struct Arena<'id> {
    data: Vec<String>,
    _brand: PhantomData<fn(&'id ()) -> &'id ()>,
}
impl<'id> Arena<'id> {
    /// Allocate a string and return a handle branded with this arena's 'id
    fn alloc(&mut self, value: String) -> ArenaHandle<'id> {
        let index = self.data.len();
        self.data.push(value);
        ArenaHandle { index, _brand: PhantomData }
    }
    /// Look up by handle — only accepts handles branded by THIS arena
    fn get(&self, handle: ArenaHandle<'id>) -> &str {
        &self.data[handle.index]
    }
}
/// The `for<'id>` bound mints a fresh, unnameable brand on every call,
/// so no two arenas ever share an 'id.
fn with_arena<R>(f: impl for<'id> FnOnce(&mut Arena<'id>) -> R) -> R {
    let mut arena = Arena { data: Vec::new(), _brand: PhantomData };
    f(&mut arena)
}
fn main() {
    with_arena(|arena1| {
        let handle1 = arena1.alloc("hello".to_string());
        println!("{}", arena1.get(handle1)); // ✅ same brand
        // Can't use handle1 with a different arena:
        // with_arena(|arena2| {
        //     arena2.get(handle1); // ❌ brand lifetimes can't unify
        // });
    });
}
Unit-of-Measure Pattern
Prevent mixing incompatible units at compile time, with zero runtime cost:
use std::marker::PhantomData;
use std::ops::{Add, Mul};
// Unit marker types (zero-sized)
struct Meters;
struct Seconds;
struct MetersPerSecond;
#[derive(Debug, Clone, Copy)]
struct Quantity<Unit> {
value: f64,
_unit: PhantomData<Unit>,
}
impl<U> Quantity<U> {
fn new(value: f64) -> Self {
Quantity { value, _unit: PhantomData }
}
}
// Can only add same units:
impl<U> Add for Quantity<U> {
type Output = Quantity<U>;
fn add(self, rhs: Self) -> Self::Output {
Quantity::new(self.value + rhs.value)
}
}
// Meters / Seconds = MetersPerSecond (a Div impl across unit types)
impl std::ops::Div<Quantity<Seconds>> for Quantity<Meters> {
type Output = Quantity<MetersPerSecond>;
fn div(self, rhs: Quantity<Seconds>) -> Quantity<MetersPerSecond> {
Quantity::new(self.value / rhs.value)
}
}
fn main() {
let dist = Quantity::<Meters>::new(100.0);
let time = Quantity::<Seconds>::new(9.58);
let speed = dist / time; // Quantity<MetersPerSecond>
println!("Speed: {:.2} m/s", speed.value); // 10.44 m/s
// let nonsense = dist + time; // ❌ Compile error: can't add Meters + Seconds
}
This is pure type-system magic — `PhantomData<Meters>` is zero-sized, so `Quantity<Meters>` has the same layout as `f64`. No wrapper overhead at runtime, but full unit safety at compile time.
PhantomData and Drop Check
When the compiler checks whether a struct’s destructor might access expired data, it uses PhantomData to decide:
#![allow(unused)]
fn main() {
use std::marker::PhantomData;
// PhantomData<T> — compiler assumes we MIGHT drop a T
// This means T must outlive our struct
struct OwningSemantic<T> {
ptr: *const T,
_marker: PhantomData<T>, // "I logically own a T"
}
// PhantomData<*const T> — compiler assumes we DON'T own T
// More permissive — T doesn't need to outlive us
struct NonOwningSemantic<T> {
ptr: *const T,
_marker: PhantomData<*const T>, // "I just point to T"
}
}
Practical rule: When wrapping raw pointers, choose PhantomData carefully:
- Writing a container that owns its data? → `PhantomData<T>`
- Writing a view/reference type? → `PhantomData<&'a T>` or `PhantomData<*const T>`
Variance — Why PhantomData’s Type Parameter Matters
Variance determines whether a generic type can be substituted with a sub- or super-type (in Rust, “subtype” means “has a longer lifetime”). Getting variance wrong causes either rejected-good-code or unsound-accepted-code.
graph LR
subgraph Covariant
direction TB
A1["&'long T"] -->|"can become"| A2["&'short T"]
end
subgraph Contravariant
direction TB
B1["fn(&'short T)"] -->|"can become"| B2["fn(&'long T)"]
end
subgraph Invariant
direction TB
C1["&'a mut T"] ---|"NO substitution"| C2["&'b mut T"]
end
style A1 fill:#d4efdf,stroke:#27ae60,color:#000
style A2 fill:#d4efdf,stroke:#27ae60,color:#000
style B1 fill:#e8daef,stroke:#8e44ad,color:#000
style B2 fill:#e8daef,stroke:#8e44ad,color:#000
style C1 fill:#fadbd8,stroke:#e74c3c,color:#000
style C2 fill:#fadbd8,stroke:#e74c3c,color:#000
The Three Variances
| Variance | Meaning | “Can I substitute…” | Rust example |
|---|---|---|---|
| Covariant | Subtype flows through | 'long where 'short expected ✅ | &'a T, Vec<T>, Box<T> |
| Contravariant | Subtype flows against | 'short where 'long expected ✅ | fn(T) (in parameter position) |
| Invariant | No substitution allowed | Neither direction ❌ | &mut T, Cell<T>, UnsafeCell<T> |
Why &'a T is Covariant Over 'a
fn print_str(s: &str) {
println!("{s}");
}
fn main() {
let owned = String::from("hello");
// owned lives for the entire function ('long)
// print_str expects &'_ str ('short — just for the call)
print_str(&owned); // ✅ Covariance: 'long → 'short is safe
// A longer-lived reference can always be used where a shorter one is needed.
}
Why &mut T is Invariant Over T
#![allow(unused)]
fn main() {
// If &mut T were covariant over T, this would compile:
fn evil(s: &mut &'static str) {
// We could write a shorter-lived &str into a &'static str slot!
let local = String::from("temporary");
// *s = &local; // ← Would create a dangling &'static str
}
// Invariance prevents this: &'static str ≠ &'a str when mutating.
// The compiler rejects the substitution entirely.
}
How PhantomData Controls Variance
PhantomData<X> gives your struct the same variance as X:
#![allow(unused)]
fn main() {
use std::marker::PhantomData;
// Covariant over 'a — a Ref<'long> can be used as Ref<'short>
struct Ref<'a, T> {
ptr: *const T,
_marker: PhantomData<&'a T>, // Covariant over 'a, covariant over T
}
// Invariant over T — prevents unsound lifetime shortening of T
struct MutRef<'a, T> {
ptr: *mut T,
_marker: PhantomData<&'a mut T>, // Covariant over 'a, INVARIANT over T
}
// Contravariant over T — useful for callback containers
struct CallbackSlot<T> {
_marker: PhantomData<fn(T)>, // Contravariant over T
}
}
PhantomData variance cheat sheet:
| PhantomData type | Variance over T | Variance over 'a | Use when |
|---|---|---|---|
| `PhantomData<T>` | Covariant | — | You logically own a `T` |
| `PhantomData<&'a T>` | Covariant | Covariant | You borrow a `T` with lifetime `'a` |
| `PhantomData<&'a mut T>` | Invariant | Covariant | You mutably borrow `T` |
| `PhantomData<*const T>` | Covariant | — | Non-owning pointer to `T` |
| `PhantomData<*mut T>` | Invariant | — | Non-owning mutable pointer |
| `PhantomData<fn(T)>` | Contravariant | — | `T` appears in argument position |
| `PhantomData<fn() -> T>` | Covariant | — | `T` appears in return position |
| `PhantomData<fn(T) -> T>` | Invariant | — | `T` in both positions cancels out |
Worked Example: Why This Matters in Practice
use std::marker::PhantomData;
// A token that brands values with a session lifetime.
// MUST be covariant over 'a — otherwise callers can't shorten
// the lifetime when passing to functions that need a shorter borrow.
struct SessionToken<'a> {
id: u64,
_brand: PhantomData<&'a ()>, // ✅ Covariant — callers can shorten 'a
// _brand: PhantomData<fn(&'a ())>, // ❌ Contravariant — breaks ergonomics
    // _brand: PhantomData<&'a mut ()>, // Still covariant over 'a (invariant over T, but T is fixed as ())
}
fn use_token(token: &SessionToken<'_>) {
println!("Using token {}", token.id);
}
fn main() {
let token = SessionToken { id: 42, _brand: PhantomData };
use_token(&token); // ✅ Works because SessionToken is covariant over 'a
}
Decision rule: Start with `PhantomData<&'a T>` (covariant). Switch to `PhantomData<&'a mut T>` (invariant) only if your abstraction hands out mutable access to `T`. Use `PhantomData<fn(T)>` (contravariant) almost never — it’s only correct for callback-storage scenarios.
Key Takeaways — PhantomData
- `PhantomData<T>` carries type/lifetime information without runtime cost
- Use it for lifetime branding, variance control, and unit-of-measure patterns
- Drop check: `PhantomData<T>` tells the compiler your type logically owns a `T`
See also: Ch 3 — Newtype & Type-State for type-state patterns that use PhantomData. Ch 11 — Unsafe Rust for how PhantomData interacts with raw pointers.
Exercise: Unit-of-Measure with PhantomData ★★ (~30 min)
Extend the unit-of-measure pattern to support:
- `Meters`, `Seconds`, `Kilograms`
- Addition of same units
- Multiplication: `Meters * Meters = SquareMeters`
- Division: `Meters / Seconds = MetersPerSecond`
🔑 Solution
use std::marker::PhantomData;
use std::ops::{Add, Mul, Div};
#[derive(Clone, Copy)]
struct Meters;
#[derive(Clone, Copy)]
struct Seconds;
#[derive(Clone, Copy)]
struct Kilograms;
#[derive(Clone, Copy)]
struct SquareMeters;
#[derive(Clone, Copy)]
struct MetersPerSecond;
#[derive(Debug, Clone, Copy)]
struct Qty<U> {
value: f64,
_unit: PhantomData<U>,
}
impl<U> Qty<U> {
fn new(v: f64) -> Self { Qty { value: v, _unit: PhantomData } }
}
impl<U> Add for Qty<U> {
type Output = Qty<U>;
fn add(self, rhs: Self) -> Self::Output { Qty::new(self.value + rhs.value) }
}
impl Mul<Qty<Meters>> for Qty<Meters> {
type Output = Qty<SquareMeters>;
fn mul(self, rhs: Qty<Meters>) -> Qty<SquareMeters> {
Qty::new(self.value * rhs.value)
}
}
impl Div<Qty<Seconds>> for Qty<Meters> {
type Output = Qty<MetersPerSecond>;
fn div(self, rhs: Qty<Seconds>) -> Qty<MetersPerSecond> {
Qty::new(self.value / rhs.value)
}
}
fn main() {
let width = Qty::<Meters>::new(5.0);
let height = Qty::<Meters>::new(3.0);
let area = width * height; // Qty<SquareMeters>
println!("Area: {:.1} m²", area.value);
let dist = Qty::<Meters>::new(100.0);
let time = Qty::<Seconds>::new(9.58);
let speed = dist / time;
println!("Speed: {:.2} m/s", speed.value);
let sum = width + height; // Same unit ✅
println!("Sum: {:.1} m", sum.value);
// let bad = width + time; // ❌ Compile error: can't add Meters + Seconds
}
5. Channels and Message Passing 🟢
What you’ll learn:
- `std::sync::mpsc` basics and when to upgrade to crossbeam-channel
- Channel selection with `select!` for multi-source message handling
- Bounded vs unbounded channels and backpressure strategies
- The actor pattern for encapsulating concurrent state
std::sync::mpsc — The Standard Channel
Rust’s standard library provides a multi-producer, single-consumer channel:
use std::sync::mpsc;
use std::thread;
use std::time::Duration;
fn main() {
// Create a channel: tx (transmitter) and rx (receiver)
let (tx, rx) = mpsc::channel();
// Spawn a producer thread
let tx1 = tx.clone(); // Clone for multiple producers
thread::spawn(move || {
for i in 0..5 {
tx1.send(format!("producer-1: msg {i}")).unwrap();
thread::sleep(Duration::from_millis(100));
}
});
// Second producer
thread::spawn(move || {
for i in 0..5 {
tx.send(format!("producer-2: msg {i}")).unwrap();
thread::sleep(Duration::from_millis(150));
}
});
// Consumer: receive all messages
for msg in rx {
// rx iterator ends when ALL senders are dropped
println!("Received: {msg}");
}
println!("All producers done.");
}
Note: `.unwrap()` on `.send()` is used for brevity. It panics if the receiver has been dropped. Production code should handle `SendError` gracefully.
Key properties:
- Unbounded by default (can fill memory if consumer is slow)
- `mpsc::sync_channel(N)` creates a bounded channel with backpressure
- `rx.recv()` blocks the current thread until a message arrives
- `rx.try_recv()` returns immediately with `Err(TryRecvError::Empty)` if nothing is ready
- The channel closes when all `Sender`s are dropped
#![allow(unused)]
fn main() {
// Bounded channel with backpressure:
let (tx, rx) = mpsc::sync_channel(10); // Buffer of 10 messages
thread::spawn(move || {
for i in 0..1000 {
tx.send(i).unwrap(); // BLOCKS if buffer is full — natural backpressure
}
});
}
Note: `.unwrap()` is used for brevity. In production, handle `SendError` (receiver dropped) instead of panicking.
crossbeam-channel — The Production Workhorse
crossbeam-channel is the de facto standard for production channel usage. It’s faster than std::sync::mpsc and supports multi-consumer (mpmc):
// Cargo.toml:
// [dependencies]
// crossbeam-channel = "0.5"
use crossbeam_channel::{bounded, unbounded, select, Sender, Receiver};
use std::thread;
use std::time::Duration;
fn main() {
// Bounded MPMC channel
let (tx, rx) = bounded::<String>(100);
// Multiple producers
for id in 0..4 {
let tx = tx.clone();
thread::spawn(move || {
for i in 0..10 {
tx.send(format!("worker-{id}: item-{i}")).unwrap();
}
});
}
drop(tx); // Drop the original sender so the channel can close
// Multiple consumers (not possible with std::sync::mpsc!)
let rx2 = rx.clone();
let consumer1 = thread::spawn(move || {
while let Ok(msg) = rx.recv() {
println!("[consumer-1] {msg}");
}
});
let consumer2 = thread::spawn(move || {
while let Ok(msg) = rx2.recv() {
println!("[consumer-2] {msg}");
}
});
consumer1.join().unwrap();
consumer2.join().unwrap();
}
Channel Selection (select!)
Listen on multiple channels simultaneously — like select in Go:
use crossbeam_channel::{bounded, tick, after, select};
use std::time::Duration;
fn main() {
let (work_tx, work_rx) = bounded::<String>(10);
let ticker = tick(Duration::from_secs(1)); // Periodic tick
let deadline = after(Duration::from_secs(10)); // One-shot timeout
// Producer
let tx = work_tx.clone();
std::thread::spawn(move || {
for i in 0..100 {
tx.send(format!("job-{i}")).unwrap();
std::thread::sleep(Duration::from_millis(500));
}
});
drop(work_tx);
loop {
select! {
recv(work_rx) -> msg => {
match msg {
Ok(job) => println!("Processing: {job}"),
Err(_) => {
println!("Work channel closed");
break;
}
}
},
recv(ticker) -> _ => {
println!("Tick — heartbeat");
},
recv(deadline) -> _ => {
println!("Deadline reached — shutting down");
break;
},
}
}
}
Go comparison: This is exactly like Go’s `select` statement over channels. crossbeam’s `select!` macro randomizes order to prevent starvation, just like Go.
Bounded vs Unbounded and Backpressure
| Type | Behavior When Full | Memory | Use Case |
|---|---|---|---|
| Unbounded | Never blocks (grows heap) | Unbounded ⚠️ | Rare — only when producer is slower than consumer |
| Bounded | send() blocks until space | Fixed | Production default — prevents OOM |
| Rendezvous (bounded(0)) | send() blocks until receiver is ready | None | Synchronization / handoff |
#![allow(unused)]
fn main() {
// Rendezvous channel — zero capacity, direct handoff
let (tx, rx) = crossbeam_channel::bounded(0);
// tx.send(x) blocks until rx.recv() is called, and vice versa.
// This synchronizes the two threads precisely.
}
Rule: Always use bounded channels in production unless you can prove the producer will never outpace the consumer.
Actor Pattern with Channels
The actor pattern uses channels to serialize access to mutable state — no mutexes needed:
use std::sync::mpsc;
use std::thread;
// Messages the actor can receive
enum CounterMsg {
Increment,
Decrement,
Get(mpsc::Sender<i64>), // Reply channel
}
struct CounterActor {
count: i64,
rx: mpsc::Receiver<CounterMsg>,
}
impl CounterActor {
fn new(rx: mpsc::Receiver<CounterMsg>) -> Self {
CounterActor { count: 0, rx }
}
fn run(mut self) {
while let Ok(msg) = self.rx.recv() {
match msg {
CounterMsg::Increment => self.count += 1,
CounterMsg::Decrement => self.count -= 1,
CounterMsg::Get(reply) => {
let _ = reply.send(self.count);
}
}
}
}
}
// Actor handle — cheap to clone, Send + Sync
#[derive(Clone)]
struct Counter {
tx: mpsc::Sender<CounterMsg>,
}
impl Counter {
fn spawn() -> Self {
let (tx, rx) = mpsc::channel();
thread::spawn(move || CounterActor::new(rx).run());
Counter { tx }
}
fn increment(&self) { let _ = self.tx.send(CounterMsg::Increment); }
fn decrement(&self) { let _ = self.tx.send(CounterMsg::Decrement); }
fn get(&self) -> i64 {
let (reply_tx, reply_rx) = mpsc::channel();
self.tx.send(CounterMsg::Get(reply_tx)).unwrap();
reply_rx.recv().unwrap()
}
}
fn main() {
let counter = Counter::spawn();
// Multiple threads can safely use the counter — no mutex!
let handles: Vec<_> = (0..10).map(|_| {
let counter = counter.clone();
thread::spawn(move || {
for _ in 0..1000 {
counter.increment();
}
})
}).collect();
for h in handles { h.join().unwrap(); }
println!("Final count: {}", counter.get()); // 10000
}
When to use actors vs mutexes: Actors are great when the state has complex invariants, operations take a long time, or you want to serialize access without thinking about lock ordering. Mutexes are simpler for short critical sections.
Key Takeaways — Channels
- `crossbeam-channel` is the production workhorse — faster and more feature-rich than `std::sync::mpsc`
- `select!` replaces complex multi-source polling with declarative channel selection
- Bounded channels provide natural backpressure; unbounded channels risk OOM
See also: Ch 6 — Concurrency for threads, Mutex, and shared state. Ch 15 — Async for async channels (`tokio::sync::mpsc`).
Exercise: Channel-Based Worker Pool ★★★ (~45 min)
Build a worker pool using channels where:
- A dispatcher sends `Job` structs through a channel
- N workers consume jobs and send results back
- Use `std::sync::mpsc` with `Arc<Mutex<Receiver>>` for work-stealing
🔑 Solution
use std::sync::mpsc;
use std::thread;
struct Job {
id: u64,
data: String,
}
struct JobResult {
job_id: u64,
output: String,
worker_id: usize,
}
fn worker_pool(jobs: Vec<Job>, num_workers: usize) -> Vec<JobResult> {
let (job_tx, job_rx) = mpsc::channel::<Job>();
let (result_tx, result_rx) = mpsc::channel::<JobResult>();
let job_rx = std::sync::Arc::new(std::sync::Mutex::new(job_rx));
let mut handles = Vec::new();
for worker_id in 0..num_workers {
let job_rx = job_rx.clone();
let result_tx = result_tx.clone();
handles.push(thread::spawn(move || {
loop {
let job = {
let rx = job_rx.lock().unwrap();
rx.recv()
};
match job {
Ok(job) => {
let output = format!("processed '{}' by worker {worker_id}", job.data);
result_tx.send(JobResult {
job_id: job.id, output, worker_id,
}).unwrap();
}
Err(_) => break,
}
}
}));
}
drop(result_tx);
let num_jobs = jobs.len();
for job in jobs {
job_tx.send(job).unwrap();
}
drop(job_tx);
let results: Vec<_> = result_rx.into_iter().collect();
assert_eq!(results.len(), num_jobs);
for h in handles { h.join().unwrap(); }
results
}
fn main() {
let jobs: Vec<Job> = (0..20).map(|i| Job {
id: i, data: format!("task-{i}"),
}).collect();
let results = worker_pool(jobs, 4);
for r in &results {
println!("[worker {}] job {}: {}", r.worker_id, r.job_id, r.output);
}
}
6. Concurrency vs Parallelism vs Threads 🟡
What you’ll learn:
- The precise distinction between concurrency and parallelism
- OS threads, scoped threads, and rayon for data parallelism
- Shared state primitives: Arc, Mutex, RwLock, Atomics, Condvar
- Lazy initialization with OnceLock/LazyLock and lock-free patterns
Terminology: Concurrency ≠ Parallelism
These terms are often confused. Here is the precise distinction:
| Concurrency | Parallelism | |
|---|---|---|
| Definition | Managing multiple tasks that can make progress | Executing multiple tasks simultaneously |
| Hardware requirement | One core is enough | Requires multiple cores |
| Analogy | One cook, multiple dishes (switching between them) | Multiple cooks, each working on a dish |
| Rust tools | async/await, channels, select! | rayon, thread::spawn, par_iter() |
Concurrency (single core): Parallelism (multi-core):
Task A: ██░░██░░██ Task A: ██████████
Task B: ░░██░░██░░ Task B: ██████████
─────────────────→ time ─────────────────→ time
(interleaved on one core) (simultaneous on two cores)
std::thread — OS Threads
Rust threads map 1:1 to OS threads. Each gets its own stack (2 MiB by default for spawned threads; the main thread’s stack size comes from the OS, often 8 MiB):
use std::thread;
use std::time::Duration;
fn main() {
// Spawn a thread — takes a closure
let handle = thread::spawn(|| {
for i in 0..5 {
println!("spawned thread: {i}");
thread::sleep(Duration::from_millis(100));
}
42 // Return value
});
// Do work on the main thread simultaneously
for i in 0..3 {
println!("main thread: {i}");
thread::sleep(Duration::from_millis(150));
}
// Wait for the thread to finish and get its return value
let result = handle.join().unwrap(); // unwrap panics if thread panicked
println!("Thread returned: {result}");
}
`thread::spawn` type requirements:
#![allow(unused)]
fn main() {
// The closure must be:
// 1. Send — can be transferred to another thread
// 2. 'static — can't borrow from the calling scope
// 3. FnOnce — takes ownership of captured variables
let data = vec![1, 2, 3];
// ❌ Borrows data — not 'static
// thread::spawn(|| println!("{data:?}"));
// ✅ Move ownership into the thread
thread::spawn(move || println!("{data:?}"));
// data is no longer accessible here
}
Scoped Threads (std::thread::scope)
Since Rust 1.63, scoped threads solve the 'static requirement — threads can borrow from the parent scope:
use std::thread;
fn main() {
let mut data = vec![1, 2, 3, 4, 5];
thread::scope(|s| {
// Thread 1: borrow shared reference
s.spawn(|| {
let sum: i32 = data.iter().sum();
println!("Sum: {sum}");
});
// Thread 2: also borrow shared reference (multiple readers OK)
s.spawn(|| {
let max = data.iter().max().unwrap();
println!("Max: {max}");
});
// ❌ Can't mutably borrow while shared borrows exist:
// s.spawn(|| data.push(6));
});
// ALL scoped threads joined here — guaranteed before scope returns
// Now safe to mutate — all threads have finished
data.push(6);
println!("Updated: {data:?}");
}
This is huge: Before scoped threads, you had to `Arc::clone()` everything to share with threads. Now you can borrow directly, and the compiler proves all threads finish before the data goes out of scope.
rayon — Data Parallelism
rayon provides parallel iterators that distribute work across a thread pool automatically:
// Cargo.toml: rayon = "1"
use rayon::prelude::*;
fn main() {
let data: Vec<u64> = (0..1_000_000).collect();
// Sequential:
let sum_seq: u64 = data.iter().map(|x| x * x).sum();
// Parallel — just change .iter() to .par_iter():
let sum_par: u64 = data.par_iter().map(|x| x * x).sum();
assert_eq!(sum_seq, sum_par);
// Parallel sort:
let mut numbers = vec![5, 2, 8, 1, 9, 3];
numbers.par_sort();
// Parallel processing with map/filter/collect:
let results: Vec<_> = data
.par_iter()
.filter(|&&x| x % 2 == 0)
.map(|&x| expensive_computation(x))
.collect();
}
fn expensive_computation(x: u64) -> u64 {
// Simulate CPU-heavy work
(0..1000).fold(x, |acc, _| acc.wrapping_mul(7).wrapping_add(13))
}
When to use rayon vs threads:
| Use | When |
|---|---|
| `rayon::par_iter()` | Processing collections in parallel (map, filter, reduce) |
| `thread::spawn` | Long-running background tasks, I/O workers |
| `thread::scope` | Short-lived parallel tasks that borrow local data |
| `async` + tokio | I/O-bound concurrency (networking, file I/O) |
Shared State: Arc, Mutex, RwLock, Atomics
When threads need shared mutable state, Rust provides safe abstractions:
Note: `.unwrap()` on `.lock()`, `.read()`, and `.write()` is used for brevity throughout these examples. These calls fail only if another thread panicked while holding the lock (“poisoning”). Production code should decide whether to recover from poisoned locks or propagate the error.
#![allow(unused)]
fn main() {
use std::sync::{Arc, Mutex, RwLock};
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;
// --- Arc<Mutex<T>>: Shared + Exclusive access ---
fn mutex_example() {
let counter = Arc::new(Mutex::new(0u64));
let mut handles = vec![];
for _ in 0..10 {
let counter = Arc::clone(&counter);
handles.push(thread::spawn(move || {
for _ in 0..1000 {
let mut guard = counter.lock().unwrap();
*guard += 1;
} // Guard dropped → lock released
}));
}
for h in handles { h.join().unwrap(); }
println!("Counter: {}", counter.lock().unwrap()); // 10000
}
// --- Arc<RwLock<T>>: Multiple readers OR one writer ---
fn rwlock_example() {
let config = Arc::new(RwLock::new(String::from("initial")));
// Many readers — don't block each other
let readers: Vec<_> = (0..5).map(|id| {
let config = Arc::clone(&config);
thread::spawn(move || {
let guard = config.read().unwrap();
println!("Reader {id}: {guard}");
})
}).collect();
// Writer — blocks and waits for all readers to finish
{
let mut guard = config.write().unwrap();
*guard = "updated".to_string();
}
for r in readers { r.join().unwrap(); }
}
// --- Atomics: Lock-free for simple values ---
fn atomic_example() {
let counter = Arc::new(AtomicU64::new(0));
let mut handles = vec![];
for _ in 0..10 {
let counter = Arc::clone(&counter);
handles.push(thread::spawn(move || {
for _ in 0..1000 {
counter.fetch_add(1, Ordering::Relaxed);
// No lock, no mutex — hardware atomic instruction
}
}));
}
for h in handles { h.join().unwrap(); }
println!("Atomic counter: {}", counter.load(Ordering::Relaxed)); // 10000
}
}
Quick Comparison
| Primitive | Use Case | Cost | Contention |
|---|---|---|---|
| `Mutex<T>` | Short critical sections | Lock + unlock | Threads wait in line |
| `RwLock<T>` | Read-heavy, rare writes | Reader-writer lock | Readers concurrent, writer exclusive |
| `AtomicU64` etc. | Counters, flags | Hardware CAS | Lock-free — no waiting |
| Channels | Message passing | Queue ops | Producer/consumer decouple |
Condition Variables (Condvar)
A Condvar lets a thread wait until another thread signals that a condition is
true, without busy-looping. It is always paired with a Mutex:
#![allow(unused)]
fn main() {
use std::sync::{Arc, Mutex, Condvar};
use std::thread;
let pair = Arc::new((Mutex::new(false), Condvar::new()));
let pair2 = Arc::clone(&pair);
// Spawned thread: wait until ready == true
let handle = thread::spawn(move || {
let (lock, cvar) = &*pair2;
let mut ready = lock.lock().unwrap();
while !*ready {
ready = cvar.wait(ready).unwrap(); // atomically unlocks + sleeps
}
println!("Worker: condition met, proceeding");
});
// Main thread: set ready = true, then signal
{
let (lock, cvar) = &*pair;
let mut ready = lock.lock().unwrap();
*ready = true;
cvar.notify_one(); // wake one waiting thread (use notify_all for many)
}
handle.join().unwrap();
}
Pattern: Always re-check the condition in a `while` loop after `wait()` returns — spurious wakeups are allowed by the OS.
Lazy Initialization: OnceLock and LazyLock
Before Rust 1.80, initializing a global static that requires runtime computation
(e.g., parsing a config, compiling a regex) needed the lazy_static! macro or the
once_cell crate. The standard library now provides two types that cover these
use cases natively:
#![allow(unused)]
fn main() {
use std::sync::{OnceLock, LazyLock};
use std::collections::HashMap;
// OnceLock — initialize on first use via `get_or_init`.
// Useful when the init value depends on runtime arguments.
static CONFIG: OnceLock<HashMap<String, String>> = OnceLock::new();
fn get_config() -> &'static HashMap<String, String> {
CONFIG.get_or_init(|| {
// Expensive: read & parse config file — happens exactly once.
let mut m = HashMap::new();
m.insert("log_level".into(), "info".into());
m
})
}
// LazyLock — initialize on first access, closure provided at definition site.
// Equivalent to lazy_static! but without a macro.
static REGEX: LazyLock<regex::Regex> = LazyLock::new(|| {
regex::Regex::new(r"^[a-zA-Z0-9_]+$").unwrap()
});
fn is_valid_identifier(s: &str) -> bool {
REGEX.is_match(s) // First call compiles the regex; subsequent calls reuse it.
}
}
| Type | Stabilized | Init Timing | Use When |
|---|---|---|---|
| `OnceLock<T>` | Rust 1.70 | Call-site (`get_or_init`) | Init depends on runtime args |
| `LazyLock<T>` | Rust 1.80 | Definition-site (closure) | Init is self-contained |
| `lazy_static!` | — | Definition-site (macro) | Pre-1.80 codebases (migrate away) |
| `const fn` + `static` | Always | Compile-time | Value is computable at compile time |
Migration tip: Replace `lazy_static! { static ref X: T = expr; }` with `static X: LazyLock<T> = LazyLock::new(|| expr);` — same semantics, no macro, no external dependency.
Lock-Free Patterns
For high-performance code, avoid locks entirely:
#![allow(unused)]
fn main() {
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
// Pattern 1: Spin lock (educational — prefer std::sync::Mutex)
// ⚠️ WARNING: This is a teaching example only. Real spinlocks need:
// - A RAII guard (so a panic while holding doesn't deadlock forever)
// - Fairness guarantees (this starves under contention)
// - Backoff strategies (exponential backoff, yield to OS)
// Use std::sync::Mutex or parking_lot::Mutex in production.
struct SpinLock {
locked: AtomicBool,
}
impl SpinLock {
fn new() -> Self { SpinLock { locked: AtomicBool::new(false) } }
fn lock(&self) {
while self.locked
.compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
.is_err()
{
std::hint::spin_loop(); // CPU hint: we're spinning
}
}
fn unlock(&self) {
self.locked.store(false, Ordering::Release);
}
}
// Pattern 2: Lock-free SPSC (single producer, single consumer)
// Use crossbeam::queue::ArrayQueue or similar in production
// roll-your-own only for learning.
// Pattern 3: Sequence counter for wait-free reads
// ⚠️ Best for single-machine-word types (u64, f64); wider T may tear on read.
struct SeqLock<T: Copy> {
seq: AtomicUsize,
data: std::cell::UnsafeCell<T>,
}
unsafe impl<T: Copy + Send> Sync for SeqLock<T> {}
impl<T: Copy> SeqLock<T> {
fn new(val: T) -> Self {
SeqLock {
seq: AtomicUsize::new(0),
data: std::cell::UnsafeCell::new(val),
}
}
fn read(&self) -> T {
loop {
let s1 = self.seq.load(Ordering::Acquire);
if s1 & 1 != 0 { continue; } // Writer in progress, retry
// SAFETY: We use ptr::read_volatile to prevent the compiler from
// reordering or caching the read. The SeqLock protocol (checking
// s1 == s2 after reading) ensures we retry if a writer was active.
// This mirrors the C SeqLock pattern where the data read must use
// volatile/relaxed semantics to avoid tearing under concurrency.
let value = unsafe { core::ptr::read_volatile(self.data.get() as *const T) };
// Acquire fence: ensures the data read above is ordered before
// we re-check the sequence counter.
std::sync::atomic::fence(Ordering::Acquire);
let s2 = self.seq.load(Ordering::Relaxed);
if s1 == s2 { return value; } // No writer intervened
// else retry
}
}
/// # Safety contract
/// Only ONE thread may call `write()` at a time. If multiple writers
/// are needed, wrap the `write()` call in an external `Mutex`.
fn write(&self, val: T) {
// Increment to odd (signals write in progress).
// AcqRel: the Acquire side prevents the subsequent data write
// from being reordered before this increment (readers must see
// odd before they could observe a partial write). The Release
// side is technically unnecessary for a single writer but
// harmless and consistent.
self.seq.fetch_add(1, Ordering::AcqRel);
// SAFETY: Single-writer invariant upheld by caller (see doc above).
// UnsafeCell allows interior mutation; seq counter protects readers.
unsafe { *self.data.get() = val; }
// Increment to even (signals write complete).
// Release: ensure the data write is visible before readers see the even seq.
self.seq.fetch_add(1, Ordering::Release);
}
}
}
⚠️ Rust memory model caveat: The non-atomic write through `UnsafeCell` in `write()` concurrent with the non-atomic `ptr::read_volatile` in `read()` is technically a data race under the Rust abstract machine — even though the SeqLock protocol ensures readers always retry on stale data. This mirrors the C kernel SeqLock pattern and is sound in practice on all modern hardware for types `T` that fit in a single machine word (e.g., `u64`). For wider types, consider using `AtomicU64` for the data field or wrapping access in a `Mutex`. See the Rust unsafe code guidelines for the evolving story on `UnsafeCell` concurrency.
Practical advice: Lock-free code is hard to get right. Use `Mutex` or `RwLock` unless profiling shows lock contention is your bottleneck. When you do need lock-free, reach for proven crates (`crossbeam`, `arc-swap`, `dashmap`) rather than rolling your own.
Key Takeaways — Concurrency
- Scoped threads (`thread::scope`) let you borrow stack data without `Arc`
- `rayon::par_iter()` parallelizes iterators with one method call
- Use `OnceLock`/`LazyLock` instead of `lazy_static!`; use `Mutex` before reaching for atomics
- Lock-free code is hard — prefer proven crates over hand-rolled implementations
See also: Ch 5 — Channels for message-passing concurrency. Ch 8 — Smart Pointers for Arc/Rc details.
flowchart TD
A["Need shared<br>mutable state?"] -->|Yes| B{"How much<br>contention?"}
A -->|No| C["Use channels<br>(Ch 5)"]
B -->|"Read-heavy"| D["RwLock"]
B -->|"Short critical<br>section"| E["Mutex"]
B -->|"Simple counter<br>or flag"| F["Atomics"]
B -->|"Complex state"| G["Actor + channels"]
H["Need parallelism?"] -->|"Collection<br>processing"| I["rayon::par_iter"]
H -->|"Background task"| J["thread::spawn"]
H -->|"Borrow local data"| K["thread::scope"]
style A fill:#e8f4f8,stroke:#2980b9,color:#000
style B fill:#fef9e7,stroke:#f1c40f,color:#000
style C fill:#d4efdf,stroke:#27ae60,color:#000
style D fill:#fdebd0,stroke:#e67e22,color:#000
style E fill:#fdebd0,stroke:#e67e22,color:#000
style F fill:#fdebd0,stroke:#e67e22,color:#000
style G fill:#fdebd0,stroke:#e67e22,color:#000
style H fill:#e8f4f8,stroke:#2980b9,color:#000
style I fill:#d4efdf,stroke:#27ae60,color:#000
style J fill:#d4efdf,stroke:#27ae60,color:#000
style K fill:#d4efdf,stroke:#27ae60,color:#000
Exercise: Parallel Map with Scoped Threads ★★ (~25 min)
Write a function parallel_map<T, R>(data: &[T], f: fn(&T) -> R, num_threads: usize) -> Vec<R> that splits data into num_threads chunks and processes each in a scoped thread. Do not use rayon — use std::thread::scope.
🔑 Solution
fn parallel_map<T: Sync, R: Send>(data: &[T], f: fn(&T) -> R, num_threads: usize) -> Vec<R> {
    // .max(1) guards the empty-input case: slice::chunks panics on chunk size 0.
    let chunk_size = ((data.len() + num_threads - 1) / num_threads).max(1);
let mut results = Vec::with_capacity(data.len());
std::thread::scope(|s| {
let mut handles = Vec::new();
for chunk in data.chunks(chunk_size) {
handles.push(s.spawn(move || {
chunk.iter().map(f).collect::<Vec<_>>()
}));
}
for h in handles {
results.extend(h.join().unwrap());
}
});
results
}
fn main() {
let data: Vec<u64> = (1..=20).collect();
let squares = parallel_map(&data, |x| x * x, 4);
assert_eq!(squares, (1..=20).map(|x: u64| x * x).collect::<Vec<_>>());
println!("Parallel squares: {squares:?}");
}
7. Closures and Higher-Order Functions 🟢
What you’ll learn:
- The three closure traits (`Fn`, `FnMut`, `FnOnce`) and how capture works
- Passing closures as parameters and returning them from functions
- Combinator chains and iterator adapters for functional-style programming
- Designing your own higher-order APIs with the right trait bounds
Fn, FnMut, FnOnce — The Closure Traits
Every closure in Rust implements one or more of three traits, based on how it captures variables:
#![allow(unused)]
fn main() {
// FnOnce — consumes captured values (can only be called once)
let name = String::from("Alice");
let greet = move || {
println!("Hello, {name}!"); // Takes ownership of `name`
drop(name); // name is consumed
};
greet(); // ✅ First call
// greet(); // ❌ Can't call again — `name` was consumed
// FnMut — mutably borrows captured values (can be called many times)
let mut count = 0;
let mut increment = || {
count += 1; // Mutably borrows `count`
};
increment(); // count == 1
increment(); // count == 2
// Fn — immutably borrows captured values (can be called many times, concurrently)
let prefix = "Result";
let display = |x: i32| {
println!("{prefix}: {x}"); // Immutably borrows `prefix`
};
display(1);
display(2);
}
The hierarchy: Fn : FnMut : FnOnce — each is a subtrait of the next:
FnOnce ← everything can be called at least once
↑
FnMut ← can be called repeatedly (may mutate state)
↑
Fn ← can be called repeatedly and concurrently (no mutation)
If a closure implements Fn, it also implements FnMut and FnOnce.
Closures as Parameters and Return Values
// --- Parameters ---
// Static dispatch (monomorphized — fastest)
fn apply_twice<F: Fn(i32) -> i32>(f: F, x: i32) -> i32 {
f(f(x))
}
// Also written with impl Trait:
fn apply_twice_v2(f: impl Fn(i32) -> i32, x: i32) -> i32 {
f(f(x))
}
// Dynamic dispatch (trait object — flexible, slight overhead)
fn apply_dyn(f: &dyn Fn(i32) -> i32, x: i32) -> i32 {
f(x)
}
// --- Return Values ---
// Can't return closures by value without boxing (they have anonymous types):
fn make_adder(n: i32) -> Box<dyn Fn(i32) -> i32> {
Box::new(move |x| x + n)
}
// With impl Trait (simpler, monomorphized, but can't be dynamic):
fn make_adder_v2(n: i32) -> impl Fn(i32) -> i32 {
move |x| x + n
}
fn main() {
let double = |x: i32| x * 2;
println!("{}", apply_twice(double, 3)); // 12
let add5 = make_adder(5);
println!("{}", add5(10)); // 15
}
Combinator Chains and Iterator Adapters
Higher-order functions shine with iterators — this is idiomatic Rust:
#![allow(unused)]
fn main() {
// C-style loop (imperative):
let data = vec![1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
let mut result = Vec::new();
for x in &data {
if x % 2 == 0 {
result.push(x * x);
}
}
// Idiomatic Rust (functional combinator chain):
let result: Vec<i32> = data.iter()
.filter(|&&x| x % 2 == 0)
.map(|&x| x * x)
.collect();
// Same performance — iterators are lazy and optimized by LLVM
assert_eq!(result, vec![4, 16, 36, 64, 100]);
}
Common combinators cheat sheet:
| Combinator | What It Does | Example |
|---|---|---|
| `.map(f)` | Transform each element | `.map(\|x\| x * 2)` |
| `.filter(p)` | Keep elements where predicate is true | `.filter(\|x\| *x > 0)` |
| `.filter_map(f)` | Map + filter in one step (returns Option) | `.filter_map(\|s\| s.parse().ok())` |
| `.flat_map(f)` | Map then flatten nested iterators | `.flat_map(\|s\| s.chars())` |
| `.fold(init, f)` | Reduce to single value (like Aggregate in C#) | `.fold(0, \|acc, x\| acc + x)` |
| `.any(p)` / `.all(p)` | Short-circuit boolean check | `.any(\|x\| *x < 0)` |
| `.enumerate()` | Add index | `.enumerate().map(\|(i, x)\| ...)` |
| `.zip(other)` | Pair with another iterator | `.zip(labels.iter())` |
| `.take(n)` / `.skip(n)` | First/skip N elements | `.take(10)` |
| `.chain(other)` | Concatenate two iterators | `.chain(extra.iter())` |
| `.peekable()` | Look ahead without consuming | `.peek()` |
| `.collect()` | Gather into a collection | `.collect::<Vec<_>>()` |
Implementing Your Own Higher-Order APIs
Design APIs that accept closures for customization:
#![allow(unused)]
fn main() {
/// Retry an operation with a configurable strategy
fn retry<T, E, F, S>(
mut operation: F,
mut should_retry: S,
max_attempts: usize,
) -> Result<T, E>
where
F: FnMut() -> Result<T, E>,
S: FnMut(&E, usize) -> bool, // (error, attempt) → try again?
{
for attempt in 1..=max_attempts {
match operation() {
Ok(val) => return Ok(val),
Err(e) if attempt < max_attempts && should_retry(&e, attempt) => {
continue;
}
Err(e) => return Err(e),
}
}
    unreachable!("loop always returns when max_attempts >= 1")
}
// Usage — caller controls retry logic:
}
#![allow(unused)]
fn main() {
fn connect_to_database() -> Result<(), String> { Ok(()) }
fn http_get(_url: &str) -> Result<String, String> { Ok(String::new()) }
trait TransientError { fn is_transient(&self) -> bool; }
impl TransientError for String { fn is_transient(&self) -> bool { true } }
let url = "http://example.com";
let result = retry(
|| connect_to_database(),
|err, attempt| {
eprintln!("Attempt {attempt} failed: {err}");
true // Always retry
},
3,
);
// Usage — retry only specific errors:
let result = retry(
|| http_get(url),
|err, _| err.is_transient(), // Only retry transient errors
5,
);
}
The with Pattern — Bracketed Resource Access
Sometimes you need to guarantee that a resource is in a specific state for the
duration of an operation, and restored afterward — regardless of how the caller’s
code exits (early return, ?, panic). Instead of exposing the resource directly
and hoping callers remember to set up and tear down, lend it through a closure:
set up → call closure with resource → tear down
The caller never touches setup or teardown. They can’t forget, can’t get it wrong, and can’t hold the resource beyond the closure’s scope.
Example: GPIO Pin Direction
A GPIO controller manages pins that support bidirectional I/O. Some callers need
the pin configured as input, others as output. Rather than exposing raw pin access
and trusting callers to set direction correctly, the controller provides
with_pin_input and with_pin_output:
/// GPIO pin direction — not public, callers never set this directly.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Direction { In, Out }
/// A GPIO pin handle lent to the closure. Cannot be stored or cloned —
/// it exists only for the duration of the callback.
pub struct GpioPin<'a> {
pin_number: u8,
_controller: &'a GpioController,
}
impl GpioPin<'_> {
pub fn read(&self) -> bool {
// Read pin level from hardware register
println!(" reading pin {}", self.pin_number);
true // stub
}
pub fn write(&self, high: bool) {
// Drive pin level via hardware register
println!(" writing pin {} = {high}", self.pin_number);
}
}
pub struct GpioController {
current_direction: std::cell::Cell<Option<Direction>>,
}
impl GpioController {
pub fn new() -> Self {
GpioController {
current_direction: std::cell::Cell::new(None),
}
}
/// Configure pin as input, run the closure, restore state.
/// The caller receives a `GpioPin` that lives only for the callback.
pub fn with_pin_input<R>(
&self,
pin: u8,
mut f: impl FnMut(&GpioPin<'_>) -> R,
) -> R {
let prev = self.current_direction.get();
self.set_direction(pin, Direction::In);
let handle = GpioPin { pin_number: pin, _controller: self };
let result = f(&handle);
// Restore previous direction (or leave as-is — policy choice)
if let Some(dir) = prev {
self.set_direction(pin, dir);
}
result
}
/// Configure pin as output, run the closure, restore state.
pub fn with_pin_output<R>(
&self,
pin: u8,
mut f: impl FnMut(&GpioPin<'_>) -> R,
) -> R {
let prev = self.current_direction.get();
self.set_direction(pin, Direction::Out);
let handle = GpioPin { pin_number: pin, _controller: self };
let result = f(&handle);
if let Some(dir) = prev {
self.set_direction(pin, dir);
}
result
}
fn set_direction(&self, pin: u8, dir: Direction) {
println!(" [hw] pin {pin} → {dir:?}");
self.current_direction.set(Some(dir));
}
}
fn main() {
let gpio = GpioController::new();
// Caller 1: needs input — doesn't know or care how direction is managed
let level = gpio.with_pin_input(4, |pin| {
pin.read()
});
println!("Pin 4 level: {level}");
// Caller 2: needs output — same API shape, different guarantee
gpio.with_pin_output(4, |pin| {
pin.write(true);
// do more work...
pin.write(false);
});
// Can't use the pin handle outside the closure:
// let escaped_pin = gpio.with_pin_input(4, |pin| pin);
// ❌ ERROR: borrowed value does not live long enough
}
What the with pattern guarantees:
- Direction is always set before the caller’s code runs
- Direction is always restored after, even if the closure returns early
- The `GpioPin` handle cannot escape the closure — the borrow checker enforces this via the lifetime tied to the controller reference
- Callers never import `Direction`, never call `set_direction` — the API is impossible to misuse
Where This Pattern Appears
The with pattern shows up throughout Rust’s standard library and ecosystem:
| API | Setup | Callback | Teardown |
|---|---|---|---|
| `std::thread::scope` | Create scope | `\|s\| { s.spawn(...) }` | Join all threads |
| `Mutex::lock` | Acquire lock | Use `MutexGuard` (RAII, not closure, but same idea) | Release on drop |
| `tempfile::tempdir` | Create temp directory | Use path | Delete on drop |
| `std::io::BufWriter::new` | Buffer writes | Write operations | Flush on drop |
| GPIO `with_pin_*` (above) | Set direction | Use pin handle | Restore direction |
The closure-based variant is strongest when:
- Setup and teardown are paired and forgetting either is a bug
- The resource shouldn’t outlive the operation — the borrow checker enforces this naturally
- Multiple configurations exist (`with_pin_input` vs `with_pin_output`) — each `with_*` method encapsulates a different setup without exposing the configuration to the caller
`with` vs RAII (`Drop`): Both guarantee cleanup. Use RAII / `Drop` when the caller needs to hold the resource across multiple statements and function calls. Use `with` when the operation is bracketed — one setup, one block of work, one teardown — and you don’t want the caller to be able to break the bracket.
`FnMut` vs `Fn` in API design: Use `FnMut` as the default bound — it’s the most flexible (callers can pass `Fn` or `FnMut` closures). Only require `Fn` if you need to call the closure concurrently (e.g., from multiple threads). Only require `FnOnce` if you call it exactly once.
Key Takeaways — Closures
- `Fn` borrows, `FnMut` borrows mutably, `FnOnce` consumes — accept the weakest bound your API needs
- `impl Fn` in parameters, `Box<dyn Fn>` for storage, `impl Fn` in return position (or `Box<dyn Fn>` if dynamic)
- Combinator chains (`map`, `filter`, `and_then`) compose cleanly and inline to tight loops
- The `with` pattern (bracketed access via closure) guarantees setup/teardown and prevents resource escape — use it when the caller shouldn’t manage configuration lifecycle
See also: Ch 2 — Traits In Depth for how `Fn`/`FnMut`/`FnOnce` relate to trait objects. Ch 8 — Functional vs. Imperative for when to choose combinators over loops. Ch 15 — API Design for ergonomic parameter patterns.
graph TD
FnOnce["FnOnce<br>(can call once)"]
FnMut["FnMut<br>(can call many times,<br>may mutate captures)"]
Fn["Fn<br>(can call many times,<br>immutable captures)"]
Fn -->|"implements"| FnMut
FnMut -->|"implements"| FnOnce
style Fn fill:#d4efdf,stroke:#27ae60,color:#000
style FnMut fill:#fef9e7,stroke:#f1c40f,color:#000
style FnOnce fill:#fadbd8,stroke:#e74c3c,color:#000
Every `Fn` is also `FnMut`, and every `FnMut` is also `FnOnce`. Accept `FnMut` by default — it’s the most flexible bound for callers.
Exercise: Higher-Order Combinator Pipeline ★★ (~25 min)
Create a Pipeline struct that chains transformations. It should support .pipe(f) to add a transformation and .execute(input) to run the full chain.
🔑 Solution
struct Pipeline<T> {
transforms: Vec<Box<dyn Fn(T) -> T>>,
}
impl<T: 'static> Pipeline<T> {
fn new() -> Self {
Pipeline { transforms: Vec::new() }
}
fn pipe(mut self, f: impl Fn(T) -> T + 'static) -> Self {
self.transforms.push(Box::new(f));
self
}
fn execute(self, input: T) -> T {
self.transforms.into_iter().fold(input, |val, f| f(val))
}
}
fn main() {
let result = Pipeline::new()
.pipe(|s: String| s.trim().to_string())
.pipe(|s| s.to_uppercase())
.pipe(|s| format!(">>> {s} <<<"))
.execute(" hello world ".to_string());
println!("{result}"); // >>> HELLO WORLD <<<
let result = Pipeline::new()
.pipe(|x: i32| x * 2)
.pipe(|x| x + 10)
.pipe(|x| x * x)
.execute(5);
println!("{result}"); // (5*2 + 10)^2 = 400
}
Chapter 8 — Functional vs. Imperative: When Elegance Wins (and When It Doesn’t)
Difficulty: 🟡 Intermediate | Time: 2–3 hours | Prerequisites: Ch 7 — Closures
Rust gives you genuine parity between functional and imperative styles. Unlike Haskell (functional by fiat) or C (imperative by default), Rust lets you choose — and the right choice depends on what you’re expressing. This chapter builds the judgment to pick well.
The core principle: Functional style shines when you’re transforming data through a pipeline. Imperative style shines when you’re managing state transitions with side effects. Most real code has both, and the skill is knowing where the boundary falls.
8.1 The Combinator You Didn’t Know You Wanted
Many Rust developers write this:
#![allow(unused)]
fn main() {
let value = if let Some(x) = maybe_config() {
x
} else {
default_config()
};
process(value);
}
When they could write this:
#![allow(unused)]
fn main() {
process(maybe_config().unwrap_or_else(default_config));
}
Or this common pattern:
#![allow(unused)]
fn main() {
let display_name = if let Some(name) = user.nickname() {
name.to_uppercase()
} else {
"ANONYMOUS".to_string()
};
}
Which is:
#![allow(unused)]
fn main() {
let display_name = user.nickname()
.map(|n| n.to_uppercase())
.unwrap_or_else(|| "ANONYMOUS".to_string());
}
The functional version isn’t just shorter — it tells you what is happening (transform, then default) without making you trace control flow. The if let version makes you read the branches to figure out that both paths end up in the same place.
The Option combinator family
Here’s the mental model: Option<T> is a one-element-or-empty collection. Every combinator on Option has an analogy to a collection operation.
| You write… | Instead of… | What it communicates |
|---|---|---|
| `opt.unwrap_or(default)` | `if let Some(x) = opt { x } else { default }` | “Use this value or fall back” |
| `opt.unwrap_or_else(\|\| expensive())` | `if let Some(x) = opt { x } else { expensive() }` | Same, but default is lazy |
| `opt.map(f)` | `match opt { Some(x) => Some(f(x)), None => None }` | “Transform the inside, propagate absence” |
| `opt.and_then(f)` | `match opt { Some(x) => f(x), None => None }` | “Chain fallible operations” (flatmap) |
| `opt.filter(\|x\| pred(x))` | `match opt { Some(x) if pred(&x) => Some(x), _ => None }` | “Keep only if it passes” |
| `opt.zip(other)` | `if let (Some(a), Some(b)) = (opt, other) { Some((a, b)) } else { None }` | “Both or neither” |
| `opt.or(fallback)` | `if opt.is_some() { opt } else { fallback }` | “First available” |
| `opt.or_else(\|\| try_another())` | `if opt.is_some() { opt } else { try_another() }` | “Try alternatives in order” |
| `opt.map_or(default, f)` | `if let Some(x) = opt { f(x) } else { default }` | “Transform or default” — one-liner |
| `opt.map_or_else(default_fn, f)` | `if let Some(x) = opt { f(x) } else { default_fn() }` | Same, both sides are closures |
| `opt?` | `match opt { Some(x) => x, None => return None }` | “Propagate absence upward” |
The Result combinator family
The same pattern applies to Result<T, E>:
| You write… | Instead of… | What it communicates |
|---|---|---|
| `res.map(f)` | `match res { Ok(x) => Ok(f(x)), Err(e) => Err(e) }` | Transform the success path |
| `res.map_err(f)` | `match res { Ok(x) => Ok(x), Err(e) => Err(f(e)) }` | Transform the error |
| `res.and_then(f)` | `match res { Ok(x) => f(x), Err(e) => Err(e) }` | Chain fallible operations |
| `res.unwrap_or_else(\|e\| default(e))` | `match res { Ok(x) => x, Err(e) => default(e) }` | Recover from error |
| `res.ok()` | `match res { Ok(x) => Some(x), Err(_) => None }` | “I don’t care about the error” |
| `res?` | `match res { Ok(x) => x, Err(e) => return Err(e.into()) }` | Propagate errors upward |
When if let IS better
The combinators lose when:
- You need multiple statements in the `Some` branch. A `map` closure with 5 lines is worse than an `if let` with 5 lines.
- The control flow is the point. `if let Some(connection) = pool.try_get() { /* use it */ } else { /* log, retry, alert */ }` — the two branches are genuinely different code paths, not a transform-or-default.
- Side effects dominate. If both branches do I/O with different error handling, the combinator version obscures the important differences.
Rule of thumb: If the else branch produces the same type as the Some branch and the bodies are short expressions, use a combinator. If the branches do fundamentally different things, use if let or match.
8.2 Bool Combinators: .then() and .then_some()
Another pattern that’s more common than it should be:
#![allow(unused)]
fn main() {
let label = if is_admin {
Some("ADMIN")
} else {
None
};
}
Rust 1.62+ gives you:
#![allow(unused)]
fn main() {
let label = is_admin.then_some("ADMIN");
}
Or with a computed value:
#![allow(unused)]
fn main() {
let permissions = is_admin.then(|| compute_admin_permissions());
}
This is especially powerful in chains:
#![allow(unused)]
fn main() {
// Imperative
let mut tags = Vec::new();
if user.is_admin { tags.push("admin"); }
if user.is_verified { tags.push("verified"); }
if user.score > 100 { tags.push("power-user"); }
// Functional
let tags: Vec<&str> = [
user.is_admin.then_some("admin"),
user.is_verified.then_some("verified"),
(user.score > 100).then_some("power-user"),
]
.into_iter()
.flatten()
.collect();
}
The functional version makes the pattern explicit: “build a list from conditional elements.” The imperative version makes you read each if to confirm they all do the same thing (push a tag).
8.3 Iterator Chains vs. Loops: The Decision Framework
Ch 7 showed the mechanics. This section builds the judgment.
When iterators win
Data pipelines — transforming a collection through a series of steps:
#![allow(unused)]
fn main() {
// Imperative: 8 lines, 2 mutable variables
let mut results = Vec::new();
for item in inventory {
if item.category == Category::Server {
if let Some(temp) = item.last_temperature() {
if temp > 80.0 {
results.push((item.id, temp));
}
}
}
}
// Functional: 6 lines, 0 mutable variables, one pipeline
let results: Vec<_> = inventory.iter()
.filter(|item| item.category == Category::Server)
.filter_map(|item| item.last_temperature().map(|t| (item.id, t)))
.filter(|(_, temp)| *temp > 80.0)
.collect();
}
The functional version wins because:
- Each filter is independently readable
- No
mut— the data flows in one direction - You can add/remove/reorder pipeline stages without restructuring
- LLVM inlines iterator adapters to the same machine code as the loop
Aggregation — computing a single value from a collection:
#![allow(unused)]
fn main() {
// Imperative
let mut total_power = 0.0;
let mut count = 0;
for server in fleet {
total_power += server.power_draw();
count += 1;
}
let avg = total_power / count as f64;
// Functional
let (total_power, count) = fleet.iter()
.map(|s| s.power_draw())
.fold((0.0, 0usize), |(sum, n), p| (sum + p, n + 1));
let avg = total_power / count as f64;
}
Or even simpler if you just need the sum:
#![allow(unused)]
fn main() {
let total: f64 = fleet.iter().map(|s| s.power_draw()).sum();
}
When loops win
Early exit with complex state:
#![allow(unused)]
fn main() {
// This is clear and direct
let mut best_candidate = None;
for server in fleet {
let score = evaluate(server);
if score > threshold {
if server.is_available() {
best_candidate = Some(server);
break; // Found one — stop immediately
}
}
}
// The functional version is strained
let best_candidate = fleet.iter()
.filter(|s| evaluate(s) > threshold)
.find(|s| s.is_available());
}
Wait — that functional version is actually pretty clean. Let’s try a case where it genuinely loses:
Building multiple outputs simultaneously:
#![allow(unused)]
fn main() {
// Imperative: clear, each branch does something different
let mut warnings = Vec::new();
let mut errors = Vec::new();
let mut stats = Stats::default();
for event in log_stream {
match event.severity {
Severity::Warn => {
warnings.push(event.clone());
stats.warn_count += 1;
}
Severity::Error => {
errors.push(event.clone());
stats.error_count += 1;
if event.is_critical() {
alert_oncall(&event);
}
}
_ => stats.other_count += 1,
}
}
// Functional version: forced, awkward, nobody wants to read this
let (warnings, errors, stats) = log_stream.iter().fold(
(Vec::new(), Vec::new(), Stats::default()),
|(mut w, mut e, mut s), event| {
match event.severity {
Severity::Warn => { w.push(event.clone()); s.warn_count += 1; }
Severity::Error => {
e.push(event.clone()); s.error_count += 1;
if event.is_critical() { alert_oncall(event); }
}
_ => s.other_count += 1,
}
(w, e, s)
},
);
}
The fold version is longer, harder to read, and has mutation anyway (the mut deconstructed accumulators). The loop wins because:
- Multiple outputs being built in parallel
- Side effects (alerting) mixed into the logic
- Branch bodies are statements, not expressions
State machines with I/O:
#![allow(unused)]
fn main() {
// A parser that reads tokens — the loop IS the algorithm
let mut state = ParseState::Start;
loop {
let token = lexer.next_token()?;
state = match state {
ParseState::Start => match token {
Token::Keyword(k) => ParseState::GotKeyword(k),
Token::Eof => break,
_ => return Err(ParseError::UnexpectedToken(token)),
},
ParseState::GotKeyword(k) => match token {
Token::Ident(name) => ParseState::GotName(k, name),
_ => return Err(ParseError::ExpectedIdentifier),
},
// ...more states
};
}
}
No functional equivalent is cleaner. The loop with match state is the natural expression of a state machine.
The decision flowchart
flowchart TB
START{What are you doing?}
START -->|"Transforming a collection\ninto another collection"| PIPE[Use iterator chain]
START -->|"Computing a single value\nfrom a collection"| AGG{How complex?}
START -->|"Multiple outputs from\none pass"| LOOP[Use a for loop]
START -->|"State machine with\nI/O or side effects"| LOOP
START -->|"One Option/Result\ntransform + default"| COMB[Use combinators]
AGG -->|"Sum, count, min, max"| BUILTIN["Use .sum(), .count(),\n.min(), .max()"]
AGG -->|"Custom accumulation"| FOLD{Accumulator has mutation\nor side effects?}
FOLD -->|"No"| FOLDF["Use .fold()"]
FOLD -->|"Yes"| LOOP
style PIPE fill:#d4efdf,stroke:#27ae60,color:#000
style COMB fill:#d4efdf,stroke:#27ae60,color:#000
style BUILTIN fill:#d4efdf,stroke:#27ae60,color:#000
style FOLDF fill:#d4efdf,stroke:#27ae60,color:#000
style LOOP fill:#fef9e7,stroke:#f1c40f,color:#000
Sidebar: Scoped mutability — imperative inside, functional outside
Rust blocks are expressions. This lets you confine mutation to a construction phase and bind the result immutably:
#![allow(unused)]
fn main() {
use rand::random;
let samples = {
let mut buf = Vec::with_capacity(10);
while buf.len() < 10 {
let reading: f64 = random();
buf.push(reading);
if random::<u8>() % 3 == 0 { break; } // randomly stop early
}
buf
};
// samples is immutable — contains between 1 and 10 elements
}
The inner buf is mutable only inside the block. Once the block yields, the outer binding
samples is immutable and the compiler will reject any later samples.push(...).
Why not an iterator chain? You might try:
#![allow(unused)]
fn main() {
use rand::random;
let samples: Vec<f64> = std::iter::from_fn(|| Some(random()))
.take(10)
.take_while(|_| random::<u8>() % 3 != 0)
.collect();
}
But `take_while` tests the predicate before yielding and excludes the failing element, so the chain can produce zero elements — the imperative version checks *after* pushing and guarantees at least one. You can work around it with `scan` or `chain`, but the imperative version is clearer.
When scoped mutability genuinely wins:
| Scenario | Why iterators struggle |
|---|---|
| Sort-then-freeze (`sort_unstable()` + `dedup()`) | Both return `()` — no chainable output (itertools offers `.sorted().dedup()`) |
| Stateful termination (stop on a condition unrelated to the data) | take_while drops the boundary element |
| Multi-step struct population (field-by-field from different sources) | No natural single pipeline |
Honest calibration: For most collection-building tasks, iterator chains or itertools are preferred. Reach for scoped mutability when the construction logic has branching, early exit, or in-place mutation that doesn’t map to a single pipeline. The pattern’s real value is teaching that mutation scope can be smaller than variable lifetime — a Rust fundamental that surprises developers coming from C++, C#, and Python.
8.4 The ? Operator: Where Functional Meets Imperative
The ? operator is Rust’s most elegant synthesis of both styles. It’s essentially .and_then() combined with early return:
#![allow(unused)]
fn main() {
// This chain of and_then...
fn load_config() -> Result<Config, Error> {
read_file("config.toml")
.and_then(|contents| parse_toml(&contents))
.and_then(|table| validate_config(table))
.and_then(|valid| Config::from_validated(valid))
}
// ...is exactly equivalent to this
fn load_config() -> Result<Config, Error> {
let contents = read_file("config.toml")?;
let table = parse_toml(&contents)?;
let valid = validate_config(table)?;
Config::from_validated(valid)
}
}
Both are functional in spirit (they propagate errors automatically) but the ? version gives you named intermediate variables, which matter when:
- You need to use `contents` again later
- You want to add `.context("while parsing config")?` per step
- You’re debugging and want to inspect intermediate values
The anti-pattern: long .and_then() chains when ? is available. If every closure in the chain is |x| next_step(x), you’ve reinvented ? without the readability.
When .and_then() IS better than ?:
#![allow(unused)]
fn main() {
// Transforming inside an Option, without early return
let port: Option<u16> = config.get("port")
.and_then(|v| v.parse::<u16>().ok())
.filter(|&p| p > 0); // u16 already caps the upper bound at 65535
}
You can’t use `?` here because the enclosing function doesn’t return an `Option` — you’re building an `Option` value, not propagating one.
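When the enclosing function *does* return an `Option`, `?` works on `Option`s too. A minimal sketch (the `port_from` helper is hypothetical):

```rust
/// Hypothetical helper: `?` propagates `None` because the enclosing
/// function itself returns Option.
fn port_from(raw: Option<&str>) -> Option<u16> {
    let p: u16 = raw?.parse().ok()?; // either None short-circuits here
    (p > 0).then_some(p)             // reject port 0
}

fn main() {
    assert_eq!(port_from(Some("8080")), Some(8080));
    assert_eq!(port_from(Some("not-a-port")), None);
    assert_eq!(port_from(None), None);
}
```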
8.5 Collection Building: collect() vs. Push Loops
collect() is more powerful than most developers realize:
Collecting into a Result
#![allow(unused)]
fn main() {
// Imperative: parse a list, fail on first error
let mut numbers = Vec::new();
for s in input_strings {
let n: i64 = s.parse().map_err(|_| Error::BadInput(s.clone()))?;
numbers.push(n);
}
// Functional: collect into Result<Vec<_>, _>
let numbers: Vec<i64> = input_strings.iter()
.map(|s| s.parse::<i64>().map_err(|_| Error::BadInput(s.clone())))
.collect::<Result<_, _>>()?;
}
The collect::<Result<Vec<_>, _>>() trick works because Result implements FromIterator. It short-circuits on the first Err, just like the loop with ?.
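`Option` implements `FromIterator` the same way, so the identical trick gives all-or-nothing collection of `Option`s — a quick sketch:

```rust
/// All-or-nothing parse: Some(vec) only if every element parses.
fn parse_all(inputs: &[&str]) -> Option<Vec<i64>> {
    inputs.iter()
        .map(|s| s.parse::<i64>().ok()) // each item: Option<i64>
        .collect()                      // Option<Vec<i64>> — None on first failure
}

fn main() {
    assert_eq!(parse_all(&["1", "2", "3"]), Some(vec![1, 2, 3]));
    assert_eq!(parse_all(&["1", "oops", "3"]), None); // short-circuits
}
```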
Collecting into a HashMap
#![allow(unused)]
fn main() {
// Imperative
let mut index = HashMap::new();
for server in fleet {
index.insert(server.id.clone(), server);
}
// Functional
let index: HashMap<_, _> = fleet.into_iter()
.map(|s| (s.id.clone(), s))
.collect();
}
Collecting into a String
#![allow(unused)]
fn main() {
// Imperative
let mut csv = String::new();
for (i, field) in fields.iter().enumerate() {
if i > 0 { csv.push(','); }
csv.push_str(field);
}
// Functional
let csv = fields.join(",");
// Or for more complex formatting:
let csv: String = fields.iter()
.map(|f| format!("\"{f}\""))
.collect::<Vec<_>>()
.join(",");
}
When the loop version wins
collect() allocates a new collection. If you’re modifying in place, the loop is both clearer and more efficient:
#![allow(unused)]
fn main() {
// In-place update — no functional equivalent that's better
for server in &mut fleet {
if server.needs_refresh() {
server.refresh_telemetry()?;
}
}
}
The functional version would require .iter_mut().for_each(|s| { ... }), which is just a loop with extra syntax.
8.6 Pattern Matching as Function Dispatch
Rust’s match is a functional construct that most developers use imperatively. Here’s the functional lens:
Match as a lookup table
#![allow(unused)]
fn main() {
// Imperative thinking: "check each case"
fn status_message(code: StatusCode) -> &'static str {
if code == StatusCode::OK { "Success" }
else if code == StatusCode::NOT_FOUND { "Not found" }
else if code == StatusCode::INTERNAL { "Server error" }
else { "Unknown" }
}
// Functional thinking: "map from domain to range"
fn status_message(code: StatusCode) -> &'static str {
match code {
StatusCode::OK => "Success",
StatusCode::NOT_FOUND => "Not found",
StatusCode::INTERNAL => "Server error",
_ => "Unknown",
}
}
}
The match version isn’t just style — the compiler verifies exhaustiveness. Add a new variant, and every match that doesn’t handle it becomes a compile error. The if/else chain silently falls through to the default.
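To get the full benefit, drop the `_` arm when you control the enum: with every variant spelled out, adding a variant turns each stale `match` into a compile error. A sketch with a hypothetical enum:

```rust
enum Direction { North, East, South, West }

/// No `_` arm — if a new Direction variant is ever added, this match
/// stops compiling until the new case is handled.
fn heading_degrees(d: Direction) -> u16 {
    match d {
        Direction::North => 0,
        Direction::East => 90,
        Direction::South => 180,
        Direction::West => 270,
    }
}

fn main() {
    assert_eq!(heading_degrees(Direction::East), 90);
    assert_eq!(heading_degrees(Direction::West), 270);
}
```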
Match + destructuring as a pipeline
#![allow(unused)]
fn main() {
// Parsing a command — each arm extracts and transforms
fn execute(cmd: Command) -> Result<Response, Error> {
match cmd {
Command::Get { key } => db.get(&key).map(Response::Value),
Command::Set { key, value } => db.set(key, value).map(|_| Response::Ok),
Command::Delete { key } => db.delete(&key).map(|_| Response::Ok),
Command::Batch(cmds) => cmds.into_iter()
.map(execute)
.collect::<Result<Vec<_>, _>>()
.map(Response::Batch),
}
}
}
Each arm is an expression that returns the same type. This is pattern matching as function dispatch — the match arms are essentially a function table indexed by the enum variant.
8.7 Chaining Methods on Custom Types
The functional style extends beyond standard library types. Builder patterns and fluent APIs are functional programming in disguise:
#![allow(unused)]
fn main() {
// This is a combinator chain over your own type
let query = QueryBuilder::new("servers")
.filter("status", Eq, "active")
.filter("rack", In, &["A1", "A2", "B1"])
.order_by("temperature", Desc)
.limit(50)
.build();
}
The key insight: if your type has methods that take self and return Self (or a transformed type), you’ve built a combinator. The same functional/imperative judgment applies:
#![allow(unused)]
fn main() {
// Good: chainable because each step is a simple transform
let config = Config::default()
.with_timeout(Duration::from_secs(30))
.with_retries(3)
.with_tls(true);
// Bad: chainable but the chain is doing too many unrelated things
let result = processor
.load_data(path)? // I/O
.validate() // Pure
.transform(rule_set) // Pure
.save_to_disk(output)? // I/O
.notify_downstream()?; // Side effect
// Better: separate the pure pipeline from the I/O bookends
let data = load_data(path)?;
let processed = data.validate().transform(rule_set);
save_to_disk(output, &processed)?;
notify_downstream()?;
}
The chain fails when it mixes pure transforms with I/O. The reader can’t tell which calls might fail, which have side effects, and where the actual data transformations happen.
8.8 Performance: They’re the Same
A common misconception: “functional style is slower because of all the closures and allocations.”
In Rust, iterator chains compile to the same machine code as hand-written loops. LLVM inlines the closure calls, eliminates the iterator adapter structs, and often produces identical assembly. This is called zero-cost abstraction and it’s not aspirational — it’s measured.
#![allow(unused)]
fn main() {
// These produce identical assembly on release builds:
// Functional
let sum: i64 = (0..1000).filter(|n| n % 2 == 0).map(|n| n * n).sum();
// Imperative
let mut sum: i64 = 0;
for n in 0..1000 {
if n % 2 == 0 {
sum += n * n;
}
}
}
The one exception: .collect() allocates. If you’re chaining .map().collect().iter().map().collect() with intermediate collections, you’re paying for allocations the loop version avoids. The fix: eliminate intermediate collects by chaining adapters directly, or use a loop if you need the intermediate collections for other reasons.
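A sketch of the fix — chained adapters fuse into a single pass with one allocation at the end (function names are illustrative):

```rust
/// Two collects: allocates an intermediate Vec just to iterate it again.
fn doubled_multiples_of_four_slow(xs: &[i32]) -> Vec<i32> {
    let doubled: Vec<i32> = xs.iter().map(|x| x * 2).collect(); // extra alloc
    doubled.into_iter().filter(|x| x % 4 == 0).collect()
}

/// One collect: same result, adapters chained directly — no intermediate Vec.
fn doubled_multiples_of_four(xs: &[i32]) -> Vec<i32> {
    xs.iter().map(|x| x * 2).filter(|x| x % 4 == 0).collect()
}

fn main() {
    let input = [1, 2, 3, 4];
    assert_eq!(doubled_multiples_of_four(&input), doubled_multiples_of_four_slow(&input));
    assert_eq!(doubled_multiples_of_four(&input), vec![4, 8]);
}
```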
8.9 The Taste Test: A Catalog of Transformations
Here’s a reference table for the most common “I wrote 6 lines but there’s a one-liner” patterns:
| Imperative pattern | Functional equivalent | When to prefer functional |
|---|---|---|
| `if let Some(x) = opt { f(x) } else { default }` | `opt.map_or(default, f)` | Short expressions on both sides |
| `if let Some(x) = opt { Some(g(x)) } else { None }` | `opt.map(g)` | Always — this is what `map` is for |
| `if condition { Some(x) } else { None }` | `condition.then_some(x)` | Always |
| `if condition { Some(compute()) } else { None }` | `condition.then(compute)` | Always |
| `match opt { Some(x) if pred(x) => Some(x), _ => None }` | `opt.filter(pred)` | Always |
| `for x in iter { if pred(x) { result.push(f(x)); } }` | `iter.filter(pred).map(f).collect()` | When the pipeline is readable in one screen |
| `if a.is_some() && b.is_some() { Some((a?, b?)) }` | `a.zip(b)` | Always — `.zip()` is exactly this |
| `match (a, b) { (Some(x), Some(y)) => x + y, _ => 0 }` | `a.zip(b).map(\|(x,y)\| x + y).unwrap_or(0)` | Judgment call — depends on complexity |
| `iter.map(f).collect::<Vec<_>>()[0]` | `iter.map(f).next().unwrap()` | Don’t allocate a `Vec` for one element |
| `let mut v = vec; v.sort(); v` | `{ let mut v = vec; v.sort(); v }` | Rust doesn’t have a `.sorted()` in std (use itertools) |
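A few rows from the table as runnable assertions:

```rust
fn main() {
    let opt: Option<i32> = Some(4);

    // map_or: transform-or-default in one call
    assert_eq!(opt.map_or(0, |x| x + 1), 5);

    // then_some: bool → Option
    let enabled = true;
    assert_eq!(enabled.then_some(42), Some(42));
    assert_eq!(false.then_some(42), None::<i32>);

    // filter: keep the value only if the predicate holds
    assert_eq!(Some(10).filter(|&x| x > 5), Some(10));
    assert_eq!(Some(3).filter(|&x| x > 5), None);

    // zip: combine two Options; None if either side is None
    assert_eq!(Some(1).zip(Some(2)), Some((1, 2)));
    assert_eq!(Some(1).zip(None::<i32>), None);
}
```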
8.10 The Anti-Patterns
Over-functionalizing: the 5-deep chain nobody can read
#![allow(unused)]
fn main() {
use itertools::Itertools; // for .sorted()
// This is not elegant. This is a puzzle.
let result = data.iter()
.filter_map(|x| x.metadata.as_ref())
.flat_map(|m| m.tags.iter())
.filter(|t| t.starts_with("env:"))
.map(|t| t.strip_prefix("env:").unwrap())
.filter(|env| allowed_envs.contains(env))
.map(|env| env.to_uppercase())
.collect::<HashSet<_>>()
.into_iter()
.sorted()
.collect::<Vec<_>>();
}
When a chain exceeds ~4 adapters, break it up with named intermediate variables or extract a helper:
#![allow(unused)]
fn main() {
let env_tags = data.iter()
.filter_map(|x| x.metadata.as_ref())
.flat_map(|m| m.tags.iter());
use itertools::Itertools; // for .sorted() and .dedup()
let allowed: Vec<_> = env_tags
    .filter_map(|t| t.strip_prefix("env:"))
    .filter(|env| allowed_envs.contains(env))
    .map(|env| env.to_uppercase())
    .sorted()
    .dedup() // preserve the HashSet version's de-duplication
    .collect();
}
Under-functionalizing: the C-style loop that Rust has a word for
#![allow(unused)]
fn main() {
// This is just .any()
let mut found = false;
for item in &list {
if item.is_expired() {
found = true;
break;
}
}
// Write this instead
let found = list.iter().any(|item| item.is_expired());
}
#![allow(unused)]
fn main() {
// This is just .find()
let mut target = None;
for server in &fleet {
if server.id == target_id {
target = Some(server);
break;
}
}
// Write this instead
let target = fleet.iter().find(|s| s.id == target_id);
}
#![allow(unused)]
fn main() {
// This is just .all()
let mut all_healthy = true;
for server in &fleet {
if !server.is_healthy() {
all_healthy = false;
break;
}
}
// Write this instead
let all_healthy = fleet.iter().all(|s| s.is_healthy());
}
The standard library has these for a reason. Learn the vocabulary and the patterns become obvious.
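Two more vocabulary words from that list, sketched with illustrative data:

```rust
fn main() {
    // .position(): index of the first match — replaces the loop-with-counter pattern
    let racks = ["rack-a1", "rack-b2", "rack-c3"];
    let idx = racks.iter().position(|r| r.contains("b2"));
    assert_eq!(idx, Some(1));

    // .min_by_key(): element minimizing a key — replaces the track-the-best loop
    let temps = [("s1", 61), ("s2", 48), ("s3", 73)];
    let coolest = temps.iter().min_by_key(|(_, t)| *t);
    assert_eq!(coolest, Some(&("s2", 48)));
}
```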
Key Takeaways
- Option and Result are one-element collections. Their combinators (`.map()`, `.and_then()`, `.unwrap_or_else()`, `.filter()`, `.zip()`) replace most `if let`/`match` boilerplate.
- Use `bool::then_some()` — it replaces `if cond { Some(x) } else { None }`; reach for `then()` when the value is expensive to compute.
- Iterator chains win for data pipelines — filter/map/collect with zero mutable state. They compile to the same machine code as loops.
- Loops win for multi-output state machines — when you’re building multiple collections, doing I/O in branches, or managing a state transition.
- The `?` operator is the best of both worlds — functional error propagation with imperative readability.
- Break chains at ~4 adapters — use named intermediates for readability. Over-functionalizing is as bad as under-functionalizing.
- Learn the standard-library vocabulary — `.any()`, `.all()`, `.find()`, `.position()`, `.sum()`, `.min_by_key()` — each one replaces a multi-line loop with a single intent-revealing call.

See also: Ch 7 for closure mechanics and the `Fn` trait hierarchy. Ch 10 for error combinator patterns. Ch 15 for fluent API design.
Exercise: Refactoring Imperative to Functional ★★ (~30 min)
Refactor the following function from imperative to functional style. Then identify one place where the functional version is worse and explain why.
#![allow(unused)]
fn main() {
fn summarize_fleet(fleet: &[Server]) -> FleetSummary {
let mut healthy = Vec::new();
let mut degraded = Vec::new();
let mut failed = Vec::new();
let mut total_power = 0.0;
let mut max_temp = f64::NEG_INFINITY;
for server in fleet {
match server.health_status() {
Health::Healthy => healthy.push(server.id.clone()),
Health::Degraded(reason) => degraded.push((server.id.clone(), reason)),
Health::Failed(err) => failed.push((server.id.clone(), err)),
}
total_power += server.power_draw();
if server.max_temperature() > max_temp {
max_temp = server.max_temperature();
}
}
FleetSummary {
healthy,
degraded,
failed,
avg_power: total_power / fleet.len() as f64,
max_temp,
}
}
}
🔑 Solution
The `total_power` and `max_temp` computations are clean functional rewrites:
#![allow(unused)]
fn main() {
fn summarize_fleet(fleet: &[Server]) -> FleetSummary {
let avg_power: f64 = fleet.iter().map(|s| s.power_draw()).sum::<f64>()
/ fleet.len() as f64;
let max_temp = fleet.iter()
.map(|s| s.max_temperature())
.fold(f64::NEG_INFINITY, f64::max);
// But the three-way partition is BETTER as a loop.
// Functional version would require three separate passes
// or an awkward fold with three mutable accumulators.
let mut healthy = Vec::new();
let mut degraded = Vec::new();
let mut failed = Vec::new();
for server in fleet {
match server.health_status() {
Health::Healthy => healthy.push(server.id.clone()),
Health::Degraded(reason) => degraded.push((server.id.clone(), reason)),
Health::Failed(err) => failed.push((server.id.clone(), err)),
}
}
FleetSummary { healthy, degraded, failed, avg_power, max_temp }
}
}
Why the loop is better for the three-way partition: A functional version would either require three .filter().collect() passes (3x iteration), or a .fold() with three mut Vec accumulators inside a tuple — which is just the loop rewritten with worse syntax. The imperative single-pass loop is clearer, more efficient, and easier to extend.
9. Smart Pointers and Interior Mutability 🟡
What you’ll learn:
- Box, Rc, Arc for heap allocation and shared ownership
- Weak references for breaking Rc/Arc reference cycles
- Cell, RefCell, and Cow for interior mutability patterns
- Pin for self-referential types and ManuallyDrop for lifecycle control
Box, Rc, Arc — Heap Allocation and Sharing
#![allow(unused)]
fn main() {
// --- Box<T>: Single owner, heap allocation ---
// Use when: recursive types, large values, trait objects
let boxed: Box<i32> = Box::new(42);
println!("{}", *boxed); // Deref to i32
// Recursive type requires Box (otherwise infinite size):
enum List<T> {
Cons(T, Box<List<T>>),
Nil,
}
// Trait object (dynamic dispatch):
let writer: Box<dyn std::io::Write> = Box::new(std::io::stdout());
// --- Rc<T>: Multiple owners, single-threaded ---
// Use when: shared ownership within one thread (no Send/Sync)
use std::rc::Rc;
let a = Rc::new(vec![1, 2, 3]);
let b = Rc::clone(&a); // Increments reference count (NOT deep clone)
let c = Rc::clone(&a);
println!("Ref count: {}", Rc::strong_count(&a)); // 3
// All three point to the same Vec. When the last Rc is dropped,
// the Vec is deallocated.
// --- Arc<T>: Multiple owners, thread-safe ---
// Use when: shared ownership across threads
use std::sync::Arc;
let shared = Arc::new(String::from("shared data"));
let handles: Vec<_> = (0..5).map(|_| {
let shared = Arc::clone(&shared);
std::thread::spawn(move || println!("{shared}"))
}).collect();
for h in handles { h.join().unwrap(); }
}
Weak References — Breaking Reference Cycles
Rc and Arc use reference counting, which cannot free cycles (A → B → A).
Weak<T> is a non-owning handle that does not increment the strong count:
#![allow(unused)]
fn main() {
use std::rc::{Rc, Weak};
use std::cell::RefCell;
struct Node {
value: i32,
parent: RefCell<Weak<Node>>, // does NOT keep parent alive
children: RefCell<Vec<Rc<Node>>>,
}
let parent = Rc::new(Node {
value: 0, parent: RefCell::new(Weak::new()), children: RefCell::new(vec![]),
});
let child = Rc::new(Node {
value: 1, parent: RefCell::new(Rc::downgrade(&parent)), children: RefCell::new(vec![]),
});
parent.children.borrow_mut().push(Rc::clone(&child));
// Access parent from child — returns Option<Rc<Node>>:
if let Some(p) = child.parent.borrow().upgrade() {
println!("Child's parent value: {}", p.value); // 0
}
// When `parent` is dropped, strong_count → 0, memory is freed.
// `child.parent.borrow().upgrade()` would then return `None`.
}
Rule of thumb: Use Rc/Arc for ownership edges, Weak for back-references
and caches. For thread-safe code, use Arc<T> with sync::Weak<T>.
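The cache case deserves a sketch. This hypothetical intern cache holds only `Weak` handles, so entries never keep values alive — the cache observes lifetimes rather than extending them:

```rust
use std::collections::HashMap;
use std::rc::{Rc, Weak};

/// Hypothetical cache: Weak entries don't count as owners.
struct InternCache {
    entries: HashMap<String, Weak<String>>,
}

impl InternCache {
    fn new() -> Self {
        InternCache { entries: HashMap::new() }
    }

    fn get_or_insert(&mut self, key: &str) -> Rc<String> {
        // Reuse the existing value if some Rc to it is still alive.
        if let Some(live) = self.entries.get(key).and_then(Weak::upgrade) {
            return live;
        }
        let value = Rc::new(key.to_uppercase());
        self.entries.insert(key.to_string(), Rc::downgrade(&value));
        value
    }
}

fn main() {
    let mut cache = InternCache::new();
    let a = cache.get_or_insert("srv-1");
    let b = cache.get_or_insert("srv-1");
    assert!(Rc::ptr_eq(&a, &b)); // same allocation while callers hold it
    drop(a);
    drop(b);
    // All strong handles gone — the Weak entry is now dead, value freed.
    assert!(cache.entries["srv-1"].upgrade().is_none());
}
```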
Cell and RefCell — Interior Mutability
Sometimes you need to mutate data behind a shared (&) reference. Rust provides interior mutability with runtime borrow checking:
#![allow(unused)]
fn main() {
use std::cell::{Cell, RefCell};
// --- Cell<T>: Copy-based interior mutability ---
// Only for Copy types (or types you swap in/out)
struct Counter {
count: Cell<u32>,
}
impl Counter {
fn new() -> Self { Counter { count: Cell::new(0) } }
fn increment(&self) { // &self, not &mut self!
self.count.set(self.count.get() + 1);
}
fn value(&self) -> u32 { self.count.get() }
}
// --- RefCell<T>: Runtime borrow checking ---
// Panics if you violate borrow rules at runtime
struct Cache {
data: RefCell<Vec<String>>,
}
impl Cache {
fn new() -> Self { Cache { data: RefCell::new(Vec::new()) } }
fn add(&self, item: String) { // &self — looks immutable from outside
self.data.borrow_mut().push(item); // Runtime-checked &mut
}
fn get_all(&self) -> Vec<String> {
self.data.borrow().clone() // Runtime-checked &
}
fn bad_example(&self) {
let _guard1 = self.data.borrow();
// let _guard2 = self.data.borrow_mut();
// ❌ PANICS at runtime — can't have &mut while & exists
}
}
}
Cell vs RefCell:
- `Cell` never panics (it copies/swaps values) but only works with `Copy` types or via `swap()`/`replace()`.
- `RefCell` works with any type but panics on a double mutable borrow. Neither is `Sync` — for multithreaded use, see `Mutex`/`RwLock`.
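The swap/replace escape hatch for non-`Copy` types looks like this — you move values in and out instead of copying them (a minimal sketch):

```rust
use std::cell::Cell;

/// Cell with a non-Copy type: take()/replace() move values; get() is unavailable.
fn run() -> (String, String) {
    let slot: Cell<String> = Cell::new(String::from("first"));

    // replace(): swap in a new value, get the old one back — never panics
    let old = slot.replace(String::from("second"));

    // take(): move the value out, leaving String::default() ("") behind
    let current = slot.take();

    (old, current)
}

fn main() {
    let (old, current) = run();
    assert_eq!(old, "first");
    assert_eq!(current, "second");
}
```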
Cow — Clone on Write
Cow (Clone on Write) holds either a borrowed or owned value. It clones only when mutation is needed:
use std::borrow::Cow;
// Avoids allocating when no modification is needed:
fn normalize(input: &str) -> Cow<'_, str> {
if input.contains('\t') {
// Only allocate if tabs need replacing
Cow::Owned(input.replace('\t', " "))
} else {
// No allocation — just return a reference
Cow::Borrowed(input)
}
}
fn main() {
let clean = "no tabs here";
let dirty = "tabs\there";
let r1 = normalize(clean); // Cow::Borrowed — zero allocation
let r2 = normalize(dirty); // Cow::Owned — allocated new String
println!("{r1}");
println!("{r2}");
}
// Also useful for function parameters that MIGHT need ownership:
fn process(data: Cow<'_, [u8]>) {
// Can read data without copying
println!("Length: {}", data.len());
// If we need to mutate, Cow auto-clones:
let mut owned = data.into_owned(); // Clone only if Borrowed
owned.push(0xFF);
}
Cow<'_, [u8]> for Binary Data
Cow is especially useful for byte-oriented APIs where data may or may not
need transformation (checksum insertion, padding, escaping). This avoids
allocating a Vec<u8> on the common fast path:
#![allow(unused)]
fn main() {
use std::borrow::Cow;
/// Pads a frame to a minimum length, borrowing when no padding is needed.
fn pad_frame(frame: &[u8], min_len: usize) -> Cow<'_, [u8]> {
if frame.len() >= min_len {
Cow::Borrowed(frame) // Already long enough — zero allocation
} else {
let mut padded = frame.to_vec();
padded.resize(min_len, 0x00);
Cow::Owned(padded) // Allocate only when padding is required
}
}
let short = pad_frame(&[0xDE, 0xAD], 8); // Owned — padded to 8 bytes
let long = pad_frame(&[0; 64], 8); // Borrowed — already ≥ 8
}
Tip: Combine `Cow<[u8]>` with `bytes::Bytes` (Ch 10) when you need reference-counted sharing of potentially-transformed buffers.
When to Use Which Pointer
| Pointer | Owner Count | Thread-Safe | Mutability | Use When |
|---|---|---|---|---|
| `Box<T>` | 1 | ✅ (if `T: Send`) | Via `&mut` | Heap allocation, trait objects, recursive types |
| `Rc<T>` | N | ❌ | None (wrap in `Cell`/`RefCell`) | Shared ownership, single thread, graphs/trees |
| `Arc<T>` | N | ✅ | None (wrap in `Mutex`/`RwLock`) | Shared ownership across threads |
| `Cell<T>` | — | ❌ | `.get()` / `.set()` | Interior mutability for `Copy` types |
| `RefCell<T>` | — | ❌ | `.borrow()` / `.borrow_mut()` | Interior mutability for any type, single thread |
| `Cow<'_, T>` | 0 or 1 | ✅ (if `T: Send`) | Clone on write | Avoid allocation when data is often unchanged |
Pin and Self-Referential Types
Pin<P> prevents a value from being moved in memory. This is essential for
self-referential types — structs that contain a pointer to their own data —
and for Futures, which may hold references across .await points.
use std::pin::Pin;
use std::marker::PhantomPinned;
// A self-referential struct (simplified):
struct SelfRef {
data: String,
ptr: *const String, // Points to `data` above
_pin: PhantomPinned, // Opts out of Unpin — can't be moved
}
impl SelfRef {
fn new(s: &str) -> Pin<Box<Self>> {
let val = SelfRef {
data: s.to_string(),
ptr: std::ptr::null(),
_pin: PhantomPinned,
};
let mut boxed = Box::pin(val);
// SAFETY: we don't move the data after setting the pointer
let self_ptr: *const String = &boxed.data;
unsafe {
let mut_ref = Pin::as_mut(&mut boxed);
Pin::get_unchecked_mut(mut_ref).ptr = self_ptr;
}
boxed
}
fn data(&self) -> &str {
&self.data
}
fn ptr_data(&self) -> &str {
// SAFETY: ptr was set to point to self.data while pinned
unsafe { &*self.ptr }
}
}
fn main() {
let pinned = SelfRef::new("hello");
assert_eq!(pinned.data(), pinned.ptr_data()); // Both "hello"
// std::mem::swap would invalidate ptr — but Pin prevents it
}
Key concepts:
| Concept | Meaning |
|---|---|
| `Unpin` (auto-trait) | “Moving this type is safe.” Most types are `Unpin` by default. |
| `!Unpin` / `PhantomPinned` | “I have internal pointers — don’t move me.” |
| `Pin<&mut T>` | A mutable reference that guarantees `T` won’t move |
| `Pin<Box<T>>` | An owned, heap-pinned value |
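A consequence worth internalizing: for `Unpin` types, `Pin` is inert — `Pin::new` is safe and you can still reach the value mutably. A quick sketch:

```rust
use std::pin::Pin;

fn run() -> i32 {
    let mut x: i32 = 5; // i32 is Unpin — pinning it promises nothing extra
    let mut pinned: Pin<&mut i32> = Pin::new(&mut x); // safe constructor

    *pinned = 7; // DerefMut works because i32: Unpin

    // ...and you can take the &mut back out, again because i32: Unpin
    let inner: &mut i32 = Pin::into_inner(pinned);
    *inner += 1;
    x
}

fn main() {
    assert_eq!(run(), 8);
}
```

This is why `Pin` only bites for `!Unpin` types: there, `Pin::new` is unavailable and you must go through `Box::pin` or `Pin::new_unchecked`.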
Why this matters for async: Every async fn desugars to a Future that may
hold references across .await points — making it self-referential. The async
runtime uses Pin<&mut Future> to guarantee the future isn’t moved once polled.
#![allow(unused)]
fn main() {
// When you write:
async fn fetch(url: &str) -> String {
let response = http_get(url).await; // reference held across await
response.text().await
}
// The compiler generates a state machine struct that is !Unpin,
// and the runtime pins it before calling Future::poll().
}
When to care about Pin: (1) implementing `Future` manually, (2) writing async runtimes or combinators, (3) any struct with self-referential pointers. For normal application code, `async`/`await` handles pinning transparently. See the companion Async Rust Training for deeper coverage.

Crate alternatives: For self-referential structs without manual `Pin`, consider `ouroboros` or `self_cell` — they generate safe wrappers with correct pinning and drop semantics.
Pin Projections — Structural Pinning
When you have a Pin<&mut MyStruct>, you often need to access individual fields.
Pin projection is the pattern for safely going from Pin<&mut Struct> to
Pin<&mut Field> (for pinned fields) or &mut Field (for unpinned fields).
The Problem: Field Access on Pinned Types
#![allow(unused)]
fn main() {
use std::pin::Pin;
use std::marker::PhantomPinned;
struct MyFuture {
data: String, // Regular field — safe to move
state: InternalState, // Self-referential — must stay pinned
_pin: PhantomPinned,
}
enum InternalState {
Waiting { ptr: *const String }, // Points to `data` — self-referential
Done,
}
// Given `Pin<&mut MyFuture>`, how do you access `data` and `state`?
// You CAN'T just do `pinned.data` — the compiler won't let you
// get a &mut to a field of a pinned value without unsafe.
}
Manual Pin Projection (unsafe)
#![allow(unused)]
fn main() {
impl MyFuture {
// Project to `data` — this field is structurally unpinned (safe to move)
fn data(self: Pin<&mut Self>) -> &mut String {
// SAFETY: `data` is not structurally pinned. Moving `data` alone
// doesn't move the whole struct, so Pin's guarantee is preserved.
unsafe { &mut self.get_unchecked_mut().data }
}
// Project to `state` — this field IS structurally pinned
fn state(self: Pin<&mut Self>) -> Pin<&mut InternalState> {
// SAFETY: `state` is structurally pinned — we maintain the
// pin invariant by returning Pin<&mut InternalState>.
unsafe { Pin::new_unchecked(&mut self.get_unchecked_mut().state) }
}
}
}
Structural pinning rules — a field is “structurally pinned” if:
- Moving/swapping that field alone could invalidate a self-reference
- The struct’s `Drop` impl must not move the field
- The struct must be `!Unpin` (enforced by `PhantomPinned` or a `!Unpin` field)
pin-project — Safe Pin Projections (Zero Unsafe)
The pin-project crate generates provably correct projections at compile time,
eliminating the need for manual unsafe:
#![allow(unused)]
fn main() {
use pin_project::pin_project;
use std::pin::Pin;
use std::future::Future;
use std::task::{Context, Poll};
#[pin_project] // <-- Generates projection methods
struct TimedFuture<F: Future> {
#[pin] // <-- Structurally pinned (it's a Future)
inner: F,
started_at: std::time::Instant, // NOT pinned — plain data
}
impl<F: Future> Future for TimedFuture<F> {
type Output = (F::Output, std::time::Duration);
fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
let this = self.project(); // Safe! Generated by pin_project
// this.inner : Pin<&mut F> — pinned field
// this.started_at : &mut std::time::Instant — unpinned field
match this.inner.poll(cx) {
Poll::Ready(output) => {
let elapsed = this.started_at.elapsed();
Poll::Ready((output, elapsed))
}
Poll::Pending => Poll::Pending,
}
}
}
}
pin-project vs Manual Projection
| Aspect | Manual (unsafe) | pin-project |
|---|---|---|
| Safety | You prove invariants | Compiler-verified |
| Boilerplate | Low (but error-prone) | Zero — derive macro |
| `Drop` interaction | Must not move pinned fields | Enforced: `#[pinned_drop]` |
| Compile-time cost | None | Proc-macro expansion |
| Use case | Primitives, no_std | Application / library code |
#[pinned_drop] — Drop for Pinned Types
When a type has #[pin] fields, pin-project requires #[pinned_drop]
instead of a regular Drop impl to prevent accidentally moving pinned fields:
#![allow(unused)]
fn main() {
use pin_project::{pin_project, pinned_drop};
use std::pin::Pin;
#[pin_project(PinnedDrop)]
struct Connection<F> {
#[pin]
future: F,
buffer: Vec<u8>, // Not pinned — can be moved in drop
}
#[pinned_drop]
impl<F> PinnedDrop for Connection<F> {
fn drop(self: Pin<&mut Self>) {
let this = self.project();
// `this.future` is Pin<&mut F> — can't be moved, only dropped in place
// `this.buffer` is &mut Vec<u8> — can be drained, cleared, etc.
this.buffer.clear();
println!("Connection dropped, buffer cleared");
}
}
}
When Pin Projections Matter in Practice
Note: The diagram below uses Mermaid syntax. It renders on GitHub and in tools that support Mermaid (mdBook with
mermaidplugin, VS Code with Mermaid extension). In plain Markdown viewers, you’ll see the raw source.
graph TD
A["Do you implement Future manually?"] -->|Yes| B["Does the future hold references<br/>across .await points?"]
A -->|No| C["async/await handles Pin for you<br/>✅ No projections needed"]
B -->|Yes| D["Use #[pin_project] on your<br/>future struct"]
B -->|No| E["Your future is Unpin<br/>✅ No projections needed"]
D --> F["Mark futures/streams as #[pin]<br/>Leave data fields unpinned"]
style C fill:#91e5a3,color:#000
style E fill:#91e5a3,color:#000
style D fill:#ffa07a,color:#000
style F fill:#ffa07a,color:#000
Rule of thumb: If you’re wrapping another `Future` or `Stream`, use `pin-project`. If you’re writing application code with `async`/`await`, you’ll never need pin projections directly. See the companion Async Rust Training for async combinator patterns that use pin projections.
Drop Ordering and ManuallyDrop
Rust’s drop order is deterministic but has rules worth knowing:
Drop Order Rules
struct Label(&'static str);
impl Drop for Label {
fn drop(&mut self) { println!("Dropping {}", self.0); }
}
fn main() {
let a = Label("first"); // Declared first
let b = Label("second"); // Declared second
let c = Label("third"); // Declared third
}
// Output:
// Dropping third ← locals drop in REVERSE declaration order
// Dropping second
// Dropping first
The three rules:
| What | Drop Order | Rationale |
|---|---|---|
| Local variables | Reverse declaration order | Later variables might reference earlier ones |
| Struct fields | Declaration order (top to bottom) | Matches construction order (stable since Rust 1.0, guaranteed by RFC 1857) |
| Tuple elements | Declaration order (left to right) | (a, b, c) → drop a, then b, then c |
#![allow(unused)]
fn main() {
struct Server {
listener: Label, // Dropped 1st
handler: Label, // Dropped 2nd
logger: Label, // Dropped 3rd
}
// Fields drop top-to-bottom (declaration order).
// This matters when fields reference each other or hold resources.
}
Practical impact: If your struct holds a `JoinHandle` and a `Sender`, field order determines which drops first. If the thread reads from the channel, drop the `Sender` first (closing the channel) so the thread exits, then join the handle — so put `Sender` above `JoinHandle` in the struct.
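A minimal sketch of that shutdown sequence with a hypothetical `Worker`. The `Sender` is declared above the `JoinHandle` to document the order, and the `Drop` impl makes the close-then-join sequence explicit:

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::mpsc::{self, Sender};
use std::sync::Arc;
use std::thread::JoinHandle;

/// Hypothetical worker: tx above handle = "close channel, then join".
struct Worker {
    tx: Option<Sender<u32>>,
    handle: Option<JoinHandle<()>>,
}

impl Worker {
    fn spawn(total: Arc<AtomicU32>) -> Self {
        let (tx, rx) = mpsc::channel::<u32>();
        let handle = std::thread::spawn(move || {
            // This loop ends only when every Sender has been dropped.
            for v in rx {
                total.fetch_add(v, Ordering::SeqCst);
            }
        });
        Worker { tx: Some(tx), handle: Some(handle) }
    }

    fn send(&self, v: u32) {
        self.tx.as_ref().unwrap().send(v).unwrap();
    }
}

impl Drop for Worker {
    fn drop(&mut self) {
        drop(self.tx.take()); // 1. close the channel → the thread's loop exits
        if let Some(h) = self.handle.take() {
            h.join().unwrap(); // 2. join can no longer deadlock
        }
    }
}

fn main() {
    let total = Arc::new(AtomicU32::new(0));
    let worker = Worker::spawn(Arc::clone(&total));
    worker.send(1);
    worker.send(2);
    worker.send(3);
    drop(worker); // closes the channel, then joins the thread
    assert_eq!(total.load(Ordering::SeqCst), 6);
}
```

If the order were reversed — join before closing the channel — the drop would deadlock, since the thread never sees the channel close.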
ManuallyDrop<T> — Suppressing Automatic Drop
ManuallyDrop<T> wraps a value and prevents its destructor from running
automatically. You take responsibility for dropping it (or intentionally
leaking it):
#![allow(unused)]
fn main() {
use std::mem::ManuallyDrop;
// Use case 1: Prevent double-free in unsafe code
struct TwoPhaseBuffer {
// We need to drop the Vec ourselves to control timing
data: ManuallyDrop<Vec<u8>>,
committed: bool,
}
impl TwoPhaseBuffer {
fn new(capacity: usize) -> Self {
TwoPhaseBuffer {
data: ManuallyDrop::new(Vec::with_capacity(capacity)),
committed: false,
}
}
fn write(&mut self, bytes: &[u8]) {
self.data.extend_from_slice(bytes);
}
fn commit(&mut self) {
self.committed = true;
println!("Committed {} bytes", self.data.len());
}
}
impl Drop for TwoPhaseBuffer {
fn drop(&mut self) {
if !self.committed {
println!("Rolling back — dropping uncommitted data");
}
// SAFETY: data is always valid here; we only drop it once.
unsafe { ManuallyDrop::drop(&mut self.data); }
}
}
}
#![allow(unused)]
fn main() {
// Use case 2: Intentional leak (e.g., global singletons)
fn leaked_string() -> &'static str {
// Box::leak() is the idiomatic way to create a &'static reference:
let s = String::from("lives forever");
Box::leak(s.into_boxed_str())
// ⚠️ This is a controlled memory leak. The String's heap allocation
// is never freed. Only use for long-lived singletons.
}
// ManuallyDrop alternative (requires unsafe):
// ⚠️ Prefer Box::leak() above — this is shown only to illustrate
// ManuallyDrop semantics (suppressing Drop while the heap data survives).
fn leaked_string_manual() -> &'static str {
use std::mem::ManuallyDrop;
let md = ManuallyDrop::new(String::from("lives forever"));
// SAFETY: ManuallyDrop prevents deallocation; the heap data lives
// forever, so a 'static reference is valid.
unsafe { &*(md.as_str() as *const str) }
}
}
#![allow(unused)]
fn main() {
// Use case 3: Union fields (only one variant is valid at a time)
use std::mem::ManuallyDrop;
union IntOrString {
i: u64,
s: ManuallyDrop<String>,
// String has a Drop impl, so it MUST be wrapped in ManuallyDrop
// inside a union — the compiler can't know which field is active.
}
// No automatic Drop — the code that constructs IntOrString must also
// handle cleanup. If the String variant is active, call:
// unsafe { ManuallyDrop::drop(&mut value.s); }
// If you never drop it, the String is simply leaked (no UB, just a leak).
}
ManuallyDrop vs mem::forget:
| | `ManuallyDrop<T>` | `mem::forget(value)` |
|---|---|---|
| When | Wrap at construction | Consume later |
| Access inner | &*md / &mut *md | Value is gone |
| Drop later | ManuallyDrop::drop(&mut md) | Not possible |
| Use case | Fine-grained lifecycle control | Fire-and-forget leak |
Rule: Use `ManuallyDrop` in unsafe abstractions where you need to control exactly when a destructor runs. In safe application code, you almost never need it — Rust’s automatic drop ordering handles things correctly.
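The contrast in the table above can be shown in a few lines (the `Noisy` type is a stand-in invented for this sketch):

```rust
use std::mem::{self, ManuallyDrop};

struct Noisy(&'static str);
impl Drop for Noisy {
    fn drop(&mut self) { println!("dropping {}", self.0); }
}

fn main() {
    // ManuallyDrop: wrap at construction, keep full access via Deref,
    // then drop at exactly the moment you choose.
    let mut a = ManuallyDrop::new(Noisy("a"));
    println!("still accessible: {}", a.0);
    // SAFETY: dropped exactly once; `a` is never touched afterwards.
    unsafe { ManuallyDrop::drop(&mut a); } // prints "dropping a"

    // mem::forget: the value is consumed; its destructor NEVER runs.
    let b = Noisy("b");
    mem::forget(b); // "dropping b" is never printed — intentional leak
}
```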
Key Takeaways — Smart Pointers
- `Box` for single ownership on heap; `Rc`/`Arc` for shared ownership (single-/multi-threaded)
- `Cell`/`RefCell` provide interior mutability; `RefCell` panics on violations at runtime
- `Cow` avoids allocation on the common path; `Pin` prevents moves for self-referential types
- Drop order: fields drop in declaration order (RFC 1857); locals drop in reverse declaration order
See also: Ch 6 — Concurrency for `Arc` + `Mutex` patterns; Ch 4 — PhantomData for its use with smart pointers.
graph TD
Box["Box<T><br>Single owner, heap"] --> Heap["Heap allocation"]
Rc["Rc<T><br>Shared, single-thread"] --> Heap
Arc["Arc<T><br>Shared, multi-thread"] --> Heap
Rc --> Weak1["Weak<T><br>Non-owning"]
Arc --> Weak2["Weak<T><br>Non-owning"]
Cell["Cell<T><br>Copy interior mut"] --> Stack["Stack / interior"]
RefCell["RefCell<T><br>Runtime borrow check"] --> Stack
Cow["Cow<T><br>Clone on write"] --> Stack
style Box fill:#d4efdf,stroke:#27ae60,color:#000
style Rc fill:#e8f4f8,stroke:#2980b9,color:#000
style Arc fill:#e8f4f8,stroke:#2980b9,color:#000
style Weak1 fill:#fef9e7,stroke:#f1c40f,color:#000
style Weak2 fill:#fef9e7,stroke:#f1c40f,color:#000
style Cell fill:#fdebd0,stroke:#e67e22,color:#000
style RefCell fill:#fdebd0,stroke:#e67e22,color:#000
style Cow fill:#fdebd0,stroke:#e67e22,color:#000
style Heap fill:#f5f5f5,stroke:#999,color:#000
style Stack fill:#f5f5f5,stroke:#999,color:#000
Exercise: Reference-Counted Graph ★★ (~30 min)
Build a directed graph using Rc<RefCell<Node>> where each node has a name and a list of children. Create a cycle (A → B → C → A) using Weak to break the back-edge. Verify no memory leak with Rc::strong_count.
🔑 Solution
use std::cell::RefCell;
use std::rc::{Rc, Weak};
struct Node {
name: String,
children: Vec<Rc<RefCell<Node>>>,
back_ref: Option<Weak<RefCell<Node>>>,
}
impl Node {
fn new(name: &str) -> Rc<RefCell<Self>> {
Rc::new(RefCell::new(Node {
name: name.to_string(),
children: Vec::new(),
back_ref: None,
}))
}
}
impl Drop for Node {
fn drop(&mut self) {
println!("Dropping {}", self.name);
}
}
fn main() {
let a = Node::new("A");
let b = Node::new("B");
let c = Node::new("C");
// A → B → C, with C back-referencing A via Weak
a.borrow_mut().children.push(Rc::clone(&b));
b.borrow_mut().children.push(Rc::clone(&c));
c.borrow_mut().back_ref = Some(Rc::downgrade(&a)); // Weak ref!
println!("A strong count: {}", Rc::strong_count(&a)); // 1 (only `a` binding)
println!("B strong count: {}", Rc::strong_count(&b)); // 2 (b + A's child)
println!("C strong count: {}", Rc::strong_count(&c)); // 2 (c + B's child)
// Upgrade the weak ref to prove it works:
let c_ref = c.borrow();
if let Some(back) = &c_ref.back_ref {
if let Some(a_ref) = back.upgrade() {
println!("C points back to: {}", a_ref.borrow().name);
}
}
// When a, b, c go out of scope, all Nodes drop (no cycle leak!)
}
9. Error Handling Patterns 🟢
What you’ll learn:
- When to use `thiserror` (libraries) vs `anyhow` (applications)
- Error conversion chains with `#[from]` and `.context()` wrappers
- How the `?` operator desugars and works in `main()`
- When to panic vs return errors, and `catch_unwind` for FFI boundaries
thiserror vs anyhow — Library vs Application
Rust error handling centers on the Result<T, E> type. Two crates dominate:
// --- thiserror: For LIBRARIES ---
// Generates Display, Error, and From impls via derive macros
use thiserror::Error;
#[derive(Error, Debug)]
pub enum DatabaseError {
#[error("connection failed: {0}")]
ConnectionFailed(String),
#[error("query error: {source}")]
QueryError {
#[source]
source: sqlx::Error,
},
#[error("record not found: table={table} id={id}")]
NotFound { table: String, id: u64 },
#[error(transparent)] // Delegate Display to the inner error
Io(#[from] std::io::Error), // Auto-generates From<io::Error>
}
// --- anyhow: For APPLICATIONS ---
// Dynamic error type — great for top-level code where you just want errors to propagate
use anyhow::{Context, Result, bail, ensure};
fn read_config(path: &str) -> Result<Config> {
let content = std::fs::read_to_string(path)
.with_context(|| format!("failed to read config from {path}"))?;
let config: Config = serde_json::from_str(&content)
.context("failed to parse config JSON")?;
ensure!(config.port > 0, "port must be positive, got {}", config.port);
Ok(config)
}
fn main() -> Result<()> {
let config = read_config("server.toml")?;
if config.name.is_empty() {
bail!("server name cannot be empty"); // Return Err immediately
}
Ok(())
}
When to use which:
| | `thiserror` | `anyhow` |
|---|---|---|
| Use in | Libraries, shared crates | Applications, binaries |
| Error types | Concrete enums — callers can match | anyhow::Error — opaque |
| Effort | Define your error enum | Just use Result<T> |
| Downcasting | Not needed — pattern match | error.downcast_ref::<MyError>() |
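`anyhow::Error`’s `downcast_ref` follows the same pattern std already provides on `Box<dyn Error>`, so the mechanic can be sketched with std alone (the `NotFound` type and `lookup` function here are hypothetical):

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug)]
struct NotFound { table: &'static str, id: u64 }

impl fmt::Display for NotFound {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "record not found: table={} id={}", self.table, self.id)
    }
}
impl Error for NotFound {}

// Opaque error type — the std analogue of anyhow::Error:
fn lookup(id: u64) -> Result<String, Box<dyn Error>> {
    Err(Box::new(NotFound { table: "users", id }))
}

fn main() {
    let err = lookup(7).unwrap_err();
    // Recover the concrete type only where a caller truly needs to branch:
    match err.downcast_ref::<NotFound>() {
        Some(nf) => println!("missing row {} in table {}", nf.id, nf.table),
        None => eprintln!("unexpected error: {err}"),
    }
}
```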
Error Conversion Chains (#[from])
use thiserror::Error;
#[derive(Error, Debug)]
enum AppError {
#[error("I/O error: {0}")]
Io(#[from] std::io::Error),
#[error("JSON error: {0}")]
Json(#[from] serde_json::Error),
#[error("HTTP error: {0}")]
Http(#[from] reqwest::Error),
}
// Now ? automatically converts:
fn fetch_and_parse(url: &str) -> Result<Config, AppError> {
let body = reqwest::blocking::get(url)?.text()?; // reqwest::Error → AppError::Http
let config: Config = serde_json::from_str(&body)?; // serde_json::Error → AppError::Json
Ok(config)
}
Context and Error Wrapping
Add human-readable context to errors without losing the original:
use anyhow::{Context, Result};
fn process_file(path: &str) -> Result<Data> {
let content = std::fs::read_to_string(path)
.with_context(|| format!("failed to read {path}"))?;
let data = parse_content(&content)
.with_context(|| format!("failed to parse {path}"))?;
validate(&data)
.context("validation failed")?;
Ok(data)
}
// Error output:
// Error: validation failed
//
// Caused by:
// 0: failed to parse config.json
// 1: expected ',' at line 5 column 12
The ? Operator in Depth
? is syntactic sugar for a match + From conversion + early return:
#![allow(unused)]
fn main() {
// This:
let value = operation()?;
// Desugars to:
let value = match operation() {
Ok(v) => v,
Err(e) => return Err(From::from(e)),
// ^^^^^^^^^^^^^^
// Automatic conversion via From trait
};
}
? also works with Option (in functions returning Option):
#![allow(unused)]
fn main() {
fn find_user_email(users: &[User], name: &str) -> Option<String> {
let user = users.iter().find(|u| u.name == name)?; // Returns None if not found
let email = user.email.as_ref()?; // Returns None if email is None
Some(email.to_uppercase())
}
}
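`?` also works directly in `main()` when `main` returns a `Result` — a minimal sketch using only std:

```rust
use std::num::ParseIntError;

// On Err, the error is printed via Debug and the process exits with a
// nonzero status code.
fn main() -> Result<(), ParseIntError> {
    let port: u16 = "8080".parse()?; // ParseIntError propagates out of main
    println!("listening on port {port}");
    Ok(())
}
```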
Panics, catch_unwind, and When to Abort
#![allow(unused)]
fn main() {
// Panics: for BUGS, not expected errors
fn get_element(data: &[i32], index: usize) -> &i32 {
// If this panics, it's a programming error (bug).
// Don't "handle" it — fix the caller.
&data[index]
}
// catch_unwind: for boundaries (FFI, thread pools)
use std::panic;
fn risky_operation() -> i32 { 42 } // Stand-in for code that might panic
let result = panic::catch_unwind(|| {
// Run potentially panicking code safely
risky_operation()
});
match result {
Ok(value) => println!("Success: {value:?}"),
Err(_) => eprintln!("Operation panicked — continuing safely"),
}
// When to use which:
// - Result<T, E> → expected failures (file not found, network timeout)
// - panic!() → programming bugs (index out of bounds, invariant violated)
// - process::abort() → unrecoverable state (security violation, corrupt data)
}
C++ comparison:
`Result<T, E>` replaces exceptions for expected errors. `panic!()` is like `assert()` or `std::terminate()` — it’s for bugs, not control flow. Rust’s `?` operator makes error propagation as ergonomic as exceptions without the unpredictable control flow.
Key Takeaways — Error Handling
- Libraries: `thiserror` for structured error enums; applications: `anyhow` for ergonomic propagation
- `#[from]` auto-generates `From` impls; `.context()` adds human-readable wrappers
- `?` desugars to `From::from()` + early return; works in `main()` returning `Result`
See also: Ch 14 — API Design for “parse, don’t validate” patterns. Ch 10 — Serialization for serde error handling.
flowchart LR
A["std::io::Error"] -->|"#[from]"| B["AppError::Io"]
C["serde_json::Error"] -->|"#[from]"| D["AppError::Json"]
E["Custom validation"] -->|"manual"| F["AppError::Validation"]
B --> G["? operator"]
D --> G
F --> G
G --> H["Result<T, AppError>"]
style A fill:#e8f4f8,stroke:#2980b9,color:#000
style C fill:#e8f4f8,stroke:#2980b9,color:#000
style E fill:#e8f4f8,stroke:#2980b9,color:#000
style B fill:#fdebd0,stroke:#e67e22,color:#000
style D fill:#fdebd0,stroke:#e67e22,color:#000
style F fill:#fdebd0,stroke:#e67e22,color:#000
style G fill:#fef9e7,stroke:#f1c40f,color:#000
style H fill:#d4efdf,stroke:#27ae60,color:#000
Exercise: Error Hierarchy with thiserror ★★ (~30 min)
Design an error type hierarchy for a file-processing application that can fail during I/O, parsing (JSON and CSV), and validation. Use thiserror and demonstrate ? propagation.
🔑 Solution
use thiserror::Error;
#[derive(Error, Debug)]
pub enum AppError {
#[error("I/O error: {0}")]
Io(#[from] std::io::Error),
#[error("JSON parse error: {0}")]
Json(#[from] serde_json::Error),
#[error("CSV error at line {line}: {message}")]
Csv { line: usize, message: String },
#[error("validation error: {field} — {reason}")]
Validation { field: String, reason: String },
}
fn read_file(path: &str) -> Result<String, AppError> {
Ok(std::fs::read_to_string(path)?) // io::Error → AppError::Io via #[from]
}
fn parse_json(content: &str) -> Result<serde_json::Value, AppError> {
Ok(serde_json::from_str(content)?) // serde_json::Error → AppError::Json
}
fn validate_name(value: &serde_json::Value) -> Result<String, AppError> {
let name = value.get("name")
.and_then(|v| v.as_str())
.ok_or_else(|| AppError::Validation {
field: "name".into(),
reason: "must be a non-null string".into(),
})?;
if name.is_empty() {
return Err(AppError::Validation {
field: "name".into(),
reason: "must not be empty".into(),
});
}
Ok(name.to_string())
}
fn process_file(path: &str) -> Result<String, AppError> {
let content = read_file(path)?;
let json = parse_json(&content)?;
let name = validate_name(&json)?;
Ok(name)
}
fn main() {
match process_file("config.json") {
Ok(name) => println!("Name: {name}"),
Err(e) => eprintln!("Error: {e}"),
}
}
10. Serialization, Zero-Copy, and Binary Data 🟡
What you’ll learn:
- serde fundamentals: derive macros, attributes, and enum representations
- Zero-copy deserialization for high-performance read-heavy workloads
- The serde format ecosystem (JSON, TOML, bincode, MessagePack)
- Binary data handling with `repr(C)`, `zerocopy`, and `bytes::Bytes`
serde Fundamentals
serde (SERialize/DEserialize) is the universal serialization framework for Rust.
It separates data model (your structs) from format (JSON, TOML, binary):
use serde::{Serialize, Deserialize};
#[derive(Debug, Serialize, Deserialize)]
struct ServerConfig {
name: String,
port: u16,
#[serde(default)] // Use Default::default() if missing
max_connections: usize,
#[serde(skip_serializing_if = "Option::is_none")]
tls_cert_path: Option<String>,
}
fn main() -> Result<(), Box<dyn std::error::Error>> {
// Deserialize from JSON:
let json_input = r#"{
"name": "hw-diag",
"port": 8080
}"#;
let config: ServerConfig = serde_json::from_str(json_input)?;
println!("{config:?}");
// ServerConfig { name: "hw-diag", port: 8080, max_connections: 0, tls_cert_path: None }
// Serialize to JSON:
let output = serde_json::to_string_pretty(&config)?;
println!("{output}");
// Same struct, different format — no code changes:
let toml_input = r#"
name = "hw-diag"
port = 8080
"#;
let config: ServerConfig = toml::from_str(toml_input)?;
println!("{config:?}");
Ok(())
}
Key insight: Your struct derives `Serialize` and `Deserialize` once. Then it works with every serde-compatible format — JSON, TOML, YAML, bincode, MessagePack, CBOR, postcard, and dozens more.
Common serde Attributes
serde provides fine-grained control over serialization through field and container attributes:
use serde::{Serialize, Deserialize};
// --- Container attributes (on the struct/enum) ---
#[derive(Serialize, Deserialize)]
#[serde(rename_all = "camelCase")] // JSON convention: field_name → fieldName
#[serde(deny_unknown_fields)] // Reject extra keys — strict parsing
struct DiagResult {
test_name: String, // Serialized as "testName"
pass_count: u32, // Serialized as "passCount"
fail_count: u32, // Serialized as "failCount"
}
// --- Field attributes ---
#[derive(Serialize, Deserialize)]
struct Sensor {
#[serde(rename = "sensor_id")] // Override field name for serialization
id: u64,
#[serde(default)] // Use Default if missing from input
enabled: bool,
#[serde(default = "default_threshold")]
threshold: f64,
#[serde(skip)] // Never serialize or deserialize
cached_value: Option<f64>,
#[serde(skip_serializing_if = "Vec::is_empty")]
tags: Vec<String>,
#[serde(flatten)] // Inline nested struct fields
metadata: Metadata,
#[serde(with = "hex_bytes")] // Custom ser/de module
raw_data: Vec<u8>,
}
fn default_threshold() -> f64 { 1.0 }
#[derive(Serialize, Deserialize)]
struct Metadata {
vendor: String,
model: String,
}
// With #[serde(flatten)], the JSON looks like:
// { "sensor_id": 1, "vendor": "Intel", "model": "X200", ... }
// NOT: { "sensor_id": 1, "metadata": { "vendor": "Intel", ... } }
Most-used attributes cheat sheet:
| Attribute | Level | Effect |
|---|---|---|
rename_all = "camelCase" | Container | Rename all fields to camelCase/snake_case/SCREAMING_SNAKE_CASE |
deny_unknown_fields | Container | Error on unexpected keys (strict mode) |
default | Field | Use Default::default() when field missing |
rename = "..." | Field | Custom serialized name |
skip | Field | Exclude from ser/de entirely |
skip_serializing_if = "fn" | Field | Conditionally exclude (e.g., Option::is_none) |
flatten | Field | Inline a nested struct’s fields |
with = "module" | Field | Use custom serialize/deserialize functions |
alias = "..." | Field | Accept alternative names during deserialization |
deserialize_with = "fn" | Field | Custom deserialize function only |
untagged | Enum | Try each variant in order (no discriminant in output) |
Enum Representations
serde provides four representations for enums in formats like JSON:
use serde::{Serialize, Deserialize};
// 1. Externally tagged (DEFAULT):
#[derive(Serialize, Deserialize)]
enum Command {
Reboot,
RunDiag { test_name: String, timeout_secs: u64 },
SetFanSpeed(u8),
}
// "Reboot" → Command::Reboot
// {"RunDiag": {"test_name": "gpu", "timeout_secs": 60}} → Command::RunDiag { ... }
// 2. Internally tagged — #[serde(tag = "type")]:
#[derive(Serialize, Deserialize)]
#[serde(tag = "type")]
enum Event {
Start { timestamp: u64 },
Error { code: i32, message: String },
End { timestamp: u64, success: bool },
}
// {"type": "Start", "timestamp": 1706000000}
// {"type": "Error", "code": 42, "message": "timeout"}
// 3. Adjacently tagged — #[serde(tag = "t", content = "c")]:
#[derive(Serialize, Deserialize)]
#[serde(tag = "t", content = "c")]
enum Payload {
Text(String),
Binary(Vec<u8>),
}
// {"t": "Text", "c": "hello"}
// {"t": "Binary", "c": [0, 1, 2]}
// 4. Untagged — #[serde(untagged)]:
#[derive(Serialize, Deserialize)]
#[serde(untagged)]
enum StringOrNumber {
Str(String),
Num(f64),
}
// "hello" → StringOrNumber::Str("hello")
// 42.0 → StringOrNumber::Num(42.0)
// ⚠️ Tried IN ORDER — first matching variant wins
Which representation to choose: Use internally tagged (`tag = "type"`) for most JSON APIs — it’s the most readable and matches conventions in Go, Python, and TypeScript. Use untagged only for “union” types where the shape alone disambiguates.
Zero-Copy Deserialization
serde can deserialize without allocating new strings — borrowing directly from the input buffer. This is the key to high-performance parsing:
use serde::Deserialize;
// --- Owned (allocating) ---
// Each String field copies bytes from the input into new heap allocations.
#[derive(Deserialize)]
struct OwnedRecord {
name: String, // Allocates a new String
value: String, // Allocates another String
}
// --- Zero-copy (borrowing) ---
// &'de str fields borrow directly from the input — ZERO allocation.
#[derive(Deserialize)]
struct BorrowedRecord<'a> {
name: &'a str, // Points into the input buffer
value: &'a str, // Points into the input buffer
}
fn main() {
let input = r#"{"name": "cpu_temp", "value": "72.5"}"#;
// Owned: allocates two String objects
let owned: OwnedRecord = serde_json::from_str(input).unwrap();
// Zero-copy: `name` and `value` point into `input` — no allocation
let borrowed: BorrowedRecord = serde_json::from_str(input).unwrap();
// The output is lifetime-bound: borrowed can't outlive input
println!("{}: {}", borrowed.name, borrowed.value);
}
Understanding the lifetime:
// Deserialize<'de> — the struct can borrow from data with lifetime 'de:
// struct BorrowedRecord<'a> where 'a == 'de
// Only works when the input buffer lives long enough
// DeserializeOwned — the struct owns all its data, no borrowing:
// trait DeserializeOwned: for<'de> Deserialize<'de> {}
// Works with any input lifetime (the struct is independent)
use serde::de::DeserializeOwned;
// This function requires owned types — input can be temporary
fn parse_owned<T: DeserializeOwned>(input: &str) -> T {
serde_json::from_str(input).unwrap()
}
// This function allows borrowing — more efficient but restricts lifetimes
fn parse_borrowed<'a, T: Deserialize<'a>>(input: &'a str) -> T {
serde_json::from_str(input).unwrap()
}
When to use zero-copy:
- Parsing large files where you only need a few fields
- High-throughput pipelines (network packets, log lines)
- When the input buffer already lives long enough (e.g., memory-mapped file)
When NOT to use zero-copy:
- Input is ephemeral (network read buffer that’s reused)
- You need to store the result beyond the input’s lifetime
- Fields need transformation (escapes, normalization)
Practical tip: `Cow<'a, str>` gives you the best of both — borrow when possible, allocate when necessary (e.g., when JSON escape sequences need unescaping). serde supports `Cow` natively; add `#[serde(borrow)]` to the field, or it deserializes as `Cow::Owned` even when borrowing would be possible.
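The borrow-or-allocate behavior is easy to demonstrate without serde — this std-only sketch (the `unescape` helper is hypothetical) mirrors what a zero-copy deserializer does with escape sequences:

```rust
use std::borrow::Cow;

// Hand back the input untouched when possible; allocate only when a
// transformation forces it.
fn unescape(input: &str) -> Cow<'_, str> {
    if input.contains('\\') {
        Cow::Owned(input.replace("\\n", "\n")) // rare path: allocates
    } else {
        Cow::Borrowed(input) // common path: zero allocation
    }
}

fn main() {
    assert!(matches!(unescape("plain"), Cow::Borrowed(_)));
    assert!(matches!(unescape("two\\nlines"), Cow::Owned(_)));
    // Either variant derefs to &str, so callers don't care which they got:
    println!("{}", unescape("two\\nlines"));
}
```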
The Format Ecosystem
| Format | Crate | Human-Readable | Size | Speed | Use Case |
|---|---|---|---|---|---|
| JSON | serde_json | ✅ | Large | Good | Config files, REST APIs, logging |
| TOML | toml | ✅ | Medium | Good | Config files (Cargo.toml style) |
| YAML | serde_yaml | ✅ | Medium | Good | Config files (complex nesting) |
| bincode | bincode | ❌ | Small | Fast | IPC, caches, Rust-to-Rust |
| postcard | postcard | ❌ | Tiny | Very fast | Embedded systems, no_std |
| MessagePack | rmp-serde | ❌ | Small | Fast | Cross-language binary protocol |
| CBOR | ciborium | ❌ | Small | Fast | IoT, constrained environments |
#![allow(unused)]
fn main() {
// Same struct, many formats — serde's power:
#[derive(serde::Serialize, serde::Deserialize, Debug)]
struct DiagConfig {
name: String,
tests: Vec<String>,
timeout_secs: u64,
}
let config = DiagConfig {
name: "accel_diag".into(),
tests: vec!["memory".into(), "compute".into()],
timeout_secs: 300,
};
// JSON: {"name":"accel_diag","tests":["memory","compute"],"timeout_secs":300}
let json = serde_json::to_string(&config).unwrap(); // ~70 bytes, self-describing
// bincode: compact binary — ~40 bytes, no field names
let bin = bincode::serialize(&config).unwrap(); // Much smaller
// postcard: even smaller, varint encoding — great for embedded
// let post = postcard::to_allocvec(&config).unwrap();
}
Choose your format:
- Config files humans edit → TOML or JSON
- Rust-to-Rust IPC/caching → bincode (fast, compact, not cross-language)
- Cross-language binary → MessagePack or CBOR
- Embedded / `no_std` → postcard
Binary Data and repr(C)
For hardware diagnostics, parsing binary protocol data is common. Rust provides tools for safe, zero-copy binary data handling:
#![allow(unused)]
fn main() {
// --- #[repr(C)]: Predictable memory layout ---
// Ensures fields are laid out in declaration order with C padding rules.
// Essential for matching hardware register layouts and protocol headers.
#[repr(C)]
#[derive(Debug, Clone, Copy)]
struct IpmiHeader {
rs_addr: u8,
net_fn_lun: u8,
checksum: u8,
rq_addr: u8,
rq_seq_lun: u8,
cmd: u8,
}
// --- Safe binary parsing with manual deserialization ---
impl IpmiHeader {
fn from_bytes(data: &[u8]) -> Option<Self> {
if data.len() < std::mem::size_of::<Self>() {
return None;
}
Some(IpmiHeader {
rs_addr: data[0],
net_fn_lun: data[1],
checksum: data[2],
rq_addr: data[3],
rq_seq_lun: data[4],
cmd: data[5],
})
}
fn net_fn(&self) -> u8 { self.net_fn_lun >> 2 }
fn lun(&self) -> u8 { self.net_fn_lun & 0x03 }
}
// --- Endianness-aware parsing ---
fn read_u16_le(data: &[u8], offset: usize) -> u16 {
u16::from_le_bytes([data[offset], data[offset + 1]])
}
fn read_u32_be(data: &[u8], offset: usize) -> u32 {
u32::from_be_bytes([
data[offset], data[offset + 1],
data[offset + 2], data[offset + 3],
])
}
// --- #[repr(C, packed)]: Remove padding (alignment = 1) ---
#[repr(C, packed)]
#[derive(Debug, Clone, Copy)]
struct PcieCapabilityHeader {
cap_id: u8, // Capability ID
next_cap: u8, // Pointer to next capability
cap_reg: u16, // Capability-specific register
}
// ⚠️ Packed structs: taking &field creates an unaligned reference — UB.
// Always copy fields out: let id = header.cap_id; // OK (Copy)
// Never do: let r = &header.cap_reg; // UB if unaligned
}
zerocopy and bytemuck — Safe Transmutation
Instead of unsafe transmute, use crates that verify layout safety at compile time:
#![allow(unused)]
fn main() {
// --- zerocopy: Compile-time checked zero-copy conversions ---
// Cargo.toml: zerocopy = { version = "0.8", features = ["derive"] }
use zerocopy::{FromBytes, IntoBytes, KnownLayout, Immutable};
#[derive(FromBytes, IntoBytes, KnownLayout, Immutable, Debug)]
#[repr(C)]
struct SensorReading {
sensor_id: u16,
flags: u8,
_reserved: u8,
value: u32, // Fixed-point: actual = value / 1000.0
}
fn parse_sensor(raw: &[u8]) -> Option<&SensorReading> {
// Safe zero-copy: the derives prove the layout is sound at compile time;
// this call checks raw's size and alignment at runtime before casting.
SensorReading::ref_from_bytes(raw).ok()
// Returns &SensorReading pointing INTO raw — no copy, no allocation
}
// --- bytemuck: Simple, battle-tested ---
// Cargo.toml: bytemuck = { version = "1", features = ["derive"] }
use bytemuck::{Pod, Zeroable};
#[derive(Pod, Zeroable, Clone, Copy, Debug)]
#[repr(C)]
struct GpuRegister {
address: u32,
value: u32,
}
fn cast_registers(data: &[u8]) -> &[GpuRegister] {
    // Safe cast: Pod guarantees every bit pattern is a valid GpuRegister.
    // Panics if data's length or alignment doesn't fit; use
    // bytemuck::try_cast_slice to get a Result instead.
    bytemuck::cast_slice(data)
}
}
When to use which:
| Approach | Safety | Overhead | Use When |
|---|---|---|---|
| Manual field-by-field parsing | ✅ Safe | Copy fields | Small structs, complex layouts |
zerocopy | ✅ Safe | Zero-copy | Large buffers, many reads, compile-time checks |
bytemuck | ✅ Safe | Zero-copy | Simple Pod types, casting slices |
unsafe { transmute() } | ❌ Unsafe | Zero-copy | Last resort — avoid in application code |
bytes::Bytes — Reference-Counted Buffers
The bytes crate (used by tokio, hyper, tonic) provides zero-copy byte buffers
with reference counting — Bytes is to Vec<u8> what Arc<[u8]> is to owned slices:
use bytes::{Bytes, BytesMut, Buf, BufMut};
fn main() {
// --- BytesMut: mutable buffer for building data ---
let mut buf = BytesMut::with_capacity(1024);
buf.put_u8(0x01); // Write a byte
buf.put_u16(0x1234); // Write u16 (big-endian)
buf.put_slice(b"hello"); // Write raw bytes
buf.put(&b"world"[..]); // Write from slice
// Freeze into immutable Bytes (zero cost):
let data: Bytes = buf.freeze();
// --- Bytes: immutable, reference-counted, cloneable ---
let data2 = data.clone(); // Cheap: increments refcount, NOT deep copy
let slice = data.slice(3..8); // Zero-copy sub-slice (shares buffer)
// Read from Bytes using the Buf trait:
let mut reader = &data[..];
let byte = reader.get_u8(); // 0x01
let short = reader.get_u16(); // 0x1234
// Split without copying:
let mut original = Bytes::from_static(b"HEADER\x00PAYLOAD");
let header = original.split_to(6); // header = "HEADER", original = "\x00PAYLOAD"
println!("header: {:?}", &header[..]);
println!("payload: {:?}", &original[1..]);
}
bytes vs Vec<u8>:
| Feature | Vec<u8> | Bytes |
|---|---|---|
| Clone cost | O(n) deep copy | O(1) refcount increment |
| Sub-slicing | Borrows with lifetime | Owned, refcount-tracked |
| Sharing across threads | Needs Arc<Vec<u8>> or a deep clone | Cheap clone — refcounted, Send + Sync |
| Mutability | Direct &mut | Split into BytesMut first |
| Ecosystem | Standard library | tokio, hyper, tonic, axum |
When to use bytes: Network protocols, packet parsing, any scenario where you receive a buffer and need to split it into parts that are processed by different components or threads. The zero-copy splitting is the killer feature.
Key Takeaways — Serialization & Binary Data
- serde’s derive macros handle 90% of cases; use attributes (`rename`, `skip`, `default`) for the rest
- Zero-copy deserialization (`&'a str` in structs) avoids allocation for read-heavy workloads
- `repr(C)` + `zerocopy`/`bytemuck` for hardware register layouts; `bytes::Bytes` for reference-counted buffers
See also: Ch 9 — Error Handling for combining serde errors with `thiserror`. Ch 11 — Unsafe for `repr(C)` and FFI data layouts.
flowchart LR
subgraph Input
JSON["JSON"]
TOML["TOML"]
Bin["bincode"]
MsgP["MessagePack"]
end
subgraph serde["serde data model"]
Ser["Serialize"]
De["Deserialize"]
end
subgraph Output
Struct["Rust struct"]
Enum["Rust enum"]
end
JSON --> De
TOML --> De
Bin --> De
MsgP --> De
De --> Struct
De --> Enum
Struct --> Ser
Enum --> Ser
Ser --> JSON
Ser --> Bin
style JSON fill:#e8f4f8,stroke:#2980b9,color:#000
style TOML fill:#e8f4f8,stroke:#2980b9,color:#000
style Bin fill:#e8f4f8,stroke:#2980b9,color:#000
style MsgP fill:#e8f4f8,stroke:#2980b9,color:#000
style Ser fill:#fef9e7,stroke:#f1c40f,color:#000
style De fill:#fef9e7,stroke:#f1c40f,color:#000
style Struct fill:#d4efdf,stroke:#27ae60,color:#000
style Enum fill:#d4efdf,stroke:#27ae60,color:#000
Exercise: Custom serde Deserialization ★★★ (~45 min)
Design a HumanDuration wrapper that deserializes from human-readable strings like "30s", "5m", "2h" using a custom serde deserializer. It should also serialize back to the same format.
🔑 Solution
use serde::{Deserialize, Deserializer, Serialize, Serializer};
use std::fmt;
#[derive(Debug, Clone, PartialEq)]
struct HumanDuration(std::time::Duration);
impl HumanDuration {
fn from_str(s: &str) -> Result<Self, String> {
let s = s.trim();
if s.is_empty() { return Err("empty duration string".into()); }
let (num_str, suffix) = s.split_at(
s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len())
);
let value: u64 = num_str.parse()
.map_err(|_| format!("invalid number: {num_str}"))?;
let duration = match suffix {
"s" | "sec" => std::time::Duration::from_secs(value),
"m" | "min" => std::time::Duration::from_secs(value * 60),
"h" | "hr" => std::time::Duration::from_secs(value * 3600),
"ms" => std::time::Duration::from_millis(value),
other => return Err(format!("unknown suffix: {other}")),
};
Ok(HumanDuration(duration))
}
}
impl fmt::Display for HumanDuration {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
let secs = self.0.as_secs();
if secs == 0 {
write!(f, "{}ms", self.0.as_millis())
} else if secs % 3600 == 0 {
write!(f, "{}h", secs / 3600)
} else if secs % 60 == 0 {
write!(f, "{}m", secs / 60)
} else {
write!(f, "{}s", secs)
}
}
}
impl Serialize for HumanDuration {
fn serialize<S: Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
serializer.serialize_str(&self.to_string())
}
}
impl<'de> Deserialize<'de> for HumanDuration {
fn deserialize<D: Deserializer<'de>>(deserializer: D) -> Result<Self, D::Error> {
let s = String::deserialize(deserializer)?;
HumanDuration::from_str(&s).map_err(serde::de::Error::custom)
}
}
#[derive(Debug, Deserialize, Serialize)]
struct Config {
timeout: HumanDuration,
retry_interval: HumanDuration,
}
fn main() {
let json = r#"{ "timeout": "30s", "retry_interval": "5m" }"#;
let config: Config = serde_json::from_str(json).unwrap();
assert_eq!(config.timeout.0, std::time::Duration::from_secs(30));
assert_eq!(config.retry_interval.0, std::time::Duration::from_secs(300));
let serialized = serde_json::to_string(&config).unwrap();
assert!(serialized.contains("30s"));
println!("Config: {serialized}");
}
11. Unsafe Rust — Controlled Danger 🔴
What you’ll learn:
- The five unsafe superpowers and when each is needed
- Writing sound abstractions: safe API, unsafe internals
- FFI patterns for calling C from Rust (and back)
- Common UB pitfalls and arena/slab allocator patterns
The Five Unsafe Superpowers
unsafe unlocks five operations that the compiler can’t verify:
#![allow(unused)]
fn main() {
// SAFETY: each operation is explained inline below.
unsafe {
// 1. Dereference a raw pointer
let ptr: *const i32 = &42;
let value = *ptr; // Could be a dangling/null pointer
// 2. Call an unsafe function
let layout = std::alloc::Layout::new::<u64>();
let mem = std::alloc::alloc(layout);
// 3. Access a mutable static variable
static mut COUNTER: u32 = 0;
COUNTER += 1; // Data race if multiple threads access
// 4. Implement an unsafe trait
// unsafe impl Send for MyType {}
// 5. Access fields of a union
// union IntOrFloat { i: i32, f: f32 }
// let u = IntOrFloat { i: 42 };
// let f = u.f; // Reinterpret bits — could be garbage
}
}
Key principle: `unsafe` doesn't turn off the borrow checker or type system. It only unlocks these five specific capabilities. All other Rust rules still apply.
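To make the key principle concrete, here is a minimal sketch (the function name `bump_through_two_pointers` is made up for illustration). Two simultaneous `&mut x` borrows are rejected even inside an `unsafe` block, but raw pointers obtained with `addr_of_mut!` are not borrow-tracked, so aliasing them compiles and soundness becomes your responsibility:

```rust
use std::ptr::addr_of_mut;

// Two `&mut x` borrows would be a compile error even inside `unsafe`; the
// borrow checker never turns off. Raw pointers, however, are untracked.
fn bump_through_two_pointers() -> i32 {
    let mut x = 10;
    let p1 = addr_of_mut!(x); // raw pointer, no borrow is held
    let p2 = addr_of_mut!(x); // a second alias: allowed for raw pointers
    // SAFETY: x is alive, both pointers are valid and aligned, and the
    // writes happen sequentially on one thread (no data race).
    unsafe {
        *p1 += 1;
        *p2 += 1;
    }
    x
}

fn main() {
    assert_eq!(bump_through_two_pointers(), 12);
    println!("result: {}", bump_through_two_pointers());
}
```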
Writing Sound Abstractions
The purpose of unsafe is to build safe abstractions around unsafe operations:
#![allow(unused)]
fn main() {
/// A fixed-capacity stack-allocated buffer.
/// All public methods are safe — the unsafe is encapsulated.
pub struct StackBuf<T, const N: usize> {
data: [std::mem::MaybeUninit<T>; N],
len: usize,
}
impl<T, const N: usize> StackBuf<T, N> {
pub fn new() -> Self {
StackBuf {
// Each element is individually MaybeUninit — no unsafe needed.
// `const { ... }` blocks (Rust 1.79+) let us repeat a non-Copy
// const expression N times.
data: [const { std::mem::MaybeUninit::uninit() }; N],
len: 0,
}
}
pub fn push(&mut self, value: T) -> Result<(), T> {
if self.len >= N {
return Err(value); // Buffer full — return value to caller
}
// SAFETY: len < N, so data[len] is within bounds.
// We write a valid T into the MaybeUninit slot.
self.data[self.len] = std::mem::MaybeUninit::new(value);
self.len += 1;
Ok(())
}
pub fn get(&self, index: usize) -> Option<&T> {
if index < self.len {
// SAFETY: index < len, and data[0..len] are all initialized.
Some(unsafe { self.data[index].assume_init_ref() })
} else {
None
}
}
}
impl<T, const N: usize> Drop for StackBuf<T, N> {
fn drop(&mut self) {
// SAFETY: data[0..len] are initialized — drop them properly.
for i in 0..self.len {
unsafe { self.data[i].assume_init_drop(); }
}
}
}
}
The three rules of sound unsafe code:
- Document invariants — every `// SAFETY:` comment explains why the operation is valid
- Encapsulate — the unsafe code is hidden behind a safe API; users can't trigger UB
- Minimize — keep each `unsafe` block as small as possible
FFI Patterns: Calling C from Rust
#![allow(unused)]
fn main() {
// Declare the C function signature:
extern "C" {
fn strlen(s: *const std::ffi::c_char) -> usize;
fn printf(format: *const std::ffi::c_char, ...) -> std::ffi::c_int;
}
// Safe wrapper:
fn safe_strlen(s: &str) -> usize {
let c_string = std::ffi::CString::new(s).expect("string contains null byte");
// SAFETY: c_string is a valid null-terminated string, alive for the call.
unsafe { strlen(c_string.as_ptr()) }
}
// Calling Rust from C (export a function):
#[no_mangle]
pub extern "C" fn rust_add(a: i32, b: i32) -> i32 {
a + b
}
}
Common FFI types:
| Rust | C | Notes |
|---|---|---|
| `i32` / `u32` | `int32_t` / `uint32_t` | Fixed-width, safe |
| `*const T` / `*mut T` | `const T*` / `T*` | Raw pointers |
| `std::ffi::CStr` | `const char*` (borrowed) | Null-terminated, borrowed |
| `std::ffi::CString` | `char*` (owned) | Null-terminated, owned |
| `std::ffi::c_void` | `void` | Opaque pointer target |
| `Option<fn(...)>` | Nullable function pointer | `None` = `NULL` |
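The borrowed/owned pair in the table can be exercised without any actual C code. A minimal sketch (the function name `byte_len_of_c_string` is invented for this example) simulating the usual C contract of receiving a `const char*`:

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

// Simulates receiving a `const char*` from C: we only borrow it.
fn byte_len_of_c_string(ptr: *const c_char) -> usize {
    // SAFETY: caller guarantees `ptr` is a valid, null-terminated string
    // that stays alive for the duration of this call.
    let c_str: &CStr = unsafe { CStr::from_ptr(ptr) };
    c_str.to_bytes().len() // length WITHOUT the trailing null
}

fn main() {
    // CString owns the allocation and appends the null terminator.
    let owned = CString::new("hello").expect("no interior null bytes");
    assert_eq!(byte_len_of_c_string(owned.as_ptr()), 5);
    // Pitfall: interior null bytes are rejected at construction time.
    assert!(CString::new("he\0llo").is_err());
    println!("ok");
}
```

Note that `owned` must outlive the call that uses `as_ptr()`; writing `CString::new("hello").unwrap().as_ptr()` would hand out a dangling pointer.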
Common UB Pitfalls
| Pitfall | Example | Why It’s UB |
|---|---|---|
| Null dereference | *std::ptr::null::<i32>() | Dereferencing null is always UB |
| Dangling pointer | Dereference after drop() | Memory may be reused |
| Data race | Two threads write to static mut | Unsynchronized concurrent writes |
| Wrong `assume_init` | `MaybeUninit::<String>::uninit().assume_init()` | Reading uninitialized memory. `[const { MaybeUninit::uninit() }; N]` (Rust 1.79+) is the safe way to create an array of `MaybeUninit` — see `StackBuf::new()` above |
| Aliasing violation | Creating two &mut to same data | Violates Rust’s aliasing model |
| Invalid enum value | std::mem::transmute::<u8, bool>(2) | bool can only be 0 or 1 |
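Most of these pitfalls have safe escape hatches. For the aliasing row, `split_at_mut` is the canonical one: it hands out two non-overlapping mutable borrows, and the unsafe inside it was proven sound once, in std. A small sketch (the function `double_halves` is made up for illustration):

```rust
// Instead of conjuring two `&mut` to the same slice (UB), split it:
fn double_halves(data: &mut [i32]) {
    let mid = data.len() / 2;
    // Two disjoint &mut [i32] — safe, no aliasing possible.
    let (left, right) = data.split_at_mut(mid);
    for v in left.iter_mut() { *v *= 2; }
    for v in right.iter_mut() { *v *= 10; }
}

fn main() {
    let mut data = [1, 2, 3, 4];
    double_halves(&mut data);
    assert_eq!(data, [2, 4, 30, 40]);
    println!("{data:?}");
}
```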
When to use `unsafe` in production:
- FFI boundaries (calling C/C++ code)
- Performance-critical inner loops (avoid bounds checks)
- Building primitives (`Vec`, `HashMap` — these use unsafe internally)
- Never in application logic if you can avoid it
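On the "avoid bounds checks" point, a word of caution: iterators usually let the optimizer elide the checks already, so `get_unchecked` is a last resort after profiling. A sketch comparing the two (function names `sum_iter` / `sum_unchecked` are invented here):

```rust
// Preferred: the iterator version carries no index, so there is nothing
// for the compiler to bounds-check in the first place.
fn sum_iter(data: &[u32]) -> u32 {
    data.iter().copied().sum()
}

// Unchecked version: same result, but soundness now rests on the loop bound.
fn sum_unchecked(data: &[u32]) -> u32 {
    let mut total = 0u32;
    for i in 0..data.len() {
        // SAFETY: i < data.len() by the loop's upper bound.
        total += unsafe { *data.get_unchecked(i) };
    }
    total
}

fn main() {
    let data = [1, 2, 3, 4, 5];
    assert_eq!(sum_iter(&data), 15);
    assert_eq!(sum_unchecked(&data), 15);
    println!("sum = {}", sum_iter(&data));
}
```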
Custom Allocators — Arena and Slab Patterns
In C, you’d write custom malloc() replacements for specific allocation patterns —
arena allocators that free everything at once, slab allocators for fixed-size objects,
or pool allocators for high-throughput systems. Rust provides the same power through
the GlobalAlloc trait and allocator crates, with the added benefit of lifetime-scoped
arenas that prevent use-after-free at compile time.
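Before looking at arenas, here is the `GlobalAlloc` hook itself in its smallest useful form: a counting wrapper over the system allocator. This is a sketch, not production code (no alignment tricks, no failure handling beyond delegation; `CountingAlloc` and `alloc_calls` are names invented for this example):

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts every heap allocation made by the program.
struct CountingAlloc;

static ALLOC_CALLS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOC_CALLS.fetch_add(1, Ordering::Relaxed);
        // SAFETY: same contract as our caller — we just delegate.
        unsafe { System.alloc(layout) }
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // SAFETY: ptr came from our alloc, which delegated to System.
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

fn alloc_calls() -> usize {
    ALLOC_CALLS.load(Ordering::Relaxed)
}

fn main() {
    let before = alloc_calls();
    let v: Vec<u64> = Vec::with_capacity(32); // one heap allocation
    assert!(alloc_calls() > before);
    drop(v);
    println!("allocator calls so far: {}", alloc_calls());
}
```

The same hook is where arena or pool strategies plug in when you need them program-wide rather than per-scope.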
Arena Allocators — Bulk Allocation, Bulk Free
An arena allocates by bumping a pointer forward. Individual items can’t be freed — the entire arena is freed at once. This is perfect for request-scoped or frame-scoped allocations:
#![allow(unused)]
fn main() {
use bumpalo::Bump;
fn process_sensor_frame(raw_data: &[u8]) {
// Create an arena for this frame's allocations
let arena = Bump::new();
// Allocate objects in the arena — ~2ns each (just a pointer bump)
let header = arena.alloc(parse_header(raw_data));
let readings: &mut [f32] = arena.alloc_slice_fill_default(header.sensor_count);
// chunks_exact(4) skips any short trailing chunk, so try_into() can't fail
for (i, chunk) in raw_data[header.payload_offset..].chunks_exact(4).enumerate() {
if i < readings.len() {
readings[i] = f32::from_le_bytes(chunk.try_into().unwrap());
}
}
// Use readings...
let avg = readings.iter().sum::<f32>() / readings.len() as f32;
println!("Frame avg: {avg:.2}");
// `arena` drops here — ALL allocations freed at once in O(1)
// No per-object destructor overhead, no fragmentation
}
fn parse_header(_: &[u8]) -> Header { Header { sensor_count: 4, payload_offset: 8 } }
struct Header { sensor_count: usize, payload_offset: usize }
}
Arena vs standard allocator:
| Aspect | Vec::new() / Box::new() | Bump arena |
|---|---|---|
| Alloc speed | ~25ns (malloc) | ~2ns (pointer bump) |
| Free speed | Per-object destructor | O(1) bulk free |
| Fragmentation | Yes (long-lived processes) | None within arena |
| Lifetime safety | Heap — freed on Drop | Arena reference — compile-time scoped |
| Use case | General purpose | Request/frame/batch processing |
typed-arena — Type-Safe Arena
When all arena objects are the same type, typed-arena provides a simpler API
that returns references with the arena’s lifetime:
#![allow(unused)]
fn main() {
use typed_arena::Arena;
struct AstNode<'a> {
value: i32,
children: Vec<&'a AstNode<'a>>,
}
fn build_tree() {
let arena: Arena<AstNode<'_>> = Arena::new();
// Allocate nodes — returns &AstNode tied to arena's lifetime
let root = arena.alloc(AstNode { value: 1, children: vec![] });
let left = arena.alloc(AstNode { value: 2, children: vec![] });
let right = arena.alloc(AstNode { value: 3, children: vec![] });
// Build the tree — all references valid as long as `arena` lives
// (Mutable access requires interior mutability for truly mutable trees)
println!("Root: {}, Left: {}, Right: {}", root.value, left.value, right.value);
// `arena` drops here — all nodes freed at once
}
}
Slab Allocators — Fixed-Size Object Pools
A slab allocator pre-allocates a pool of fixed-size slots. Objects are allocated and returned individually, but all slots are the same size — eliminating fragmentation and enabling O(1) alloc/free:
#![allow(unused)]
fn main() {
use slab::Slab;
struct Connection {
id: u64,
buffer: [u8; 1024],
active: bool,
}
fn connection_pool_example() {
// Pre-allocate a slab for connections
let mut connections: Slab<Connection> = Slab::with_capacity(256);
// Insert returns a key (usize index) — O(1)
let key1 = connections.insert(Connection {
id: 1001,
buffer: [0; 1024],
active: true,
});
let key2 = connections.insert(Connection {
id: 1002,
buffer: [0; 1024],
active: true,
});
// Access by key — O(1)
if let Some(conn) = connections.get_mut(key1) {
conn.buffer[0..5].copy_from_slice(b"hello");
}
// Remove returns the value — O(1), slot is reused for next insert
let removed = connections.remove(key2);
assert_eq!(removed.id, 1002);
// Next insert reuses the freed slot — no fragmentation
let key3 = connections.insert(Connection {
id: 1003,
buffer: [0; 1024],
active: true,
});
assert_eq!(key3, key2); // Same slot reused!
}
}
Implementing a Minimal Arena (for no_std)
For bare-metal environments where you can’t pull in bumpalo, here’s a
minimal arena built on unsafe:
#![allow(unused)]
#![cfg_attr(not(test), no_std)]
fn main() {
use core::alloc::Layout;
use core::cell::{Cell, UnsafeCell};
/// A simple bump allocator backed by a fixed-size byte array.
/// Not thread-safe — use per-core or with a lock for multi-threaded contexts.
///
/// **Important**: Like `bumpalo`, this arena does NOT call destructors on
/// allocated items when the arena is dropped. Types with `Drop` impls will
/// leak their resources (file handles, sockets, etc.). Only allocate types
/// without meaningful `Drop` impls, or manually drop them before the arena.
pub struct FixedArena<const N: usize> {
// UnsafeCell is REQUIRED here: we mutate `buf` through `&self`.
// Without UnsafeCell, casting &self.buf to *mut u8 would be UB
// (violates Rust's aliasing model — shared ref implies immutable).
buf: UnsafeCell<[u8; N]>,
offset: Cell<usize>, // Interior mutability for &self allocation
}
impl<const N: usize> FixedArena<N> {
pub const fn new() -> Self {
FixedArena {
buf: UnsafeCell::new([0; N]),
offset: Cell::new(0),
}
}
/// Allocate a `T` in the arena. Returns `None` if out of space.
pub fn alloc<T>(&self, value: T) -> Option<&mut T> {
let layout = Layout::new::<T>();
let current = self.offset.get();
// Align up
let aligned = (current + layout.align() - 1) & !(layout.align() - 1);
let new_offset = aligned + layout.size();
if new_offset > N {
return None; // Arena full
}
self.offset.set(new_offset);
// SAFETY:
// - `aligned` is within `buf` bounds (checked above)
// - Alignment is correct (aligned to T's requirement)
// - No aliasing: each alloc returns a unique, non-overlapping region
// - UnsafeCell grants permission to mutate through &self
// - The arena outlives the returned reference (caller must ensure)
let ptr = unsafe {
let base = (self.buf.get() as *mut u8).add(aligned);
let typed = base as *mut T;
typed.write(value);
&mut *typed
};
Some(ptr)
}
/// Reset the arena — invalidates all previous allocations.
///
/// # Safety
/// Caller must ensure no references to arena-allocated data exist.
pub unsafe fn reset(&self) {
self.offset.set(0);
}
pub fn used(&self) -> usize {
self.offset.get()
}
pub fn remaining(&self) -> usize {
N - self.offset.get()
}
}
}
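The align-up bit trick in `FixedArena::alloc` deserves a standalone look, since it's easy to get wrong. A minimal sketch (the helper name `align_up` is ours; it relies on `align` being a power of two, which `Layout::align()` always guarantees):

```rust
/// Round `offset` up to the next multiple of `align`.
/// Only valid for power-of-two `align`, which `Layout::align()` guarantees.
const fn align_up(offset: usize, align: usize) -> usize {
    assert!(align.is_power_of_two());
    // `align - 1` is a mask of the low bits; adding it then clearing
    // those bits rounds up to the next multiple.
    (offset + align - 1) & !(align - 1)
}

fn main() {
    assert_eq!(align_up(0, 8), 0);   // already aligned
    assert_eq!(align_up(1, 8), 8);   // rounds up
    assert_eq!(align_up(8, 8), 8);   // exact multiple unchanged
    assert_eq!(align_up(13, 4), 16);
    println!("align_up(13, 4) = {}", align_up(13, 4));
}
```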
Choosing an Allocator Strategy
Note: The diagram below uses Mermaid syntax. It renders on GitHub and in tools that support Mermaid (mdBook with the `mermaid` plugin, VS Code with a Mermaid extension). In plain Markdown viewers, you'll see the raw source.
graph TD
A["What's your allocation pattern?"] --> B{All same type?}
A --> I{"Environment?"}
B -->|Yes| C{Need individual free?}
B -->|No| D{Need individual free?}
C -->|Yes| E["<b>Slab</b><br/>slab crate<br/>O(1) alloc + free<br/>Index-based access"]
C -->|No| F["<b>typed-arena</b><br/>Bulk alloc, bulk free<br/>Lifetime-scoped refs"]
D -->|Yes| G["<b>Standard allocator</b><br/>Box, Vec, etc.<br/>General-purpose malloc"]
D -->|No| H["<b>Bump arena</b><br/>bumpalo crate<br/>~2ns alloc, O(1) bulk free"]
I -->|no_std| J["FixedArena (custom)<br/>or embedded-alloc"]
I -->|std| K["bumpalo / typed-arena / slab"]
style E fill:#91e5a3,color:#000
style F fill:#91e5a3,color:#000
style G fill:#89CFF0,color:#000
style H fill:#91e5a3,color:#000
style J fill:#ffa07a,color:#000
style K fill:#91e5a3,color:#000
| C Pattern | Rust Equivalent | Key Advantage |
|---|---|---|
| Custom `malloc()` pool | `#[global_allocator]` impl | Type-safe, debuggable |
| `obstack` (GNU) | `bumpalo::Bump` | Lifetime-scoped, no use-after-free |
| Kernel slab (`kmem_cache`) | `slab::Slab<T>` | Type-safe, index-based |
| Stack-allocated temp buffer | `FixedArena<N>` (above) | No heap, const constructible |
| `alloca()` | `[T; N]` or `SmallVec` | Compile-time sized, no UB |
Cross-reference: For bare-metal allocator setup (`#[global_allocator]` with `embedded-alloc`), see the Rust Training for C Programmers, Chapter 15.1 "Global Allocator Setup", which covers the embedded-specific bootstrapping.
Key Takeaways — Unsafe Rust
- Document invariants (`SAFETY:` comments), encapsulate behind safe APIs, minimize unsafe scope
- `[const { MaybeUninit::uninit() }; N]` (Rust 1.79+) replaces the old `assume_init` anti-pattern
- FFI requires `extern "C"`, `#[repr(C)]`, and careful null/lifetime handling
- Arena and slab allocators trade general-purpose flexibility for allocation speed
See also: Ch 4 — PhantomData for variance and drop-check interactions with unsafe code. Ch 8 — Smart Pointers for Pin and self-referential types.
Exercise: Safe Wrapper around Unsafe ★★★ (~45 min)
Write a FixedVec<T, const N: usize> — a fixed-capacity, stack-allocated vector.
Requirements:
- `push(&mut self, value: T) -> Result<(), T>` returns `Err(value)` when full
- `pop(&mut self) -> Option<T>` returns and removes the last element
- `as_slice(&self) -> &[T]` borrows initialized elements
- All public methods must be safe; all unsafe must be encapsulated with `SAFETY:` comments
- `Drop` must clean up initialized elements
🔑 Solution
use std::mem::MaybeUninit;
pub struct FixedVec<T, const N: usize> {
data: [MaybeUninit<T>; N],
len: usize,
}
impl<T, const N: usize> FixedVec<T, N> {
pub fn new() -> Self {
FixedVec {
data: [const { MaybeUninit::uninit() }; N],
len: 0,
}
}
pub fn push(&mut self, value: T) -> Result<(), T> {
if self.len >= N { return Err(value); }
// SAFETY: len < N, so data[len] is within bounds.
self.data[self.len] = MaybeUninit::new(value);
self.len += 1;
Ok(())
}
pub fn pop(&mut self) -> Option<T> {
if self.len == 0 { return None; }
self.len -= 1;
// SAFETY: data[len] was initialized (len was > 0 before decrement).
Some(unsafe { self.data[self.len].assume_init_read() })
}
pub fn as_slice(&self) -> &[T] {
// SAFETY: data[0..len] are all initialized, and MaybeUninit<T>
// has the same layout as T.
unsafe { std::slice::from_raw_parts(self.data.as_ptr() as *const T, self.len) }
}
pub fn len(&self) -> usize { self.len }
pub fn is_empty(&self) -> bool { self.len == 0 }
}
impl<T, const N: usize> Drop for FixedVec<T, N> {
fn drop(&mut self) {
// SAFETY: data[0..len] are initialized — drop each one.
for i in 0..self.len {
unsafe { self.data[i].assume_init_drop(); }
}
}
}
fn main() {
let mut v = FixedVec::<String, 4>::new();
v.push("hello".into()).unwrap();
v.push("world".into()).unwrap();
assert_eq!(v.as_slice(), &["hello", "world"]);
assert_eq!(v.pop(), Some("world".into()));
assert_eq!(v.len(), 1);
}
12. Macros — Code That Writes Code 🟡
What you’ll learn:
- Declarative macros (`macro_rules!`) with pattern matching and repetition
- When macros are the right tool vs generics/traits
- Procedural macros: derive, attribute, and function-like
- Writing a custom derive macro with `syn` and `quote`
Declarative Macros (macro_rules!)
Macros match patterns on syntax and expand to code at compile time:
#![allow(unused)]
fn main() {
// A simple macro that creates a HashMap
macro_rules! hashmap {
// Match: key => value pairs separated by commas
( $( $key:expr => $value:expr ),* $(,)? ) => {
{
let mut map = std::collections::HashMap::new();
$( map.insert($key, $value); )*
map
}
};
}
let scores = hashmap! {
"Alice" => 95,
"Bob" => 87,
"Carol" => 92,
};
// Expands to:
// let mut map = HashMap::new();
// map.insert("Alice", 95);
// map.insert("Bob", 87);
// map.insert("Carol", 92);
// map
}
Macro fragment types:
| Fragment | Matches | Example |
|---|---|---|
| `$x:expr` | Any expression | `42`, `a + b`, `foo()` |
| `$x:ty` | A type | `i32`, `Vec<String>` |
| `$x:ident` | An identifier | `my_var`, `Config` |
| `$x:pat` | A pattern | `Some(x)`, `_` |
| `$x:stmt` | A statement | `let x = 5;` |
| `$x:tt` | A single token tree | Anything (most flexible) |
| `$x:literal` | A literal value | `42`, `"hello"`, `true` |
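The `ident` and `ty` fragments combine naturally. A small illustrative macro (the `newtype!` name and its `wraps` keyword token are invented for this sketch) that stamps out newtype wrappers:

```rust
// `$name:ident` captures the new type's name; `$inner:ty` captures any type.
// `wraps` is just a literal token in the matcher — macros can invent syntax.
macro_rules! newtype {
    ($name:ident wraps $inner:ty) => {
        #[derive(Debug, Clone, PartialEq)]
        pub struct $name(pub $inner);

        impl $name {
            pub fn into_inner(self) -> $inner { self.0 }
        }
    };
}

newtype!(UserId wraps u64);
newtype!(Hostname wraps String);

fn main() {
    let id = UserId(42);
    assert_eq!(id.clone().into_inner(), 42);
    let host = Hostname("localhost".into());
    assert_eq!(host.into_inner(), "localhost");
    println!("{:?}", id);
}
```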
Repetition: $( ... ),* means “zero or more, comma-separated”
#![allow(unused)]
fn main() {
// The function under test — defined here so the example is self-contained:
fn process(s: &str) -> String { s.trim().to_uppercase() }
// Generate test functions automatically
macro_rules! test_cases {
( $( $name:ident: $input:expr => $expected:expr ),* $(,)? ) => {
$(
#[test]
fn $name() {
assert_eq!(process($input), $expected);
}
)*
};
}
test_cases! {
test_empty: "" => "",
test_hello: "hello" => "HELLO",
test_trim: " spaces " => "SPACES",
}
// Generates three separate #[test] functions
}
When (Not) to Use Macros
Use macros when:
- Reducing boilerplate that traits/generics can't handle (variadic arguments, DRY test generation)
- Creating DSLs (`html!`, `sql!`, `vec!`)
- Conditional code generation (`cfg!`, `compile_error!`)
Don’t use macros when:
- A function or generic would work (macros are harder to debug, autocomplete doesn’t help)
- You need type checking inside the macro (macros operate on tokens, not types)
- The pattern is used once or twice (not worth the abstraction cost)
#![allow(unused)]
fn main() {
// ❌ Unnecessary macro — a function works fine:
macro_rules! double {
($x:expr) => { $x * 2 };
}
// ✅ Just use a function:
fn double(x: i32) -> i32 { x * 2 }
// ✅ Good macro use — variadic, can't be a function:
macro_rules! println {
($($arg:tt)*) => { /* format string + args */ };
}
}
Procedural Macros Overview
Procedural macros are Rust functions that transform token streams. They require a separate crate with proc-macro = true:
#![allow(unused)]
fn main() {
// Three types of proc macros:
// 1. Derive macros — #[derive(MyTrait)]
// Generate trait implementations from struct definitions
#[derive(Debug, Clone, Serialize, Deserialize)]
struct Config {
name: String,
port: u16,
}
// 2. Attribute macros — #[my_attribute]
// Transform the annotated item
#[route(GET, "/api/users")]
async fn list_users() -> Json<Vec<User>> { /* ... */ }
// 3. Function-like macros — my_macro!(...)
// Custom syntax
let query = sql!(SELECT * FROM users WHERE id = ?);
}
Derive Macros in Practice
The most common proc macro type. Here’s how #[derive(Debug)] works conceptually:
#![allow(unused)]
fn main() {
// Input (your struct). The derive is commented out here so it doesn't
// conflict with the hand-written impl below, which is what it generates:
// #[derive(Debug)]
struct Point {
x: f64,
y: f64,
}
// The derive macro generates:
impl std::fmt::Debug for Point {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
f.debug_struct("Point")
.field("x", &self.x)
.field("y", &self.y)
.finish()
}
}
}
Commonly used derive macros:
| Derive | Crate | What It Generates |
|---|---|---|
| `Debug` | std | `fmt::Debug` impl (debug printing) |
| `Clone`, `Copy` | std | Value duplication |
| `PartialEq`, `Eq` | std | Equality comparison |
| `Hash` | std | Hashing for `HashMap` keys |
| `Serialize`, `Deserialize` | serde | JSON/YAML/etc. encoding |
| `Error` | thiserror | `std::error::Error` + `Display` |
| `Parser` | clap | CLI argument parsing |
| `Builder` | derive_builder | Builder pattern |
Practical advice: Use derive macros liberally — they eliminate error-prone boilerplate. Writing your own proc macros is an advanced topic; use existing ones (`serde`, `thiserror`, `clap`) before building custom ones.
Macro Hygiene and $crate
Hygiene means that identifiers created inside a macro don’t collide with
identifiers in the caller’s scope. Rust’s macro_rules! is partially hygienic:
macro_rules! make_var {
() => {
let x = 42; // This 'x' is in the MACRO's scope
};
}
fn main() {
let x = 10;
make_var!(); // Creates a different 'x' (hygienic)
println!("{x}"); // Prints 10, not 42 — macro's x doesn't leak
}
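Hygiene applies to identifiers the macro invents. When you *want* the caller to see a binding, make the caller supply the name as an `ident` fragment — identifiers that come from the call site live in the call site's scope. A sketch (`make_named!` and `compute` are names invented for this example):

```rust
// The caller provides `$name`, so the binding is visible at the call site.
macro_rules! make_named {
    ($name:ident = $value:expr) => {
        let $name = $value;
    };
}

fn compute() -> i32 {
    make_named!(answer = 40 + 2);
    answer // compiles: `answer` was introduced in THIS scope
}

fn main() {
    assert_eq!(compute(), 42);
    println!("{}", compute());
}
```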
$crate: When writing macros in a library, use $crate to refer to
your own crate — it resolves correctly regardless of how users import your crate:
#![allow(unused)]
fn main() {
// In my_diagnostics crate:
pub fn log_result(msg: &str) {
println!("[diag] {msg}");
}
#[macro_export]
macro_rules! diag_log {
($($arg:tt)*) => {
// ✅ $crate always resolves to my_diagnostics, even if the user
// renamed the crate in their Cargo.toml
$crate::log_result(&format!($($arg)*))
};
}
// ❌ Without $crate:
// my_diagnostics::log_result(...) ← breaks if user writes:
// [dependencies]
// diag = { package = "my_diagnostics", version = "1" }
}
Rule: Always use `$crate::` in `#[macro_export]` macros. Never use your crate's name directly.
Recursive Macros and tt Munching
Recursive macros process input one token at a time — a technique called
tt munching (token-tree munching):
// Count the number of expressions passed to the macro
macro_rules! count {
// Base case: no tokens left
() => { 0usize };
// Recursive case: consume one expression, count the rest
($head:expr $(, $tail:expr)* $(,)?) => {
1usize + count!($($tail),*)
};
}
fn main() {
let n = count!("a", "b", "c", "d");
assert_eq!(n, 4);
// Works at compile time too:
const N: usize = count!(1, 2, 3);
assert_eq!(N, 3);
}
#![allow(unused)]
fn main() {
// Build a heterogeneous tuple from a list of expressions:
macro_rules! tuple_from {
// Base: single element
($single:expr $(,)?) => { ($single,) };
// Recursive: first element + rest
($head:expr, $($tail:expr),+ $(,)?) => {
($head, tuple_from!($($tail),+))
};
}
let t = tuple_from!(1, "hello", 3.14, true);
// Expands to: (1, ("hello", (3.14, (true,))))
}
Fragment specifier subtleties:
| Fragment | Gotcha |
|---|---|
| `$x:expr` | Greedily parses — `1 + 2` is ONE expression, not three tokens |
| `$x:ty` | Greedily parses — `Vec<String>` is one type; can't be followed by `+` or `<` |
| `$x:tt` | Matches exactly ONE token tree — most flexible, least checked |
| `$x:ident` | Only plain identifiers — not paths like `std::io` |
| `$x:pat` | In Rust 2021, matches `A \| B` patterns; use `$x:pat_param` for single patterns |
When to use `tt`: When you need to forward tokens to another macro without the parser constraining them. `$($args:tt)*` is the "accept everything" pattern (used by `println!`, `format!`, `vec!`).
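Token forwarding in action — a minimal sketch of a wrapper macro that delegates its arguments to `format!` untouched (the `tagged!` macro and `demo` function are invented names):

```rust
// `$($args:tt)*` accepts ANY token sequence and forwards it verbatim,
// so `tagged!` supports every format string `format!` does.
macro_rules! tagged {
    ($tag:expr, $($args:tt)*) => {
        format!("[{}] {}", $tag, format!($($args)*))
    };
}

fn demo() -> String {
    tagged!("net", "retry {} of {}", 2, 5)
}

fn main() {
    let line = demo();
    assert_eq!(line, "[net] retry 2 of 5");
    println!("{line}");
}
```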
Writing a Derive Macro with syn and quote
Derive macros live in a separate crate (proc-macro = true) and transform
a token stream using syn (parse Rust) and quote (generate Rust):
my_derive/Cargo.toml:

[lib]
proc-macro = true

[dependencies]
syn = { version = "2", features = ["full"] }
quote = "1"
proc-macro2 = "1"
#![allow(unused)]
fn main() {
// my_derive/src/lib.rs
use proc_macro::TokenStream;
use quote::quote;
use syn::{parse_macro_input, DeriveInput};
/// Derive macro that generates a `describe()` method
/// returning the struct name and field names.
#[proc_macro_derive(Describe)]
pub fn derive_describe(input: TokenStream) -> TokenStream {
let input = parse_macro_input!(input as DeriveInput);
let name = &input.ident;
let name_str = name.to_string();
// Extract field names (only for structs with named fields)
let fields = match &input.data {
syn::Data::Struct(data) => {
data.fields.iter()
.filter_map(|f| f.ident.as_ref())
.map(|id| id.to_string())
.collect::<Vec<_>>()
}
_ => vec![],
};
let field_list = fields.join(", ");
let expanded = quote! {
impl #name {
pub fn describe() -> String {
format!("{} {{ {} }}", #name_str, #field_list)
}
}
};
TokenStream::from(expanded)
}
}
// In the application crate:
use my_derive::Describe;
#[derive(Describe)]
struct SensorReading {
sensor_id: u16,
value: f64,
timestamp: u64,
}
fn main() {
println!("{}", SensorReading::describe());
// "SensorReading { sensor_id, value, timestamp }"
}
The workflow: TokenStream (raw tokens) → syn::parse (AST) →
inspect/transform → quote! (generate tokens) → TokenStream (back to compiler).
| Crate | Role | Key types |
|---|---|---|
| `proc-macro` | Compiler interface | `TokenStream` |
| `syn` | Parse Rust source into AST | `DeriveInput`, `ItemFn`, `Type` |
| `quote` | Generate Rust tokens from templates | `quote!{}`, `#variable` interpolation |
| `proc-macro2` | Bridge between syn/quote and proc-macro | `TokenStream`, `Span` |
Practical tip: Start by studying the source of a simple derive macro like `thiserror` or `derive_more` before writing your own. The `cargo expand` command (via `cargo-expand`) shows what any macro expands to — invaluable for debugging.
Key Takeaways — Macros
- `macro_rules!` for simple code generation; proc macros (`syn` + `quote`) for complex derives
- Prefer generics/traits over macros when possible — macros are harder to debug and maintain
- `$crate` keeps exported macro paths correct regardless of how users name your crate; `tt` munching enables recursive pattern matching
See also: Ch 2 — Traits for when traits/generics beat macros. Ch 13 — Testing for testing macro-generated code.
flowchart LR
A["Source code"] --> B["macro_rules!<br>pattern matching"]
A --> C["#[derive(MyMacro)]<br>proc macro"]
B --> D["Token expansion"]
C --> E["syn: parse AST"]
E --> F["Transform"]
F --> G["quote!: generate tokens"]
G --> D
D --> H["Compiled code"]
style A fill:#e8f4f8,stroke:#2980b9,color:#000
style B fill:#d4efdf,stroke:#27ae60,color:#000
style C fill:#fdebd0,stroke:#e67e22,color:#000
style D fill:#fef9e7,stroke:#f1c40f,color:#000
style E fill:#fdebd0,stroke:#e67e22,color:#000
style F fill:#fdebd0,stroke:#e67e22,color:#000
style G fill:#fdebd0,stroke:#e67e22,color:#000
style H fill:#d4efdf,stroke:#27ae60,color:#000
Exercise: Declarative Macro — map! ★ (~15 min)
Write a map! macro that creates a HashMap from key-value pairs:
let m = map! {
"host" => "localhost",
"port" => "8080",
};
assert_eq!(m.get("host"), Some(&"localhost"));
Requirements: support trailing comma and empty invocation map!{}.
🔑 Solution
macro_rules! map {
() => { std::collections::HashMap::new() };
( $( $key:expr => $val:expr ),+ $(,)? ) => {{
let mut m = std::collections::HashMap::new();
$( m.insert($key, $val); )+
m
}};
}
fn main() {
let config = map! {
"host" => "localhost",
"port" => "8080",
"timeout" => "30",
};
assert_eq!(config.len(), 3);
assert_eq!(config["host"], "localhost");
let empty: std::collections::HashMap<String, String> = map!();
assert!(empty.is_empty());
let scores = map! { 1 => 100, 2 => 200 };
assert_eq!(scores[&1], 100);
}
13. Testing and Benchmarking Patterns 🟢
What you’ll learn:
- Rust’s three test tiers: unit, integration, and doc tests
- Property-based testing with proptest for discovering edge cases
- Benchmarking with criterion for reliable performance measurement
- Mocking strategies without heavyweight frameworks
Unit Tests, Integration Tests, Doc Tests
Rust has three testing tiers built into the language:
#![allow(unused)]
fn main() {
// --- Unit tests: in the same file as the code ---
pub fn factorial(n: u64) -> u64 {
(1..=n).product()
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_factorial_zero() {
// (1..=0).product() returns 1 — the multiplication identity for empty ranges
assert_eq!(factorial(0), 1);
}
#[test]
fn test_factorial_five() {
assert_eq!(factorial(5), 120);
}
#[test]
#[cfg(debug_assertions)] // overflow checks are only enabled in debug mode
#[should_panic(expected = "overflow")]
fn test_factorial_overflow() {
// ⚠️ This test only passes in debug mode (overflow checks enabled).
// In release mode (`cargo test --release`), u64 arithmetic wraps
// silently and no panic occurs. Use `checked_mul` or the
// `overflow-checks = true` profile setting for release-mode safety.
factorial(100); // Should panic on overflow
}
#[test]
fn test_with_result() -> Result<(), Box<dyn std::error::Error>> {
// Tests can return Result — ? works inside!
let value: u64 = "42".parse()?;
assert_eq!(value, 42);
Ok(())
}
}
}
#![allow(unused)]
fn main() {
// --- Integration tests: in tests/ directory ---
// tests/integration_test.rs
// These test your crate's PUBLIC API only
use my_crate::factorial;
#[test]
fn test_factorial_from_outside() {
assert_eq!(factorial(10), 3_628_800);
}
}
#![allow(unused)]
fn main() {
// --- Doc tests: in documentation comments ---
/// Computes the factorial of `n`.
///
/// # Examples
///
/// ```
/// use my_crate::factorial;
/// assert_eq!(factorial(5), 120);
/// ```
///
/// # Panics
///
/// Panics if the result overflows `u64`.
///
/// ```should_panic
/// my_crate::factorial(100);
/// ```
pub fn factorial(n: u64) -> u64 {
(1..=n).product()
}
// Doc tests are compiled and run by `cargo test` — they keep examples honest.
}
Test Fixtures and Setup
#![allow(unused)]
fn main() {
#[cfg(test)]
mod tests {
use super::*;
// Shared setup — create a helper function
fn setup_database() -> TestDb {
let db = TestDb::new_in_memory();
db.run_migrations();
db.seed_test_data();
db
}
#[test]
fn test_user_creation() {
let db = setup_database();
let user = db.create_user("Alice", "alice@test.com").unwrap();
assert_eq!(user.name, "Alice");
}
#[test]
fn test_user_deletion() {
let db = setup_database();
db.create_user("Bob", "bob@test.com").unwrap();
assert!(db.delete_user("Bob").is_ok());
assert!(db.get_user("Bob").is_none());
}
// Cleanup with Drop (RAII):
struct TempDir {
path: std::path::PathBuf,
}
impl TempDir {
fn new() -> Self {
// Cargo.toml: rand = "0.8"
let path = std::env::temp_dir().join(format!("test_{}", rand::random::<u32>()));
std::fs::create_dir_all(&path).unwrap();
TempDir { path }
}
}
impl Drop for TempDir {
fn drop(&mut self) {
let _ = std::fs::remove_dir_all(&self.path);
}
}
#[test]
fn test_file_operations() {
let dir = TempDir::new(); // Created
std::fs::write(dir.path.join("test.txt"), "hello").unwrap();
assert!(dir.path.join("test.txt").exists());
} // dir dropped here → temp directory cleaned up
}
}
Property-Based Testing (proptest)
Instead of testing specific values, test properties that should always hold:
#![allow(unused)]
fn main() {
// Cargo.toml: proptest = "1"
use proptest::prelude::*;
fn reverse(v: &[i32]) -> Vec<i32> {
v.iter().rev().cloned().collect()
}
proptest! {
#[test]
fn test_reverse_twice_is_identity(v in prop::collection::vec(any::<i32>(), 0..100)) {
// Property: reversing twice gives back the original
assert_eq!(reverse(&reverse(&v)), v);
}
#[test]
fn test_reverse_preserves_length(v in prop::collection::vec(any::<i32>(), 0..100)) {
assert_eq!(reverse(&v).len(), v.len());
}
#[test]
fn test_sort_is_idempotent(mut v in prop::collection::vec(any::<i32>(), 0..100)) {
v.sort();
let sorted_once = v.clone();
v.sort();
assert_eq!(v, sorted_once); // Sorting twice = sorting once
}
#[test]
fn test_parse_roundtrip(x in any::<f64>().prop_filter("finite", |x| x.is_finite())) {
// Property: formatting then parsing gives back the same value
let s = format!("{x}");
let parsed: f64 = s.parse().unwrap();
prop_assert!((x - parsed).abs() < f64::EPSILON);
}
}
}
When to use proptest: When you’re testing a function with a large input space and want confidence it works for edge cases you didn’t think of. proptest generates hundreds of random inputs and shrinks failures to the minimal reproducing case.
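When you can't take on a dependency, a poor man's version of the same idea is an exhaustive sweep over a tiny input domain. It has none of proptest's random generation or shrinking, but it needs only std. A sketch (the helper name `reverse_is_involution_on_small_inputs` is ours):

```rust
fn reverse(v: &[i32]) -> Vec<i32> {
    v.iter().rev().cloned().collect()
}

// Check the property on EVERY vector of length 0..=3 over a small domain.
fn reverse_is_involution_on_small_inputs() -> bool {
    let domain = [-1, 0, 1];
    for len in 0..=3 {
        let mut indices = vec![0usize; len];
        loop {
            let v: Vec<i32> = indices.iter().map(|&i| domain[i]).collect();
            if reverse(&reverse(&v)) != v || reverse(&v).len() != v.len() {
                return false;
            }
            // Odometer-style increment over the index tuple.
            let mut pos = 0;
            while pos < len {
                indices[pos] += 1;
                if indices[pos] < domain.len() { break; }
                indices[pos] = 0;
                pos += 1;
            }
            if pos == len { break; } // wrapped all the way around: done
        }
    }
    true
}

fn main() {
    assert!(reverse_is_involution_on_small_inputs());
    println!("property holds on all inputs of length <= 3");
}
```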
Benchmarking with criterion
#![allow(unused)]
fn main() {
// Cargo.toml:
// [dev-dependencies]
// criterion = { version = "0.5", features = ["html_reports"] }
//
// [[bench]]
// name = "my_benchmarks"
// harness = false
// benches/my_benchmarks.rs
use criterion::{criterion_group, criterion_main, Criterion, black_box};
fn fibonacci(n: u64) -> u64 {
match n {
0 | 1 => n,
_ => fibonacci(n - 1) + fibonacci(n - 2),
}
}
fn bench_fibonacci(c: &mut Criterion) {
c.bench_function("fibonacci 20", |b| {
b.iter(|| fibonacci(black_box(20)))
});
// Compare different implementations:
let mut group = c.benchmark_group("fibonacci_compare");
for size in [10, 15, 20, 25] {
group.bench_with_input(
criterion::BenchmarkId::from_parameter(size),
&size,
|b, &size| b.iter(|| fibonacci(black_box(size))),
);
}
group.finish();
}
criterion_group!(benches, bench_fibonacci);
criterion_main!(benches);
// Run: cargo bench
// Produces HTML reports in target/criterion/
}
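For a quick sanity check before reaching for criterion, `std::time::Instant` plus `std::hint::black_box` gives order-of-magnitude timings with zero dependencies. This sketch has none of criterion's warm-up, outlier rejection, or statistics, so treat the numbers as rough:

```rust
use std::hint::black_box;
use std::time::Instant;

fn fibonacci(n: u64) -> u64 {
    match n {
        0 | 1 => n,
        _ => fibonacci(n - 1) + fibonacci(n - 2),
    }
}

fn main() {
    let start = Instant::now();
    let mut last = 0;
    for _ in 0..100 {
        // black_box stops the optimizer from hoisting or deleting the call.
        last = black_box(fibonacci(black_box(20)));
    }
    let per_iter = start.elapsed() / 100;
    assert_eq!(last, 6765); // fib(20)
    println!("fibonacci(20): ~{per_iter:?} per call (rough estimate)");
}
```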
Mocking Strategies without Frameworks
Rust’s trait system provides natural dependency injection — no mocking framework required:
#![allow(unused)]
fn main() {
// Define behavior as a trait
trait Clock {
fn now(&self) -> std::time::Instant;
}
trait HttpClient {
fn get(&self, url: &str) -> Result<String, String>;
}
// Production implementations
struct RealClock;
impl Clock for RealClock {
fn now(&self) -> std::time::Instant { std::time::Instant::now() }
}
// Service depends on abstractions
struct CacheService<C: Clock, H: HttpClient> {
clock: C,
client: H,
ttl: std::time::Duration,
}
impl<C: Clock, H: HttpClient> CacheService<C, H> {
fn fetch(&self, url: &str) -> Result<String, String> {
// Uses self.clock and self.client — injectable
self.client.get(url)
}
}
// Test with mock implementations — no framework needed!
#[cfg(test)]
mod tests {
use super::*;
struct MockClock {
fixed_time: std::time::Instant,
}
impl Clock for MockClock {
fn now(&self) -> std::time::Instant { self.fixed_time }
}
struct MockHttpClient {
response: String,
}
impl HttpClient for MockHttpClient {
fn get(&self, _url: &str) -> Result<String, String> {
Ok(self.response.clone())
}
}
#[test]
fn test_cache_service() {
let service = CacheService {
clock: MockClock { fixed_time: std::time::Instant::now() },
client: MockHttpClient { response: "cached data".into() },
ttl: std::time::Duration::from_secs(300),
};
assert_eq!(service.fetch("http://example.com").unwrap(), "cached data");
}
}
}
Test philosophy: Prefer real dependencies in integration tests, trait-based mocks in unit tests. Avoid mocking frameworks unless your dependency graph is complex — Rust’s trait generics handle most cases naturally.
Key Takeaways — Testing
- Doc tests (`///`) double as documentation and regression tests — they're compiled and run
- `proptest` generates random inputs to find edge cases you'd never write manually
- `criterion` provides statistically rigorous benchmarks with HTML reports
- Mock via trait generics + test doubles, not mock frameworks
See also: Ch 12 — Macros for testing macro-generated code. Ch 14 — API Design for how module layout affects test organization.
Exercise: Property-Based Testing with proptest ★★ (~25 min)
Write a SortedVec<T: Ord> wrapper that maintains a sorted invariant. Use proptest to verify that:
- After any sequence of insertions, the internal vec is always sorted
- `contains()` agrees with the stdlib `Vec::contains()`
- The length equals the number of insertions
🔑 Solution
#[derive(Debug)]
struct SortedVec<T: Ord> {
inner: Vec<T>,
}
impl<T: Ord> SortedVec<T> {
fn new() -> Self { SortedVec { inner: Vec::new() } }
fn insert(&mut self, value: T) {
let pos = self.inner.binary_search(&value).unwrap_or_else(|p| p);
self.inner.insert(pos, value);
}
fn contains(&self, value: &T) -> bool {
self.inner.binary_search(value).is_ok()
}
fn len(&self) -> usize { self.inner.len() }
fn as_slice(&self) -> &[T] { &self.inner }
}
#[cfg(test)]
mod tests {
use super::*;
use proptest::prelude::*;
proptest! {
#[test]
fn always_sorted(values in proptest::collection::vec(-1000i32..1000, 0..100)) {
let mut sv = SortedVec::new();
for v in &values {
sv.insert(*v);
}
for w in sv.as_slice().windows(2) {
prop_assert!(w[0] <= w[1]);
}
prop_assert_eq!(sv.len(), values.len());
}
#[test]
fn contains_matches_stdlib(values in proptest::collection::vec(0i32..50, 1..30)) {
let mut sv = SortedVec::new();
for v in &values {
sv.insert(*v);
}
for v in &values {
prop_assert!(sv.contains(v));
}
prop_assert!(!sv.contains(&9999));
}
}
}
14. Crate Architecture and API Design 🟡
What you’ll learn:
- Module layout conventions and re-export strategies
- The public API design checklist for polished crates
- Ergonomic parameter patterns: `impl Into`, `AsRef`, `Cow`
- "Parse, don't validate" with `TryFrom` and validated types
- Feature flags, conditional compilation, and workspace organization
Module Layout Conventions
my_crate/
├── Cargo.toml
├── src/
│ ├── lib.rs # Crate root — re-exports and public API
│ ├── config.rs # Feature module
│ ├── parser/ # Complex module with sub-modules
│ │ ├── mod.rs # or parser.rs at parent level (Rust 2018+)
│ │ ├── lexer.rs
│ │ └── ast.rs
│ ├── error.rs # Error types
│ └── utils.rs # Internal helpers (pub(crate))
├── tests/
│ └── integration.rs # Integration tests
├── benches/
│ └── perf.rs # Benchmarks
└── examples/
└── basic.rs # cargo run --example basic
#![allow(unused)]
fn main() {
// lib.rs — curate your public API with re-exports:
mod config;
mod error;
mod parser;
mod utils;
// Re-export what users need:
pub use config::Config;
pub use error::Error;
pub use parser::Parser;
// Public types are at the crate root — users write:
// use my_crate::Config;
// NOT: use my_crate::config::Config;
}
Visibility modifiers:
| Modifier | Visible To |
|---|---|
| `pub` | Everyone |
| `pub(crate)` | This crate only |
| `pub(super)` | Parent module |
| `pub(in path)` | Specific ancestor module |
| (none) | Current module and its children |
Public API Design Checklist
- Accept references, return owned — `fn process(input: &str) -> String`
- Use `impl Trait` for parameters — `fn read(r: impl Read)` instead of `fn read<R: Read>(r: R)` for cleaner signatures
- Return `Result`, not `panic!` — let callers decide how to handle errors
- Implement standard traits — `Debug`, `Display`, `Clone`, `Default`, `From`/`Into`
- Make invalid states unrepresentable — use type states and newtypes
- Follow the builder pattern for complex configuration — with type-state if fields are required
- Seal traits you don't want users to implement — `pub trait Sealed: private::Sealed {}`
- Mark types and functions `#[must_use]` — prevents silent discard of important `Result`s, guards, or values. Apply to any type where ignoring the return value is almost certainly a bug:
#![allow(unused)]
fn main() {
#[must_use = "dropping the guard immediately releases the lock"]
pub struct LockGuard<'a, T> { /* ... */ }
#[must_use]
pub fn validate(input: &str) -> Result<ValidInput, ValidationError> { /* ... */ }
}
#![allow(unused)]
fn main() {
// Sealed trait pattern — users can use but not implement:
mod private {
pub trait Sealed {}
}
pub trait DatabaseDriver: private::Sealed {
fn connect(&self, url: &str) -> Connection;
}
// Only types in THIS crate can implement Sealed → only we can implement DatabaseDriver
pub struct PostgresDriver;
impl private::Sealed for PostgresDriver {}
impl DatabaseDriver for PostgresDriver {
fn connect(&self, url: &str) -> Connection { /* ... */ }
}
}
- `#[non_exhaustive]` — mark public enums and structs so that adding variants or fields is not a breaking change. Downstream crates must use a wildcard arm (`_ =>`) in match statements, and cannot construct the type with struct literal syntax:
#![allow(unused)]
fn main() {
#[non_exhaustive]
pub enum DiagError {
    Timeout,
    HardwareFault,
    // Adding a new variant in a future release is NOT a semver break.
}
}
Ergonomic Parameter Patterns — impl Into, AsRef, Cow
One of Rust’s most impactful API patterns is accepting the most general type in
function parameters, so callers don’t need repetitive .to_string(), &*s, or .as_ref()
at every call site. This is the Rust-specific version of “be liberal in what you accept.”
impl Into<T> — Accept Anything Convertible
#![allow(unused)]
fn main() {
// ❌ Friction: callers must convert manually
fn connect(host: String, port: u16) -> Connection {
// ...
}
connect("localhost".to_string(), 5432); // Annoying .to_string()
connect(hostname.clone(), 5432); // Unnecessary clone if we already have String
// ✅ Ergonomic: accept anything that converts to String
fn connect(host: impl Into<String>, port: u16) -> Connection {
let host = host.into(); // Convert once, inside the function
// ...
}
connect("localhost", 5432); // &str — zero friction
connect(hostname, 5432); // String — moved, no clone
connect(arc_str, 5432); // Arc<str> if From is implemented
}
This works because Rust’s From/Into trait pair provides blanket conversions.
When you accept impl Into<T>, you’re saying: “give me anything that knows how to
become a T.”
AsRef<T> — Borrow as a Reference
AsRef<T> is the borrowing counterpart to Into<T>. Use it when you only need
to read the data, not take ownership:
#![allow(unused)]
fn main() {
use std::path::{Path, PathBuf};
// ❌ Forces callers to convert to &Path
fn file_exists(path: &Path) -> bool {
path.exists()
}
file_exists(Path::new("/tmp/test.txt")); // Awkward
// ✅ Accept anything that can behave as a &Path
fn file_exists(path: impl AsRef<Path>) -> bool {
path.as_ref().exists()
}
file_exists("/tmp/test.txt"); // &str ✅
file_exists(String::from("/tmp/test.txt")); // String ✅
file_exists(Path::new("/tmp/test.txt")); // &Path ✅
file_exists(PathBuf::from("/tmp/test.txt")); // PathBuf ✅
// Same pattern for string-like parameters:
fn log_message(msg: impl AsRef<str>) {
println!("[LOG] {}", msg.as_ref());
}
log_message("hello"); // &str ✅
log_message(String::from("hello")); // String ✅
}
Cow<T> — Clone on Write
Cow<'a, T> (Clone on Write) delays allocation until mutation is needed.
It holds either a borrowed &T or an owned T::Owned. This is perfect when
most calls don’t need to modify the data:
#![allow(unused)]
fn main() {
use std::borrow::Cow;
/// Normalizes a diagnostic message — only allocates if changes are needed.
fn normalize_message(msg: &str) -> Cow<'_, str> {
if msg.contains('\t') || msg.contains('\r') {
// Must allocate — we need to modify the content
Cow::Owned(msg.replace('\t', " ").replace('\r', ""))
} else {
// No allocation — just borrow the original
Cow::Borrowed(msg)
}
}
// Most messages pass through without allocation:
let clean = normalize_message("All tests passed"); // Borrowed — free
let fixed = normalize_message("Error:\tfailed\r\n"); // Owned — allocated
// Cow<str> implements Deref<Target=str>, so it works like &str:
println!("{}", clean);
println!("{}", fixed.to_uppercase());
}
Quick Reference: Which to Use
Do you need ownership of the data inside the function?
├── YES → impl Into<T>
│ "Give me anything that can become a T"
└── NO → Do you only need to read it?
├── YES → impl AsRef<T> or &T
│ "Give me anything I can borrow as a &T"
└── MAYBE (might need to modify sometimes?)
└── Cow<'_, T>
"Borrow if possible, clone only when you must"
| Pattern | Ownership | Allocation | When to use |
|---|---|---|---|
| `&str` | Borrowed | Never | Simple string params |
| `impl AsRef<str>` | Borrowed | Never | Accept `String`, `&str`, etc. — read only |
| `impl Into<String>` | Owned | On conversion | Accept `&str`, `String` — will store/own |
| `Cow<'_, str>` | Either | Only if modified | Processing that usually doesn't modify |
| `&[u8]` / `impl AsRef<[u8]>` | Borrowed | Never | Byte-oriented APIs |
`Borrow<T>` vs `AsRef<T>`: Both provide `&T`, but `Borrow<T>` additionally guarantees that `Eq`, `Ord`, and `Hash` are consistent between the original and borrowed form. This is why `HashMap<String, V>::get()` accepts `&Q where String: Borrow<Q>` — not `AsRef`. Use `Borrow` when the borrowed form is used as a lookup key; use `AsRef` for general "give me a reference" parameters.
Composing Conversions in APIs
#![allow(unused)]
fn main() {
/// A well-designed diagnostic API using ergonomic parameters:
pub struct DiagRunner {
name: String,
config_path: PathBuf,
results: HashMap<String, TestResult>,
}
impl DiagRunner {
/// Accept any string-like type for name, any path-like type for config.
pub fn new(
name: impl Into<String>,
config_path: impl Into<PathBuf>,
) -> Self {
DiagRunner {
name: name.into(),
config_path: config_path.into(),
}
}
/// Accept any AsRef<str> for read-only lookup.
pub fn get_result(&self, test_name: impl AsRef<str>) -> Option<&TestResult> {
self.results.get(test_name.as_ref())
}
}
// All of these work with zero caller friction:
let runner = DiagRunner::new("GPU Diag", "/etc/diag_tool/config.json");
let runner = DiagRunner::new(format!("Diag-{}", node_id), config_path);
let runner = DiagRunner::new(name_string, path_buf);
}
Case Study: Designing a Public Crate API — Before & After
A real-world example of evolving a stringly-typed internal API into an ergonomic, type-safe public API. Consider a configuration parser crate:
Before (stringly-typed, easy to misuse):
#![allow(unused)]
fn main() {
// ❌ All parameters are strings — no compile-time validation
pub fn parse_config(path: &str, format: &str, strict: bool) -> Result<Config, String> {
// What formats are valid? "json"? "JSON"? "Json"?
// Is path a file path or URL?
// What does "strict" even mean?
todo!()
}
}
After (type-safe, self-documenting):
#![allow(unused)]
fn main() {
use std::path::Path;
/// Supported configuration formats.
#[derive(Debug, Clone, Copy)]
#[non_exhaustive] // Adding formats won't break downstream
pub enum Format {
Json,
Toml,
Yaml,
}
/// Controls parsing strictness.
#[derive(Debug, Clone, Copy, Default)]
pub enum Strictness {
/// Reject unknown fields (default for libraries)
#[default]
Strict,
/// Ignore unknown fields (useful for forward-compatible configs)
Lenient,
}
pub fn parse_config(
path: &Path, // Type-enforced: must be a filesystem path
format: Format, // Enum: impossible to pass invalid format
strictness: Strictness, // Named alternatives, not a bare bool
) -> Result<Config, ConfigError> {
todo!()
}
}
What improved:
| Aspect | Before | After |
|---|---|---|
| Format validation | Runtime string comparison | Compile-time enum |
| Path type | Raw &str (could be anything) | &Path (filesystem-specific) |
| Strictness | Mystery bool | Self-documenting enum |
| Error type | String (opaque) | ConfigError (structured) |
| Extensibility | Breaking changes | #[non_exhaustive] |
Rule of thumb: If you find yourself writing a `match` on string values, consider replacing the parameter with an enum. If a parameter is a boolean that isn't obvious from context, use a two-variant enum instead.
Parse Don’t Validate — TryFrom and Validated Types
“Parse, don’t validate” is a principle that says: don’t check data and then pass
around the raw unchecked form — instead, parse it into a type that can only exist
if the data is valid. Rust’s TryFrom trait is the standard tool for this.
The Problem: Validation Without Enforcement
#![allow(unused)]
fn main() {
// ❌ Validate-then-use: nothing prevents using an invalid value after the check
fn process_port(port: u16) {
    if port == 0 { // (a u16 can never exceed 65535, so zero is the only invalid value)
        panic!("Invalid port"); // We checked, but...
    }
    start_server(port); // What if someone calls start_server(0) directly?
}
// ❌ Stringly-typed: an email is just a String — any garbage gets through
fn send_email(to: String, body: String) {
// Is `to` actually a valid email? We don't know.
// Someone could pass "not-an-email" and we only find out at the SMTP server.
}
}
The Solution: Parse Into Validated Newtypes with TryFrom
use std::convert::TryFrom;
use std::fmt;
/// A validated TCP port number (1–65535).
/// If you have a `Port`, it is guaranteed valid.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Port(u16);
impl TryFrom<u16> for Port {
type Error = PortError;
fn try_from(value: u16) -> Result<Self, Self::Error> {
if value == 0 {
Err(PortError::Zero)
} else {
Ok(Port(value))
}
}
}
impl Port {
pub fn get(&self) -> u16 { self.0 }
}
#[derive(Debug)]
pub enum PortError {
Zero,
InvalidFormat,
}
impl fmt::Display for PortError {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match self {
PortError::Zero => write!(f, "port must be non-zero"),
PortError::InvalidFormat => write!(f, "invalid port format"),
}
}
}
impl std::error::Error for PortError {}
// Now the type system enforces validity:
fn start_server(port: Port) {
// No validation needed — Port can only be constructed via TryFrom,
// which already verified it's valid.
println!("Listening on port {}", port.get());
}
// Usage:
fn main() -> Result<(), Box<dyn std::error::Error>> {
let port = Port::try_from(8080)?; // ✅ Validated once at the boundary
start_server(port); // No re-validation anywhere downstream
let bad = Port::try_from(0); // ❌ Err(PortError::Zero)
Ok(())
}
Real-World Example: Validated IPMI Address
#![allow(unused)]
fn main() {
/// A validated IPMI slave address (0x20–0xFE, even only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct IpmiAddr(u8);
#[derive(Debug)]
pub enum IpmiAddrError {
Odd(u8),
OutOfRange(u8),
}
impl fmt::Display for IpmiAddrError {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match self {
IpmiAddrError::Odd(v) => write!(f, "IPMI address 0x{v:02X} must be even"),
IpmiAddrError::OutOfRange(v) => {
write!(f, "IPMI address 0x{v:02X} out of range (0x20..=0xFE)")
}
}
}
}
impl TryFrom<u8> for IpmiAddr {
type Error = IpmiAddrError;
fn try_from(value: u8) -> Result<Self, Self::Error> {
if value % 2 != 0 {
Err(IpmiAddrError::Odd(value))
} else if value < 0x20 || value > 0xFE {
Err(IpmiAddrError::OutOfRange(value))
} else {
Ok(IpmiAddr(value))
}
}
}
impl IpmiAddr {
pub fn get(&self) -> u8 { self.0 }
}
// Downstream code never needs to re-check:
fn send_ipmi_command(addr: IpmiAddr, cmd: u8, data: &[u8]) -> Result<Vec<u8>, IpmiError> {
// addr.get() is guaranteed to be a valid, even IPMI address
raw_ipmi_send(addr.get(), cmd, data)
}
}
Parsing Strings with FromStr
For types that are commonly parsed from text (CLI args, config files), implement FromStr:
#![allow(unused)]
fn main() {
use std::str::FromStr;
impl FromStr for Port {
type Err = PortError;
fn from_str(s: &str) -> Result<Self, Self::Err> {
let n: u16 = s.parse().map_err(|_| PortError::InvalidFormat)?;
Port::try_from(n)
}
}
// Now works with .parse():
let port: Port = "8080".parse()?; // Validates in one step
// And with clap CLI parsing:
// #[derive(Parser)]
// struct Args {
// #[arg(short, long)]
// port: Port, // clap calls FromStr automatically
// }
}
TryFrom Chain for Complex Validation
#![allow(unused)]
fn main() {
// Stub types for this example — in production these would be in
// separate modules with their own TryFrom implementations.
}
#![allow(unused)]
fn main() {
struct Hostname(String);
impl TryFrom<String> for Hostname {
type Error = String;
fn try_from(s: String) -> Result<Self, String> { Ok(Hostname(s)) }
}
struct Timeout(u64);
impl TryFrom<u64> for Timeout {
type Error = String;
fn try_from(ms: u64) -> Result<Self, String> {
if ms == 0 { Err("timeout must be > 0".into()) } else { Ok(Timeout(ms)) }
}
}
struct RawConfig { host: String, port: u16, timeout_ms: u64 }
#[derive(Debug)]
enum ConfigError {
InvalidHost(String),
InvalidPort(PortError),
InvalidTimeout(String),
}
impl From<std::io::Error> for ConfigError {
fn from(e: std::io::Error) -> Self { ConfigError::InvalidHost(e.to_string()) }
}
impl From<serde_json::Error> for ConfigError {
fn from(e: serde_json::Error) -> Self { ConfigError::InvalidHost(e.to_string()) }
}
/// A validated configuration that can only exist if all fields are valid.
pub struct ValidConfig {
pub host: Hostname,
pub port: Port,
pub timeout_ms: Timeout,
}
impl TryFrom<RawConfig> for ValidConfig {
type Error = ConfigError;
fn try_from(raw: RawConfig) -> Result<Self, Self::Error> {
Ok(ValidConfig {
host: Hostname::try_from(raw.host)
.map_err(ConfigError::InvalidHost)?,
port: Port::try_from(raw.port)
.map_err(ConfigError::InvalidPort)?,
timeout_ms: Timeout::try_from(raw.timeout_ms)
.map_err(ConfigError::InvalidTimeout)?,
})
}
}
// Parse once at the boundary, use the validated type everywhere:
fn load_config(path: &str) -> Result<ValidConfig, ConfigError> {
let raw: RawConfig = serde_json::from_str(&std::fs::read_to_string(path)?)?;
ValidConfig::try_from(raw) // All validation happens here
}
}
Summary: Validate vs Parse
| Approach | Data checked? | Compiler enforces validity? | Re-validation needed? |
|---|---|---|---|
| Runtime checks (if/assert) | ✅ | ❌ | Every function boundary |
| Validated newtype + `TryFrom` | ✅ | ✅ | Never — type is proof |
The rule: parse at the boundary, use validated types everywhere inside.
Raw strings, integers, and byte slices enter your system, get parsed into
validated types via TryFrom/FromStr, and from that point forward the type
system guarantees they’re valid.
Feature Flags and Conditional Compilation
Cargo.toml
[features]
default = ["json"]          # Enabled by default
json = ["dep:serde_json"]   # Enables JSON support
xml = ["dep:quick-xml"]     # Enables XML support
full = ["json", "xml"]      # Meta-feature: enables all

[dependencies]
serde = "1"
serde_json = { version = "1", optional = true }
quick-xml = { version = "0.31", optional = true }
#![allow(unused)]
fn main() {
// Conditional compilation based on features:
#[cfg(feature = "json")]
pub fn to_json<T: serde::Serialize>(value: &T) -> String {
serde_json::to_string(value).unwrap()
}
#[cfg(feature = "xml")]
pub fn to_xml<T: serde::Serialize>(value: &T) -> String {
quick_xml::se::to_string(value).unwrap()
}
// Compile error if a required feature isn't enabled:
#[cfg(not(any(feature = "json", feature = "xml")))]
compile_error!("At least one format feature (json, xml) must be enabled");
}
Best practices:
- Keep `default` features minimal — users can opt in
- Use `dep:` syntax (Rust 1.60+) for optional dependencies to avoid creating implicit features
- Document features in your README and crate-level docs
Workspace Organization
For large projects, use a Cargo workspace to share dependencies and build artifacts:
Root Cargo.toml
[workspace]
members = [
    "core",     # Shared types and traits
    "parser",   # Parsing library
    "server",   # Binary — the main application
    "client",   # Client library
    "cli",      # CLI binary
]
Shared dependency versions:
[workspace.dependencies]
serde = { version = "1", features = ["derive"] }
tokio = { version = "1", features = ["full"] }
tracing = "0.1"
In each member’s Cargo.toml:
[dependencies]
serde = { workspace = true }
**Benefits**:
- Single `Cargo.lock` — all crates use the same dependency versions
- `cargo test --workspace` runs all tests
- Shared build cache — compiling one crate benefits all
- Clean dependency boundaries between components
.cargo/config.toml: Project-Level Configuration
The .cargo/config.toml file (at the workspace root or in $HOME/.cargo/)
customizes Cargo behavior without modifying Cargo.toml:
.cargo/config.toml
# Default target for this workspace
[build]
target = "x86_64-unknown-linux-gnu"

# Custom runner — e.g., run via QEMU for cross-compiled binaries
[target.aarch64-unknown-linux-gnu]
runner = "qemu-aarch64-static"
linker = "aarch64-linux-gnu-gcc"

# Cargo aliases — custom shortcut commands
[alias]
xt = "test --workspace --release"        # cargo xt = run all tests in release
ci = "clippy --workspace -- -D warnings" # cargo ci = lint with errors on warnings
cov = "llvm-cov --workspace"             # cargo cov = coverage (requires cargo-llvm-cov)

# Environment variables for build scripts
[env]
IPMI_LIB_PATH = "/usr/lib/bmc"

# Use a custom registry (for internal packages)
[registries.internal]
index = "https://gitlab.internal/crates/index"
Common configuration patterns:
| Setting | Purpose | Example |
|---|---|---|
| `[build] target` | Default compilation target | `x86_64-unknown-linux-musl` for static builds |
| `[target.X] runner` | How to run the binary | `"qemu-aarch64-static"` for cross-compiled |
| `[target.X] linker` | Which linker to use | `"aarch64-linux-gnu-gcc"` |
| `[alias]` | Custom cargo subcommands | `xt = "test --workspace"` |
| `[env]` | Build-time environment variables | Library paths, feature toggles |
| `[net] offline` | Prevent network access | `true` for air-gapped builds |
Compile-Time Environment Variables: env!() and option_env!()
Rust can embed environment variables into the binary at compile time — useful for version strings, build metadata, and configuration:
#![allow(unused)]
fn main() {
// env!() — panics at compile time if the variable is missing
const VERSION: &str = env!("CARGO_PKG_VERSION"); // "0.1.0" from Cargo.toml
const PKG_NAME: &str = env!("CARGO_PKG_NAME"); // Crate name from Cargo.toml
// option_env!() — returns Option<&str>, doesn't panic if missing
const BUILD_SHA: Option<&str> = option_env!("GIT_SHA");
const BUILD_TIME: Option<&str> = option_env!("BUILD_TIMESTAMP");
fn print_version() {
println!("{PKG_NAME} v{VERSION}");
if let Some(sha) = BUILD_SHA {
println!(" commit: {sha}");
}
if let Some(time) = BUILD_TIME {
println!(" built: {time}");
}
}
}
Cargo automatically sets many useful environment variables:
| Variable | Value | Use case |
|---|---|---|
| `CARGO_PKG_VERSION` | `"1.2.3"` | Version reporting |
| `CARGO_PKG_NAME` | `"diag_tool"` | Binary identification |
| `CARGO_PKG_AUTHORS` | From Cargo.toml | About/help text |
| `CARGO_MANIFEST_DIR` | Absolute path to Cargo.toml | Locating test data files |
| `OUT_DIR` | Build output directory | build.rs code generation target |
| `TARGET` | Target triple | Platform-specific logic in build.rs |
You can set custom env vars from build.rs:
// build.rs
fn main() {
println!("cargo::rustc-env=GIT_SHA={}", git_sha());
println!("cargo::rustc-env=BUILD_TIMESTAMP={}", timestamp());
}
cfg_attr: Conditional Attributes
cfg_attr applies an attribute only when a condition is true. This is more
targeted than #[cfg()], which includes/excludes entire items:
#![allow(unused)]
fn main() {
// Derive Serialize only when the "serde" feature is enabled:
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
#[derive(Debug, Clone)]
pub struct DiagResult {
pub fc: u32,
pub passed: bool,
pub message: String,
}
// Without "serde" feature: no serde dependency needed at all
// With "serde" feature: DiagResult is serializable
// Conditional attribute for testing:
#[cfg_attr(test, derive(PartialEq))] // Only derive PartialEq in test builds
pub struct LargeStruct { /* ... */ }
// Platform-specific function attributes (foreign declarations live in an extern block):
extern "C" {
    #[cfg_attr(target_os = "linux", link_name = "ioctl")]
    #[cfg_attr(target_os = "freebsd", link_name = "__ioctl")]
    fn platform_ioctl(fd: i32, request: u64) -> i32;
}
}
| Pattern | What it does |
|---|---|
| `#[cfg(feature = "x")]` | Include/exclude the entire item |
| `#[cfg_attr(feature = "x", derive(Foo))]` | Add `derive(Foo)` only when feature "x" is on |
| `#[cfg_attr(test, allow(unused))]` | Suppress warnings only in test builds |
| `#[cfg_attr(doc, doc = "...")]` | Documentation visible only in `cargo doc` |
cargo deny and cargo audit: Supply-Chain Security
# Install security audit tools
cargo install cargo-deny
cargo install cargo-audit

# Check for known vulnerabilities in dependencies
cargo audit

# Comprehensive checks: licenses, bans, advisories, sources
cargo deny check
Configure `cargo deny` with a `deny.toml` at the workspace root:
deny.toml
[advisories]
vulnerability = "deny"       # Fail on known vulnerabilities
unmaintained = "warn"        # Warn on unmaintained crates

[licenses]
allow = ["MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause"]
deny = ["GPL-3.0"]           # Reject copyleft licenses

[bans]
multiple-versions = "warn"   # Warn if multiple versions of same crate
deny = [
    { name = "openssl" },    # Force use of rustls instead
]

[sources]
allow-git = []               # No git dependencies in production
| Tool | Purpose | When to run |
|---|---|---|
| `cargo audit` | Check for known CVEs in dependencies | CI pipeline, pre-release |
| `cargo deny check` | Licenses, bans, advisories, sources | CI pipeline |
| `cargo deny check licenses` | License compliance only | Before open-sourcing |
| `cargo deny check bans` | Prevent specific crates | Enforce architecture decisions |
Doc Tests: Tests Inside Documentation
Rust doc comments (///) can contain code blocks that are compiled and run as tests:
#![allow(unused)]
fn main() {
/// Parses a diagnostic fault code from a string.
///
/// # Examples
///
/// ```
/// use my_crate::parse_fc;
///
/// let fc = parse_fc("FC:12345").unwrap();
/// assert_eq!(fc, 12345);
/// ```
///
/// Invalid input returns an error:
///
/// ```
/// use my_crate::parse_fc;
///
/// assert!(parse_fc("not-a-fc").is_err());
/// ```
pub fn parse_fc(input: &str) -> Result<u32, ParseError> {
input.strip_prefix("FC:")
.ok_or(ParseError::MissingPrefix)?
.parse()
.map_err(ParseError::InvalidNumber)
}
}
cargo test --doc # Run only doc tests
cargo test # Runs unit + integration + doc tests
Module-level documentation uses //! at the top of a file:
#![allow(unused)]
fn main() {
//! # Diagnostic Framework
//!
//! This crate provides the core diagnostic execution engine.
//! It supports running diagnostic tests, collecting results,
//! and reporting to the BMC via IPMI.
//!
//! ## Quick Start
//!
//! ```no_run
//! # fn main() -> Result<(), Box<dyn std::error::Error>> {
//! use diag_framework::Framework;
//!
//! let mut fw = Framework::new("config.json")?;
//! fw.run_all_tests()?;
//! # Ok(())
//! # }
//! ```
}
Benchmarking with Criterion
Full coverage: See the Benchmarking with criterion section in Chapter 13 (Testing and Benchmarking Patterns) for complete `criterion` setup, API examples, and a comparison table vs `cargo bench`. Below is a quick reference for architecture-specific usage.
When benchmarking your crate’s public API, place benchmarks in benches/ and
keep them focused on the hot path — typically parsers, serializers, or
validation boundaries:
cargo bench # Run all benchmarks
cargo bench -- parse_config # Run specific benchmark
# Results in target/criterion/ with HTML reports
Key Takeaways — Architecture & API Design
- Accept the most general type (`impl Into`, `impl AsRef`, `Cow`); return the most specific
- Parse, Don't Validate: use `TryFrom` to create types that are valid by construction
- `#[non_exhaustive]` on public enums prevents breaking changes when adding variants
- `#[must_use]` catches silent discards of important values
See also: Ch 9 — Error Handling for error type design in public APIs. Ch 13 — Testing for testing your crate’s public API.
Exercise: Crate API Refactoring ★★ (~30 min)
Refactor the following "stringly-typed" API into one that uses TryFrom and validated newtypes:
// BEFORE: Easy to misuse
fn create_server(host: &str, port: &str, max_conn: &str) -> Server { ... }
Design a ServerConfig with validated types Host, Port (1–65535), and MaxConnections (1–10000) that reject invalid values at parse time.
🔑 Solution
#[derive(Debug, Clone)]
struct Host(String);
impl TryFrom<&str> for Host {
type Error = String;
fn try_from(s: &str) -> Result<Self, String> {
if s.is_empty() { return Err("host cannot be empty".into()); }
if s.contains(' ') { return Err("host cannot contain spaces".into()); }
Ok(Host(s.to_string()))
}
}
#[derive(Debug, Clone, Copy)]
struct Port(u16);
impl TryFrom<u16> for Port {
type Error = String;
fn try_from(p: u16) -> Result<Self, String> {
if p == 0 { return Err("port must be >= 1".into()); }
Ok(Port(p))
}
}
#[derive(Debug, Clone, Copy)]
struct MaxConnections(u32);
impl TryFrom<u32> for MaxConnections {
type Error = String;
fn try_from(n: u32) -> Result<Self, String> {
if n == 0 || n > 10_000 {
return Err(format!("max_connections must be 1–10000, got {n}"));
}
Ok(MaxConnections(n))
}
}
#[derive(Debug)]
struct ServerConfig {
host: Host,
port: Port,
max_connections: MaxConnections,
}
impl ServerConfig {
fn new(host: Host, port: Port, max_connections: MaxConnections) -> Self {
ServerConfig { host, port, max_connections }
}
}
fn main() {
let config = ServerConfig::new(
Host::try_from("localhost").unwrap(),
Port::try_from(8080).unwrap(),
MaxConnections::try_from(100).unwrap(),
);
println!("{config:?}");
// Invalid values caught at parse time:
assert!(Host::try_from("").is_err());
assert!(Port::try_from(0).is_err());
assert!(MaxConnections::try_from(99999).is_err());
}
15. Async/Await Essentials 🔴
What you’ll learn:
- How Rust's `Future` trait differs from Go's goroutines and Python's asyncio
- Tokio quick-start: spawning tasks, `join!`, and runtime configuration
- Common async pitfalls and how to fix them
- When to offload blocking work with `spawn_blocking`
Futures, Runtimes, and async fn
Rust’s async model is fundamentally different from Go’s goroutines or Python’s asyncio.
Understanding three concepts is enough to get started:
1. A `Future` is a lazy state machine — calling an `async fn` doesn't execute anything; it returns a `Future` that must be polled.
2. You need a runtime to poll futures — `tokio`, `async-std`, or `smol`. The standard library defines `Future` but provides no runtime.
3. `async fn` is sugar — the compiler transforms it into a state machine that implements `Future`.
#![allow(unused)]
fn main() {
// A Future is just a trait:
pub trait Future {
type Output;
fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
}
// async fn desugars (roughly) to a function returning an anonymous Future
// that captures the borrowed argument:
// fn fetch_data(url: &str) -> impl Future<Output = Result<Vec<u8>, Error>> + '_
async fn fetch_data(url: &str) -> Result<Vec<u8>, reqwest::Error> {
let response = reqwest::get(url).await?; // .await yields until ready
let bytes = response.bytes().await?;
Ok(bytes.to_vec())
}
}
Tokio Quick Start
Cargo.toml
[dependencies]
tokio = { version = "1", features = ["full"] }
use tokio::time::{sleep, Duration};
use tokio::task;
#[tokio::main]
async fn main() {
// Spawn concurrent tasks (like lightweight threads):
let handle_a = task::spawn(async {
sleep(Duration::from_millis(100)).await;
"task A done"
});
let handle_b = task::spawn(async {
sleep(Duration::from_millis(50)).await;
"task B done"
});
// .await both — they run concurrently, not sequentially:
let (a, b) = tokio::join!(handle_a, handle_b);
println!("{}, {}", a.unwrap(), b.unwrap());
}
Async Common Pitfalls
| Pitfall | Why It Happens | Fix |
|---|---|---|
| Blocking in async | `std::thread::sleep` or CPU work blocks the executor | Use `tokio::task::spawn_blocking` or `rayon` |
| `Send` bound errors | Future held across `.await` contains `!Send` type (e.g., `Rc`, `MutexGuard`) | Restructure to drop non-Send values before `.await` |
| Future not polled | Calling `async fn` without `.await` or spawning — nothing happens | Always `.await` or `tokio::spawn` the returned future |
| Holding `MutexGuard` across `.await` | `std::sync::MutexGuard` is `!Send`; async tasks may resume on a different thread | Use `tokio::sync::Mutex` or drop the guard before `.await` |
| Accidental sequential execution | `let a = foo().await; let b = bar().await;` runs sequentially | Use `tokio::join!` or `tokio::spawn` for concurrency |
#![allow(unused)]
fn main() {
// ❌ Blocking the async executor:
async fn bad() {
std::thread::sleep(std::time::Duration::from_secs(5)); // Blocks entire thread!
}
// ✅ Offload blocking work:
async fn good() {
tokio::task::spawn_blocking(|| {
std::thread::sleep(std::time::Duration::from_secs(5)); // Runs on blocking pool
}).await.unwrap();
}
}
Comprehensive async coverage: For `Stream`, `select!`, cancellation safety, structured concurrency, and `tower` middleware, see our dedicated Async Rust Training guide. This section covers just enough to read and write basic async code.
Spawning and Structured Concurrency
Tokio’s spawn creates a new asynchronous task — similar to thread::spawn but
much lighter:
use tokio::task;
use tokio::time::{sleep, Duration};
#[tokio::main]
async fn main() {
// Spawn three concurrent tasks
let h1 = task::spawn(async {
sleep(Duration::from_millis(200)).await;
"fetched user profile"
});
let h2 = task::spawn(async {
sleep(Duration::from_millis(100)).await;
"fetched order history"
});
let h3 = task::spawn(async {
sleep(Duration::from_millis(150)).await;
"fetched recommendations"
});
// Wait for all three concurrently (not sequentially!)
let (r1, r2, r3) = tokio::join!(h1, h2, h3);
println!("{}", r1.unwrap());
println!("{}", r2.unwrap());
println!("{}", r3.unwrap());
}
join! vs try_join! vs select!:
| Macro | Behavior | Use when |
|---|---|---|
| join! | Waits for ALL futures | All tasks must complete |
| try_join! | Waits for all, short-circuits on first Err | Tasks return Result |
| select! | Returns when FIRST future completes | Timeouts, cancellation |
use tokio::time::{timeout, Duration};
async fn fetch_with_timeout() -> Result<String, Box<dyn std::error::Error>> {
let result = timeout(Duration::from_secs(5), async {
// Simulate slow network call
tokio::time::sleep(Duration::from_millis(100)).await;
Ok::<_, Box<dyn std::error::Error>>("data".to_string())
}).await??; // First ? propagates timeout's Elapsed error; second ? the inner Result
Ok(result)
}
Send Bounds and Why Futures Must Be Send
When you tokio::spawn a future, it may resume on a different OS thread.
This means the future must be Send. Common pitfalls:
use std::rc::Rc;
async fn not_send() {
let rc = Rc::new(42); // Rc is !Send
tokio::time::sleep(std::time::Duration::from_millis(10)).await;
println!("{}", rc); // rc is held across .await — future is !Send
}
// Fix 1: Drop before .await
async fn fixed_drop() {
let data = {
let rc = Rc::new(42);
*rc // Copy the value out
}; // rc dropped here
tokio::time::sleep(std::time::Duration::from_millis(10)).await;
println!("{}", data); // Just an i32, which is Send
}
// Fix 2: Use Arc instead of Rc
async fn fixed_arc() {
let arc = std::sync::Arc::new(42); // Arc is Send
tokio::time::sleep(std::time::Duration::from_millis(10)).await;
println!("{}", arc); // ✅ Future is Send
}
See also: Ch 5 — Channels for synchronous channels. Ch 6 — Concurrency for OS threads vs async tasks.
Key Takeaways — Async
- `async fn` returns a lazy `Future` — nothing runs until you `.await` or spawn it
- Use `tokio::task::spawn_blocking` for CPU-heavy or blocking work inside async contexts
- Don't hold `std::sync::MutexGuard` across `.await` — use `tokio::sync::Mutex` instead
- Futures must be `Send` when spawned — drop `!Send` types before `.await` points
Exercise: Concurrent Fetcher with Timeout ★★ (~25 min)
Write an async function fetch_all that spawns three tokio::spawn tasks, each
simulating a network call with tokio::time::sleep. Join all three with
tokio::try_join! wrapped in tokio::time::timeout(Duration::from_secs(5), ...).
Return Result<Vec<String>, ...> or an error if any task fails or the deadline
expires.
🔑 Solution
use tokio::time::{sleep, timeout, Duration};
async fn fake_fetch(name: &'static str, delay_ms: u64) -> Result<String, String> {
sleep(Duration::from_millis(delay_ms)).await;
Ok(format!("{name}: OK"))
}
async fn fetch_all() -> Result<Vec<String>, Box<dyn std::error::Error>> {
let deadline = Duration::from_secs(5);
let (a, b, c) = timeout(deadline, async {
let h1 = tokio::spawn(fake_fetch("svc-a", 100));
let h2 = tokio::spawn(fake_fetch("svc-b", 200));
let h3 = tokio::spawn(fake_fetch("svc-c", 150));
tokio::try_join!(h1, h2, h3)
})
.await??;
Ok(vec![a?, b?, c?])
}
#[tokio::main]
async fn main() {
let results = fetch_all().await.unwrap();
for r in &results {
println!("{r}");
}
}
17. Exercises
Exercise 1: Type-Safe State Machine ★★ (~30 min)
Build a traffic light state machine using the type-state pattern. The light must transition Red → Green → Yellow → Red and no other order should be possible.
🔑 Solution
use std::marker::PhantomData;
struct Red;
struct Green;
struct Yellow;
struct TrafficLight<State> {
_state: PhantomData<State>,
}
impl TrafficLight<Red> {
fn new() -> Self {
println!("🔴 Red — STOP");
TrafficLight { _state: PhantomData }
}
fn go(self) -> TrafficLight<Green> {
println!("🟢 Green — GO");
TrafficLight { _state: PhantomData }
}
}
impl TrafficLight<Green> {
fn caution(self) -> TrafficLight<Yellow> {
println!("🟡 Yellow — CAUTION");
TrafficLight { _state: PhantomData }
}
}
impl TrafficLight<Yellow> {
fn stop(self) -> TrafficLight<Red> {
println!("🔴 Red — STOP");
TrafficLight { _state: PhantomData }
}
}
fn main() {
let light = TrafficLight::new(); // Red
let light = light.go(); // Green
let light = light.caution(); // Yellow
let light = light.stop(); // Red
// light.caution(); // ❌ Compile error: no method `caution` on Red
// TrafficLight::new().stop(); // ❌ Compile error: no method `stop` on Red
}
Key takeaway: Invalid transitions are compile errors, not runtime panics.
Exercise 2: Unit-of-Measure with PhantomData ★★ (~30 min)
Extend the unit-of-measure pattern from Ch4 to support:
- `Meters`, `Seconds`, `Kilograms`
- Addition of same units
- Multiplication: `Meters * Meters = SquareMeters`
- Division: `Meters / Seconds = MetersPerSecond`
🔑 Solution
use std::marker::PhantomData;
use std::ops::{Add, Mul, Div};
#[derive(Clone, Copy)]
struct Meters;
#[derive(Clone, Copy)]
struct Seconds;
#[derive(Clone, Copy)]
struct Kilograms;
#[derive(Clone, Copy)]
struct SquareMeters;
#[derive(Clone, Copy)]
struct MetersPerSecond;
#[derive(Debug, Clone, Copy)]
struct Qty<U> {
value: f64,
_unit: PhantomData<U>,
}
impl<U> Qty<U> {
fn new(v: f64) -> Self { Qty { value: v, _unit: PhantomData } }
}
impl<U> Add for Qty<U> {
type Output = Qty<U>;
fn add(self, rhs: Self) -> Self::Output { Qty::new(self.value + rhs.value) }
}
impl Mul<Qty<Meters>> for Qty<Meters> {
type Output = Qty<SquareMeters>;
fn mul(self, rhs: Qty<Meters>) -> Qty<SquareMeters> {
Qty::new(self.value * rhs.value)
}
}
impl Div<Qty<Seconds>> for Qty<Meters> {
type Output = Qty<MetersPerSecond>;
fn div(self, rhs: Qty<Seconds>) -> Qty<MetersPerSecond> {
Qty::new(self.value / rhs.value)
}
}
fn main() {
let width = Qty::<Meters>::new(5.0);
let height = Qty::<Meters>::new(3.0);
let area = width * height; // Qty<SquareMeters>
println!("Area: {:.1} m²", area.value);
let dist = Qty::<Meters>::new(100.0);
let time = Qty::<Seconds>::new(9.58);
let speed = dist / time;
println!("Speed: {:.2} m/s", speed.value);
let sum = width + height; // Same unit ✅
println!("Sum: {:.1} m", sum.value);
// let bad = width + time; // ❌ Compile error: can't add Meters + Seconds
}
Exercise 3: Channel-Based Worker Pool ★★★ (~45 min)
Build a worker pool using channels where:
- A dispatcher sends `Job` structs through a channel
- N workers consume jobs and send results back
- Use `crossbeam-channel` (or `std::sync::mpsc` if crossbeam is unavailable)
🔑 Solution
use std::sync::mpsc;
use std::thread;
struct Job {
id: u64,
data: String,
}
struct JobResult {
job_id: u64,
output: String,
worker_id: usize,
}
fn worker_pool(jobs: Vec<Job>, num_workers: usize) -> Vec<JobResult> {
let (job_tx, job_rx) = mpsc::channel::<Job>();
let (result_tx, result_rx) = mpsc::channel::<JobResult>();
// Wrap receiver in Arc<Mutex> for sharing among workers
let job_rx = std::sync::Arc::new(std::sync::Mutex::new(job_rx));
// Spawn workers
let mut handles = Vec::new();
for worker_id in 0..num_workers {
let job_rx = job_rx.clone();
let result_tx = result_tx.clone();
handles.push(thread::spawn(move || {
loop {
// Lock, receive, unlock — short critical section
let job = {
let rx = job_rx.lock().unwrap();
rx.recv() // Blocks until a job or channel closes
};
match job {
Ok(job) => {
let output = format!("processed '{}' by worker {worker_id}", job.data);
result_tx.send(JobResult {
job_id: job.id,
output,
worker_id,
}).unwrap();
}
Err(_) => break, // Channel closed — exit
}
}
}));
}
drop(result_tx); // Drop our copy so result channel closes when workers finish
// Dispatch jobs
let num_jobs = jobs.len();
for job in jobs {
job_tx.send(job).unwrap();
}
drop(job_tx); // Close the job channel — workers will exit after draining
// Collect results
let mut results = Vec::new();
for result in result_rx {
results.push(result);
}
assert_eq!(results.len(), num_jobs);
for h in handles { h.join().unwrap(); }
results
}
fn main() {
let jobs: Vec<Job> = (0..20).map(|i| Job {
id: i,
data: format!("task-{i}"),
}).collect();
let results = worker_pool(jobs, 4);
for r in &results {
println!("[worker {}] job {}: {}", r.worker_id, r.job_id, r.output);
}
}
Exercise 4: Higher-Order Combinator Pipeline ★★ (~25 min)
Create a Pipeline struct that chains transformations. It should support .pipe(f) to add a transformation and .execute(input) to run the full chain.
🔑 Solution
struct Pipeline<T> {
transforms: Vec<Box<dyn Fn(T) -> T>>,
}
impl<T: 'static> Pipeline<T> {
fn new() -> Self {
Pipeline { transforms: Vec::new() }
}
fn pipe(mut self, f: impl Fn(T) -> T + 'static) -> Self {
self.transforms.push(Box::new(f));
self
}
fn execute(self, input: T) -> T {
self.transforms.into_iter().fold(input, |val, f| f(val))
}
}
fn main() {
let result = Pipeline::new()
.pipe(|s: String| s.trim().to_string())
.pipe(|s| s.to_uppercase())
.pipe(|s| format!(">>> {s} <<<"))
.execute(" hello world ".to_string());
println!("{result}"); // >>> HELLO WORLD <<<
// Numeric pipeline:
let result = Pipeline::new()
.pipe(|x: i32| x * 2)
.pipe(|x| x + 10)
.pipe(|x| x * x)
.execute(5);
println!("{result}"); // (5*2 + 10)^2 = 400
}
Bonus: Generic pipeline that changes type between stages would use a different design — each .pipe() returns a Pipeline with a different output type (this requires more advanced generic plumbing).
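One hedged sketch of that bonus design: store the whole chain as a single composed closure, so each `.pipe()` returns a pipeline with a possibly different output type. The `TypedPipeline` name and method shapes are illustrative, not the only way to do it:

```rust
// The chain is one Fn(I) -> O; .pipe() wraps it and may change O to a new type N.
struct TypedPipeline<I, O> {
    run: Box<dyn Fn(I) -> O>,
}

impl<I: 'static> TypedPipeline<I, I> {
    fn new() -> Self {
        // Identity pipeline: input type == output type until a stage changes it.
        TypedPipeline { run: Box::new(|x| x) }
    }
}

impl<I: 'static, O: 'static> TypedPipeline<I, O> {
    fn pipe<N: 'static>(self, f: impl Fn(O) -> N + 'static) -> TypedPipeline<I, N> {
        let prev = self.run;
        TypedPipeline { run: Box::new(move |x| f(prev(x))) }
    }

    fn execute(&self, input: I) -> O {
        (self.run)(input)
    }
}

fn main() {
    let p = TypedPipeline::new()
        .pipe(|x: i32| x * 2)        // i32 -> i32
        .pipe(|x| x.to_string())     // i32 -> String (type changes here)
        .pipe(|s| format!("[{s}]")); // String -> String
    assert_eq!(p.execute(21), "[42]");
    println!("{}", p.execute(21));
}
```

The cost of this flexibility is one `Box` per stage and no `Vec` of stages — the types differ, so the chain must be composed rather than collected.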
Exercise 5: Error Hierarchy with thiserror ★★ (~30 min)
Design an error type hierarchy for a file-processing application that can fail during I/O, parsing (JSON and CSV), and validation. Use thiserror and demonstrate ? propagation.
🔑 Solution
use thiserror::Error;
#[derive(Error, Debug)]
pub enum AppError {
#[error("I/O error: {0}")]
Io(#[from] std::io::Error),
#[error("JSON parse error: {0}")]
Json(#[from] serde_json::Error),
#[error("CSV error at line {line}: {message}")]
Csv { line: usize, message: String },
#[error("validation error: {field} — {reason}")]
Validation { field: String, reason: String },
}
fn read_file(path: &str) -> Result<String, AppError> {
Ok(std::fs::read_to_string(path)?) // io::Error → AppError::Io via #[from]
}
fn parse_json(content: &str) -> Result<serde_json::Value, AppError> {
Ok(serde_json::from_str(content)?) // serde_json::Error → AppError::Json
}
fn validate_name(value: &serde_json::Value) -> Result<String, AppError> {
let name = value.get("name")
.and_then(|v| v.as_str())
.ok_or_else(|| AppError::Validation {
field: "name".into(),
reason: "must be a non-null string".into(),
})?;
if name.is_empty() {
return Err(AppError::Validation {
field: "name".into(),
reason: "must not be empty".into(),
});
}
Ok(name.to_string())
}
fn process_file(path: &str) -> Result<String, AppError> {
let content = read_file(path)?;
let json = parse_json(&content)?;
let name = validate_name(&json)?;
Ok(name)
}
fn main() {
match process_file("config.json") {
Ok(name) => println!("Name: {name}"),
Err(e) => eprintln!("Error: {e}"),
}
}
Exercise 6: Generic Trait with Associated Types ★★★ (~40 min)
Design a Repository<T> trait with associated Error and Id types. Implement it for an in-memory store and demonstrate compile-time type safety.
🔑 Solution
use std::collections::HashMap;
trait Repository {
type Item;
type Id;
type Error;
fn get(&self, id: &Self::Id) -> Result<Option<&Self::Item>, Self::Error>;
fn insert(&mut self, item: Self::Item) -> Result<Self::Id, Self::Error>;
fn delete(&mut self, id: &Self::Id) -> Result<bool, Self::Error>;
}
#[derive(Debug, Clone)]
struct User {
name: String,
email: String,
}
struct InMemoryUserRepo {
data: HashMap<u64, User>,
next_id: u64,
}
impl InMemoryUserRepo {
fn new() -> Self {
InMemoryUserRepo { data: HashMap::new(), next_id: 1 }
}
}
// Error type is Infallible — in-memory ops never fail
impl Repository for InMemoryUserRepo {
type Item = User;
type Id = u64;
type Error = std::convert::Infallible;
fn get(&self, id: &u64) -> Result<Option<&User>, Self::Error> {
Ok(self.data.get(id))
}
fn insert(&mut self, item: User) -> Result<u64, Self::Error> {
let id = self.next_id;
self.next_id += 1;
self.data.insert(id, item);
Ok(id)
}
fn delete(&mut self, id: &u64) -> Result<bool, Self::Error> {
Ok(self.data.remove(id).is_some())
}
}
// Generic function works with ANY repository:
fn create_and_fetch<R: Repository>(repo: &mut R, item: R::Item) -> Result<(), R::Error>
where
R::Item: std::fmt::Debug,
R::Id: std::fmt::Debug,
{
let id = repo.insert(item)?;
println!("Inserted with id: {id:?}");
let retrieved = repo.get(&id)?;
println!("Retrieved: {retrieved:?}");
Ok(())
}
fn main() {
let mut repo = InMemoryUserRepo::new();
create_and_fetch(&mut repo, User {
name: "Alice".into(),
email: "alice@example.com".into(),
}).unwrap();
}
Exercise 7: Safe Wrapper around Unsafe (Ch11) ★★★ (~45 min)
Write a FixedVec<T, const N: usize> — a fixed-capacity, stack-allocated vector.
Requirements:
- `push(&mut self, value: T) -> Result<(), T>` returns `Err(value)` when full
- `pop(&mut self) -> Option<T>` returns and removes the last element
- `as_slice(&self) -> &[T]` borrows initialized elements
- All public methods must be safe; all unsafe must be encapsulated with `SAFETY:` comments
- `Drop` must clean up initialized elements
Hint: Use MaybeUninit<T> and [const { MaybeUninit::uninit() }; N].
🔑 Solution
use std::mem::MaybeUninit;
pub struct FixedVec<T, const N: usize> {
data: [MaybeUninit<T>; N],
len: usize,
}
impl<T, const N: usize> FixedVec<T, N> {
pub fn new() -> Self {
FixedVec {
data: [const { MaybeUninit::uninit() }; N],
len: 0,
}
}
pub fn push(&mut self, value: T) -> Result<(), T> {
if self.len >= N { return Err(value); }
// len < N was checked above, so this (safe, bounds-checked) index is in range.
self.data[self.len] = MaybeUninit::new(value);
self.len += 1;
Ok(())
}
pub fn pop(&mut self) -> Option<T> {
if self.len == 0 { return None; }
self.len -= 1;
// SAFETY: data[len] was initialized (len was > 0 before decrement).
Some(unsafe { self.data[self.len].assume_init_read() })
}
pub fn as_slice(&self) -> &[T] {
// SAFETY: data[0..len] are all initialized, and MaybeUninit<T>
// has the same layout as T.
unsafe { std::slice::from_raw_parts(self.data.as_ptr() as *const T, self.len) }
}
pub fn len(&self) -> usize { self.len }
pub fn is_empty(&self) -> bool { self.len == 0 }
}
impl<T, const N: usize> Drop for FixedVec<T, N> {
fn drop(&mut self) {
// SAFETY: data[0..len] are initialized — drop each one.
for i in 0..self.len {
unsafe { self.data[i].assume_init_drop(); }
}
}
}
fn main() {
let mut v = FixedVec::<String, 4>::new();
v.push("hello".into()).unwrap();
v.push("world".into()).unwrap();
assert_eq!(v.as_slice(), &["hello", "world"]);
assert_eq!(v.pop(), Some("world".into()));
assert_eq!(v.len(), 1);
// Drop cleans up remaining "hello"
}
Exercise 8: Declarative Macro — map! (Ch12) ★ (~15 min)
Write a map! macro that creates a HashMap from key-value pairs, similar to vec![]:
#![allow(unused)]
fn main() {
let m = map! {
"host" => "localhost",
"port" => "8080",
};
assert_eq!(m.get("host"), Some(&"localhost"));
assert_eq!(m.len(), 2);
}
Requirements:
- Support trailing comma
- Support empty invocation `map!{}`
- Work with any key and value types the surrounding context can infer (the macro itself is untyped)
🔑 Solution
macro_rules! map {
// Empty case
() => {
std::collections::HashMap::new()
};
// One or more key => value pairs (trailing comma optional)
( $( $key:expr => $val:expr ),+ $(,)? ) => {{
let mut m = std::collections::HashMap::new();
$( m.insert($key, $val); )+
m
}};
}
fn main() {
// Basic usage:
let config = map! {
"host" => "localhost",
"port" => "8080",
"timeout" => "30",
};
assert_eq!(config.len(), 3);
assert_eq!(config["host"], "localhost");
// Empty map:
let empty: std::collections::HashMap<String, String> = map!();
assert!(empty.is_empty());
// Different types:
let scores = map! {
1 => 100,
2 => 200,
};
assert_eq!(scores[&1], 100);
}
Exercise 9: Custom serde Deserialization (Ch10) ★★★ (~45 min)
Design a Duration wrapper that deserializes from human-readable strings like "30s", "5m", "2h" using a custom serde deserializer. The struct should also serialize back to the same format.
🔑 Solution
use serde::{Deserialize, Deserializer, Serialize, Serializer};
use std::fmt;
#[derive(Debug, Clone, PartialEq)]
struct HumanDuration(std::time::Duration);
impl HumanDuration {
fn from_str(s: &str) -> Result<Self, String> {
let s = s.trim();
if s.is_empty() { return Err("empty duration string".into()); }
let (num_str, suffix) = s.split_at(
s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len())
);
let value: u64 = num_str.parse()
.map_err(|_| format!("invalid number: {num_str}"))?;
let duration = match suffix {
"s" | "sec" => std::time::Duration::from_secs(value),
"m" | "min" => std::time::Duration::from_secs(value * 60),
"h" | "hr" => std::time::Duration::from_secs(value * 3600),
"ms" => std::time::Duration::from_millis(value),
other => return Err(format!("unknown suffix: {other}")),
};
Ok(HumanDuration(duration))
}
}
impl fmt::Display for HumanDuration {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
let secs = self.0.as_secs();
if secs == 0 {
write!(f, "{}ms", self.0.as_millis())
} else if secs % 3600 == 0 {
write!(f, "{}h", secs / 3600)
} else if secs % 60 == 0 {
write!(f, "{}m", secs / 60)
} else {
write!(f, "{}s", secs)
}
}
}
impl Serialize for HumanDuration {
fn serialize<S: Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
serializer.serialize_str(&self.to_string())
}
}
impl<'de> Deserialize<'de> for HumanDuration {
fn deserialize<D: Deserializer<'de>>(deserializer: D) -> Result<Self, D::Error> {
let s = String::deserialize(deserializer)?;
HumanDuration::from_str(&s).map_err(serde::de::Error::custom)
}
}
#[derive(Debug, Deserialize, Serialize)]
struct Config {
timeout: HumanDuration,
retry_interval: HumanDuration,
}
fn main() {
let json = r#"{ "timeout": "30s", "retry_interval": "5m" }"#;
let config: Config = serde_json::from_str(json).unwrap();
assert_eq!(config.timeout.0, std::time::Duration::from_secs(30));
assert_eq!(config.retry_interval.0, std::time::Duration::from_secs(300));
// Round-trips correctly:
let serialized = serde_json::to_string(&config).unwrap();
assert!(serialized.contains("30s"));
assert!(serialized.contains("5m"));
println!("Config: {serialized}");
}
Exercise 10 — Concurrent Fetcher with Timeout ★★ (~25 min)
Write an async function fetch_all that spawns three tokio::spawn tasks, each
simulating a network call with tokio::time::sleep. Join all three with
tokio::try_join! wrapped in tokio::time::timeout(Duration::from_secs(5), ...).
Return Result<Vec<String>, ...> or an error if any task fails or the deadline
expires.
Learning goals: tokio::spawn, try_join!, timeout, error propagation
across task boundaries.
Hint
Each spawned task returns Result<String, _>. try_join! unwraps all three.
Wrap the whole try_join! in timeout() — the Elapsed error means you hit the
deadline.
Solution
use tokio::time::{sleep, timeout, Duration};
async fn fake_fetch(name: &'static str, delay_ms: u64) -> Result<String, String> {
sleep(Duration::from_millis(delay_ms)).await;
Ok(format!("{name}: OK"))
}
async fn fetch_all() -> Result<Vec<String>, Box<dyn std::error::Error>> {
let deadline = Duration::from_secs(5);
let (a, b, c) = timeout(deadline, async {
let h1 = tokio::spawn(fake_fetch("svc-a", 100));
let h2 = tokio::spawn(fake_fetch("svc-b", 200));
let h3 = tokio::spawn(fake_fetch("svc-c", 150));
tokio::try_join!(h1, h2, h3)
})
.await??; // first ? = timeout, second ? = join
Ok(vec![a?, b?, c?]) // unwrap inner Results
}
#[tokio::main]
async fn main() {
let results = fetch_all().await.unwrap();
for r in &results {
println!("{r}");
}
}
Exercise 11 — Async Channel Pipeline ★★★ (~40 min)
Build a producer → transformer → consumer pipeline using tokio::sync::mpsc:
- Producer: sends integers 1..=20 into channel A (capacity 4).
- Transformer: reads from channel A, squares each value, sends into channel B.
- Consumer: reads from channel B, collects into a
Vec<u64>, returns it.
All three stages run as concurrent tokio::spawn tasks. Use bounded channels to
demonstrate back-pressure. Assert the final vec equals [1, 4, 9, ..., 400].
Learning goals: mpsc::channel, bounded back-pressure, tokio::spawn with
move closures, graceful shutdown via channel close.
Solution
use tokio::sync::mpsc;
#[tokio::main]
async fn main() {
let (tx_a, mut rx_a) = mpsc::channel::<u64>(4); // bounded — back-pressure
let (tx_b, mut rx_b) = mpsc::channel::<u64>(4);
// Producer
let producer = tokio::spawn(async move {
for i in 1..=20u64 {
tx_a.send(i).await.unwrap();
}
// tx_a dropped here → channel A closes
});
// Transformer
let transformer = tokio::spawn(async move {
while let Some(val) = rx_a.recv().await {
tx_b.send(val * val).await.unwrap();
}
// tx_b dropped here → channel B closes
});
// Consumer
let consumer = tokio::spawn(async move {
let mut results = Vec::new();
while let Some(val) = rx_b.recv().await {
results.push(val);
}
results
});
producer.await.unwrap();
transformer.await.unwrap();
let results = consumer.await.unwrap();
let expected: Vec<u64> = (1..=20).map(|x: u64| x * x).collect();
assert_eq!(results, expected);
println!("Pipeline complete: {results:?}");
}
Summary and Reference Card
Quick Reference Card
Pattern Decision Guide
Need type safety for primitives?
└── Newtype pattern (Ch3)
Need compile-time state enforcement?
└── Type-state pattern (Ch3)
Need a "tag" with no runtime data?
└── PhantomData (Ch4)
Need to break Rc/Arc reference cycles?
└── Weak<T> / sync::Weak<T> (Ch8)
Need to wait for a condition without busy-looping?
└── Condvar + Mutex (Ch6)
Need to handle "one of N types"?
├── Known closed set → Enum
├── Open set, hot path → Generics
├── Open set, cold path → dyn Trait
└── Completely unknown types → Any + TypeId (Ch2)
Need shared state across threads?
├── Simple counter/flag → Atomics
├── Short critical section → Mutex
├── Read-heavy → RwLock
├── Lazy one-time init → OnceLock / LazyLock (Ch6)
└── Complex state → Actor + Channels
Need to parallelize computation?
├── Collection processing → rayon::par_iter
├── Background task → thread::spawn
└── Borrow local data → thread::scope
Need async I/O or concurrent networking?
├── Basic → tokio + async/await (Ch15)
└── Advanced (streams, middleware) → see Async Rust Training
Need error handling?
├── Library → thiserror (#[derive(Error)])
└── Application → anyhow (Result<T>)
Need to prevent a value from being moved?
└── Pin<T> (Ch8) — required for Futures, self-referential types
Trait Bounds Cheat Sheet
| Bound | Meaning |
|---|---|
T: Clone | Can be duplicated |
T: Send | Can be moved to another thread |
T: Sync | &T can be shared between threads |
T: 'static | Contains no non-static references |
T: Sized | Size known at compile time (default) |
T: ?Sized | Size may not be known ([T], dyn Trait) |
T: Unpin | Safe to move after pinning |
T: Default | Has a default value |
T: Into<U> | Can be converted to U |
T: AsRef<U> | Can be borrowed as &U |
T: Deref<Target = U> | Auto-derefs to &U |
F: Fn(A) -> B | Callable, borrows state immutably |
F: FnMut(A) -> B | Callable, may mutate state |
F: FnOnce(A) -> B | Callable exactly once, may consume state |
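The three closure bounds at the bottom of the table can be seen side by side in a small sketch (the helper names are illustrative):

```rust
// Each helper demands a different closure trait from the cheat sheet.
fn call_fn(f: impl Fn() -> i32) -> i32 { f() }                 // borrows immutably
fn call_fn_mut(mut f: impl FnMut() -> i32) -> i32 { f() + f() } // may mutate state
fn call_fn_once(f: impl FnOnce() -> String) -> String { f() }   // may consume state

fn main() {
    let base = 40;
    // Fn: only reads `base`; could be called any number of times.
    assert_eq!(call_fn(|| base + 2), 42);

    let mut counter = 0;
    // FnMut: mutates `counter` on each call (first call -> 1, second -> 2).
    assert_eq!(call_fn_mut(|| { counter += 1; counter }), 3);

    let owned = String::from("consumed");
    // FnOnce: moves `owned` out of the closure — callable exactly once.
    assert_eq!(call_fn_once(move || owned), "consumed");
    println!("all three closure traits exercised");
}
```

Every `Fn` is also `FnMut`, and every `FnMut` is also `FnOnce` — so ask for the weakest bound (`FnOnce`) your API can accept.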
Lifetime Elision Rules
The compiler inserts lifetimes automatically in three cases (so you don’t have to):
#![allow(unused)]
fn main() {
// Rule 1: Each reference parameter gets its own lifetime
// fn foo(x: &str, y: &str) → fn foo<'a, 'b>(x: &'a str, y: &'b str)
// Rule 2: If there's exactly ONE input lifetime, it's used for all outputs
// fn foo(x: &str) -> &str → fn foo<'a>(x: &'a str) -> &'a str
// Rule 3: If one parameter is &self or &mut self, its lifetime is used
// fn foo(&self, x: &str) -> &str → fn foo<'a>(&'a self, x: &str) -> &'a str
}
When you MUST write explicit lifetimes:
- Multiple input references and a reference output (compiler can't guess which input)
- Struct fields that hold references: `struct Ref<'a> { data: &'a str }`
- `'static` bounds when you need data without borrowed references
Common Derive Traits
#![allow(unused)]
fn main() {
#[derive(
Debug, // {:?} formatting
Clone, // .clone()
Copy, // Implicit copy (only for simple types)
PartialEq, Eq, // == comparison
PartialOrd, Ord, // < > comparison + sorting
Hash, // HashMap/HashSet key
Default, // Type::default()
)]
struct MyType { /* ... */ }
}
Module Visibility Quick Reference
pub → visible everywhere
pub(crate) → visible within the crate
pub(super) → visible to parent module
pub(in path) → visible within a specific path
(nothing) → private to current module + children
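A quick sketch exercising each visibility level (module and function names are invented for illustration):

```rust
mod outer {
    pub mod inner {
        pub fn everywhere() -> &'static str { "pub" }
        pub(crate) fn crate_only() -> &'static str { "pub(crate)" }
        pub(super) fn parent_only() -> &'static str { "pub(super)" }
        fn private() -> &'static str { "private" }

        // Private items are reachable from their own module.
        pub fn call_private() -> &'static str { private() }
    }

    pub fn from_parent() -> &'static str {
        inner::parent_only() // ok: `outer` is the parent of `inner`
    }
}

fn main() {
    assert_eq!(outer::inner::everywhere(), "pub");
    assert_eq!(outer::inner::crate_only(), "pub(crate)"); // same crate: visible
    assert_eq!(outer::from_parent(), "pub(super)");
    // outer::inner::parent_only(); // ❌ only visible inside `outer`
    // outer::inner::private();     // ❌ private to `inner` and its children
    println!("visibility rules demonstrated");
}
```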
Further Reading
| Resource | Why |
|---|---|
| Rust Design Patterns | Catalog of idiomatic patterns and anti-patterns |
| Rust API Guidelines | Official checklist for polished public APIs |
| Rust Atomics and Locks | Mara Bos’s deep dive into concurrency primitives |
| The Rustonomicon | Official guide to unsafe Rust and dark corners |
| Error Handling in Rust | Andrew Gallant’s comprehensive guide |
| Jon Gjengset — Crust of Rust series | Deep dives into iterators, lifetimes, channels, etc. |
| Effective Rust | 35 specific ways to improve your Rust code |
End of Rust Patterns & Engineering How-Tos
Capstone Project: Type-Safe Task Scheduler
This project integrates patterns from across the book into a single, production-style system. You’ll build a type-safe, concurrent task scheduler that uses generics, traits, typestate, channels, error handling, and testing.
Estimated time: 4–6 hours | Difficulty: ★★★
What you’ll practice:
- Generics and trait bounds (Ch 1–2)
- Typestate pattern for task lifecycle (Ch 3)
- PhantomData for zero-cost state markers (Ch 4)
- Channels for worker communication (Ch 5)
- Concurrency with scoped threads (Ch 6)
- Error handling with `thiserror` (Ch 9)
- Testing with property-based tests (Ch 13)
- API design with `TryFrom` and validated types (Ch 14)
The Problem
Build a task scheduler where:
- Tasks have a typed lifecycle: `Pending → Running → Completed` (or `Failed`)
- Workers pull tasks from a channel, execute them, and report results
- The scheduler manages task submission, worker coordination, and result collection
- Invalid state transitions are compile-time errors
stateDiagram-v2
[*] --> Pending: scheduler.submit(task)
Pending --> Running: worker picks up task
Running --> Completed: task succeeds
Running --> Failed: task returns Err
Completed --> [*]: scheduler.results()
Failed --> [*]: scheduler.results()
Pending --> Pending: ❌ can't execute directly
Completed --> Running: ❌ can't re-run
Step 1: Define the Task Types
Start with the typestate markers and a generic Task:
#![allow(unused)]
fn main() {
use std::marker::PhantomData;
// --- State markers (zero-sized) ---
struct Pending;
struct Running;
struct Completed;
struct Failed;
// --- Task ID (newtype for type safety) ---
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct TaskId(u64);
// --- The Task struct, parameterized by lifecycle state ---
struct Task<State, R> {
id: TaskId,
name: String,
_state: PhantomData<State>,
_result: PhantomData<R>,
}
}
Your job: Implement state transitions so that:
- `Task<Pending, R>` can transition to `Task<Running, R>` (via `start()`)
- `Task<Running, R>` can transition to `Task<Completed, R>` or `Task<Failed, R>`
- No other transitions compile
💡 Hint
Each transition method should consume self and return the new state:
#![allow(unused)]
fn main() {
impl<R> Task<Pending, R> {
fn start(self) -> Task<Running, R> {
Task {
id: self.id,
name: self.name,
_state: PhantomData,
_result: PhantomData,
}
}
}
}
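The same shape extends to the remaining transitions. Here is a self-contained miniature (it drops the `R` parameter and the `Failed` state for brevity — your capstone version keeps both), showing why only the declared transitions compile:

```rust
use std::marker::PhantomData;

struct Pending;
struct Running;
struct Completed;

struct Task<State> {
    name: String,
    _state: PhantomData<State>,
}

impl Task<Pending> {
    fn new(name: &str) -> Self {
        Task { name: name.into(), _state: PhantomData }
    }
    // Consuming self means the Pending task can't be reused after starting.
    fn start(self) -> Task<Running> {
        Task { name: self.name, _state: PhantomData }
    }
}

impl Task<Running> {
    fn complete(self) -> Task<Completed> {
        Task { name: self.name, _state: PhantomData }
    }
}

fn main() {
    let t = Task::new("demo").start().complete();
    assert_eq!(t.name, "demo");
    // Task::new("demo").complete(); // ❌ no `complete` on Task<Pending>
    println!("{} reached Completed", t.name);
}
```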
Step 2: Define the Work Function
Tasks need a function to execute. Use a boxed closure:
#![allow(unused)]
fn main() {
struct WorkItem<R: Send + 'static> {
id: TaskId,
name: String,
work: Box<dyn FnOnce() -> Result<R, String> + Send>,
}
}
Your job: Implement WorkItem::new() that accepts a task name and closure.
Add a TaskId generator (simple atomic counter or mutex-protected counter).
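One possible shape for this step, using the atomic-counter option — `next_task_id` and the `NEXT_ID` static are illustrative names, not prescribed by the project:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct TaskId(u64);

// Process-wide monotonically increasing ID source (one possible design).
static NEXT_ID: AtomicU64 = AtomicU64::new(1);

fn next_task_id() -> TaskId {
    TaskId(NEXT_ID.fetch_add(1, Ordering::Relaxed))
}

struct WorkItem<R: Send + 'static> {
    id: TaskId,
    name: String,
    work: Box<dyn FnOnce() -> Result<R, String> + Send>,
}

impl<R: Send + 'static> WorkItem<R> {
    fn new(
        name: impl Into<String>,
        work: impl FnOnce() -> Result<R, String> + Send + 'static,
    ) -> Self {
        WorkItem { id: next_task_id(), name: name.into(), work: Box::new(work) }
    }
}

fn main() {
    let item = WorkItem::new("demo", || Ok::<_, String>(42));
    let first = item.id;
    let second = WorkItem::<i32>::new("demo-2", || Ok(7)).id;
    assert!(second.0 > first.0); // IDs are strictly increasing
    assert_eq!((item.work)(), Ok(42));
    println!("ids: {first:?}, {second:?}");
}
```

`Relaxed` ordering is enough here: we only need each `fetch_add` to be atomic, not to synchronize other memory.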
Step 3: Error Handling
Define the scheduler’s error types using thiserror:
use thiserror::Error;
#[derive(Error, Debug)]
pub enum SchedulerError {
#[error("scheduler is shut down")]
ShutDown,
#[error("task {0:?} failed: {1}")]
TaskFailed(TaskId, String),
#[error("channel send error")]
ChannelError(#[from] std::sync::mpsc::SendError<()>),
#[error("worker panicked")]
WorkerPanic,
}
Step 4: The Scheduler
Build the scheduler using channels (Ch 5) and scoped threads (Ch 6):
#![allow(unused)]
fn main() {
use std::sync::mpsc;
struct Scheduler<R: Send + 'static> {
sender: Option<mpsc::Sender<WorkItem<R>>>,
results: mpsc::Receiver<TaskResult<R>>,
num_workers: usize,
}
struct TaskResult<R> {
id: TaskId,
name: String,
outcome: Result<R, String>,
}
}
Your job: Implement:
- `Scheduler::new(num_workers: usize) -> Self` — creates channels and spawns workers
- `Scheduler::submit(&self, item: WorkItem<R>) -> Result<TaskId, SchedulerError>`
- `Scheduler::shutdown(self) -> Vec<TaskResult<R>>` — drops the sender, joins workers, collects results
💡 Hint — Worker loop
#![allow(unused)]
fn main() {
fn worker_loop<R: Send + 'static>(
rx: std::sync::Arc<std::sync::Mutex<mpsc::Receiver<WorkItem<R>>>>,
result_tx: mpsc::Sender<TaskResult<R>>,
worker_id: usize,
) {
loop {
let item = {
let rx = rx.lock().unwrap();
rx.recv()
};
match item {
Ok(work_item) => {
let outcome = (work_item.work)();
let _ = result_tx.send(TaskResult {
id: work_item.id,
name: work_item.name,
outcome,
});
}
Err(_) => break, // Channel closed
}
}
}
}
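If you get stuck on the overall wiring, here is one simplified, self-contained sketch of the scheduler's lifecycle. It deliberately diverges from the scaffold above to stay short — it stores `JoinHandle`s instead of using scoped threads, and uses `String` errors instead of `TaskId`/`SchedulerError` — so treat it as a shape to adapt, not the answer:

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

struct WorkItem<R: Send + 'static> {
    name: String,
    work: Box<dyn FnOnce() -> Result<R, String> + Send>,
}

struct TaskResult<R> {
    name: String,
    outcome: Result<R, String>,
}

struct Scheduler<R: Send + 'static> {
    sender: Option<mpsc::Sender<WorkItem<R>>>,
    results: mpsc::Receiver<TaskResult<R>>,
    workers: Vec<thread::JoinHandle<()>>,
}

impl<R: Send + 'static> Scheduler<R> {
    fn new(num_workers: usize) -> Self {
        let (job_tx, job_rx) = mpsc::channel::<WorkItem<R>>();
        let (res_tx, res_rx) = mpsc::channel::<TaskResult<R>>();
        let job_rx = Arc::new(Mutex::new(job_rx));
        let workers = (0..num_workers)
            .map(|_| {
                let rx = Arc::clone(&job_rx);
                let tx = res_tx.clone();
                thread::spawn(move || loop {
                    // Short critical section: lock only to receive.
                    let item = { rx.lock().unwrap().recv() };
                    match item {
                        Ok(w) => {
                            let outcome = (w.work)();
                            let _ = tx.send(TaskResult { name: w.name, outcome });
                        }
                        Err(_) => break, // job channel closed — exit
                    }
                })
            })
            .collect();
        // The original res_tx drops here, so the result channel closes
        // once every worker (holding a clone) has exited.
        Scheduler { sender: Some(job_tx), results: res_rx, workers }
    }

    fn submit(&self, item: WorkItem<R>) -> Result<(), String> {
        self.sender
            .as_ref()
            .ok_or_else(|| "scheduler is shut down".to_string())?
            .send(item)
            .map_err(|_| "workers are gone".to_string())
    }

    fn shutdown(mut self) -> Vec<TaskResult<R>> {
        self.sender.take(); // close the job channel; workers drain and exit
        let results: Vec<_> = self.results.iter().collect(); // ends when channel closes
        for h in self.workers.drain(..) {
            let _ = h.join();
        }
        results
    }
}

fn main() {
    let s = Scheduler::<i32>::new(2);
    for i in 0..5 {
        s.submit(WorkItem { name: format!("t{i}"), work: Box::new(move || Ok(i * i)) })
            .unwrap();
    }
    let results = s.shutdown();
    assert_eq!(results.len(), 5);
    println!("collected {} results", results.len());
}
```

The key ordering detail: dropping the job sender first lets workers exit, which drops their result senders, which in turn terminates the result-collection loop.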
Step 5: Integration Test
Write tests that verify:
- Happy path: Submit 10 tasks, shut down, verify all 10 results are `Ok`
- Error handling: Submit tasks that fail, verify `TaskResult.outcome` is `Err`
- Empty scheduler: Create and immediately shut down — no panics
- Property test (bonus): Use `proptest` to verify that for any N tasks (1..100), the scheduler always returns exactly N results
#![allow(unused)]
fn main() {
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn happy_path() {
let scheduler = Scheduler::<String>::new(4);
for i in 0..10 {
let item = WorkItem::new(
format!("task-{i}"),
move || Ok(format!("result-{i}")),
);
scheduler.submit(item).unwrap();
}
let results = scheduler.shutdown();
assert_eq!(results.len(), 10);
for r in &results {
assert!(r.outcome.is_ok());
}
}
#[test]
fn handles_failures() {
let scheduler = Scheduler::<String>::new(2);
scheduler.submit(WorkItem::new("good", || Ok("ok".into()))).unwrap();
scheduler.submit(WorkItem::new("bad", || Err("boom".into()))).unwrap();
let results = scheduler.shutdown();
assert_eq!(results.len(), 2);
let failures: Vec<_> = results.iter()
.filter(|r| r.outcome.is_err())
.collect();
assert_eq!(failures.len(), 1);
}
}
}
Step 6: Put It All Together
Here’s the main() that demonstrates the full system:
fn main() {
let scheduler = Scheduler::<String>::new(4);
// Submit tasks with varying workloads
for i in 0..20 {
let item = WorkItem::new(
format!("compute-{i}"),
move || {
// Simulate work
std::thread::sleep(std::time::Duration::from_millis(10));
if i % 7 == 0 {
Err(format!("task {i} hit a simulated error"))
} else {
Ok(format!("task {i} completed with value {}", i * i))
}
},
);
// NOTE: .unwrap() is used for brevity — handle SendError in production.
scheduler.submit(item).unwrap();
}
println!("All tasks submitted. Shutting down...");
let results = scheduler.shutdown();
let (ok, err): (Vec<_>, Vec<_>) = results.iter()
.partition(|r| r.outcome.is_ok());
println!("\n✅ Succeeded: {}", ok.len());
for r in &ok {
println!(" {} → {}", r.name, r.outcome.as_ref().unwrap());
}
println!("\n❌ Failed: {}", err.len());
for r in &err {
println!(" {} → {}", r.name, r.outcome.as_ref().unwrap_err());
}
}
Evaluation Criteria
| Criterion | Target |
|---|---|
| Type safety | Invalid state transitions don’t compile |
| Concurrency | Workers run in parallel, no data races |
| Error handling | All failures captured in TaskResult, no panics |
| Testing | At least 3 tests; bonus for proptest |
| Code organization | Clean module structure, public API uses validated types |
| Documentation | Key types have doc comments explaining invariants |
Extension Ideas
Once the basic scheduler works, try these enhancements:
- Priority queue: Add a
Prioritynewtype (1–10) and process higher-priority tasks first - Retry policy: Failed tasks retry up to N times before being marked permanently failed
- Cancellation: Add a
cancel(TaskId)method that removes pending tasks - Async version: Port to
tokio::spawnwithtokio::sync::mpscchannels (Ch 15) - Metrics: Track per-worker task counts, average execution time, and failure rates