# Rarakoon

Rarakoon is a Rust port of Arakoon, a distributed key-value store with a preference for consistency, implementing the Multi-Paxos consensus protocol.
The original Arakoon is written in OCaml on top of Tokyo Cabinet. This port replaces OCaml/Lwt with Rust/Tokio, Tokyo Cabinet with pluggable storage backends (redb on-disk B-tree or in-memory BTreeMap), and OCaml serialization with rkyv zero-copy deserialization.
## Architecture
The workspace is split into 12 crates with the following dependency graph:

```
rarakoon-bin
  |
  v
rarakoon-node
  |
  +---> paxos-core      (pure Multi-Paxos FSM, no I/O)
  +---> kv-store        (pluggable: redb on-disk or in-memory BTreeMap)
  +---> head-db         (periodic KV snapshots for fast restart)
  +---> write-ahead-log (segment WAL with CRC32C + zstd)
  +---> tcp-messenger   (async node-to-node TCP messaging)
  |       +---> connection-fsm (pure TCP connection state machine)
  |       +---> leaky-buffer   (bounded async MPSC with drop semantics)
  |       +---> wire-codec     (rkyv length-prefixed framing)
  +---> rarakoon-client (client wire protocol codec)

integration-tests (Turmoil-based deterministic network tests)
```
| Crate | Purpose |
|---|---|
| `leaky-buffer` | Bounded async MPSC channel that drops on overflow |
| `connection-fsm` | Pure TCP connection state machine with backoff |
| `wire-codec` | rkyv-based length-prefixed frame codec |
| `write-ahead-log` | Segment-based WAL with CRC32C and zstd compression |
| `kv-store` | Pluggable KV store trait with redb (default) and in-memory backends |
| `head-db` | Periodic KV snapshots with CRC32C integrity (used with the in-memory backend) |
| `paxos-core` | Pure Multi-Paxos consensus FSM (no async, no I/O) |
| `tcp-messenger` | Async node-to-node TCP messaging layer |
| `rarakoon-client` | Client protocol codec (Arakoon-compatible binary wire format) |
| `rarakoon-node` | Node driver wiring the FSM to I/O, admin endpoint, catch-up, WAL compaction |
| `rarakoon-bin` | CLI entry point |
| `integration-tests` | Turmoil-based deterministic network simulation tests |
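The drop-on-overflow idea behind `leaky-buffer` can be sketched with the standard library's bounded channel. This is an illustrative, synchronous sketch only; the real crate is an async (Tokio-based) MPSC, and the `leaky_send` helper below is invented for the example:

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Push `msgs` into a bounded channel of `capacity`, dropping on overflow
/// instead of blocking. Returns (delivered, dropped_count).
fn leaky_send(capacity: usize, msgs: &[u32]) -> (Vec<u32>, usize) {
    let (tx, rx) = sync_channel::<u32>(capacity);
    let mut dropped = 0;
    for &m in msgs {
        // try_send never blocks: a full buffer rejects the new message.
        if let Err(TrySendError::Full(_)) = tx.try_send(m) {
            dropped += 1; // leaky semantics: the newest message is discarded
        }
    }
    drop(tx); // close the channel so the receiver iterator terminates
    (rx.iter().collect(), dropped)
}

fn main() {
    let (delivered, dropped) = leaky_send(2, &[0, 1, 2, 3, 4]);
    println!("delivered = {delivered:?}, dropped = {dropped}");
    // delivered = [0, 1], dropped = 3
}
```

A blocking `send` here would instead stall the producer on message 2, which is exactly the cascading slowdown the leaky design avoids.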
## Build and Test

```sh
cargo build --release
cargo test --workspace
cargo clippy --workspace

# Integration tests run separately (they use the Turmoil simulation feature):
cargo test -p rarakoon-integration-tests
```
## Container Build
A static-musl container image is available for chaos testing and deployment:
```sh
make image                 # builds rarakoon:test with Podman
make image RUNTIME=docker  # or with Docker
```
The image is a multi-stage Alpine 3.23 build producing a statically linked binary.
See `Dockerfile.rarakoon` and the `Makefile` for details.
## Run

```sh
cargo run --release --bin rarakoon-bin -- \
    --config cluster.toml \
    --node node_0 \
    --log-format pretty
```
## Configuration
Rarakoon uses TOML configuration files. Example:
```toml
[cluster]
id = "my-cluster"
master_mode = "elected"   # "elected" or "readonly"
lease_period_secs = 10

[[nodes]]
name = "node_0"
addresses = ["127.0.0.1:4010"]
client_port = 4000
home_dir = "/var/lib/rarakoon/node_0"
compressor = "zstd"

[[nodes]]
name = "node_1"
addresses = ["127.0.0.1:4011"]
client_port = 4001
home_dir = "/var/lib/rarakoon/node_1"
compressor = "zstd"

[[nodes]]
name = "node_2"
addresses = ["127.0.0.1:4012"]
client_port = 4002
home_dir = "/var/lib/rarakoon/node_2"
compressor = "zstd"
```
Each node exposes three ports:
- **messaging port** (configured in `addresses`): inter-node Paxos communication
- **client port**: client operations (Get, Set, Delete, etc.)
- **admin port** (`client_port + 1000`, configurable): HTTP endpoints for health, state, and metrics
## Storage Backend

The KV store backend is selected per node via a `[nodes.store]` section:
```toml
# Default: redb (on-disk, production)
[[nodes]]
name = "node_0"
home_dir = "/var/lib/rarakoon/node_0"

# Explicit in-memory mode (testing only)
[[nodes]]
name = "node_0"
home_dir = "/var/lib/rarakoon/node_0"

[nodes.store]
backend = "memory"
```
- `redb` (default): persistent on-disk B-tree via redb. Crash-safe, MVCC, datasets can exceed RAM.
- `memory`: in-memory BTreeMap plus head-db snapshots for restart acceleration. Useful for testing.
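The shape of a pluggable-backend abstraction like this can be sketched as a trait plus an in-memory implementation. This is a minimal sketch only; the method names and signatures are assumptions, not the repository's actual `KvStore`/`Snapshottable` API:

```rust
use std::collections::BTreeMap;

// Hypothetical trait sketch; the real trait's methods may differ.
trait KvStore {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
    fn set(&mut self, key: Vec<u8>, value: Vec<u8>);
    fn delete(&mut self, key: &[u8]) -> bool;
}

/// In-memory backend: a plain ordered map, as used for testing.
#[derive(Default)]
struct MemStore {
    map: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl KvStore for MemStore {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.map.get(key).cloned()
    }
    fn set(&mut self, key: Vec<u8>, value: Vec<u8>) {
        self.map.insert(key, value);
    }
    fn delete(&mut self, key: &[u8]) -> bool {
        self.map.remove(key).is_some()
    }
}

fn main() {
    // Callers only see the trait, so a redb-backed implementation
    // can be swapped in without touching consensus or transport code.
    let mut store: Box<dyn KvStore> = Box::new(MemStore::default());
    store.set(b"key".to_vec(), b"value".to_vec());
    println!("{:?}", store.get(b"key"));
}
```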
## Tuning

Operational thresholds are configurable under `[node.tuning]`:
```toml
[node.tuning]
snapshot_threshold = 10000   # catch-up mode decision threshold
catchup_batch_size = 1000    # entries per catch-up batch
compaction_threshold = 5000  # WAL compaction trigger
segments_to_keep = 2         # WAL segments retained after pruning
```
All fields are optional and default to the values shown above.
## Admin Endpoint
Each node runs an optional HTTP admin server (enabled by default, bound to 127.0.0.1):
- `GET /health`: 200 when running, 503 while starting. No auth required.
- `GET /state`: JSON snapshot of node state (`consensus_i`, `who_master`, `fsm_state`, etc.). Requires auth.
- `GET /metrics`: Prometheus text format. Requires auth.
Authentication uses a bearer token configured via the `RARAKOON_ADMIN_TOKEN` environment variable, a `token_file`, or inline in TOML. See ADR-017 for details.
## Test Hooks

The `test-hooks` Cargo feature enables env-var-driven fault injection for chaos testing. It is compiled out of release builds entirely.

```sh
cargo build -p rarakoon-node --features test-hooks
```
Available hooks: `RARAKOON_FAIL_FSYNC`, `RARAKOON_FAIL_WAL_APPEND`, `RARAKOON_DROP_SEND`, `RARAKOON_ELECTION_TIMEOUT_MS`. See ADR-018.
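One way such a hook can be compiled out entirely is a feature-gated function pair; the sketch below is illustrative only (the function name and exact mechanism are assumptions, not rarakoon-node's code):

```rust
// With the feature enabled, the hook consults the environment.
#[cfg(feature = "test-hooks")]
fn should_fail_fsync() -> bool {
    std::env::var("RARAKOON_FAIL_FSYNC")
        .map(|v| v == "1")
        .unwrap_or(false)
}

// Without the feature, the hook is a constant `false`: the env lookup
// (and the branch below, after optimization) vanishes from release builds.
#[cfg(not(feature = "test-hooks"))]
#[inline(always)]
fn should_fail_fsync() -> bool {
    false
}

fn main() {
    if should_fail_fsync() {
        eprintln!("injected fsync failure");
    } else {
        println!("fsync ok");
    }
}
```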
## Key Design Decisions
- **Pure consensus core**: `paxos-core` has zero dependencies on async runtimes, serialization libraries, or I/O. It is a pure `step(context, state, event) -> (state, effects)` function, making it easy to test and reason about.
- **Leaky back-pressure**: inter-node message queues drop new messages when full instead of blocking the sender, preventing cascading slowdowns in the cluster.
- **Zero-copy wire format**: node-to-node messages use rkyv for zero-copy deserialization, while the client protocol preserves Arakoon's original binary format for compatibility.
- **Pluggable storage**: the `KvStore` + `Snapshottable` trait abstraction allows swapping backends (redb for production, in-memory for tests) without touching consensus or transport code.
- **WAL compaction**: a background Collapser task prunes the WAL after snapshots, keeping disk usage bounded.
- **Enhanced catch-up**: nodes that fall behind use batched entry transfer for small gaps or full snapshot transfer for large ones, selected automatically by threshold.
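The pure step-function style can be illustrated with a toy state machine. All types below are invented for the example and are not paxos-core's actual types; the point is that a transition is a plain function from state and event to new state plus requested effects, with all I/O deferred to the driver:

```rust
// Toy events and effects, loosely Paxos-flavored.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Promise { node: u32 }, // a peer promised our ballot
    Timeout,               // election timer fired
}

#[derive(Debug, Clone, PartialEq)]
enum Effect {
    SendAccept,      // requested side effect, executed by the I/O driver
    RestartElection,
}

#[derive(Debug, Clone, PartialEq)]
struct State {
    promises: u32,
    quorum: u32,
}

/// Pure transition: no async, no I/O, trivially testable.
fn step(mut state: State, event: Event) -> (State, Vec<Effect>) {
    match event {
        Event::Promise { .. } => {
            state.promises += 1;
            if state.promises >= state.quorum {
                (state, vec![Effect::SendAccept])
            } else {
                (state, vec![])
            }
        }
        Event::Timeout => {
            state.promises = 0;
            (state, vec![Effect::RestartElection])
        }
    }
}

fn main() {
    let s = State { promises: 1, quorum: 2 };
    let (s, effects) = step(s, Event::Promise { node: 1 });
    println!("{s:?} -> {effects:?}");
}
```

Because the function owns no sockets or timers, a test is just a sequence of `step` calls with asserted outputs, which is what makes deterministic simulation (as in the Turmoil-based integration tests) practical.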
## Documentation

Architecture Decision Records are in `docs/adr/` (ADR-001 through ADR-022). Product Requirements Documents and implementation plans are in `docs/prd/` and `docs/plans/`.
## License
Apache-2.0