A Deterministic Binary
Document Format
Smaller than JSON. No schema required.
C core with native bindings for Python & Node.js. Arena-allocated, zigzag-varint encoded, depth-limited, and thread-safe.
import crous
# Serialize any Python object
data = {
"name": "Alice",
"scores": [98, 95, 100],
"active": True
}
# Encode → compact binary
binary = crous.dumps(data) # 42 bytes (vs 76 bytes JSON)
# Decode → original object
result = crous.loads(binary)
assert result == dataMeasured, not marketed
Real numbers from Python 3.14 on Apple Silicon (ARM64). 500 iterations per benchmark. No cherry-picking.
Serialization Throughput
Output Size
| Dataset | Crous | JSON | MsgPack | Protobuf |
|---|---|---|---|---|
| Small API | 227 B | 272 B | 216 B | 346 B |
| Config Object | 749 B | 955 B | 750 B | 1,132 B |
| Large Records | 224.9 KB | 286.3 KB | 220.0 KB | 334.0 KB |
| Numeric | 207.1 KB | 219.1 KB | 156.4 KB | 249.9 KB |
Protobuf measured using schema-less google.protobuf.Struct (fair comparison — all formats used without pre-defined schemas). MsgPack wins on raw numeric payloads; Crous wins on structured data.
Built for determinism
Every design decision serves a single goal: the same data must always produce the same bytes, fast.
Schema-Free
No .proto files. No code generation. Serialize any structure directly — the type system is inferred from the data.
Zero-Copy Where Possible
The C core reads directly from buffer memory. String and bytes values reference the source buffer without allocation.
Single-Pass Encoding
Data is written in one forward pass. No backpatching, no length-prefix lookups, no two-phase serialization.
Arena Allocation
All memory is allocated from a per-operation arena and freed in one shot. No per-node malloc/free churn.
Bounded Recursion
Configurable nesting depth limit (default: 64). Prevents stack overflow from adversarial or malformed input.
Deterministic Output
Identical input always produces identical bytes. Critical for content-addressable storage and digest verification.
How it works under the hood
From Python dict to compact binary — a single-pass pipeline through the modular C core.
Encoding Pipeline
Module Architecture
Zigzag + varint encoding. 0 costs 1 byte, ±63 costs 1 byte, full i64 costs up to 10 bytes.
UTF-8 validated. Length-prefixed with varint. No null terminator stored.
Tag byte + varint count + inline children. No offset tables needed.
Honest trade-offs
Every format has strengths. Here's where Crous fits — and where it doesn't.
| Feature | Crous | JSON | MsgPack | Protobuf |
|---|---|---|---|---|
| Schema Required | No | No | No | Yes |
| Human Readable | CROUT mode | Yes | No | No |
| Binary Format | Yes | No | Yes | Yes |
| Native Tuple/Set | Yes | No | No | No |
| Tagged Values | Yes | No | Ext types | Any |
| Streaming | Yes | No | Yes | Yes |
| Deterministic | Yes | Impl-dep. | Mostly | No |
| Language Support | 2 SDKs | Universal | 30+ langs | 12+ langs |
| Zero-Copy Decode | Partial | No | Partial | Yes |
| Depth Limiting | Built-in | No | No | No |
Where Crous excels
- • Small-to-medium structured payloads (API responses, configs)
- • Python-native types: tuple, set, frozenset, bytes, complex
- • Deterministic encoding for content-addressable storage
- • Rapid prototyping without schema definition
Where others win
- • MsgPack: faster on large numeric arrays; broader language ecosystem
- • Protobuf: schema evolution, strong typing, gRPC integration
- • JSON: universal compatibility, human readability
Where Crous fits
Edge & Embedded
Small binary footprint on IoT, Raspberry Pi, or edge gateways where every byte counts.
Game State & Save Files
Deterministic serialization of complex game state with native tuple and set support.
IPC & Microservices
Fast inter-process communication. Encode in Python, decode in Node.js — identical wire format.
Content-Addressable Storage
Deterministic output means the same data always hashes to the same digest.
Telemetry & Logging
Streaming API for appending records without re-encoding the entire file.
Config & Data Files
CROUT text format for human-readable configs; FLUX binary for production deployment.
Safety is not optional
The C core is written defensively. Every untrusted input path is bounded and validated.
Arena Memory Safety
All allocations go through a per-operation arena. No dangling pointers, no use-after-free — the entire arena is reclaimed in one call.
Bounded Recursion
Configurable nesting depth limit (default 64). Prevents stack overflow from deeply nested or adversarial payloads.
Thread Safety
The Python binding releases the GIL during encode/decode. No shared mutable state between operations.
Input Validation
UTF-8 strings are validated on decode. Varint overflows are checked. Truncated buffers return structured errors.
import crous
# Depth limiting — prevents stack overflow
crous.dumps(nested_data, max_depth=32)
# Graceful error handling
try:
result = crous.loads(untrusted_bytes)
except crous.CrousDecodeError as e:
print(f"Invalid payload: {e}")
# Thread-safe — GIL released during C operations
from concurrent.futures import ThreadPoolExecutor
with ThreadPoolExecutor(max_workers=8) as pool:
results = list(pool.map(crous.loads, payloads))Native SDKs
Same wire format. Same semantics. Pick your runtime.
Python SDK
C extension · Python 3.6+Full FLUX binary, CROUT text, and streaming support. Custom serializers. Native tuple, set, frozenset, bytes, and complex number support.
DocumentationNode.js SDK
N-API addon · ABI stableN-API native addon with cross-platform builds. File I/O, custom serializers, and Buffer integration.
Documentation