v2.0.0 STABLE

A Deterministic Binary
Document Format

Smaller than JSON. No schema required.

C core with native bindings for Python & Node.js. Arena-allocated, zigzag-varint encoded, depth-limited, and thread-safe.

example.py
import crous

# Serialize any Python object
data = {
    "name": "Alice",
    "scores": [98, 95, 100],
    "active": True
}

# Encode → compact binary
binary = crous.dumps(data)   # 42 bytes (vs 76 bytes JSON)

# Decode → original object
result = crous.loads(binary)
assert result == data

Benchmarks

Measured, not marketed

Real numbers from Python 3.14 on Apple Silicon (ARM64). 500 iterations per benchmark. No cherry-picking.

Serialization Throughput

Workload (K ops/s)    Crous   JSON   MsgPack   Protobuf
Small API Response    639     366    555       85
Config Object         315     216    392       46
Text-Heavy Document   7       4.8    16.3      0.7
Numeric Payload       1.1     0.54   3.2       0.14

Output Size

Dataset         Crous      JSON       MsgPack    Protobuf
Small API       227 B      272 B      216 B      346 B
Config Object   749 B      955 B      750 B      1,132 B
Large Records   224.9 KB   286.3 KB   220.0 KB   334.0 KB
Numeric         207.1 KB   219.1 KB   156.4 KB   249.9 KB

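Protobuf was measured using the schema-less google.protobuf.Struct, so all four formats run without pre-defined schemas. In these runs MsgPack is fastest on the config, text-heavy, and numeric workloads and smallest on three of the four datasets. Crous is fastest on the small API response, smallest on the config object (by one byte), and both faster and smaller than JSON and Protobuf in every row.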
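To get a feel for how an ops/s figure is produced, here is a minimal timing sketch. It uses only the standard library and times json.dumps. The helper name ops_per_second and the sample payload are assumptions made for illustration, not the harness or datasets behind the published numbers.

```python
import json
import time


def ops_per_second(func, payload, iterations=500):
    """Call func(payload) `iterations` times and return calls per second."""
    start = time.perf_counter()
    for _ in range(iterations):
        func(payload)
    elapsed = time.perf_counter() - start
    return iterations / elapsed


# Illustrative payload, not one of the benchmark datasets.
payload = {"id": 7, "name": "Alice", "tags": ["a", "b"], "active": True}

rate = ops_per_second(json.dumps, payload)
print(f"json.dumps: {rate / 1000:.1f} K ops/s")
```

Any other encoder, such as crous.dumps, can be passed as func in the same way. Results vary with machine, interpreter, and payload.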

Philosophy

Built for determinism

Every design decision serves a single goal: the same data must always produce the same bytes, fast.

Schema-Free

No .proto files. No code generation. Serialize any structure directly — the type system is inferred from the data.

Zero-Copy Where Possible

The C core reads directly from buffer memory. String and bytes values reference the source buffer without allocation.
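The idea of referencing the source buffer instead of copying can be illustrated in Python with memoryview, which slices without allocating a new byte string. This is an analogy for the technique, not the C core's code. The buffer layout (one length byte, then the payload) is made up for the example.

```python
# A made-up buffer: one length byte, then that many payload bytes.
buffer = bytearray(b"\x05Alice-trailing-data")

view = memoryview(buffer)
length = view[0]                 # first byte holds the payload length
name = view[1:1 + length]        # a view into `buffer`; no bytes are copied

print(bytes(name))

# `name` aliases the buffer, so a write to the buffer is visible through it.
buffer[1] = ord("a")
print(bytes(name))
```

The second print shows the changed first letter. That is the trade-off of zero-copy reads: the decoded value is only valid while the source buffer is alive and unmodified.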

Single-Pass Encoding

Data is written in one forward pass. No backpatching, no length-prefix lookups, no two-phase serialization.

Arena Allocation

All memory is allocated from a per-operation arena and freed in one shot. No per-node malloc/free churn.

Bounded Recursion

Configurable nesting depth limit (default: 64). Prevents stack overflow from adversarial or malformed input.
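A depth limit of this kind can be sketched in pure Python. The names check_depth, nest, and DepthError are hypothetical. How Crous itself counts levels is not stated here, so the counting rule below (outermost container is level 1) is an assumption.

```python
class DepthError(ValueError):
    """Raised when a value nests deeper than the limit (hypothetical name)."""


def check_depth(value, max_depth=64, _depth=1):
    """Walk lists, tuples, and dicts; raise once nesting exceeds max_depth."""
    if isinstance(value, (list, tuple, dict)):
        if _depth > max_depth:
            raise DepthError(f"nesting depth exceeds {max_depth}")
        children = value.values() if isinstance(value, dict) else value
        for child in children:
            check_depth(child, max_depth, _depth + 1)


def nest(levels):
    """Build a list nested `levels` deep, e.g. nest(3) -> [[[]]]."""
    value = []
    for _ in range(levels - 1):
        value = [value]
    return value


check_depth(nest(64))            # within the limit, returns quietly

try:
    check_depth(nest(65))        # one level too deep
except DepthError as exc:
    print(f"rejected: {exc}")
```

Rejecting at a fixed depth keeps the recursion bounded no matter how deeply an attacker nests the input.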

Deterministic Output

Identical input always produces identical bytes. Critical for content-addressable storage and digest verification.

Internals

How it works under the hood

From Python dict to compact binary — a single-pass pipeline through the modular C core.

Encoding Pipeline

[Diagram: four-stage encoding pipeline. 1. Python Dict (input data) → 2. Type Inspector (introspect: str, int, list, dict, bool, float, bytes) → 3. C Core Encoder (encode: single-pass, zigzag varint, arena, tagged, zero-copy) → 4. FLUX Binary (output, 42 B, shown starting with the bytes 46 4C 55 58, ASCII "FLUX").]

Module Architecture

SDK BINDINGS
  Python SDK: pycrous.c · C-API
  Node.js SDK: N-API · ABI stable
VALUE CORE
  CrousValue: tagged union · 10 types (null, bool, int, float, str, bytes, list, tuple, dict, tagged)
ENCODING
  FLUX: binary codec (encode · decode · stream)
  CROUT: text format (lexer · parser · emitter)
  Varint: int encoding (zigzag · LEB128)
FOUNDATION
  Arena Allocator · Error Handling · Buffer I/O · Type Registry

Integers

Zigzag + varint encoding. 0 costs 1 byte, ±63 costs 1 byte, full i64 costs up to 10 bytes.
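The byte counts above follow from the standard zigzag and LEB128 schemes, sketched here in Python. The function names are illustrative. The sketch covers only the integer payload, so any type tag Crous may write alongside it is not modeled.

```python
def zigzag(n):
    """Map a signed 64-bit integer to unsigned: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return ((n << 1) ^ (n >> 63)) & 0xFFFFFFFFFFFFFFFF


def encode_varint(u):
    """LEB128: 7 payload bits per byte, high bit set on all but the last byte."""
    out = bytearray()
    while u >= 0x80:
        out.append((u & 0x7F) | 0x80)
        u >>= 7
    out.append(u)
    return bytes(out)


def encode_int(n):
    """Zigzag first so small negative numbers stay small, then varint."""
    return encode_varint(zigzag(n))


for n in (0, 63, -64, 64, 2**63 - 1, -2**63):
    print(n, len(encode_int(n)))
```

Under these assumptions 0, 63, and -64 each take 1 byte, 64 takes 2, and both i64 extremes take 10.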

Strings

UTF-8 validated. Length-prefixed with varint. No null terminator stored.
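A length-prefixed string can be sketched as follows. The helper names are illustrative, and the sketch leaves out any type tag the real format may place before the length.

```python
def encode_varint(u):
    """Unsigned LEB128, 7 bits per byte."""
    out = bytearray()
    while u >= 0x80:
        out.append((u & 0x7F) | 0x80)
        u >>= 7
    out.append(u)
    return bytes(out)


def encode_str(text):
    """Varint byte length, then the raw UTF-8 bytes. No terminator."""
    data = text.encode("utf-8")
    return encode_varint(len(data)) + data


def decode_str(buf):
    """Return (text, bytes_consumed) for a well-formed varint-prefixed string."""
    length = shift = pos = 0
    while True:
        byte = buf[pos]
        pos += 1
        length |= (byte & 0x7F) << shift
        if byte < 0x80:
            break
        shift += 7
    end = pos + length
    if end > len(buf):
        raise ValueError("truncated string")
    # A strict decode raises UnicodeDecodeError on invalid UTF-8.
    return buf[pos:end].decode("utf-8"), end


encoded = encode_str("héllo")
print(encoded.hex(" "), decode_str(encoded))
```

The prefix counts bytes, not characters, so "héllo" is stored with a length of 6.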

Collections

Tag byte + varint count + inline children. No offset tables needed.
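The layout "tag byte + varint count + inline children" can be sketched for a small subset of types. The tag values below are invented for this example and are not the real FLUX tags.

```python
# Hypothetical tag values chosen for this sketch; the real FLUX tags may differ.
TAG_INT, TAG_STR, TAG_LIST = 0x01, 0x02, 0x03


def encode_varint(u):
    """Unsigned LEB128, 7 bits per byte."""
    out = bytearray()
    while u >= 0x80:
        out.append((u & 0x7F) | 0x80)
        u >>= 7
    out.append(u)
    return bytes(out)


def zigzag(n):
    """Signed 64-bit to unsigned mapping."""
    return ((n << 1) ^ (n >> 63)) & 0xFFFFFFFFFFFFFFFF


def encode(value):
    """Encode ints, strings, and lists of those, children written inline."""
    if isinstance(value, int):
        return bytes([TAG_INT]) + encode_varint(zigzag(value))
    if isinstance(value, str):
        data = value.encode("utf-8")
        return bytes([TAG_STR]) + encode_varint(len(data)) + data
    if isinstance(value, list):
        body = b"".join(encode(item) for item in value)
        # The prefix is an element count, known before any child is written.
        return bytes([TAG_LIST]) + encode_varint(len(value)) + body
    raise TypeError(f"unsupported type: {type(value).__name__}")


print(encode([1, "hi", [2]]).hex(" "))
```

With these made-up tags the output is 03 03 01 02 02 02 68 69 03 01 01 04. A decoder reads the count and then parses that many children in order, which is why no offset table is needed.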

Comparison

Honest trade-offs

Every format has strengths. Here's where Crous fits — and where it doesn't.

Feature            Crous        JSON        MsgPack     Protobuf
Schema Required    No           No          No          Yes
Human Readable     CROUT mode   Yes         No          No
Binary Format      Yes          No          Yes         Yes
Native Tuple/Set   Yes          No          No          No
Tagged Values      Yes          No          Ext types   Any
Streaming          Yes          No          Yes         Yes
Deterministic      Yes          Impl-dep.   Mostly      No
Language Support   2 SDKs       Universal   30+ langs   12+ langs
Zero-Copy Decode   Partial      No          Partial     Yes
Depth Limiting     Built-in     No          No          No

Where Crous excels

  • Small-to-medium structured payloads (API responses, configs)
  • Python-native types: tuple, set, frozenset, bytes, complex
  • Deterministic encoding for content-addressable storage
  • Rapid prototyping without schema definition

Where others win

  • MsgPack: faster on large numeric arrays; broader language ecosystem
  • Protobuf: schema evolution, strong typing, gRPC integration
  • JSON: universal compatibility, human readability

Use Cases

Where Crous fits

Edge & Embedded

Small binary footprint on IoT, Raspberry Pi, or edge gateways where every byte counts.

Game State & Save Files

Deterministic serialization of complex game state with native tuple and set support.

IPC & Microservices

Fast inter-process communication. Encode in Python, decode in Node.js — identical wire format.

Content-Addressable Storage

Deterministic output means the same data always hashes to the same digest.
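Content addressing on top of a deterministic encoder can be sketched like this. To stay runnable without Crous installed, the example uses canonical JSON as a stand-in encoder; crous.dumps would be passed in its place. The names canonical_json, content_address, and put are illustrative.

```python
import hashlib
import json


def canonical_json(data):
    """Stand-in deterministic encoder: sorted keys, fixed separators."""
    return json.dumps(data, sort_keys=True, separators=(",", ":")).encode("utf-8")


def content_address(data, encode=canonical_json):
    """Return the SHA-256 hex digest of the encoded bytes."""
    return hashlib.sha256(encode(data)).hexdigest()


store = {}


def put(data):
    """Store the encoded bytes under their own digest and return the key."""
    key = content_address(data)
    store[key] = canonical_json(data)
    return key


first = put({"name": "Alice", "scores": [98, 95, 100]})
second = put({"name": "Alice", "scores": [98, 95, 100]})
print(first == second, len(store))
```

Both calls produce the same key, so the store holds a single entry. Key sorting here is a property of the JSON stand-in, not a statement about how Crous orders dict entries.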

Telemetry & Logging

Streaming API for appending records without re-encoding the entire file.
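The general idea of appending without re-encoding can be sketched with length-prefixed framing. This is not the Crous streaming API, whose calls are not shown here. The 4-byte length header, the function names, and the JSON stand-in encoder are all assumptions for illustration.

```python
import io
import json
import struct


def append_record(stream, record):
    """Write one record as a 4-byte big-endian length followed by its bytes."""
    payload = json.dumps(record).encode("utf-8")   # stand-in encoder
    stream.seek(0, io.SEEK_END)                    # existing records untouched
    stream.write(struct.pack(">I", len(payload)))
    stream.write(payload)


def read_records(stream):
    """Yield records in order, stopping cleanly at end of stream."""
    stream.seek(0)
    while True:
        header = stream.read(4)
        if not header:
            return
        if len(header) < 4:
            raise ValueError("truncated record header")
        (length,) = struct.unpack(">I", header)
        payload = stream.read(length)
        if len(payload) < length:
            raise ValueError("truncated record body")
        yield json.loads(payload)


log = io.BytesIO()                                 # a file opened in binary mode works too
append_record(log, {"event": "start", "t": 0})
append_record(log, {"event": "stop", "t": 5})
print(list(read_records(log)))
```

Each append only writes new bytes at the end, so the cost is proportional to the new record, not to the size of the log.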

Config & Data Files

CROUT text format for human-readable configs; FLUX binary for production deployment.

Safety

Safety is not optional

The C core is written defensively. Every untrusted input path is bounded and validated.

Arena Memory Safety

All allocations go through a per-operation arena. No dangling pointers, no use-after-free — the entire arena is reclaimed in one call.

Bounded Recursion

Configurable nesting depth limit (default 64). Prevents stack overflow from deeply nested or adversarial payloads.

Thread Safety

The Python binding releases the GIL during encode/decode. No shared mutable state between operations.

Input Validation

UTF-8 strings are validated on decode. Varint overflows are checked. Truncated buffers return structured errors.
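The three checks named above can be sketched in Python for a varint-prefixed string. DecodeError, read_varint, and read_str are hypothetical names. The real decoder lives in the C core and raises crous.CrousDecodeError.

```python
class DecodeError(ValueError):
    """Hypothetical error type for this sketch."""


def read_varint(buf, pos=0):
    """Decode an unsigned LEB128 varint; return (value, next_pos)."""
    value = 0
    for i in range(10):                      # a u64 needs at most 10 bytes
        if pos + i >= len(buf):
            raise DecodeError("truncated varint")
        byte = buf[pos + i]
        value |= (byte & 0x7F) << (7 * i)
        if byte < 0x80:                      # high bit clear: last byte
            if value > 0xFFFFFFFFFFFFFFFF:
                raise DecodeError("varint overflows 64 bits")
            return value, pos + i + 1
    raise DecodeError("varint longer than 10 bytes")


def read_str(buf, pos=0):
    """Decode a varint-prefixed UTF-8 string from a bytes object."""
    length, pos = read_varint(buf, pos)
    end = pos + length
    if end > len(buf):
        raise DecodeError("truncated string")
    try:
        return buf[pos:end].decode("utf-8"), end
    except UnicodeDecodeError as exc:
        raise DecodeError(f"invalid UTF-8: {exc.reason}") from exc


for payload in (b"\x05Alice", b"\x05Ali", b"\x02\xff\xfe", b"\x80" * 11):
    try:
        print(read_str(payload))
    except DecodeError as exc:
        print(f"rejected: {exc}")
```

Only the first payload decodes. The others fail with a specific reason instead of reading past the end of the buffer.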

safety.py
import crous
from concurrent.futures import ThreadPoolExecutor

# Placeholder inputs so the snippet runs on its own
nested_data = {"level1": {"level2": {"level3": [1, 2, 3]}}}
untrusted_bytes = b"\xff\xff\xff"
payloads = [crous.dumps({"id": i}) for i in range(100)]

# Depth limiting: prevents stack overflow
crous.dumps(nested_data, max_depth=32)

# Graceful error handling
try:
    result = crous.loads(untrusted_bytes)
except crous.CrousDecodeError as e:
    print(f"Invalid payload: {e}")

# Thread-safe: GIL released during C operations
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(crous.loads, payloads))

Start building

Read the docs. Run the benchmarks. Build something fast.