CROUT Format

CROUT (Crous Text) is a human-readable text serialization format. It's designed for debugging, inspection, and scenarios where readability matters more than size. CROUT features a token compression system that replaces repeated dictionary keys with single characters.

Basic Usage

crout_basic.py
import crous

data = {"name": "Alice", "age": 30, "active": True}

# Encode to CROUT text
text = crous.to_crout(data)
print(text)
# Output:
# CROUT1
# {s5:Alice , i30 , T}

# Decode from CROUT text
result = crous.from_crout(text)
assert result == data

Value Syntax

Every value in CROUT has a type prefix that indicates how to parse it:

Prefix  Type     Example        Description
N       Null     N              Python None
T       True     T              Boolean true
F       False    F              Boolean false
i       Integer  i42, i-7       64-bit signed integer
f       Float    f3.14          IEEE 754 double
s       String   s5:hello       Length-prefixed, binary-safe
b       Bytes    b4:deadbeef    Length-prefixed, hex-encoded
#       Tagged   #90:[i1,i2]    Tagged value with numeric tag
{}      Dict     {s3:key:i42}   Key-value mapping
[]      List     [i1,i2,i3]     Ordered sequence
()      Tuple    (i1,i2,i3)     Immutable sequence
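
The prefix rules above can be illustrated with a small pure-Python encoder. This is a minimal sketch, not the Crous implementation: `encode_value` is a hypothetical helper, it omits tagged values, the CROUT1 header, and token compression, it separates items with a bare comma as in the table examples (the library output shown elsewhere on this page uses " , "), and it assumes string lengths count UTF-8 bytes.

```python
def encode_value(value):
    """Encode a Python value using the CROUT type prefixes (illustrative only)."""
    if value is None:
        return "N"
    # Check booleans before integers: bool is a subclass of int in Python.
    if value is True:
        return "T"
    if value is False:
        return "F"
    if isinstance(value, int):
        if not -2**63 <= value < 2**63:
            raise OverflowError("integer does not fit in 64 signed bits")
        return f"i{value}"
    if isinstance(value, float):
        return f"f{value!r}"
    if isinstance(value, str):
        # Assumption: the length prefix counts UTF-8 bytes.
        return f"s{len(value.encode('utf-8'))}:{value}"
    if isinstance(value, bytes):
        # The length is the number of raw bytes, not hex digits.
        return f"b{len(value)}:{value.hex()}"
    if isinstance(value, dict):
        items = (f"{encode_value(k)}:{encode_value(v)}" for k, v in value.items())
        return "{" + ",".join(items) + "}"
    if isinstance(value, list):
        return "[" + ",".join(encode_value(v) for v in value) + "]"
    if isinstance(value, tuple):
        return "(" + ",".join(encode_value(v) for v in value) + ")"
    raise TypeError(f"unsupported type: {type(value).__name__}")


print(encode_value({"key": 42}))   # {s3:key:i42}
print(encode_value((1, 2, 3)))     # (i1,i2,i3)
```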

Token Compression

The token table replaces dictionary keys that occur more than once with single-character tokens. This shrinks the output most for data such as lists of records that share the same keys.

token_compression.py
import crous

# Data with repeated keys
data = [
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 25},
    {"name": "Charlie", "age": 35},
]

text = crous.to_crout(data)
print(text)
# Output:
# CROUT1
# @ a=name
# @ c=age
# [{a:s5:Alice , c:i30} , {a:s3:Bob , c:i25} , {a:s7:Charlie , c:i35}]

# "name" → "a", "age" → "c" (single-character tokens)

Token Assignment

Tokens are assigned from a safe alphabet that avoids the type prefixes (s, i, f, b, N, T, F). Keys that appear at least twice get tokens, assigned in order of frequency (most frequent first). At most 64 tokens are assigned.
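
The assignment rule can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: `assign_tokens` and `collect_keys` are hypothetical helpers, the alphabet is assumed to be ASCII letters minus the type prefixes (which yields only 45 characters, so the real alphabet presumably includes others not listed here), and ties in frequency are assumed to keep first-seen order.

```python
import string
from collections import Counter

TYPE_PREFIXES = set("sifbNTF")
# Assumption: letters only; the actual safe alphabet is not listed in this section.
SAFE_ALPHABET = [
    c for c in string.ascii_lowercase + string.ascii_uppercase
    if c not in TYPE_PREFIXES
]
MAX_TOKENS = 64


def collect_keys(value, counts):
    """Count every string dictionary key found anywhere in a nested value."""
    if isinstance(value, dict):
        for key, item in value.items():
            if isinstance(key, str):
                counts[key] += 1
            collect_keys(item, counts)
    elif isinstance(value, (list, tuple)):
        for item in value:
            collect_keys(item, counts)


def assign_tokens(data):
    """Map each key that appears at least twice to a single-character token."""
    counts = Counter()
    collect_keys(data, counts)
    repeated = [(key, n) for key, n in counts.items() if n >= 2]
    # Stable sort: the most frequent keys come first, ties keep first-seen order.
    repeated.sort(key=lambda pair: -pair[1])
    limit = min(MAX_TOKENS, len(SAFE_ALPHABET))
    return {key: SAFE_ALPHABET[i] for i, (key, _) in enumerate(repeated[:limit])}


records = [
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 25},
    {"name": "Charlie", "age": 35},
]
print(assign_tokens(records))  # {'name': 'a', 'age': 'c'}
```

Under these assumptions the sketch reproduces the tokens in the example output: "b" is skipped because it is the bytes prefix, so "age" receives "c".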

Special Float Values

import crous

# Special float values are written by name after the f prefix
crous.to_crout(float('inf'))      # "finf"
crous.to_crout(float('-inf'))     # "f-inf"
crous.to_crout(float('nan'))      # "fnan"
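
On the decoding side, the text after the f prefix in these examples is exactly what Python's built-in float() accepts. The sketch below shows that; `parse_float_token` is a hypothetical helper, not part of the Crous API, and it assumes the token holds only the float value with no header line.

```python
import math


def parse_float_token(token):
    """Parse a CROUT float token such as 'f3.14', 'finf', or 'fnan'."""
    if not token.startswith("f"):
        raise ValueError("expected 'f' prefix")
    # float() understands 'inf', '-inf', and 'nan' as well as decimal literals.
    return float(token[1:])


print(parse_float_token("f-inf"))             # -inf
print(math.isnan(parse_float_token("fnan")))  # True
```

NaN never compares equal to itself, so a round-trip check for NaN values should use math.isnan rather than ==.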

CROUT ↔ FLUX Conversion

Crous provides direct conversion between CROUT text and FLUX binary without going through Python objects:

conversion.py
import crous

data = {"name": "Alice", "scores": [98, 95, 100]}

# Python → CROUT text
crout_text = crous.to_crout(data)

# CROUT text → FLUX binary (direct, no Python intermediary)
flux_binary = crous.crout_to_flux(crout_text)

# FLUX binary → CROUT text (direct)
crout_back = crous.flux_to_crout(flux_binary)

# All representations are equivalent
assert crous.from_crout(crout_text) == crous.loads(flux_binary)

CROUT Format Header

Every CROUT document starts with the magic string CROUT1 followed by optional token definitions:

CROUT1                      magic + version
@ a=name                    token "a" maps to key "name"
@ c=age                     token "c" maps to key "age"
[{a:s5:Alice , c:i30}]     data using tokens
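
The layout above can be read with a few lines of Python. This is a minimal sketch, assuming each token definition sits on its own line in the form "@ token=key" and that keys contain no newline; `split_document` is a hypothetical helper, not a Crous function.

```python
MAGIC = "CROUT1"


def split_document(text):
    """Split a CROUT document into its token table and its data body."""
    lines = text.split("\n")
    if lines[0].strip() != MAGIC:
        raise ValueError("missing CROUT1 magic line")
    tokens = {}
    index = 1
    # Token definitions, if any, come directly after the magic line.
    while index < len(lines) and lines[index].startswith("@ "):
        token, _, key = lines[index][2:].partition("=")
        tokens[token] = key
        index += 1
    # Everything else is the data; rejoining keeps newlines inside strings.
    body = "\n".join(lines[index:])
    return tokens, body


doc = "CROUT1\n@ a=name\n@ c=age\n[{a:s5:Alice , c:i30}]"
tokens, body = split_document(doc)
print(tokens)  # {'a': 'name', 'c': 'age'}
print(body)    # [{a:s5:Alice , c:i30}]
```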

String Encoding

Strings in CROUT use a length-prefix encoding: s{length}:{data}. This is binary-safe: strings can contain any bytes, including null bytes, newlines, and other special characters.

# String encoding examples:
# s0:        → empty string ""
# s5:hello   → "hello"
# s11:hello world → "hello world"
# s3:a\nb    → "a\nb" (newline in string, length includes it)
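
A length prefix means a reader never scans the payload for a terminator, which is why delimiters and newlines inside a string are harmless. The sketch below shows the idea; `parse_string` is a hypothetical helper, and it assumes the length counts UTF-8 bytes, which this section does not state explicitly.

```python
def parse_string(data, pos=0):
    """Parse s{length}:{data} from bytes at pos; return (text, next_pos)."""
    if data[pos:pos + 1] != b"s":
        raise ValueError("expected 's' prefix")
    colon = data.index(b":", pos)
    length = int(data[pos + 1:colon].decode("ascii"))
    start = colon + 1
    end = start + length
    if end > len(data):
        raise ValueError("string is shorter than its declared length")
    # Take exactly `length` bytes; commas or newlines inside are plain payload.
    return data[start:end].decode("utf-8"), end


print(parse_string(b"s5:hello"))       # ('hello', 8)
print(parse_string(b"s5:a , b , i1"))  # ('a , b', 8)
```

The returned position lets a caller continue parsing the next value immediately after the string.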

Bytes Encoding

Bytes use hex encoding with a length prefix: b{length}:{hex}. The length is the number of decoded bytes (not the hex string length).

# Bytes encoding examples:
# b0:           → b""
# b3:414243     → b"ABC"  (hex for 0x41, 0x42, 0x43)
# b4:deadbeef   → b"\xde\xad\xbe\xef"
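
Because the length counts decoded bytes, a reader can use it to validate the hex payload. The following is a minimal sketch using Python's built-in bytes.fromhex; `decode_bytes` is a hypothetical helper, not part of the Crous API.

```python
def decode_bytes(token):
    """Decode a b{length}:{hex} token and check the declared length."""
    if not token.startswith("b"):
        raise ValueError("expected 'b' prefix")
    length_text, sep, hex_text = token[1:].partition(":")
    if not sep:
        raise ValueError("missing ':' after the length")
    payload = bytes.fromhex(hex_text)
    # The declared length is in bytes, so it is half the number of hex digits.
    if len(payload) != int(length_text):
        raise ValueError("declared length does not match the payload")
    return payload


print(decode_bytes("b3:414243"))  # b'ABC'
```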