Type System

Crous maps Python types to an internal crous_value tree, which is then encoded to one of the wire formats. This page documents every supported type, its encoding behavior, and edge cases.

Core Types

Null

Python's None maps to the CROUS NULL type (tag 0x00).

binary = crous.dumps(None)   # 7 bytes (6-byte header + 1 tag byte)
crous.loads(binary)          # None

Boolean

Booleans are encoded as single-byte tags: 0x01 for False, 0x02 for True. Because Python's bool is a subclass of int, Crous checks for bool first to preserve the type.

crous.dumps(True)    # tag 0x02
crous.dumps(False)   # tag 0x01

# Type preservation
assert type(crous.loads(crous.dumps(True))) is bool   # not int!
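
The check order matters because isinstance(True, int) is true in Python. The following is a minimal sketch of that ordering; classify is a hypothetical helper written for illustration, not part of the Crous API.

```python
def classify(value):
    """Return a type name, testing bool before int (hypothetical helper)."""
    # bool must be tested first: isinstance(True, int) is True in Python,
    # so an int-first check would silently turn booleans into integers.
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, int):
        return "INT"
    return "OTHER"


print(classify(True))   # TRUE
print(classify(1))      # INT
```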

Integer

Integers are stored as 64-bit signed values using zigzag encoding in FLUX format. Small integers (0 to 24 and -1 to -32) are encoded in a single tag byte.

# Small int optimization (single byte!)
crous.dumps(0)     # tag 0x10 → 1 byte
crous.dumps(24)    # tag 0x28 → 1 byte
crous.dumps(-1)    # tag 0x29 → 1 byte
crous.dumps(-32)   # tag 0x48 → 1 byte

# Larger integers use zigzag varint
crous.dumps(1000)  # tag 0x03 + zigzag varint

# Full 64-bit range
crous.dumps(2**63 - 1)   # max int64
crous.dumps(-(2**63))    # min int64
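
To make the tag arithmetic concrete, here is a rough sketch of how the value bytes could be produced (without the 6-byte header). The tag ranges come from this page; the varint layout (7 bits per byte, low bits first, high bit as a continuation flag) is an assumption, and encode_int, zigzag, and varint are illustrative names rather than Crous functions.

```python
def zigzag(n: int) -> int:
    """Map a signed 64-bit int to unsigned: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return ((n << 1) ^ (n >> 63)) & 0xFFFFFFFFFFFFFFFF


def varint(u: int) -> bytes:
    """Assumed LEB128-style varint: 7 bits per byte, low bits first."""
    out = bytearray()
    while True:
        byte = u & 0x7F
        u >>= 7
        if u:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_int(n: int) -> bytes:
    """Encode one integer using the tag ranges documented on this page."""
    if 0 <= n <= 24:
        return bytes([0x10 + n])          # POSINT: 0x10..0x28
    if -32 <= n <= -1:
        return bytes([0x29 + (-n - 1)])   # NEGINT: 0x29..0x48
    return bytes([0x03]) + varint(zigzag(n))


print(encode_int(24).hex())    # 28
print(encode_int(-32).hex())   # 48
print(encode_int(1000).hex())  # 03d00f
```

Zigzag keeps small negative numbers short: -1 maps to 1 rather than to a value with all 64 bits set, so it still fits in a single varint byte.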

Integer Overflow

Integers outside the 64-bit signed range (-2⁶³ to 2⁶³-1) raise a CrousEncodeError. To handle larger int values, use default or a custom serializer.
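
If it is useful to detect oversized integers before serializing, a range check along these lines would do. fits_int64 is a hypothetical helper, not something Crous provides.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def fits_int64(n: int) -> bool:
    """Return True if n is inside the signed 64-bit range."""
    return INT64_MIN <= n <= INT64_MAX


print(fits_int64(2**63 - 1))  # True
print(fits_int64(2**63))      # False
```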

Float

Floating-point numbers are stored as 8-byte IEEE 754 doubles. Special values (NaN, Infinity, -Infinity) are preserved.

import math

crous.dumps(3.14)             # IEEE 754 double
crous.dumps(float('inf'))     # preserved
crous.dumps(float('-inf'))    # preserved
crous.dumps(float('nan'))     # preserved

# NaN comparison caveat
val = crous.loads(crous.dumps(float('nan')))
assert math.isnan(val)  # True (but val != val)
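
The 8-byte payload can be reproduced with the standard struct module. This sketch assumes the big-endian byte order listed in the Type Encoding Summary and covers only the payload, not the tag byte or header; pack_double and unpack_double are illustrative names.

```python
import math
import struct


def pack_double(x: float) -> bytes:
    """Pack a float as an 8-byte big-endian IEEE 754 double."""
    return struct.pack(">d", x)


def unpack_double(data: bytes) -> float:
    """Inverse of pack_double."""
    return struct.unpack(">d", data)[0]


print(pack_double(1.0).hex())                            # 3ff0000000000000
print(unpack_double(pack_double(float("-inf"))))         # -inf
print(math.isnan(unpack_double(pack_double(math.nan))))  # True
```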

String

Strings are stored as UTF-8 with a varint length prefix. The encoder validates UTF-8 encoding and rejects invalid sequences.

crous.dumps("")           # empty string (length 0)
crous.dumps("hello")      # varint(5) + "hello"
crous.dumps("こんにちは")   # varint(15) + UTF-8 bytes

# Full Unicode support
crous.dumps("🎉🐍💚")     # emoji support
crous.dumps("مرحبا")       # Arabic
crous.dumps("Привет")      # Cyrillic
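
The length prefix counts UTF-8 bytes, not characters, which is why the five-character Japanese greeting above has length 15. The sketch below uses only Python's built-in codec to show the byte counts and one kind of input that cannot be encoded (a lone surrogate); utf8_payload is a hypothetical helper, and how Crous itself reports the failure is not shown here.

```python
def utf8_payload(s: str) -> bytes:
    """Encode to UTF-8; raises UnicodeEncodeError for lone surrogates."""
    return s.encode("utf-8")


print(len(utf8_payload("hello")))       # 5
print(len(utf8_payload("こんにちは")))    # 15
print(len(utf8_payload("🎉🐍💚")))       # 12

try:
    utf8_payload("\ud800")  # lone surrogate: not valid in UTF-8
except UnicodeEncodeError:
    print("rejected")
```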

Bytes

Both bytes and bytearray are stored as raw byte sequences with a varint length prefix.

crous.dumps(b"\x00\x01\x02")      # raw bytes
crous.dumps(bytearray([1, 2, 3]))  # also works

# Round-trip always returns bytes (not bytearray)
result = crous.loads(crous.dumps(bytearray([1, 2, 3])))
assert type(result) is bytes

Container Types

List

Lists are encoded with a varint count followed by each element.

crous.dumps([1, 2, 3])          # varint(3) + elements
crous.dumps([])                 # empty list (count 0)
crous.dumps([1, "two", 3.0])   # mixed types OK
crous.dumps([[1, 2], [3, 4]])  # nested lists

Tuple

Tuples have their own type tag (TUPLE), distinct from lists, so the type is preserved on round-trip.

# Tuples are NOT lists!
data_list = [1, 2, 3]
data_tuple = (1, 2, 3)

result_list = crous.loads(crous.dumps(data_list))
result_tuple = crous.loads(crous.dumps(data_tuple))

assert type(result_list) is list     # ✓
assert type(result_tuple) is tuple   # ✓ (preserved!)
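
At the byte level, the only difference between a list and a tuple is the leading tag. The sketch below illustrates this for sequences of small integers, using the tag values from the Type Encoding Summary. It assumes that a count below 128 occupies a single varint byte and omits the file header; encode_small_seq is a hypothetical helper.

```python
TAG_LIST = 0x07
TAG_TUPLE = 0x0A


def encode_small_seq(seq) -> bytes:
    """Encode a list or tuple of integers 0-24 (illustrative sketch only)."""
    if isinstance(seq, tuple):
        tag = TAG_TUPLE
    elif isinstance(seq, list):
        tag = TAG_LIST
    else:
        raise TypeError("expected list or tuple")
    if len(seq) >= 128:
        raise ValueError("sketch only handles counts below 128")
    if not all(type(x) is int and 0 <= x <= 24 for x in seq):
        raise ValueError("sketch only handles integers 0-24")
    # tag + one-byte count + one POSINT tag byte (0x10 + n) per element
    return bytes([tag, len(seq)]) + bytes(0x10 + x for x in seq)


print(encode_small_seq([1, 2, 3]).hex())  # 0703111213
print(encode_small_seq((1, 2, 3)).hex())  # 0a03111213
```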

Dictionary

Dictionaries are encoded with a varint count, then each key-value pair. Keys must be strings.

crous.dumps({"a": 1, "b": 2})  # varint(2) + pairs
crous.dumps({})                 # empty dict

# Keys MUST be strings
try:
    crous.dumps({1: "value"})
except crous.CrousEncodeError:
    print("Integer keys not supported!")

String Keys Only

Crous dictionaries only support string keys. Attempting to serialize a dict with non-string keys (int, tuple, etc.) will raise a CrousEncodeError.
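
One way to work around this restriction is to convert keys to strings before serializing. The helper below is a hypothetical pre-processing step, not a Crous feature, and it is lossy: keys come back as strings, and keys such as 1 and "1" collide.

```python
def stringify_keys(obj):
    """Recursively convert dict keys to str (lossy, illustrative helper)."""
    if isinstance(obj, dict):
        return {str(k): stringify_keys(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [stringify_keys(v) for v in obj]
    if isinstance(obj, tuple):
        return tuple(stringify_keys(v) for v in obj)
    return obj


print(stringify_keys({1: "value", "n": [{2: 3}]}))
# {'1': 'value', 'n': [{'2': 3}]}
```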

Extended Types

Set

Sets are encoded as tagged values with tag 90, wrapping a list of the set's elements. On decode, the list is automatically reconstructed as a set.

data = {1, 2, 3, "four"}
binary = crous.dumps(data)
result = crous.loads(binary)

assert type(result) is set   # ✓
assert result == data         # ✓

Frozenset

Frozensets use tag 91 and are reconstructed as frozenset on decode.

data = frozenset([1, 2, 3])
result = crous.loads(crous.dumps(data))

assert type(result) is frozenset  # ✓
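
Conceptually, the set and frozenset round-trip works like the sketch below, which models a tagged value as a plain ("tagged", tag, payload) tuple. That representation and the wrap and unwrap names are assumptions made for illustration; only the tag numbers 90 and 91 come from this page.

```python
TAG_SET = 90
TAG_FROZENSET = 91


def wrap(value):
    """Wrap set/frozenset as a tagged list; pass other values through."""
    if isinstance(value, frozenset):
        return ("tagged", TAG_FROZENSET, list(value))
    if isinstance(value, set):
        return ("tagged", TAG_SET, list(value))
    return value


def unwrap(node):
    """Rebuild set/frozenset from a tagged list; pass other values through."""
    if isinstance(node, tuple) and len(node) == 3 and node[0] == "tagged":
        _, tag, items = node
        if tag == TAG_SET:
            return set(items)
        if tag == TAG_FROZENSET:
            return frozenset(items)
    return node


data = {1, 2, 3, "four"}
assert unwrap(wrap(data)) == data
assert type(unwrap(wrap(frozenset(data)))) is frozenset
```

Because the payload is a list, element order on the wire follows the set's iteration order and should not be relied on.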

Tagged Values

Tagged values wrap any value with a numeric tag. Tags 90 and 91 are reserved for set/frozenset. Tags 100+ are used by the custom serializer registry.

Built-in Tag Assignments

Tag     Type        Description
80      datetime    Named tag (parser only)
81      date        Named tag (parser only)
82      time        Named tag (parser only)
83      timedelta   Named tag (parser only)
84      decimal     Named tag (parser only)
90      set         Built-in set encoding
91      frozenset   Built-in frozenset encoding
92      complex     Named tag (parser only)
100+    Custom      Auto-assigned by register_serializer

Type Encoding Summary

Tag Byte     Type      Encoding
0x00         NULL      1 byte
0x01         FALSE     1 byte
0x02         TRUE      1 byte
0x03         INT       1 + zigzag varint
0x04         FLOAT     1 + 8 bytes (big-endian)
0x05         STRING    1 + varint(len) + data
0x06         BYTES     1 + varint(len) + data
0x07         LIST      1 + varint(count) + elements
0x08         DICT      1 + varint(count) + pairs
0x09         TAGGED    1 + varint(tag) + value
0x0A         TUPLE     1 + varint(count) + elements
0x10–0x28    POSINT    1 byte (integers 0–24)
0x29–0x48    NEGINT    1 byte (integers -1 to -32)

Nesting Limits

Crous enforces a maximum nesting depth of 256 levels to prevent stack overflow. Attempting to encode deeper structures raises CrousEncodeError.
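
To check a structure before encoding it, its depth can be measured without recursion, as in the sketch below. nesting_depth is a hypothetical helper, and its counting convention (a scalar is depth 0, each container adds 1) is an assumption that may differ from how Crous counts levels internally.

```python
MAX_DEPTH = 256  # limit documented above


def nesting_depth(obj) -> int:
    """Return the deepest container nesting level, computed iteratively."""
    deepest = 0
    stack = [(obj, 0)]
    while stack:
        node, depth = stack.pop()
        if isinstance(node, dict):
            children = node.values()
        elif isinstance(node, (list, tuple, set, frozenset)):
            children = node
        else:
            continue  # scalars add no depth
        depth += 1
        deepest = max(deepest, depth)
        stack.extend((child, depth) for child in children)
    return deepest


print(nesting_depth([[1, 2], [3, 4]]))    # 2
print(nesting_depth({"a": {"b": []}}))    # 3
```

An explicit stack is used so that the check itself cannot hit Python's recursion limit on very deep input.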

Size Limits

Individual strings and byte sequences are limited to 64 MB (67,108,864 bytes). This prevents memory exhaustion from malicious or corrupted data.
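
A caller-side pre-check might look like the sketch below. It assumes the limit applies to the UTF-8 byte length for strings and that a value of exactly 64 MB is still accepted; neither detail is confirmed by this page, and within_size_limit is a hypothetical helper.

```python
MAX_BLOB_BYTES = 64 * 1024 * 1024  # 67,108,864 bytes


def within_size_limit(value, limit: int = MAX_BLOB_BYTES) -> bool:
    """Return True if a str/bytes/bytearray value fits within limit bytes."""
    if isinstance(value, str):
        size = len(value.encode("utf-8"))  # bytes, not characters
    elif isinstance(value, (bytes, bytearray)):
        size = len(value)
    else:
        raise TypeError("expected str, bytes, or bytearray")
    return size <= limit


print(within_size_limit("hello"))           # True
print(within_size_limit(b"abcd", limit=3))  # False
```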