
MessagePack

MessagePack binary serialization — a compact, fast binary format that is like JSON but smaller and faster, used for caching, IPC, and network protocols.

You are a file format specialist with deep expertise in MessagePack binary serialization, including the type-prefix encoding scheme, extension types, streaming pack/unpack patterns, cross-language interoperability pitfalls, and performance comparisons with JSON, CBOR, and Protocol Buffers.

MessagePack — Binary Serialization Format

Overview

MessagePack (msgpack) is a compact binary serialization format created by Sadayuki Furuhashi in 2008. Often described as "like JSON, but fast and small," MessagePack represents the same data structures as JSON (maps, arrays, strings, numbers, booleans, null) but in a binary encoding that is typically 15-50% smaller and significantly faster to serialize/deserialize. Unlike Protobuf or Avro, MessagePack is schema-less — it works like a binary drop-in replacement for JSON.

Core Philosophy

MessagePack is JSON in binary form. It can represent the same data types as JSON — maps, arrays, strings, numbers, booleans, and null — but encodes them in a compact binary format that is smaller to transmit and faster to parse. If your system already uses JSON and you need better performance without changing your data model, MessagePack is the natural upgrade path.

MessagePack's philosophy is pragmatic: keep JSON's familiar data model but remove the overhead of text-based encoding. A MessagePack-encoded object is often 15-50% smaller than its JSON equivalent, with the largest savings on small integers, booleans, and short keys. Parsing is often cited as 2-5x faster, though the actual gain depends on the library and the payload. This makes MessagePack well-suited for high-throughput systems, real-time communication, caching layers, and any context where JSON's text encoding becomes a measurable performance bottleneck.

MessagePack is not a schema-enforced format — like JSON, it carries no type definitions or validation rules. If you need schema enforcement and evolution, use Protobuf or Avro. If you need human readability and debuggability, stay with JSON. MessagePack's sweet spot is internal system communication where performance matters but the JSON data model is sufficient and you want to avoid the complexity of IDL-based systems like Protobuf or Thrift.

Technical Specifications

Binary Encoding

MessagePack uses a type-prefixed binary format where the first byte (or few bytes) indicates the type and sometimes the value:

Format name         First byte     Size
positive fixint     0x00 - 0x7f    1 byte (value embedded)
fixmap              0x80 - 0x8f    1 byte + N*2 objects
fixarray            0x90 - 0x9f    1 byte + N objects
fixstr              0xa0 - 0xbf    1 byte + N bytes
nil                 0xc0           1 byte
false               0xc2           1 byte
true                0xc3           1 byte
bin 8/16/32         0xc4 - 0xc6    2/3/5 + N bytes
ext 8/16/32         0xc7 - 0xc9    3/4/6 + N bytes (length, type code, data)
float 32            0xca           5 bytes
float 64            0xcb           9 bytes
uint 8/16/32/64     0xcc - 0xcf    2/3/5/9 bytes
int 8/16/32/64      0xd0 - 0xd3    2/3/5/9 bytes
fixext 1/2/4/8/16   0xd4 - 0xd8    2 + 1/2/4/8/16 bytes (type code, data)
str 8/16/32         0xd9 - 0xdb    2/3/5 + N bytes
array 16/32         0xdc - 0xdd    3/5 + N objects
map 16/32           0xde - 0xdf    3/5 + N*2 objects
negative fixint     0xe0 - 0xff    1 byte (value embedded)
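
To make the type-prefix scheme concrete, here is a minimal hand-rolled sketch that encodes a small subset of the format using only the Python standard library. The function name `pack_small` is hypothetical and this is not how any MessagePack library is implemented; it covers only the short forms and raises on anything larger.

```python
import struct

def pack_small(obj):
    """Encode a small subset of MessagePack by hand (illustrative sketch only)."""
    if obj is None:
        return b"\xc0"                               # nil
    if obj is True:
        return b"\xc3"                               # true
    if obj is False:
        return b"\xc2"                               # false
    if isinstance(obj, int):
        if 0 <= obj <= 0x7F:
            return bytes([obj])                      # positive fixint: the byte is the value
        if -32 <= obj < 0:
            return struct.pack("b", obj)             # negative fixint (0xe0 - 0xff)
        if 0 <= obj <= 0xFF:
            return b"\xcc" + bytes([obj])            # uint 8
        if 0 <= obj <= 0xFFFF:
            return b"\xcd" + struct.pack(">H", obj)  # uint 16, big-endian
        raise ValueError("integer outside the range handled by this sketch")
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        if len(raw) <= 31:
            return bytes([0xA0 | len(raw)]) + raw    # fixstr: length lives in the prefix
        if len(raw) <= 0xFF:
            return b"\xd9" + bytes([len(raw)]) + raw # str 8
        raise ValueError("string too long for this sketch")
    if isinstance(obj, bytes):
        if len(obj) <= 0xFF:
            return b"\xc4" + bytes([len(obj)]) + obj # bin 8: distinct prefix from str
        raise ValueError("bytes too long for this sketch")
    if isinstance(obj, list) and len(obj) <= 15:
        return bytes([0x90 | len(obj)]) + b"".join(pack_small(v) for v in obj)
    if isinstance(obj, dict) and len(obj) <= 15:
        body = b"".join(pack_small(k) + pack_small(v) for k, v in obj.items())
        return bytes([0x80 | len(obj)]) + body       # fixmap: N key/value pairs follow
    raise TypeError(f"type not handled by this sketch: {type(obj).__name__}")

print(pack_small("hello").hex())   # a5 followed by the UTF-8 bytes
print(pack_small(b"hello").hex())  # c4 05 followed by the raw bytes
```

The same five characters get a different prefix depending on whether they are a string or binary data, which is the distinction that older libraries did not make.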

Data Types

- **Nil**: `null`
- **Boolean**: `true` / `false`
- **Integer**: Signed or unsigned, up to 64 bits; encoders normally pick the smallest encoding that fits the value.
- **Float**: 32-bit or 64-bit IEEE 754.
- **String**: UTF-8 encoded, up to (2^32 - 1) bytes.
- **Binary**: Raw byte arrays, up to (2^32 - 1) bytes.
- **Array**: Ordered sequence of mixed-type values.
- **Map**: Key-value pairs (keys can be any type, not just strings).
- **Extension**: Application-defined types via a type code (0 to 127) plus bytes; codes -128 to -1 are reserved for types defined by the specification.
- **Timestamp**: Built-in extension type (-1) for nanosecond-precision timestamps.
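
As an illustration of how the Timestamp extension is laid out on the wire, the following sketch packs and unpacks the three timestamp forms defined by the specification (32-bit, 64-bit, and 96-bit) with the standard library only. The helper names `pack_timestamp` and `unpack_timestamp` are hypothetical; real libraries expose their own APIs for this.

```python
import struct

def pack_timestamp(seconds, nanoseconds=0):
    """Encode a Timestamp extension value (type code -1, written as 0xff)."""
    if not 0 <= nanoseconds < 1_000_000_000:
        raise ValueError("nanoseconds must be in [0, 1e9)")
    if 0 <= seconds < (1 << 34):
        if nanoseconds == 0 and seconds < (1 << 32):
            # timestamp 32: fixext 4 holding 32-bit unsigned seconds
            return b"\xd6\xff" + struct.pack(">I", seconds)
        # timestamp 64: fixext 8 holding 30-bit nanoseconds and 34-bit seconds
        return b"\xd7\xff" + struct.pack(">Q", (nanoseconds << 34) | seconds)
    # timestamp 96: ext 8 with length 12, 32-bit nanoseconds and 64-bit signed seconds
    return b"\xc7\x0c\xff" + struct.pack(">Iq", nanoseconds, seconds)

def unpack_timestamp(buf):
    """Decode one Timestamp value; returns (seconds, nanoseconds)."""
    if buf[:2] == b"\xd6\xff":
        return struct.unpack(">I", buf[2:6])[0], 0
    if buf[:2] == b"\xd7\xff":
        value = struct.unpack(">Q", buf[2:10])[0]
        return value & ((1 << 34) - 1), value >> 34
    if buf[:3] == b"\xc7\x0c\xff":
        nanoseconds, seconds = struct.unpack(">Iq", buf[3:15])
        return seconds, nanoseconds
    raise ValueError("not a Timestamp extension value")

encoded = pack_timestamp(1_700_000_000, 250)
print(len(encoded), unpack_timestamp(encoded))
```

Whole-second values in the 32-bit range take 6 bytes in total, while sub-second precision costs 10 bytes and dates outside the 34-bit range cost 15.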

Size Comparison

Data                    JSON bytes    MsgPack bytes    Savings
{"compact":true}        16            10               38%
{"a":1,"b":2,"c":3}     19            10               47%
[1,2,3,4,5]             11            6                45%
"hello"                 7             6                14%
42                      2             1                50%
true                    4             1                75%
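
Figures like these can be reproduced without a MessagePack library by counting bytes directly. The sketch below is an assumption-laden calculator, not a real encoder: the hypothetical `msgpack_size` only handles values small enough for the one-byte "fix" forms, and `json_size` measures minified JSON.

```python
import json

def msgpack_size(obj):
    """Encoded size in bytes, for values that fit the single-byte 'fix' forms only."""
    if obj is None or isinstance(obj, bool):
        return 1                                   # nil, true, false
    if isinstance(obj, int):
        if -32 <= obj <= 127:
            return 1                               # fixint
        raise ValueError("integer too large for this sketch")
    if isinstance(obj, str):
        length = len(obj.encode("utf-8"))
        if length > 31:
            raise ValueError("string too long for this sketch")
        return 1 + length                          # fixstr prefix + bytes
    if isinstance(obj, list):
        if len(obj) > 15:
            raise ValueError("array too long for this sketch")
        return 1 + sum(msgpack_size(v) for v in obj)
    if isinstance(obj, dict):
        if len(obj) > 15:
            raise ValueError("map too large for this sketch")
        return 1 + sum(msgpack_size(k) + msgpack_size(v) for k, v in obj.items())
    raise TypeError(f"type not handled by this sketch: {type(obj).__name__}")

def json_size(obj):
    """Size of the minified JSON text in bytes."""
    return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))

def compare(obj):
    """Return (json_bytes, msgpack_bytes, percent_saved)."""
    j, m = json_size(obj), msgpack_size(obj)
    return j, m, round(100 * (j - m) / j)

print(compare([1, 2, 3, 4, 5]))            # (11, 6, 45)
print(compare({"a": 1, "b": 2, "c": 3}))   # (19, 10, 47)
```

The savings come almost entirely from dropped punctuation and quotes, which is why a bare string like "hello" gains so little.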

How to Work With It

Python

```python
from datetime import datetime, timezone
import io

import msgpack

# Serialize
data = {"name": "Alice", "age": 30, "scores": [95, 87, 92]}
packed = msgpack.packb(data)  # bytes
# use_bin_type=True is the default since msgpack 1.0; pass it explicitly on older versions
packed = msgpack.packb(data, use_bin_type=True)

# Deserialize
unpacked = msgpack.unpackb(packed, raw=False)  # raw=False decodes strings as str

# Streaming: values are self-delimiting, so they can be written back to back
buf = io.BytesIO()
packer = msgpack.Packer()
buf.write(packer.pack({"event": "login"}))
buf.write(packer.pack({"event": "click"}))

buf.seek(0)
unpacker = msgpack.Unpacker(buf, raw=False)
for msg in unpacker:
    print(msg)

# Custom types with ext (type code 1 is an arbitrary application-level choice)
def encode_datetime(obj):
    if isinstance(obj, datetime):
        return msgpack.ExtType(1, obj.isoformat().encode())
    raise TypeError(f"Unknown type: {type(obj)}")

event = {"name": "Alice", "at": datetime.now(timezone.utc)}
packed = msgpack.packb(event, default=encode_datetime)
```

JavaScript

```javascript
import { encode, decode, Encoder, Decoder } from '@msgpack/msgpack';

// Encode
const data = { name: "Alice", age: 30, scores: [95, 87, 92] };
const encoded = encode(data);  // Uint8Array

// Decode
const decoded = decode(encoded);

// Reusable instances: avoids re-creating encoder/decoder state on every call
const encoder = new Encoder();
const decoder = new Decoder();
const reEncoded = encoder.encode(data);
const reDecoded = decoder.decode(reEncoded);
```

Go

```go
import "github.com/vmihailenco/msgpack/v5"

type User struct {
    Name   string `msgpack:"name"`
    Age    int    `msgpack:"age"`
    Scores []int  `msgpack:"scores"`
}

// Marshal
data, err := msgpack.Marshal(&User{Name: "Alice", Age: 30})

// Unmarshal
var user User
err = msgpack.Unmarshal(data, &user)
```

Rust

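```rust
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct User { name: String, age: u32 }

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let user = User { name: "Alice".into(), age: 30 };
    // to_vec writes structs as arrays of field values;
    // use to_vec_named to keep field names as map keys for other languages
    let bytes = rmp_serde::to_vec(&user)?;
    let decoded: User = rmp_serde::from_slice(&bytes)?;
    assert_eq!(decoded.age, 30);
    Ok(())
}
```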

Inspecting

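```python
# Quick inspection: decode a file and pretty-print it as JSON
import json

import msgpack

with open("data.msgpack", "rb") as f:
    data = msgpack.unpackb(f.read(), raw=False)
# default=repr keeps the dump from failing on bin values, which JSON cannot represent
print(json.dumps(data, indent=2, default=repr))
```

```bash
# msgpack-tools (CLI)
msgpack2json < data.msgpack        # convert to JSON
json2msgpack < data.json           # convert from JSON
```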

Common Use Cases

- **Redis**: Compact client-side encoding for complex values stored as Redis strings; Redis's Lua scripting environment also bundles a `cmsgpack` library.
- **Caching**: Compact cache entries (Memcached, local caches).
- **WebSocket communication**: Smaller messages than JSON over WebSockets.
- **Game networking**: Low-latency game state synchronization.
- **RPC protocols**: MessagePack-RPC and Neovim's API protocol.
- **IoT/embedded**: Compact data exchange for constrained devices.
- **Log forwarding**: Fluentd and Fluent Bit use MessagePack internally and in their forward protocol.
- **IPC**: Inter-process communication where speed matters.

Pros & Cons

Pros

- Drop-in binary replacement for JSON: same data model, smaller and faster.
- Schema-less: no .proto files, no code generation, no schema registry.
- Fast serialization and deserialization in most language implementations.
- Compact encoding, especially for small integers and short strings.
- Streaming support: values are self-delimiting, so they can be packed to and unpacked from streams without extra framing.
- Extension types allow custom type encoding.
- Broad language support: libraries for 50+ languages.
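
The streaming point follows from the encoding itself: every value announces its own type and length, so a reader always knows where one value ends and the next begins. The sketch below is a hypothetical minimal decoder (`decode_one`, `iter_values`) covering only the single-byte "fix" forms, written with the standard library to show the idea rather than to replace a real unpacker.

```python
SINGLE_BYTE = {0xC0: None, 0xC2: False, 0xC3: True}

def decode_one(buf, pos=0):
    """Decode one value starting at pos; return (value, position after it)."""
    tag = buf[pos]
    if tag <= 0x7F:                                  # positive fixint
        return tag, pos + 1
    if tag <= 0x8F:                                  # fixmap: N key/value pairs follow
        result = {}
        pos += 1
        for _ in range(tag & 0x0F):
            key, pos = decode_one(buf, pos)
            value, pos = decode_one(buf, pos)
            result[key] = value
        return result, pos
    if tag <= 0x9F:                                  # fixarray: N values follow
        items = []
        pos += 1
        for _ in range(tag & 0x0F):
            item, pos = decode_one(buf, pos)
            items.append(item)
        return items, pos
    if tag <= 0xBF:                                  # fixstr: length is in the low 5 bits
        end = pos + 1 + (tag & 0x1F)
        return buf[pos + 1:end].decode("utf-8"), end
    if tag in SINGLE_BYTE:                           # nil, false, true
        return SINGLE_BYTE[tag], pos + 1
    if tag >= 0xE0:                                  # negative fixint
        return tag - 0x100, pos + 1
    raise ValueError(f"type byte 0x{tag:02x} is not handled by this sketch")

def iter_values(buf):
    """Yield every value from a buffer of back-to-back encoded values."""
    pos = 0
    while pos < len(buf):
        value, pos = decode_one(buf, pos)
        yield value

# Two maps concatenated with no separator or length prefix between them
stream = b"\x81\xa5event\xa5login" + b"\x81\xa5event\xa5click"
for message in iter_values(stream):
    print(message)
```

A production unpacker additionally has to cope with values that arrive split across reads, which is the buffering that library streaming APIs handle.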

Cons

- Not human-readable: the binary format requires tooling to debug.
- No schema enforcement: no validation, no evolution guarantees.
- Map keys can be any type, but some languages only support string keys, causing mismatches.
- Inconsistent date handling: the Timestamp extension (type -1) is part of the specification, but library support and defaults vary.
- String vs binary distinction can cause interoperability issues between languages.
- Less compact than schema-based formats (Protobuf, Avro) for structured data, because field names are repeated in every map.
- No standard compression: you must add a compression layer yourself.
- No built-in type safety: you must validate data in application code.
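
One way to guard against the map-key mismatch is to check a payload before packing it. The helper below is a hypothetical sketch (`find_non_string_keys` is not part of any MessagePack library) that walks nested dicts and lists and reports the path of every key that is not a string.

```python
def find_non_string_keys(obj, path="$"):
    """Return the paths of all map keys that are not str, searching nested containers."""
    problems = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            child = f"{path}[{key!r}]"
            if not isinstance(key, str):
                problems.append(child)
            problems.extend(find_non_string_keys(value, child))
    elif isinstance(obj, (list, tuple)):
        for index, item in enumerate(obj):
            problems.extend(find_non_string_keys(item, f"{path}[{index}]"))
    return problems

payload = {"users": [{"id": 1}, {2: "bob"}], "meta": {1.5: "ratio"}}
for location in find_non_string_keys(payload):
    print("non-string key at", location)
```

Running a check like this at the producer is cheaper than debugging a decode failure in a consumer written in another language.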

Compatibility

| Language | Popular Library |
| --- | --- |
| Python | `msgpack-python` (`msgpack`) |
| JavaScript | `@msgpack/msgpack` |
| Go | `vmihailenco/msgpack` |
| Rust | `rmp-serde`, `rmpv` |
| Java | `msgpack-java`, Jackson MsgPack |
| C# | `MessagePack-CSharp` |
| C/C++ | `msgpack-c` |
| Ruby | `msgpack` gem |
| PHP | `msgpack` PECL extension |
MIME type: application/msgpack and application/x-msgpack are both in common use. File extension: .msgpack or .mp (no standard).

Related Formats

- **JSON**: Text-based equivalent; human-readable but larger and slower.
- **CBOR**: RFC 8949 (which obsoletes RFC 7049) binary format; similar goals, IETF-standardized, more type-rich.
- **Protocol Buffers**: Schema-based binary format; smaller for structured data but requires code generation.
- **BSON**: MongoDB's binary JSON; includes date and binary types.
- **Avro**: Schema-embedded binary format for streaming/storage.
- **FlatBuffers**: Zero-copy binary format; no deserialization needed.
- **UBJSON**: Universal Binary JSON; similar to MessagePack, less popular.

Practical Usage

- Use MessagePack as a drop-in replacement for JSON when you need smaller payloads and faster serialization but do not want to maintain schemas. It shares JSON's data model, so migration is straightforward.
- Set use_bin_type=True (Python; the default since msgpack 1.0) or the equivalent in your language so that binary data is distinguished from strings, preventing cross-language interoperability issues.
- Use the streaming/unpacker API for processing sequences of MessagePack values from network sockets or log streams without framing overhead.
- Pair MessagePack with compression (zstd, lz4) for large payloads. MessagePack removes JSON's syntactic overhead but does not deduplicate repeated keys or values within the data.
- Use the built-in Timestamp extension type (-1) for datetime values instead of encoding them as strings or Unix integers, to ensure consistent cross-language handling.
- For debugging, keep msgpack2json (from msgpack-tools) in your toolchain to convert binary data to human-readable JSON for inspection.
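
The compression advice is easy to check with a payload of repeated records. In this sketch `zlib` stands in for zstd or lz4 only because it ships with Python; the record bytes are written out by hand and the ratio will differ on real data.

```python
import zlib

# One hand-encoded record: {"name": "Alice", "age": 30}
# 0x82 = fixmap with 2 pairs, 0xa4/0xa5/0xa3 = fixstr prefixes, 0x1e = fixint 30
record = b"\x82\xa4name\xa5Alice\xa3age\x1e"

# 1000 records back to back: every one repeats the keys "name" and "age"
payload = record * 1000

compressed = zlib.compress(payload, 6)
restored = zlib.decompress(compressed)

print(len(payload), "bytes before compression")
print(len(compressed), "bytes after compression")
print("round trip ok:", restored == payload)
```

Because the keys are stored again in every record, a general-purpose compressor removes most of the remaining redundancy.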

Anti-Patterns

- **Using non-string map keys without verifying receiver support**: MessagePack allows any type as a map key, but JavaScript objects only have string keys, and msgpack-python rejects keys that are neither str nor bytes by default (strict_map_key=True), so such payloads can fail to decode or have their keys coerced.
- **Assuming MessagePack is a schema-enforced format**: MessagePack provides no validation or schema evolution; if you need guaranteed structure, use Protocol Buffers or Avro instead.
- **Choosing MessagePack over Protocol Buffers for structured RPC**: For well-defined service interfaces with evolving schemas, Protobuf's code generation and backward compatibility are superior; MessagePack is best for schema-less or ad-hoc data exchange.
- **Ignoring the string vs binary distinction**: Older MessagePack libraries treated strings and binary data identically; always use the modern API with explicit binary type support to avoid garbled data across language boundaries.
- **Sending MessagePack over APIs without content-type negotiation**: Always set Content-Type: application/msgpack and support fallback to JSON for clients that cannot decode binary formats.
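
The content-negotiation point can be sketched as a small helper that inspects the request's Accept header. The function name `choose_response_type` is hypothetical, and the sketch deliberately ignores q-values and wildcards, which a real server framework would normally handle for you.

```python
MSGPACK_TYPES = ("application/msgpack", "application/x-msgpack")

def choose_response_type(accept_header):
    """Return a MessagePack media type only if the client lists one; otherwise JSON."""
    # Split "type/subtype;q=0.8, other/type" into bare, lower-cased media types
    offered = [
        part.split(";")[0].strip().lower()
        for part in (accept_header or "").split(",")
    ]
    for media_type in MSGPACK_TYPES:
        if media_type in offered:
            return media_type
    return "application/json"   # safe fallback for clients that cannot decode binary

print(choose_response_type("application/msgpack, application/json;q=0.5"))
print(choose_response_type("text/html, */*"))
```

Defaulting to JSON means a client has to opt in to the binary format explicitly, so browsers and generic HTTP tools keep working.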
