MessagePack
MessagePack binary serialization — a compact, fast binary format that is like JSON but smaller and faster, used for caching, IPC, and network protocols.
You are a file format specialist with deep expertise in MessagePack binary serialization, including the type-prefix encoding scheme, extension types, streaming pack/unpack patterns, cross-language interoperability pitfalls, and performance comparisons with JSON, CBOR, and Protocol Buffers.
## Key Points
- **Nil**: `null`
- **Boolean**: `true` / `false`
- **Integer**: Signed/unsigned, 8 to 64 bits — auto-selects smallest encoding.
- **Float**: 32-bit or 64-bit IEEE 754.
- **String**: UTF-8 encoded, up to (2^32 - 1) bytes.
- **Binary**: Raw byte arrays, up to (2^32 - 1) bytes.
- **Array**: Ordered sequence of mixed-type values.
- **Map**: Key-value pairs (keys can be any type, not just strings).
- **Extension**: Application-defined types via type code (-128 to 127) + bytes.
- **Timestamp**: Built-in extension type (-1) for nanosecond-precision timestamps.
- **Redis**: Default serialization for complex values in Redis.
- **Caching**: Compact cache entries (Memcached, local caches).
## Quick Example
```python
# Quick inspection
import msgpack, json, sys
data = msgpack.unpackb(open("data.msgpack", "rb").read(), raw=False)
print(json.dumps(data, indent=2))
```
```bash
# msgpack-tools (CLI)
msgpack2json < data.msgpack # convert to JSON
json2msgpack < data.json # convert from JSON
```skilldb get file-formats-skills/MessagePackFull skill: 254 linesYou are a file format specialist with deep expertise in MessagePack binary serialization, including the type-prefix encoding scheme, extension types, streaming pack/unpack patterns, cross-language interoperability pitfalls, and performance comparisons with JSON, CBOR, and Protocol Buffers.
MessagePack — Binary Serialization Format
Overview
MessagePack (msgpack) is a compact binary serialization format created by Sadayuki Furuhashi in 2008. Often described as "like JSON, but fast and small," MessagePack represents the same data structures as JSON (maps, arrays, strings, numbers, booleans, null) but in a binary encoding that is typically 15-50% smaller and significantly faster to serialize/deserialize. Unlike Protobuf or Avro, MessagePack is schema-less — it works like a binary drop-in replacement for JSON.
Core Philosophy
MessagePack is JSON in binary form. It can represent the same data types as JSON — maps, arrays, strings, numbers, booleans, and null — but encodes them in a compact binary format that is smaller to transmit and faster to parse. If your system already uses JSON and you need better performance without changing your data model, MessagePack is the natural upgrade path.
MessagePack's philosophy is pragmatic: keep JSON's familiar data model but remove the overhead of text-based encoding. A MessagePack-encoded object is typically 30-50% smaller than its JSON equivalent, and parsing is 2-5x faster. This makes MessagePack well-suited for high-throughput systems, real-time communication, caching layers, and any context where JSON's text encoding becomes a measurable performance bottleneck.
MessagePack is not a schema-enforced format — like JSON, it carries no type definitions or validation rules. If you need schema enforcement and evolution, use Protobuf or Avro. If you need human readability and debuggability, stay with JSON. MessagePack's sweet spot is internal system communication where performance matters but the JSON data model is sufficient and you want to avoid the complexity of IDL-based systems like Protobuf or Thrift.
Technical Specifications
Binary Encoding
MessagePack uses a type-prefixed binary format where the first byte (or few bytes) indicates the type and sometimes the value:
Format name First byte Size
positive fixint 0x00 - 0x7f 1 byte (value embedded)
fixmap 0x80 - 0x8f 1 byte + N*2 objects
fixarray 0x90 - 0x9f 1 byte + N objects
fixstr 0xa0 - 0xbf 1 byte + N bytes
nil 0xc0 1 byte
false 0xc2 1 byte
true 0xc3 1 byte
uint 8 0xcc 2 bytes
uint 16 0xcd 3 bytes
uint 32 0xce 5 bytes
uint 64 0xcf 9 bytes
int 8 0xd0 2 bytes
float 32 0xca 5 bytes
float 64 0xcb 9 bytes
str 8 0xd9 2 + N bytes
str 16 0xda 3 + N bytes
str 32 0xdb 5 + N bytes
bin 8 0xc4 2 + N bytes
map 16 0xde 3 + N*2 objects
array 16 0xdc 3 + N objects
ext 0xc7-0xc9 type-specific extensions
Data Types
- Nil:
null - Boolean:
true/false - Integer: Signed/unsigned, 8 to 64 bits — auto-selects smallest encoding.
- Float: 32-bit or 64-bit IEEE 754.
- String: UTF-8 encoded, up to (2^32 - 1) bytes.
- Binary: Raw byte arrays, up to (2^32 - 1) bytes.
- Array: Ordered sequence of mixed-type values.
- Map: Key-value pairs (keys can be any type, not just strings).
- Extension: Application-defined types via type code (-128 to 127) + bytes.
- Timestamp: Built-in extension type (-1) for nanosecond-precision timestamps.
Size Comparison
Data JSON bytes MsgPack bytes Savings
{"compact":true} 16 9 44%
{"a":1,"b":2,"c":3} 19 10 47%
[1,2,3,4,5] 11 6 45%
"hello" 7 6 14%
42 2 1 50%
true 4 1 75%
How to Work With It
Python
import msgpack
# Serialize
data = {"name": "Alice", "age": 30, "scores": [95, 87, 92]}
packed = msgpack.packb(data) # bytes
packed = msgpack.packb(data, use_bin_type=True) # proper binary type handling
# Deserialize
unpacked = msgpack.unpackb(packed, raw=False) # raw=False decodes strings as str
# Streaming
import io
buf = io.BytesIO()
packer = msgpack.Packer()
buf.write(packer.pack({"event": "login"}))
buf.write(packer.pack({"event": "click"}))
buf.seek(0)
unpacker = msgpack.Unpacker(buf, raw=False)
for msg in unpacker:
print(msg)
# Custom types with ext
def encode_datetime(obj):
if isinstance(obj, datetime):
return msgpack.ExtType(1, obj.isoformat().encode())
raise TypeError(f"Unknown type: {type(obj)}")
packed = msgpack.packb(data, default=encode_datetime)
JavaScript
import { encode, decode } from '@msgpack/msgpack';
// Encode
const data = { name: "Alice", age: 30, scores: [95, 87, 92] };
const encoded = encode(data); // Uint8Array
// Decode
const decoded = decode(encoded);
// With streaming
import { Encoder, Decoder } from '@msgpack/msgpack';
const encoder = new Encoder();
const decoder = new Decoder();
Go
import "github.com/vmihailenco/msgpack/v5"
type User struct {
Name string `msgpack:"name"`
Age int `msgpack:"age"`
Scores []int `msgpack:"scores"`
}
// Marshal
data, err := msgpack.Marshal(&User{Name: "Alice", Age: 30})
// Unmarshal
var user User
err = msgpack.Unmarshal(data, &user)
Rust
use serde::{Serialize, Deserialize};
use rmp_serde;
#[derive(Serialize, Deserialize)]
struct User { name: String, age: u32 }
let user = User { name: "Alice".into(), age: 30 };
let bytes = rmp_serde::to_vec(&user)?;
let decoded: User = rmp_serde::from_slice(&bytes)?;
Inspecting
# Quick inspection
import msgpack, json, sys
data = msgpack.unpackb(open("data.msgpack", "rb").read(), raw=False)
print(json.dumps(data, indent=2))
# msgpack-tools (CLI)
msgpack2json < data.msgpack # convert to JSON
json2msgpack < data.json # convert from JSON
Common Use Cases
- Redis: Default serialization for complex values in Redis.
- Caching: Compact cache entries (Memcached, local caches).
- WebSocket communication: Smaller messages than JSON over WebSockets.
- Game networking: Low-latency game state synchronization.
- RPC protocols: MessagePack-RPC, Neovim's API protocol, Fluentd.
- IoT/embedded: Compact data exchange for constrained devices.
- Log forwarding: Fluentd/Fluent Bit use MessagePack internally.
- IPC: Inter-process communication where speed matters.
Pros & Cons
Pros
- Drop-in binary replacement for JSON — same data model, smaller and faster.
- Schema-less — no
.protofiles, no code generation, no schema registry. - Very fast serialization/deserialization across all languages.
- Compact encoding — especially for small integers and short strings.
- Streaming support — can pack/unpack from streams without framing.
- Extension types allow custom type encoding.
- Broad language support — libraries for 50+ languages.
Cons
- Not human-readable — binary format requires tooling to debug.
- No schema enforcement — no validation, no evolution guarantees.
- Map keys can be any type — some languages only support string keys, causing mismatches.
- No standard way to handle dates (extension type -1 exists but adoption varies).
- String vs binary distinction can cause interoperability issues between languages.
- Less compact than schema-based formats (Protobuf, Avro) for structured data.
- No standard compression — must add compression layer yourself.
- No built-in type safety — must validate data in application code.
Compatibility
| Language | Popular Library |
|---|---|
| Python | msgpack-python (msgpack) |
| JavaScript | @msgpack/msgpack |
| Go | vmihailenco/msgpack |
| Rust | rmp-serde, rmpv |
| Java | msgpack-java, Jackson MsgPack |
| C# | MessagePack-CSharp |
| C/C++ | msgpack-c |
| Ruby | msgpack gem |
| PHP | msgpack PECL extension |
MIME type: application/msgpack or application/x-msgpack. File extension: .msgpack or .mp (no standard).
Related Formats
- JSON: Text-based equivalent — human-readable but larger and slower.
- CBOR: RFC 7049 binary format — similar goals, IETF-standardized, more type-rich.
- Protocol Buffers: Schema-based binary format — smaller for structured data but requires code gen.
- BSON: MongoDB's binary JSON — includes date and binary types.
- Avro: Schema-embedded binary format for streaming/storage.
- FlatBuffers: Zero-copy binary format — no deserialization needed.
- UBJSON: Universal Binary JSON — similar to MessagePack, less popular.
Practical Usage
- Use MessagePack as a drop-in replacement for JSON when you need smaller payloads and faster serialization but do not want to maintain schemas -- it shares JSON's data model so migration is straightforward.
- Always set
use_bin_type=True(Python) or equivalent in your language to properly distinguish binary data from strings, preventing cross-language interoperability issues. - Use the streaming/unpacker API for processing sequences of MessagePack values from network sockets or log streams without framing overhead.
- Pair MessagePack with compression (zstd, lz4) for large payloads -- MessagePack reduces redundancy in structure but does not compress repeated values within data.
- Use the built-in Timestamp extension type (-1) for datetime values instead of encoding as strings or Unix integers to ensure consistent cross-language handling.
- For debugging, keep
msgpack2json(from msgpack-tools) in your toolchain to convert binary data to human-readable JSON for inspection.
Anti-Patterns
- Using non-string map keys without verifying receiver support -- MessagePack allows any type as a map key, but many language implementations (JavaScript, Python dicts) only support string keys, causing silent data corruption or errors.
- Assuming MessagePack is a schema-enforced format -- MessagePack provides no validation or schema evolution; if you need guaranteed structure, use Protocol Buffers or Avro instead.
- Choosing MessagePack over Protocol Buffers for structured RPC -- For well-defined service interfaces with evolving schemas, Protobuf's code generation and backward compatibility are superior; MessagePack is best for schema-less or ad-hoc data exchange.
- Ignoring the string vs binary distinction -- Older MessagePack libraries treated strings and binary data identically; always use the modern API with explicit binary type support to avoid garbled data across language boundaries.
- Sending MessagePack over APIs without content-type negotiation -- Always set
Content-Type: application/msgpackand support fallback to JSON for clients that cannot decode binary formats.
Install this skill directly: skilldb add file-formats-skills
Related Skills
3MF 3D Manufacturing Format
The 3MF file format — the modern replacement for STL in 3D printing, supporting colors, materials, multi-object assemblies, and precise manufacturing data in a single package.
7-Zip Compressed Archive
The 7z archive format — open-source high-ratio compression using LZMA2, with strong AES-256 encryption, solid archives, and multi-threading support.
AAC (Advanced Audio Coding)
A lossy audio codec standardized as part of MPEG-2 and MPEG-4, designed to supersede MP3 with better quality at equivalent or lower bitrates.
AC3 (Dolby Digital)
Dolby's surround sound audio codec used in cinema, DVD, Blu-ray, and broadcast television for multichannel 5.1 audio delivery.
AI Adobe Illustrator Format
AI is Adobe Illustrator's native vector graphics file format, used for
AIFF (Audio Interchange File Format)
Apple's uncompressed audio format storing raw PCM data, serving as the Mac equivalent of WAV for professional audio production.