DRAFT
Binary Application Record Encoding
Binary Application Record Encoding (BARE) is, as the name implies, a simple
binary representation for structured application data.
BARE messages omit type information, and are not self-describing. The structure
of a message must be established out of band, generally by prior agreement and
context. For example, if a BARE message is returned from /api/user/info, it
can be inferred from context that the message represents user information, and
the structure of such messages is available in the documentation for this API.
MESSAGE FORMAT
A BARE message is a single value of a pre-defined type, which may be an
aggregate type. The encoding of each type is specified as follows:
BUILT-IN TYPES
The following primitive data types are supported:
uint, int
A variable-length integer. Each octet of the encoded value has
the most-significant bit set, except for the last octet. The
remaining bits are the integer value in 7-bit groups,
least-significant first.
Signed integers are mapped to unsigned integers using "zig-zag"
encoding: non-negative values x are written as 2*x + 0, negative
values are written as 2*(^x) + 1, where ^x is the bitwise
complement of x; that is, negative numbers are complemented and
whether to complement is encoded in bit 0.
The maximum precision of a varint is 64 bits.
u8, u16, u32, u64
An unsigned little-endian integer with a fixed length in bits.
The precision is 8, 16, 32, and 64 bits respectively.
i8, i16, i32, i64
A signed two's complement, little-endian integer with a fixed
length in bits. The precision is 8, 16, 32, and 64 bits
respectively.
f32, f64
A 32- or 64-bit IEEE-754 floating point number, little-endian.
bool
A boolean, either true or false, represented respectively by a
one or a zero encoded as an 8-bit unsigned integer. Any non-zero
value is interpreted as true.
enum
A value from a set of possible values enumerated in advance,
encoded as a uint.
string
A UTF-8 string of text, prefixed by the string's length in bytes
as a uint.
data<length>
Arbitrary binary data with a fixed "length" in bytes, e.g.
data<16>. The binary data is encoded literally. The length must
be representable as a u64, but is not encoded into the message.
data
Arbitrary binary data of an undefined length. The length in
bytes is encoded as a uint, followed by the binary data encoded
literally.
void
A type with zero length. It is useful for creating user-defined
types which alias void, providing discrete options in a tagged
union which do not have any underlying storage.
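The primitive encodings above can be sketched in Python as follows. This is
an illustrative sketch, not a reference implementation: the function names
and the FIXED table are assumptions made for this example, and the
fixed-width types are delegated to Python's standard struct module.

```python
import struct

# struct format strings for the fixed-width types ("<" means little-endian).
FIXED = {
    "u8": "<B", "u16": "<H", "u32": "<I", "u64": "<Q",
    "i8": "<b", "i16": "<h", "i32": "<i", "i64": "<q",
    "f32": "<f", "f64": "<d",
}


def encode_uint(value):
    """Encode a uint: 7-bit groups, least-significant group first."""
    if not 0 <= value < 1 << 64:
        raise ValueError("uint must fit in 64 bits")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # more octets follow: top bit set
        value >>= 7
    out.append(value)  # last octet: top bit clear
    return bytes(out)


def decode_uint(buf, pos=0):
    """Decode a uint at pos; return (value, position after the uint)."""
    value = 0
    for shift in range(0, 70, 7):  # 64 bits need at most 10 octets
        if pos >= len(buf):
            raise ValueError("truncated uint")
        octet = buf[pos]
        pos += 1
        value |= (octet & 0x7F) << shift
        if octet < 0x80:
            if value >= 1 << 64:
                raise ValueError("uint exceeds 64 bits")
            return value, pos
    raise ValueError("uint longer than 10 octets")


def encode_int(value):
    """Zig-zag map a signed integer, then encode it as a uint."""
    if not -(1 << 63) <= value < 1 << 63:
        raise ValueError("int must fit in 64 bits")
    zigzag = 2 * value if value >= 0 else 2 * ~value + 1
    return encode_uint(zigzag)


def decode_int(buf, pos=0):
    """Decode a uint, then undo the zig-zag mapping."""
    zigzag, pos = decode_uint(buf, pos)
    half = zigzag >> 1
    return (~half if zigzag & 1 else half), pos


def encode_fixed(type_name, value):
    """Encode u8..u64, i8..i64, f32 or f64 as little-endian bytes."""
    return struct.pack(FIXED[type_name], value)


def encode_bool(value):
    return b"\x01" if value else b"\x00"


def encode_data(payload):
    """Variable-length data: length in bytes as a uint, then the bytes."""
    return encode_uint(len(payload)) + payload


def encode_string(text):
    """UTF-8 bytes prefixed by their length in bytes."""
    return encode_data(text.encode("utf-8"))
```

For instance, encode_uint(300) yields the octets AC 02, encode_int(-1)
yields 01, and encode_int(1) yields 02.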
Additionally, the following aggregate types are supported:
optional<type>
A value of "type" which may or may not be assigned, e.g.
optional<u32>. Represented either as an 8-bit unsigned integer
0, indicating that the value is unset; or any nonzero integer to
indicate that the value is set, followed by the value.
[length]type
An array of values of "type" with a fixed "length", e.g.
[8]string. The encoding of this value is the encoded member
values concatenated to one another, with no delimiters or length
prefix.
[]type
An array of values of "type" with an undefined length, e.g.
[]string. The length of the array in values is encoded into the
message as a uint, followed by the concatenated values.
map[type A]type B
A map of values of type B keyed by values of type A, e.g.
map[u32]string. The encoded representation of a map begins with
the number of key/value pairs encoded as a uint, followed by the
key/value pairs concatenated together. Each key/value pair is
encoded as the encoded key and encoded value concatenated.
The order of items is undefined, and if a key is repeated, the
last key/value pair of that key is considered authoritative.
(type | type | ...)
A tagged union whose value can be one of any type from a set.
Each type in the set is assigned a numeric representation,
starting at zero and incrementing for each type. The value is
encoded as the selected tag as a uint, followed by the value
itself encoded as that type.
struct
A set of values of arbitrary types, concatenated together in an
order known in advance.
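The aggregate encodings above compose the encodings of their member types.
One way to sketch this in Python is with small combinator functions that take
member encoders and return an encoder for the aggregate. All names here
(optional, array, map_of, union, struct_of and so on) are hypothetical and
chosen only for this illustration; None is assumed to stand for an unset
optional.

```python
def encode_uint(value):
    """Variable-length uint: 7-bit groups, least-significant first."""
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_u32(value):
    return value.to_bytes(4, "little")


def encode_string(text):
    raw = text.encode("utf-8")
    return encode_uint(len(raw)) + raw


def optional(enc):
    """optional<type>: 0 if unset, otherwise 1 followed by the value."""
    return lambda value: b"\x00" if value is None else b"\x01" + enc(value)


def fixed_array(length, enc):
    """[length]type: members concatenated, no length prefix."""
    def encode(values):
        if len(values) != length:
            raise ValueError("wrong number of array members")
        return b"".join(enc(v) for v in values)
    return encode


def array(enc):
    """[]type: member count as a uint, then the members."""
    return lambda values: encode_uint(len(values)) + b"".join(
        enc(v) for v in values)


def map_of(key_enc, value_enc):
    """map[A]B: pair count as a uint, then each key followed by its value."""
    return lambda mapping: encode_uint(len(mapping)) + b"".join(
        key_enc(k) + value_enc(v) for k, v in mapping.items())


def union(*member_encs):
    """Tagged union: tag as a uint (position in the set), then the value."""
    return lambda tag, value: encode_uint(tag) + member_encs[tag](value)


def struct_of(*field_encs):
    """struct: fields concatenated in the order known in advance."""
    def encode(values):
        if len(values) != len(field_encs):
            raise ValueError("wrong number of struct fields")
        return b"".join(enc(v) for enc, v in zip(field_encs, values))
    return encode
```

For example, array(encode_string)(["a", "bc"]) produces the count 02 followed
by the two length-prefixed strings, and union(encode_u32, encode_string)(1,
"hi") produces the tag 01 followed by the encoded string.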
USER-DEFINED TYPES
A user-defined type gives a name to a built-in type, or aliases another type.
This creates a distinct type, whose underlying storage is equivalent to the
type it names.
INVARIANTS
The following invariants must be upheld in a BARE schema:
1. Any type which is ultimately a void type (either directly or through
user-defined types) may not be used as an optional type, struct member, array
member, or map key or value. Void types may only be used as members of the
set of types in a tagged union.
2. The lengths of fixed-length arrays and data types must be at least 1.
3. Structs must have at least one field.
4. Unions must have at least one type.
5. Map keys must use a primitive type which is not data or data<length>.
6. No two values in the same enum may share the same integer value.
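The invariants above lend themselves to a mechanical check. The following
Python sketch assumes a hypothetical in-memory representation in which each
type is a tuple whose first element names its kind, user-defined aliases have
already been resolved to the types they name, and enum values have already
been assigned integers. Neither the representation nor the check function is
part of BARE.

```python
# Kinds listed as primitive types in this document.
PRIMITIVE_KINDS = {
    "uint", "int", "u8", "u16", "u32", "u64", "i8", "i16", "i32", "i64",
    "f32", "f64", "bool", "enum", "string", "data", "void",
}


def check(t, in_union=False):
    """Raise ValueError if type t violates one of the schema invariants.

    Assumed shapes: ("u32",), ("void",), ("data", length_or_None),
    ("enum", [(name, value), ...]), ("optional", t),
    ("array", length_or_None, t), ("map", key_t, value_t),
    ("union", [t, ...]), ("struct", [(name, t), ...]).
    """
    kind = t[0]
    if kind == "void":
        if not in_union:  # invariant 1
            raise ValueError("void may only be a union member")
    elif kind == "data":
        if t[1] is not None and t[1] < 1:  # invariant 2
            raise ValueError("data<length> must have length >= 1")
    elif kind == "enum":
        values = [value for _, value in t[1]]
        if len(set(values)) != len(values):  # invariant 6
            raise ValueError("enum values must be distinct")
    elif kind == "optional":
        check(t[1])
    elif kind == "array":
        if t[1] is not None and t[1] < 1:  # invariant 2
            raise ValueError("fixed array must have length >= 1")
        check(t[2])
    elif kind == "map":
        key = t[1]
        if key[0] not in PRIMITIVE_KINDS or key[0] == "data":  # invariant 5
            raise ValueError("map key must be a primitive other than data")
        check(key)
        check(t[2])
    elif kind == "union":
        if not t[1]:  # invariant 4
            raise ValueError("union must have at least one type")
        for member in t[1]:
            check(member, in_union=True)
    elif kind == "struct":
        if not t[1]:  # invariant 3
            raise ValueError("struct must have at least one field")
        for _, field_type in t[1]:
            check(field_type)
    elif kind not in PRIMITIVE_KINDS:
        raise ValueError(f"unknown type kind {kind!r}")
```

A union such as ("union", [("void",), ("string",)]) passes, while
("optional", ("void",)) and ("struct", []) are rejected.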
MESSAGE SCHEMA LANGUAGE
The use of a schema language is optional. Implementations should support
decoding arbitrary BARE messages either without such a document, or with a
schema defined using more native tools available from the language or runtime
environment.
However, it may be useful to have a schema language, for use with code
generation, documentation, or interoperability. A domain-specific language is
provided for this purpose.
During lexical analysis, whitespace may be used to separate tokens, and is then
discarded. Additionally, "#" is used for comments; if encountered, the "#"
character and any subsequent characters are discarded until a LF is found. The
syntax of this language is represented by the following ABNF grammar (see
RFC 5234):
schema = 1*user-type
user-type = "type" user-type-name non-enum-type
user-type =/ "enum" user-type-name enum-type
type = non-enum-type / enum-type
non-enum-type = primitive-type / aggregate-type / user-type-name
user-type-name = UPPER *(ALPHA / DIGIT) ; First letter is uppercase
primitive-type = "int" / "i8" / "i16" / "i32" / "i64"
primitive-type =/ "uint" / "u8" / "u16" / "u32" / "u64"
primitive-type =/ "f32" / "f64"
primitive-type =/ "bool"
primitive-type =/ "string"
primitive-type =/ "data" / ("data" "<" integer ">")
primitive-type =/ "void"
enum-type = "{" enum-values "}"
enum-values = enum-value / (enum-values enum-value)
enum-value = enum-value-name / (enum-value-name "=" integer)
enum-value-name = UPPER *(UPPER / DIGIT / "_")
aggregate-type = optional-type
aggregate-type =/ array-type
aggregate-type =/ map-type
aggregate-type =/ union-type
aggregate-type =/ struct-type
optional-type = "optional" "<" type ">"
array-type = "[" [integer] "]" type
integer = 1*DIGIT
map-type = "map" "[" type "]" type
union-type = "(" union-members ")"
union-members = union-member / (union-members "|" union-member)
union-member = type ["=" integer]
struct-type = "{" fields "}"
fields = field / (fields field)
field = 1*ALPHA ":" type
UPPER = %x41-5A ; uppercase ASCII letters
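The lexical rules described before the grammar (whitespace separates tokens,
and "#" starts a comment that runs to the next LF) can be sketched as a small
Python tokenizer. The grammar does not define token boundaries, so the three
token classes used here (words, integers, and single punctuation characters)
are an assumption made for this example.

```python
import re

# Assumed token classes: a word, an integer, or one punctuation character.
TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9_]*|[0-9]+|[{}()\[\]<>|:=]")


def tokenize(source):
    """Split schema text into tokens, discarding whitespace and comments."""
    tokens = []
    for line in source.split("\n"):
        line = line.split("#", 1)[0]  # drop "#" and everything up to the LF
        pos = 0
        while pos < len(line):
            if line[pos].isspace():
                pos += 1
                continue
            match = TOKEN.match(line, pos)
            if match is None:
                raise ValueError(f"unexpected character {line[pos]!r}")
            tokens.append(match.group())
            pos = match.end()
    return tokens
```

For example, tokenize("type PublicKey data<128> # key") returns
["type", "PublicKey", "data", "<", "128", ">"].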
Here is a simple example schema using this schema language:
type PublicKey data<128>
type Time string # ISO 8601
enum Department {
ACCOUNTING
ADMINISTRATION
CUSTOMER_SERVICE
DEVELOPMENT
# Reserved for the CEO
JSMITH = 99
}
type Customer {
name: string
email: string
address: Address
orders: []{
orderId: i64
quantity: i32
}
metadata: map[string]data
}
type Employee {
name: string
email: string
address: Address
department: Department
hireDate: Time
publicKey: optional<PublicKey>
metadata: map[string]data
}
type Person (Customer | Employee)
type Address {
address: [4]string
city: string
state: string
country: string
}
The names of fields and user-defined types are informational: they are not
represented in BARE messages, but they may be used for code generation or to
provide meaningful names for readers of the schema.
The names of enum values are also informational. Values without an assigned
integer are assigned automatically in the order that they appear, starting
from zero and incrementing for each subsequent unassigned value. If an enum
value is explicitly specified, automatic assignment continues from that value
plus one for subsequent enum values.
Union type members are assigned a tag in the order that they appear, starting
from zero and incrementing for each subsequent type. If a tag value is
explicitly specified, automatic assignment continues from that value plus one
for subsequent values.
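Enum values and union tags follow the same assignment rule, which can be
sketched in Python as below. The function name assign_values and the use of
None to mean "no explicit value" are assumptions made for this example.

```python
def assign_values(members):
    """Assign integers to (name, explicit_or_None) pairs in order.

    Members without an explicit value take the next automatic value, which
    starts at zero and continues from the previous member's value plus one.
    """
    assigned = []
    next_value = 0
    for name, explicit in members:
        value = next_value if explicit is None else explicit
        assigned.append((name, value))
        next_value = value + 1
    return assigned


# The Department enum from the example schema.
department = assign_values([
    ("ACCOUNTING", None),
    ("ADMINISTRATION", None),
    ("CUSTOMER_SERVICE", None),
    ("DEVELOPMENT", None),
    ("JSMITH", 99),
])
# department == [("ACCOUNTING", 0), ("ADMINISTRATION", 1),
#                ("CUSTOMER_SERVICE", 2), ("DEVELOPMENT", 3), ("JSMITH", 99)]

# Union tags for the Person type from the example schema.
person = assign_values([("Customer", None), ("Employee", None)])
# person == [("Customer", 0), ("Employee", 1)]
```

This sketch does not detect duplicate values; that check belongs to the
schema invariants.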
COMPATIBILITY BETWEEN SCHEMA UPGRADES
This section is informative.
The recommended approach for message versioning is with the use of union types.
Adding new types to a union is backwards compatible with previous messages. For
example, the following schema provides several versions of a message:
type Message (MessageV1 | MessageV2 | MessageV3)
type MessageV1 {
...
}
type MessageV2 {
...
}
type MessageV3 {
...
}
An updated schema which added a MessageV4 would still be able to decode
versions 1, 2, and 3. However, you must make the decision to use versioning in
advance. Replacing a struct type with a union type that contains the same
struct is NOT backwards compatible.
If you later decide to deprecate MessageV1, you may remove it and specify the
initial tag explicitly:
type Message (MessageV2 = 1 | MessageV3)
type MessageV2 {
...
}
type MessageV3 {
...
}
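A decoder for such a versioned union reads the tag and dispatches on it. The
Python sketch below illustrates this for the schema with MessageV1 removed.
Because the contents of the message structs are not given here, the payload
decoders are stand-ins that return the undecoded remaining bytes; raw_payload,
decode_union, and the MESSAGE table are all hypothetical names.

```python
def decode_uint(buf, pos=0):
    """Decode a variable-length uint; return (value, next position)."""
    value = 0
    for shift in range(0, 70, 7):  # 64 bits need at most 10 octets
        if pos >= len(buf):
            raise ValueError("truncated uint")
        octet = buf[pos]
        pos += 1
        value |= (octet & 0x7F) << shift
        if octet < 0x80:
            if value >= 1 << 64:
                raise ValueError("uint exceeds 64 bits")
            return value, pos
    raise ValueError("uint longer than 10 octets")


def raw_payload(name):
    """Stand-in decoder: return the type name and the undecoded payload."""
    def decode(buf, pos):
        return (name, buf[pos:]), len(buf)
    return decode


# Tags for: type Message (MessageV2 = 1 | MessageV3)
MESSAGE = {
    1: raw_payload("MessageV2"),
    2: raw_payload("MessageV3"),
}


def decode_union(buf, decoders, pos=0):
    """Read the tag, then decode the value with the matching decoder."""
    tag, pos = decode_uint(buf, pos)
    decoder = decoders.get(tag)
    if decoder is None:
        raise ValueError(f"unsupported union tag {tag}")
    return decoder(buf, pos)
```

With this table, a message carrying tag 0 (the removed MessageV1) is rejected
with an error, while tags 1 and 2 keep the meaning they had before the
removal.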
SECURITY CONSIDERATIONS
Implementations must take care when decoding types with an unbounded length
(e.g. []int, map, data), as a malicious message can declare an excessive
length, exposing a naive implementation to denial-of-service attacks, failed
allocations, or other security faults.
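One possible mitigation is to validate every declared length before acting on
it. The Python sketch below decodes a variable-length data value and rejects
lengths that exceed either an application limit or the number of bytes
actually remaining. MAX_DATA_LENGTH is a hypothetical limit chosen for this
example; BARE itself does not define one.

```python
MAX_DATA_LENGTH = 1 << 20  # hypothetical application limit (1 MiB)


def decode_uint(buf, pos=0):
    """Decode a variable-length uint; return (value, next position)."""
    value = 0
    for shift in range(0, 70, 7):  # refuse encodings longer than 10 octets
        if pos >= len(buf):
            raise ValueError("truncated uint")
        octet = buf[pos]
        pos += 1
        value |= (octet & 0x7F) << shift
        if octet < 0x80:
            if value >= 1 << 64:
                raise ValueError("uint exceeds 64 bits")
            return value, pos
    raise ValueError("uint longer than 10 octets")


def decode_data(buf, pos=0, max_length=MAX_DATA_LENGTH):
    """Decode variable-length data, validating the declared length first."""
    length, pos = decode_uint(buf, pos)
    if length > max_length:
        raise ValueError("declared length exceeds the configured limit")
    if length > len(buf) - pos:
        raise ValueError("declared length exceeds the remaining input")
    return buf[pos:pos + length], pos + length
```

The same two checks apply to array and map counts. Since a count is measured
in elements rather than bytes, an implementation may also prefer to grow its
containers as elements are decoded instead of allocating from the declared
count.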