~poldi1405/lyml

b8ff5c2642bdbdae3cc7dfd02b16cff90b3026f4 — Moritz Poldrack 8 months ago
initial commit
2 files changed, 263 insertions(+), 0 deletions(-)

A README.md
A lyml.abnf
A  => README.md +231 -0
@@ 1,231 @@
# LYrical Markup Language

**Lyrical** *adjective* – having a pleasantly flowing quality suggestive of music

## Why yet another config format?

Just because. And maybe someone actually finds it useful.

## Core Ideas

LYML combines some ideas from various sources:

- maps onto a hashmap ([TOML](https://toml.io/en/))
- has types ([TOML](https://toml.io/en/))
- can be read "like a sentence" ([OpenBSD-like config](https://why-openbsd.rocks/fact/configuration-syntax/))
- can be written in blocks ([scfg](https://git.sr.ht/~emersion/scfg))

## Example

```
address "ta.rba.sh"

listen-for [
	protocol "http" on-port 80 with-tls disabled
	protocol "http" on-port 443 with-tls enabled and-advanced-options {
		cert-domains "ta.rba.sh" "rba.sh"
		from-ca "ZeroSSL"
	}
	protocol "plain" on-port 5
]

limit {
	filesize "256M"
	per-ip {
		upload "512M"
		download "5G"
	}
	tarball-files-to 10
	inactive-time-to "7d"
}
```

corresponds to this flattened JSON

```json
{
	"address":                                        "ta.rba.sh",
	"listen-for.0.protocol":                          "http",
	"listen-for.0.on-port":                           80,
	"listen-for.0.with-tls":                          false,
	"listen-for.1.protocol":                          "http",
	"listen-for.1.on-port":                           443,
	"listen-for.1.with-tls":                          true,
	"listen-for.1.and-advanced-options.cert-domains": [
		"ta.rba.sh",
		"rba.sh"
	],
	"listen-for.1.and-advanced-options.from-ca":      "ZeroSSL",
	"listen-for.2.protocol":                          "plain",
	"listen-for.2.on-port":                           5,
	"listen-for.2.with-tls":                          false,
	"limit.filesize":                                 "256M",
	"limit.per-ip.upload":                            "512M",
	"limit.per-ip.download":                          "5G",
	"limit.tarball-files-to":                         10,
	"limit.inactive-time-to":                         "7d"
}
```

or this Go-struct:

```go
type Config struct {
	// Untagged fields match by name; keys are case-insensitive.
	Address  string
	Listener []struct {
		Protocol string         `lyml:"protocol"`
		Port     int            `lyml:"on-port"`
		TLS      bool           `lyml:"with-tls"`
		Advanced map[string]any `lyml:"and-advanced-options"`
	} `lyml:"listen-for"`
	Limit struct {
		Filesize       string
		PerIP          map[string]string `lyml:"per-ip"`
		TarballFiles   int               `lyml:"tarball-files-to"`
		InactivePeriod string            `lyml:"inactive-time-to"`
	}
}
```

## Spec

LYML aims to be readable "like a text". For that purpose, special characters
have been reduced as much as possible.

Important:
- LYML paths are **not** case sensitive
- LYML documents **must** be encoded using UTF-8
- Whitespace means tab (U+0009) or space (U+0020)
- Newline means linefeed (U+000A)
	- Parsers *may* ignore a carriage return (U+000D) directly preceding the linefeed

### Keys

> Keys are case-insensitive!

Keys **must** start with an ASCII letter, which can be followed by an
arbitrary number of ASCII letters, digits, dashes, or underscores. Dashes and
underscores **must not** follow each other or stand at the end of a key. Keys
*should* be conducive to a readable configuration.

Regular Expression: [`[a-z]([a-z0-9]+|_[a-z0-9]+|-[a-z0-9]+)*`](https://regex101.com/r/UbsNO5/1)

```
key "value"
key-name "value"
key_name "value"
very-long_key-name_with-mixed_underscores-and_dashes "value"
```
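
The key rules above can be checked mechanically. The following is a minimal sketch, not part of any existing LYML library: `NormalizeKey` is a hypothetical helper that applies the regular expression from this section (with upper-case letters added, since keys are case-insensitive) and returns the lower-cased key.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// keyPattern is the regular expression from the spec, anchored to the whole
// input and extended with A-Z because keys are case-insensitive.
var keyPattern = regexp.MustCompile(`^[A-Za-z]([A-Za-z0-9]+|_[A-Za-z0-9]+|-[A-Za-z0-9]+)*$`)

// NormalizeKey (hypothetical name) reports whether key is well-formed and
// returns its lower-cased form.
func NormalizeKey(key string) (string, bool) {
	return strings.ToLower(key), keyPattern.MatchString(key)
}

func main() {
	for _, k := range []string{"key", "Key-Name", "key_name", "key-", "key-_name", "9lives"} {
		norm, ok := NormalizeKey(k)
		fmt.Printf("%q -> %q valid=%t\n", k, norm, ok)
	}
}
```

Matching against explicit `A-Z` ranges before lower-casing keeps non-ASCII letters from slipping through Unicode case folding.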

### Paths

Keys can be combined into paths using a period `.` to separate elements.

```
map { key "value" }

map2.key "another value"
```

```json
{
	"map.key":  "value",
	"map2.key": "another value"
}
```
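
To illustrate how block notation and path notation end up in the same flat hashmap, here is a small sketch. It assumes a parser has already produced nested `map[string]any` values; `Flatten` is a hypothetical helper, not an existing API.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Flatten (hypothetical helper) copies every leaf of a nested map into out,
// keyed by its lower-cased, period-separated path.
func Flatten(prefix string, in map[string]any, out map[string]any) {
	for k, v := range in {
		path := strings.ToLower(k)
		if prefix != "" {
			path = prefix + "." + path
		}
		if child, ok := v.(map[string]any); ok {
			Flatten(path, child, out)
			continue
		}
		out[path] = v
	}
}

func main() {
	// Hand-built equivalent of: map { key "value" }
	doc := map[string]any{"map": map[string]any{"key": "value"}}
	flat := map[string]any{}
	Flatten("", doc, flat)

	// map2.key "another value" is already written as a path.
	flat["map2.key"] = "another value"

	keys := make([]string, 0, len(flat))
	for k := range flat {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s = %v\n", k, flat[k])
	}
}
```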

### Values

Values are strings, floats, integers, booleans, arrays, or maps. Their type is
determined by their structure or content. Arrays **must not** mix types and
**must not** contain booleans.

- Strings
	- enclosed in quotation marks `"`
	- contained quotation marks are escaped `\"`
	- contained linebreaks are escaped `\n`
	- supports `\x__` for ASCII and `\u____` for Unicode characters
- Raw Strings
	- special case of strings
	- enclosed in triple single-quotes `'''`
- Floats
	- contain a period `.` separating the integer part from the fraction
- Integers
	- consist of nothing but digits
- Booleans
	- `true`, `yes`, `enable`, `enabled`
	- `false`, `no`, `disable`, `disabled`
- Arrays
	- can consist of strings, floats, integers, and maps

Regular Expression: [`("(\"|[^"\t\n])+"([ \t]+"[^"\t\n]+")*|\d+\.\d+([ \t]+\d+\.\d+)*|\d+([ \t]+\d+)*|(true|yes|enabled?|false|no|disabled?))`](https://regex101.com/r/BOs0nM/1)
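
Since a value's type follows from its shape, a classifier can be sketched in a few lines. This is an illustration under assumptions: tokens are already split, the names `Kind`, `Classify`, and `ArrayKind` are hypothetical, and the boolean keywords are the ones listed in `lyml.abnf`.

```go
package main

import (
	"fmt"
	"strings"
)

// Kind is a hypothetical name for the scalar types described above.
type Kind string

const (
	String  Kind = "string"
	Float   Kind = "float"
	Integer Kind = "integer"
	Boolean Kind = "boolean"
	Unknown Kind = "unknown"
)

// booleans holds the keywords from the ABNF grammar (assumption: that list
// is the authoritative one).
var booleans = map[string]bool{
	"true": true, "yes": true, "enable": true, "enabled": true,
	"false": false, "no": false, "disable": false, "disabled": false,
}

// allDigits reports whether s is a non-empty run of ASCII digits.
func allDigits(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		if r < '0' || r > '9' {
			return false
		}
	}
	return true
}

// Classify determines the type of a single scalar token from its shape.
func Classify(token string) Kind {
	switch {
	case len(token) >= 2 && strings.HasPrefix(token, `"`) && strings.HasSuffix(token, `"`):
		return String
	case len(token) >= 6 && strings.HasPrefix(token, "'''") && strings.HasSuffix(token, "'''"):
		return String // raw string
	case allDigits(token):
		return Integer
	}
	if _, ok := booleans[strings.ToLower(token)]; ok {
		return Boolean
	}
	if whole, frac, found := strings.Cut(token, "."); found && allDigits(whole) && allDigits(frac) {
		return Float
	}
	return Unknown
}

// ArrayKind returns the shared kind of two or more chained values. It rejects
// mixed kinds and booleans, as required for arrays.
func ArrayKind(tokens []string) (Kind, error) {
	if len(tokens) < 2 {
		return Unknown, fmt.Errorf("an array needs at least two values")
	}
	first := Classify(tokens[0])
	if first == Boolean || first == Unknown {
		return first, fmt.Errorf("%s values cannot form an array", first)
	}
	for _, t := range tokens[1:] {
		if k := Classify(t); k != first {
			return Unknown, fmt.Errorf("mixed array: %s and %s", first, k)
		}
	}
	return first, nil
}

func main() {
	for _, tok := range []string{`"ta.rba.sh"`, "443", "1.5", "Enabled", "1."} {
		fmt.Printf("%s is %s\n", tok, Classify(tok))
	}
	_, err := ArrayKind([]string{"enabled", "no"})
	fmt.Println("boolean array:", err)
}
```

Splitting a line into tokens (in particular, strings that contain whitespace) is deliberately left out of this sketch.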

#### Maps

Maps are created by enclosing the structures in curly braces `{…}`.

```
key1 {
	key "value" another-key 42
}

key2 {
	key "something"
	another-key 42
}
```
```json
{
	"key1.key":         "value",
	"key1.another-key": 42,
	"key2.key":         "something",
	"key2.another-key": 42
}
```

#### Arrays

> Depending on the parser, array keys may be encoded using path elements that
> do not conform to the key-format requirements. In that case, addressing these
> fields afterwards is not permitted.

Arrays are created by chaining values on one line or by enclosing structures in
`[…]` with one element per line.

```
array "value1" "value2"

map-array [
	map-key1 42 map-key2 "this is the second key"
	map-key1 69 map-key2 "this is the second array of the array"
]
```
```json
{
	"array":                ["value1", "value2"],
	"map-array.0.map-key1": 42,
	"map-array.0.map-key2": "this is the second key",
	"map-array.1.map-key1": 69,
	"map-array.1.map-key2": "this is the second array of the array"
}
```
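
The mapping shown above can be sketched as follows. `FlattenArray` is a hypothetical helper that assumes the elements were already parsed: arrays of maps are spread over numeric path elements, while arrays of scalars stay a single value. Maps nested inside elements are not descended into, to keep the example short.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// FlattenArray (hypothetical helper) stores an array under path. Arrays of
// maps become "path.<index>.<key>" entries; arrays of scalars are kept whole.
func FlattenArray(path string, elems []any, out map[string]any) {
	for i, e := range elems {
		m, ok := e.(map[string]any)
		if !ok {
			out[path] = elems
			return
		}
		for k, v := range m {
			out[path+"."+strconv.Itoa(i)+"."+k] = v
		}
	}
}

func main() {
	out := map[string]any{}
	FlattenArray("array", []any{"value1", "value2"}, out)
	FlattenArray("map-array", []any{
		map[string]any{"map-key1": 42, "map-key2": "this is the second key"},
		map[string]any{"map-key1": 69, "map-key2": "this is the second array of the array"},
	}, out)

	keys := make([]string, 0, len(out))
	for k := range out {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s = %v\n", k, out[k])
	}
}
```

The numeric path elements produced here are exactly the kind that do not conform to the key format, which is why the note above forbids addressing them later.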

### Why no…

#### …boolean arrays?

Because they don't exactly lend themselves to "readable" configurations:

```
tls enabled on no disable yes
```

#### …time type?

Times are stored as either a string or an integer. Parsing them is left to
libraries.
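
As an illustration of what such library-side parsing might look like, the sketch below turns a string like `"7d"` into a `time.Duration`. The `d` suffix is an assumption taken from the example configuration, not something LYML defines, and `ParseDays` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// ParseDays (hypothetical helper) converts a string such as "7d" into a
// duration. The "d" unit is an application-level convention, not part of LYML.
func ParseDays(s string) (time.Duration, error) {
	if !strings.HasSuffix(s, "d") {
		return 0, fmt.Errorf("unsupported duration %q", s)
	}
	days, err := strconv.Atoi(strings.TrimSuffix(s, "d"))
	if err != nil {
		return 0, fmt.Errorf("invalid day count in %q: %w", s, err)
	}
	return time.Duration(days) * 24 * time.Hour, nil
}

func main() {
	d, err := ParseDays("7d")
	fmt.Println(d, err)
}
```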

#### …numeric keys?

Numeric keys are not readable.

A  => lyml.abnf +32 -0
@@ 1,32 @@
document = *1( *( directive EOL ) directive )

directive = *WHITESPACE 1*( *( keyvalues WHITESPACE) keyvalues / block ) *1comment

path = *( key "." ) key
key = ASCIINONUM *( ASCII / "_" ASCII / "-" ASCII )

keyvalues = key 1*WHITESPACE ( *( string WHITESPACE ) string / *( float WHITESPACE ) float / *( integer WHITESPACE ) integer / boolean / "{" multiline-values "}" )
block = key 1*WHITESPACE "{" multiline-values "}"
multiline-values = *( *( EOL / WHITESPACE ) keyvalues ) *( EOL / WHITESPACE )
comment = "#" *UNICODE

string = basic-string / literal-string
basic-string = DQUOTE *( UNICODE / escape-sequence ) DQUOTE
literal-string = 3LQUOTE *UNICODE 3LQUOTE
float = 1*DIGIT "." 1*DIGIT
integer = 1*DIGIT
boolean = %i"true" / %i"yes" / %i"enable" / %i"enabled" / %i"false" / %i"no" / %i"disable" / %i"disabled"

escape-sequence = "\" ESCAPECHAR

ASCII = DIGIT / ASCIINONUM
ASCIINONUM = %x41-5A / %x61-7A ; A-Z / a-z
DIGIT = "0" / "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9"
DQUOTE = %x22
EOL = %x0A / %x0D %x0A ; LF or CR LF
ESCAPECHAR = "\" / "t" / "n" / %x22 / "x" 2HEX / "u" 4HEX
HEX = DIGIT / %i"A" / %i"B" / %i"C" / %i"D" / %i"E" / %i"F"
LQUOTE = %x27
UNICODE = %x20-21 / %x23-5B / %x5D-7E / %x80-10FFFF
              ; not "     not \     not DEL
WHITESPACE = %x20 / %x09
\ No newline at end of file