~quf/babel-poc

proof of concept for i18n in Rust with compile-time checks

#babel

This is a proof of concept for multi-language text (where the language is chosen at runtime) with compile-time checks:

  • If a certain piece of text (or language) is missing or duplicated, this is a compile-time error.
  • If a certain piece of text (or language) is available but never used, this is a compile-time warning.

The complete text in all languages is embedded in the binary.

#How does it work?

The folder "i18n" contains several csv files with four columns: a language identifier, a text identifier, the text, and (optionally) a comment.

build.rs reads and analyzes these files and generates code based on them:

The language identifiers across all csv files are collected, and an enum Language with one variant for each language is created. For example, if the languages En, Ja, De, and Fr are present, build.rs will create the enum:

enum Language {
    De,
    En,
    Fr,
    Ja,
}

Similarly, the text identifiers are collected and turned into an enum Text with one variant per (unique) identifier.
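The enum generation described above could be sketched roughly as follows. This is not the repository's actual build.rs; the function name render_enum is hypothetical, and the sketch only shows the idea of deduplicating identifiers and emitting one variant per identifier. A real build script would typically write the resulting string to a file under OUT_DIR and pull it into the crate with include!.

```rust
use std::collections::BTreeSet;

/// Render the source of a Rust enum with one variant per unique identifier.
/// (Hypothetical helper, not taken from the repository.)
/// A BTreeSet both deduplicates and sorts the identifiers, which gives the
/// alphabetical variant order seen in the Language example.
fn render_enum(name: &str, identifiers: &[&str]) -> String {
    let unique: BTreeSet<&str> = identifiers.iter().copied().collect();
    let mut out = format!("enum {name} {{\n");
    for variant in unique {
        out.push_str(&format!("    {variant},\n"));
    }
    out.push_str("}\n");
    out
}
```

Calling render_enum("Language", &["En", "Ja", "De", "Fr", "En"]) yields the source text of the Language enum shown earlier, with the duplicate En collapsed into a single variant.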

The text for each language and identifier is collected and compressed, and included in the binary. The offset of each individual text within the whole is also compressed and included in the binary.

Comments are not used and not included in the binary. They are intended to provide space for clarifications, TODO markers, etc.

At runtime, the text and all offsets are decompressed once at startup. Then, the Language and TextId enums can be used to look up the location of the corresponding text in the collection.
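One way such an offset-based lookup could work is sketched below. The layout is an assumption made for illustration: all texts are concatenated into one string in language-major order, and an offset table marks where each text starts and ends. Compression is left out entirely, and the Dictionary type, the TEXT_COUNT constant, and the enum variants are hypothetical.

```rust
#[derive(Clone, Copy)]
enum Language {
    De,
    En,
}

#[derive(Clone, Copy)]
enum TextId {
    Greeting,
    Farewell,
}

/// Number of TextId variants (hypothetical; generated code could emit this).
const TEXT_COUNT: usize = 2;

/// All texts concatenated into one blob, plus the byte offset of each text.
struct Dictionary {
    blob: String,
    offsets: Vec<usize>,
}

impl Dictionary {
    /// `texts` must be in language-major order: every text of the first
    /// language, then every text of the second language, and so on.
    fn new(texts: &[&str]) -> Self {
        let mut blob = String::new();
        let mut offsets = vec![0];
        for text in texts {
            blob.push_str(text);
            // Each entry is the end of one text and the start of the next.
            offsets.push(blob.len());
        }
        Dictionary { blob, offsets }
    }

    /// Look up a text by computing its index from the two enum discriminants.
    fn get(&self, language: Language, text: TextId) -> &str {
        let index = language as usize * TEXT_COUNT + text as usize;
        &self.blob[self.offsets[index]..self.offsets[index + 1]]
    }
}
```

Because the enums are plain C-like enums, casting them with `as usize` gives a dense index, so a lookup is two array reads and a slice with no hashing involved.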

#String interpolation

Because static strings are not enough for internationalization, some kind of runtime string interpolation is needed. Here, an extremely basic version is implemented.
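As an illustration of what a very basic runtime interpolation might look like, here is a minimal sketch. The {key} placeholder syntax and the interpolate function are assumptions made for this example and may not match what the repository actually does.

```rust
use std::collections::HashMap;

/// Replace every `{key}` placeholder in `template` with its value.
/// (Hypothetical helper; the `{key}` syntax is an assumption.)
/// Returns an error if a placeholder is unclosed or has no value.
fn interpolate(template: &str, values: &HashMap<&str, &str>) -> Result<String, String> {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while let Some(start) = rest.find('{') {
        // Copy the literal text before the placeholder.
        out.push_str(&rest[..start]);
        let after = &rest[start + 1..];
        let end = after
            .find('}')
            .ok_or_else(|| "unclosed placeholder".to_string())?;
        let key = &after[..end];
        let value = values
            .get(key)
            .ok_or_else(|| format!("missing value for key `{key}`"))?;
        out.push_str(value);
        rest = &after[end + 1..];
    }
    // Copy whatever literal text follows the last placeholder.
    out.push_str(rest);
    Ok(out)
}
```

With a map containing "name" => "Marvin", the template "Hallo {name}." becomes "Hallo Marvin."; a missing key is reported as an error at runtime rather than at compile time.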

#How to use

The program is extremely basic: it prints the current date, asks for your name, and then says hello. Choose the language by passing it as a command-line argument.

$ cargo run ger
Heute ist der 16. 5. 2023.

Wie heißt du?
> Marvin
Hallo Marvin.

$ cargo run 日本語
今日は2023年5月16日です。

名前はなんですか。
> Zaphod
Zaphod、こんにちは。

$ cargo run en
It's 5/16/2023.

What's your name?
> Arthur
Hello Arthur.

#Limitations

For production use, language-specific formatting for dates and numbers (e.g. decimal separators) would be required. This could be done at a higher level by first formatting the date and then interpolating the formatted date into the string. Here, it's done by specifying the date order in the .csv, but I'm not sure this approach would scale to bigger projects.

Runtime text interpolation (the user's name) is only checked at runtime. It would be great to verify at compile time that the value to be interpolated is guaranteed to be present. Unfortunately, I'm not sure how to do that. My best idea is a macro fmt!(dictionary, language, text_id, "key1"=value1, "key2"=value2, "key3"=value3, ...) that reads the csvs and checks that every key which appears in the text text_id for any language also has a corresponding "key"=value argument. However, this would only work if the text_id is written out literally at the call site, so that it is known at compile time (which may be a reasonable assumption). (Checking that every "key"=value argument also appears in the text for every language is probably a bad idea, but checking that every "key"=value argument appears in some language may be reasonable.)
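The check such a macro would have to perform can be sketched as an ordinary function. This is only the checking logic, not a macro, and it rests on assumptions: placeholders are written as {key}, and the names placeholder_keys and check_keys are hypothetical. A procedural macro could run the same logic at compile time over the csv contents.

```rust
use std::collections::BTreeSet;

/// Collect all `{key}` placeholder names that occur in one template.
/// (Hypothetical helper; the `{key}` syntax is an assumption.)
fn placeholder_keys(template: &str) -> BTreeSet<&str> {
    let mut keys = BTreeSet::new();
    let mut rest = template;
    while let Some(start) = rest.find('{') {
        let after = &rest[start + 1..];
        match after.find('}') {
            Some(end) => {
                keys.insert(&after[..end]);
                rest = &after[end + 1..];
            }
            // An unclosed brace ends the scan; this sketch ignores it.
            None => break,
        }
    }
    keys
}

/// `translations` holds the text for one text id in every language.
/// Succeeds only if every key used in any language is supplied, and
/// every supplied key is used in at least one language.
fn check_keys(translations: &[&str], supplied: &[&str]) -> Result<(), String> {
    let used: BTreeSet<&str> = translations
        .iter()
        .copied()
        .flat_map(|t| placeholder_keys(t))
        .collect();
    let supplied: BTreeSet<&str> = supplied.iter().copied().collect();
    if let Some(missing) = used.difference(&supplied).next() {
        return Err(format!("no value supplied for key `{missing}`"));
    }
    if let Some(extra) = supplied.difference(&used).next() {
        return Err(format!("key `{extra}` does not appear in any language"));
    }
    Ok(())
}
```

Taking the union of keys across all languages matches the two rules discussed above: a key needed by any language must be supplied, while a supplied key only needs to appear in some language.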