~kyllingene/rpkg



#rpkg

rpkg is a software packaging solution, written entirely in Rust. It uses bsdiff delta compression and zlib to provide efficient and powerful distribution of updates, while also supporting the traditional "ship a directory" method.

rpkg is built around two concepts: "bundles" and "boxes". Bundles are a more traditional form of package: a compressed tar archive, with minimal metadata prepended.

Boxes are more unusual: they're a compressed series of diffs between versions of a package. One could ship their entire software history as a bundle and a box, where the bundle holds the base version and the box provides diffs between each consecutive version.

Currently, it is shipped as a single crate, meaning library users are stuck with the dependencies of the CLI; this will be fixed with the first proper release.

#Justification

(The following tests were performed via rpkg-utils.)

Boxing ends up saving a lot of space. I downloaded the latest 30 versions of syn (2.0.67 through 2.0.96) and bundled each of them, then created a box with diffs between each consecutive version.

The .tar.gz-compressed files for all of the versions total 7.9M. The box itself takes up ~362K. To be useful it also requires a "base" version package, syn 2.0.67, which is only 517K. Together that's ~879K instead of 7.9M: roughly a 9x reduction.
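
The arithmetic behind that figure, as a small sketch. The constants are the sizes quoted above; reading 7.9M as 7900K is an assumption, since the text doesn't say whether M means 1000K or 1024K (either way the factor comes out at about 9).

```rust
// Sizes quoted in the text, in K. Treating 7.9M as 7900K is an assumption.
const ALL_BUNDLES_K: f64 = 7900.0;
const BASE_BUNDLE_K: f64 = 517.0;
const BOX_K: f64 = 362.0;

/// Size of the base bundle plus the box.
fn boxed_total_k() -> f64 {
    BASE_BUNDLE_K + BOX_K
}

/// How many times smaller the boxed form is than the separate bundles.
fn reduction_factor() -> f64 {
    ALL_BUNDLES_K / boxed_total_k()
}
```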

The natural concern is that this space saving comes at the cost of slow access. For example, to reach the latest version of syn this way, one has to apply every diff between the base and the latest version. Thankfully, this isn't as bad as it sounds; on the older laptop I tested on, this took ~0.75 seconds. With more intelligent boxing strategies (e.g. providing "bridge" diffs between minor/major versions), you could decrease that time, and mitigate any added inefficiencies from things like large file sizes or many versions.
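
To make the "bridge" idea concrete, here is a rough sketch (not part of rpkg) that counts how many diffs must be applied to get from one version to another. Versions are represented by their index in the history, and `fewest_diffs` is a hypothetical helper that does a breadth-first search over the available (source, target) diffs.

```rust
use std::collections::VecDeque;

/// Fewest diffs that must be applied to get from version index `from`
/// to version index `to`, or None if no chain of diffs connects them.
/// `diffs` lists the available (source, target) pairs; `versions` is the
/// total number of versions. Hypothetical helper, not rpkg's API.
fn fewest_diffs(
    diffs: &[(usize, usize)],
    versions: usize,
    from: usize,
    to: usize,
) -> Option<usize> {
    if from >= versions || to >= versions {
        return None;
    }
    let mut dist: Vec<Option<usize>> = vec![None; versions];
    dist[from] = Some(0);
    let mut queue = VecDeque::new();
    queue.push_back(from);
    // Plain breadth-first search: the first time we reach `to`,
    // we have used the smallest possible number of diffs.
    while let Some(v) = queue.pop_front() {
        let steps = dist[v]?;
        if v == to {
            return Some(steps);
        }
        for &(src, dst) in diffs {
            if src == v && dst < versions && dist[dst].is_none() {
                dist[dst] = Some(steps + 1);
                queue.push_back(dst);
            }
        }
    }
    None
}
```

With 30 versions and only consecutive diffs, reaching the last version from the first takes 29 diffs. Adding two bridge diffs (0 to 10 and 10 to 20) brings that down to 11.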

The use-case I originally had in mind was for a package manager, like Cargo, deduplicating dependencies. Instead of having several versions of the same package downloaded separately, one could use this boxing strategy. It could (potentially) also serve as the basis for a VCS, though I've not looked into this possibility at all.

Migrating existing software to rpkg is trivial: simply add an rpkg.toml file with a few lines of mostly-static information (see below). The files themselves are self-contained and easy to host. The format is simple and flexible. Only the bare minimum of restrictions are applied (e.g. to the version field). In short, rpkg could hopefully be used in any number of applications.

#Terminology

The following terminology will be used throughout the documentation and source code of rpkg, so it's worth understanding.

| Term | Usage |
|------|-------|
| Package | The full source code of a piece of software |
| Bundle | A compressed package |
| Box | A compressed diff (or series of diffs) between two versions of a package |
| Manifest | The full TOML metadata of a package, i.e. rpkg.toml |
| Slug | The stripped-down metadata shipped with a bundle |
| Digest | The stripped-down metadata shipped with a box |

#Package layout

An rpkg package can be any directory with a manifest (rpkg.toml) in the root. The manifest contains the following information:

| Field | Required | Description |
|-------|----------|-------------|
| name | true | The name of the package. |
| owner | true | The owner of the package. |
| version | true | The current version of the package. |
| license | false | The license(s) of the package. |
| homepage | false | The homepage of the package. |
| repository | false | The source repository (usually git) of the package. |

Any fields not listed above are ignored, but they are left in place rather than removed.

#Field restrictions

The name field must be non-empty, and consist only of:

  • Letters (a-z and A-Z)
  • Numbers (0-9)
  • Dashes (-)
  • Underscores (_)

The version field must be non-empty, and consist only of:

  • Numbers (0-9)
  • Periods (.)
  • Dashes (-)
  • Underscores (_)
  • Letters (a-z and A-Z)
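
The two rules above can be sketched as a pair of checks. The function names are hypothetical and this is not rpkg's actual validation code, just a minimal reading of the lists.

```rust
/// True if `name` is non-empty and uses only ASCII letters, digits,
/// dashes and underscores. Hypothetical helper, not rpkg's API.
fn is_valid_name(name: &str) -> bool {
    !name.is_empty()
        && name
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_')
}

/// True if `version` is non-empty and uses only ASCII letters, digits,
/// periods, dashes and underscores. Hypothetical helper, not rpkg's API.
fn is_valid_version(version: &str) -> bool {
    !version.is_empty()
        && version
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '.' || c == '-' || c == '_')
}
```

The only difference between the two is that versions may also contain periods, so `2.0.96` is a valid version but not a valid name.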

#Bundle layout

A bundle is an ordinary gzip file, ideally with the file extension .rpkg. The data starts with the slug, followed by the package's tar archive. The slug is laid out as follows:

  • The ASCII bytes rpkg
  • The total length of the slug (64-bit, little-endian, unsigned)
  • The name of the package, followed by a single newline
  • The version of the package, followed by a single newline
  • The owner of the package
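
Here is a minimal sketch of writing and reading that layout. It is not rpkg's implementation: the `Slug` struct and both function names are hypothetical, and it assumes the length field counts only the bytes that follow it (name, version, owner and the two newlines), which the list above doesn't spell out.

```rust
/// The metadata carried at the front of a bundle (hypothetical type).
#[derive(Debug, PartialEq)]
struct Slug {
    name: String,
    version: String,
    owner: String,
}

/// Serialize a slug: magic, body length (u64, little-endian), body.
/// Assumption: the length covers only the bytes after the length field.
fn encode_slug(slug: &Slug) -> Vec<u8> {
    let body = format!("{}\n{}\n{}", slug.name, slug.version, slug.owner);
    let mut out = Vec::with_capacity(12 + body.len());
    out.extend_from_slice(b"rpkg");
    out.extend_from_slice(&(body.len() as u64).to_le_bytes());
    out.extend_from_slice(body.as_bytes());
    out
}

/// Parse a slug from the front of `data`, returning it together with
/// the bytes that follow it. Returns None on malformed input.
fn parse_slug(data: &[u8]) -> Option<(Slug, &[u8])> {
    if !data.starts_with(b"rpkg") {
        return None;
    }
    let mut len_bytes = [0u8; 8];
    len_bytes.copy_from_slice(data.get(4..12)?);
    let len = u64::from_le_bytes(len_bytes);
    if len > (data.len() - 12) as u64 {
        return None; // claimed length runs past the end of the data
    }
    let end = 12 + len as usize;
    let body = std::str::from_utf8(&data[12..end]).ok()?;
    // Name and version end at the first two newlines; the owner is the rest.
    let mut parts = body.splitn(3, '\n');
    let slug = Slug {
        name: parts.next()?.to_string(),
        version: parts.next()?.to_string(),
        owner: parts.next()?.to_string(),
    };
    Some((slug, &data[end..]))
}
```

Because the owner is the last field and has no terminator, the length prefix is what tells a reader where the slug stops and the rest of the data begins.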

#Box layout

A box is an ordinary gzip file with the file extension .rbox. The data is divided into three sections. The first is a 64-bit little-endian unsigned integer giving the length of the second section. The second is a stripped-down version of the manifest (called a digest), which carries all the information required to use the box. The third is the sequence of diffs. The digest's contents are as follows:

  • The ASCII bytes rbox
  • The total length of the digest (64-bit, little-endian, unsigned)
  • The name of the package, followed by a single newline
  • The length of the owner of the package (64-bit, little-endian, unsigned)
  • The owner of the package
  • The number of versions in the box (64-bit, little-endian, unsigned)
  • The version pairs contained (in order), separated by single newlines, each with the format:
    • Source version
    • A single space
    • Target version
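
A rough sketch of serializing a digest in the order the list gives. It is not rpkg's code: the `Digest` struct and `encode_digest` are hypothetical. It also makes two assumptions. First, the total length counts only the bytes after the length field. Second, the magic bytes come first, exactly as listed, even though the surrounding prose describes a length prefix at the very start of the data.

```rust
/// The metadata carried at the front of a box (hypothetical type).
struct Digest {
    name: String,
    owner: String,
    /// (source version, target version) pairs, in order.
    versions: Vec<(String, String)>,
}

/// Serialize a digest field by field, in the order of the list above.
/// Assumptions: magic first, and the total length covers only the
/// bytes that follow the length field.
fn encode_digest(digest: &Digest) -> Vec<u8> {
    let mut body = Vec::new();
    body.extend_from_slice(digest.name.as_bytes());
    body.push(b'\n');
    body.extend_from_slice(&(digest.owner.len() as u64).to_le_bytes());
    body.extend_from_slice(digest.owner.as_bytes());
    body.extend_from_slice(&(digest.versions.len() as u64).to_le_bytes());
    // One "source target" line per diff, joined by single newlines.
    let lines: Vec<String> = digest
        .versions
        .iter()
        .map(|(source, target)| format!("{} {}", source, target))
        .collect();
    body.extend_from_slice(lines.join("\n").as_bytes());

    let mut out = Vec::with_capacity(12 + body.len());
    out.extend_from_slice(b"rbox");
    out.extend_from_slice(&(body.len() as u64).to_le_bytes());
    out.extend_from_slice(&body);
    out
}
```

Unlike the slug, the owner here is length-prefixed rather than placed last, since more fields follow it.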

The remaining data is a sequence of diffs, each preceded by a 64-bit little-endian unsigned integer describing the length of the diff in bytes. Here's an example of a 2-diff box (N being the end of the digest, and M being the end of the first diff):

| Byte range | Description |
|------------|-------------|
| 0 to 7 | Digest length |
| 8 to N | Digest |
| N+1 to N+8 | Diff length |
| N+9 to M | Diff |
| M+1 to M+8 | Diff length |
| M+9 to end | Diff |
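
Reading the diff section is a matter of walking the length prefixes. The sketch below is not rpkg's code: `split_diffs` is a hypothetical helper that takes only the bytes after the digest and returns each diff as a slice.

```rust
/// Split the diff section of a box (everything after the digest) into
/// individual diffs. Each diff is preceded by its length as a 64-bit
/// little-endian unsigned integer. Returns None if the data is truncated.
/// Hypothetical helper, not rpkg's API.
fn split_diffs(mut data: &[u8]) -> Option<Vec<&[u8]>> {
    let mut diffs = Vec::new();
    while !data.is_empty() {
        if data.len() < 8 {
            return None; // not enough bytes left for a length prefix
        }
        let (len_bytes, rest) = data.split_at(8);
        let mut buf = [0u8; 8];
        buf.copy_from_slice(len_bytes);
        let len = u64::from_le_bytes(buf);
        if len > rest.len() as u64 {
            return None; // the diff claims more bytes than remain
        }
        let (diff, rest) = rest.split_at(len as usize);
        diffs.push(diff);
        data = rest;
    }
    Some(diffs)
}
```

Each returned slice is still in bsdiff format, so it has to be handed to a bsdiff patcher before it is useful.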

#Diff layout

The diffs are in bsdiff format, generated by the bsdiff Rust crate. They are the diff between two tar archives of the package.

#License

This software is licensed under the MIT license OR the Apache 2.0 license, at your choice. It makes use of the bsdiff crate, which is licensed under the 2-clause BSD license, located in this repository at LICENSE-BSDIFF.
