rpkg is a software packaging solution, written entirely in Rust. It uses bsdiff delta compression and zlib to provide efficient and powerful distribution of updates, while also supporting the traditional "ship a directory" method.
rpkg works on the theory of "bundles" and "boxes". Bundles are a more traditional form of package: a compressed tar archive, with minimal metadata prepended.
Boxes are more unusual: each is a compressed series of diffs between versions of a package. One could ship an entire release history as one bundle plus one box, where the box provides the diffs between consecutive versions.
Currently, it is shipped as a single crate, meaning library users are stuck with the dependencies of the CLI; this will be fixed with the first proper release.
(The following tests were performed via rpkg-utils.)
This ends up saving a lot of space. I downloaded the latest 30 versions of syn
(2.0.67 through 2.0.96) and bundled each of them, then created a box with diffs
between each consecutive version.
The .tar.gz-compressed files for all of the versions total 7.9M. The box
itself takes up ~362K. To be useful it also requires a "base" version package,
syn 2.0.67, which is only 517K. That's only ~879K, as opposed to 7.9M:
almost a 9x improvement.
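The arithmetic behind that figure can be checked with a short sketch. The function name is hypothetical, and treating 7.9M as 7900K is an assumption about the units:

```rust
/// Ratio between shipping every version separately and shipping
/// one base bundle plus one box. All sizes are in the same unit (here, K).
fn savings_ratio(separate_total: f64, box_size: f64, base_size: f64) -> f64 {
    separate_total / (box_size + base_size)
}
```

With the sizes quoted above, `savings_ratio(7900.0, 362.0, 517.0)` divides 7900 by 879, which comes out just under 9.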
The natural concern is that the cost of this space saving is slow access times.
For example, to access the latest version of syn this way, one has to go
through all the versions between the base and latest. Thankfully, this isn't as
bad as it sounds; on the older laptop I tested on, this took ~0.75 seconds.
With more intelligent boxing strategies (e.g. providing "bridge" diffs between
minor/major versions), you could decrease that time, and mitigate any added
inefficiencies from things like large file sizes or many versions.
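To make the access cost concrete, here is a minimal sketch of walking a chain of diffs from the base to a later version. It is not rpkg's actual code: `rebuild` and `toy_apply` are hypothetical names, and `toy_apply` is a stand-in that simply appends bytes, where a real implementation would call a bsdiff patch routine.

```rust
/// Rebuilds a version by applying each diff in order, starting from the base archive.
/// `apply` stands in for a real patch routine (hypothetical signature).
fn rebuild<F>(base: Vec<u8>, diffs: &[Vec<u8>], apply: F) -> Result<Vec<u8>, String>
where
    F: Fn(&[u8], &[u8]) -> Result<Vec<u8>, String>,
{
    // Each step consumes the previous archive and produces the next one,
    // so the cost grows with the number of diffs between base and target.
    diffs
        .iter()
        .try_fold(base, |current, diff| apply(current.as_slice(), diff.as_slice()))
}

/// Toy patch for demonstration only: the "diff" is appended to the old data.
fn toy_apply(old: &[u8], diff: &[u8]) -> Result<Vec<u8>, String> {
    let mut new = old.to_vec();
    new.extend_from_slice(diff);
    Ok(new)
}
```

For the 30 versions of syn above, the slice would hold 29 diffs. A "bridge" diff would amount to a shorter slice covering the same distance.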
The use-case I originally had in mind was for a package manager, like Cargo, deduplicating dependencies. Instead of having several versions of the same package downloaded separately, one could use this boxing strategy. It could (potentially) also serve as the basis for a VCS, though I've not looked into this possibility at all.
Migrating existing software to rpkg is trivial: simply add an rpkg.toml file
with a few lines of mostly-static information (see below). The files themselves
are self-contained and easy to host. The format is simple and flexible. Only the
bare minimum of restrictions are applied (e.g. to the version field). In short,
rpkg could hopefully be used in any number of applications.
The following terminology will be used throughout the documentation and source code of rpkg, so it's worth understanding.
Term | Usage |
---|---|
Package | The full source code of a piece of software |
Bundle | A compressed package |
Box | A compressed diff (or series of diffs) between two versions of a package |
Manifest | The full TOML metadata of a package, i.e. rpkg.toml |
Slug | The stripped-down metadata shipped with a bundle |
Digest | The stripped-down metadata shipped with a box |
An rpkg package can be any directory with a manifest (rpkg.toml) in the root.
The manifest contains the following information:
Field | Required | Description |
---|---|---|
name | true | The name of the package. |
owner | true | The owner of the package. |
version | true | The current version of the package. |
license | false | The license(s) of the package. |
homepage | false | The homepage of the package. |
repository | false | The source repository (usually git) of the package. |
Any unexpected data in the manifest is ignored, but not removed.
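As a rough illustration of the manifest fields, the sketch below models them as a Rust struct. It is not rpkg's real type. The `Manifest` struct, `from_pairs`, and the `extra` map are all hypothetical, and I'm assuming that "not removed" means unknown keys are kept alongside the known ones. The input is a list of plain key/value pairs rather than real TOML, to keep the example dependency-free.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory form of rpkg.toml; field names follow the manifest table.
#[derive(Debug, Clone, PartialEq)]
struct Manifest {
    name: String,
    owner: String,
    version: String,
    license: Option<String>,
    homepage: Option<String>,
    repository: Option<String>,
    /// Keys that are not recognised: ignored, but kept (an assumption).
    extra: BTreeMap<String, String>,
}

/// Removes a required key from the map, or reports which one is missing.
fn take_required(map: &mut BTreeMap<String, String>, key: &str) -> Result<String, String> {
    map.remove(key)
        .ok_or_else(|| format!("missing required field: {}", key))
}

impl Manifest {
    /// Builds a manifest from loose key/value pairs.
    fn from_pairs(pairs: &[(&str, &str)]) -> Result<Manifest, String> {
        let mut map: BTreeMap<String, String> = pairs
            .iter()
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect();
        Ok(Manifest {
            name: take_required(&mut map, "name")?,
            owner: take_required(&mut map, "owner")?,
            version: take_required(&mut map, "version")?,
            license: map.remove("license"),
            homepage: map.remove("homepage"),
            repository: map.remove("repository"),
            // Whatever is left over was not expected.
            extra: map,
        })
    }
}
```

Leaving out any of name, owner, or version yields an error, while the three optional fields fall back to `None`.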
The name field must be non-empty, and consist only of letters (a-z and A-Z), digits (0-9), hyphens (-), and underscores (_).
The version field must be non-empty, and consist only of digits (0-9), periods (.), hyphens (-), underscores (_), and letters (a-z and A-Z).
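The restrictions on the name and version fields are simple enough to check by hand. Here is a minimal sketch. The function names are hypothetical, and this is not necessarily how rpkg itself validates them:

```rust
/// Name: non-empty, and only ASCII letters, digits, '-' and '_'.
fn is_valid_name(name: &str) -> bool {
    !name.is_empty()
        && name
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_')
}

/// Version: the same character set as the name, plus '.'.
fn is_valid_version(version: &str) -> bool {
    !version.is_empty()
        && version
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '.' || c == '-' || c == '_')
}
```

Under these rules, `rpkg-utils` is a valid name and `2.0.96` a valid version, but a name containing a period or a space is rejected.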
A bundle is an ordinary GZip file, ideally with the file extension .rpkg. The
data is split into the following sections:
rpkg
A box is an ordinary GZip file with the file extension .rbox. The data is
divided into three sections. First is a 64-bit little-endian unsigned integer
giving the length of the following section. Second is a stripped-down version of
the manifest (called a digest), giving all the information required to use the
box. The digest's contents are as follows:
rbox
The remaining data is a sequence of diffs, each preceded by a 64-bit little-endian unsigned integer describing the length of the diff in bytes. Here's an example of a 2-diff box (N being the end of the digest, and M being the end of the first diff):
Byte range | Description |
---|---|
0-7 | Digest length |
8-N | Digest |
N+1-N+8 | Diff length |
N+9-M | Diff |
M+1-M+8 | Diff length |
M+9-end | Diff |
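The layout above can be read with a few lines of Rust. This is a sketch under stated assumptions, not rpkg's own parser. It works on the already-decompressed bytes (GZip decoding is left out), treats the digest and diffs as opaque byte slices, and uses hypothetical names (`RawBox`, `read_chunk`, `parse_box`).

```rust
/// A decompressed box, split into its digest and its diffs (all still raw bytes).
struct RawBox<'a> {
    digest: &'a [u8],
    diffs: Vec<&'a [u8]>,
}

/// Reads one length-prefixed chunk: a 64-bit little-endian length, then that many bytes.
fn read_chunk<'a>(data: &'a [u8], pos: &mut usize) -> Result<&'a [u8], String> {
    let start = *pos;
    if data.len().saturating_sub(start) < 8 {
        return Err("truncated length prefix".to_string());
    }
    let mut len_bytes = [0u8; 8];
    len_bytes.copy_from_slice(&data[start..start + 8]);
    let len = u64::from_le_bytes(len_bytes);

    let body_start = start + 8;
    let remaining = (data.len() - body_start) as u64;
    if len > remaining {
        return Err("chunk length exceeds remaining data".to_string());
    }
    let body_end = body_start + len as usize;
    *pos = body_end;
    Ok(&data[body_start..body_end])
}

/// Splits the data into the digest followed by zero or more diffs.
fn parse_box(data: &[u8]) -> Result<RawBox<'_>, String> {
    let mut pos = 0;
    let digest = read_chunk(data, &mut pos)?;
    let mut diffs = Vec::new();
    while pos < data.len() {
        diffs.push(read_chunk(data, &mut pos)?);
    }
    Ok(RawBox { digest, diffs })
}
```

Because the digest and each diff share the same length-prefixed framing, one helper covers both. A box that is cut short anywhere produces an error instead of a panic.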
The diffs are in bsdiff format, generated by the bsdiff Rust crate. Each is
the diff between two tar archives of the package.
This software is licensed under the MIT license OR the Apache 2.0 license, at
your choice. It makes use of the bsdiff crate, which is licensed under the
2-clause BSD license, located in this repository at LICENSE-BSDIFF.