fs: ensure upload never replaces an existing blob
When a file is uploaded, its data is first written to .uploads/<uuid>,
then moved to .blobs/<digest>. The .blobs/<digest> file is then
hardlinked to the actual file path inside the job.
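The flow above can be sketched as follows (a minimal illustration with
hypothetical helper and path names, not dlrepo's actual code):

```python
import hashlib
import os
import tempfile
import uuid

def store_upload(root, data, job_path):
    """Sketch of the upload flow: write to .uploads/<uuid>, move to
    .blobs/<digest>, then hardlink the blob into the job."""
    os.makedirs(os.path.join(root, ".uploads"), exist_ok=True)
    os.makedirs(os.path.join(root, ".blobs"), exist_ok=True)
    tmp = os.path.join(root, ".uploads", uuid.uuid4().hex)
    with open(tmp, "wb") as f:
        f.write(data)
    digest = hashlib.sha256(data).hexdigest()
    blob = os.path.join(root, ".blobs", digest)
    os.replace(tmp, blob)  # unconditional move: this is the buggy step
    os.makedirs(os.path.dirname(job_path), exist_ok=True)
    os.link(blob, job_path)
    return digest

root = tempfile.mkdtemp()
digest = store_upload(root, b"layer data", os.path.join(root, "job1", "file"))
```

After a single upload, the job file and .blobs/<digest> share one inode
with a link count of 2; the bug appears when a second upload of the same
data runs through the unconditional os.replace().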
If, however, the same file is uploaded more than once, without checking
whether its digest already exists on the server, any existing
.blobs/<digest> file is replaced by the newly uploaded copy. As a
consequence, the existing hardlinks become orphaned: the job files keep
their content, but they no longer share an inode with .blobs/<digest>.
What's more, if the most recent job is deleted afterwards, the
associated .blobs/<digest> file has no other hardlink to it, so it is
deleted at the next run of the cleanup task.
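The orphaning can be observed directly with link counts (an
illustrative standalone demo, not dlrepo code):

```python
import os
import tempfile

# After os.replace(), the job file still points at the old inode,
# while the blob path refers to a brand-new inode with link count 1.
root = tempfile.mkdtemp()
blob = os.path.join(root, "blob")
job = os.path.join(root, "job-file")
upload = os.path.join(root, "upload")

with open(blob, "w") as f:
    f.write("content")
os.link(blob, job)         # blob now has st_nlink == 2

with open(upload, "w") as f:
    f.write("content")     # same content, but a different inode
os.replace(upload, blob)   # second upload replaces the blob

print(os.stat(blob).st_nlink)  # 1: the link to the job file is gone
print(os.stat(job).st_nlink)   # 1: the job file is orphaned
```

Once the new blob's only remaining link (its own job) is deleted, the
cleanup task removes the blob entirely, even though other jobs still
reference that digest.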
This is catastrophic for containers, because it breaks container
pulls: "docker pull", for example, downloads its layers from
.blobs/<digest>. If the blob has been cleaned up, the pull command
fails with an "unknown blob" error, even though the missing blob is
still visible under the job's container format.
The problem is easily reproduced by running multiple parallel
"dlrepo-cli upload" commands of the same big file into different jobs.
To avoid this, let's check whether the blob already exists before
moving the uploaded file into place: if it does, keep the existing
blob and discard the newly uploaded one.
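The fix boils down to the following check (a hedged sketch with a
hypothetical helper name; the real code must also consider concurrent
uploads racing on the same digest):

```python
import os
import tempfile

def move_blob_into_place(upload_path, blob_path):
    """If a blob with this digest already exists, keep it so its
    hardlinks stay intact, and discard the redundant upload; otherwise
    move the upload into the blob store."""
    if os.path.exists(blob_path):
        # Same digest implies same content: the new copy is redundant.
        os.unlink(upload_path)
    else:
        os.replace(upload_path, blob_path)

# Demo: a blob already hardlinked into a job survives a re-upload.
root = tempfile.mkdtemp()
blob = os.path.join(root, "blob")
job = os.path.join(root, "job-file")
upload = os.path.join(root, "upload")
with open(blob, "w") as f:
    f.write("content")
os.link(blob, job)
with open(upload, "w") as f:
    f.write("content")
move_blob_into_place(upload, blob)
```

With this check, the existing inode keeps all of its hardlinks, so
deleting any single job can never strand a digest that other jobs
still reference.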
Fixes: bd1c23893882 ("server: add filesystem api")
Signed-off-by: Julien Floret <julien.floret@6wind.com>
Acked-by: Thomas Faivre <thomas.faivre@6wind.com>
Acked-by: Robin Jarry <robin@jarry.cc>