~rjarry/dlrepo

998ec62a1d224b62d2e4123cd0867d218eaa8c77 — Julien Floret 11 months ago 44f9f8e
fs: ensure upload never replaces an existing blob

When a file is uploaded, it is first uploaded to .uploads/<uuid>, then
moved to .blobs/<digest>.
The .blobs/<digest> file is then hardlinked to the actual file path
into the job.
If however the same file is uploaded more than once, without checking
that its digest already exists on the server, then any existing
.blobs/<digest> file is replaced with the same newly uploaded
file. As a consequence, the existing hardlinks become orphaned.

What's more, if the most recent job is deleted afterwards, the
associated .blobs/<digest> file will have no other hardlink to it, so
it will be deleted at the next run of the cleanup task.
This is catastrophic for containers, because it breaks container
pull. Indeed "docker pull", for example, downloads its layers
from .blobs/<digest>. If the blob has been cleaned up, the pull
command will fail with the "unknown blob" error message, even if the
missing blob can be seen under the job's container format.

The problem can be easily reproduced by running multiple parallel
"dlrepo-cli upload" of the same big file into different jobs.

To avoid this, let's check if the blob already exists before moving
the uploaded file into it. If yes, just keep the existing blob and
remove the newly uploaded one.

Fixes: bd1c23893882 ("server: add filesystem api")
Signed-off-by: Julien Floret <julien.floret@6wind.com>
Acked-by: Thomas Faivre <thomas.faivre@6wind.com>
Acked-by: Robin Jarry <robin@jarry.cc>
1 files changed, 7 insertions(+), 3 deletions(-)

M dlrepo/fs/__init__.py
M dlrepo/fs/__init__.py => dlrepo/fs/__init__.py +7 -3
@@ 439,6 439,10 @@ async def _check_and_move(src: Path, dst: Path, digest: str):
        except OSError:
            pass
        raise ValueError(f"Received data does not match digest: {digest}")
    dst.parent.mkdir(mode=0o755, parents=True, exist_ok=True)
    src.rename(dst)
    dst.chmod(0o644)
    if dst.exists():
        # never replace an existing blob, we would lose any existing hard links
        src.unlink()
    else:
        dst.parent.mkdir(mode=0o755, parents=True, exist_ok=True)
        src.rename(dst)
        dst.chmod(0o644)