~cnx/anage

5c614436c6f89b66f750dc47686bfbbb1957024f — Nguyễn Gia Phong 8 months ago 7c24c8c
Finish the paper
3 files changed, 71 insertions(+), 13 deletions(-)

M anage.py
M paper/anage.pdf
M paper/anage.tex
M anage.py => anage.py +2 -7
@@ 74,15 74,10 @@ def dependencies(edges, packages):

 def dependents(edges, packages):
     """Return implicit dependents of given packages."""
-    egdes, result, queue = defaultdict(set), set(), deque(packages)
+    egdes = defaultdict(set)
     for k, v in edges.items():
         for i in v: egdes[i].add(k)
-    while queue:
-        v = queue.popleft()
-        if v in result: continue
-        result.add(v)
-        queue.extend(egdes[v])
-    return result
+    return dependencies(egdes, packages)


 @group(context_settings={'help_option_names': ('-h', '--help')})
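
The change above replaces the inline breadth-first search in `dependents` with a call to `dependencies` on the reversed graph. A minimal sketch to check that both versions agree, assuming a small made-up graph: the package names, the `reverse` helper and the `dependents_before`/`dependents_after` names are hypothetical and not part of the commit.

```python
from collections import defaultdict, deque
from itertools import combinations


def dependencies(edges, packages):
    """Return the given packages plus everything reachable from them."""
    result, queue = set(), deque(packages)
    while queue:
        v = queue.popleft()
        if v in result: continue
        result.add(v)
        queue.extend(edges[v])
    return result


def reverse(edges):
    """Return the graph with every edge flipped (hypothetical helper)."""
    egdes = defaultdict(set)
    for k, v in list(edges.items()):
        for i in v: egdes[i].add(k)
    return egdes


def dependents_before(edges, packages):
    """Version before the commit: search written out inline."""
    egdes, result, queue = reverse(edges), set(), deque(packages)
    while queue:
        v = queue.popleft()
        if v in result: continue
        result.add(v)
        queue.extend(egdes[v])
    return result


def dependents_after(edges, packages):
    """Version after the commit: reuse dependencies on reversed edges."""
    return dependencies(reverse(edges), packages)


# Made-up graph: app needs lib and util, both need core, tool needs util.
edges = defaultdict(set, {'app': {'lib', 'util'}, 'lib': {'core'},
                          'util': {'core'}, 'tool': {'util'}})
vertices = ['app', 'lib', 'util', 'core', 'tool']
subsets = [c for n in range(len(vertices) + 1)
           for c in combinations(vertices, n)]
agree = all(dependents_before(edges, s) == dependents_after(edges, s)
            for s in subsets)
```

On this graph the two versions return the same set for every subset of vertices, which is what the refactor relies on.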

M paper/anage.pdf => paper/anage.pdf +0 -0
M paper/anage.tex => paper/anage.tex +69 -6
@@ 47,8 47,8 @@ specific for each and every task.  In production, such boilerplate
 introduces additional complexity as well as latency,
 which negatively affect productivity.

-As a proper dependency resolver is now baked into `pip` (since release 20.3),
-the package installer for Python, it is no longer non-trivial to support
+As a proper dependency resolver is now baked into \verb|pip| (since release
+20.3), the package installer for Python, it is no longer non-trivial to support
 common package management use cases such as autoremoval of orphan dependencies.
 This project aimed to provide a proof of concept for future carry-out
 of \href{https://github.com/pypa/pip/issues/5823}{such feature}.


@@ 79,10 79,7 @@ to work on this project in particular.

 This paper is licensed under
 a \href{https://creativecommons.org/licenses/by-sa/4.0/}{Creative Commons
-Attribution-ShareAlike 4.0 International License}, while
-the proof-of-concept software is available on SourceHut under
-a \href{https://www.gnu.org/licenses/agpl-3.0.en.html}{GNU Affero
-General Public License version 3} or later.
+Attribution-ShareAlike 4.0 International License}.
 \pagebreak

 \section{Objective}


@@ 225,6 222,72 @@ As \eqref{c11n} is false, the reverse which is \eqref{rm} is true.
\end{proof}

\section{Implementation}
\subsection{Obtaining Dependency Graph}
Through the standard library \verb|importlib.metadata|,
local dependency information can be obtained trivially as follows:
\begin{verbatim}
from collections import defaultdict
from importlib.metadata import distributions

from packaging.requirements import Requirement

def dependency_graph():
    vertices, edges = set(), defaultdict(set)
    for distribution in distributions():
        d = distribution.metadata['Name']
        vertices.add(d)
        for r in distribution.requires or []:
            requirement = Requirement(r)
            marker = requirement.marker
            if marker is None or marker.evaluate({'extra': ''}):
                edges[d].add(requirement.name)
    return vertices, edges
\end{verbatim}

We then define the functions $I$ and $J$ above
as \verb|dependencies| and \verb|dependents| respectively:

\begin{verbatim}
from collections import deque

def dependencies(edges, packages):
    result, queue = set(), deque(packages)
    while queue:
        v = queue.popleft()
        if v in result: continue
        result.add(v)
        queue.extend(edges[v])
    return result

def dependents(edges, packages):
    egdes = defaultdict(set)
    for k, v in edges.items():
        for i in v: egdes[i].add(k)
    return dependencies(egdes, packages)
\end{verbatim}

Manually installed packages $M$ are stored in a text file
specific to the environment.  The packages to be removed
$V \setminus K = V \setminus I(M \setminus J(R))$ are computed as follows:
\begin{verbatim}
manual = set(file.read_text().strip().split())    # M
must_remove = dependents(edges, distributions)    # J(R)
must_keep = manual.difference(must_remove)        # M \ J(R)
should_keep = dependencies(edges, must_keep)      # K = I(M \ J(R))
should_remove = vertices.difference(should_keep)  # V \ K
\end{verbatim}
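
As an illustration only, the computation above can be tried on a small made-up environment. The sketch below assumes hypothetical package names and wraps the five steps in a hypothetical helper named `autoremovable`; it is not the reference implementation.

```python
from collections import defaultdict, deque


def dependencies(edges, packages):
    """I: the given packages plus everything they depend on."""
    result, queue = set(), deque(packages)
    while queue:
        v = queue.popleft()
        if v in result: continue
        result.add(v)
        queue.extend(edges[v])
    return result


def dependents(edges, packages):
    """J: the given packages plus everything that depends on them."""
    egdes = defaultdict(set)
    for k, v in list(edges.items()):
        for i in v: egdes[i].add(k)
    return dependencies(egdes, packages)


def autoremovable(vertices, edges, manual, requested):
    """Return V minus I(M minus J(R)) (hypothetical helper)."""
    must_remove = dependents(edges, requested)        # J(R)
    must_keep = set(manual) - must_remove             # M \ J(R)
    should_keep = dependencies(edges, must_keep)      # K
    return set(vertices) - should_keep                # V \ K


# Made-up environment: app needs lib, lib and tool both need core;
# app and tool were installed manually.
vertices = {'app', 'lib', 'core', 'tool'}
edges = defaultdict(set, {'app': {'lib'}, 'lib': {'core'},
                          'tool': {'core'}})
manual = {'app', 'tool'}

removed = autoremovable(vertices, edges, manual, {'app'})
```

Here removing `app` also removes the orphaned `lib`, while `core` stays because the manually installed `tool` still depends on it.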

Operations with side-effects are outsourced to \verb|pip|.
The reference implementation can be found on the Python Package Index
under the name \href{https://pypi.org/project/anage}{anage}.

\section{Conclusion}
Through the abstraction given by graph theory, we were able to deduce
a rather unexpectedly simple method of managing automatically installed
distribution packages that is mathematically proven.
Although the proof-of-concept was not production-ready (as in,
it did not fully comply with all packaging specifications such as
\href{https://www.python.org/dev/peps/pep-0508/#extras}{\emph{extras}}
and was lacking certain package management features), we are confident
that the tool will eventually become helpful with future development.
\end{document}