1ff1746272a97d9c58d2e6a8936592f90fd5cd47 — Nguyễn Gia Phong 8 months ago 1f42cc7
Migrate GSoC 2020 check-ins
A blog/gsoc2020/checkin20200601.md => blog/gsoc2020/checkin20200601.md +45 -0
@@ 0,0 1,45 @@
rss = "GSoC 2020: First Check-In"
date = Date(2020, 6, 1)
@def tags = ["pip", "gsoc"]

# First Check-In

Hi everyone, I am McSinyx, a Vietnamese undergraduate student
who loves [free software][].  This summer I am working with
the maintainers and the contributors of `pip` to make
the package manager {{pip 825 "download in parallel"}}.

## What did I do during the community bonding period?

Aside from bonding with `pip`'s maintainers and contributors as well as
with my mentors, I was also experimenting with the theoretical and technical
obstacles blocking this GSoC project.  Pradyun Gedam (a mentor of mine)
suggested making [a proof of concept][] to determine if parallel downloading
can play nicely with [ResolveLib][]'s abstraction, and we are reviewing it
together.  On the technical side, we, `pip`'s committers, are exploring
{{pip 8169 "available options for parallelization"}} and I made an attempt to
{{pip 8320 "make use of Python's standard worker pool in a portable way"}}.

## Did I get stuck anywhere?

Yes, of course!  Neither of the experiments above is finished as of
this moment, though I am optimistic that the issues will not be
real blockers and that we will figure them out in the next few days.

## What is coming up next?

As planned, this week I am going to refactor the package downloading code
in `pip`.  The main purpose is to decouple the networking code from
the package preparation operation and make sure that it is thread-safe.

In addition, I am continuing the experiments mentioned above to gain more
confidence in the future of this GSoC project.

To other GSoC students, mentors and admins reading this, I am wishing
you all good health and successful projects this summer!

[free software]: https://www.gnu.org/philosophy/free-sw.html
[a proof of concept]: https://gist.github.com/McSinyx/513dbff71174fcc79f1cb600e09881af
[ResolveLib]: https://pypi.org/project/resolvelib

A blog/gsoc2020/checkin20200615.md => blog/gsoc2020/checkin20200615.md +45 -0
@@ 0,0 1,45 @@
rss = "GSoC 2020: Second Check-In"
date = Date(2020, 6, 15)
@def tags = ["pip", "gsoc"]

# Second Check-In

Hi everyone, and may the odds be ever in your favor, especially during this
tough time!

## What did I do last week?

Not as much as I wished, apparently (-:

* Finalizing {{pip 8411 "the refactoring patch"}}
  of `operations.prepare.prepare_linked_requirement`
* {{pip 8423 "Nitpicking some logging calls"}}.  This (as well as the next one)
  was to fill up the time when my brain was not as productive as I wanted it to be XD
* {{pip 8435 "Beginning to migrate"}} from `%`- to `{}`-style logging.
  The number of tests failing due to this was way beyond my imagination,
  but I now have the functional tests for `pip install` and the unit tests passing!
* {{pip 8442 "Mocking up a working partial wheel download during
  dependency resolution"}} for [the new resolver][].
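
For readers unfamiliar with the distinction: the standard `logging` module
interpolates a message's arguments with `%` (inside `LogRecord.getMessage()`),
and `logging.Formatter(style="{")` only changes the handler's format string,
not the individual messages.  One general way to get `{}` placeholders into
the messages themselves, described in the Python Logging Cookbook, is a small
wrapper whose `__str__` calls `str.format()`.  The sketch below shows that
technique only; `BraceMessage` is an illustrative name, and the migration in
`pip` may well have taken a different route.

```python
import logging


class BraceMessage:
    """Log message rendered with str.format() instead of %-interpolation."""

    def __init__(self, fmt: str, *args: object, **kwargs: object) -> None:
        self.fmt = fmt
        self.args = args
        self.kwargs = kwargs

    def __str__(self) -> str:
        # LogRecord.getMessage() calls str(msg); since no extra logging
        # arguments are passed, no %-interpolation is attempted afterwards.
        return self.fmt.format(*self.args, **self.kwargs)


logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("demo")

# %-style: the arguments are interpolated by the logging machinery.
logger.info("Collecting %s==%s", "pip", "20.2")
# {}-style: the wrapper does the formatting itself.
logger.info(BraceMessage("Collecting {}=={}", "pip", "20.2"))
```

Both calls produce the same message text, and because formatting happens in
`__str__`, it is still deferred until a handler actually emits the record.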

## Did I get stuck anywhere?

Yes, of course!  {{pip 8320 "Parallel maps"}} are still stalling,
as are the other small PRs listed above.  The failures related to
`logging` are still making me pull my hair out, and the proof of
concept for partial wheel downloading is too ugly even for a PoC.
I imagine that I will have a lot of cleaning up to do this week (yay!).

## What is coming up next?

I'm trying to get the multi-{threading,processing} facilities merged ASAP
to start rolling them out in practice.  The first thing that pops into my
head is to bring back {{pip 7962 "the multi-threaded"}} `pip list -o`.

The other experimental improvement (this phrase does not sound right!)
I would like to get done is the partial wheel download.  It would be
really nice if I could get both included as `unstable-feature`s
in {{pip 7628#issuecomment-636319539 "the upcoming beta release of pip 20.2"}}.

[the new resolver]: http://www.ei8fdb.org/thoughts/2020/05/test-pips-alpha-resolver-and-help-us-document-dependency-conflicts/

A blog/gsoc2020/checkin20200629.md => blog/gsoc2020/checkin20200629.md +44 -0
@@ 0,0 1,44 @@
rss = "GSoC 2020: Third Check-In"
date = Date(2020, 6, 29)
@def tags = ["pip", "gsoc"]

# Third Check-In

Holla, holla, holla!  The last seven days have not been a really productive week
for me, though I think there are still some nice things to share with
you all here!  The good news is that I've finished my last leçon as a sophomore;
the bad news is that I have a bunch of upcoming tests, mainly in the form
of group projects and/or presentations (phew!).  Enough about me,
let's get back to `pip`:

## What did I do last week?

Not much, actually )-:

* Write some tests for {{pip 8467 "the HTTP range mapping for wheel"}}.
* {{pip 8504 "Try to bring back"}} multithreaded `pip list --outdated`
  and `--uptodate`, as {{pip 8320 "the parallel"}} `map` was merged
  earlier today.
* Nitpick {{pip 8332}}
  (yep, it's a new low for me to include this in the list (-:).

## Did I get stuck anywhere?

Not exactly, since I didn't do much d-;  [Many of my PRs][] are stalling though.
On one hand the maintainers of `pip` are all volunteers working in
their free time, on the other hand I don't think I have tried hard enough
to get their attention on my PRs.

## What is coming up next?

I'll try my best to get the following merged upstream before
{{pip 8206 "the upcoming beta release"}}:

* Parallel networking for `pip list`: {{pip 8504}}
* Lazy wheel for dependency information: {{pip 8467}}, {{pip 8411}}
  (to determine if hashing is required) and {{pip 8467#issuecomment-648717032
  "a new patch introducing this as an unstable feature"}}

[Many of my PRs]: https://github.com/pulls?q=is:open+is:pr+author:McSinyx+repo:pypa/pip+sort:updated-desc

A blog/gsoc2020/checkin20200713.md => blog/gsoc2020/checkin20200713.md +35 -0
@@ 0,0 1,35 @@
rss = "GSoC 2020: Fourth Check-In"
date = Date(2020, 7, 13)
@def tags = ["pip", "gsoc"]

# Fourth Check-In

Hello there! I'm having my second year's last exam tomorrow,
but it [feels like summer][] already!  I've been finalizing quite a few things
to get them ready for pip 20.2b2.

## What did I do last week?

I've spent most of the time getting {{pip 8532 "the opt-in"}} for obtaining
dependency information via lazy wheels ready.  It will be available as
`--use-feature=fast-deps` and only takes effect when
`--use-feature=2020-resolver` is also present.

While waiting for reviews and suggestions, I made some patches for
internal clean-up, namely {{pip 8568}}, {{pip 8571}} and {{pip 8578}}.
Some of the similar patches I made earlier were also merged last week:
{{pip 8456}} and {{pip 8538}}.

## Did I get stuck anywhere?

Not really, everything was going as expected for me.

## What is coming up next?

After {{pip 8532}}, I'll work on the parallel download of the postponed wheels.
My main current concern is with how the download progress will be reported
to the users, but I think I'll figure it out soon.

[feels like summer]: https://www.youtube.com/watch?v=F1B9Fk_SgI0

A blog/gsoc2020/checkin20200727.md => blog/gsoc2020/checkin20200727.md +37 -0
@@ 0,0 1,37 @@
rss = "GSoC 2020: Fifth Check-In"
date = Date(2020, 7, 27)
@def tags = ["pip", "gsoc"]

# Fifth Check-In

Hello and I hope y'all are still doing well!

## What did I do last week?

I was not really productive last week: most of the following tickets are fillers
to make use of the spare cycles I had while I was still trying to figure out
how to implement the main work.

* Finalize the `--use-feature=fast-deps` flag ({{pip 8588}})
* Improve mocking of environment variables in the test suite ({{pip 8614}})
* Finalize the fix for verbose/quiet options specified via
  configuration files and environment variables ({{pip 8578}})
* Clean up a tiny bit in the resolver internal API ({{pip 8629}})
* Start working on separating the download of wheels
  from dependency resolution ({{pip 8638}})

## Did I get stuck anywhere?

I'm struggling with refactoring the code to support separate downloads.
`pip`'s codebase was not designed for this, and thus there are
many execution paths and other details entangled around the relevant area.

## What is coming up next?

`pip` 20.2 is going to be released within the next few days with
`--use-feature=fast-deps` included, and I'm mentally prepared to fix
any undiscovered problems.  At the same time, I will continue working
on {{pip 8638}} and hopefully get it done soon enough to begin drafting
download parallelization strategies, mostly regarding the UI.

A blog/gsoc2020/checkin20200810.md => blog/gsoc2020/checkin20200810.md +33 -0
@@ 0,0 1,33 @@
rss = "GSoC 2020: Sixth Check-In"
date = Date(2020, 8, 10)
@def tags = ["pip", "gsoc"]

# Sixth Check-In

Hello there!

## What did I do last week?

It has been quite a fun week for me, given the current state of
development and the newly discovered bugs thanks to the pip 20.2 release:

* Initiate discussion with the maintainers of pip on isolating
  networking code for late download in parallel ({{pip 8697}})
* Discuss the UI of parallel download ({{pip 8698}})
* Log debug information relating to the lazy wheel decision ({{pip 8710}})
* Disable caching for range requests ({{pip 8716}})
* Dedent late download logs ({{pip 8722}})
* Add a hook for batch downloading (third attempt I think) ({{pip 8737}})
* Test hash checking for fast-deps ({{pip 8743}})

## Did I get stuck anywhere?

Not exactly, everything is going smoothly and I'm feeling awesome!

## What is coming up next?

I'll try to solve {{pip 8697}} and {{pip 8698}} within the next few days.
I am optimistic that the parallel download prototype will be done
within this week.

A blog/gsoc2020/checkin20200824.md => blog/gsoc2020/checkin20200824.md +26 -0
@@ 0,0 1,26 @@
rss = "GSoC 2020: Final Check-In"
date = Date(2020, 8, 24)
@def tags = ["pip", "gsoc"]

# Final Check-In

Hello there!

## What did I do last week?

Not much, but implementation-wise, I seem to have finished my GSoC project:

* Finish the implementation of wheels' parallel download ({{pip 8771}})
* Help make `pip`'s CI green again ({{pip 8790}})
* Reformat a few spots in user guide ({{pip 8795}})

## Did I get stuck anywhere?

I got sick, but I am recovering now!

## What is coming up next?

I will try to spend the time I have left within the scope of GSoC
to {{pip 8720 "improve cache usage of the fast-deps feature"}}.

A blog/gsoc2020/index.md => blog/gsoc2020/index.md +151 -0
@@ 0,0 1,151 @@
rss = "GSoC 2020 final report"
date = Date(2020, 8, 31)
@def tags = ["fun", "pip", "gsoc"]

# Google Summer of Code 2020

In the summer of 2020, I worked with the contributors of `pip`, trying
to improve the networking performance of the package manager.  Admittedly, at
the end of the [internship][] period, [the benchmark said otherwise][benchmark],
though I really hope that the clean-up and minor fixes I made
to the codebase over the summer, in addition to the implementation of parallel
utils and lazy wheel, might actually help the project.

Personally, I learned a lot: not just about Python packaging and
networking stuff, but also on how to work with others.  I am really
grateful to {{github pradyunsg}} (my mentor), {{github chrahunt}},
{{github uranusjr}}, {{github pfmoore}}, {{github brainwane}},
{{github sbidoul}}, {{github xavfernandez}}, {{github webknjaz}},
{{github jaraco}}, {{github deveshks}}, {{github gutsytechster}},
{{github dholth}}, {{github dstufft}}, {{github cosmicexplorer}}
and {{github ofek}}.  While this feels like a long shout-out list,
it really isn't.  These people are the maintainers and contributors of `pip`
and/or other Python packaging projects, and more importantly, they have been
more than helpful, encouraging and patient with me throughout all my activities,
showing me the way when I was lost, correcting me when I was wrong, putting up with
my carelessness and showing me support across different social media.

To best serve the community, below I have tried my best to document
what I have done, how I've done it and why I've done it over
the last three months.  At the time of writing, some work is still in progress,
so this also serves as a reference point for myself and others to reason
about decisions in relevant topics.


## The Main Story

The storyline can be divided into the following four main acts.

### Act One: Parallelization Utilities

In this first act, I ensured the portability of parallelization
measures for later use in the final act.  Multithreading and multiprocessing
`map` were made to properly fall back on platforms without full support.

* {{pip 8320}}: Add utilities for parallelization (close {{pip 8169}})
* {{pip 8538}}: Make `utils.parallel` tests tear down properly
* {{pip 8504}}: Parallelize `pip list --outdated` and `--uptodate`
  (using {{pip 8320}})
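
To make the fallback idea concrete, here is a minimal sketch of a portable
parallel `map`; it is not the code from {{pip 8320}}, and `map_multithread`
and `LACK_SEM_OPEN` are illustrative names.  It assumes two things: CPython's
`multiprocessing.synchronize` raises `ImportError` on platforms without a
working `sem_open()`, and `ThreadPool`, being built on `multiprocessing`'s
`Pool`, may create `multiprocessing` locks internally (at least in recent
CPython versions), so a missing `sem_open()` is treated as a reason to fall
back to the builtin, serial `map`.

```python
from multiprocessing.pool import ThreadPool
from typing import Callable, Iterable, Iterator, TypeVar

try:
    # CPython raises ImportError here when the platform has no working
    # sem_open(), which multiprocessing's locks (and thus its pools) need.
    import multiprocessing.synchronize  # noqa: F401
except ImportError:
    LACK_SEM_OPEN = True
else:
    LACK_SEM_OPEN = False

S = TypeVar("S")
T = TypeVar("T")


def map_multithread(func: Callable[[S], T], iterable: Iterable[S],
                    chunksize: int = 1) -> Iterator[T]:
    """Lazily apply func to each item using a thread pool, keeping input order.

    Falls back to the builtin, serial map() where a pool cannot work.
    """
    if LACK_SEM_OPEN:
        yield from map(func, iterable)
        return
    with ThreadPool() as pool:
        # The pool stays alive while the caller consumes the results;
        # leaving the with-block terminates its worker threads.
        yield from pool.imap(func, iterable, chunksize)


if __name__ == "__main__":
    print(list(map_multithread(len, ["pip", "wheel", "setuptools"])))  # [3, 5, 10]
```

For something like `pip list --outdated`, `func` would be the lookup of each
package's latest version on the index; that work is I/O-bound, so threads can
help despite the GIL.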

### Act Two: Lazy Wheels

As proposed by {{github cosmicexplorer}} in {{pip 7819}}, it is possible to only
download a portion of a wheel to obtain metadata during dependency resolution.
Not only would this reduce the total amount of data to be transmitted over
the network in case the resolver needs to perform heavy backtracking, but
it would also create a synchronization point at the end of the resolution process
where parallel downloading can be applied to the needed wheels (some wheels
solely serve their metadata during dependency backtracking and are not needed
by the users).

* {{pip 8467}}: Add utility to lazily acquire wheel metadata over HTTP
* {{pip 8584}}: Revise lazy wheel and its tests
* {{pip 8681}}: Make range requests closer to chunk size (help {{pip 8670}})
* {{pip 8716}} and {{pip 8730}}: Disable caching for range requests

### Act Three: Late Downloading

During this act, the main work was refactoring to integrate the *lazy wheel*
into `pip`'s codebase and clearing the way for download parallelization.

* {{pip 8411}}: Refactor `operations.prepare.prepare_linked_requirement`
* {{pip 8629}}: Abstract away `AbstractDistribution`
  in higher-level resolver code
* {{pip 8442}}, {{pip 8532}} and {{pip 8588}} (later reworked by
  {{github chrahunt}} in {{pip 8685}}): Use lazy wheel to obtain
  dependency information for the new resolver
* {{pip 8743}}: Test hash checking for `fast-deps`
* {{pip 8804}}: Check download directory before making range requests

### Act Four: Batch Downloading in Parallel

The final act is mostly about the UI of the parallel download.
My work revolved around how the progress should be displayed
and how other relevant information should be reported to the users.

* {{pip 8710}}: Revise method fetching metadata using lazy wheels
* {{pip 8722}}: Dedent late download logs (fix {{pip 8721}})
* {{pip 8737}}: Add a hook for batch downloading
* {{pip 8771}}: Parallelize wheel download


## The Side Quests

In order to keep the wheel turning (no pun intended) and avoid wasting time
waiting for the pull requests above to be reviewed, I decided to create
even more PRs (as I am typing this, many of the patches listed below
are nowhere near being merged).

* {{pip 7878}}: Fail early when install path is not writable
* {{pip 7928}}: Fix rst syntax in Getting Started guide
* {{pip 7988}}: Fix tabulate col size in case of empty cell
* {{pip 8137}}: Add subcommand alias mechanism
* {{pip 8143}}: Make mypy happy with beta release automation
* {{pip 8248}}: Fix typo and simplify ireq call
* {{pip 8332}}: Add license requirement to `_vendor/README.rst`
* {{pip 8423}}: Nitpick logging calls
* {{pip 8435}}: Use str.format style in logging calls
* {{pip 8456}}: Lint `src/pip/_vendor/README.rst`
* {{pip 8568}}: Declare constants in configuration.py as such
* {{pip 8571}}: Clean up `Configuration.unset_value` and nit `__init__`
* {{pip 8578}}: Allow verbose/quiet level to be specified
  via config files and environment variables
* {{pip 8599}}: Replace tabs by spaces for consistency
* {{pip 8614}}: Use `monkeypatch.setenv` to mock environment variables
* {{pip 8674}}: Fix `tests/functional/test_install_check.py`,
  when run with new resolver
* {{pip 8692}}: Make assertion failure give better message
* {{pip 8709}}: List downloaded distributions before exiting (fix {{pip 8696}})
* {{pip 8759}}: Allow py2 deprecation warning from setuptools
* {{pip 8766}}: Use the new resolver for test requirements
* {{pip 8790}}: Mark tests using remote svn and hg as xfail
* {{pip 8795}}: Reformat a few spots in user guide

## The Plot Summary

Every Monday throughout the Summer of Code, I summarized what I had done
in the week before in the form of either a short blog post or an (even shorter)
check-in.  These write-ups often contain handfuls of popular culture references
and were originally hosted on [Python GSoC][].

* [{{fill title blog/gsoc2020/checkin20200601}}](/blog/gsoc2020/checkin20200601)
* [{{fill title blog/gsoc2020/blog20200609}}](/blog/gsoc2020/blog20200609)
* [{{fill title blog/gsoc2020/checkin20200615}}](/blog/gsoc2020/checkin20200615)
* [{{fill title blog/gsoc2020/blog20200622}}](/blog/gsoc2020/blog20200622)
* [{{fill title blog/gsoc2020/checkin20200629}}](/blog/gsoc2020/checkin20200629)
* [{{fill title blog/gsoc2020/blog20200706}}](/blog/gsoc2020/blog20200706)
* [{{fill title blog/gsoc2020/checkin20200713}}](/blog/gsoc2020/checkin20200713)
* [{{fill title blog/gsoc2020/blog20200720}}](/blog/gsoc2020/blog20200720)
* [{{fill title blog/gsoc2020/checkin20200727}}](/blog/gsoc2020/checkin20200727)
* [{{fill title blog/gsoc2020/blog20200803}}](/blog/gsoc2020/blog20200803)
* [{{fill title blog/gsoc2020/checkin20200810}}](/blog/gsoc2020/checkin20200810)
* [{{fill title blog/gsoc2020/blog20200817}}](/blog/gsoc2020/blog20200817)
* [{{fill title blog/gsoc2020/checkin20200824}}](/blog/gsoc2020/checkin20200824)
* [{{fill title blog/gsoc2020/blog20200831}}](/blog/gsoc2020/blog20200831)

[internship]: https://summerofcode.withgoogle.com/archive/2020/projects/6238594655584256
[benchmark]: /blog/gsoc2020/blog20200831/#the_benchmark
[Python GSoC]: https://blogs.python-gsoc.org/en/mcsinyxs-blog/