952f08b9b74f51546df93696d83f919b0f479d65 — Scott Colby 8 months ago ac2fb69
Add git modification time post.
1 files changed, 45 insertions(+), 0 deletions(-)

A content/git_times.md
A content/git_times.md => content/git_times.md +45 -0
@@ 0,0 1,45 @@
Title: Git Doesn't Set the Modification Time on Files at Checkout
Date: 2021-02-15

The last time I updated this blog, I noticed that the last updated time on the
[about page]({filename}pages/about.md) had changed to the last build time. My site
generator, [Pelican](https://blog.getpelican.com/), uses the source file's modified
time for this value, and this was being set by the operating system of the build
worker when the source repository was being cloned.

As explained in the [Git FAQ](https://git.wiki.kernel.org/index.php/GitFaq#Why_isn.27t_Git_preserving_modification_time_on_files.3F),
there are good reasons to not preserve the modification times, such as making sure
that build systems like `make` work as expected. Linus Torvalds also explains this
in a slightly more boisterous fashion in [this thread](https://web.archive.org/web/20120518150852/http://kerneltrap.org/mailarchive/git/2007/3/5/240536).

In Pelican's use case, this behavior also makes sense: it typically will only
rebuild existing output files whose source files have changed. Because my build
process starts from a clean slate (a fresh clone) each time, there won't be any
existing output files, so this optimization is useless. For the same reason, the
modification time on every source file will be the time that the clone took place.

I decided that, from the perspective of the build system, the modification time
should match the commit time.  To implement this, I wrote a one-liner to execute
as part of the build process that `touch`es the content files with their most
recent commit time before generating the site:

    git ls-tree -r -z --name-only HEAD content/ \
      | xargs -0 -I {} -- \
        git log --date='format:%Y%m%d%H%M.%S' \
          --format='format:%ad%x00{}%x00' -1 -- {} \
      | xargs -0 -n 2 -- touch -t

This pipeline prints the names of the files in the `content` directory known to
Git separated by null bytes, uses `xargs` and `git log` to find the last modified
timestamp (the `-1` argument means only use the most recent revision) and prints
it in a form that can be used as arguments to `touch`, and then finally uses `xargs`
again to `touch` each of the files with the appropriate time.
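
To make the final stage concrete, here is a small sketch that fakes the output of the `git log` stage and shows how `xargs -0 -n 2` groups it into pairs. The function name, file names, and timestamps are made up for illustration, and `echo` stands in for actually running `touch`, so nothing on disk changes:

```shell
# Simulate the output of the git log stage for two hypothetical files:
# each record is "<timestamp>\0<file name>\0".
simulated_log_output() {
  printf '%s\0%s\0' 202102151230.00 foo.md
  printf '%s\0%s\0' 202101010900.30 'notes with spaces.md'
}

# xargs -0 splits on null bytes and -n 2 passes two arguments per invocation,
# so each command line receives one timestamp and one file name.
simulated_log_output | xargs -0 -n 2 -- echo touch -t
# prints:
#   touch -t 202102151230.00 foo.md
#   touch -t 202101010900.30 notes with spaces.md
```

In the real pipeline each pair becomes one `touch -t <timestamp> <file>` call, and the file name arrives as a single argument even when it contains spaces.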

`git log`'s `--format` option supports a wide set of placeholders, but the file
name is not among them. To work around this, I used `xargs -I {}` to substitute
the file name directly into the format string. For example, for the file `foo.md`,
the `git log` invocation receives the format string `format:%ad%x00foo.md%x00`.
Each `%x00` placeholder expands to a null byte in the output, which is the
delimiter that the subsequent `xargs -0` command expects.
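
One consequence of placing the file name inside the format string is that a name containing something that looks like a placeholder (say, `%ad`) would presumably be expanded by `git log` as well. A possible variant, sketched below, keeps the file name out of the format string by looping in a small `sh -c` script instead. This is only a sketch: `touch_from_git` is a hypothetical helper name, and it assumes an `xargs` that supports `-0`, as the one-liner already does:

```shell
# touch_from_git is a hypothetical helper name, not part of Git or Pelican.
# Run it from the repository root, e.g.: touch_from_git content/
touch_from_git() {
  git ls-tree -r -z --name-only HEAD "$1" |
    xargs -0 sh -c '
      for file in "$@"; do
        # Author date of the most recent commit touching this file,
        # already formatted the way touch -t expects.
        stamp=$(git log -1 --date="format:%Y%m%d%H%M.%S" \
          --format="format:%ad" -- "$file")
        touch -t "$stamp" "$file"
      done
    ' sh
}
```

The trade-off is an extra shell process per batch of files in exchange for never interpolating a file name into a format string; the resulting `touch` calls should be the same as in the one-liner.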