Notice: This script does not handle new files which are not yet tracked by git.
This script is an attempt to improve the process of importing CSV data with hledger. The default deduplication scheme uses a .latest.FILE.csv file, which records how many transactions occurred on the most recent date. This lets you run hledger import on the same CSV file after new data has been appended without creating duplicates.
The problem is that bank exports are not always append-only. Sometimes transactions are inserted into the past, leading to diffs like this:
+2022-12-06,vendor1,-10.00
+2022-12-05,vendor2,-20.00
+2022-12-05,vendor3,-42.00
+2022-12-03,vendor4,-50.58
 2022-12-01,vendor5,-25.24
+2022-12-01,vendor6,-36.00
 2022-11-30,vendor7,-12.07
 2022-11-30,vendor8,-17.76
hledger import inserts the
2022-12-01,vendor5,-25.24 xact twice in the journal file and does not insert the
This script ignores the .latest.FILE.csv file and instead uses git diff to figure out which lines have been added. It ignores moved lines and warns if lines have been removed.
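As a rough illustration of the idea (not the script's actual implementation, which pipes git diff through perl), the filtering step could be sketched in awk as below. The function name new_csv_lines is hypothetical. It treats an added line whose text also appears as a removed line as "moved" and skips it, and it assumes no CSV row starts with "-- " or "++ ", since such rows would be mistaken for diff file headers.

```shell
#!/bin/sh
# Hypothetical helper: read a unified diff on stdin, print lines that were
# genuinely added (not merely moved), and warn on stderr about removals.
new_csv_lines() {
    awk '
        /^(\+\+\+|---) / { next }                    # skip diff file headers
        /^\+/ { added[++n] = substr($0, 2); next }   # candidate additions, in order
        /^-/  { removed[substr($0, 2)]++ }           # removals, counted per content
        END {
            for (i = 1; i <= n; i++) {
                line = added[i]
                # An added line that matches a removed line was only moved.
                if (removed[line] > 0) { removed[line]--; continue }
                print line
            }
            # Anything still counted as removed has no matching addition.
            for (line in removed)
                if (removed[line] > 0)
                    print "warning: line removed: " line | "cat 1>&2"
        }
    '
}

# Usage (assumes import.csv is tracked by git and has uncommitted changes):
#   git diff --no-color -U0 -- import.csv | new_csv_lines
```

Applied to the diff shown earlier, a filter like this would print only the five lines marked with +, leaving the vendor5, vendor7, and vendor8 rows alone.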
Unlike the .latest.FILE.csv approach, this script is not idempotent. To prevent duplicate entries, it assumes that your journal contains the same number of entries as the CSV you're importing from. If the journal's entry count does not match $CSV_LENGTH, it exits. If your CSV file has additional lines before or after the data (a header row, for example), you can pass that number of lines as the optional fourth positional argument $CSV_LENGTH_OFFSET, which defaults to 0.
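The length check can be pictured with the sketch below. It is only an assumption about how such a guard might look, not the script's own code: the function names csv_data_rows and lengths_match are hypothetical, and how the journal's entry count is obtained is left to the caller.

```shell
#!/bin/sh
# Hypothetical helper: number of data rows in a CSV, i.e. its line count
# minus an offset for header or footer lines (offset defaults to 0).
csv_data_rows() {
    csv_file=$1
    offset=${2:-0}
    total=$(wc -l < "$csv_file")
    echo $((total - offset))
}

# Hypothetical guard: succeed only if the journal's entry count equals the
# number of CSV data rows; otherwise explain the mismatch on stderr.
lengths_match() {
    journal_count=$1
    expected=$(csv_data_rows "$2" "${3:-0}")
    if [ "$journal_count" -ne "$expected" ]; then
        echo "journal has $journal_count entries, CSV has $expected data rows" >&2
        return 1
    fi
}

# Usage, with an entry count obtained elsewhere and a one-line CSV header:
#   lengths_match "$journal_count" import.csv 1 || exit 1
```

Refusing to run on a mismatch trades convenience for safety: a second run after a successful import fails loudly instead of silently duplicating entries.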
hledger-import-new-xact requires hledger, git, perl, and awk.
First, ensure that git ignores DOS newlines, since the git diff | perl pipeline will otherwise add troublesome escape sequences. You can do this by adding the following to .gitattributes inside your journal directory:
Then run git add --renormalize . to regenerate all files without DOS newlines.
hledger-import-new-xact.sh file.journal import.csv import.rules 1