~mcf/mupdf

d138e675 — Robin Watts 8 months ago master
Bug 703379: Fix conversion from Indexed Separation colorspace.
475129d6 — Robin Watts 8 months ago
Fix MuPDF-gl chapter/page handling.

Internally, fz_locations are numbered from 0. Externally we number from
1. Adjust when parsing.
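
The off-by-one adjustment described above can be sketched as follows. This is a minimal illustration, not the mupdf-gl code: the `location` struct is a hypothetical stand-in for `fz_location`, and `parse_location` and its clamping of values below 1 are assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for MuPDF's fz_location: both fields are 0-based. */
typedef struct { int chapter; int page; } location;

/* Parse "page" or "chapter:page" as typed by a user (1-based) into a
 * 0-based location. Values below 1 are clamped to the first entry;
 * that clamping is an assumption of this sketch. */
location parse_location(const char *s)
{
    location loc = { 0, 0 };
    char *end;
    long first;

    assert(s != NULL);
    first = strtol(s, &end, 10);
    if (*end == ':')
    {
        /* "chapter:page" form: both numbers are external, so subtract 1. */
        long second = strtol(end + 1, NULL, 10);
        loc.chapter = first > 0 ? (int)(first - 1) : 0;
        loc.page = second > 0 ? (int)(second - 1) : 0;
    }
    else
    {
        /* Plain "page" form: chapter stays at 0. */
        loc.page = first > 0 ? (int)(first - 1) : 0;
    }
    return loc;
}
```

With this sketch, the input "2:5" yields chapter 1, page 4, and the input "3" yields chapter 0, page 2.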
e27ceb2b — Robin Watts 8 months ago
OSS-Fuzz 29728: Avoid buffer overflow.

Don't access past the end of the xref_index.
d0b9c2be — Robin Watts 8 months ago
Coverity 216668: Avoid null deref.

Probably never happens in practice as we don't go looking for objects
that don't exist, but this is safer.
2536d128 — Robin Watts 8 months ago
PDF OCR: Support vertical writing

Ensure that we give reasonable output for vertical writing.
I can't actually get Tesseract to give me vertical writing in
general, but testing this with a file with a single column of
Japanese gives reasonable results.

At some point, we can look at using a font definition with
/WMode 1 to maybe simplify this.
330747f6 — Robin Watts 8 months ago
Update mupdf-gl to accept "chapter:page" as well as just "page".
97038c55 — Robin Watts 8 months ago
Bug 703392: Fix PDFDocument.insertPage to match C code.

Negative numbers are defined to work, as is INT_MAX. The
JNI code should not complain about them.
ea5f445b — Robin Watts 8 months ago
Bug 701640: Fix linearisation offset problem.

When linearising, we write objects out in one pass to establish offsets.
We then create the hint objects, and do another pass to properly
write the file out.

During the first pass, we were creating a hint object without a
stream. The stream was then created before the second pass was
performed. Accordingly, the overall length of that object changes.
The code allowed for the change in the size of the stream, but
was failing to allow for it changing from not having a stream at all
to having a stream. (i.e. it allowed for the stream changing size,
but did not allow for the addition of "stream\n" and "endstream\n".)

Due to the 'slack' inherent in writing objects with default large
numbers in the first pass, and those numbers being reduced to smaller
values in the second pass, we were getting away with this in most
cases. The addition of an ASCIIHexDecode filter was tipping us over
the edge.

The fix is simply to ensure we create the object with a stream to
start with.
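
To make the size discrepancy concrete, here is a rough sketch of the arithmetic. It is not MuPDF's writer: `object_length` and `stream_growth` are hypothetical helpers, and the layout (dictionary, newline, then the stream keywords exactly as quoted above) is a simplifying assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: bytes occupied by an object body made of a
 * dictionary and, optionally, a stream. Not MuPDF's actual writer. */
size_t object_length(const char *dict, int has_stream, size_t stream_len)
{
    size_t len = strlen(dict) + 1;      /* dictionary plus a newline */
    if (has_stream)
    {
        len += strlen("stream\n");      /* 7 bytes */
        len += stream_len;              /* the stream data itself */
        len += strlen("endstream\n");   /* 10 bytes */
    }
    return len;
}

/* Extra bytes that appear when a streamless object gains a stream. */
size_t stream_growth(const char *dict, size_t stream_len)
{
    size_t without = object_length(dict, 0, 0);
    size_t with = object_length(dict, 1, stream_len);
    assert(with >= without);
    return with - without;
}
```

Under these assumptions, even an empty stream adds 17 bytes for the two keywords alone, which is the part the first pass was not accounting for.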
e626a51b — Robin Watts 8 months ago
Bug 703312: Update coding-overview.html to strengthen locking requirements.
cee7cefc — Robin Watts 8 months ago
Bug 703366: Fix double free of object during linearization.

This appears to happen because we parse an illegal object from
a broken file and assign it to object 0, which is defined to
be free.

Here, we fix the parsing code so this can't happen.
fb9eb332 — Robin Watts 8 months ago
Tweak pdf_update_stream to work better with undo.

Set the new length in the object before we update the stream.
Setting the new length has the effect of moving/copying the
old object/old stream into the journal. Previously we were
setting the stream, THEN moving the old version which meant
the new stream was copied in to accompany the old object.
62107800 — Robin Watts 8 months ago
Fix mutool usage messages not to leak.

Mostly this is to avoid me running "mutool whatever" to get the
usage message and then having to wade through pages of Memento
leaked block information to actually find what I was looking for.
18698d15 — Robin Watts 8 months ago
mutool create: Add some exception handling.

Avoids leaks in failure cases.
3f800020 — Julian Smith 8 months ago
Bug 703388: document that fz_drop_context() should not be called inside fz_try/always/catch blocks.
16def897 — Robin Watts 9 months ago
Improve OCR handling of R2L text.

Previously, Tesseract was handing us the chars for a word in R2L order
and we were outputting them in L2R order, meaning that cut/paste would
reverse words.
2ef13ca6 — Robin Watts 9 months ago
Add JNI bindings for undo/redo.
89082bb5 — Robin Watts 9 months ago
Fix 'wonky bboxes' issue in OCRd text.

When we save a 'High Security Redaction' version of a document
(even without any redactions in) the resultant file often
gives 'wonky' selection boxes when viewed in Acrobat (or similar).

This is because the OCR routine will return different word bboxes
for different words, and baselines won't line up.

Accordingly, we now collect the recognised words up as lines before
flushing them to the stream, and use a line bbox rather than a
word bbox to base the positions on.

This code currently relies on text being horizontal. Vertical mode
text will probably break it. The existing code relies on both
L2R and horizontal text, so we are no worse off with this.

We can cope with R2L and vertical text when we have some examples.
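
The idea of basing positions on a line bbox rather than on per-word bboxes can be sketched like this. It is an illustration only: `rect` is a hypothetical stand-in for `fz_rect`, `line_bbox` is not a MuPDF function, and the sketch assumes horizontal text, as the commit message does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for fz_rect. */
typedef struct { float x0, y0, x1, y1; } rect;

/* Union of the bboxes of all words on one line. Every word on the line
 * can then share the line's vertical extent, so baselines line up. */
rect line_bbox(const rect *words, size_t n)
{
    rect line;
    size_t i;

    assert(words != NULL && n > 0);
    line = words[0];
    for (i = 1; i < n; i++)
    {
        if (words[i].x0 < line.x0) line.x0 = words[i].x0;
        if (words[i].y0 < line.y0) line.y0 = words[i].y0;
        if (words[i].x1 > line.x1) line.x1 = words[i].x1;
        if (words[i].y1 > line.y1) line.y1 = words[i].y1;
    }
    return line;
}
```

Three words whose vertical extents differ slightly then yield a single shared extent for the whole line, instead of three mismatched ones.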
ecb7223d — Robin Watts 9 months ago
Bug 701919: Add page number to man page for mupdf.

Thanks to Daniel Lublin.
19057a95 — Robin Watts 9 months ago
Bug 702898: Fix SVG output clipping issues.

When we define a pattern, we do so using the SVG symbol mechanism.
Some SVG renderers, by default, make a clipbox around the rendered
symbol. This is no good for contents of patterns that are offset
and should still appear correctly due to the repeats in the patterns.

Accordingly, we make every such symbol have the style="overflow:visible"
attribute.
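
As a small illustration of the attribute in question, the following sketch formats the opening tag of such a symbol. The function name and the `pattern%d` id scheme are invented for this example and are not taken from MuPDF's SVG output code.

```c
#include <assert.h>
#include <stdio.h>

/* Write the opening tag of an SVG <symbol> whose contents are not
 * clipped. The id format "pattern%d" is made up for this sketch.
 * Returns the number of characters written, or -1 if buf is too small. */
int format_symbol_open(char *buf, size_t size, int id)
{
    int n;

    assert(buf != NULL && size > 0);
    n = snprintf(buf, size,
        "<symbol id=\"pattern%d\" style=\"overflow:visible\">", id);
    return (n >= 0 && (size_t)n < size) ? n : -1;
}
```

Without the style attribute, a renderer that clips symbols by default would cut off pattern contents that are drawn outside the symbol's own box.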
84ea3f93 — Robin Watts 9 months ago
Bug 703171: Fix SEGV in fz_subsample_pixmapARM

The code for dealing with an image less high than the row reduction
factor was broken. Fixed here.