ref: e27ceb2b0e64b9a56ba79d844ea96553d87dc113 mupdf/thirdparty/tesseract.txt -rw-r--r-- 909 bytes
e27ceb2b — Robin Watts OSS-Fuzz 29728: Avoid buffer overflow. 1 year, 5 months ago
                                                                                
To build with Tesseract functionality, you will need both the
tesseract and leptonica repositories checked out in this directory
(thirdparty).

Tesseract can be fetched by:

  git clone https://git.ghostscript.com/thirdparty-tesseract.git tesseract
  cd tesseract
  git checkout artifex
  cd ..

Leptonica can be fetched by:

  git clone https://git.ghostscript.com/thirdparty-leptonica.git leptonica
  cd leptonica
  git checkout artifex
  cd ..
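
The two fetch sequences above differ only in the repository name, so
they can be wrapped in one helper. The following is a minimal sketch,
not part of MuPDF: fetch_thirdparty and the DRY_RUN variable are
hypothetical names introduced here, and the sketch assumes a git new
enough to support the -C option.

```shell
#!/bin/sh
# Sketch: fetch one third-party repository and switch to its artifex
# branch. fetch_thirdparty and DRY_RUN are hypothetical names used for
# illustration only; they are not part of MuPDF.
fetch_thirdparty() {
    name=$1
    url="https://git.ghostscript.com/thirdparty-$name.git"

    # With DRY_RUN=1, print the commands instead of running them.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "git clone $url $name"
        echo "git -C $name checkout artifex"
        return 0
    fi

    # Clone only if the directory is not already present.
    if [ ! -d "$name" ]; then
        git clone "$url" "$name" || return 1
    fi

    # Equivalent to: cd $name; git checkout artifex; cd ..
    git -C "$name" checkout artifex
}
```

With the function defined, both repositories could be fetched from
this directory by:

  fetch_thirdparty tesseract
  fetch_thirdparty leptonica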

You will also need suitable traineddata files for the languages you
wish to recognise. Only the LSTM engine (the latest and most accurate
engine) is built into Tesseract, so the traineddata contained within
the tesseract repository itself cannot be used.

Suitable data can be retrieved from either:

  https://github.com/tesseract-ocr/tessdata_best

or

  https://github.com/tesseract-ocr/tessdata_fast

e.g.

  wget https://github.com/tesseract-ocr/tessdata_fast/raw/master/eng.traineddata
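
If several languages are needed, the wget line above can be repeated
per language. The following is a minimal sketch under stated
assumptions: traineddata_url and fetch_traineddata are hypothetical
helper names, and the sketch assumes that tessdata_best and the other
language files follow the same URL layout as the eng.traineddata
example from tessdata_fast.

```shell
#!/bin/sh
# Sketch: build the download URL for one language's traineddata file.
# traineddata_url and fetch_traineddata are hypothetical helper names.
# Assumption: every language file follows the URL layout of the
# eng.traineddata example.
traineddata_url() {
    # $1: data set, "fast" or "best"; $2: language code, e.g. "eng"
    echo "https://github.com/tesseract-ocr/tessdata_$1/raw/master/$2.traineddata"
}

# Download the traineddata for each listed language.
# $1: target directory; $2: data set; remaining arguments: languages.
fetch_traineddata() {
    dir=$1
    kind=$2
    shift 2
    mkdir -p "$dir" || return 1
    for lang in "$@"; do
        # -P tells wget which directory to save the file into.
        wget -P "$dir" "$(traineddata_url "$kind" "$lang")" || return 1
    done
}
```

For example, English and German data from tessdata_fast could be
placed in a directory named tessdata (the directory name is only an
illustration) by:

  fetch_traineddata tessdata fast eng deu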