bbkh: replace `big` with `little`, major bugfix...
... some cleanup.
index: replace lsm-db with plyvel...
... except for pypi.okvslite part because I am lazy.
pool_for_each_par_map: keep memory footprint under control.
typofix: try leveldb via plyvel
typofix: do not use a lock to serialize...
... instead use the progress callback. That will avoid the
multi-process lock, and the open, start transaction, commit,
close...
typofix: try to use multiple processes to create the index.
typofix: try bigram + trigram...
... way too slow.
typofix - varia
- do not hash digits (trial), and punctuation found in package names,
- rename magic numbers, and first try at documentation,
- try to fix merkletree traversal: left-right instead of right-left,
- remove spacy, and replace unknown chars with space,
- fix strinc: bytes[n] returns an integer, bytes[n:] returns bytes,
- improve forward and backward search: score at most 10 times the
limit of candidates, keep those with a fuzzywuzzy ratio above 65,
- use bbk.search inside sudopython-typofix.py.
Note: chars with string.digits give better results for an unknown
reason.
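
Sketch for the strinc item above. It assumes strinc follows the usual
ordered key-value store convention (the same idea as strinc in the
FoundationDB Python binding): return the first key that sorts after
every key starting with the given prefix. The body is an illustration
of the int-versus-bytes pitfall, not the code from this commit.

```python
def strinc(key: bytes) -> bytes:
    """Return the smallest byte string that sorts after every string prefixed by key."""
    # Trailing 0xFF bytes cannot be incremented, drop them first.
    key = key.rstrip(b"\xff")
    if not key:
        raise ValueError("key must contain at least one byte not equal to 0xFF")
    # key[-1] is an int, so the incremented value is wrapped in bytes([...]);
    # key[:-1] is a slice, so it is already bytes and can be concatenated.
    return key[:-1] + bytes([key[-1] + 1])


# Indexing yields an int, slicing yields bytes.
first_as_int = b"abc"[0]
first_as_bytes = b"abc"[0:1]
```

Mixing the two forms, for example key[:-1] + key[-1], raises a
TypeError because bytes and int cannot be concatenated.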
varia
- simplify regex because strings are lower case,
- remove commented code,
- use summary and description,
- only store documents that have at least one stem.
ZERO is easier on the mind than -infinity
poetry export -f requirements.txt --output requirements.txt --dev
LICENSE: remove it, it must be GPLv3.
combinatorix with query parser.