~ft/dav1d

3b14f949 — Sigrid Solveig Haflínudóttir 1 year, 7 months ago
add top mkfile
a99ae524 — Sigrid Solveig Haflínudóttir 1 year, 8 months ago
plan9: fix compilation on 386 and arm
51464beb — Sigrid Solveig Haflínudóttir 1 year, 8 months ago
mkfile: use C builtins when not building on amd64
4cc0cfd4 — Sigrid Solveig Haflínudóttir 1 year, 9 months ago
updated the port
2e3fd4d6 — Sigrid Haflínudóttir 1 year, 9 months ago
Merge remote-tracking branch 'upstream/master'
802790f1 — Martin Storsjö 1 year, 9 months ago
arm32: loopfilter: NEON implementation of loopfilter for 16 bpc

This operates on 4 pixels at a time, while the arm64 version
operated on 8 pixels at a time.

As the registers only fit one single 4 pixel wide slice (with one
single set of input parameters and mask bits), the high level
logic for calculating those input parameters is done with GPRs
and scalar instructions instead of SIMD as in the other implementations.
2a448fde — Martin Storsjö 1 year, 9 months ago
arm64: loopfilter16: Fix conditions for skipping parts of the filtering

As the arm64 16 bpc loopfilter operates on an 8 pixel region at a time,
inspect 2 bits (corresponding to 4 pixels each) from these registers,
as we also shift them down by 2 bits at the end of the loop.

This should allow skipping the loopfilter altogether (or using a
smaller filter) in more cases.
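
The mask handling described above can be illustrated in scalar C. This is
only a sketch of the idea, not the NEON assembly touched by the commit: the
function name and the single `vmask` argument are hypothetical, and it assumes
one mask bit per 4-pixel unit, so each 8-pixel iteration looks at the low
2 bits and then shifts the mask down by 2.

```c
#include <stdint.h>

/* Hypothetical helper: count the 8-pixel groups that need filtering.
 * Assumes bit i of vmask covers the i-th 4-pixel unit along the edge. */
static int count_filtered_groups(uint32_t vmask)
{
    int groups = 0;

    /* Each iteration covers 8 pixels, i.e. 2 mask bits. */
    for (; vmask != 0; vmask >>= 2) {
        if ((vmask & 3) == 0)
            continue;       /* neither 4-pixel half is flagged: skip */
        groups++;           /* at least one half needs the filter */
    }
    return groups;
}
```

Testing only the 2 bits that belong to the current iteration means a group is
skipped whenever both of its halves are clear, regardless of what higher bits
in the mask say about later groups.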
c1a5e445 — Martin Storsjö 1 year, 9 months ago
arm32: loopfilter: Fix a misindented/aligned operand
b252334a — Martin Storsjö 1 year, 9 months ago
arm: loopfilter: Compare L != 0 before doing a splat
78d27b7d — Henrik Gramner 1 year, 9 months ago
x86: Rewrite wiener SSE2/SSSE3/AVX2 asm

The previous implementation did two separate passes in the horizontal
and vertical directions, with the intermediate values being stored
in a buffer on the stack. This caused bad cache thrashing.

By interleaving the horizontal and vertical passes in combination
with a ring buffer for storing only a few rows at a time the
performance is improved by a significant amount.

Also split the function into 7-tap and 5-tap versions. The latter is
faster and fairly common (always for chroma, sometimes for luma).
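
The interleaving idea can be sketched in plain C. This is a simplified
illustration under stated assumptions, not the rewritten asm: it uses `int`
samples, edge replication, no rounding or clipping, and hypothetical names
(`filter_row_h`, `filter_2d_ring`, `TAPS`). Only `TAPS` horizontally filtered
rows are kept in a ring buffer, and each output row is produced as soon as the
rows it depends on are available.

```c
#include <stdlib.h>

#define TAPS 5            /* illustrative; the commit has 7- and 5-tap variants */
#define RADIUS (TAPS / 2)

static int clampi(int v, int lo, int hi)
{
    return v < lo ? lo : v > hi ? hi : v;
}

/* Horizontal pass over one source row, replicating edge pixels. */
static void filter_row_h(int *dst, const int *src, int w, const int fh[TAPS])
{
    for (int x = 0; x < w; x++) {
        int sum = 0;
        for (int k = 0; k < TAPS; k++)
            sum += fh[k] * src[clampi(x + k - RADIUS, 0, w - 1)];
        dst[x] = sum;
    }
}

/* Separable 2D filter with interleaved passes. Returns 0 on success. */
static int filter_2d_ring(int *dst, const int *src, int w, int h,
                          const int fh[TAPS], const int fv[TAPS])
{
    int *ring = malloc(sizeof(*ring) * TAPS * (size_t)w);
    if (!ring)
        return -1;

    int filled = -1;      /* last source row already filtered horizontally */
    for (int y = 0; y < h; y++) {
        /* Bring the ring buffer up to the lowest row this output needs. */
        const int last = clampi(y + RADIUS, 0, h - 1);
        while (filled < last) {
            filled++;
            filter_row_h(ring + (filled % TAPS) * w, src + filled * w, w, fh);
        }
        /* Vertical pass reads only rows that are still in the ring. */
        for (int x = 0; x < w; x++) {
            int sum = 0;
            for (int k = 0; k < TAPS; k++) {
                const int row = clampi(y + k - RADIUS, 0, h - 1);
                sum += fv[k] * ring[(row % TAPS) * w + x];
            }
            dst[y * w + x] = sum;
        }
    }
    free(ring);
    return 0;
}
```

Compared with a two-pass version, the intermediate storage shrinks from
`w * h` values to `w * TAPS`, which is what keeps the working set small
enough to stay in cache.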
3497c4c9 — Henrik Gramner 1 year, 9 months ago
x86: Rename looprestoration_ssse3.asm to looprestoration_sse.asm

It contains both SSE2 and SSSE3 code.
2737c05e — Henrik Gramner 1 year, 9 months ago
Add miscellaneous minor wiener optimizations

Combine horizontal and vertical filter pointers into a single parameter
when calling the wiener DSP function.

Eliminate the +128 filter coefficient handling where possible.
fdf1570e — Henrik Gramner 1 year, 9 months ago
Use smaller data types for wiener filter coefficients

Reduces memory usage by 96 bytes per sb.
6f7e5cb3 — Henrik Gramner 1 year, 9 months ago
Simplify msac subexp decoding
f0f73b4c — Henrik Gramner 1 year, 9 months ago
fuzzer: Test calling dav1d_picture_unref() after dav1d_close()

Covers the use case of keeping a reference to a Dav1dPicture
after closing the decoder.
135286f4 — Henrik Gramner 1 year, 9 months ago
Fix use of references to buffers after calling dav1d_close()

9057d286 had the side effect of causing references to buffers allocated
using memory pools to no longer be valid after closing the decoder.

Restore this functionality by making buffer pools reference counted.
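
A minimal sketch of what a reference-counted buffer pool might look like,
assuming the decoder holds one reference and every outstanding buffer holds
another. All names here (`Pool`, `Buf`, `pool_close`, `buf_release`) are
hypothetical and do not come from the dav1d source, and the sketch is not
thread-safe; a real implementation would need atomics or a lock.

```c
#include <stdlib.h>

typedef struct Pool Pool;

typedef struct Buf {
    Pool *pool;               /* owning pool, kept alive by this buffer */
    struct Buf *next;         /* free-list link */
    unsigned char data[64];
} Buf;

struct Pool {
    int ref_cnt;              /* 1 for the decoder + 1 per live buffer */
    int closed;               /* set once the decoder has been closed */
    Buf *free_list;
};

static Pool *pool_create(void)
{
    Pool *p = calloc(1, sizeof(*p));
    if (p)
        p->ref_cnt = 1;       /* the decoder's reference */
    return p;
}

/* Drop one reference; returns 1 if the pool itself was destroyed. */
static int pool_unref(Pool *p)
{
    if (--p->ref_cnt > 0)
        return 0;
    free(p);
    return 1;
}

static Buf *pool_alloc(Pool *p)
{
    Buf *b = p->free_list;
    if (b)
        p->free_list = b->next;
    else if (!(b = malloc(sizeof(*b))))
        return NULL;
    b->pool = p;
    b->next = NULL;
    p->ref_cnt++;             /* the buffer keeps the pool alive */
    return b;
}

/* Return a buffer; recycles it unless the decoder is already closed. */
static int buf_release(Buf *b)
{
    Pool *p = b->pool;
    if (p->closed) {
        free(b);
    } else {
        b->next = p->free_list;
        p->free_list = b;
    }
    return pool_unref(p);
}

/* Called when the decoder closes: free idle buffers, drop its reference. */
static int pool_close(Pool *p)
{
    p->closed = 1;
    while (p->free_list) {
        Buf *b = p->free_list;
        p->free_list = b->next;
        free(b);
    }
    return pool_unref(p);
}
```

With this layout, closing the decoder while a buffer is still referenced
leaves both the buffer and the pool valid; the pool is freed only when the
last outstanding buffer is released.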
e705519d — Martin Storsjö 1 year, 9 months ago
arm32: looprestoration: NEON implementation of SGR for 10 bpc

Checkasm numbers:           Cortex A7         A8       A53       A72       A73
selfguided_3x3_10bpc_neon:   919127.6   717942.8  565717.8  404748.0  372179.8
selfguided_5x5_10bpc_neon:   640310.8   511873.4  370653.3  273593.7  256403.2
selfguided_mix_10bpc_neon:  1533887.0  1252389.5  922111.1  659033.4  613410.6

Corresponding numbers for arm64, for comparison:

                                                Cortex A53       A72       A73
selfguided_3x3_10bpc_neon:                        500706.0  367199.2  345261.2
selfguided_5x5_10bpc_neon:                        361403.3  270550.0  249955.3
selfguided_mix_10bpc_neon:                        846172.4  623590.3  578404.8
e1be33b9 — Martin Storsjö 2 years ago
arm32: looprestoration: Prepare for 16 bpc by splitting code to separate files

looprestoration_common.S contains functions that can be used as is
with one single instantiation of the functions for both 8 and 16 bpc.
This file will be built once, regardless of which bitdepths are enabled.

looprestoration_tmpl.S contains functions where the source can be shared
and templated between 8 and 16 bpc. This will be included by the separate
8/16bpc implementation files.
c58e9d57 — Martin Storsjö 1 year, 9 months ago
arm: looprestoration16: Fix comments referring to pixels as bytes

A number of other similar comments were updated to say pixels when
the 16 bpc code was written originally, but these were missed.