gpu,gpu/shaders: [compute] decode sRGB texels in shader when EXT_sRGB is missing

This change avoids the hard dependency on GPU support for sRGB encoded
textures in the compute renderer.

With this change and the previously added CPU fallback, Gio no longer
relies on any GPU functionality beyond the OpenGL ES 2.0 level.

Fixes gio#49
Fixes gio#154
Fixes gio#97
Fixes gio#36
Fixes gio#172

Signed-off-by: Elias Naur <mail@eliasnaur.com>
gpu/internal/opengl: use the linear colorspace when EXT_sRGB is missing

The SRGBFBO emulates a framebuffer in the sRGB colorspace. However, some
low-end devices may lack the EXT_sRGB support needed to store framebuffer
content in sRGB.

This change handles missing EXT_sRGB support by falling back to the linear RGB colorspace.
Falling back loses color precision but is better than failing.

Updates gio#49
Updates gio#154
Updates gio#97
Updates gio#36
Updates gio#172

Signed-off-by: Elias Naur <mail@eliasnaur.com>
gpu/internal/opengl: detect sRGB triple in terms of srgbaTripleFor

Refactor in preparation for relaxing sRGB format requirements.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
gpu/internal/driver: introduce and use FeatureSRGB

No functional changes; a follow-up will implement graceful fallback in
the compute renderer.
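
This sketch shows how a follow-up might consult the flag, assuming a
bitmask type. Only the FeatureSRGB name comes from this change. The
Features type, its Has method, FeatureOther and textureEncoding are all
hypothetical.

```go
package main

import "fmt"

// Features is a hypothetical bitmask of optional GPU capabilities.
type Features uint

const (
	FeatureOther Features = 1 << iota // hypothetical unrelated feature
	FeatureSRGB                       // GPU can store and sample sRGB textures
)

// Has reports whether all features in want are present in f.
func (f Features) Has(want Features) bool {
	return f&want == want
}

// textureEncoding picks how sRGB image data is handled, falling back to
// decoding in the shader when the driver lacks sRGB support.
func textureEncoding(f Features) string {
	if f.Has(FeatureSRGB) {
		return "native sRGBA texture"
	}
	return "linear RGBA texture, decode sRGB in shader"
}

func main() {
	fmt.Println(textureEncoding(FeatureSRGB))
	fmt.Println(textureEncoding(0))
}
```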

Signed-off-by: Elias Naur <mail@eliasnaur.com>
gpu/internal/driver: rename TextureFormatSRGB to TextureFormatSRGBA

The format implies an alpha channel; name it accordingly.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
.builds: upgrade to FreeBSD 13

Signed-off-by: Elias Naur <mail@eliasnaur.com>
widget,widget/material: remove Color field from Icon

Icons are meant to be shared among multiple widgets, but their Color
state may end up with unexpected values after use. Replace the state
with an explicit argument to Layout.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
gpu: [compute] add CPU fallback

This change adds a CPU fallback for devices that support neither the old
renderer nor GPU compute programs.

Most of the hard work is implemented in the gioui.org/cpu module. It
uses the SwiftShader project with light modifications to output
statically compiled CPU .o files for each compute program.

The CPU fallback only covers Linux and Android on the arm, arm64, and
amd64 architectures. There is no fundamental reason support can't be extended
to other platforms:

- macOS and iOS are probably easy, but it's likely that virtually every
  device has GPU support for compute shaders.
- Windows needs a Cgo-less port, or a build constraint to require a C
  compiler (Gio core doesn't).
- FreeBSD and OpenBSD are probably also easy to do because they're so
  similar to Linux.
- The 386 binaries didn't work properly in my tests, so fixes to
  SwiftShader are probably needed. However, I expect virtually every
  Intel device can run amd64 binaries.
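
The Go sketch below covers two things: the platform coverage listed
above, and one plausible renderer selection order consistent with this
description. The function names are hypothetical. In practice, the
platform restriction would more likely be expressed with build
constraints than with a runtime check.

```go
package main

import (
	"fmt"
	"runtime"
)

// cpuFallbackSupported reports whether the CPU fallback covers the given
// platform: Linux and Android on arm, arm64 and amd64.
func cpuFallbackSupported(goos, goarch string) bool {
	switch goos {
	case "linux", "android":
	default:
		return false
	}
	switch goarch {
	case "arm", "arm64", "amd64":
		return true
	}
	return false
}

// chooseRenderer sketches the selection order: the old renderer if the
// GPU supports it, otherwise GPU compute, otherwise the CPU fallback
// where the platform is covered.
func chooseRenderer(oldOK, computeOK bool, goos, goarch string) string {
	switch {
	case oldOK:
		return "old"
	case computeOK:
		return "compute (GPU)"
	case cpuFallbackSupported(goos, goarch):
		return "compute (CPU)"
	default:
		return "unsupported"
	}
}

func main() {
	fmt.Println(chooseRenderer(false, false, runtime.GOOS, runtime.GOARCH))
}
```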

Updates gio#49
Fixes gio#228

Signed-off-by: Elias Naur <mail@eliasnaur.com>
gpu,gpu/internal: generate hashes of compute programs

The CPU fallback for the compute renderer is contained in a separate
module for space reasons, but the CPU binaries must exactly match the
compute programs. However, there is no way to express that constraint
in go.mod.

This change generates hashes of every compute program so that a
following change can verify the CPU binaries match the programs.
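
A minimal sketch of the idea. It assumes SHA-256 over each program's
source, plus a map of the hashes the CPU binaries were built from. The
function names, the map layout and the choice of hash are assumptions.
This is not the generated code itself.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// programHash returns the hex-encoded SHA-256 of a compute program's source.
func programHash(src []byte) string {
	sum := sha256.Sum256(src)
	return hex.EncodeToString(sum[:])
}

// verifyCPUBinaries checks that every program has a CPU binary built from
// exactly the same source. binaryHashes records, per program name, the
// hash of the source each CPU binary was compiled from.
func verifyCPUBinaries(programs map[string][]byte, binaryHashes map[string]string) error {
	for name, src := range programs {
		want := programHash(src)
		got, ok := binaryHashes[name]
		if !ok {
			return fmt.Errorf("%s: missing CPU binary", name)
		}
		if got != want {
			return fmt.Errorf("%s: CPU binary built from %s, program is %s", name, got, want)
		}
	}
	return nil
}

func main() {
	progs := map[string][]byte{"example": []byte("void main() {}")}
	built := map[string]string{"example": programHash([]byte("void main() {}"))}
	fmt.Println(verifyCPUBinaries(progs, built))
}
```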

Signed-off-by: Elias Naur <mail@eliasnaur.com>
gpu: [compute] compute and store clipping path hashes during construction

Hashes of the clipping paths that affect drawing operations are computed
and used to quickly determine that two operations are not equal, the
most likely outcome of a comparison.

However, for paths that are constructed once and cached, computing the
hash at every frame is wasteful. This is especially true for text, which
is both cached and also among the largest paths in a frame.

This change moves the hashing to op/clip.Path construction time, and
stores the hash in the ops list so it won't be re-computed at every use.
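
A sketch of hashing once at construction time. It assumes a
hypothetical PathBuilder whose End method hashes the path data with
FNV-1a and stores the result alongside the data. Gio's actual encoding
and hash function may differ.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
	"math"
)

// Path is a finished path whose hash was computed once, at construction.
type Path struct {
	Data []float32
	Hash uint64
}

// PathBuilder is a hypothetical builder that accumulates path data.
type PathBuilder struct {
	data []float32
}

func (b *PathBuilder) LineTo(x, y float32) {
	b.data = append(b.data, x, y)
}

// End hashes the accumulated data and returns the path with its hash, so
// later comparisons never need to re-hash it. The builder is reset.
func (b *PathBuilder) End() Path {
	h := fnv.New64a()
	var buf [4]byte
	for _, v := range b.data {
		binary.LittleEndian.PutUint32(buf[:], math.Float32bits(v))
		h.Write(buf[:])
	}
	p := Path{Data: b.data, Hash: h.Sum64()}
	b.data = nil
	return p
}

// equalPaths rejects unequal paths cheaply via the stored hashes and only
// compares the data when the hashes match.
func equalPaths(a, b Path) bool {
	if a.Hash != b.Hash || len(a.Data) != len(b.Data) {
		return false
	}
	for i := range a.Data {
		if a.Data[i] != b.Data[i] {
			return false
		}
	}
	return true
}

func main() {
	var b PathBuilder
	b.LineTo(1, 2)
	p := b.End()
	fmt.Println(equalPaths(p, p))
}
```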

Signed-off-by: Elias Naur <mail@eliasnaur.com>
gpu: [compute] speed up path comparisons with op keys

To re-use previously cached layers, the compute renderer must know
whether two drawing operations are equal. When two operations are not
equal, a fast hash comparison will most likely detect it. However, when
two operations are equal and have complicated clipping paths, comparing
the path data is expensive.

This change adds support for fast ops.Key comparisons, where two paths
are equal if their ops.Key values are equal. This optimization kicks in
for text rendering, where glyph clipping shapes are re-used across
frames.
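
A sketch of the comparison order, using hypothetical stand-ins for
ops.Key and the path op. Equal keys short-circuit to "equal". Unequal
hashes short-circuit to "not equal". Only the remaining case compares
the path data. A zero key is treated as "no key", so that unkeyed ops
never match by accident.

```go
package main

import (
	"bytes"
	"fmt"
)

// opKey is a hypothetical stand-in for ops.Key: it identifies where an
// operation was recorded. Equal non-zero keys imply equal path data.
type opKey struct {
	list  int // identifies the recording ops list (illustrative)
	index int
}

type pathOp struct {
	key  opKey
	hash uint64
	data []byte
}

// samePath reports whether a and b describe the same path, trying the
// cheapest checks first.
func samePath(a, b pathOp) bool {
	if a.key != (opKey{}) && a.key == b.key {
		// Fast path: the same recorded op, such as a cached glyph shape.
		return true
	}
	if a.hash != b.hash {
		// Most unequal paths are rejected here.
		return false
	}
	// Slow path: equal hashes but no matching key; compare the data.
	return bytes.Equal(a.data, b.data)
}

func main() {
	k := opKey{list: 1, index: 3}
	fmt.Println(samePath(pathOp{key: k, hash: 1}, pathOp{key: k, hash: 1}))
}
```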

Signed-off-by: Elias Naur <mail@eliasnaur.com>
gpu: [compute] re-use layers that differ only in integer offsets

To re-use drawing operations common to two layers, every operation must
exactly match, including their transformations. However, layers that
differ only by an integer offset can be re-used because rendering does
not depend on the absolute integer offset. This is important in the very
common case of scrolling otherwise static UI content.

This change separates the integer offsets from drawing operations and
relaxes the layer cache to match layers that differ only in integer
offsets.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
gpu: [compute] add function for separating integer offsets from transforms

Refactor only; separateTransform is needed in the following change.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
gpu: [compute] cache and re-use drawing operations from the previous frame

The compute renderer is more expensive to run than the old renderer on
low-end GPUs, and even more so on CPUs. To ensure good performance
regardless of the end-user device, this change implements automatic
re-use of content rendered in the previous frame.

The basic idea is that every drawing operation (PaintOp), along with its
transform and clipping, can be hashed and efficiently looked up. A naïve
caching approach is then to rasterize every operation to separate
sections of several large texture atlases, turning a cache hit into a
very cheap texture copy.

However, for scenes with lots of overlapping operations, the resulting
texture memory from separating the operations would be much larger than
the memory for just the window framebuffer.

So instead of caching individual operations, this change caches layers,
which are sequences of drawing operations. It starts by putting all
operations into a single layer. Then, if the subsequent frame re-uses a
sub-sequence of that larger layer, the layer is split.

For example, consider a UI similar to the kitchen sample:

Hello, Gio


<Line Editor>

<Button> <Button> <Button>

<Progress bar>

<Checkbox> <Toggle>

In the first frame, all of the drawing operations comprising the UI will
be stored and cached in a single layer. In the second frame, the
progress bar will have moved and the renderer splits the UI into three
layers: layer A for everything up to (but not including) the progress
bar, layer B with just the progress bar, and layer C for the rest. Note
that nothing has been re-used yet. In the third frame, the progress bar
moves again, and this time layers A and C can be copied from the cache;
only the progress bar needs redrawing through the compute programs.
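
The split in the second frame can be sketched as a match on the common
prefix and suffix of per-operation hashes. This is a simplification
under stated assumptions. Each operation is reduced to a hypothetical
uint64 hash of the op, its transform and its clip. Only a single cached
layer is compared, whereas the renderer goes on to track several layers.

```go
package main

import "fmt"

// layer is a hypothetical run of drawing operations, each identified by a
// hash covering the operation, its transform and its clip.
type layer []uint64

// splitLayer compares a cached layer with the current frame's operations
// and splits the frame into a re-usable prefix, a changed middle and a
// re-usable suffix. The prefix and suffix never overlap.
func splitLayer(cached, frame layer) (prefix, middle, suffix layer) {
	n := 0
	for n < len(cached) && n < len(frame) && cached[n] == frame[n] {
		n++
	}
	m := 0
	for m < len(cached)-n && m < len(frame)-n &&
		cached[len(cached)-1-m] == frame[len(frame)-1-m] {
		m++
	}
	return frame[:n], frame[n : len(frame)-m], frame[len(frame)-m:]
}

func main() {
	// Operation 3 (the progress bar) changed to 9 between frames.
	a, b, c := splitLayer(layer{1, 2, 3, 4, 5}, layer{1, 2, 9, 4, 5})
	fmt.Println(a, b, c) // [1 2] [9] [4 5]
}
```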

Signed-off-by: Elias Naur <mail@eliasnaur.com>
gpu: [compute] clear viewport through glClear, not through compute

The performance difference is negligible, but the change becomes useful
when the compute pipeline can skip rendering to empty tiles.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
gpu: [compute] unify resource cleanup

Rename all resource release methods to "Release", and release all
resources by looping over a slice.
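
A minimal sketch of the pattern. A hypothetical releaser interface and
buffer type stand in for the renderer's resources.

```go
package main

import "fmt"

// releaser is implemented by every resource held by the renderer.
type releaser interface {
	Release()
}

// buffer is a hypothetical resource that records whether it was released.
type buffer struct {
	name     string
	released bool
}

func (b *buffer) Release() { b.released = true }

// releaseAll releases every resource in order, skipping nil interface
// values for resources that were never created.
func releaseAll(resources []releaser) {
	for _, r := range resources {
		if r != nil {
			r.Release()
		}
	}
}

func main() {
	a, b := &buffer{name: "a"}, &buffer{name: "b"}
	releaseAll([]releaser{a, nil, b})
	fmt.Println(a.released, b.released) // true true
}
```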

Signed-off-by: Elias Naur <mail@eliasnaur.com>
gpu: [compute] add compute renderer specific decoding of ops

Until now, the two renderers have shared structures and code for
decoding drawing ops and converting them to GPU-friendly structures.

However, the decoder is tailored to the old renderer and uses
structures that fit the new compute renderer poorly.

This change copies the decoder and specializes the copy for the compute
renderer, avoiding a round-trip through the old renderer decoder.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
gpu/internal,internal/gl: add support for strided texture uploads

The CPU fallback of the compute renderer needs to upload subtextures
from a larger image.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
gpu/internal/d3d11: stub BlitFramebuffer for D3D11

The compute renderer doesn't run on Windows yet, but the d3d11 backend needs
the method to satisfy the driver interface.

Signed-off-by: Elias Naur <mail@eliasnaur.com>
go.*: update dependencies, bump go.mod Go version to 1.16

Signed-off-by: Elias Naur <mail@eliasnaur.com>