================
lttb |pypi| |ci|
================

NumPy implementation of Steinarsson’s *Largest-Triangle-Three-Buckets* algorithm
for downsampling time series–like data
while retaining the overall shape and variability in the data.

LTTB is well suited to filtering time series data for visual representation,
since it reduces the number of *visually redundant* data points,
resulting in smaller file sizes and faster rendering of plots.

Note that it is not a technique for statistical aggregation,
unlike regression models or non-parametric curve fitting / smoothing.

This implementation is based on the original JavaScript code at
https://github.com/sveinn-steinarsson/flot-downsample
and Sveinn Steinarsson’s 2013 MSc thesis
*Downsampling Time Series for Visual Representation.*
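
For orientation, the idea behind the algorithm can be sketched in a few lines of NumPy.
This is a simplified illustration, not the code shipped in this package:
the function name ``lttb_sketch`` and the way the bucket edges are computed are assumptions made for the example.
The first and last points are always kept, the remaining points are split into ``n_out - 2`` buckets,
and from each bucket the point is chosen that forms the largest triangle
with the previously selected point and the mean of the next bucket.

.. code:: python

   import numpy as np

   def lttb_sketch(data, n_out):
       """Illustrative LTTB: reduce an (N, 2) array to n_out rows."""
       n = len(data)
       if n_out >= n:
           return data.copy()
       if n_out < 3:
           raise ValueError("n_out must be at least 3")

       # Bucket edges over the interior points (indices 1 .. n - 2).
       edges = np.linspace(1, n - 1, n_out - 1).astype(int)

       out = np.empty((n_out, 2))
       out[0] = data[0]    # first point is always kept
       out[-1] = data[-1]  # last point is always kept

       for i in range(n_out - 2):
           bucket = data[edges[i]:edges[i + 1]]
           # Third vertex: mean of the next bucket, or the last point.
           if i < n_out - 3:
               nxt = data[edges[i + 1]:edges[i + 2]].mean(axis=0)
           else:
               nxt = data[-1]
           a = out[i]  # previously selected point
           # Twice the triangle area for every candidate in the bucket.
           areas = np.abs(
               (a[0] - nxt[0]) * (bucket[:, 1] - a[1])
               - (a[0] - bucket[:, 0]) * (nxt[1] - a[1])
           )
           out[i + 1] = bucket[np.argmax(areas)]
       return out

   x = np.arange(100)
   data = np.column_stack([x, np.sin(x / 5.0)])
   small = lttb_sketch(data, 20)

Because exactly one point is taken from each bucket, in order, the x values of the result remain strictly increasing.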

Licence: MIT


Usage
=====

Install the ``lttb`` package into your (virtual) environment::

   $ pip install lttb


The function ``lttb.downsample()`` can then be used in your Python code:

.. code:: python

   import numpy as np
   import lttb

   # Generate an example data set of 100 random points:
   #  - column 0 represents time values (strictly increasing)
   #  - column 1 represents the metric of interest: CPU usage, stock price, etc.
   data = np.array([range(100), np.random.random(100)]).T

   # Downsample it to 20 points:
   small_data = lttb.downsample(data, n_out=20)
   assert small_data.shape == (20, 2)

A test data set is provided in the source repo in ``tests/timeseries.csv``.
It was downloaded from http://flot.base.is/ and converted from JSON to CSV.

This is what it looks like, downsampled to 100 points:

.. image:: https://git.sr.ht/~javiljoen/lttb-numpy/blob/master/tests/timeseries.png


Input validation
----------------

By default, ``downsample()`` checks that the input data satisfies the following constraints:

- it is a two-dimensional array of two columns;
- the values in the first column are strictly increasing; and
- there are no missing (NaN) values in the data.

These checks can be skipped (e.g. if you know that your data will always meet these conditions),
or additional checks can be added (e.g. that the time values must be evenly spaced),
by passing in a different list of validation functions:

.. code:: python

   # No input validation:
   small_data = lttb.downsample(data, n_out=20, validators=[])

   # Stricter check on x values (import only the validators that are used):
   from lttb.validators import has_two_columns, x_is_regular
   small_data = lttb.downsample(data, n_out=20, validators=[has_two_columns, x_is_regular])
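
A custom check could be written along the same lines.
The sketch below assumes that a validator is simply a callable that takes the input array
and raises ``ValueError`` when the data is unacceptable;
the name ``x_is_non_negative`` is hypothetical and not part of this package.

.. code:: python

   import numpy as np

   def x_is_non_negative(data):
       """Hypothetical validator: reject negative values in the first column."""
       if (data[:, 0] < 0).any():
           raise ValueError("first column contains negative values")

   # Passes silently for valid data:
   good = np.array([[0.0, 1.5], [1.0, 2.5], [2.0, 0.5]])
   x_is_non_negative(good)

   # Raises ValueError for invalid data:
   bad = np.array([[-1.0, 1.5], [0.0, 2.5], [1.0, 0.5]])
   try:
       x_is_non_negative(bad)
   except ValueError as err:
       message = str(err)

Under that assumption, such a function could be passed alongside the built-in checks,
e.g. ``validators=[has_two_columns, x_is_non_negative]``.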


History
=======

0.3.1 / 2020-10-14
------------------

- All modules and functions now have docstrings.
- [dev] The library is now also tested against Python 3.9.
  No changes to the code were required.
- [dev] The project is now hosted on SourceHut;
  the links on the PyPI page have been updated.
- [dev] CI testing has been migrated from Travis to builds.sr.ht.

0.3.0 / 2020-09-15
------------------

- Validation of input data is now configurable.
- New default: ``downsample()`` raises ``ValueError`` if input data contains NaN values.
  This can be disabled by removing ``contains_no_nans()`` from the list of validators.
- [dev] Imports are now sorted with isort.

0.2.2 / 2020-01-08
------------------

- ``setup.py`` was fixed so that this package can be installed in Python 2 again.

0.2.1 / 2019-11-25
------------------

- [dev] Versions are now managed with ``setuptools_scm`` rather than ``bumpversion``.
- [dev] The code is formatted with Black.

0.2.0 / 2018-02-11
------------------

- Performance improvements
- Released on PyPI (on 2019-11-06)

0.1.0 / 2017-03-18
------------------

- Initial implementation


Contributors
============

- JA Viljoen – original NumPy implementation
- Guillaume Bethouart – performance improvements
- Jens Krüger – fix for py27


.. |pypi| image:: https://img.shields.io/pypi/v/lttb?color=blue
   :target: https://pypi.org/project/lttb/

.. |ci| image:: https://builds.sr.ht/~javiljoen/lttb-numpy.svg
   :alt: builds.sr.ht status
   :target: https://builds.sr.ht/~javiljoen/lttb-numpy?