~hrbrmstr/tdigest

115597431d41351ed84a0652dee0f53145072438 — hrbrmstr 5 years ago 6b6b443
cran woes
M .Rbuildignore => .Rbuildignore +2 -2
@@ 3,8 3,7 @@
^.*\.Rproj$
^\.Rproj\.user$
^\.travis\.yml$
-^README\.*Rmd$
-^README\.*html$
+^README.*$
^CONDUCT\.md$
^NOTES\.*Rmd$
^NOTES\.*html$


@@ 17,3 16,4 @@
^\.gitlab-ci\.yml$
^appveyor\.yml$
^cran-comments\.md$
+^CRAN-RELEASE$

A CRAN-RELEASE => CRAN-RELEASE +2 -0
@@ 0,0 1,2 @@
+This package was submitted to CRAN on 2019-07-21.
+Once it is accepted, delete this file and tag the release (commit d9fc48424d).

M DESCRIPTION => DESCRIPTION +10 -9
@@ 12,15 12,15 @@ Authors@R: c(
           comment = "Original C+ code; <https://github.com/ajwerner/tdigest>")
  )
Maintainer: Bob Rudis <bob@rud.is>
-Description: The t-digest construction algorithm uses a variant of 1-dimensional 
-    k-means clustering to produce a very compact data structure that allows 
-    accurate estimation of quantiles. This t-digest data structure can be used 
+Description: The 't-digest' construction algorithm uses a variant of 1-dimensional 
+    'k-means' clustering to produce a very compact data structure that allows 
+    accurate estimation of quantiles. This 't-digest' data structure can be used 
    to estimate quantiles, compute other rank statistics or even to estimate 
-    related measures like trimmed means. The advantage of the t-digest over 
-    previous digests for this purpose is that the t-digest handles data with 
-    full floating point resolution. With small changes, the t-digest can handle 
+    related measures like trimmed means. The advantage of the 't-digest' over 
+    previous digests for this purpose is that the 't-digest' handles data with 
+    full floating point resolution. With small changes, the 't-digest' can handle 
    values from any ordered set for which we can compute something akin to a mean. 
-    The accuracy of quantile estimates produced by t-digests can be orders of 
+    The accuracy of quantile estimates produced by 't-digests' can be orders of 
    magnitude more accurate than those produced by previous digest algorithms.
URL: https://gitlab.com/hrbrmstr/tdigest
BugReports: https://gitlab.com/hrbrmstr/tdigest/issues


@@ 29,7 29,8 @@ Encoding: UTF-8
License: MIT + file LICENSE
Suggests:
    testthat,
-    covr
+    covr,
+    spelling
Depends:
    R (>= 3.5.0)
Imports: 


@@ 37,4 38,4 @@ Imports:
    stats
Roxygen: list(markdown = TRUE)
RoxygenNote: 6.1.1
-
+Language: en-US

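The DESCRIPTION text describes how a t-digest answers quantile queries from a compact set of weighted centroids. The C sketch below is a rough illustration only. It estimates a quantile from centroids sorted by mean: each centroid's weight is treated as centered on its mean, and the estimate is linearly interpolated between neighboring centroids. The `centroid` type and `estimate_quantile()` are hypothetical names invented here. This is a simplified interpolation in the spirit of the t-digest, not the package's `td_value_at()` implementation, and it leaves out centroid construction and merging entirely.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical centroid: a mean plus the number of points it summarizes. */
typedef struct {
    double mean;
    double count;
} centroid;

/* Estimate the value at quantile q (clamped to [0, 1]) from n centroids
 * sorted by ascending mean, all with positive counts. Each centroid's weight
 * is treated as centered on its mean; between adjacent centroid centers the
 * result is linearly interpolated. Returns NAN for an empty digest. */
double estimate_quantile(const centroid *c, size_t n, double q) {
    assert(n == 0 || c != NULL);
    if (n == 0) {
        return NAN;
    }
    if (q < 0.0) q = 0.0;
    if (q > 1.0) q = 1.0;

    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        total += c[i].count;
    }
    double target = q * total;  /* requested rank, in weight units */

    /* Cumulative weight at the center of the first centroid. */
    double prev_center = c[0].count / 2.0;
    if (target <= prev_center) {
        return c[0].mean;  /* below the first center: clamp to the first mean */
    }
    double cum = c[0].count;  /* total weight of centroids before index i */
    for (size_t i = 1; i < n; i++) {
        double center = cum + c[i].count / 2.0;
        if (target <= center) {
            double t = (target - prev_center) / (center - prev_center);
            return c[i - 1].mean + t * (c[i].mean - c[i - 1].mean);
        }
        prev_center = center;
        cum += c[i].count;
    }
    return c[n - 1].mean;  /* above the last center: clamp to the last mean */
}
```

Take four unit-weight centroids at 1, 2, 3 and 4. Their centers sit at cumulative weights 0.5, 1.5, 2.5 and 3.5. The median (target weight 2) therefore falls halfway between the second and third centers, and the sketch returns 2.5.
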
M NEWS.md => NEWS.md +1 -1
@@ 3,7 3,7 @@
* `length()` and `[` implemented for `tdigest` objects

0.2.0
-* Added input vaildity checks
+* Added input validity checks
* Added `quantile()` function S3 implementation for `tdigest` objects
* Added examples
* Added more tests

M R/create.R => R/create.R +1 -1
@@ 37,7 37,7 @@ tdigest <- function(vec, compression=100) {
  .Call("Rtdig", vec=vec, compression=compression)
}

-#' Calcuate sample quantiles from a t-digest
+#' Calculate sample quantiles from a t-digest
#'
#' @param td t-digest object
#' @param probs numeric vector of probabilities with values in range 0:1

M README.Rmd => README.Rmd +1 -1
@@ 117,5 117,5 @@ cloc::cloc_pkg_md()

## Code of Conduct

-Please note that this project is released with a [Contributor Code of Conduct](CONDUCT.md). 
+Please note that this project is released with a Contributor Code of Conduct.
By participating in this project you agree to abide by its terms.

M README.md => README.md +13 -13
@@ 39,7 39,7 @@ The following functions are implemented:
  - `td_quantile_of`: Return the quantile of the value
  - `td_total_count`: Total items contained in the t-digest
  - `td_value_at`: Return the value at the specified quantile
-  - `tquantile`: Calcuate sample quantiles from a t-digest
+  - `tquantile`: Calculate sample quantiles from a t-digest

## Installation



@@ 64,7 64,7 @@ library(tdigest)

# current version
packageVersion("tdigest")
-## [1] '0.2.0'
+## [1] '0.3.0'
```

### Basic (Low-level interface)


@@ 146,21 146,21 @@ microbenchmark::microbenchmark(
)
## Unit: microseconds
##        expr       min         lq        mean    median         uq       max neval cld
-##     tdigest     8.227     9.4895    21.66915    12.509    33.6245    69.111   100  a 
-##  r_quantile 53792.878 54695.9560 56684.11386 55361.924 57719.2745 99458.184   100   b
+##     tdigest     7.943     9.4015    20.94626    11.957    32.9395    48.487   100  a 
+##  r_quantile 52305.639 53309.4185 55386.25517 54038.227 56644.9055 94300.294   100   b
```

## tdigest Metrics

-| Lang         | \# Files | (%) | LoC |  (%) | Blank lines |  (%) | \# Lines |  (%) |
-| :----------- | -------: | --: | --: | ---: | ----------: | ---: | -------: | ---: |
-| C            |        3 | 0.3 | 347 | 0.66 |          45 | 0.36 |       26 | 0.11 |
-| R            |        5 | 0.5 | 136 | 0.26 |          31 | 0.25 |      135 | 0.58 |
-| Rmd          |        1 | 0.1 |  36 | 0.07 |          40 | 0.32 |       45 | 0.19 |
-| C/C++ Header |        1 | 0.1 |  10 | 0.02 |          10 | 0.08 |       28 | 0.12 |
+| Lang         | \# Files |  (%) | LoC |  (%) | Blank lines |  (%) | \# Lines |  (%) |
+| :----------- | -------: | ---: | --: | ---: | ----------: | ---: | -------: | ---: |
+| C            |        3 | 0.27 | 350 | 0.65 |          46 | 0.36 |       26 | 0.11 |
+| R            |        6 | 0.55 | 139 | 0.26 |          31 | 0.24 |      135 | 0.58 |
+| Rmd          |        1 | 0.09 |  36 | 0.07 |          40 | 0.31 |       45 | 0.19 |
+| C/C++ Header |        1 | 0.09 |  10 | 0.02 |          10 | 0.08 |       28 | 0.12 |

## Code of Conduct

-Please note that this project is released with a [Contributor Code of
-Conduct](CONDUCT.md). By participating in this project you agree to
-abide by its terms.
+Please note that this project is released with a Contributor Code of
+Conduct. By participating in this project you agree to abide by its
+terms.

M cran-comments.md => cran-comments.md +9 -1
@@ 11,9 11,17 @@

* This is a new release.

+- README warning fixed.
+- Zero sized array warning fixed
+
Hey CRAN team members! There is no rush 
to get this processed whatsoever so if 
there are package authors who have 
asked for any acceleration of their 
package evaluation please do not hesitate
-to put them ahead of this in the queue.
\ No newline at end of file
+to put them ahead of this in the queue.
+
+It's passed more than a few gauntlets
+(that bullet list at the top is legit) but
+if I missed anything I apologize in advance
+and will work to correct ASAP.
\ No newline at end of file

M man/tquantile.Rd => man/tquantile.Rd +2 -2
@@ 3,7 3,7 @@
\name{tquantile}
\alias{tquantile}
\alias{quantile.tdigest}
-\title{Calcuate sample quantiles from a t-digest}
+\title{Calculate sample quantiles from a t-digest}
\usage{
tquantile(td, probs)



@@ 22,7 22,7 @@ tquantile(td, probs)
a numeric vector
}
\description{
-Calcuate sample quantiles from a t-digest
+Calculate sample quantiles from a t-digest
}
\examples{
set.seed(1492)

M src/tdigest.c => src/tdigest.c +1 -1
@@ 30,7 30,7 @@ struct td_histogram {
     double merged_count;
     double unmerged_count;

-     node_t nodes[0];
+     node_t nodes[];
};

static bool is_very_small(double val) {
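
The `src/tdigest.c` change replaces the zero-length array `node_t nodes[0];`, a GNU C extension that ISO C forbids and that pedantic compilers warn about, with a standard C99 flexible array member, `node_t nodes[];`. This matches the "Zero sized array warning fixed" note in `cran-comments.md`. The sketch below shows how such a struct is typically allocated. The `histogram` type, its fields and `histogram_new()` are hypothetical simplifications for illustration, not the package's actual `td_histogram` code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified node type standing in for the package's node_t. */
typedef struct {
    double mean;
    double count;
} node_t;

/* Hypothetical, simplified header; the real struct td_histogram has more fields. */
typedef struct {
    int cap;             /* number of node slots allocated after the header */
    double merged_count; /* example bookkeeping field */
    node_t nodes[];      /* C99 flexible array member: must be the last member */
} histogram;

/* Allocate the header plus `cap` trailing nodes in a single block and
 * initialize every field explicitly. Returns NULL if allocation fails. */
static histogram *histogram_new(int cap) {
    assert(cap >= 0);
    histogram *h = malloc(sizeof(*h) + (size_t)cap * sizeof(node_t));
    if (h == NULL) {
        return NULL;
    }
    h->cap = cap;
    h->merged_count = 0.0;
    for (int i = 0; i < cap; i++) {
        h->nodes[i].mean = 0.0;
        h->nodes[i].count = 0.0;
    }
    return h;
}
```

`sizeof(histogram)` covers only the fixed members (plus any padding), so the trailing nodes must be sized explicitly in the `malloc` call. A single `free` then releases the header and the nodes together.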

A tests/spelling.R => tests/spelling.R +3 -0
@@ 0,0 1,3 @@
+if(requireNamespace('spelling', quietly = TRUE))
+  spelling::spell_check_test(vignettes = TRUE, error = FALSE,
+                             skip_on_cran = TRUE)