~hrbrmstr/sergeant

2ed19e008d9e502eae63e16e37b073c5c9d908be — Bob Rudis 3 years ago 23b9db7
pre-CRAN flight check on Travis
5 files changed, 42 insertions(+), 52 deletions(-)

M .Rbuildignore
M DESCRIPTION
D DESCRIPTION.txt
M README.Rmd
A cran-comments.md
M .Rbuildignore => .Rbuildignore +1 -0
@@ 10,3 10,4 @@
^codecov\.yml$
^apache-drill-1\.10\.0\.tar\.gz$
^cdh4-repository_1\.0_all\.deb$
+^cran-comments\.md$
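The added `.Rbuildignore` line keeps `cran-comments.md` out of the built package. `R CMD build` treats each line as a Perl-like regular expression and matches it case-insensitively against file and directory paths relative to the package root. Below is a minimal Python sketch of that rule. It uses Python's `re` as a stand-in for R's regex engine, which is an assumption, though the two agree on simple anchored patterns like these:

```python
import re

# Patterns copied from the .Rbuildignore hunk (a subset, for illustration).
PATTERNS = [
    r"^codecov\.yml$",
    r"^apache-drill-1\.10\.0\.tar\.gz$",
    r"^cdh4-repository_1\.0_all\.deb$",
    r"^cran-comments\.md$",
]


def is_build_ignored(path: str, patterns=PATTERNS) -> bool:
    """Return True if any pattern matches the package-relative path.

    Sketch of R CMD build's behaviour (assumed equivalent here):
    a case-insensitive regex search, where the ^...$ anchors force
    each pattern to match the whole path.
    """
    return any(re.search(p, path, flags=re.IGNORECASE) for p in patterns)


for path in ["cran-comments.md", "DESCRIPTION", "docs/cran-comments.md"]:
    print(path, is_build_ignored(path))
```

The `^...$` anchors matter. Without them, `cran-comments\.md` would also exclude a file such as `docs/cran-comments.md`.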

M DESCRIPTION => DESCRIPTION +15 -14
@@ 5,30 5,31 @@ Authors@R: c(person("Bob", "Rudis", email = "bob@rud.is", role = c("aut", "cre")
             person("Edward", "Visel", email = "edward.visel@gmail.com", role = "ctb"))
Description: 'Apache Drill' is a low-latency distributed query engine designed to enable 
    data exploration and analytics on both relational and non-relational datastores, 
-    scaling to petabytes of data. Methods are provided that enable working with 'Apache Drill'
-    instances via the 'REST API', the 'JDBC' interface, 'DBI' 'methods' and 'dplyr'/'dbplyr'.
+    scaling to petabytes of data. Methods are provided that enable working with 'Apache'
+    'Drill' instances via the 'REST' 'API', 'JDBC' interface (optional), 'DBI' 'methods'
+    and using 'dplyr'/'dbplyr' idioms.
Depends:
    R (>= 3.1.2),
-    DBI,
+    DBI (>= 0.7),
    dplyr (>= 0.7.0),
-    dbplyr
+    dbplyr (>= 1.1.0)
URL: https://github.com/hrbrmstr/sergeant
BugReports: https://github.com/hrbrmstr/sergeant/issues
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
Imports:
-    httr,
-    jsonlite,
-    htmltools,
-    readr,
-    purrr,
+    httr (>= 1.2.1),
+    jsonlite (>= 1.5.0),
+    htmltools (>= 0.3.6),
+    readr (>= 1.1.1),
+    purrr (>= 0.2.2),
+    scales (>= 0.4.1),
    utils,
-    scales,
    methods
Suggests:
-    RJDBC,
-    rJava,
-    testthat,
-    covr
+    RJDBC (>= 0.2-5),
+    rJava (>= 0.9-8),
+    testthat (>= 1.0.2),
+    covr (>= 3.0.0)
RoxygenNote: 6.0.1
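Most of this `DESCRIPTION` change pins minimum versions. Each entry in the comma-separated `Depends`, `Imports` and `Suggests` fields becomes `name (op version)` instead of a bare package name. Below is a minimal Python sketch of splitting such a field into `(package, operator, version)` tuples. `parse_dependency_field` is a hypothetical, simplified helper and not R's own parser. It assumes the common comparison operators and dotted or dashed version strings such as `0.2-5`:

```python
import re

# Hypothetical pattern: "pkg" or "pkg (op version)", surrounding whitespace allowed.
_ENTRY = re.compile(
    r"^\s*([A-Za-z][A-Za-z0-9.]*)\s*"                          # package name
    r"(?:\(\s*(>=|<=|==|>|<)\s*([0-9][0-9.\-]*)\s*\))?\s*$"    # optional constraint
)


def parse_dependency_field(field: str):
    """Split a DESCRIPTION dependency field into (name, op, version) tuples.

    op and version are None when an entry carries no version constraint.
    """
    deps = []
    for entry in field.split(","):
        if not entry.strip():
            continue  # tolerate a trailing comma or blank entry
        m = _ENTRY.match(entry)
        if m is None:
            raise ValueError(f"unrecognized dependency entry: {entry!r}")
        deps.append(m.groups())
    return deps


suggests = """
    RJDBC (>= 0.2-5),
    rJava (>= 0.9-8),
    testthat (>= 1.0.2),
    covr (>= 3.0.0)"""
print(parse_dependency_field(suggests))
```

A bare entry such as the old `dbplyr` comes back as `("dbplyr", None, None)`. An entry without parentheses around its constraint is rejected rather than guessed at.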

D DESCRIPTION.txt => DESCRIPTION.txt +0 -35
@@ 1,35 0,0 @@
Package: sergeant
Title: Tools to Transform and Query Data with 'Apache' 'Drill'
Version: 0.5.0
Authors@R: c(person("Bob", "Rudis", email = "bob@rud.is", role = c("aut", "cre")),
             person("Edward", "Visel", email = "edward.visel@gmail.com", role = "ctb"))
Description: 'Apache Drill' is a low-latency distributed query engine designed to enable 
    data exploration and analytics on both relational and non-relational datastores, 
    scaling to petabytes of data. Methods are provided that enable working with 'Apache Drill'
    instances via the 'REST API', the JDBC interface, 'DBI' 'methods' and 'dplyr'/'dbplyr'.
Depends:
    R (>= 3.0.0),
    DBI,
    dplyr (>= 0.7.0),
    dbplyr
URL: http://github.com/hrbrmstr/sergeant
BugReports: https://github.com/hrbrmstr/sergeant/issues
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
Imports:
    httr,
    jsonlite,
    htmltools,
    readr,
    purrr,
    utils,
    scales,
    methods
Suggests:
    RJDBC,
    rJava,
    testthat
RoxygenNote: 6.0.1
Remotes:
    tidyverse/dbplyr

M README.Rmd => README.Rmd +3 -3
@@ 20,7 20,7 @@ knitr::opts_chunk$set(

Drill + `sergeant` is (IMO) a nice alternative to Spark + `sparklyr` if you don't need the ML components of Spark (i.e. just need to query "big data" sources, need to interface with parquet, need to combine disparate data source types — json, csv, parquet, rdbms - for aggregation, etc). Drill also has support for spatial queries.

-I find writing SQL queries to parquet files with Drill on a local linux or macOS workstation to be more performant than doing the data ingestion work with R (for large or disperate data sets). I also work with many tiny JSON files on a daily basis and Drill makes it much easier to do so. YMMV.
+I find writing SQL queries to parquet files with Drill on a local linux or macOS workstation to be more performant than doing the data ingestion work with R (especially for large or disperate data sets). I also work with many tiny JSON files on a daily basis and Drill makes it much easier to do so. YMMV.

You can download Drill from <https://drill.apache.org/download/> (use "Direct File Download"). I use `/usr/local/drill` as the install directory. `drill-embedded` is a super-easy way to get started playing with Drill on a single workstation and most of my workflows can get by using Drill this way. If there is sufficient desire for an automated downloader and a way to start the `drill-embedded` server from within R, please file an issue.



@@ 36,8 36,8 @@ The following functions are implemented:

**`DBI`**

-- As complete of an R `DBI` driver has been implemented using the Drill REST API, mostly to facilitate the `dplyr` interface. Use the `RJDBC` driver interface if you need more `DBI` functionality.
-- This also means that SQL functions unique to Drill have also been "implemented" (i.e. made accessible to the `dplyr` interface). If you have custom Drill SQL functions that need to be implemented please file an issue on GitHub.
+- A "just enough" feature complete R `DBI` driver has been implemented using the Drill REST API, mostly to facilitate the `dplyr` interface. Use the `RJDBC` driver interface if you need more `DBI` functionality.
+- This also means that SQL functions unique to Drill have also been "implemented" (i.e. made accessible to the `dplyr` interface). If you have custom Drill SQL functions that need to be implemented please file an issue on GitHub. Many should work without it, but some may require a custom interface.
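The `DBI` driver described in these bullets talks to Drill over its REST API. There, a SQL statement is POSTed as JSON (a `queryType` plus the `query` text) to the `/query.json` endpoint. Below is a minimal Python sketch of building that request with only the standard library. The host, Drill's default web port `8047`, and the helper name `build_drill_query` are assumptions for illustration. The request is only constructed here, not sent:

```python
import json
import urllib.request


def build_drill_query(sql: str, host: str = "localhost", port: int = 8047) -> urllib.request.Request:
    """Build (but do not send) a POST request for Drill's /query.json endpoint.

    Hypothetical helper; host and port default to an assumed local
    drill-embedded instance.
    """
    payload = {"queryType": "SQL", "query": sql}
    return urllib.request.Request(
        url=f"http://{host}:{port}/query.json",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# cp.`employee.json` is the sample data set bundled on Drill's classpath.
req = build_drill_query("SELECT * FROM cp.`employee.json` LIMIT 5")
print(req.get_method(), req.full_url)

# Against a running Drill instance, sending it would look like:
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)  # response JSON includes "columns" and "rows"
```

A REST-only driver like this explains the "just enough" wording. Every `DBI` call ultimately has to be expressed as one of these HTTP round trips, which is why the bullets point to `RJDBC` for fuller `DBI` coverage.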

**`RJDBC`**


A cran-comments.md => cran-comments.md +23 -0
@@ 0,0 1,23 @@
## Test environments
* local OS X install, R 3.4.1
* ubuntu 12.04 (on travis-ci), R 3.4.1
* win-builder (devel and release)

## R CMD check results

0 errors | 0 warnings | 1 note

* This is a new release.

## Reverse dependencies

This is a new release, so there are no reverse dependencies.

---

* I have run R CMD check on the NUMBER downstream dependencies.
  (Summary at ...). 
  
* FAILURE SUMMARY

* All revdep maintainers were notified of the release on RELEASE DATE.