@@ 6,7 6,7 @@
Status](https://travis-ci.org/hrbrmstr/sergeant.svg?branch=master)](https://travis-ci.org/hrbrmstr/sergeant)
[](https://codecov.io/gh/hrbrmstr/sergeant)
-[](https://cran.r-project.org/package=sergeant)
+[](https://cran.r-project.org/package=sergeant)
# 💂 sergeant
@@ 14,26 14,12 @@ Tools to Transform and Query Data with ‘Apache’ ‘Drill’
## \*\* IMPORTANT \*\*
-Version 0.7.0 splits off the JDBC interface into a separate package
-`sergeant.caffeinated`
-([sr.ht](https://git.sr.ht/~hrbrmstr/sergeant);
+Version 0.7.0 (a.k.a. the main branch) splits off the JDBC interface
+into a separate package `sergeant.caffeinated`
([GitLab](https://gitlab.com/hrbrmstr/sergeant-caffeinated);
[GitHub](https://github.com/hrbrmstr/sergeant-caffeinated)).
-If you want to try all the new features coming in 0.8.0 please install from the 0.8.0 branch via:
-
-``` r
-# sr.ht
-devtools::install_git("https://git.sr.ht/~hrbrmstr/sergeant", ref="0.8.0")
-
-# GitLab
-devtools::install_git("https://gitlab.com/hrbrmstr/sergeant", ref="0.8.0")
-
-# GitHub
-devtools::install_git("https://github.com/hrbrmstr/sergeant", ref="0.8.0")
-```
-
-## Description
+## Description
Drill + `sergeant` is (IMO) a streamlined alternative to Spark +
`sparklyr` if you don’t need the ML components of Spark (i.e. just need
@@ 133,14 119,28 @@ function mappings.
# Installation
``` r
+install.packages("sergeant", repos = "https://cinc.rud.is")
+# or
+devtools::install_git("https://git.rud.is/hrbrmstr/sergeant.git")
+# or
+devtools::install_git("https://git.sr.ht/~hrbrmstr/sergeant")
+# or
+devtools::install_gitlab("hrbrmstr/sergeant")
+# or
devtools::install_github("hrbrmstr/sergeant")
```
## Usage
### `dplyr` interface
``` r
library(sergeant)
library(tidyverse)
@@ 198,30 198,32 @@ arrange(db, desc(employee_id)) %>% print(n = 20)
## # Source: table<cp.`employee.json`> [?? x 20]
## # Database: DrillConnection
## # Ordered by: desc(employee_id)
-## employee_id full_name first_name last_name position_id position_title store_id department_id birth_date hire_date
-## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
-## 1 999 Beverly … Beverly Dittmar 17 Store Permane… 8 17 1914-02-02 1998-01-…
-## 2 998 Elizabet… Elizabeth Jantzer 17 Store Permane… 8 17 1914-02-02 1998-01-…
-## 3 997 John Swe… John Sweet 17 Store Permane… 8 17 1914-02-02 1998-01-…
-## 4 996 William … William Murphy 17 Store Permane… 8 17 1914-02-02 1998-01-…
-## 5 995 Carol Li… Carol Lindsay 17 Store Permane… 8 17 1914-02-02 1998-01-…
-## 6 994 Richard … Richard Burke 17 Store Permane… 8 17 1914-02-02 1998-01-…
-## 7 993 Ethan Bu… Ethan Bunosky 17 Store Permane… 8 17 1914-02-02 1998-01-…
-## 8 992 Claudett… Claudette Cabrera 17 Store Permane… 8 17 1914-02-02 1998-01-…
-## 9 991 Maria Te… Maria Terry 17 Store Permane… 8 17 1914-02-02 1998-01-…
-## 10 990 Stacey C… Stacey Case 17 Store Permane… 8 17 1914-02-02 1998-01-…
-## 11 99 Elizabet… Elizabeth Horne 18 Store Tempora… 6 18 1976-10-05 1997-01-…
-## 12 989 Dominick… Dominick Nutter 17 Store Permane… 8 17 1914-02-02 1998-01-…
-## 13 988 Brian Wi… Brian Willeford 17 Store Permane… 8 17 1914-02-02 1998-01-…
-## 14 987 Margaret… Margaret Clendenen 17 Store Permane… 8 17 1914-02-02 1998-01-…
-## 15 986 Maeve Wa… Maeve Wall 17 Store Permane… 8 17 1914-02-02 1998-01-…
-## 16 985 Mildred … Mildred Morrow 16 Store Tempora… 8 16 1914-02-02 1998-01-…
-## 17 984 French W… French Wilson 16 Store Tempora… 8 16 1914-02-02 1998-01-…
-## 18 983 Elisabet… Elisabeth Duncan 16 Store Tempora… 8 16 1914-02-02 1998-01-…
-## 19 982 Linda An… Linda Anderson 16 Store Tempora… 8 16 1914-02-02 1998-01-…
-## 20 981 Selene W… Selene Watson 16 Store Tempora… 8 16 1914-02-02 1998-01-…
-## # … with more rows, and 6 more variables: salary <chr>, supervisor_id <chr>, education_level <chr>,
-## # marital_status <chr>, gender <chr>, management_role <chr>
+## employee_id full_name first_name last_name position_id position_title
+## <chr> <chr> <chr> <chr> <chr> <chr>
+## 1 999 Beverly … Beverly Dittmar 17 Store Permane…
+## 2 998 Elizabet… Elizabeth Jantzer 17 Store Permane…
+## 3 997 John Swe… John Sweet 17 Store Permane…
+## 4 996 William … William Murphy 17 Store Permane…
+## 5 995 Carol Li… Carol Lindsay 17 Store Permane…
+## 6 994 Richard … Richard Burke 17 Store Permane…
+## 7 993 Ethan Bu… Ethan Bunosky 17 Store Permane…
+## 8 992 Claudett… Claudette Cabrera 17 Store Permane…
+## 9 991 Maria Te… Maria Terry 17 Store Permane…
+## 10 990 Stacey C… Stacey Case 17 Store Permane…
+## 11 99 Elizabet… Elizabeth Horne 18 Store Tempora…
+## 12 989 Dominick… Dominick Nutter 17 Store Permane…
+## 13 988 Brian Wi… Brian Willeford 17 Store Permane…
+## 14 987 Margaret… Margaret Clendenen 17 Store Permane…
+## 15 986 Maeve Wa… Maeve Wall 17 Store Permane…
+## 16 985 Mildred … Mildred Morrow 16 Store Tempora…
+## 17 984 French W… French Wilson 16 Store Tempora…
+## 18 983 Elisabet… Elisabeth Duncan 16 Store Tempora…
+## 19 982 Linda An… Linda Anderson 16 Store Tempora…
+## 20 981 Selene W… Selene Watson 16 Store Tempora…
+## # … with more rows, and 10 more variables: store_id <chr>,
+## # department_id <chr>, birth_date <chr>, hire_date <chr>, salary <chr>,
+## # supervisor_id <chr>, education_level <chr>, marital_status <chr>,
+## # gender <chr>, management_role <chr>
mutate(db, position_title = tolower(position_title)) %>%
mutate(salary = as.numeric(salary)) %>%
@@ 244,7 246,7 @@ mutate(db, position_title = tolower(position_title)) %>%
## 9 6 4
## 10 36 2
## # … with 102 more rows
-```
+````
### REST API
@@ 258,57 260,60 @@ drill_version(dc)
## [1] "1.15.0"
drill_storage(dc)$name
-## [1] "cp" "dfs" "drilldat" "hbase" "hdfs" "hive" "kudu" "mongo" "my" "s3"
+## [1] "cp" "dfs" "drilldat" "hbase" "hdfs" "hive"
+## [7] "kudu" "mongo" "my" "s3"
drill_query(dc, "SELECT * FROM cp.`employee.json` limit 100")
## # A tibble: 100 x 16
-## employee_id full_name first_name last_name position_id position_title store_id department_id birth_date hire_date
-## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
-## 1 1 Sheri No… Sheri Nowmer 1 President 0 1 1961-08-26 1994-12-…
-## 2 2 Derrick … Derrick Whelply 2 VP Country Ma… 0 1 1915-07-03 1994-12-…
-## 3 4 Michael … Michael Spence 2 VP Country Ma… 0 1 1969-06-20 1998-01-…
-## 4 5 Maya Gut… Maya Gutierrez 2 VP Country Ma… 0 1 1951-05-10 1998-01-…
-## 5 6 Roberta … Roberta Damstra 3 VP Informatio… 0 2 1942-10-08 1994-12-…
-## 6 7 Rebecca … Rebecca Kanagaki 4 VP Human Reso… 0 3 1949-03-27 1994-12-…
-## 7 8 Kim Brun… Kim Brunner 11 Store Manager 9 11 1922-08-10 1998-01-…
-## 8 9 Brenda B… Brenda Blumberg 11 Store Manager 21 11 1979-06-23 1998-01-…
-## 9 10 Darren S… Darren Stanz 5 VP Finance 0 5 1949-08-26 1994-12-…
-## 10 11 Jonathan… Jonathan Murraiin 11 Store Manager 1 11 1967-06-20 1998-01-…
-## # … with 90 more rows, and 6 more variables: salary <chr>, supervisor_id <chr>, education_level <chr>,
-## # marital_status <chr>, gender <chr>, management_role <chr>
+## employee_id full_name first_name last_name position_id position_title
+## <chr> <chr> <chr> <chr> <chr> <chr>
+## 1 1 Sheri No… Sheri Nowmer 1 President
+## 2 2 Derrick … Derrick Whelply 2 VP Country Ma…
+## 3 4 Michael … Michael Spence 2 VP Country Ma…
+## 4 5 Maya Gut… Maya Gutierrez 2 VP Country Ma…
+## 5 6 Roberta … Roberta Damstra 3 VP Informatio…
+## 6 7 Rebecca … Rebecca Kanagaki 4 VP Human Reso…
+## 7 8 Kim Brun… Kim Brunner 11 Store Manager
+## 8 9 Brenda B… Brenda Blumberg 11 Store Manager
+## 9 10 Darren S… Darren Stanz 5 VP Finance
+## 10 11 Jonathan… Jonathan Murraiin 11 Store Manager
+## # … with 90 more rows, and 10 more variables: store_id <chr>,
+## # department_id <chr>, birth_date <chr>, hire_date <chr>, salary <chr>,
+## # supervisor_id <chr>, education_level <chr>, marital_status <chr>,
+## # gender <chr>, management_role <chr>
drill_query(dc, "SELECT COUNT(gender) AS gct FROM cp.`employee.json` GROUP BY gender")
drill_options(dc)
## # A tibble: 179 x 6
-## name value defaultValue accessibleScopes kind optionScope
-## <chr> <chr> <chr> <chr> <chr> <chr>
-## 1 debug.validate_iterators FALSE false ALL BOOLE… BOOT
-## 2 debug.validate_vectors FALSE false ALL BOOLE… BOOT
-## 3 drill.exec.functions.cast_empty_string_to_null FALSE false ALL BOOLE… BOOT
-## 4 drill.exec.hashagg.fallback.enabled FALSE false ALL BOOLE… BOOT
-## 5 drill.exec.hashjoin.fallback.enabled FALSE false ALL BOOLE… BOOT
-## 6 drill.exec.memory.operator.output_batch_size 16777216 16777216 SYSTEM LONG BOOT
-## 7 drill.exec.memory.operator.output_batch_size_avail_mem_fac… 0.1 0.1 SYSTEM DOUBLE BOOT
-## 8 drill.exec.storage.file.partition.column.label dir dir ALL STRING BOOT
-## 9 drill.exec.storage.implicit.filename.column.label filename filename ALL STRING BOOT
-## 10 drill.exec.storage.implicit.filepath.column.label filepath filepath ALL STRING BOOT
+## name value defaultValue accessibleScopes kind optionScope
+## <chr> <chr> <chr> <chr> <chr> <chr>
+## 1 debug.validate_i… FALSE false ALL BOOL… BOOT
+## 2 debug.validate_v… FALSE false ALL BOOL… BOOT
+## 3 drill.exec.funct… FALSE false ALL BOOL… BOOT
+## 4 drill.exec.hasha… FALSE false ALL BOOL… BOOT
+## 5 drill.exec.hashj… FALSE false ALL BOOL… BOOT
+## 6 drill.exec.memor… 16777… 16777216 SYSTEM LONG BOOT
+## 7 drill.exec.memor… 0.1 0.1 SYSTEM DOUB… BOOT
+## 8 drill.exec.stora… dir dir ALL STRI… BOOT
+## 9 drill.exec.stora… filen… filename ALL STRI… BOOT
+## 10 drill.exec.stora… filep… filepath ALL STRI… BOOT
## # … with 169 more rows
drill_options(dc, "json")
## # A tibble: 10 x 6
-## name value defaultValue accessibleScopes kind optionScope
-## <chr> <chr> <chr> <chr> <chr> <chr>
-## 1 store.hive.maprdb_json.optimize_scan_with_native_reader FALSE false ALL BOOLEAN BOOT
-## 2 store.json.all_text_mode TRUE false ALL BOOLEAN SYSTEM
-## 3 store.json.extended_types TRUE false ALL BOOLEAN SYSTEM
-## 4 store.json.read_numbers_as_double FALSE false ALL BOOLEAN BOOT
-## 5 store.json.reader.allow_nan_inf TRUE true ALL BOOLEAN BOOT
-## 6 store.json.reader.print_skipped_invalid_record_number TRUE false ALL BOOLEAN SYSTEM
-## 7 store.json.reader.skip_invalid_records TRUE false ALL BOOLEAN SYSTEM
-## 8 store.json.writer.allow_nan_inf TRUE true ALL BOOLEAN BOOT
-## 9 store.json.writer.skip_null_fields TRUE true ALL BOOLEAN BOOT
-## 10 store.json.writer.uglify TRUE false ALL BOOLEAN SYSTEM
+## name value defaultValue accessibleScopes kind optionScope
+## <chr> <chr> <chr> <chr> <chr> <chr>
+## 1 store.hive.maprdb… FALSE false ALL BOOL… BOOT
+## 2 store.json.all_te… TRUE false ALL BOOL… SYSTEM
+## 3 store.json.extend… TRUE false ALL BOOL… SYSTEM
+## 4 store.json.read_n… FALSE false ALL BOOL… BOOT
+## 5 store.json.reader… TRUE true ALL BOOL… BOOT
+## 6 store.json.reader… TRUE false ALL BOOL… SYSTEM
+## 7 store.json.reader… TRUE false ALL BOOL… SYSTEM
+## 8 store.json.writer… TRUE true ALL BOOL… BOOT
+## 9 store.json.writer… TRUE true ALL BOOL… BOOT
+## 10 store.json.writer… TRUE false ALL BOOL… SYSTEM
```
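Queries passed to `drill_query()` are plain SQL strings, so a missing space between an alias and `FROM` is enough to produce an invalid statement. The following is a minimal sketch, not part of `sergeant`, of building the grouped-count query with a small helper and sending it only when Drill is reachable. `count_by()` is a hypothetical name introduced here for illustration, and the sketch assumes a Drill instance on `localhost` at the default port.

``` r
library(sergeant)

# Hypothetical helper (NOT part of sergeant): assembles a grouped-count query
# with explicit spaces around every SQL keyword.
count_by <- function(column, table, alias = "cnt") {
  sprintf(
    "SELECT %s, COUNT(*) AS %s FROM %s GROUP BY %s",
    column, alias, table, column
  )
}

sql <- count_by("gender", "cp.`employee.json`", alias = "gct")

# Assumption: Drill is running locally on the default REST port.
dc <- drill_connection("localhost")

if (drill_active(dc)) {
  print(drill_query(dc, sql))
} else {
  message("Drill is not reachable; query not sent: ", sql)
}
```

Keeping the string construction in one place makes the query easy to inspect (for example with `cat(sql)`) before it is sent to the server.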
## Working with parquet files
@@ 375,7 380,7 @@ select columns[2] as city, columns[4] as lon, columns[3] as lat
| Lang | \# Files | (%) | LoC | (%) | Blank lines | (%) | \# Lines | (%) |
| :--- | -------: | ---: | ---: | ---: | ----------: | ---: | -------: | ---: |
| R | 18 | 0.95 | 1212 | 0.96 | 349 | 0.86 | 716 | 0.89 |
-| Rmd | 1 | 0.05 | 54 | 0.04 | 56 | 0.14 | 92 | 0.11 |
+| Rmd | 1 | 0.05 | 56 | 0.04 | 55 | 0.14 | 90 | 0.11 |
## Code of Conduct
@@ 19,7 19,7 @@ options(sergeant.bigint.warnonce = FALSE)
[](https://doi.org/10.5281/zenodo.1248912)
[](https://travis-ci.org/hrbrmstr/sergeant)
[](https://codecov.io/gh/hrbrmstr/sergeant)
-[](https://cran.r-project.org/package=sergeant)
+[](https://cran.r-project.org/package=sergeant)
# 💂 sergeant
@@ 29,21 29,7 @@ Tools to Transform and Query Data with 'Apache' 'Drill'
Version 0.7.0 (a.k.a. the main branch) splits off the JDBC interface into a separate package `sergeant.caffeinated` ([GitLab](https://gitlab.com/hrbrmstr/sergeant-caffeinated); [GitHub](https://github.com/hrbrmstr/sergeant-caffeinated)).
-If you want to try all the new features coming in 0.8.0 please install from the 0.8.0 branch via:
-
-```{r eval=FALSE}
-# sr.ht
-devtools::install_git("https://git.sr.ht/~hrbrmstr/sergeant", ref="0.8.0")
-
-# GitLab
-devtools::install_git("https://gitlab.com/hrbrmstr/sergeant", ref="0.8.0")
-
-# GitHub
-devtools::install_git("https://github.com/hrbrmstr/sergeant", ref="0.8.0")
-```
-
-
-## Description
+## Description
Drill + `sergeant` is (IMO) a streamlined alternative to Spark + `sparklyr` if you don't need the ML components of Spark (i.e. just need to query "big data" sources, need to interface with parquet, need to combine disparate data source types — json, csv, parquet, rdbms - for aggregation, etc). Drill also has support for spatial queries.
@@ 107,11 93,10 @@ Note that a number of Drill SQL functions have been mapped to R functions (e.g.
# Installation
-```{r eval=FALSE}
-devtools::install_github("hrbrmstr/sergeant")
-```
-
-```{r echo=FALSE, message=FALSE, warning=FALSE, error=FALSE}
+```{r einstall-ex, results='asis', echo = FALSE}
+hrbrpkghelpr::install_block()
+```
+```{r echo=FALSE, message=FALSE, warning=FALSE, error=FALSE}
options(width=120)
```