---
output: rmarkdown::github_document
editor_options: 
  chunk_output_type: console
---

# `metis`

Helpers for Accessing and Querying Amazon Athena

Including a lightweight RJDBC shim.

In Greek mythology, Metis was Athena's "helper".

## Description

Still fairly beta-quality, but getting there.

The goal is to work around enough of the "gotchas" that keep raw RJDBC Athena connections from "just working" with `dplyr` v0.6.0+, and to work around the [`fetchSize` problem](https://www.reddit.com/r/aws/comments/6aq22b/fetchsize_limit/) while still being able to use `dbGetQuery()`.

The `AthenaJDBC42_2.0.2.jar` JAR file is included for convenience, but it will likely move to a separate package as this gets closer to prime time, particularly if this goes on CRAN.

NOTE that the updated driver *REQUIRES JDK 1.8+*.
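
As a rough way to check this requirement from R, the sketch below asks the `java` binary on the `PATH` for its version and extracts the major version number. Both helper names (`java_major_version()` and `installed_java_version()`) are hypothetical and not part of `metis`, and the `java` found on the `PATH` may differ from the JVM that `rJava` is configured to use, so treat the result as a hint only.

```r
# Hypothetical helper (not part of metis): extract the major Java version
# from a version string such as "1.8.0_292" (Java 8) or "11.0.2" (Java 11).
java_major_version <- function(version_string) {
  nums <- as.integer(
    regmatches(version_string, gregexpr("[0-9]+", version_string))[[1]]
  )
  # Versions before Java 9 are reported as 1.x, so the major version is x
  if (nums[1] == 1L && length(nums) > 1) nums[2] else nums[1]
}

# Hypothetical helper: ask the `java` binary on the PATH for its version.
# Returns NA if no usable `java` is found.
installed_java_version <- function() {
  if (!nzchar(Sys.which("java"))) return(NA_character_)
  # `java -version` writes to stderr, so capture both streams
  out <- suppressWarnings(
    system2("java", "-version", stdout = TRUE, stderr = TRUE)
  )
  version_line <- grep('version "', out, value = TRUE, fixed = TRUE)
  if (length(version_line) == 0) return(NA_character_)
  sub('.*version "([^"]+)".*', "\\1", version_line[1])
}

found <- installed_java_version()
if (!is.na(found) && java_major_version(found) < 8L) {
  warning("Java ", found, " found on the PATH; the driver needs JDK 1.8+")
}
```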

See the **Usage** section for an example.

## IMPORTANT

Since R 3.5 (I don't remember this happening in R 3.4.x), signals sent when interrupting Athena JDBC calls crash the R interpreter. You need to set the `-Xrs` JVM option to avoid signals being passed on to the JVM owner. That has to be done _before_ `rJava` is loaded, so you either need to remember to put it at the top of every script _or_ put this in your local `~/.Rprofile` and/or sitewide `Rprofile`:

```r
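# Add -Xrs to the JVM options unless it is already present.
# This must run before rJava is loaded.
local({
  params <- getOption("java.parameters", default = character())
  if (!any(grepl("-Xrs", params, fixed = TRUE))) {
    # Keep each JVM option as a separate element of the character vector
    options(java.parameters = c(params, "-Xrs"))
  }
})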
```
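
To see what that snippet is meant to produce without touching global options, the same logic can be sketched as a pure function. `add_jvm_flag()` and `xrs_status()` are hypothetical names used for illustration only, and the sketch assumes JVM options are kept as separate elements of a character vector.

```r
# Hypothetical helper: return `params` with `flag` appended if it is missing
add_jvm_flag <- function(params, flag = "-Xrs") {
  params <- as.character(params)  # as.character(NULL) is character(0)
  if (any(grepl(flag, params, fixed = TRUE))) params else c(params, flag)
}

add_jvm_flag(NULL)                 # "-Xrs"
add_jvm_flag("-Xmx2g")             # "-Xmx2g" "-Xrs"
add_jvm_flag(c("-Xmx2g", "-Xrs"))  # unchanged: "-Xmx2g" "-Xrs"

# Hypothetical helper: report whether -Xrs is set in this session and
# whether rJava is already loaded (in which case setting it now is too late)
xrs_status <- function() {
  c(
    xrs_set = any(grepl("-Xrs", getOption("java.parameters", ""), fixed = TRUE)),
    rjava_loaded = isNamespaceLoaded("rJava")
  )
}

xrs_status()
```

If `xrs_status()` reports `rjava_loaded` as `TRUE` and `xrs_set` as `FALSE`, restart R and set the option before anything loads `rJava`.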

## What's Inside The Tin?

The following functions are implemented:

Easy-interface connection helper:

- `athena_connect`:	Make a JDBC connection to Athena

Custom JDBC Classes:

- `Athena`:	AthenaJDBC (make a new Athena con obj)
- `AthenaConnection-class`:	AthenaJDBC
- `AthenaDriver-class`:	AthenaJDBC
- `AthenaResult-class`:	AthenaJDBC

Custom JDBC Class Methods:

- `dbConnect-method`:	AthenaJDBC
- `dbExistsTable-method`:	AthenaJDBC
- `dbGetQuery-method`:	AthenaJDBC
- `dbListFields-method`:	AthenaJDBC
- `dbListTables-method`:	AthenaJDBC
- `dbReadTable-method`:	AthenaJDBC
- `dbSendQuery-method`:	AthenaJDBC

Pulled in from other `cloudyr` packages:

- `read_credentials`:	Use Credentials from .aws/credentials File
- `use_credentials`:	Use Credentials from .aws/credentials File

## Installation

```{r eval=FALSE}
devtools::install_github("hrbrmstr/metis")
```

```{r message=FALSE, warning=FALSE, error=FALSE, include=FALSE}
options(width=120)
```

## Usage

```{r message=FALSE, warning=FALSE, error=FALSE}
library(metis)
library(tidyverse)

# current version
packageVersion("metis")
```

```{r message=FALSE, warning=FALSE, error=FALSE}
use_credentials("default")

athena_connect(
  default_schema = "sampledb", 
  s3_staging_dir = "s3://accessible-bucket",
  log_path = "/tmp/athena.log",
  log_level = "DEBUG"
) -> ath

dbListTables(ath, schema="sampledb")

dbExistsTable(ath, "elb_logs", schema="sampledb")

dbListFields(ath, "elb_logs", "sampledb")

dbGetQuery(ath, "SELECT * FROM sampledb.elb_logs LIMIT 10") %>% 
  type_convert() %>% 
  glimpse()
```

## Code of Conduct

Please note that this project is released with a [Contributor Code of Conduct](CONDUCT.md). By participating in this project you agree to abide by its terms.