~hrbrmstr/urlscan

ref: af7b473a760184beb3065541beec2c18408a1101 urlscan/README.Rmd -rw-r--r-- 1.5 KiB
af7b473a boB Rudis added submit function 2 years ago
                                                                                
---
output: rmarkdown::github_document
---

# urlscan

Analyze Websites and Resources They Request

## Description

WIP

The <urlscan.io> service provides an 'API' for analyzing websites and the
resources they request. Much like the 'Inspector' of your browser,
<urlscan.io> lets you look at the individual resources that are requested
when a site is loaded. This package provides tools to search public
<urlscan.io> scan submissions/results and to submit URLs for scanning.

## What's Inside The Tin

The following functions are implemented:

- `urlscan_search`: Perform a urlscan.io query
- `urlscan_result`: Retrieve detailed results for a given scan ID
- `urlscan_submit`: Submit a URL for scanning
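
For orientation, the kind of request a search function like `urlscan_search` presumably issues can be sketched in base R. The endpoint is the public urlscan.io search API; the helper name `build_search_url` is a hypothetical illustration and is not part of this package, whose internals may differ.

```r
# Hypothetical helper (NOT part of urlscan): build a urlscan.io search URL
# for a query string such as "domain:r-project.org".
build_search_url <- function(query) {
  stopifnot(is.character(query), length(query) == 1, nzchar(query))
  base <- "https://urlscan.io/api/v1/search/"
  # reserved = TRUE also percent-encodes reserved characters such as ':'
  q <- utils::URLencode(query, reserved = TRUE)
  paste0(base, "?q=", q)
}

build_search_url("domain:r-project.org")
```

The resulting URL could then be fetched with any HTTP client and parsed as JSON; that step is left out here so the sketch runs without network access.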

## Installation

```{r eval=FALSE}
devtools::install_github("hrbrmstr/urlscan")
```

```{r message=FALSE, warning=FALSE, error=FALSE, include=FALSE}
options(width=120)
```

## Usage

```{r message=FALSE, warning=FALSE, error=FALSE}
library(urlscan)

# current version
packageVersion("urlscan")
```

```{r}
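library(tidyverse)

# search public scans for a domain
x <- urlscan_search("domain:r-project.org")

# flatten the nested search results into a single tibble
bind_cols(
  # `options` is nested, so drop it and keep only the user agent
  select(x$results$task, -options) %>%
    mutate(user_agent = x$results$task$options$useragent),
  x$results$stats,
  x$results$page
) %>%
  mutate(id = x$results$`_id`) %>%
  mutate(result_api_url = x$results$result) %>%
  as_tibble() -> xdf # `tbl_df()` is deprecated in current dplyr

xdf

glimpse(xdf)

# retrieve detailed results for the second scan in the table
ures <- urlscan_result(xdf$id[2], TRUE, TRUE)

str(ures$scan_result, 2)

# save the screenshot returned with the result (`img/` must already exist)
magick::image_write(ures$screenshot, "img/shot.png")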
```

![](img/shot.png)
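
The flattening step in the usage example can be tried offline. The following is a minimal base R sketch on a made-up stand-in for the search result: only the element names (`task`, `options`, `useragent`, `stats`, `page`, `_id`, `result`) mirror the usage code above, while every column and value inside them is invented for illustration.

```r
# Made-up stand-in for a search result; real results carry many more columns.
task <- data.frame(url = c("https://a.example/", "https://b.example/"),
                   stringsAsFactors = FALSE)
# assumed: `options` arrives as a nested data frame inside `task`
task$options <- data.frame(useragent = c("UA-1", "UA-2"),
                           stringsAsFactors = FALSE)

mock <- list(results = list(
  `_id`  = c("id-1", "id-2"),
  result = c("https://example.org/result/id-1",
             "https://example.org/result/id-2"),
  task   = task,
  stats  = data.frame(requests = c(12L, 34L)),
  page   = data.frame(country = c("US", "DE"), stringsAsFactors = FALSE)
))

# drop the nested column, then lift the one field we want to the top level
r <- mock$results
task_flat <- r$task[, setdiff(names(r$task), "options"), drop = FALSE]
task_flat$user_agent <- r$task$options$useragent

# bind the per-scan pieces side by side, one row per scan
flat <- cbind(task_flat, r$stats, r$page,
              id = r$`_id`, result_api_url = r$result,
              stringsAsFactors = FALSE)

str(flat)
```

Dropping `options` before binding keeps every column atomic, which is why the usage example removes it and re-adds only the user agent.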