---
output: rmarkdown::github_document
editor_options: 
  chunk_output_type: console
---

# urlscan

Analyze Websites and Resources They Request

## Description

The <https://urlscan.io> service provides an 'API' for analyzing
websites and the resources they request. Much like your browser's
'Inspector', urlscan.io lets you examine the individual resources
that are requested when a site is loaded. Tools are provided to search
public urlscan.io scan submissions/results and to submit URLs for scanning.

## What's Inside The Tin

The following functions are implemented:

- `urlscan_search`: Perform a urlscan.io query
- `urlscan_result`: Retrieve detailed results for a given scan ID
- `urlscan_submit`: Submit a URL for scanning

## Installation

```{r eval=FALSE}
devtools::install_git("https://git.sr.ht/~hrbrmstr/urlscan")
# or
devtools::install_gitlab("hrbrmstr/urlscan")
# or
devtools::install_github("hrbrmstr/urlscan")
```

```{r message=FALSE, warning=FALSE, error=FALSE, include=FALSE}
options(width=120)
```

## Usage

```{r message=FALSE, warning=FALSE, error=FALSE}
library(urlscan)
library(tidyverse) # for demos

# current version
packageVersion("urlscan")
```

```{r}
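# search public scans for the r-project.org domain
x <- urlscan_search("domain:r-project.org")

# combine the task and page metadata into one data frame, newest scans first
as_tibble(x$results$task) %>%
  bind_cols(as_tibble(x$results$page)) %>%
  mutate(
    time = anytime::anytime(time),
    id = x$results$`_id`
  ) %>%
  arrange(desc(time)) %>%
  select(url, country, server, ip, id) -> xdf

# retrieve the full result for the second-newest scan, with DOM and screenshot
ures <- urlscan_result(xdf$id[2], include_dom = TRUE, include_shot = TRUE)

ures

# save the screenshot so it can be embedded below
magick::image_write(ures$screenshot, "img/shot.png")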
```

![](img/shot.png)
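
The reshaping step in the Usage chunk can also be tried offline. What follows is a minimal sketch in base R that mimics the nested shape used there (`results$task`, `results$page` and the `_id` field). Every value in `mock` is a made-up placeholder, the placement of fields within `task` and `page` is an assumption, and `flatten_results()` is a hypothetical helper that is not part of the package.

```{r}
# Made-up stand-in for a search response; all values are placeholders.
mock <- list(
  results = list(
    task = data.frame(
      url  = c("https://example.com/a", "https://example.com/b"),
      time = c("2020-01-01T10:00:00.000Z", "2020-01-02T10:00:00.000Z"),
      stringsAsFactors = FALSE
    ),
    page = data.frame(
      country = c("US", "DE"),
      server  = c("nginx", "Apache"),
      ip      = c("192.0.2.1", "192.0.2.2"),
      stringsAsFactors = FALSE
    ),
    `_id` = c("scan-a", "scan-b")
  )
)

# Hypothetical helper (not part of urlscan): one row per scan, newest first.
flatten_results <- function(res) {
  out <- cbind(res$results$task, res$results$page)
  out$id <- res$results[["_id"]]
  # Parse the timestamp (ISO 8601 format is assumed) so rows can be ordered.
  out$time <- as.POSIXct(out$time, format = "%Y-%m-%dT%H:%M:%OS", tz = "UTC")
  out <- out[order(out$time, decreasing = TRUE), ]
  rownames(out) <- NULL
  out[, c("url", "country", "server", "ip", "id")]
}

flat <- flatten_results(mock)
flat
```

Against a live response, the same helper could be called as `flatten_results(urlscan_search("domain:r-project.org"))`, assuming the response keeps the shape shown in the Usage chunk.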