~deadjakk/censys-scraper

It scrapes... censys

clone

read-only
https://git.sr.ht/~deadjakk/censys-scraper
read/write
git@git.sr.ht:~deadjakk/censys-scraper

# Censys.io Subdomain Scraper

I haven't seen a scraper for censys.io that doesn't require an API key yet, so I made one. Of course, if you want more than the first 1000 results, you should pay for an API key. It uses the pyppeteer library to automate scraping https://censys.io in order to discover subdomains. If the site rate-limits the session, the scraper waits and then continues. Output is written to scraper-output.txt by default, nothing fancy.
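
The wait-on-rate-limit and write-to-file behaviour described above could look roughly like the sketch below. This is not the code in censys_scrape.py: `RateLimited`, `fetch_with_backoff`, `write_results`, the fixed wait interval, and the choice to overwrite the output file are all assumptions made for illustration, and `fetch_page` stands in for whatever pyppeteer-driven function actually loads a results page.

```python
import time


class RateLimited(Exception):
    """Hypothetical signal that the site asked the scraper to slow down."""


def fetch_with_backoff(fetch_page, page_number, wait_seconds=60,
                       max_attempts=5, sleep=time.sleep):
    """Call fetch_page(page_number), pausing and retrying when rate-limited.

    fetch_page is a placeholder for the real page-loading function; it is
    assumed to raise RateLimited when the site throttles the session.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch_page(page_number)
        except RateLimited:
            if attempt == max_attempts:
                raise  # give up after the last allowed attempt
            sleep(wait_seconds)  # wait before trying the same page again


def write_results(subdomains, filename="scraper-output.txt"):
    """Write unique subdomains, one per line, overwriting the file (assumed)."""
    with open(filename, "w", encoding="utf-8") as handle:
        for name in sorted(set(subdomains)):
            handle.write(name + "\n")
```

Passing `sleep` as a parameter keeps the retry loop easy to exercise without actually waiting.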

Sample command: python3 ./censys_scrape.py --username USERNAME --password PASSWORD --domain censys.io --page 40

Help output:

python3 ./censys_scrape.py  --help
usage: censys_scrape.py [-h] [--page PAGE] --domain DOMAIN --username USERNAME --password PASSWORD [--filename FILENAME]

optional arguments:
  -h, --help           show this help message and exit
  --page PAGE          (optional)number of pages to scrape, otherwise this will be parsed from page
  --domain DOMAIN      domain for which to find subdomains
  --username USERNAME  username to login to censys
  --password PASSWORD  password to login to censys
  --filename FILENAME  (optional)name of output file, defaults to scraper-output.txt
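
For reference, an argparse setup that accepts the same options as the help text above might look like this. It is reconstructed from the help output only, not copied from the script: the `build_parser` name is hypothetical, and treating `--page` as an integer is an assumption.

```python
import argparse


def build_parser():
    """Build a parser mirroring the options listed in the help output."""
    parser = argparse.ArgumentParser(prog="censys_scrape.py")
    # Assumed to be an integer; when omitted, the page count is parsed from the page.
    parser.add_argument("--page", type=int,
                        help="(optional)number of pages to scrape, "
                             "otherwise this will be parsed from page")
    parser.add_argument("--domain", required=True,
                        help="domain for which to find subdomains")
    parser.add_argument("--username", required=True,
                        help="username to login to censys")
    parser.add_argument("--password", required=True,
                        help="password to login to censys")
    parser.add_argument("--filename", default="scraper-output.txt",
                        help="(optional)name of output file, "
                             "defaults to scraper-output.txt")
    return parser
```

Calling `build_parser().parse_args()` with the sample command's arguments yields `page=40` and the default `filename` of `scraper-output.txt`.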

Bugs:

  • Regex pattern is sub-par, will fix later.
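
One possible direction for a stricter pattern is sketched below. The current pattern is not shown in this README, so this is not a patch against it: `extract_subdomains` is a hypothetical helper, and the rules it applies (labels of 1 to 63 letters, digits, or hyphens, at least one label in front of the target domain, no match when the domain is only a prefix of a longer name) are assumptions about what "better" means here.

```python
import re


def extract_subdomains(text, domain):
    """Return sorted, lowercased, unique subdomains of `domain` found in `text`."""
    label = r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?"
    pattern = re.compile(
        r"(?<![a-z0-9.-])"          # do not start in the middle of a hostname
        r"(?:" + label + r"\.)+"    # one or more labels, each followed by a dot
        + re.escape(domain) +       # the target domain, taken literally
        r"(?!\.?[a-z0-9-])",        # do not stop inside a longer hostname
        re.IGNORECASE,
    )
    return sorted({match.group(0).lower() for match in pattern.finditer(text)})
```

The apex domain itself is deliberately not returned, since only subdomains are of interest, and names such as `censys.io.evil.com` or `notcensys.io` are skipped.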