Simple web crawler written in Go.


This is a toy web crawler written in Go.

It crawls a single host (e.g. anything.com) and outputs a site map in JSON format.

For each page crawled, it distinguishes between links to other pages, links to assets, broken links, and remote links (e.g. someotherhost.com).

# Example Usage and Output

```
❯ bin/docrawler https://goregex.com/

D.O. Crawler 1.0  Copyright (c) 2015 Stephen Waits <steve@waits.net>  2015-02-17

    "URL": "https://goregex.com/",
    "Title": "GoRegEx.com | Go Regular Expression Tester",
    "Links": [ ... ],
    "Assets": [ ... ],
    "Broken": null,
    "Remote": [ ... ]
```


  • The main crawler loop is in docrawler.go:docrawl().
  • From there, I maintain a hash map of the links we've already crawled.
  • Each link gets its own httpItem{} struct instance, which holds its crawl state.
  • A fixed number of crawler goroutines is started at the beginning, so that we can control precisely how many HTTP fetches happen at once. This number is configurable via a command-line parameter.
  • These goroutines listen for *httpItem{} values on one channel, crawl each one, fill out its results, and send it back to docrawl() on another channel.
  • docrawl() is predominantly a for-select loop that selects on a ticker (used to update the status on the console and to check for crawl completion) and on the "work finished" channel that the crawlers send their results back on.

# How do I get set up?

Clone this repository, fetch the dependencies, and build:

```
hg clone https://bitbucket.org/swaits/docrawler/
cd docrawler
make vendor && make
```

This will format, lint, and vet the source, then build it into an executable at bin/docrawler.

To run the tests:

```
make test
```

If you'd like to automatically run tests any time a file is changed:

```
make autotest
```

To see statistics about the code, install cloc (e.g. with `brew install cloc` on OS X) and then run:

```
make stats
```

To view the test coverage reports in your browser:

```
make cover
```

To deploy, just copy the executable bin/docrawler to a destination of your choosing.

Developed on OS X. Also tested on Windows 7 under MinGW+bash.

# Why a Makefile? Aren't those old???

  • It's clean.
  • It's portable.
  • It lets me package the app in a way that's trivially easy for anyone to build.
  • It lets me easily vendor third-party packages outside the main ./src tree.
  • Everyone vendors their own way in Go, and I'm not in love with any particular style, but this approach is simple enough for this case.

# Who do I talk to?

Written by Stephen Waits. Please contact me at steve@waits.net with any questions.