~lown/openorb

A curated search engine for Atom and RSS feeds.
Adds Templating for new config items
Added commented out example of new config options
feat: better error handling for problematic feed URLs

refs

main
browse  log 

clone

read-only
https://git.sr.ht/~lown/openorb
read/write
git@git.sr.ht:~lown/openorb

You can also use your local clone with git send-email.

#OpenOrb

OpenOrb is a curated search engine for Atom and RSS feeds. You can use my own instance here.

A screenshot of the OpenOrb search interface, showing a search for 'python'

An OpenOrb instance is configured with a list of feeds to search - what was once called a blogroll - and indexes this list periodically. In this way, OpenOrb provides a window into the content a specific person or community cares about, with the benefit of making this content searchable and therefore more accessible. OpenOrb is designed to be the opposite of Google and other black box, monolithic search engines - it's open source, configurable, personal, and predictable.

OpenOrb uses an extremely simple search engine, mostly adapted from Alex Molas's superb 'search engine in 80 lines of Python', which uses BM25 and doesn't handle wildcards, boolean operators, or stemming/lemmatization (yet). I might add some of these features in the future, but I wrote OpenOrb in a single day so I'm keeping it simple for now!

The success of a search on OpenOrb suffers from the same limitations as RSS in general - people and software have different, and sometimes weird, opinions on how to present structured data. If a feed doesn't include full post content (which it should!), then OpenOrb won't be able to index it. If a feed has a cut-off limit for how many posts are in it, OpenOrb will only know about the ones that it had time to save. Embrace the limitations of the messy open web, and tell your friends to stop deleting posts from their feeds.

#Setup

OpenOrb can be run as a Docker Compose stack. First, clone the repo:

git clone https://git.sr.ht/~lown/openorb
cd openorb

In the data folder, rename the config.example.toml file to a config.toml file and edit it to your liking. Start OpenOrb by running:

docker compose up -d

OpenOrb will save its index.db SQLite database in the data folder. If you edit the config file, restart the stack by running the same command.

#Crawling

The OpenOrb Docker stack sets up a cron job which curls the /api/crawl endpoint every hour. Anyone can POST to that endpoint to immediately schedule a crawl, but to prevent overloading an OpenOrb instance, the crawl won't run more often than once every 15 minutes. You could potentially DDoS an OpenOrb instance by spamming the /api/crawl endpoint with a bot, but that also goes for any other endpoint - exposing the /api/crawl endpoint is simple and relatively low-risk.

If you're running OpenOrb without Docker, you may want to set up your own cron job based on the example in the /cron directory.