~ols/veri-docs

Documentation for veri, the FOSS search engine


#Veri

The FOSS search engine

#What is Veri?

Veri aims to be a FOSS search engine that can be deployed by anyone to scrape, index, and search a given subset of URLs and their immediate connections.

The workflow of Veri will be as follows:

  1. A list of "tier 1" URLs is kept up to date by the instance operator
  2. Veri will periodically scrape those URLs to get a list of "tier 2" URLs
  3. Veri will periodically scrape the entire list of URLs to obtain page contents and metadata, to be stored in a database
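
As a rough illustration of step 2, the sketch below derives "tier 2" URLs from the links found on "tier 1" pages. It is not Veri's implementation: the names `LinkParser` and `discover_tier2` are hypothetical, only HTML pages are handled, and the `fetch` callable is an assumption standing in for whatever downloader an instance uses.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkParser(HTMLParser):
    """Collects absolute link targets from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links and drop any #fragment.
            absolute, _fragment = urldefrag(urljoin(self.base_url, href))
            self.links.append(absolute)


def discover_tier2(tier1, fetch):
    """Return URLs linked from tier 1 pages that are not tier 1 themselves.

    `fetch` is a hypothetical callable mapping a URL to its HTML text.
    """
    tier1_set = set(tier1)
    seen = set()
    tier2 = []
    for url in tier1:
        parser = LinkParser(url)
        parser.feed(fetch(url))
        for link in parser.links:
            if link not in tier1_set and link not in seen:
                seen.add(link)
                tier2.append(link)
    return tier2
```

Step 3 would then scrape `tier1 + tier2`. Passing `fetch` in as a parameter keeps the sketch testable without network access.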

#Why "Veri"?

There are two reasons why the name veri was chosen:

  • veri is the Latin word for truth, reality, or fact
  • veri is the Turkish word for data

#About the Project

#Goals

The goals of the project are as follows:

  • To be deployable by anyone to create their own specific-interest search engine
  • To have an understanding of www, gemini, and gopher schemes
  • To be modular, so that any of the individual components can be deployed without the others
  • To be a good citizen of the Internet, respecting robots.txt and sending a configurable User-Agent that provides contact details for the instance
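
The "good citizen" goal could look something like the following sketch, which uses Python's standard `urllib.robotparser` to check a URL against robots.txt rules and attaches a User-Agent carrying contact details. The User-Agent string, the contact address, and the helper names `allowed` and `build_request` are all assumptions made for illustration, not part of Veri.

```python
from urllib.request import Request
from urllib.robotparser import RobotFileParser

# Hypothetical User-Agent; the contact address is a placeholder that an
# instance operator would replace with their own details.
USER_AGENT = "veri-example/0.1 (+mailto:operator@example.org)"


def allowed(robots_txt, url, user_agent=USER_AGENT):
    """Return True if the given robots.txt text permits fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def build_request(url, user_agent=USER_AGENT):
    """Build a request that identifies the instance via its User-Agent."""
    return Request(url, headers={"User-Agent": user_agent})
```

A scraper would call `allowed` before each fetch and skip any URL for which it returns `False`.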

#Components

Most of these components are either a work in progress or do not exist yet.

#veri-links

Generates a list of discovered links by scraping a given list of links.

It will come in several flavours, sharing as much code as possible:

  • veri-links-www
  • veri-links-gemini
  • veri-links-gopher

The results will be written to a database, along with whether each link was a direct link or a discovered link.
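
One possible way to store links together with their direct/discovered status is sketched below using SQLite. The database engine, the `links` table, its columns, and the `record_links` helper are assumptions for illustration; the project does not specify a schema.

```python
import sqlite3

# Hypothetical schema: one row per URL, tagged as direct or discovered.
SCHEMA = """
CREATE TABLE IF NOT EXISTS links (
    url  TEXT PRIMARY KEY,
    kind TEXT NOT NULL CHECK (kind IN ('direct', 'discovered'))
)
"""


def record_links(conn, direct, discovered):
    """Store both kinds of link; a direct link is never downgraded."""
    with conn:  # commit on success, roll back on error
        conn.execute(SCHEMA)
        conn.executemany(
            "INSERT OR REPLACE INTO links (url, kind) VALUES (?, 'direct')",
            [(url,) for url in direct],
        )
        # OR IGNORE leaves an existing row (such as a direct link) untouched.
        conn.executemany(
            "INSERT OR IGNORE INTO links (url, kind) VALUES (?, 'discovered')",
            [(url,) for url in discovered],
        )
```

Keeping the URL as the primary key means a link that is both operator-supplied and discovered is stored once, as a direct link.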

#veri-scrape

Scrapes a provided list of links to extract content and metadata, including:

  • URL
  • Title
  • Author
  • Content length
  • Summary
  • HTML content (for www sites)
  • Plain text content

It will come in several flavours, sharing as much code as possible:

  • veri-scrape-www
  • veri-scrape-gemini
  • veri-scrape-gopher

The results will be written to a database
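
To make the list of extracted fields concrete, here is a minimal sketch for the www flavour using only the standard library. The `Entry` dataclass, the `scrape_html` helper, the choice of `<meta name="author">` as the author source, and the 200-character summary are all assumptions, not decisions made by the project.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class Entry:
    url: str
    title: str
    author: str
    content_length: int  # size of the HTML in bytes (UTF-8)
    summary: str
    html: str
    text: str


class _PageParser(HTMLParser):
    """Collects the title, the author meta tag, and visible text."""

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.text_parts = []
        self.author = ""
        self._in_title = False
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip += 1
        elif tag == "meta":
            attributes = dict(attrs)
            if (attributes.get("name") or "").lower() == "author":
                self.author = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif not self._skip:
            self.text_parts.append(data)


def scrape_html(url, html, summary_length=200):
    """Build an Entry from already-downloaded HTML text."""
    parser = _PageParser()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace into single spaces.
    title = " ".join(" ".join(parser.title_parts).split())
    text = " ".join(" ".join(parser.text_parts).split())
    return Entry(
        url=url,
        title=title,
        author=parser.author,
        content_length=len(html.encode("utf-8")),
        summary=text[:summary_length],
        html=html,
        text=text,
    )
```

The gemini and gopher flavours would presumably fill the same `Entry` shape from their own formats, leaving `html` empty.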

#veri-index

An indexer that will create an inverted index, capable of full-text search, from the database of entries.
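
For readers unfamiliar with the term, an inverted index maps each term to the entries that contain it. The sketch below builds one in memory; the `tokenize` and `build_index` names and the simple word-based tokenisation are assumptions, and a real indexer would read entries from the database rather than from a dict.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(entries):
    """Map each term to {entry_id: number of occurrences}.

    `entries` is a dict of entry_id -> plain text content.
    """
    index = defaultdict(dict)
    for entry_id, text in entries.items():
        for term in tokenize(text):
            index[term][entry_id] = index[term].get(entry_id, 0) + 1
    return dict(index)
```

Storing the occurrence count alongside each entry id gives a later ranking step something to work with.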

#veri-search

For retrieving ranked entries.
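
The project does not say how entries will be ranked. As one plausible approach, the sketch below scores entries with a simple TF-IDF weighting over an inverted index shaped as term -> {entry_id: occurrences}. The `search` function, the index shape, and the scoring formula are all assumptions.

```python
import math
import re


def search(index, entry_count, query):
    """Return (entry_id, score) pairs, best match first.

    `index` maps term -> {entry_id: occurrences}; `entry_count` is the
    total number of indexed entries.
    """
    scores = {}
    for term in re.findall(r"\w+", query.lower()):
        postings = index.get(term, {})
        if not postings:
            continue
        # Terms that appear in fewer entries carry more weight.
        idf = math.log(1 + entry_count / len(postings))
        for entry_id, occurrences in postings.items():
            scores[entry_id] = scores.get(entry_id, 0.0) + occurrences * idf
    # Sort by descending score, breaking ties by entry id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

With an index of `{"gemini": {1: 1, 2: 2}, "gopher": {1: 1}}` over two entries, a query for "gemini" ranks entry 2 first, while "gopher gemini" ranks entry 1 first because "gopher" is the rarer term.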

#veri-web

A way of submitting a search query and displaying results on the web

#veri-gemini

A way of submitting a search query and displaying results over gemini

#veri-gopher

A way of submitting a search query and displaying results over gopher

#veri-proxy

A way to view a site through the veri search interface rather than visiting it directly. It will be able to proxy www, gemini, or gopher content to any of the other protocols.
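
Proxying between protocols implies converting between their document formats. As an illustration of one direction only (gemini content served over www), the sketch below converts a subset of gemtext to HTML. The `gemtext_to_html` function is hypothetical and handles only headings, link lines, preformatted blocks, and plain text lines; list and quote lines are treated as plain text.

```python
import html


def gemtext_to_html(source):
    """Convert a subset of gemtext to HTML (assumed, simplified mapping)."""
    out = []
    preformatted = False
    for line in source.splitlines():
        if line.startswith("```"):
            # A ``` line toggles preformatted mode.
            out.append("</pre>" if preformatted else "<pre>")
            preformatted = not preformatted
        elif preformatted:
            out.append(html.escape(line))
        elif line.startswith("=>"):
            # Link line: "=> URL optional label"
            parts = line[2:].strip().split(maxsplit=1)
            if not parts:
                continue
            url = parts[0]
            label = parts[1] if len(parts) > 1 else url
            out.append(
                f'<p><a href="{html.escape(url)}">{html.escape(label)}</a></p>'
            )
        elif line.startswith("#"):
            # Gemtext has three heading levels: #, ##, ###
            level = min(len(line) - len(line.lstrip("#")), 3)
            text = line[level:].strip()
            out.append(f"<h{level}>{html.escape(text)}</h{level}>")
        elif line.strip():
            out.append(f"<p>{html.escape(line)}</p>")
    if preformatted:
        out.append("</pre>")  # close an unterminated block
    return "\n".join(out)
```

A real proxy would also need to rewrite link targets so that followed links stay inside the proxy, which this sketch does not attempt.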

#Get involved

Post on the ~ols/veri-discuss mailing list for discussion or send patches to the ~ols/veri-devel list.

#Roadmap

Until such time as sr.ht has a nice Kanban board feature, you can see the roadmap at Trello


Full documentation (WIP) is here

Logo re-coloured from here, shared under CC BY-SA 4.0