~ghost08/ratt

#ratt

RSS all the things!

ratt is a tool for converting websites to RSS/Atom feeds. It uses Lua config files that define how the feed data is extracted, using CSS selectors or Lua functions.

Config files are written in Lua:

--for automatic extraction, ratt checks all config files and matches the regex
ratt.add(
	--regex
	"https://github.com/trending",
	--css selectors table
	{
		--settings for all http requests for the website
		httpsettings = {
			cookie = {},
			header = {},
			useragent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.72 Safari/537.36"
		},
		--css selectors to get the feed data
		feed = {
			title = ".h1",
		},
		--css selectors to get item data
		item = {
			--the item container
			container = "article.Box-row",
			--a selector can be a function which gets the item container selection object
			title = function(sel, _)
				return sel:find("h1.h3 > a[data-hydro-click]"):text():gsub("%s+", "")
			end,
			link = function(sel, _)
				return "https://github.com" .. sel:find("a[data-hydro-click]"):attr("href")
			end,
			description = "p.color-fg-muted",
		}
	}
)

#Configs

Config files are Lua files, and ratt ships with some configs embedded. When you call, for example, ratt https://1337x.to/top-100, ratt tries to find a config matching the website URL. It searches the embedded config files, the current directory, and ~/.config/ratt/*.lua.
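As a rough sketch of what a user config dropped into ~/.config/ratt/ might look like, the fragment below registers a made-up site. The file name, URL, user agent and every selector are assumptions for illustration only; the ratt.add call and the httpsettings/feed/item keys mirror the example at the top of this README.

```lua
-- ~/.config/ratt/example-blog.lua (hypothetical file name)
ratt.add(
	-- regex matched against the URL given to ratt (hypothetical site)
	"https://blog.example.com/",
	{
		httpsettings = {
			cookie = {},
			header = {},
			useragent = "Mozilla/5.0 (X11; Linux x86_64)"
		},
		feed = {
			-- hypothetical selector for the feed title
			title = "header h1",
		},
		item = {
			-- hypothetical selectors for each post on the page
			container = "article.post",
			title = "h2 > a",
			link = function(sel, _)
				-- turn the relative href into an absolute URL
				return "https://blog.example.com" .. sel:find("h2 > a"):attr("href")
			end,
			description = "p.summary",
		}
	}
)
```

With a file like this in place, calling ratt with a matching URL should pick it up through the search order described above.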

#Installation

First install Go, git and scdoc, then:

git clone https://git.sr.ht/~ghost08/ratt
cd ratt
sudo make install

Install on Arch Linux from AUR with your favorite helper:

yay -S ratt-git

#Issues

File bugs and TODOs through the issue tracker or send an email to ~ghost08/ratt@todo.sr.ht. For general discussion, use the mailing list: ~ghost08/ratt@lists.sr.ht.

#Usage

Just call ratt with the URL of the web page:

ratt https://github.com/trending/go

#Documentation

man ratt.5

#What will I do with this RSS feed?

That's a very good question. I'm happy you asked :)

You can pipe the feed directly into photon, which is a modern RSS/Atom reader. photon will play the media from your feed. It uses mpv and youtube-dl to automatically play videos, download torrents, view images and much more :)

So try this out:

ratt https://1337x.to/top-100 | photon -

photon 1337x screenshot

#Lua

If a CSS selector isn't enough to select the needed data, every feed and item attribute can be a Lua function.

The function gets two arguments by default:

sel is the selection object of the feed/item container, which can be queried with selectors

index is the number of the item being processed
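To make the two arguments concrete, here is a small sketch of a title function that uses both. The fake_sel stub only stands in for the selection object ratt would pass, so that the snippet runs in a plain Lua interpreter; it is not part of ratt. The selector and the index value 3 are likewise made up.

```lua
-- Title function in the shape described above: (sel, index) -> string.
local function title(sel, index)
	-- query the container for the element holding the title text
	local text = sel:find("h2 > a"):text()
	-- prefix the title with the item number
	return string.format("#%d %s", index, text)
end

-- Stub standing in for ratt's selection object (assumption, illustration only).
local fake_sel = {}
function fake_sel:find(_selector)
	return { text = function() return "Hello world" end }
end

print(title(fake_sel, 3))
```

In a real config the function would be assigned directly to an attribute, for example title = function(sel, index) ... end.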

The Lua script will get some modules to help with the extraction:

goquery is a module imported by default; it exposes a subset of the goquery library

gojq is a module imported by default; it wraps the gojq library

ratt takes the return value of the Lua function and inserts it as the data of the feed/item. If an error occurs, call Lua's error function.
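The error convention can be sketched in plain Lua as follows. require_value is a hypothetical helper, not something ratt provides, and pcall is used here only to play the role of the caller so the snippet runs on its own.

```lua
-- Hypothetical helper: raise an error when an expected value is missing.
local function require_value(value, what)
	if value == nil or value == "" then
		error("missing " .. what)
	end
	return value
end

-- pcall stands in for the caller; it catches the raised error.
local ok, err = pcall(require_value, nil, "href attribute")
print(ok, err)

-- A present value is passed through unchanged.
local ok2, value = pcall(require_value, "/item/42", "href attribute")
print(ok2, value)
```

Inside a config function, such a helper could wrap calls like a:attr("href") so that a missing attribute stops the extraction with a readable message instead of producing an empty field.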

For more documentation see ratt(5)

#Examples

Calling another link, parsing it to a goquery.Document and querying the new doc:

item = {
  --select the item container html element
  container = ".table-list-wrap tbody tr",
  --select the title element in the item container
  title = "a:nth-child(2)",
  --lua function
  link = function(sel, _)
    --sel is the item container element, find <a/>
    local a = sel:find("a:nth-child(2)")
    --get the href attribute of <a/> and build the item url from it
    local itemURL = "https://1337x.to" .. a:attr("href")
    --request and parse the document
    local doc, err = goquery.newDocFromURL(itemURL)
    if err ~= nil then
      --raise an error if the request was unsuccessful
      error(err)
    end
    --find the item link you want
    local link = doc:find("ul li a[onclick]"):first():attr("href")
    --remove whitespace characters
    link = link:gsub("%s+", "")
    --and finally return the link so ratt can include it in the item.link
    return link
  end,
}

You can also parse and query JSON data with the help of the awesome gojq library:

feed = {
  title = ".title",
  description = function(sel, _)
    --find the <script> element where the json data is
    local script = sel:find("script"):first():text()
    local index = script:find("var myJsonData =")
    --cut off the "var myJsonData =" prefix (16 characters)
    local jsonData = script:sub(index + 16)
    --parse a gojq query that will find the obj["description"] value
    local query, err = gojq.parse(".description")
    if err ~= nil then
      error(err)
    end
    --expecting that the input data is a map/object (otherwise, if it's an array, use runArray)
    local desc
    desc, err = query.runMap(jsonData)
    if err ~= nil then
      error(err)
    end
    return desc[1]["description"]
  end,
}
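The prefix-cutting step in the example above can be tried on its own in a plain Lua interpreter. The sketch below uses a made-up script body, and it additionally strips surrounding whitespace and a trailing semicolon, which the example above does not do; treat that part as an optional assumption about what the embedded data looks like.

```lua
-- Sample <script> content (made up for illustration).
local script = 'var myJsonData = {"description": "hello"};'
local prefix = "var myJsonData ="

-- plain = true turns off pattern matching; find returns start and end indices
local _, last = script:find(prefix, 1, true)
if last == nil then
	error("prefix not found")
end

-- keep everything after the prefix, then drop surrounding
-- whitespace and an optional trailing ';'
local jsonData = script:sub(last + 1):match("^%s*(.-)%s*;?%s*$")
print(jsonData)
```

Using the end index returned by find avoids hard-coding the prefix length, so the code keeps working if the variable name in the page changes and only the prefix string is updated.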

Check the confs dir for other examples.

#Contribution

ratt needs config files for it to run. I really rely on the community to create configs for all the sites!

So please create config files, send them here, then everybody can make the world RSS again!