RSS all the things!
ratt is a tool for converting websites to RSS/Atom feeds. It uses Lua config files that define how the feed data is extracted, using CSS selectors or Lua functions.
Config files are written in Lua:
--for automatic extraction, ratt checks all config files and matches the URL against this regex
ratt.add(
    --regex
    "https://github.com/trending",
    --CSS selectors table
    {
        --settings for all HTTP requests for the website
        httpsettings = {
            cookie = {},
            header = {},
            useragent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.72 Safari/537.36"
        },
        --CSS selectors to get the feed data
        feed = {
            title = ".h1",
        },
        --CSS selectors to get item data
        item = {
            --the item container
            container = "article.Box-row",
            --a selector can also be a function, which gets the item container selection object
            title = function(sel, _)
                --the parentheses keep only the string, dropping gsub's match count
                return (sel:find("h1.h3 > a[data-hydro-click]"):text():gsub("%s+", ""))
            end,
            link = function(sel, _)
                return "https://github.com" .. sel:find("a[data-hydro-click]"):attr("href")
            end,
            description = "p.color-fg-muted",
        }
    }
)
Config files are Lua files, and ratt has some configs embedded. When called with a URL, e.g. ratt https://1337x.to/top-100, ratt tries to find the config for that website by matching the URL against each config's regex. It searches the embedded config files, the current directory and ~/.config/ratt/*.lua.
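As a sketch, a user config dropped into ~/.config/ratt/ could look like the following. The site https://example.com/blog, the file name example.lua and every selector below are hypothetical placeholders; the structure just mirrors the GitHub trending config above, where plain strings are CSS selectors and functions receive the item container. httpsettings is left out on the assumption that it is optional.

```lua
-- ~/.config/ratt/example.lua (hypothetical site and selectors)
ratt.add(
    --regex matched against the URL passed to ratt
    "https://example.com/blog",
    {
        --httpsettings left out: assumed to be optional
        feed = {
            title = "h1",
        },
        item = {
            --each element matching this selector becomes one feed item
            container = "article.post",
            title = "h2",
            --a function reads the href attribute, as in the GitHub example
            link = function(sel, _)
                return "https://example.com" .. sel:find("h2 > a"):attr("href")
            end,
            description = "p.summary",
        },
    }
)
```

With such a file in place, ratt https://example.com/blog should pick it up, because the URL matches the config's regex.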
First install Go, git and scdoc, then:
git clone https://git.sr.ht/~ghost08/ratt
cd ratt
sudo make install
Install on Arch Linux from AUR with your favorite helper:
yay -S ratt-git
File bugs and TODOs through the issue tracker or send an email to ~ghost08/ratt@todo.sr.ht. For general discussion, use the mailing list: ~ghost08/ratt@lists.sr.ht.
Just call ratt with the URL of the web page:
ratt https://github.com/trending/go
man ratt.5
So what do you do with the feed? That's a very good question. I'm happy you asked :)
You can pipe the feed directly into photon, a modern RSS/Atom reader. photon plays the media from your feed: it uses mpv and youtube-dl to automatically play videos, download torrents, view images and much more :)
So try this out:
ratt https://1337x.to/top-100 | photon -
If a CSS selector isn't enough to select the needed data, every feed and item attribute can be a Lua function.
The function gets two arguments by default:
sel: the selection object of the feed/item container, which can be queried with selectors
index: the number of the item being processed
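For example, a hypothetical item title that prefixes each entry with its position could use the index argument. The container and title selectors are placeholders, and whether index starts at 0 or 1 isn't stated here, so check the generated feed (or ratt(5)) before relying on it.

```lua
item = {
    --hypothetical selectors
    container = "article.post",
    --number each title using the index argument (0- or 1-based is unverified)
    title = function(sel, index)
        return tostring(index) .. ". " .. sel:find("h2"):text()
    end,
}
```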
The Lua script gets some modules to help with the extraction:
goquery: imported by default, a subset of the famous goquery library
gojq: imported by default, the gojq library
ratt takes the return value of the Lua function and inserts it as the data of the feed/item. When an error occurs, just call the error function.
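A minimal sketch of that pattern: the function below raises an error instead of returning a broken link when the expected attribute is missing. The selectors and the https://example.com prefix are hypothetical, and it assumes the binding's attr returns nil or an empty string for a missing attribute.

```lua
item = {
    --hypothetical selectors
    container = "article.post",
    link = function(sel, _)
        local href = sel:find("h2 > a"):attr("href")
        --assumption: a missing attribute comes back as nil or ""
        if href == nil or href == "" then
            --signal the failure to ratt instead of returning bad data
            error("item link not found")
        end
        return "https://example.com" .. href
    end,
}
```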
For more documentation see ratt(5)
Requesting another page, parsing it into a goquery.Document and querying the new document:
item = {
    --select the item container HTML element
    container = ".table-list-wrap tbody tr",
    --select the title element in the item container
    title = "a:nth-child(2)",
    --Lua function
    link = function(sel, _)
        --sel is the item container element, find the <a/>
        local a = sel:find("a:nth-child(2)")
        --get the href attribute of <a/> and build the item URL from it
        local itemURL = "https://1337x.to" .. a:attr("href")
        --request and parse the document
        local doc, err = goquery.newDocFromURL(itemURL)
        if err ~= nil then
            --raise an error if the request was unsuccessful
            error(err)
        end
        --find the item link you want
        local link = doc:find("ul li a[onclick]"):first():attr("href")
        --remove whitespace characters (only gsub's first return value is kept)
        link = link:gsub("%s+", "")
        --and finally return the link so ratt can use it as item.link
        return link
    end,
}
You can also parse and query JSON data with the help of the awesome gojq library:
feed = {
    title = ".title",
    description = function(sel, _)
        --find the <script> element that holds the JSON data
        local script = sel:find("script"):first():text()
        --plain (non-pattern) search for the variable assignment
        local index = script:find("var myJsonData =", 1, true)
        if index == nil then
            error("myJsonData not found in <script>")
        end
        --cut off the "var myJsonData =" prefix (16 characters)
        local jsonData = script:sub(index + 16)
        --parse a gojq query that finds the obj["description"] value
        local query, err = gojq.parse(".description")
        if err ~= nil then
            error(err)
        end
        --expecting that the input data is a map/object (if it's an array, use runArray instead)
        local desc
        desc, err = query.runMap(jsonData)
        if err ~= nil then
            error(err)
        end
        return desc[1]["description"]
    end,
}
Check the confs dir for other examples.
ratt needs config files to run. I really rely on the community to create configs for all the sites!
So please create config files and send them in, so everybody can make the world RSS again!