You can play it here: one tøp song
1024. That's how many words there are that appear in only one twenty øne piløts song, give or take (see next section for more details).
In this game you will be shown words from the list, and asked to guess which song each one is from. (Feel free not to answer 🙂)
For more information, please read the game's footer.
make automates all of the arrows in this diagram. Every time you update a lyrics file, txt file or source code, run make to regenerate the datasets and the static site.
This repo does not host the lyrics. Instead, you'll download and curate the lyric library yourself. Name each file lyrics/Album Title/Track Title.txt.
What this repo does offer is a semi-automatic pipeline for Data Crunching™. allwords.py takes the lyrics, breaks them into words, and for every word that has ever appeared in any song in any form, stores all its occurrences in allwords.json. This file contains 2k+ entries.
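As a rough illustration of this step, here is a minimal sketch of collecting word occurrences from the lyrics folder. It is not the actual allwords.py: the tokenizer regex, the function name collect_words, and the album/track record shape are all assumptions made for the example.

```python
import json
import re
from collections import defaultdict
from pathlib import Path

# Hypothetical tokenizer: runs of lowercase letters and apostrophes.
# The real script may split words differently.
WORD_RE = re.compile(r"[a-zø']+")


def collect_words(lyrics_dir):
    """Map each word to the list of songs it appears in."""
    occurrences = defaultdict(list)
    # Expected layout: lyrics/Album Title/Track Title.txt
    for path in sorted(Path(lyrics_dir).glob("*/*.txt")):
        album, track = path.parent.name, path.stem
        text = path.read_text(encoding="utf-8").lower()
        # Record each word at most once per song.
        for word in sorted(set(WORD_RE.findall(text))):
            occurrences[word].append({"album": album, "track": track})
    return dict(occurrences)


if __name__ == "__main__":
    words = collect_words("lyrics")
    with open("allwords.json", "w", encoding="utf-8") as f:
        json.dump(words, f, ensure_ascii=False, indent=2)
```

Recording one entry per song, rather than one per line, keeps the later "appears in only one song" check a simple length test.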
onesong.py takes allwords.json and does the rest of the work. You supply it with three files:

- false_inflections.txt, which prevents the algorithm from confusing "even" with "evening"
- irregular_verbs.txt, from Wiktionary: Appendix: English irregular verbs
- denylist.txt, which serves as a final filter. Each line is a regex, and a word that matches any of them is excluded from onesong.json.

and it generates onesong.json, which is the dataset we need for the site generator.
onesong.py also prints a list of inflections it detected and thus excluded. If you find a false positive, just add it to denylist.txt.
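To make the filtering idea concrete, here is a minimal sketch of the "one song only" check followed by the denylist pass. It deliberately leaves out inflection and irregular-verb handling, and it is not the actual onesong.py: the function name one_song_words, the input shape (each word mapped to a list of album/track records), and the choice to match each regex against the whole word are assumptions.

```python
import re


def one_song_words(allwords, deny_patterns):
    """Keep words found in exactly one song and not matched by the denylist."""
    deny = [re.compile(p) for p in deny_patterns]
    result = {}
    for word, occurrences in allwords.items():
        songs = {(o["album"], o["track"]) for o in occurrences}
        if len(songs) != 1:
            continue  # appears in several songs (or none)
        # Assumption: a denylist pattern must match the whole word.
        if any(rx.fullmatch(word) for rx in deny):
            continue
        result[word] = occurrences[0]
    return result


# Made-up sample data, not taken from the real dataset.
sample = {
    "lantern": [{"album": "A", "track": "S1"}],
    "window": [{"album": "A", "track": "S1"}, {"album": "B", "track": "S2"}],
    "ohh": [{"album": "A", "track": "S3"}],
}
print(one_song_words(sample, ["oh+"]))
```

If the real script uses a partial match instead, swapping fullmatch for search would be the only change needed in this sketch.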
mkjs.py converts onesong.json into a JavaScript file, which is just a giant array variable that index.js accesses.
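The conversion can be sketched roughly as follows. This is not the actual mkjs.py: the variable name WORDS, the function name to_js, and the row layout of word, album, track are hypothetical, and the input is assumed to map each word to a single album/track record.

```python
import json


def to_js(data, var_name="WORDS"):
    """Render the dataset as a JavaScript source string holding one array."""
    rows = [
        [word, info["album"], info["track"]]
        for word, info in sorted(data.items())
    ]
    # A JSON array is also a valid JavaScript array literal.
    return f"const {var_name} = {json.dumps(rows, ensure_ascii=False)};\n"


# Made-up sample data, not taken from the real dataset.
sample = {
    "lantern": {"album": "A", "track": "S1"},
    "anchor": {"album": "B", "track": "S2"},
}
print(to_js(sample), end="")
```

Emitting a plain script with one global array means the page can load the data with an ordinary script tag, with no fetch call needed.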
mkhtml.py uses Jinja to render words.jinja into words.html with data from onesong.json.
The following files are subject to the MIT license:
Additionally,

- denylist.txt is in the public domain
- false_inflections.txt is in the public domain
- irregular_verbs.txt is CC BY-SA 4.0
- diagram.{odg,png} are in the public domain
- Makefile is in the public domain

I don't have any lawyer friends but what I know is no one can own non-trademarked words in the English language. On this ground, all words in the datasets are in the public domain ("adidas" might be an exception), but the lyrics in the form of full lines are owned by TØP and/or FBR.
I think album and track titles are uncopyrightable, so album.js would be public domain, but don't quote me on this.