~quf/fivewords

finds sets of five five-letter words with 25 distinct letters
further improve set operations
improve set operations (20sec -> 10sec)

refs

trunk
browse  log 

clone

read-only
https://git.sr.ht/~quf/fivewords
read/write
git@git.sr.ht:~quf/fivewords

You can also use your local clone with git send-email.

tries to find sets of five english-language words (assuming ASCII letters only) without any duplicate letters. inspired by matt parker's video on it: https://www.youtube.com/watch?v=_-AfhLQfb6w

uses words_alpha.txt from https://github.com/dwyl/english-words

on my machine (16 cpu cores), this code needs around 5 seconds to compile (excluding the time needed to fetch the dependencies over the network) and around 9 additional seconds to find 831 solutions.

to run the code yourself, install rust/cargo, then run:

$ cargo run --release

previous work includes a program by matt parker's, which according to him takes roughly a month to find all solutions, and a program by five_clique which according to Matt Parker takes roughly 15 minutes.

the code uses the following techniques for performance:

  • the word list is filtered to contain only five letter words with no repeated letter.
  • words are translated to bitsets represented as 32 bit integers ('a' in the word means the zero'th bit is set, 'b' in the word means the first bit is set, etc.). this makes set operations extremely fast.
  • five nested loops with early bailout (e.g. if the first two words have a letter in common, there's no point in testing any additional word)
  • the words/bitsets are sorted such that words with the most common letters are first. this means the outermost loops will tend to find duplicate letters earlier in the outer loops, greatly reducing wasted effort.
  • for each word, a list of compatible words (i.e. no letter in common) is precomputed before attempting to find solutions, reducing the the number of candidates during the search.
  • to ensure that no duplicate sets (same words in different order) are found and to further reduce the number of candidates, only sets where the words are in ascending order are considered.
  • parallelising the outer loop reduces the wall clock time by roughly a factor of 6 (on my machine).

license: ¯\_(ツ)_/¯