~swaits/megapermute

a fast hypothesis test using a (mega)permutation test


#megapermute

A million-permutation hypothesis test utility.

#Installation

  1. If you don't already have Rust installed, install it (for example, with rustup).
  2. Then run: cargo install --git https://git.sr.ht/~swaits/megapermute

#Usage

megapermute is a command-line utility that reads two input files, control.dat and treatment.dat, from the current directory. Each is a text file containing a list of numbers, which are parsed as 64-bit floating-point values.

  • control.dat: the control sample
  • treatment.dat: the treatment sample
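The README does not say exactly how the files are parsed. The sketch below assumes the values are separated by whitespace, matching the example files, which hold one number per line. Each token is parsed as an `f64`, and a malformed token is reported as an error instead of being skipped. `parse_samples` and `read_samples` are hypothetical names, not megapermute's own code.

```rust
use std::fs;
use std::io;
use std::num::ParseFloatError;

/// Parse every whitespace-separated token in `text` as an f64.
/// Assumption: values are separated by whitespace (the example files use one
/// per line); megapermute's real parser may accept or reject other layouts.
fn parse_samples(text: &str) -> Result<Vec<f64>, ParseFloatError> {
    text.split_whitespace().map(str::parse::<f64>).collect()
}

/// Hypothetical helper: read one input file (e.g. "control.dat") and parse it.
fn read_samples(path: &str) -> io::Result<Vec<f64>> {
    let text = fs::read_to_string(path)?;
    parse_samples(&text).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

fn main() {
    // Report on both inputs; a missing or malformed file is printed, not fatal.
    for path in ["control.dat", "treatment.dat"] {
        match read_samples(path) {
            Ok(values) => println!("{path}: {} values", values.len()),
            Err(e) => eprintln!("{path}: {e}"),
        }
    }
}
```

Collecting into `Result<Vec<f64>, _>` stops at the first bad token, so a typo in a data file is caught instead of silently changing the sample.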

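With these two files in the current directory, running megapermute computes the empirical difference in means (treatment minus control) and a p-value for the null hypothesis H_0 that both samples come from the same distribution, i.e. that the treatment has no effect. The p-value is the probability, assuming H_0 is true, of seeing a difference in means at least as extreme as the observed one when the control/treatment labels are randomly reshuffled. It is not the probability that the null hypothesis is true: a small p-value means the observed difference would be unusual if the treatment had no effect.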
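A minimal sketch of a Monte Carlo permutation test, under stated assumptions. It pools both samples, repeatedly draws a random subset of treatment-sized values to act as the "treatment" group (a partial Fisher-Yates shuffle), and counts how often the reshuffled difference in means is at least the observed one. This is a one-sided test; the README does not say whether megapermute is one- or two-sided. To stay dependency-free it uses a small hand-written SplitMix64 generator rather than the `rand` crate. `permutation_test` and `SplitMix64` are illustrative names, not megapermute's API, and the data are the sample values from the example below.

```rust
/// Small, dependency-free PRNG (SplitMix64), standing in for the `rand` crate.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Index in 0..n (modulo bias is negligible when n is tiny relative to 2^64).
    fn below(&mut self, n: usize) -> usize {
        (self.next_u64() % n as u64) as usize
    }
}

fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// Hypothetical helper: returns (observed difference in means, one-sided p-value).
/// The p-value is the fraction of `iters` random relabelings whose
/// treatment-minus-control difference is >= the observed difference.
fn permutation_test(control: &[f64], treatment: &[f64], iters: usize, seed: u64) -> (f64, f64) {
    let observed = mean(treatment) - mean(control);
    // Under H_0 the labels are exchangeable, so pool both samples.
    let mut pooled: Vec<f64> = control.iter().chain(treatment).copied().collect();
    let nt = treatment.len();
    let tol = 1e-9 * observed.abs().max(1.0); // absorb rounding from summation order
    let mut rng = SplitMix64(seed);
    let mut hits = 0usize;
    for _ in 0..iters {
        // Partial Fisher-Yates: the first `nt` slots become a uniform random subset.
        for i in 0..nt {
            let j = i + rng.below(pooled.len() - i);
            pooled.swap(i, j);
        }
        if mean(&pooled[..nt]) - mean(&pooled[nt..]) >= observed - tol {
            hits += 1;
        }
    }
    (observed, hits as f64 / iters as f64)
}

fn main() {
    let control = [52.0, 104.0, 146.0, 10.0, 51.0, 30.0, 40.0, 27.0, 46.0];
    let treatment = [94.0, 197.0, 16.0, 38.0, 99.0, 141.0, 23.0];
    let (diff, p) = permutation_test(&control, &treatment, 1_000_000, 42);
    println!("(mu_treatment - mu_control) = {diff}");
    println!("p-value = {p}");
}
```

Only the first `nt` positions are shuffled on each pass, which is enough to pick a uniformly random relabeling and avoids shuffling the whole pooled vector.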

#Example

Given the following two input files:

~/tmp
❯ bat control.dat
───────┬───────────────────────
       │ File: control.dat
───────┼───────────────────────
   1   │ 52
   2   │ 104
   3   │ 146
   4   │ 10
   5   │ 51
   6   │ 30
   7   │ 40
   8   │ 27
   9   │ 46
───────┴───────────────────────

~/tmp
❯ bat treatment.dat
───────┬───────────────────────
       │ File: treatment.dat
───────┼───────────────────────
   1   │ 94
   2   │ 197
   3   │ 16
   4   │ 38
   5   │ 99
   6   │ 141
   7   │ 23
───────┴───────────────────────

Running megapermute should yield results like this:

~/tmp
❯ megapermute
                 mu_control = 56.22222222222222
               mu_treatment = 86.85714285714286
(mu_treatment - mu_control) = 30.63492063492064
                    p-value = 0.139556
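The first three lines of this output follow directly from the input files, where $\bar{x}_c$ and $\bar{x}_t$ are the control and treatment means. The last line shows one common way to estimate the p-value, assuming a one-sided test in which $d^{*}_{b}$ is the difference in means after the $b$-th random relabeling. The README does not state which alternative or estimator megapermute uses.

```latex
\begin{aligned}
\bar{x}_{c} &= \frac{52 + 104 + 146 + 10 + 51 + 30 + 40 + 27 + 46}{9} = \frac{506}{9} \approx 56.2222 \\
\bar{x}_{t} &= \frac{94 + 197 + 16 + 38 + 99 + 141 + 23}{7} = \frac{608}{7} \approx 86.8571 \\
d_{\text{obs}} = \bar{x}_{t} - \bar{x}_{c} &= \frac{5472 - 3542}{63} = \frac{1930}{63} \approx 30.6349 \\
\hat{p} &= \frac{\#\{\, b \le B : d^{*}_{b} \ge d_{\text{obs}} \,\}}{B}, \qquad B = 10^{6}
\end{aligned}
```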

#Performance

My goal was to use multiple cores, via rayon's parallel iterators, to run the permutation test quickly. As of January 2022, megapermute performs one million permutations on the example dataset above in a mean of 39.4 ms (SD 2.6 ms over 69 runs), or roughly 25 million permutations per second.

~/tmp
❯ hyperfine megapermute
Benchmark 1: megapermute
  Time (mean ± σ):      39.4 ms ±   2.6 ms    [User: 499.3 ms, System: 10.2 ms]
  Range (min … max):    35.1 ms …  48.4 ms    69 runs

Benchmark machine info:

~/tmp
❯ neofetch
                    'c.          stephenwaits@SWaits-Mac-01.local
                 ,xNMM.          --------------------------------
               .OMMMMo           OS: macOS 11.6.1 20G224 x86_64
               OMMM0,            Host: MacBookPro16,1
     .;loddo:' loolloddol;.      Kernel: 20.6.0
   cKMMMMMMMMMMNWMMMMMMMMMM0:    Uptime: 31 days, 21 hours, 23 mins
 .KMMMMMMMMMMMMMMMMMMMMMMMWd.    Packages: 149 (brew)
 XMMMMMMMMMMMMMMMMMMMMMMMX.      Shell: zsh 5.8
;MMMMMMMMMMMMMMMMMMMMMMMM:       Resolution: 3840x2160, 1920x1080
:MMMMMMMMMMMMMMMMMMMMMMMM:       DE: Aqua
.MMMMMMMMMMMMMMMMMMMMMMMMX.      WM: Amethyst
 kMMMMMMMMMMMMMMMMMMMMMMMMWd.    Terminal: WezTerm
 .XMMMMMMMMMMMMMMMMMMMMMMMMMMk   CPU: Intel i9-9880H (16) @ 2.30GHz
  .XMMMMMMMMMMMMMMMMMMMMMMMMK.   GPU: Intel UHD Graphics 630, AMD Radeon Pro 5500M
    kMMMMMMMMMMMMMMMMMMMMMMd     Memory: 21514MiB / 32768MiB
     ;KMMMMMMMWXXWMMMMMMMk.
       .cooc,.    .,coo:.
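megapermute spreads the permutations across cores with rayon. The sketch below shows the same idea using only the standard library's `std::thread::scope`, a swapped-in technique rather than the project's code. The million permutations are split across worker threads, each thread has its own independently seeded generator and private copy of the data, and the per-thread hit counts are summed at the end. `count_hits` and `parallel_p_value` are hypothetical names, and the test is one-sided, as in the assumptions above.

```rust
use std::thread;

/// Small, dependency-free PRNG (SplitMix64), standing in for the `rand` crate.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Index in 0..n (modulo bias is negligible when n is tiny relative to 2^64).
    fn below(&mut self, n: usize) -> usize {
        (self.next_u64() % n as u64) as usize
    }
}

fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// One worker: run `iters` random relabelings of `pooled` and count how many
/// give a treatment-minus-control difference >= `observed` (one-sided).
fn count_hits(pooled: &[f64], nt: usize, observed: f64, iters: usize, seed: u64) -> usize {
    let mut work = pooled.to_vec(); // private copy: threads share no mutable state
    let mut rng = SplitMix64(seed);
    let tol = 1e-9 * observed.abs().max(1.0); // absorb rounding from summation order
    let mut hits = 0;
    for _ in 0..iters {
        // Partial Fisher-Yates: the first `nt` slots become a uniform random subset.
        for i in 0..nt {
            let j = i + rng.below(work.len() - i);
            work.swap(i, j);
        }
        if mean(&work[..nt]) - mean(&work[nt..]) >= observed - tol {
            hits += 1;
        }
    }
    hits
}

/// Hypothetical driver: split `total` permutations across `workers` threads.
fn parallel_p_value(
    control: &[f64],
    treatment: &[f64],
    total: usize,
    workers: usize,
    seed: u64,
) -> f64 {
    let workers = workers.max(1);
    let observed = mean(treatment) - mean(control);
    let pooled: Vec<f64> = control.iter().chain(treatment).copied().collect();
    let nt = treatment.len();

    // Derive one seed per worker from a master generator.
    let mut seeder = SplitMix64(seed);
    let seeds: Vec<u64> = (0..workers).map(|_| seeder.next_u64()).collect();

    let hits = thread::scope(|s| {
        let handles: Vec<_> = (0..workers)
            .map(|w| {
                let pooled = &pooled;
                // Give the remainder to the first few workers so the total is exact.
                let iters = total / workers + usize::from(w < total % workers);
                let seed = seeds[w];
                s.spawn(move || count_hits(pooled, nt, observed, iters, seed))
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum::<usize>()
    });
    hits as f64 / total as f64
}

fn main() {
    let control = [52.0, 104.0, 146.0, 10.0, 51.0, 30.0, 40.0, 27.0, 46.0];
    let treatment = [94.0, 197.0, 16.0, 38.0, 99.0, 141.0, 23.0];
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(4);
    let p = parallel_p_value(&control, &treatment, 1_000_000, workers, 42);
    println!("{workers} workers, p-value = {p}");
}
```

Because each worker's seed is drawn from a master seed and the final sum does not depend on thread scheduling, the result is reproducible for a fixed seed and worker count.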