A tool for modeling web navigation data as Markov chains of various orders and deriving statistics from them.



# Modeling web navigation data into Markov chains of varying order

The only dependency of the tool is the tidyverse library.

The tool's main entry point is simulation:

    simulation <- function(input, k, states, topics, skip) { ... }

  1. input: a file of navigation data, where each line is a navigation path, given as a sequence of numbers identifying the topics that were chosen
  2. k: the highest Markov chain order to fit to the data
  3. states: the number of states (topics) that appear in the dataset
  4. topics: an array of length states that maps each index to a topic name
  5. skip: the number of lines of input to ignore
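
To make the arguments above concrete, here is a minimal sketch in base R of how such input could be parsed. It is not the tool's own code: `read_paths` is a hypothetical helper, the topic names are made up, and both the whitespace-separated format and the reading of skip as "leading lines" are assumptions based on the argument descriptions.

```r
# Hypothetical helper, not part of the tool: parses navigation paths from text.
# Assumes one path per line, with topics given as whitespace-separated integers.
read_paths <- function(lines, skip = 0) {
  # Drop the first `skip` lines (for example a header block); this assumes
  # that skip refers to leading lines.
  if (skip > 0) lines <- lines[-seq_len(skip)]
  # Ignore empty lines, then split each remaining line into integer topic ids.
  lines <- lines[nzchar(trimws(lines))]
  lapply(strsplit(trimws(lines), "\\s+"), as.integer)
}

# Toy data: one header line followed by three paths over 3 states.
raw <- c("% header to skip", "1 1 2", "2 3 3 1", "3")
topics <- c("frontpage", "news", "tech")  # made-up names; length equals states

paths <- read_paths(raw, skip = 1)

# Map the indices of the first path to topic names.
first_named <- topics[paths[[1]]]
```

For a real file, the same helper could be fed with `readLines(input)` instead of the toy vector.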

The topics and skip arguments default to values suited to the analysis of the msnbc anonymous navigation dataset that can be found here.

It returns three tibbles:

  1. frequencies of topics
  2. log-likelihood of topics
  3. results, containing log-likelihood ratio statistics, AIC and BIC
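
The quantities listed above can be illustrated with a small base R sketch. It is an independent example, not the tool's implementation: the helper names `loglik_order` and `fit_stats` are hypothetical, the data is made up, plain data frames stand in for tibbles, and the first k steps of each path are simply left out of the likelihood, which is only one of several possible conventions. AIC and BIC use their standard definitions with `states^k * (states - 1)` free parameters for an order-k chain.

```r
# Toy navigation paths over 3 states (made-up data).
paths <- list(c(1, 1, 2), c(2, 3, 3, 1), c(3, 1, 1, 2, 2))
states <- 3

# Frequency of each topic across all paths.
freq <- tabulate(unlist(paths), nbins = states)

# Maximised log-likelihood of an order-k chain (hypothetical helper).
# Each step is keyed as "context|next", where context holds the k previous states.
loglik_order <- function(paths, k) {
  pairs <- unlist(lapply(paths, function(p) {
    if (length(p) <= k) return(character(0))
    vapply((k + 1):length(p), function(i) {
      # "^" keeps the context non-empty even when k = 0.
      context <- paste(c("^", p[seq_len(k) + i - k - 1]), collapse = "-")
      paste(context, p[i], sep = "|")
    }, character(1))
  }))
  counts <- table(pairs)
  n <- as.vector(counts)                          # n(context, next)
  context_of <- sub("\\|.*$", "", names(counts))
  totals <- ave(n, context_of, FUN = sum)         # n(context)
  sum(n * log(n / totals))
}

# Log-likelihood, AIC and BIC for one order (hypothetical helper).
fit_stats <- function(paths, k, states) {
  ll <- loglik_order(paths, k)
  n_obs <- sum(pmax(lengths(paths) - k, 0))       # number of steps scored
  n_par <- states^k * (states - 1)                # free parameters
  data.frame(order = k, loglik = ll,
             AIC = -2 * ll + 2 * n_par,
             BIC = -2 * ll + log(n_obs) * n_par)
}

results <- do.call(rbind, lapply(0:2, fit_stats, paths = paths, states = states))

# Likelihood ratio statistic of each order against the previous one.
results$LR <- c(NA, 2 * diff(results$loglik))
```

Because each order scores a slightly different number of steps under this convention, the comparison across orders is only indicative; the tool itself may treat the start of each path differently.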

There are also some helper functions provided for the creation of graphs.

# Common Lisp implementation

The Lisp implementation is faster but not as polished as the R one. Proceed with care!