Flawed Methodology

Professor Hasofer writes about the lack of a clear specification of the alternative hypothesis in the WRR paper. Although the WRR paper does not state its alternative hypothesis explicitly, the way the test statistic is formed makes it clear: the alternative is that the distribution of distances between the ELSs of the appellations and those of the dates is shifted to the left, that is, toward smaller values. This is very different from the alternative he suggests, namely that "the encoder has actually put the appropriate dates nearest to each of the names according to some distance measure."
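A "shift to the left" alternative is a statement about the whole distribution of distances, not a claim that each name sits nearest its own date. The sketch below makes that concrete under stated assumptions that are not taken from the WRR paper: each name/date pair yields a hypothetical fraction in (0, 1], with small values meaning close ELSs; under the null these fractions are Uniform(0, 1); and a hypothetical left-shifted alternative is modeled as Beta(0.5, 1). A product statistic, computed here as a sum of logs, is systematically smaller under the shifted distribution even though no individual pair is guaranteed to be close.

```python
import math
import random

def mean_log_product(sampler, n_pairs, n_reps, rng):
    """Average of sum(log c) over n_reps simulated experiments.

    Summing logs instead of multiplying avoids floating-point underflow;
    a smaller sum of logs means a smaller product of fractions.
    """
    total = 0.0
    for _ in range(n_reps):
        total += sum(math.log(sampler(rng)) for _ in range(n_pairs))
    return total / n_reps

def null_sampler(rng):
    # Assumed null: Uniform(0, 1]. 1 - random() is in (0, 1], so log() never sees 0.
    return 1.0 - rng.random()

def shifted_sampler(rng):
    # Hypothetical left-shifted alternative: Beta(0.5, 1) by inverse-CDF sampling.
    # Its CDF is x**0.5 >= x, so its values are stochastically smaller than uniform.
    return (1.0 - rng.random()) ** 2

rng = random.Random(0)
null_mean = mean_log_product(null_sampler, 32, 2000, rng)
alt_mean = mean_log_product(shifted_sampler, 32, 2000, rng)
# Theoretical expectations: 32 * E[log U] = -32 and 32 * E[2 log U] = -64.
print(round(null_mean, 1), round(alt_mean, 1))
```

The choice of 32 pairs and Beta(0.5, 1) is arbitrary; any distribution that is stochastically smaller than the null would show the same direction of effect.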

Professor Simon writes:

"The probabilities quoted for the word clusters are computed by methods contrary to the accepted laws of probability and used in situations where it is essentially impossible to assign meaningful probabilities."

He explains this:

"Mr. Witztum’s calculations rely on multiplying together lots of not so large numbers ... assuming independence ... in situations where independence is not a valid assumption."

It is indeed true that WRR form their statistic by multiplying fractions, and that this would be the right thing to do if the fractions were independent. In that case the distribution of the statistic would be known, and the Monte Carlo trials in the WRR experiment would be unnecessary. But the fractions being multiplied do not represent probabilities of independent events, so the distribution of their product is not known; that is precisely why the WRR experiment must use Monte Carlo trials to establish a p-value. Within the Monte Carlo trials the product acts only as a score function: the p-value comes from how the score for the actual data ranks among the scores obtained in the trials, so the result of the experiment does not depend on any independence assumption.
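The contrast between the two routes to a p-value can be sketched in a few lines. This is a minimal illustration under assumptions not taken from the WRR paper: a hypothetical matrix `c[i][j]` holds a fraction in (0, 1] for name `i` paired with date `j`, the correct pairing is the diagonal, and each Monte Carlo trial re-pairs names and dates at random. The names `log_stat`, `monte_carlo_p` and `independence_p` are hypothetical. `independence_p` is the closed-form answer that would apply only if the fractions were independent Uniform(0, 1) values (then minus the log of their product is Gamma(n, 1) distributed); `monte_carlo_p` uses the product purely as a score and ranks the real pairing among random ones, with no independence assumption.

```python
import math
import random

def log_stat(c, perm):
    """Log of the product statistic when name i is paired with date perm[i]."""
    return sum(math.log(c[i][perm[i]]) for i in range(len(c)))

def monte_carlo_p(c, n_trials, rng):
    """Rank of the true (diagonal) pairing among random re-pairings."""
    n = len(c)
    observed = log_stat(c, list(range(n)))
    at_least_as_small = 0
    perm = list(range(n))
    for _ in range(n_trials):
        rng.shuffle(perm)
        if log_stat(c, perm) <= observed:
            at_least_as_small += 1
    # Count the observed pairing itself so the p-value is never exactly 0.
    return (at_least_as_small + 1) / (n_trials + 1)

def independence_p(product, n):
    """P(U1*...*Un <= product) for n INDEPENDENT Uniform(0, 1) fractions.

    Valid only under independence: -log(product) ~ Gamma(n, 1), whose
    survival function is exp(-t) * sum_{k<n} t**k / k!.
    """
    t = -math.log(product)
    return product * sum(t ** k / math.factorial(k) for k in range(n))

rng = random.Random(1)
n = 8
# Hypothetical fraction matrix; the diagonal is made unusually small.
c = [[1.0 - rng.random() for _ in range(n)] for _ in range(n)]
for i in range(n):
    c[i][i] *= 0.05

p_mc = monte_carlo_p(c, 9999, rng)
p_ind = independence_p(math.exp(log_stat(c, list(range(n)))), n)
print(p_mc, p_ind)
```

Only `monte_carlo_p` remains valid when the fractions are dependent, because it never interprets the product as a probability; it only asks how often a random pairing scores at least as well as the real one.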

