Permutation P-values should never be zero: calculating exact P-values when permutations are randomly drawn
This is a research paper by Belinda Phipson and Gordon K Smyth.
Abstract:
Permutation tests are among the most widely used statistical procedures in current genomic research. They assign p-values to test statistics by randomly permuting the sample or gene labels. However, permutation p-values reported in the genomic literature are frequently computed incorrectly, understated by about 1/m, where m is the number of permutations. The same is often true in the more general situation where Monte Carlo simulation is used to assign p-values. Although the understatement is generally small in absolute terms, the consequences can be serious in a multiple testing context.
The understatement stems from the intuitive but mistaken idea of using permutation to estimate the tail probability of the test statistic. We argue instead that permutation should be viewed as generating an exact discrete null distribution. The relevant literature, some of which is likely to have been relatively inaccessible to the genomic community, is reviewed and summarised.
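The understatement of about 1/m can be illustrated with a minimal sketch. It assumes a two-sample test with an absolute difference in means as the statistic, and it compares the naive estimate b/m with the widely used corrected value (b + 1)/(m + 1), where b is the number of random permutations giving a statistic at least as extreme as the observed one. The function name, the choice of statistic, and the default m are illustrative assumptions, not taken from the paper.

```python
import random


def permutation_pvalues(x, y, m=999, seed=0):
    """Return (b, naive, corrected) for a two-sample permutation test.

    Hypothetical helper for illustration only. The statistic is the
    absolute difference in group means; m is the number of random
    permutations of the pooled labels.
    """
    rng = random.Random(seed)
    observed = abs(sum(x) / len(x) - sum(y) / len(y))
    pooled = list(x) + list(y)
    b = 0  # permutations at least as extreme as the observed data
    for _ in range(m):
        rng.shuffle(pooled)
        px, py = pooled[:len(x)], pooled[len(x):]
        stat = abs(sum(px) / len(px) - sum(py) / len(py))
        if stat >= observed:
            b += 1
    naive = b / m                  # can be exactly zero
    corrected = (b + 1) / (m + 1)  # never below 1 / (m + 1)
    return b, naive, corrected
```

The difference between the two values is (m - b) / (m(m + 1)), which is close to 1/m when b is small. The naive value can be exactly zero, whereas the corrected value cannot fall below 1/(m + 1).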
Conclusions:
A computational strategy is developed for exact p-values when permutations are drawn at random. The strategy is valid for any number of permutations and samples. Some simple recommendations are made for implementing permutation tests in practice.
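One way such an exact computation could look is sketched below. The abstract does not spell out the formula, so the sketch rests on stated assumptions: the m permutations are drawn independently with replacement; there are `total` equally likely distinct label arrangements, counting the observed one; and under the null hypothesis the true tail probability is equally likely to be any of 1/total, 2/total, ..., 1. The exact p-value is then the average of the binomial probability P(B <= b) over those values. The names `binom_cdf`, `exact_permutation_pvalue`, and `total` are hypothetical.

```python
from math import comb


def binom_cdf(b, m, p):
    """P(B <= b) for B ~ Binomial(m, p), by direct summation."""
    return sum(comb(m, k) * p**k * (1 - p) ** (m - k) for k in range(b + 1))


def exact_permutation_pvalue(b, m, total):
    """Average the binomial CDF over the discrete null distribution.

    b     -- random permutations at least as extreme as the observed data
    m     -- number of random permutations drawn (with replacement)
    total -- assumed number of distinct, equally likely arrangements,
             including the observed one
    """
    return sum(binom_cdf(b, m, t / total) for t in range(1, total + 1)) / total
```

Under these assumptions the result is strictly positive even when b = 0, and it does not exceed (b + 1)/(m + 1), which it approaches as `total` grows. For example, `exact_permutation_pvalue(0, 100, 252)` covers a two-group design with five samples per group, where 252 is the number of ways to choose one group of five from ten samples.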