dperm             package:exactRankTests             R Documentation

_D_i_s_t_r_i_b_u_t_i_o_n _o_f _P_e_r_m_u_t_a_t_i_o_n _T_e_s_t_s

_D_e_s_c_r_i_p_t_i_o_n:

     Density, distribution function and quantile function for the
     distribution of permutation tests using the Shift-Algorithm by
     Streitberg and Röhmel.

_U_s_a_g_e:

     dperm(x, scores, m, paired=NULL, tol = 0.01, fact=NULL)
     pperm(q, scores, m, paired=NULL, tol = 0.01, fact=NULL)
     pperm2(q, scores, m, paired=NULL, tol = 0.01, fact=NULL)
     qperm(p, scores, m, paired=NULL, tol = 0.01, fact=NULL)
     rperm(n, scores, m)

_A_r_g_u_m_e_n_t_s:

    x, q: vector of quantiles.

       p: vector of probabilities.

  scores: ranks, midranks or real valued scores of the observations of
          the `x' (first `m' elements) and `y' samples.

       m: sample size of the `x' sample. If `m = length(scores)', scores
          of paired observations are assumed.

  paired: logical. Indicates if paired observations are used. Needed to
          discriminate between a paired problem and the distribution of
          the total sum of the scores (which has mass 1 at the point
          `sum(scores)')

     tol: real. Real valued scores are mapped into integers by
          multiplication with a factor chosen such that the absolute
          difference between the "true" quantile and the approximated
          quantile is less than `tol'. This might not be possible due
          to memory/time limitations.

    fact: real. If `fact' is given, real valued scores are mapped into
          integers using `fact' as factor. `tol' is ignored.

       n: number of observations.

_D_e_t_a_i_l_s:

     The exact distribution of the sum of the first `m' scores is
     evaluated using the Shift-Algorithm by Streitberg and Röhmel
     under the hypothesis of exchangeability (or, equivalently, the
     hypothesis that all permutations of the scores are equally
     likely). The algorithm is able to deal with tied scores, so the
     conditional distribution can be evaluated.
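
     The quantity being evaluated can be illustrated with a minimal
     sketch in base R, assuming small positive integer scores. The
     function `subset_sum_density' is a hypothetical name and not part
     of the package; it uses a plain dynamic-programming recursion over
     (number of scores chosen, sum) and is not the package's
     implementation of the Shift-Algorithm. The scores and `m' are
     illustrative only.

```r
# Hypothetical helper (not from exactRankTests): density of the sum of m
# scores drawn without replacement, all subsets of size m equally likely.
subset_sum_density <- function(scores, m) {
  stopifnot(all(scores > 0), all(scores == round(scores)),
            m >= 1, m <= length(scores))
  S <- sum(scores)
  # H[k + 1, s + 1] counts the subsets of size k with sum s
  H <- matrix(0, nrow = m + 1, ncol = S + 1)
  H[1, 1] <- 1                       # the empty subset has sum 0
  for (a in scores) {
    idx <- (a + 1):(S + 1)           # columns for sums s >= a
    for (k in m:1) {                 # descending: each score used at most once
      H[k + 1, idx] <- H[k + 1, idx] + H[k, idx - a]
    }
  }
  H[m + 1, ] / choose(length(scores), m)   # entry s + 1 is P(sum = s)
}

scores <- c(2, 5, 5, 8, 10)          # illustrative integer scores with a tie
m <- 2
dens <- subset_sum_density(scores, m)

# Brute-force check by enumerating all choose(5, 2) = 10 subsets
sums <- combn(scores, m, FUN = sum)
brute <- c(0, tabulate(sums, nbins = sum(scores))) / choose(length(scores), m)

all.equal(dens, brute)               # both approaches should agree
sum(dens[0:7 + 1])                   # P(sum <= 7)
```

     Tied scores need no special handling in this sketch because they
     simply contribute to the same sums.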

     The algorithm is defined for positive integer valued scores only.
     There are two ways of dealing with real valued scores.  First, one
     can try to find integer valued scores that lead to quantiles which
     differ by no more than `tol' from the quantiles induced by the
     original scores. This can be done as follows.

     Without loss of generality let a_i > 0 denote real valued scores
     and f a positive factor. Let R_i = f \cdot a_i - round(f \cdot a_i).
     Then


    \sum_{i=1}^m f \cdot a_i = \sum_{i=1}^m (round(f \cdot a_i) + R_i).


     Clearly, the maximum difference between \sum_{i=1}^m f \cdot a_i
     and \sum_{i=1}^m round(f \cdot a_i) is given by
     |\sum_{i=1}^m R_{(i)}| or |\sum_{i=m+1}^N R_{(i)}|, respectively.
     Therefore one searches for f with


    \max(|\sum_{i=1}^m R_{(i)}|, |\sum_{i=m+1}^N R_{(i)}|)/f \le tol.


     If f induces more than 20,000 columns in the Streitberg-Röhmel
     Shift-Algorithm, f is restricted to the largest integer that does
     not.
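
     The search for f can be sketched in base R under stated
     assumptions: R_{(i)} is read as the i-th smallest rounding error,
     candidate factors are tried in steps of 1, and the limit on the
     number of columns is not modelled. The helper `rounding_bound' and
     the data are hypothetical; the package may search differently.

```r
# Hypothetical helper (not from exactRankTests): bound on the error of the
# sum of m scores, on the original scale, when f * a is rounded.
rounding_bound <- function(f, a, m) {
  N <- length(a)
  stopifnot(f > 0, m >= 1, m < N)
  R <- sort(f * a - round(f * a))    # assumption: R_(i) are the sorted errors
  max(abs(sum(R[1:m])), abs(sum(R[(m + 1):N]))) / f
}

N <- 10
m <- 5
a <- qnorm((1:N) / (N + 1))          # normal quantile scores (illustrative)
a <- a - min(a) + 1                  # shift so that all scores are positive
tol <- 0.01

# Assumption: plain linear search over integer factors. Each |R_i| <= 0.5,
# so the bound is at most 0.5 * max(m, N - m) / f and the loop must stop.
f <- 1
while (rounding_bound(f, a, m) > tol) f <- f + 1

f                                    # factor found by the search
rounding_bound(f, a, m)              # no larger than tol
round(f * a)                         # integer valued scores induced by f
```

     A larger f gives a tighter bound but also larger integer scores,
     which is where the memory/time limitation comes from.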

     The second idea is to map the scores themselves into {1, ..., N}. This
     induces additional ties, but the shape of the scores is very
     similar. That means we do not try to approximate something but use
     a different test (with integer scores), serving for the same
     purpose (due to a similar shape of the scores). 

     Exact two-sided p-values are computed as suggested in the
     StatXact-4 manual, page 209, equation (9.32) and equation (8.19),
     p. 165 (paired case).

_V_a_l_u_e:

     `dperm' gives the density, `pperm' gives the distribution function
     and `qperm' gives the quantile function. `pperm2' can be used for
     the calculation of two-sided p-values. `rperm' is just a one-line
     wrapper to `sample'.

_A_u_t_h_o_r(_s):

     Torsten Hothorn <Torsten.Hothorn@rzmail.uni-erlangen.de>

_R_e_f_e_r_e_n_c_e_s:

     Bernd Streitberg and Joachim Röhmel (1986).  Exact Distributions
     For Permutations and Rank Tests:  An Introduction to Some Recently
     Published Algorithms.  Statistical Software Newsletter 12, No. 1,
     10-17.

     Bernd Streitberg and Joachim Röhmel (1987).  Exakte Verteilungen
     für Rang- und Randomisierungstests im allgemeinen
     c-Stichprobenfall.  EDV in Medizin und Biologie 18, No. 1, 12-19.

_E_x_a_m_p_l_e_s:

     # exact one-sided p-value of the Wilcoxon test for a tied sample

     x <- c(0.5, 0.5, 0.6, 0.6, 0.7, 0.8, 0.9)
     y <- c(0.5, 1.0, 1.2, 1.2, 1.4, 1.5, 1.9, 2.0)
     r <- rank(c(x,y))
     pperm(sum(r[seq(along=x)]), r, 7)

     # Compare the exact algorithm as implemented in ctest and the
     # Streitberg-Roehmel for untied samples

     # Wilcoxon:

     n <- 10
     x <- rnorm(n, 2)
     y <- rnorm(n, 3)
     r <- rank(c(x,y))

     # exact distribution using Streitberg-Roehmel

     dwexac <- dperm((n*(n+1)/2):(n^2 + n*(n+1)/2), r, n)
     su <- sum(dwexac)           # should be something near 1 :-)
     su
     # compare with a numerical tolerance; exact equality can fail in floating point
     if (!isTRUE(all.equal(su, 1))) stop("sum(dwexac) not equal 1")

     # exact distribution using dwilcox

     dw <- dwilcox(0:(n^2), n, n)

     # compare the two distributions:

     plot(dw, dwexac, main="Wilcoxon", xlab="dwilcox", ylab="dperm")      
     # should give a "perfect" line

     # Wilcoxon signed rank test

     n <- 10
     x <- rnorm(n, 5)
     y <- rnorm(n, 5)
     r <- rank(abs(x - y))
     pperm(sum(r[x - y > 0]), r, length(r))
     wilcox.test(x, y, paired=TRUE, alternative="less")
     psignrank(sum(r[x - y > 0]), length(r))

     # Ansari-Bradley

     n <- 10
     x <- rnorm(n, 2, 1)
     y <- rnorm(n, 2, 2)

     # exact distribution using Streitberg-Roehmel

     r <- rank(c(x,y))
     sc <- pmin(r, 2*n - r +1)
     dabexac <- dperm(0:(n*(2*n+1)/2), sc, n)
     sum(dabexac)
     tr <- which(dabexac > 0)

     # exact distribution using dansari (wrapper to ansari.c in ctest)

     dab <- dansari(0:(n*(2*n+1)/2), n, n)

     # compare the two distributions:

     plot(dab[tr], dabexac[tr], main="Ansari", xlab="dansari", ylab="dperm")

     # real scores are allowed (but only result in an approximation)
     # e.g. v.d. Waerden test

     n <- 10
     x <- rnorm(n)
     y <- rnorm(n)
     N <- length(x) + length(y)
     r <- rank(c(x,y))
     scores <- qnorm(r/(N+1))
     X <- sum(scores[seq(along=x)])  # <- v.d. Waerden normal quantile statistic

     # critical value, two-sided test

     abs(qperm(0.025, scores, length(x)))

     # p-values

     p1 <- pperm2(X, scores, length(x))

     # generate integer valued scores with the same shape as normal quantile
     # scores; this is no longer the v.d. Waerden test, but something very similar

     scores <- scores - min(scores)
     scores <- round(scores*N/max(scores))

     X <- sum(scores[seq(along=x)])
     p2 <- pperm2(X, scores, length(x))

     # compare p1 and p2

     p1 - p2

     # the blood pressure example from StatXact:

     treat <- c(94, 108, 110, 90)
     contr <- c(80, 94, 85, 90, 90, 90, 108, 94, 78, 105, 88)

     # compute the v.d. Waerden test and compare the results to StatXact:

     r <- rank(c(contr, treat))
     sc <- qnorm(r/16)
     X <- sum(sc[seq(along=contr)])
     round(pperm(X, sc, 11), 4)      # == 0.0462 (StatXact)
     round(pperm2(X, sc, 11), 4)     # == 0.0799 (StatXact)

     # the alternative method returns:

     sc <- sc - min(sc)
     sc <- round(sc*16/max(sc))
     X <- sum(sc[seq(along=contr)])

     round(pperm(X, sc, 11), 4)      # compare to 0.0462 
     round(pperm2(X, sc, 11), 4)     # compare to 0.0799

