coding               package:twostage               R Documentation

_c_o_m_b_i_n_e_s _t_w_o _o_r _m_o_r_e _s_u_r_r_o_g_a_t_e/_a_u_x_i_l_i_a_r_y _v_a_r_i_a_b_l_e_s _i_n_t_o _a _v_e_c_t_o_r

_D_e_s_c_r_i_p_t_i_o_n:

     Recodes a matrix of categorical variables into a vector that takes a
     unique value for each combination of their values.


     BACKGROUND

     From the matrix z of first-stage covariates, this function creates a
     vector that takes a unique value for each combination of values, with
     the first column of z varying fastest, as follows:

       z1  z2  z3  new.z
        0   0   0      1
        1   0   0      2
        0   1   0      3
        1   1   0      4
        0   0   1      5
        1   0   1      6
        0   1   1      7
        1   1   1      8

     If some combinations do not occur in the data, the codes are adjusted
     so that they remain consecutive: for example, if the combination
     (0,1,1) were absent above, then (1,1,1) would be coded as 7.
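     A minimal sketch of this coding rule in plain R, for illustration
     only: `recode_combinations' is a hypothetical helper, not part of
     twostage and not necessarily how `coding' is implemented. It assumes
     the ordering shown in the table (first column of z varying fastest,
     levels within each column in ascending order) and reproduces both the
     table and the renumbering when (0,1,1) is absent.

```r
# Hypothetical helper (not part of twostage): gives each distinct row of z a
# consecutive integer code, with the first column varying fastest and each
# column's levels in ascending order, as in the table above.
recode_combinations <- function(z) {
  z <- as.data.frame(z)
  idx <- numeric(nrow(z))   # mixed-radix position of each row's combination
  radix <- 1
  for (j in seq_along(z)) {
    f <- factor(z[[j]])                        # levels sorted ascending
    idx <- idx + (as.integer(f) - 1) * radix
    radix <- radix * nlevels(f)
  }
  # Rank the observed positions so absent combinations leave no gaps
  match(idx, sort(unique(idx)))
}

z <- expand.grid(z1 = 0:1, z2 = 0:1, z3 = 0:1)  # rows in the table's order
recode_combinations(z)          # 1 2 3 4 5 6 7 8
recode_combinations(z[-7, ])    # (0,1,1) removed: (1,1,1) becomes 7
```

     The same rule applied to the two CASS2 first-stage variables (sex
     varying fastest within weight category) yields codes 1 to 6, in the
     order shown in the Examples section.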


     These values are reported as `new.z' in the printed output (see
     `Value' below).


     Run this function on the second-stage data before calling
     `ms.nprev': its output shows the order in which `ms.nprev' expects
     the first-stage sample sizes to be supplied.

_U_s_a_g_e:


     coding(y=y,x=x,z=z,return=F)

_A_r_g_u_m_e_n_t_s:

     REQUIRED ARGUMENTS

       y: response variable (must be binary, coded 0/1)

       x: matrix of predictor variables for the regression model

       z: matrix of surrogate or auxiliary variables, all of which must
          be categorical


     OPTIONAL ARGUMENTS

  return: logical; if TRUE (T), the original surrogate or auxiliary
          variables and the recoded variables are returned. The default
          is FALSE (F).

_V_a_l_u_e:

     This function does not return a value unless `return' = T.


     If used with second-stage (i.e. complete) data only, it prints the
     following:

  ylevel: the distinct values (or levels) of the response variable

z1 ... zi: the distinct values of the first-stage variables z1 ... zi

   new.z: recoded first-stage variables. Each value represents a unique
          combination of first-stage variable values.

      n2: second-stage sample sizes in each (`ylevel',`new.z') stratum.


     If used with combined first- and second-stage data (i.e. with NA for
     missing values), the function also prints, in addition to the items
     above:

      n1: first-stage sample sizes in each (`ylevel',`new.z') stratum.
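     A minimal sketch of how the (`ylevel',`new.z') stratum sizes could be
     tabulated in R. It assumes, without confirmation from this page, that
     every row counts toward n1 and that a row belongs to the second stage
     when it has no NA in x. `stratum_sizes' is a hypothetical helper, not
     part of twostage; strata are listed with `new.z' varying fastest
     within each `ylevel', matching the order of the example output.

```r
# Hypothetical helper (not part of twostage). Assumptions: every row counts
# toward n1, and a row is second-stage when x has no NA in that row.
stratum_sizes <- function(y, x, new.z) {
  fy <- factor(y, levels = sort(unique(y)))
  fz <- factor(new.z, levels = sort(unique(new.z)))
  complete <- complete.cases(x)
  # table(fz, fy) is indexed [new.z, ylevel]; as.vector() reads it column by
  # column, so new.z varies fastest within each ylevel
  n1 <- as.vector(table(fz, fy))
  n2 <- as.vector(table(fz[complete], fy[complete]))
  grid <- expand.grid(new.z = sort(unique(new.z)), ylevel = sort(unique(y)))
  data.frame(ylevel = grid$ylevel, new.z = grid$new.z, n1 = n1, n2 = n2)
}

y     <- c(0, 0, 1, 1, 1)
new.z <- c(1, 2, 1, 2, 2)
x     <- cbind(a = c(1, NA, 3, 4, 5), b = 2)   # row 2 is first-stage only
stratum_sizes(y, x, new.z)   # n1 = 1 1 1 2, n2 = 1 0 1 2
```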

_R_e_f_e_r_e_n_c_e_s:

     Reilly,M. 1996. Optimal sampling strategies for 
 two-stage
     studies. Amer. J. Epidemiol. 
 143:92-100

_S_e_e _A_l_s_o:

     `ms.nprev',`fixed.n',`budget'
 `precision', `cass1',`cass2'

_E_x_a_m_p_l_e_s:



     The CASS2 data set in Reilly (1996) has two categorical first-stage
     variables: column 2 (sex) and column 3 (categorical weight). The
     predictor variables are column 2 (sex) and columns 4-9, and the
     response variable is column 1 (mort). See help(cass2) for further
     details.

     The commands

          data(cass2)
          coding(x = cass2[, c(2, 4:9)], y = cass2[, 1], z = cass2[, 2:3])

     give the following coding scheme and second-stage sample sizes (n2):

     [1] "For calls requiring n1 or prev as input, use the following order"
           ylevel sex wtcat new.z n2
      [1,]      0   0     1     1 10
      [2,]      0   1     1     2 10
      [3,]      0   0     2     3 10
      [4,]      0   1     2     4 10
      [5,]      0   0     3     5 10
      [6,]      0   1     3     6 10
      [7,]      1   0     1     1  8
      [8,]      1   1     1     2 10
      [9,]      1   0     2     3 10
     [10,]      1   1     2     4 10
     [11,]      1   0     3     5 10
     [12,]      1   1     3     6 10


