meanscore             package:meanscore             R Documentation

_M_e_a_n _S_c_o_r_e _M_e_t_h_o_d _f_o_r _M_i_s_s_i_n_g _C_o_v_a_r_i_a_t_e _D_a_t_a _i_n _L_o_g_i_s_t_i_c _R_e_g_r_e_s_s_i_o_n _M_o_d_e_l_s

_D_e_s_c_r_i_p_t_i_o_n:

     Weighted logistic regression using the Mean Score method

_U_s_a_g_e:


             meanscore(y=y,x=x,z=z,factor=NULL,print.all=FALSE)

_A_r_g_u_m_e_n_t_s:

       y: response variable (binary 0-1)

       x: matrix of predictor variables, one column of which contains
          some missing values (NA)

       z: matrix of the surrogate or auxiliary variables, which must
          be categorical

  factor: optional factor variables; if the columns of the matrix of
          predictor variables have names, supply these names, 
          otherwise supply the column numbers. MEANSCORE will fit 
          separate coefficients for each level of the factor variables.

print.all: logical; if TRUE, the Fisher information matrix and the
          variance of the score for each stratum are also returned
          (see Value). The default is FALSE.

_D_e_t_a_i_l_s:

     The response, predictor and surrogate variables must be numeric.
     The function will automatically call the CODING function to
     recode the z matrix to give a `new.z' vector which takes a
     unique value for each combination (type help(`coding') for
     further particulars), as follows:

       z1  z2  z3  new.z
        0   0   0      1
        1   0   0      2
        0   1   0      3
        1   1   0      4
        0   0   1      5
        1   0   1      6
        0   1   1      7
        1   1   1      8

     The values of this recoded variable are reported as `new.z' in
     the output; see `coding'.
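
     The recoding tabulated above can be sketched for the special case
     of binary (0/1) surrogate columns. The helper `recode_z' below is
     hypothetical and written only for illustration; it is not the
     package's `coding' function, and it makes no attempt to handle
     surrogates with more than two categories.

```r
# Hypothetical helper (not part of the meanscore package): maps each
# combination of binary 0/1 columns to a unique integer, with the
# first column varying fastest, as in the table above.
recode_z <- function(z) {
  z <- as.matrix(z)
  stopifnot(all(z %in% c(0, 1)))           # binary columns only
  weights <- 2^(seq_len(ncol(z)) - 1)      # 1, 2, 4, ...
  as.vector(z %*% weights) + 1             # codes start at 1
}

# All eight combinations of three binary surrogates, z1 varying fastest
z <- as.matrix(expand.grid(z1 = 0:1, z2 = 0:1, z3 = 0:1))
cbind(z, new.z = recode_z(z))
```

     Applied to the eight combinations of z1, z2 and z3, this assigns
     the codes 1 to 8 in the same row order as the table.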

_V_a_l_u_e:

     A list called "parameters" containing the following will be
     returned:

     est: the vector of estimates of the regression coefficients

      se: the vector of standard errors of the estimates

       z: Wald statistic for each coefficient

  pvalue: 2-sided p-value (H0: coeff=0)

     When print.all = TRUE, the following are also returned:

    Ihat: the Fisher information matrix

   varsi: variance of the score for each (ylevel,zlevel) stratum
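
     The `z' and `pvalue' columns are consistent with the usual Wald
     definitions (z = est/se, with a two-sided normal p-value). The
     sketch below applies those assumed formulas to the estimates
     printed in the simulated-data example on this page; it does not
     call meanscore, and the formulas are an assumption about how the
     columns are computed.

```r
# Estimates and standard errors copied from the simNA example output
est <- c("(Intercept)" = 0.0493998, x = 1.0188437)
se  <- c("(Intercept)" = 0.07155138, x = 0.10187094)

# Assumed Wald statistic and two-sided p-value under H0: coeff = 0
z      <- est / se
pvalue <- 2 * pnorm(-abs(z))

cbind(est, se, z, pvalue)
```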

_R_e_f_e_r_e_n_c_e_s:

     Reilly, M. and M.S. Pepe. 1995. A mean score method for missing
     and auxiliary covariate data in regression models. Biometrika
     82:299-314.

_S_e_e _A_l_s_o:

     `ms.nprev', `coding', `ectopic', `simNA', `glm'.

_E_x_a_m_p_l_e_s:



     THE SIMULATED DATASET EXAMPLE

     We use the simulated dataset which is stored in the matrix simNA.
     You can load the dataset using:

     data(simNA) 

     help (simNA)
     #gives a detailed description of the data.

     To analyze this data using the meanscore function:

     meanscore(y=simNA[,1],z=simNA[,2],x=simNA[,3])

     This will give the following:

     [1] "For calls to ms.nprev, input n1 or prev in the following order!!"
          ylevel z new.z  n1  n2
     [1,]      0 0     0 310 150
     [2,]      0 1     1 166  85
     [3,]      1 0     0 177  86
     [4,]      1 1     1 347 179

     $parameters
                       est         se          z    pvalue
     (Intercept) 0.0493998 0.07155138  0.6904103 0.4899362
     x           1.0188437 0.10187094 10.0013188 0.0000000
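
     In the stratum table above, the n2 column sums to 500, the number
     of complete cases, so n1 and n2 appear to be the counts of all
     observations and of observations with x observed in each
     (ylevel, z) stratum. Under that reading, counts of this kind can
     be tabulated as sketched below. The matrix `toy' is made up for
     illustration and is not the simNA data.

```r
# Made-up data for illustration: columns y, z, x, with x partly missing
toy <- cbind(y = c(0,   0,  0,   1,  1,   1,  1,    0),
             z = c(0,   0,  1,   1,  1,   0,  0,    1),
             x = c(0.2, NA, 1.1, NA, 0.7, NA, -0.3, NA))

observed <- !is.na(toy[, "x"])

# n1: all observations per (y, z) stratum
n1 <- table(y = toy[, "y"], z = toy[, "z"])
# n2: observations with x observed per (y, z) stratum
n2 <- table(y = toy[observed, "y"], z = toy[observed, "z"])

n1
n2
```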

     If you extract the complete cases (n=500) to a matrix called
     "complete", using

     complete <- simNA[!is.na(simNA[,3]),]

     then 
     summary(glm(complete[,1]~complete[,3], family="binomial"))

     gives the following results:

     Coefficients:
                   Estimate Std. Error z value Pr(>|z|)    
     (Intercept)    0.05258    0.09879   0.532    0.595    
     complete[, 3]  1.01942    0.12050   8.460   <2e-16 ***



     Notice that the Mean Score estimates above had smaller 
     standard errors, reflecting the additional information
     in the incomplete observations used in the analysis.
     Also note that since z is a surrogate for x, it is not 
     used in the complete case analysis.
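
     The size of the gain can be read off the two sets of output
     above. The sketch below copies the printed standard errors and
     computes their ratios; the squared ratio is offered only as a
     rough indication of relative efficiency, not as a result reported
     by either function.

```r
# Standard errors copied from the two outputs above
se_ms <- c("(Intercept)" = 0.07155138, x = 0.10187094)  # Mean Score
se_cc <- c("(Intercept)" = 0.09879,    x = 0.12050)     # complete cases

# Ratios above 1 mean the complete-case analysis is less precise
se_cc / se_ms

# Squared ratios: rough variance ratio, complete case vs Mean Score
(se_cc / se_ms)^2
```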



     THE ECTOPIC DATASET EXAMPLE

     This is a real-data example of an application of Mean Score
     to a case-control study of the association between ectopic 
     pregnancy and sexually transmitted diseases (see Reilly and 
     Pepe, 1995). To learn more about the dataset, type help(ectopic). 

     The data frame called "ectopic" is in the data subfolder
     of the meanscore package. You can load the data by typing:

     data(ectopic)

     The following lines will reproduce the results presented in Table 3 
     of Reilly & Pepe (1995)

     # use gonnorhoea, contracept and sexpatr as auxiliary variables
     ectopic.z <- ectopic[,3:5]

     # the auxiliary variables defined above and the chlamydia antibody status 
     # are the predictor variables in the logistic regression model          
     ectopic.x <- ectopic[,2:5]

     meanscore(x=ectopic.x,z=ectopic.z,y=ectopic[,1])


