emclust1               package:mclust               R Documentation

_B_I_C _f_r_o_m _h_i_e_r_a_r_c_h_i_c_a_l _c_l_u_s_t_e_r_i_n_g _f_o_l_l_o_w_e_d _b_y _E_M _f_o_r _a _p_a_r_a_m_e_t_e_r_i_z_e_d _G_a_u_s_s_i_a_n _m_i_x_t_u_r_e _m_o_d_e_l.

_D_e_s_c_r_i_p_t_i_o_n:

     Bayesian Information Criterion for various numbers of clusters,
     computed from hierarchical clustering followed by EM for a
     selected parameterization of Gaussian mixture models, possibly
     with Poisson noise.
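
     As a rough illustration of the quantity being reported, the sketch
     below computes a BIC value by hand for the simplest case: a single
     unconstrained Gaussian (one cluster) fitted to the iris
     measurements. It assumes the sign convention BIC = 2 * loglik -
     npar * log(n), under which larger values indicate a better model;
     the variable names are illustrative and are not part of the mclust
     interface.

```r
# Hand-computed BIC for one unconstrained Gaussian (base R only).
# Assumption: BIC = 2 * loglik - npar * log(n), larger is better.
x <- as.matrix(iris[, 1:4])
n <- nrow(x)
d <- ncol(x)

# Maximum likelihood covariance estimate (divisor n rather than n - 1)
S <- cov(x) * (n - 1) / n
logdetS <- as.numeric(determinant(S, logarithm = TRUE)$modulus)

# Maximized Gaussian log-likelihood at the sample mean and S
loglik <- -n / 2 * (d * log(2 * pi) + logdetS + d)

# Free parameters: d for the mean, d(d+1)/2 for the covariance matrix
npar <- d + d * (d + 1) / 2

bic <- 2 * loglik - npar * log(n)
bic
```

     With more than one cluster the log-likelihood comes from the EM
     fit and the parameter count grows with the number of clusters and
     the chosen parameterization.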

_U_s_a_g_e:

     emclust1(data, nclus, modelid, k, equal=F, noise, Vinv)

_A_r_g_u_m_e_n_t_s:

    data: matrix of observations. 

   nclus: An integer vector specifying the numbers of clusters for
          which the BIC is to be calculated. Default: 1:9 without
          noise; 0:9 with noise. 

 modelid: A character string or vector of two character strings
          specifying the model(s) to be used in the hierarchical
          clustering and EM phases of the BIC calculations. The
          allowed values of `modelid' and their interpretations are as
          follows: `"EI"' : uniform spherical, `"VI"' : spherical,
          `"EEE"' : uniform variance, `"VVV"' : unconstrained
          variance, `"EEV"' : uniform shape and volume, `"VEV"' :
          uniform shape. Default: `c("VVV","VVV")' (unconstrained
          variance for both phases).

       k: If `k' is specified, the hierarchical clustering phase is
          run on a sample of size `k' drawn from the data. The default
          is to use the entire data set.

   equal: Logical variable indicating whether or not the mixing
          proportions are equal in the model. The default is to assume
          they are unequal. 

   noise: A logical vector of length equal to the number of
          observations in the data, whose elements flag the
          observations initially estimated to be noise (indicated by
          `T'). By default, `emclust1' fits Gaussian mixture models in
          which it is assumed there is no noise. If `noise' is
          specified, `emclust1' will fit a Gaussian mixture with a
          Poisson term for noise in the EM phase.

    Vinv: An estimate of the inverse hypervolume of the data region
          (needed only if `noise' is specified). Default : determined
          by the function `hypvol'. 
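
     To give a feel for what an inverse hypervolume is, the sketch
     below takes the reciprocal of the volume of the axis-aligned
     bounding box of the data. This is only an assumed, simple
     approximation for illustration; it is not claimed to reproduce
     what `hypvol' computes, and the names used are hypothetical.

```r
# Inverse hypervolume of the data region, approximated by the
# axis-aligned bounding box (an illustrative assumption, not `hypvol').
x <- as.matrix(iris[, 1:4])

# Side length of the box along each variable
side_lengths <- apply(x, 2, function(v) diff(range(v)))

box_volume  <- prod(side_lengths)
Vinv_sketch <- 1 / box_volume
Vinv_sketch
```

     A value of this kind could be passed as `Vinv' when `noise' is
     specified, in place of the default.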

_V_a_l_u_e:

     Bayesian Information Criterion for the selected mixture model and
     specified numbers of clusters. Auxiliary information is returned
     as attributes.

_N_O_T_E:

     The reciprocal condition estimate returned as an attribute ranges
     in value between 0 and 1. The closer this estimate is to zero, the
     more likely it is that the corresponding EM result (and BIC) is
     contaminated by roundoff error.
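
     The behaviour of a reciprocal condition estimate can be seen with
     the base R function `rcond', used here as a stand-in; the matrices
     are made up for illustration, and the estimate attached to the
     `emclust1' result may be computed differently.

```r
# Reciprocal condition estimates for a well-conditioned and a nearly
# singular matrix, using base R's rcond().
well <- diag(3)                                # identity matrix
ill  <- matrix(c(1, 1, 1, 1 + 1e-10), 2, 2)    # almost rank-deficient

rc_well <- rcond(well)   # at the upper end of the 0 to 1 range
rc_ill  <- rcond(ill)    # very close to zero

c(well = rc_well, ill = rc_ill)
```

     Results whose estimate resembles the second case should be treated
     with caution.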

_R_e_f_e_r_e_n_c_e_s:

     C. Fraley and A. E. Raftery, How many clusters? Which clustering
     method? Answers via model-based cluster analysis. Technical Report
     No. 329, Dept. of Statistics, U. of Washington (February 1998).

     R. Kass and A. E. Raftery, Bayes Factors. Journal of the American
     Statistical Association 90:773-795 (1995).

_S_e_e _A_l_s_o:

     `summary.emclust1', `emclust', `mhtree', `me'

_E_x_a_m_p_l_e_s:

     data(iris)
     emclust1(iris[,1:4], nclus=2:3, modelid = c("VVV","EEV"))

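     data(chevron)
     # flag observations whose second coordinate exceeds 60 as initial noise
     noisevec <- rep(0, nrow(chevron))
     noisevec[chevron[,2]>60] <- 1
     # BIC for 0 to 5 clusters, with a Poisson noise term in the EM phase
     bicvals <- emclust1(chevron, noise=noisevec, nclus=0:5)
     sumry <- summary(bicvals, chevron)
     plot(chevron, col=ztoc(sumry$z), pch=ztoc(sumry$z))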

