emclust                package:mclust                R Documentation

_B_I_C _f_r_o_m _h_i_e_r_a_r_c_h_i_c_a_l _c_l_u_s_t_e_r_i_n_g _f_o_l_l_o_w_e_d _b_y _E_M _f_o_r _s_e_v_e_r_a_l _p_a_r_a_m_e_t_e_r_i_z_e_d _G_a_u_s_s_i_a_n _m_i_x_t_u_r_e _m_o_d_e_l_s.

_D_e_s_c_r_i_p_t_i_o_n:

     Bayesian Information Criterion for various models and numbers of
     clusters computed from hierarchical clustering followed by EM for
     several parameterizations of Gaussian mixture models possibly with
     Poisson noise.

_U_s_a_g_e:

     emclust(data, nclus, modelid, k, equal=FALSE, noise, Vinv)

_A_r_g_u_m_e_n_t_s:

    data: A matrix of observations, one row per observation.

   nclus: An integer vector specifying the numbers of clusters for
          which the BIC is to be calculated. Default: 1:9 without
          noise; 0:9 with noise. 

 modelid: A vector of character strings indicating the models to be
          fitted. The allowed values for `modelid' and their
          interpretation are as follows: `"EI"' : uniform spherical,
          `"VI"' : spherical, `"EEE"' : uniform variance, `"VVV"' :
          unconstrained variance, `"EEV"' : uniform shape and volume,
          `"VEV"' : uniform shape. The default is to fit all of the
          models. 

       k: If `k' is specified, the initial hierarchical clustering
          phase uses a sample of size `k' drawn from the data. The
          default is to use the entire data set.

   equal: Logical variable indicating whether or not the mixing
          proportions are equal in the model. The default is to assume
          they are unequal. 

   noise: A logical vector of length equal to the number of
          observations in the data, whose elements give an initial
          estimate of which observations are noise (`TRUE' marks
          noise). By default, `emclust' fits Gaussian mixture models
          that assume there is no noise. If `noise' is specified,
          `emclust' fits a Gaussian mixture with a Poisson term for
          noise in the EM phase.

    Vinv: An estimate of the inverse hypervolume of the data region
          (needed only if `noise' is specified). Default: determined
          by the function `hypvol'.

_V_a_l_u_e:

     The Bayesian Information Criterion for the fitted mixture models
     (all six by default) and the specified numbers of clusters.
     Auxiliary information is returned as attributes.

_N_O_T_E:

     The hierarchical clustering phase uses the unconstrained model.
     The reciprocal condition estimate returned as an attribute ranges
     in value between 0 and 1. The closer this estimate is to zero, the
     more likely it is that the corresponding EM result (and its BIC)
     is contaminated by roundoff error.

_R_e_f_e_r_e_n_c_e_s:

     C. Fraley and A. E. Raftery, How many clusters? Which clustering
     method? Answers via model-based cluster analysis. Technical Report
     No. 329, Dept. of Statistics, U. of Washington (February 1998).

     R. Kass and A. E. Raftery, Bayes Factors. Journal of the American
     Statistical Association 90:773-795 (1995).

_S_e_e _A_l_s_o:

     `summary.emclust', `emclust1', `mhtree', `me'

_E_x_a_m_p_l_e_s:

     data(iris)
     bicvals <- emclust(iris[,1:4], nclus=1:3, modelid=c("VVV","EEV","VEV"))
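
     # Illustration in base R (not part of `emclust'): a hand check of the
     # BIC for one cluster under the unconstrained ("VVV") model. The
     # convention BIC = 2*loglik - npar*log(n) is an assumption here, and
     # `bic.by.hand' and the other names are made up for this sketch.
     x <- as.matrix(iris[,1:4])
     n <- nrow(x); d <- ncol(x)
     Sigma <- cov(x) * (n - 1) / n          # maximum likelihood covariance
     logdet <- as.numeric(determinant(Sigma, logarithm=TRUE)$modulus)
     loglik <- -0.5 * n * (d * log(2 * pi) + logdet + d)
     npar <- d + d * (d + 1) / 2            # mean plus covariance parameters
     bic.by.hand <- 2 * loglik - npar * log(n)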

     data(chevron)
     # `noise' is documented as a logical vector: TRUE marks initial noise
     noisevec <- rep(FALSE, nrow(chevron))
     noisevec[chevron[,2]>60] <- TRUE
     bicvals <- emclust(chevron, noise=noisevec)
     sumry <- summary(bicvals, chevron)
     plot(chevron, col=ztoc(sumry$z), pch=ztoc(sumry$z))
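
     # Illustration in base R (not part of `emclust'): how a reciprocal
     # condition number behaves. Base R's `rcond' stands in for the
     # estimate described in the note on roundoff error; that `emclust'
     # computes its estimate the same way is an assumption. `x.dep' is a
     # made-up name for this sketch.
     x <- as.matrix(iris[,1:4])
     rcond(cov(x))                          # full-rank covariance
     x.dep <- cbind(x, x[,1] + x[,2])       # add a linearly dependent column
     rcond(cov(x.dep))                      # singular covariance: at or near zero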

