mona                 package:cluster                 R Documentation

_M_o_n_o_t_h_e_t_i_c _A_n_a_l_y_s_i_s

_D_e_s_c_r_i_p_t_i_o_n:

     Returns a list representing a divisive hierarchical clustering of
     a dataset with binary variables only.

_U_s_a_g_e:

     mona(x)

_A_r_g_u_m_e_n_t_s:

       x: data matrix or dataframe in which each row corresponds to an
          observation, and each column corresponds to a variable. All
          variables must be binary. A limited number of missing values
          (NAs) is allowed. Every observation must have at least one
          value different from NA. No variable should have half of its
          values missing. There must be at least one variable which has
          no missing values. A variable whose non-missing values are
          all identical is not allowed.
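
     The requirements on `x' listed above can be checked before
     calling `mona'. The following is a minimal sketch, not part of
     the package: `check_mona_input' is a hypothetical helper, it
     assumes the binary values are coded 0/1, and it reads "half of
     its values missing" as "half or more".

```r
# Hypothetical helper (not part of the cluster package): returns a
# character vector describing every documented requirement that `x`
# violates; the vector is empty if no violation was found.
check_mona_input <- function(x) {
  x <- as.matrix(x)
  problems <- character(0)

  # All non-missing values must be binary (assumed coded as 0/1).
  if (!all(x[!is.na(x)] %in% c(0, 1)))
    problems <- c(problems, "non-binary values present")

  # Every observation needs at least one non-missing value.
  if (any(rowSums(!is.na(x)) == 0))
    problems <- c(problems, "an observation has only NAs")

  # No variable should have half (assumed: half or more) of its values missing.
  if (any(colMeans(is.na(x)) >= 0.5))
    problems <- c(problems, "a variable has half of its values missing")

  # At least one variable must be free of missing values.
  if (!any(colSums(is.na(x)) == 0))
    problems <- c(problems, "no variable is free of NAs")

  # A variable whose non-missing values are all identical is not allowed.
  constant <- apply(x, 2, function(v) length(unique(v[!is.na(v)])) <= 1)
  if (any(constant))
    problems <- c(problems, "a variable is constant")

  problems
}

# Small made-up data set that satisfies all requirements.
good <- cbind(a = c(0, 1, 0, 1), b = c(1, 1, 0, NA), c = c(0, 0, 1, 1))
check_mona_input(good)
```

     Collecting all violations rather than stopping at the first one
     makes it easier to repair a data set in a single pass.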

_D_e_t_a_i_l_s:

     `mona' is fully described in chapter 7 of Kaufman and Rousseeuw
     (1990). It is "monothetic" in the sense that each division is
     based on a single (well-chosen) variable, whereas most other
     hierarchical methods (including `agnes' and `diana') are
     "polythetic", i.e. they use all variables together.

     The `mona' algorithm constructs a hierarchy of clusterings,
     starting with one large cluster. Clusters are divided until all
     observations in the same cluster have identical values for all
     variables. At each stage, all clusters are divided according to
     the values of one variable. A cluster is divided into one cluster
     with all observations having value 1 for that variable, and
     another cluster with all observations having value 0 for that
     variable.
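
     A single division step and the stopping rule described above
     might be sketched as follows. The helpers `split_on_variable'
     and `is_homogeneous' are hypothetical names used only for
     illustration; they assume a complete 0/1 matrix and are not the
     package's internal code.

```r
# Hypothetical helper: split the observations in `idx` (row indices of
# `x`) on the binary variable `var` into a "ones" and a "zeros" cluster.
split_on_variable <- function(x, idx, var) {
  list(ones  = idx[x[idx, var] == 1],
       zeros = idx[x[idx, var] == 0])
}

# Hypothetical helper: a cluster needs no further division once all of
# its observations have identical values for all variables.
is_homogeneous <- function(x, idx) {
  nrow(unique(x[idx, , drop = FALSE])) == 1
}

# Made-up data: four observations, two binary variables.
x <- cbind(f = c(1, 0, 1, 0), g = c(1, 1, 0, 0))
parts <- split_on_variable(x, 1:4, "f")
parts
is_homogeneous(x, parts$ones)
```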

     The variable used for splitting a cluster is the variable with the
     maximal total association to the other variables, according to the
     observations in the cluster to be split. The association
     between variables f and g is given by a(f,g)*d(f,g) -
     b(f,g)*c(f,g), where a(f,g), b(f,g), c(f,g), and d(f,g) are the
     numbers in the contingency table of f and g. [That is, a(f,g)
     (resp. d(f,g)) is the number of observations for which f and g
     both have value 0 (resp. value 1); b(f,g) (resp. c(f,g)) is the
     number of observations for which f has value 0 (resp. 1) and g has
     value 1 (resp. 0).] The total association of a variable f is the
     sum of its associations to all variables.
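
     The association measure can be written out directly from the
     contingency counts. The sketch below uses hypothetical helper
     names and assumes complete 0/1 data. Two further assumptions are
     not stated in the text above: the total is taken over the
     absolute values of the pairwise associations, and the
     association of a variable with itself is left out.

```r
# Signed association a*d - b*c between two complete 0/1 vectors.
association <- function(f, g) {
  a  <- sum(f == 0 & g == 0)   # both 0
  d  <- sum(f == 1 & g == 1)   # both 1
  b  <- sum(f == 0 & g == 1)   # f is 0, g is 1
  cc <- sum(f == 1 & g == 0)   # f is 1, g is 0
  a * d - b * cc
}

# Total association of every column of `x`.
# Assumptions: absolute values are summed, the diagonal is excluded.
total_association <- function(x) {
  p <- ncol(x)
  A <- matrix(0, p, p, dimnames = list(colnames(x), colnames(x)))
  for (i in seq_len(p))
    for (j in seq_len(p))
      if (i != j) A[i, j] <- association(x[, i], x[, j])
  rowSums(abs(A))
}

# Made-up cluster of five observations on three binary variables.
x <- cbind(f = c(0, 0, 1, 1, 1),
           g = c(0, 0, 1, 1, 0),
           h = c(1, 0, 1, 0, 1))
tot <- total_association(x)
tot
# Candidate splitting variable(s): those reaching the maximal total.
names(tot)[tot == max(tot)]
```

     Ties are possible, as in this made-up example; how they are
     broken is not specified here.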

     The algorithm cannot handle missing values, so the data are first
     revised, i.e. all missing values are filled in. To do
     this, the same measure of association between variables is used as
     in the algorithm. When variable f has missing values, the variable
     g with the largest absolute association to f is looked up. When
     the association between f and g is positive, any missing value of
     f is replaced by the value of g for the same observation. If the
     association between f and g is negative, then any missing value of
     f is replaced by the value of 1-g for the same observation.
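
     The fill-in rule might look as follows in code. This is a sketch
     under stated assumptions, not the package's implementation:
     `impute_binary' is a hypothetical helper, the donor variable g is
     searched only among variables without missing values, the
     association is computed on the observations where f is observed,
     and a zero association is treated like a positive one.

```r
# Hypothetical helper: fill in the NAs of a 0/1 matrix using the most
# strongly associated complete variable.
impute_binary <- function(x) {
  x <- as.matrix(x)
  complete <- which(colSums(is.na(x)) == 0)
  stopifnot(length(complete) > 0)   # at least one variable without NAs
  filled <- x
  for (f in which(colSums(is.na(x)) > 0)) {
    seen <- !is.na(x[, f])
    # Signed association a*d - b*c of f with each complete variable,
    # computed on the rows where f is observed (assumption).
    assoc <- sapply(complete, function(g) {
      fv <- x[seen, f]
      gv <- x[seen, g]
      sum(fv == 0 & gv == 0) * sum(fv == 1 & gv == 1) -
        sum(fv == 0 & gv == 1) * sum(fv == 1 & gv == 0)
    })
    k <- which.max(abs(assoc))      # largest absolute association
    g <- complete[k]
    # Positive association: copy g; negative association: copy 1 - g.
    # A zero association is treated as positive here (assumption).
    filled[!seen, f] <- if (assoc[k] >= 0) x[!seen, g] else 1 - x[!seen, g]
  }
  filled
}

# Made-up data: f has one missing value and agrees with g where observed.
x <- cbind(f = c(1, 1, 0, 0, NA),
           g = c(1, 1, 0, 0, 1),
           h = c(0, 1, 0, 1, 0))
impute_binary(x)
```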

_V_a_l_u_e:

     an object of class `"mona"' representing the clustering. See
     `mona.object' for details.

_B_A_C_K_G_R_O_U_N_D:

     Cluster analysis divides a dataset into groups (clusters) of
     observations that are similar to each other. Hierarchical methods
     like `agnes', `diana', and `mona' construct a hierarchy of
     clusterings, with the number of clusters ranging from one to the
     number of observations. Partitioning methods like `pam', `clara',
     and `fanny' require that the number of clusters be given by the
     user.

_R_e_f_e_r_e_n_c_e_s:

     Kaufman, L. and Rousseeuw, P.J. (1990).  Finding Groups in Data:
     An Introduction to Cluster Analysis. Wiley, New York.

     Struyf, A., Hubert, M. and Rousseeuw, P.J. (1996). Clustering in
     an Object-Oriented Environment. Journal of Statistical Software,
     1. <URL: http://www.stat.ucla.edu/journals/jss/>

     Struyf, A., Hubert, M. and Rousseeuw, P.J. (1997). Integrating
     Robust Clustering Techniques in S-PLUS. Computational Statistics
     and Data Analysis, 26, 17-37.

_S_e_e _A_l_s_o:

     `mona.object', `plot.mona'.

_E_x_a_m_p_l_e_s:

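     ## Load the binary `animals' data set shipped with the package
     data(animals)
     ## Divisive monothetic clustering of the animals
     ma <- mona(animals)
     ## Print the clustering object
     ma
     ## Banner plot, similar to Figure 10 in Struyf et al. (1996)
     plot(ma)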

