diana                package:cluster                R Documentation

_D_i_v_i_s_i_v_e _A_n_a_l_y_s_i_s

_D_e_s_c_r_i_p_t_i_o_n:

     Returns a list representing a divisive hierarchical clustering of
     the dataset.

_U_s_a_g_e:

     diana(x, diss = FALSE, metric = "euclidean", stand = FALSE)

_A_r_g_u_m_e_n_t_s:

       x: data matrix or data frame, or dissimilarity matrix, depending
          on the value of the `diss' argument.

          In case of a matrix or data frame, each row corresponds to an
          observation, and each column corresponds to a variable. All
          variables must be numeric. Missing values (NAs) are allowed.

          In case of a dissimilarity matrix, `x' is typically the
          output of `daisy' or `dist'. Also a vector with length
          n*(n-1)/2 is allowed (where n is the number of observations),
          and will be interpreted in the same way as the output of the
          above-mentioned functions. Missing values (NAs) are not
          allowed.

    diss: logical flag: if TRUE, then `x' will be considered as a
          dissimilarity matrix. If FALSE, then `x' will be considered
          as a matrix of observations by variables.

  metric: character string specifying the metric to be used for
          calculating dissimilarities between observations. The
          currently available options are "euclidean" and "manhattan".
          Euclidean distances are root sum-of-squares of differences,
          and manhattan distances are the sum of absolute differences.
          If `x' is already a dissimilarity matrix, then this argument
          will be ignored.

   stand: logical flag: if TRUE, then the measurements in `x' are
          standardized before calculating the dissimilarities.
          Measurements are standardized for each variable (column), by
          subtracting the variable's mean value and dividing by the
          variable's mean absolute deviation. If `x' is already a
          dissimilarity matrix, then this argument will be ignored.
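
     The standardization and the two metrics described above can be
     illustrated with base R alone. This is a minimal sketch, not the
     code `diana' runs internally; the helper name `standardize_mad'
     and the toy matrix are hypothetical, and the sketch assumes the
     data contain no NAs.

```r
# Toy data: 4 observations (rows) by 2 variables (columns).
# The second column is the first one multiplied by 10.
x <- matrix(c(1, 2, 3, 6,
              10, 20, 30, 60), ncol = 2)

# Standardize each column: subtract the column mean, then divide by
# the mean absolute deviation (not the standard deviation).
standardize_mad <- function(m) {
  apply(m, 2, function(v) {
    centered <- v - mean(v)
    centered / mean(abs(centered))
  })
}
z <- standardize_mad(x)

# After standardization the two columns coincide, so the difference
# in measurement scale no longer affects the dissimilarities.
z

# The two metrics named above, computed with stats::dist().
d_euclid    <- dist(z, method = "euclidean")  # root sum of squares
d_manhattan <- dist(z, method = "manhattan")  # sum of absolute differences

# A "dist" object holds the n*(n-1)/2 pairwise dissimilarities.
n <- nrow(x)
length(d_manhattan) == n * (n - 1) / 2
```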

_D_e_t_a_i_l_s:

     `diana' is fully described in chapter 6 of Kaufman and Rousseeuw
     (1990). It is probably unique in computing a divisive hierarchy,
     whereas most other software for hierarchical clustering is
     agglomerative. Moreover, `diana' provides (a) the divisive
     coefficient (see `diana.object') which measures the amount of
     clustering structure found; and (b) the banner, a novel graphical
     display (see `plot.diana').

     The `diana' algorithm constructs a hierarchy of clusterings,
     starting with one large cluster containing all n observations.
     Clusters are divided until each cluster contains only a single
     observation. At each stage, the cluster with the largest diameter
     is selected. (The diameter of a cluster is the largest
     dissimilarity between any two of its observations.) To divide the
     selected cluster, the algorithm first looks for its most disparate
     observation (i.e., the one with the largest average dissimilarity
     to the other observations of the selected cluster). This observation
     initiates the "splinter group". In subsequent steps, the algorithm
     reassigns observations that are closer to the "splinter group"
     than to the "old party". The result is a division of the selected
     cluster into two new clusters.
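
     A single division step can be sketched in plain R as follows.
     This is an illustration under stated assumptions, not the
     package's implementation: the function names `split_cluster' and
     `diameter' are hypothetical, and "closer" is taken to mean a
     smaller average dissimilarity, with the observation showing the
     largest positive difference moved first.

```r
# Diameter of a cluster: the largest dissimilarity between any two
# of its observations. `d` is a full symmetric dissimilarity matrix.
diameter <- function(d) max(d)

# Split one cluster into a "splinter group" and an "old party".
split_cluster <- function(d) {
  n <- nrow(d)
  stopifnot(n >= 2)

  # Average dissimilarity from observation i to the members of
  # `group`, leaving out i itself.
  avg_to <- function(i, group) {
    others <- setdiff(group, i)
    if (length(others) == 0) return(0)
    mean(d[i, others])
  }

  all_idx <- seq_len(n)

  # Seed the splinter group with the most disparate observation.
  start <- which.max(vapply(all_idx, avg_to, numeric(1), group = all_idx))
  splinter <- start
  old <- setdiff(all_idx, start)

  # Move observations while some member of the old party is, on
  # average, closer to the splinter group than to the old party.
  while (length(old) > 1) {
    gain <- vapply(old,
                   function(i) avg_to(i, old) - avg_to(i, splinter),
                   numeric(1))
    if (max(gain) <= 0) break
    mover <- old[which.max(gain)]
    splinter <- c(splinter, mover)
    old <- setdiff(old, mover)
  }

  list(splinter = sort(splinter), old_party = sort(old))
}

# Five points on a line: three near 0 and two near 10.
pts <- c(0, 1, 2, 10, 11)
d <- as.matrix(dist(pts))

diameter(d)        # 11, the dissimilarity between points 0 and 11
split_cluster(d)   # splinter: 4 5; old_party: 1 2 3
```

     Repeating this step on whichever remaining cluster has the
     largest diameter, until every cluster holds a single observation,
     yields the full hierarchy.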

_V_a_l_u_e:

     An object of class `"diana"' representing the clustering. See
     `diana.object' for details.
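
     The sketch below shows one way to inspect the returned object and
     to extract a flat partition from it. It assumes that the cluster
     package is installed and that an `as.hclust' method for `"diana"'
     objects is available, as in current versions of the package; the
     simulated matrix `m' is made up for illustration.

```r
library(cluster)

set.seed(1)
# Two well-separated groups of 5 points each, in 2 dimensions.
m <- rbind(matrix(rnorm(10, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(10, mean = 5, sd = 0.3), ncol = 2))

# Cluster from raw observations ...
dv_raw <- diana(m, metric = "euclidean")
# ... or from a precomputed dissimilarity object.
dv_dis <- diana(dist(m), diss = TRUE)

class(dv_raw)   # includes "diana"
dv_raw$dc       # divisive coefficient
dv_raw$order    # ordering of the observations used for plotting

# Cut the hierarchy into two clusters via its hclust representation.
groups <- cutree(as.hclust(dv_raw), k = 2)
table(groups)
```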

_B_A_C_K_G_R_O_U_N_D:

     Cluster analysis divides a dataset into groups (clusters) of
     observations that are similar to each other. Hierarchical methods
     like `agnes', `diana', and `mona' construct a hierarchy of
     clusterings, with the number of clusters ranging from one to the
     number of observations. Partitioning methods like `pam', `clara',
     and `fanny' require that the number of clusters be given by the
     user.

_R_e_f_e_r_e_n_c_e_s:

     Kaufman, L. and Rousseeuw, P.J. (1990).  Finding Groups in Data:
     An Introduction to Cluster Analysis.  Wiley, New York.

     Struyf, A., Hubert, M. and Rousseeuw, P.J. (1997).  Integrating
     Robust Clustering Techniques in S-PLUS.  Computational Statistics
     and Data Analysis, 26, 17-37.

_S_e_e _A_l_s_o:

     `agnes', `diana.object', `daisy', `dist', `plot.diana',
     `twins.object'.

_E_x_a_m_p_l_e_s:

     data(votes.repub)
     dv <- diana(votes.repub, metric = "manhattan", stand = TRUE)
     print(dv)
     plot(dv)

     data(agriculture)
     ## Plot similar to Figure 8 in ref
     plot(diana(agriculture), ask = TRUE)

