agnes                package:cluster                R Documentation

_A_g_g_l_o_m_e_r_a_t_i_v_e _N_e_s_t_i_n_g

_D_e_s_c_r_i_p_t_i_o_n:

     Computes agglomerative hierarchical clustering of the dataset.

_U_s_a_g_e:

     agnes(x, diss = FALSE, metric = "euclidean", stand = FALSE, method = "average")

_A_r_g_u_m_e_n_t_s:

       x: data matrix or data frame, or dissimilarity matrix, depending
          on the value of the `diss' argument.

          In case of a matrix or data frame, each row corresponds to an
          observation, and each column corresponds to a variable. All
          variables must be numeric. Missing values (NAs) are allowed.

          In case of a dissimilarity matrix, `x' is typically the
          output of `daisy' or `dist'. A vector of length n*(n-1)/2
          (where n is the number of observations) is also allowed,
          and will be interpreted in the same way as the output of the
          above-mentioned functions. Missing values (NAs) are not
          allowed.

    diss: logical flag: if TRUE, then `x' is treated as a
          dissimilarity matrix.  If FALSE, then `x' is treated as a
          matrix of observations by variables.

  metric: character string specifying the metric to be used for
          calculating dissimilarities between observations. The
          currently available options are "euclidean" and "manhattan".
          Euclidean distances are root sum-of-squares of differences,
          and manhattan distances are the sum of absolute differences.
          If `x' is already a dissimilarity matrix, then this argument
          will be ignored. 

   stand: logical flag: if TRUE, then the measurements in `x' are
          standardized before calculating the dissimilarities.
          Measurements are standardized for each variable (column), by
          subtracting the variable's mean value and dividing by the
          variable's mean absolute deviation.  If `x' is already a
          dissimilarity matrix, then this argument will be ignored. 

  method: character string defining the clustering method. The five
          methods implemented are "average" (group average method), 
          "single" (single linkage), "complete" (complete linkage), 
          "ward" (Ward's method), and "weighted" (weighted average
          linkage). Default is "average". 
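
     The `metric' and `stand' descriptions above can be illustrated
     with base R alone. The following is a minimal sketch, not the
     package's internal code: the toy matrix and the helper name
     `standardize' are assumptions made for illustration, and base R's
     `dist' stands in for the dissimilarity computation. Note that
     dividing by the mean absolute deviation differs from `scale',
     which divides by the standard deviation.

```r
## Hypothetical toy data: 3 observations (rows), 2 variables (columns).
x <- matrix(c(1, 2, 4,
              10, 20, 60), ncol = 2)

## What stand = TRUE is described as doing: for each column, subtract
## the mean and divide by the mean absolute deviation of that column.
standardize <- function(m) {
  apply(m, 2, function(v) {
    centred <- v - mean(v)
    centred / mean(abs(centred))
  })
}
z <- standardize(x)

## Dissimilarities between rows, computed with base R's dist().
d_euclid    <- dist(z, method = "euclidean")  # root sum of squared differences
d_manhattan <- dist(z, method = "manhattan")  # sum of absolute differences

print(z)
print(d_euclid)
print(d_manhattan)
```

     After this standardization every column has mean 0 and mean
     absolute deviation 1, so variables measured on very different
     scales contribute comparably to the dissimilarities.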

_D_e_t_a_i_l_s:

     `agnes' is fully described in chapter 5 of Kaufman and Rousseeuw
     (1990). Compared to other agglomerative clustering methods such as
     `hclust',  `agnes' has the following features: (a) it yields the
     agglomerative coefficient (see `agnes.object') which measures the
     amount of clustering structure found; and (b) apart from the usual
     tree it also provides the banner, a novel graphical display (see
     `plot.agnes').

     The `agnes' algorithm constructs a hierarchy of clusterings. At
     first, each observation is a small cluster by itself. At each
     stage the two "nearest" clusters are combined to form one larger
     cluster, and merging continues until only one large cluster
     remains, containing all the observations. The meaning of
     "nearest" depends on `method'. For `method = "average"', the
     distance between two clusters is the average of the
     dissimilarities between the points in one cluster and the points
     in the other cluster. For `method = "single"', it is the smallest
     dissimilarity between a point in the first cluster and a point in
     the second cluster (nearest neighbor method). For
     `method = "complete"', it is the largest dissimilarity between a
     point in the first cluster and a point in the second cluster
     (furthest neighbor method).
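
     The three between-cluster distances described above can be
     computed by hand for a small case. This is a sketch in base R
     under stated assumptions: the five labelled points, the two
     clusters `A' and `B', and the helper `between' are hypothetical
     and only illustrate the definitions; they are not part of the
     package.

```r
## Hypothetical toy data: five labelled points on a line.
x <- c(a = 0, b = 1, c = 2, d = 6, e = 8)
d <- as.matrix(dist(x))   # full symmetric dissimilarity matrix

## All dissimilarities between a point of cluster A and a point of B.
between <- function(d, A, B) d[A, B, drop = FALSE]

A <- c("a", "b", "c")
B <- c("d", "e")

linkage <- c(
  average  = mean(between(d, A, B)),  # as for method = "average"
  single   = min(between(d, A, B)),   # as for method = "single"
  complete = max(between(d, A, B))    # as for method = "complete"
)
print(linkage)
```

     For these points the six pairwise dissimilarities are 4, 5, 6, 6,
     7 and 8, so the single-linkage distance is 4, the complete-linkage
     distance is 8, and the group average is 6.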

_V_a_l_u_e:

     an object of class `"agnes"' representing the clustering. See
     `agnes.object' for details.
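
     As a brief illustration of the returned object, the sketch below
     fits `agnes' to a small made-up matrix and looks at a few
     components. The data are hypothetical, and the component names
     used here (`ac', `order', `height', `merge') are taken from
     current versions of the cluster package; `agnes.object' remains
     the authoritative description.

```r
library(cluster)

## Hypothetical toy data: two well-separated groups of three points.
x <- rbind(c(0, 0), c(0, 1), c(1, 0),
           c(10, 10), c(10, 11), c(11, 10))

agn <- agnes(x, metric = "euclidean", method = "average")

class(agn)    # the class vector includes "agnes"
agn$ac        # agglomerative coefficient
agn$order     # ordering of the observations used for plotting
agn$height    # dissimilarities at which successive merges occur
agn$merge     # one row per merge step
```

     With groups as clearly separated as these, the agglomerative
     coefficient would be expected to lie close to 1.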

_B_A_C_K_G_R_O_U_N_D:

     Cluster analysis divides a dataset into groups (clusters) of
     observations that are similar to each other. Hierarchical methods
     like `agnes', `diana', and `mona' construct a hierarchy of
     clusterings, with the number of clusters ranging from one to the
     number of observations. Partitioning methods like `pam', `clara',
     and `fanny' require that the number of clusters be given by the
     user.
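
     The contrast between the two families can be seen in a short
     sketch. It assumes the cluster package is attached and that an
     `as.hclust' method is available for `agnes' objects, as in current
     versions of the package; the data are made up for illustration.

```r
library(cluster)

## Hypothetical toy data: two well-separated groups of three points.
x <- rbind(c(0, 0), c(0, 1), c(1, 0),
           c(10, 10), c(10, 11), c(11, 10))

## Hierarchical: fit once, then cut the tree at any number of clusters.
hc   <- as.hclust(agnes(x))
cuts <- cutree(hc, k = 1:3)   # one column of cluster labels per k

## Partitioning: the number of clusters is an input to the fit.
part <- pam(x, k = 2)

print(cuts)
print(part$clustering)
```

     A single `agnes' fit yields the whole sequence of clusterings,
     whereas `pam' has to be rerun for each candidate number of
     clusters.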

_R_e_f_e_r_e_n_c_e_s:

     Kaufman, L. and Rousseeuw, P.J. (1990) Finding Groups in Data: An
     Introduction to Cluster Analysis. Wiley, New York.

     Struyf, A., Hubert, M. and Rousseeuw, P.J. (1997) Integrating
     Robust Clustering Techniques in S-PLUS. Computational Statistics
     and Data Analysis, 26, 17-37.

_S_e_e _A_l_s_o:

     `agnes.object', `daisy', `diana', `dist', `hclust', `plot.agnes', 
     `twins.object'.

_E_x_a_m_p_l_e_s:

     data(votes.repub)
     agn1 <- agnes(votes.repub, metric = "manhattan", stand = TRUE)
     agn1
     plot(agn1)

     agn2 <- agnes(daisy(votes.repub), diss = TRUE, method = "complete")
     plot(agn2)

     data(agriculture)
     ## Plot similar to Figure 7 in ref
     plot(agnes(agriculture), ask = TRUE)

