fanny                package:cluster                R Documentation

_F_u_z_z_y _A_n_a_l_y_s_i_s

_D_e_s_c_r_i_p_t_i_o_n:

     Returns a list representing a fuzzy clustering of the data into
     `k' clusters.

_U_s_a_g_e:

     fanny(x, k, diss = FALSE, metric = "euclidean", stand = FALSE)

_A_r_g_u_m_e_n_t_s:

       x: data matrix or dataframe, or dissimilarity matrix, depending
          on the value of the `diss' argument.

          In case of a matrix or dataframe, each row corresponds to an
          observation, and each column corresponds to a variable. All
          variables must be numeric. Missing values (NAs) are allowed.

          In case of a dissimilarity matrix, `x' is typically the
          output of `daisy' or `dist'. A vector of length n*(n-1)/2
          (where n is the number of observations) is also allowed and
          is interpreted in the same way as the output of the
          above-mentioned functions. Missing values (NAs) are not
          allowed.
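     As a minimal sketch of the vector form described above (base R
     only; the toy data matrix is assumed for illustration), the code
     below shows that `dist' returns the n*(n-1)/2 lower-triangle
     dissimilarities, stored column by column. A vector like `dv'
     could be passed as `x' together with `diss = TRUE'.

```r
# Toy data (assumed for illustration): 5 observations, 2 variables
x <- matrix(c(0, 0,
              3, 4,
              6, 8,
              1, 1,
              5, 5), ncol = 2, byrow = TRUE)
n <- nrow(x)

d  <- dist(x)           # class "dist": lower triangle of the distance matrix
dv <- as.numeric(d)     # the same values as a plain numeric vector

# The lower triangle holds n*(n-1)/2 pairwise dissimilarities
stopifnot(length(dv) == n * (n - 1) / 2)

# Values are stored column by column: d(2,1), d(3,1), ..., d(n,1), d(3,2), ...
m <- as.matrix(d)
stopifnot(isTRUE(all.equal(dv, m[lower.tri(m)])))

dv[1]  # d(2,1) = sqrt(3^2 + 4^2) = 5
```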

       k: integer, the number of clusters. It is required that
          0 < k < n/2, where n is the number of observations.

    diss: logical flag: if TRUE, then `x' will be considered as a
          dissimilarity matrix. If FALSE, then `x' will be considered
          as a matrix of observations by variables.

  metric: character string specifying the metric to be used for
          calculating dissimilarities between observations. The
          currently available options are "euclidean" and "manhattan".
          Euclidean distances are root sum-of-squares of differences,
          and manhattan distances are the sum of absolute differences.
          If `x' is already a dissimilarity matrix, then this argument
          will be ignored.
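     The two metrics can be checked by hand. This small base R sketch
     (the two observations are made-up values) computes both
     definitions directly and compares them with `dist', which
     supports the same two methods.

```r
# Two observations measured on three variables (illustrative values)
a <- c(1, 2, 3)
b <- c(4, 6, 3)

# Euclidean: root of the sum of squared differences
euclid <- sqrt(sum((a - b)^2))     # sqrt(9 + 16 + 0) = 5
# Manhattan: sum of absolute differences
manhat <- sum(abs(a - b))          # 3 + 4 + 0 = 7

# Base R's dist() uses the same two definitions
x <- rbind(a, b)
stopifnot(isTRUE(all.equal(euclid, as.numeric(dist(x, method = "euclidean")))),
          isTRUE(all.equal(manhat, as.numeric(dist(x, method = "manhattan")))))
```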

   stand: logical flag: if TRUE, then the measurements in `x' are
          standardized before calculating the dissimilarities.
          Measurements are standardized for each variable (column), by
          subtracting the variable's mean value and dividing by the
          variable's mean absolute deviation. If `x' is already a
          dissimilarity matrix, then this argument will be ignored.
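     The standardization described above can be sketched in base R as
     follows. `standardize_mad' is a hypothetical helper written for
     illustration, not a function of this package, and the data are
     made up. A constant column has a mean absolute deviation of zero,
     so this sketch would divide by zero for such a column.

```r
# Hypothetical helper: standardize each column by subtracting its mean
# and dividing by its mean absolute deviation, mean(|x - mean(x)|).
# NAs are skipped when computing the column statistics.
standardize_mad <- function(x) {
  x <- as.matrix(x)
  apply(x, 2, function(col) {
    m <- mean(col, na.rm = TRUE)
    s <- mean(abs(col - m), na.rm = TRUE)
    (col - m) / s
  })
}

x <- cbind(c(1, 2, 3, 6), c(10, 10, 20, 40))
z <- standardize_mad(x)
z[, 2]  # column mean 20, mean absolute deviation 10: -1 -1 0 2
```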

_D_e_t_a_i_l_s:

     In a fuzzy clustering, each observation is "spread out" over the
     various clusters. Denote by u(i,v) the membership of observation i
     to cluster v. The memberships are nonnegative, and for a fixed
     observation i they sum to 1. The particular method `fanny' stems
     from chapter 4 of Kaufman and Rousseeuw (1990). Compared to other
     fuzzy clustering methods, `fanny' has the following features: (a)
     it also accepts a dissimilarity matrix; (b) it is more robust to
     departures from the `spherical cluster' assumption; (c) it provides a novel
     graphical display, the silhouette plot (see `plot.partition').
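     A sketch of the two membership constraints, using a hypothetical
     4 x 2 membership matrix `u' (rows are observations, columns are
     clusters). The helper `valid_memberships' is illustrative only.
     Assigning each observation to the cluster with its largest
     membership is one common way to derive a crisp clustering; it is
     assumed here for illustration and is not stated on this page.

```r
# Hypothetical membership matrix for n = 4 observations and k = 2
# clusters: row i holds u(i, 1), ..., u(i, k).
u <- matrix(c(0.90, 0.10,
              0.80, 0.20,
              0.50, 0.50,
              0.05, 0.95), ncol = 2, byrow = TRUE)

# Illustrative check of the constraints: memberships are nonnegative
# and, for each observation, sum to 1 (up to a numerical tolerance)
valid_memberships <- function(u, tol = 1e-8) {
  all(u >= 0) && all(abs(rowSums(u) - 1) < tol)
}
stopifnot(valid_memberships(u))

# Crisp assignment: the cluster with the largest membership in each row
# (ties go to the first such cluster)
crisp <- max.col(u, ties.method = "first")
crisp  # 1 1 1 2
```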

     Fanny aims to minimize the objective function

   SUM_(v=1..k) [ SUM_(i=1..n) SUM_(j=1..n) u(i,v)^2 u(j,v)^2 d(i,j) ] / [ 2 SUM_(j=1..n) u(j,v)^2 ]

     where n is the number of observations, k is the number of clusters
     and d(i,j) is the dissimilarity between observations i and j.
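     To make the formula concrete, the base R sketch below evaluates
     the objective for a given membership matrix and dissimilarity
     matrix. `fanny_objective' is a hypothetical helper that only
     evaluates the criterion; it does not reproduce the iterative
     minimization that `fanny' performs. With crisp (0/1) memberships
     each term reduces to the sum of within-cluster dissimilarities
     divided by the cluster size, so a good partition of the toy data
     scores lower than a poor one.

```r
# Hypothetical helper: evaluate
#   SUM_v [ SUM_i SUM_j u(i,v)^2 u(j,v)^2 d(i,j) ] / [ 2 SUM_j u(j,v)^2 ]
# for an n x k membership matrix u and dissimilarities d (a "dist"
# object or a full symmetric n x n matrix).
fanny_objective <- function(u, d) {
  d <- as.matrix(d)
  u2 <- u^2                          # squared memberships, n x k
  total <- 0
  for (v in seq_len(ncol(u2))) {
    w <- u2[, v]
    num <- sum(outer(w, w) * d)      # double sum over all pairs (i, j)
    den <- 2 * sum(w)
    total <- total + num / den
  }
  total
}

# Four points on a line forming two obvious groups (toy data)
x <- c(0, 1, 10, 11)
d <- dist(x)
u_good <- cbind(c(1, 1, 0, 0), c(0, 0, 1, 1))  # groups {0, 1} and {10, 11}
u_bad  <- cbind(c(1, 0, 1, 0), c(0, 1, 0, 1))  # groups {0, 10} and {1, 11}

fanny_objective(u_good, d)  # 1
fanny_objective(u_bad, d)   # 10
```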

_V_a_l_u_e:

     An object of class `"fanny"' representing the clustering. See
     `fanny.object' for details.

_B_A_C_K_G_R_O_U_N_D:

     Cluster analysis divides a dataset into groups (clusters) of
     observations that are similar to each other. Partitioning methods
     like `pam', `clara', and `fanny' require that the number of
     clusters be given by the user. Hierarchical methods like `agnes',
     `diana', and `mona' construct a hierarchy of clusterings, with the
     number of clusters ranging from one to the number of observations.

_R_e_f_e_r_e_n_c_e_s:

     Kaufman, L. and Rousseeuw, P.J. (1990). Finding Groups in Data:
     An Introduction to Cluster Analysis. Wiley, New York.

     Struyf, A., Hubert, M. and Rousseeuw, P.J. (1996). Clustering in
     an Object-Oriented Environment. Journal of Statistical Software,
     1. <URL: http://www.stat.ucla.edu/journals/jss/>

     Struyf, A., Hubert, M. and Rousseeuw, P.J. (1997). Integrating
     Robust Clustering Techniques in S-PLUS. Computational Statistics
     and Data Analysis, 26, 17-37.

_S_e_e _A_l_s_o:

     `fanny.object', `daisy', `partition.object', `plot.partition',
     `dist'.

_E_x_a_m_p_l_e_s:

     library(cluster)
     ## generate 25 objects divided into two clusters, plus 3 objects
     ## lying between those clusters (28 objects in total)
     x <- rbind(cbind(rnorm(10,0,0.5), rnorm(10,0,0.5)),
                cbind(rnorm(15,5,0.5), rnorm(15,5,0.5)),
                cbind(rnorm(3,3.5,0.5), rnorm(3,3.5,0.5)))
     fannyx <- fanny(x, 2)
     fannyx
     summary(fannyx)
     plot(fannyx)

     data(ruspini)
     ## Plot similar to Figure 6 in Struyf et al. (1996)
     plot(fanny(ruspini, 5))

