pca                  package:multiv                  R Documentation

_P_r_i_n_c_i_p_a_l _C_o_m_p_o_n_e_n_t_s _A_n_a_l_y_s_i_s

_D_e_s_c_r_i_p_t_i_o_n:

     Finds a new coordinate system for multivariate data such that the
     first coordinate has maximal variance, the second coordinate has
     maximal variance subject to being orthogonal to the first, etc.

_U_s_a_g_e:

     pca(a, method=3)

_A_r_g_u_m_e_n_t_s:

       a: data matrix to be decomposed, the rows representing
          observations and the columns variables.  Missing values are
          not supported.

  method: integer taking values between 1 and 8:

          1: no transformation of the data matrix; the singular value
             decomposition (SVD) is carried out on a sums-of-squares
             and cross-products matrix.
          2: the observations are centered to zero mean; the SVD is
             carried out on a variance-covariance matrix.
          3: (default) the observations are centered to zero mean and
             additionally reduced to unit standard deviation, i.e.
             standardized; the SVD is carried out on a correlation
             matrix.
          4: the observations are normalized by being range-divided,
             and the variance-covariance matrix is then used.
          5: the SVD is carried out on a Kendall (rank-order)
             correlation matrix.
          6: the SVD is carried out on a Spearman (rank-order)
             correlation matrix.
          7: the SVD is carried out on the sample covariance matrix.
          8: the SVD is carried out on the sample correlation matrix.
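     A base-R sketch (an illustration of the correspondences just
     described, not the `multiv' source) of how centering and
     standardizing relate the SVD of the data to the covariance and
     correlation matrices:

```r
set.seed(1)
a <- matrix(rnorm(40), nrow = 8)   # 8 observations, 5 variables
n <- nrow(a)

# method = 2: centering relates the SVD to the variance-covariance matrix
a.ctr <- scale(a, center = TRUE, scale = FALSE)
all.equal(crossprod(a.ctr) / (n - 1), cov(a))              # TRUE

# method = 3: standardizing relates the SVD to the correlation matrix
a.std <- scale(a, center = TRUE, scale = TRUE)
all.equal(crossprod(a.std) / (n - 1), cor(a))              # TRUE

# the squared singular values of the standardized matrix are, up to
# the factor n-1, the eigenvalues of the correlation matrix
all.equal(svd(a.std)$d^2 / (n - 1), eigen(cor(a))$values)  # TRUE
```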

_V_a_l_u_e:

     list describing the principal components analysis:

   rproj: projections of row points on the new axes.

   cproj: projections of column points on the new axes.

   evals: eigenvalues associated with the new axes.  These provide
          figures of merit for the `variance explained' by the new
          axes. They are usually quoted in terms of percentage of the
          total, or in terms of cumulative percentage of the total.

   evecs: eigenvectors associated with the new axes.  This orthogonal
          matrix describes the rotation.  The first column is the
          linear combination of columns of `a' defining the first
          principal component, etc.

_S_i_d_e _E_f_f_e_c_t_s:

     When carrying out a PCA of a hierarchy object, the partition is
     specified by `lev'.  The level plus the associated number of
     groups equals the number of observations, at all times.

_N_o_t_e:

     In the case of `method' = 3, if any column point has zero standard
     deviation, then a value of 1 is substituted for the standard
     deviation.

     Up to 7 principal axes are determined.  The inherent
     dimensionality of either of the dual spaces is ordinarily
     `min(n,m)' where `n' and `m' are respectively the numbers of rows
     and columns of `a'.  The centering transformation which is part
     of `method's 2 and 3 introduces a linear dependency, reducing the
     inherent dimensionality to `min(n-1,m)'.  Hence the number of
     columns returned in `rproj', `cproj', and `evecs' will be the
     lesser of this inherent dimensionality and 7.
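     A small base-R check of the dimensionality remark above: centering
     three observations on five variables leaves a matrix of rank
     `min(n-1,m)' = 2 rather than `min(n,m)' = 3:

```r
set.seed(3)
x  <- matrix(rnorm(15), nrow = 3)           # n = 3 observations, m = 5
xc <- scale(x, center = TRUE, scale = FALSE)
# centering makes the rows sum to zero, so one dimension is lost
qr(xc)$rank                                 # 2, i.e. min(3-1, 5)
```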

     In the case of `methods' 1 to 4, very small negative eigenvalues,
     if they  arise, are an artifact of the SVD algorithm used, and may
     be treated as zero.  In the case of PCA using rank-order
     correlations (`methods' 5 and 6), negative eigenvalues indicate
     that a Euclidean representation of the data is not possible.  The
     approximate Euclidean representation given by the axes  associated
     with the positive eigenvalues can often be quite adequate for
     practical interpretation of the data.

     Routine `prcomp' is identical, to within small numerical
     precision differences, to `method' = 7 here.  The examples below
     show how to map the outputs of the present implementation (`pca')
     onto the outputs of `prcomp'.

     Note that a very large number of columns in the input data matrix
     will cause dynamic memory problems: the matrix to be diagonalized
     requires O(m^2) storage where m is the number of variables.

_M_e_t_h_o_d:

     A singular value decomposition is carried out.

_B_a_c_k_g_r_o_u_n_d:

     Principal components analysis defines the axis which provides the
     best fit to both the row points and the column points.  A second
     axis is determined which best fits the data subject to being
     orthogonal to the first.  Third and subsequent axes are similarly
     found.  Best fit is in the least squares sense.  The criterion
     which optimizes the fit of the axes to the points is, by virtue of
     Pythagoras' theorem, simultaneously a criterion which optimizes
     the variance of projections on the axes.
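     The equivalence of the two criteria can be illustrated in base R:
     rotating centered data onto the principal axes leaves the total
     variance unchanged, so minimizing least-squares residuals off the
     axes is the same as maximizing variance along them:

```r
set.seed(2)
x <- scale(matrix(rnorm(60), ncol = 3), center = TRUE, scale = FALSE)
e <- eigen(cov(x))
proj <- x %*% e$vectors      # projections of the rows on the new axes
# total variance of projections equals total variance of the data
all.equal(sum(apply(proj, 2, var)), sum(apply(x, 2, var)))   # TRUE
```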

     Principal components analysis is often used as a data reduction
     technique.  In the pattern recognition field, it is often termed
     the Karhunen-Loeve expansion since the data matrix `a' may be
     written as a series expansion using the eigenvectors and
     eigenvalues found.

_R_e_f_e_r_e_n_c_e_s:

     Many multivariate statistics and data analysis books include a
     discussion of principal components analysis.  Below are a few
     examples: 

     C. Chatfield and A.J. Collins, `Introduction to Multivariate
     Analysis', Chapman and Hall, 1980 (a good, all-round
     introduction); 

     M. Kendall, `Multivariate Analysis', Griffin, 1980 (dated in
     relation to computing techniques, but exceptionally clear and
     concise in the treatment of many practical aspects); 

     F.H.C. Marriott, `The Interpretation of Multiple Observations',
     Academic, 1974 (a short, very readable textbook); 

     L. Lebart, A. Morineau, and K.M. Warwick, `Multivariate
     Descriptive Statistical Analysis', Wiley, 1984 (an excellent
     geometric treatment of PCA); 

     I.T. Jolliffe, `Principal Component Analysis', Springer, 1986.

_S_e_e _A_l_s_o:

     `svd', `prcomp', `cancor'.

_E_x_a_m_p_l_e_s:

     data(iris)
     iris <- as.matrix(iris[,1:4])
     pcprim <- pca(iris)
     # plot of first and second principal components
     plot(pcprim$rproj[,1], pcprim$rproj[,2])
     # variance explained by the principal components
     pcprim$evals*100.0/sum(pcprim$evals)
     # The S function `prcomp' scales its results differently.  Here is
     # how to obtain comparable results using the function `pca'.
     library(mva)
     # Consider the following result of `prcomp':
     old <- prcomp(iris)
     # With `pca', one would do the following:
     new <- pca(iris, method=7)
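     How the two sets of outputs line up (a sketch: base R stands in
     for `pca' here, on the assumption that `evals' and `evecs' under
     `method' = 7 are the eigenvalues and eigenvectors of the sample
     covariance matrix, so the snippet runs even without `multiv'):

```r
data(iris)
iris.m <- as.matrix(iris[, 1:4])
old <- prcomp(iris.m)
e   <- eigen(cov(iris.m))     # stands in for pca(iris.m, method=7)
# prcomp's standard deviations are the square roots of the eigenvalues
all.equal(old$sdev, sqrt(e$values))                          # TRUE
# the rotation matrix agrees with the eigenvectors up to column signs
all.equal(abs(unclass(old$rotation)), abs(e$vectors),
          check.attributes = FALSE)                          # TRUE
```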

