gscor                  package:sna                  R Documentation

_F_i_n_d _t_h_e _S_t_r_u_c_t_u_r_a_l _C_o_r_r_e_l_a_t_i_o_n_s _B_e_t_w_e_e_n _T_w_o _o_r _M_o_r_e _G_r_a_p_h_s

_D_e_s_c_r_i_p_t_i_o_n:

     `gscor' finds the product-moment structural correlation between
     the adjacency matrices of graphs indicated by `g1' and `g2' in
     stack `dat' (or possibly `dat2') given exchangeability list
     `exchange.list'.  Missing values are permitted.

_U_s_a_g_e:

     gscor(dat, dat2=NULL, g1=c(1:dim(dat)[1]), g2=c(1:dim(dat)[1]), 
         diag=FALSE, mode="digraph", method="anneal", reps=1000, 
         prob.init=0.9, prob.decay=0.85, freeze.time=25, 
         full.neighborhood=TRUE, exchange.list=rep(0, dim(dat)[2]))

_A_r_g_u_m_e_n_t_s:

     dat: A graph stack 

    dat2: Optionally, a second graph stack 

      g1: The indices of `dat' reflecting the first set of graphs to be
          compared; by default, all members of `dat' are included. 

      g2: The indices of `dat' (or `dat2', if applicable) reflecting
          the second set of graphs to be compared; by default, all
          members of `dat' are included. 

    diag: Boolean indicating whether or not the diagonal should be
          treated as valid data.  Set this to `TRUE' if and only if the
          data can contain loops.  `diag' is `FALSE' by default. 

    mode: String indicating the type of graph being evaluated. 
          "digraph" indicates that edges should be interpreted as
          directed; "graph" indicates that edges are undirected. 
          `mode' is set to "digraph" by default. 

  method: Method to be used to search the space of accessible
          permutations; must be one of ``none'', ``exhaustive'',
          ``anneal'', ``hillclimb'', or ``mc''.  

    reps: Number of iterations for the Monte Carlo method. 

prob.init: Initial acceptance probability for the annealing routine. 

prob.decay: Cooling multiplier for the annealing routine. 

freeze.time: Freeze time for the annealing routine. 

full.neighborhood: Should the annealer evaluate the full neighborhood
          of pair exchanges at each iteration? 

exchange.list: Information on which vertices are exchangeable (see
          below); this must be a single number, a vector of length n,
          or an nx2 matrix. 

_D_e_t_a_i_l_s:

     The structural correlation coefficient between two graphs G and H
     is defined as

          scor(G,H | L_G,L_H) = max_[L_G,L_H] cor(l(G),l(H))

     where L_G is the set of accessible permutations/labelings of G,
     l(G) is a permutation/relabeling of G, and l(G) in L_G.  The set
     of accessible permutations on a given graph is determined by the
     theoretical exchangeability of its vertices; in a nutshell, two
     vertices are considered to be theoretically exchangeable for a
     given problem if all predictions under the conditioning theory are
     invariant to a relabeling of the vertices in question (see Butts
     and Carley (2001) for a more formal exposition).  Where no
     vertices are exchangeable, the structural correlation becomes the
     simple graph correlation.  Where all vertices are exchangeable,
     the structural correlation reflects the correlation between
     unlabeled graphs; other cases correspond to correlation under
     partial labeling.  
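
     The definition above can be illustrated with a minimal base-R
     sketch.  It is not the `sna' implementation: the helper names
     (`graph_cor', `all_perms', `scor_exhaustive') are hypothetical, and
     the sketch assumes directed graphs without loops, no missing
     values, and that all vertices are exchangeable, so that it
     suffices to relabel one graph while holding the other fixed.

```r
# Product-moment correlation between two adjacency matrices,
# using off-diagonal cells only (directed graphs, no loops).
graph_cor <- function(g, h) {
  off <- row(g) != col(g)
  cor(g[off], h[off])
}

# All permutations of 1:n as a list (practical for small n only).
all_perms <- function(n) {
  if (n == 1) return(list(1))
  out <- list()
  for (p in all_perms(n - 1)) {
    for (i in 0:(n - 1)) {
      out[[length(out) + 1]] <- append(p, n, after = i)
    }
  }
  out
}

# Structural correlation with every vertex exchangeable: the maximum
# correlation over all relabelings of h, holding g fixed.
scor_exhaustive <- function(g, h) {
  perms <- all_perms(nrow(g))
  max(sapply(perms, function(p) graph_cor(g, h[p, p])))
}

# A directed 4-cycle plus one chord, and a relabeled copy of it.
g <- matrix(0, 4, 4)
g[cbind(c(1, 2, 3, 4, 1), c(2, 3, 4, 1, 3))] <- 1
perm <- c(3, 1, 4, 2)
h <- g[perm, perm]

graph_cor(g, h)        # labeled correlation: below 1 for this relabeling
scor_exhaustive(g, h)  # unlabeled correlation: 1, as h is a relabeling of g
```

     The labeled correlation compares cells position by position, while
     the maximization recovers the relabeling that aligns the two
     graphs.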

     The accessible permutation set is determined by the
     `exchange.list' argument, which is dealt with in the following
     manner. First, `exchange.list' is expanded to fill an nx2 matrix. 
     If `exchange.list' is a single number, this is trivially
     accomplished by replication; if `exchange.list' is a vector of
     length n, the matrix is formed by cbinding two copies together. 
     If `exchange.list' is already an nx2 matrix, it is left as-is. 
     Once the nx2 exchangeability matrix has been formed, it is
     interpreted as follows: columns refer to graphs 1 and 2,
     respectively; rows refer to their corresponding vertices in the
     original adjacency matrices; and vertices are taken to be
     theoretically exchangeable iff their corresponding exchangeability
     matrix values are identical.  To obtain an unlabeled graph
     correlation (the default), then, one could simply let
     `exchange.list' equal any single number.  To obtain the standard
     graph correlation, one would use the vector `1:n'.
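
     The expansion rules just described might be sketched in base R as
     follows.  The functions `expand_exchange' and `exchangeable' are
     hypothetical helpers written for illustration, not part of `sna',
     and the sketch assumes that exchangeability is judged within each
     graph's own column of the matrix.

```r
# Hypothetical helper: expand an exchangeability specification into
# an n x 2 matrix with one column per graph.
expand_exchange <- function(exchange.list, n) {
  if (is.matrix(exchange.list)) {
    stopifnot(nrow(exchange.list) == n, ncol(exchange.list) == 2)
    return(exchange.list)                        # already n x 2: left as-is
  }
  if (length(exchange.list) == 1) {
    return(matrix(exchange.list, nrow = n, ncol = 2))  # replicate one number
  }
  stopifnot(length(exchange.list) == n)
  cbind(exchange.list, exchange.list, deparse.level = 0)  # two copies
}

# Vertices i and j of a graph are treated as exchangeable iff their
# entries in that graph's column are identical.
exchangeable <- function(el, column) {
  outer(el[, column], el[, column], "==")
}

unlabeled <- expand_exchange(0, 4)              # one class: all exchangeable
labeled   <- expand_exchange(1:4, 4)            # every vertex in its own class
partial   <- expand_exchange(c(1, 1, 2, 2), 4)  # two classes of two vertices

all(exchangeable(unlabeled, 1))      # TRUE
sum(exchangeable(labeled, 1)) == 4   # TRUE: each vertex only with itself
exchangeable(partial, 1)[1, 2]       # TRUE
exchangeable(partial, 1)[1, 3]       # FALSE
```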

     Because the set of accessible permutations is, in general, very
     large (O(n!)), searching the set for the maximum correlation is a
     non-trivial affair.  Currently supported methods for estimating
     the structural correlation are hill climbing, simulated annealing,
     blind Monte Carlo search, or exhaustive search (it is also
     possible to turn off searching entirely).  Exhaustive search is
     not recommended for graphs larger than size 8 or so, and even this
     may take days; still, this is a valid alternative for small
     graphs.  Blind Monte Carlo search and hill climbing tend to be
     suboptimal for this problem and are not, in general, recommended,
     but they are available if desired.  The preferred (and default)
     option for permutation search is simulated annealing, which seems
     to work well on this problem (though some tinkering with the
     annealing parameters may be needed in order to get optimal
     performance).  See the help for `lab.optimize' for more
     information regarding these options.
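
     As a rough illustration of the simplest of these strategies, blind
     Monte Carlo search can be sketched in base R as below.  This is
     not the `lab.optimize' code: `graph_cor' and `scor_mc' are
     hypothetical names, all vertices are assumed exchangeable, and
     missing values are assumed absent.  The result is a lower bound on
     the structural correlation, which is one reason the method tends
     to be suboptimal.

```r
# Correlation between the off-diagonal cells of two adjacency matrices.
graph_cor <- function(g, h) {
  off <- row(g) != col(g)
  cor(g[off], h[off])
}

# Blind Monte Carlo search: draw `reps` random relabelings of h and
# keep the largest correlation seen so far.
scor_mc <- function(g, h, reps = 1000) {
  n <- nrow(g)
  best <- graph_cor(g, h)            # start from the labeling as given
  for (i in seq_len(reps)) {
    p <- sample(n)                   # uniform random permutation of 1:n
    best <- max(best, graph_cor(g, h[p, p]))
  }
  best
}

# A directed path on six vertices plus one extra edge, and a
# relabeled copy of it.
n <- 6
g <- matrix(0, n, n)
g[cbind(1:5, 2:6)] <- 1
g[1, 3] <- 1
perm <- c(4, 1, 6, 2, 5, 3)
h <- g[perm, perm]

set.seed(1)
labeled  <- graph_cor(g, h)              # correlation under the given labels
estimate <- scor_mc(g, h, reps = 2000)   # never below `labeled`, at most 1
```

     Because the draws are independent of the correlations already
     seen, more repetitions can only raise the estimate slowly; guided
     methods such as annealing use the current labeling to choose the
     next one.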

     Structural correlation matrices are positive semidefinite, and are
     positive definite so long as no graph within the set is a linear
     combination of any other under any accessible permutation.  Their
     eigendecompositions are meaningful and they may be used in linear
     subspace analyses, so long as the researcher is careful to
     interpret the results in terms of the appropriate set of
     accessible labelings.  Classical null hypothesis tests should not
     be employed with structural correlations, and QAP tests are almost
     never appropriate (save in the uniquely labeled case).  See
     `cugtest' for a more reasonable alternative.

_V_a_l_u_e:

     An estimate of the structural correlation matrix.

_W_a_r_n_i_n_g:

     The search process can be very slow, particularly for large
     graphs.  In particular, the exhaustive method is order factorial,
     and will take approximately forever for unlabeled graphs of size
     greater than about 7-9.
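
     To give a sense of the growth, the following base-R lines count
     the relabelings of a single graph on n fully exchangeable
     vertices; the choice of sizes is arbitrary.

```r
# Number of permutations of n vertices, for a few graph sizes.
n <- c(5, 8, 10, 12)
data.frame(n = n, permutations = factorial(n))
```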

_N_o_t_e:

     Consult Butts and Carley (2001) for advice and examples on
     theoretical exchangeability.

_A_u_t_h_o_r(_s):

     Carter T. Butts ctb@andrew.cmu.edu

_R_e_f_e_r_e_n_c_e_s:

     Butts, C.T., and Carley, K.M.  (2001).  ``Multivariate Methods for
     Interstructural Analysis.''  CASOS Working Paper, Carnegie Mellon
     University.

_S_e_e _A_l_s_o:

     `gscov', `gcor', `gcov'

_E_x_a_m_p_l_e_s:

     #Generate two random graphs
     g.1<-rgraph(5)
     g.2<-rgraph(5)

     #Copy one of the graphs and permute it
     perm<-sample(1:5)
     g.3<-g.2[perm,perm]

     #What are the structural correlations between the labeled graphs?
     gscor(g.1,g.2,exchange.list=1:5)
     gscor(g.1,g.3,exchange.list=1:5)
     gscor(g.2,g.3,exchange.list=1:5)

     #What are the structural correlations between the underlying 
     #unlabeled graphs?
     gscor(g.1,g.2)
     gscor(g.1,g.3)
     gscor(g.2,g.3)

