hdist                  package:sna                  R Documentation

_F_i_n_d _t_h_e _H_a_m_m_i_n_g _D_i_s_t_a_n_c_e_s _B_e_t_w_e_e_n _T_w_o _o_r _M_o_r_e _G_r_a_p_h_s

_D_e_s_c_r_i_p_t_i_o_n:

     `hdist' returns the Hamming distance between the labeled graphs
     `g1' and `g2' in stack `dat' for dichotomous data, or the
     absolute (Manhattan) distance for valued data.  If `normalize'
     is true, the distance is divided by its dichotomous theoretical
     maximum (conditional on |V(G)|).

_U_s_a_g_e:

     hdist(dat, dat2=NULL, g1=c(1:dim(dat)[1]), g2=c(1:dim(dat)[1]), 
         normalize=FALSE, diag=FALSE, mode="digraph")

_A_r_g_u_m_e_n_t_s:

     dat: Data array to be analyzed.  By assumption, the first
          dimension of the array indexes the graph, with the next two
          indexing the actors.  This data need not be dichotomous, and
          missing values are allowed. 

    dat2: A second data array (optional) 

      g1: A vector indicating which graphs to compare (by default, all
          elements of `dat') 

      g2: A vector indicating the graphs against which those of `g1'
          should be compared (by default, all graphs) 

normalize: Boolean indicating whether the distances should be divided
          by the number of available dyads.  `normalize' is `FALSE'
          by default. 

    diag: Boolean indicating whether or not the diagonal should be
          treated as valid data.  Set this true if and only if the data
          can contain loops.  `diag' is `FALSE' by default. 

    mode: String indicating the type of graph being evaluated. 
          "digraph" indicates that edges should be interpreted as
          directed; "graph" indicates that edges are undirected. 
          `mode' is set to "digraph" by default. 
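
     As a rough illustration of how `diag' and `mode' could affect
     the normalizing constant, the sketch below counts the dyads on
     which two graphs of order n can differ.  `max_dyads' is a
     hypothetical helper written in base R for this page; it is not
     part of `sna', and the assumption that `hdist' divides by
     exactly this count is not confirmed here.

```r
# Hypothetical helper (not part of sna): number of dyads on which two
# graphs with n vertices can differ, under the stated assumptions.
max_dyads <- function(n, diag = FALSE, mode = "digraph") {
  # Directed graphs have n*(n-1) ordered off-diagonal dyads;
  # undirected graphs have half as many.
  off_diag <- if (mode == "graph") n * (n - 1) / 2 else n * (n - 1)
  # Loops add n further cells when the diagonal is treated as data.
  loops <- if (diag) n else 0
  off_diag + loops
}

max_dyads(5)                  # 20
max_dyads(5, mode = "graph")  # 10
max_dyads(5, diag = TRUE)     # 25
```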

_D_e_t_a_i_l_s:

     The Hamming distance between two labeled graphs G_1 and G_2 is
     equal to |{e : (e in E(G_1) and e not in E(G_2)) or (e not in
     E(G_1) and e in E(G_2))}|.  In more prosaic terms, this may be
     thought of as the number of addition/deletion operations required
     to turn the edge set of G_1 into that of G_2.  The Hamming
     distance is a highly general measure of structural similarity, and
     forms a metric on the space of graphs (simple or directed).  Users
     should be reminded, however, that the Hamming distance is
     extremely sensitive to nodal labeling, and should not be employed
     directly when nodes are interchangeable.  The structural distance
     (Butts and Carley (2001)), implemented in `structdist', provides a
     natural generalization of the Hamming distance to the more general
     case of unlabeled graphs.
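
     The definition above, and the sensitivity to labeling, can be
     sketched in base R without calling `hdist'.  The helper
     `hamming' and the example matrices are illustrative assumptions,
     not part of `sna'; the sketch treats the graphs as directed and
     ignores the diagonal unless `loops' is true.

```r
# Hypothetical helper (not part of sna): Hamming distance between two
# dichotomous adjacency matrices defined on the same labeled vertices.
hamming <- function(a, b, loops = FALSE) {
  differs <- a != b                    # TRUE where an edge is in exactly one graph
  if (!loops) diag(differs) <- FALSE   # ignore self-ties unless requested
  sum(differs)
}

# Two directed graphs on vertices 1..3
g1 <- matrix(c(0, 1, 0,
               0, 0, 1,
               0, 0, 0), nrow = 3, byrow = TRUE)   # edges 1->2, 2->3
g2 <- matrix(c(0, 1, 1,
               0, 0, 0,
               0, 0, 0), nrow = 3, byrow = TRUE)   # edges 1->2, 1->3

d12 <- hamming(g1, g2)   # delete 2->3, add 1->3: distance 2

# Swap the labels of vertices 2 and 3 in g1.  The result is isomorphic
# to g1, yet its Hamming distance from g1 is nonzero.
p <- c(1, 3, 2)
g1_relabeled <- g1[p, p]              # edges 1->3, 3->2
d_relabel <- hamming(g1, g1_relabeled)   # 4
```

     The nonzero value of `d_relabel' is the reason the text above
     recommends `structdist' when node labels are interchangeable.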

     Null hypothesis testing for Hamming distances is available via
     `cugtest' and `qaptest'; graphs which minimize the Hamming
     distances to all members of a graph set can be found by
     `centralgraph'.  For an alternative means of comparing the
     similarity of graphs, consider `gcor'.

_V_a_l_u_e:

     A matrix of Hamming distances.
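
     To give a sense of the shape of such a matrix, the base R sketch
     below computes all pairwise distances for a small graph stack
     whose first dimension indexes the graph.  `pairwise_hamming' is
     a hypothetical helper, not the `sna' implementation, and it
     assumes complete directed data with the diagonal ignored.

```r
# Hypothetical helper (not part of sna): all pairwise distances for a
# stack `dat` of dimension (graphs x actors x actors).
pairwise_hamming <- function(dat) {
  m <- dim(dat)[1]
  out <- matrix(0, m, m)
  for (i in seq_len(m)) {
    for (j in seq_len(m)) {
      cell_diff <- abs(dat[i, , ] - dat[j, , ])
      diag(cell_diff) <- 0            # diagonal not treated as data
      out[i, j] <- sum(cell_diff)
    }
  }
  out
}

# Stack of three graphs on three actors
dat <- array(0, dim = c(3, 3, 3))
dat[2, 1, 2] <- 1                     # graph 2: edge 1->2
dat[3, 1, 2] <- 1                     # graph 3: edges 1->2 and 2->3
dat[3, 2, 3] <- 1

d <- pairwise_hamming(dat)
# d is a symmetric 3 x 3 matrix with a zero diagonal:
#      [,1] [,2] [,3]
# [1,]    0    1    2
# [2,]    1    0    1
# [3,]    2    1    0
```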

_N_o_t_e:

     For non-dichotomous data, the distance which is returned is simply
     the sum of the absolute edge-wise differences.
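
     A small worked case may help.  The matrices `a' and `b' below
     are made-up valued adjacency matrices, and the computation is a
     base R sketch of the sum of absolute edge-wise differences with
     the diagonal ignored; it does not call `hdist'.

```r
# Two valued directed graphs on three actors (illustrative data only)
a <- matrix(c(0, 3, 0,
              1, 0, 2,
              0, 0, 0), nrow = 3, byrow = TRUE)
b <- matrix(c(0, 1, 0,
              1, 0, 5,
              2, 0, 0), nrow = 3, byrow = TRUE)

cell_diff <- abs(a - b)
diag(cell_diff) <- 0                  # diagonal not treated as data
valued_dist <- sum(cell_diff)         # |3-1| + |2-5| + |0-2| = 7

# After dichotomizing, the same sum counts the cells that differ,
# which is the Hamming distance.
a01 <- (a > 0) * 1
b01 <- (b > 0) * 1
binary_diff <- abs(a01 - b01)
diag(binary_diff) <- 0
binary_dist <- sum(binary_diff)       # only cell (3,1) differs: 1
```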

_A_u_t_h_o_r(_s):

     Carter T. Butts ctb@andrew.cmu.edu

_R_e_f_e_r_e_n_c_e_s:

     Banks, D., and Carley, K.M.  (1994).  ``Metric Inference for
     Social Networks.''  Journal of Classification, 11(1), 121-49.

     Butts, C.T., and Carley, K.M.  (2001).  ``Multivariate Methods for
     Interstructural Analysis.'' CASOS working paper, Carnegie Mellon
     University. 

     Hamming, R.W. (1950). ``Error Detecting and Error Correcting
     Codes.'' Bell System Technical Journal, 29, 147-160.

_S_e_e _A_l_s_o:

     `sdmat', `structdist'

_E_x_a_m_p_l_e_s:

     #Get some random graphs
     g<-rgraph(5,5,tprob=runif(5,0,1))

     #Find the Hamming distances
     hdist(g)

