binning                  package:sm                  R Documentation

_C_o_n_s_t_r_u_c_t _f_r_e_q_u_e_n_c_y _t_a_b_l_e _f_r_o_m _r_a_w _d_a_t_a

_D_e_s_c_r_i_p_t_i_o_n:

     Given a vector or a matrix `x', this function constructs a
     frequency table associated with appropriate intervals covering the
     range of `x'.

_U_s_a_g_e:

     binning(x, y, breaks, nbins)

_A_r_g_u_m_e_n_t_s:

    x, y: a vector or a matrix with either one or two columns. If `x'
          is a matrix with a single column, it is treated as a vector.

  breaks: either a vector or a matrix with two columns (depending on
          the dimension of `x'), giving the division points along the
          axis, or along each of the two axes in the matrix case. It
          must not include `Inf', `-Inf' or `NA's, and it must span the
          whole range of the `x' values. If `breaks' is not given, it is
          computed by dividing the range of `x' into `nbins' intervals
          along each axis.

   nbins: the number of intervals on the `x' axis (in the vector case),
          or a vector of two elements giving the number of intervals on
          each axis of `x' (in the matrix case). If `nbins' is not
          given, it is computed as `round(log(length(x))/log(2)+1)' in
          the vector case, or by a similar expression in the matrix
          case.
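     The default construction of `breaks' described above can be
     sketched in base R. This is a minimal illustration under stated
     assumptions, not the code used by `sm': the helpers
     `default_nbins' and `default_breaks' are hypothetical, only the
     vector case is covered, and the tiny widening of the range (so
     that the minimum falls inside the first right-closed interval
     produced by `cut') is an assumption about how the end points are
     handled. The `nbins' formula itself is Sturges' rule, log2(n) + 1.

```r
# Hypothetical helpers, not part of sm: default number of bins and
# equal-width division points for the vector case.
default_nbins <- function(x) {
  # The formula quoted for `nbins' above (Sturges' rule).
  round(log(length(x)) / log(2) + 1)
}

default_breaks <- function(x, nbins = default_nbins(x)) {
  r <- range(x)
  # Assumption: widen the range very slightly so that min(x) falls inside
  # the first right-closed interval (a, b] produced by cut().
  # Assumes x is not constant (otherwise the breaks would not be unique).
  eps <- 1e-8 * diff(r)
  seq(r[1] - eps, r[2] + eps, length.out = nbins + 1)
}

x <- c(0.3, 1.7, 2.2, 2.9, 4.1, 5.0, 5.5, 7.8)
nbins <- default_nbins(x)          # round(log2(8) + 1) = 4
breaks <- default_breaks(x, nbins) # 5 equally spaced division points
table(cut(x, breaks))              # every observation lands in a bin
```

     Because the number of bins grows only logarithmically with the
     sample size, the binned representation stays small even for very
     large `x'.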

_D_e_t_a_i_l_s:

     This function is called automatically (under the default settings)
     by several functions of the `sm' package when the sample size is
     large, so that datasets of essentially unlimited size can be
     handled: the raw data are replaced by bin midpoints and their
     frequencies. Specifically, it is used by `sm.density',
     `sm.regression', `sm.ancova', `sm.binomial' and `sm.poisson'.

_V_a_l_u_e:

     In the vector case, a list is returned containing the following
     elements: a vector `x' of the midpoints of the bins, excluding
     those with zero frequency; the associated frequencies `x.freq';
     the coordinates of the `midpoints' of all bins; the complete
     vector of observed frequencies `freq.table' (including the zero
     frequencies); and the vector `breaks' of division points. In the
     matrix case, the returned value is a list with the following
     elements: a two-column matrix `x' with the coordinates of the
     midpoints of the two-dimensional bins, excluding those with zero
     frequency; the associated frequencies `x.freq'; the coordinates
     of the `midpoints'; the matrix `breaks' of division points; and
     the observed frequencies `freq.table' in full tabular form.
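     The components listed above can be reproduced approximately with
     `cut', `tabulate' and `table'. The sketch below is hypothetical:
     `bin_1d' and `bin_2d' are illustrative helpers, not part of `sm',
     and they simply use the component names given in this section. The
     real `binning' may differ in detail, for example in how it treats
     points that fall exactly on a division point.

```r
# Hypothetical base-R sketch (not sm's code) of the vector case.
bin_1d <- function(x, breaks) {
  f <- cut(x, breaks = breaks)              # right-closed intervals (a, b]
  if (anyNA(f)) stop("breaks do not span the range of x")
  freq.table <- tabulate(f, nbins = length(breaks) - 1)
  midpoints <- (head(breaks, -1) + tail(breaks, -1)) / 2
  keep <- freq.table > 0                    # drop empty bins
  list(x = midpoints[keep], x.freq = freq.table[keep],
       midpoints = midpoints, freq.table = freq.table, breaks = breaks)
}

# Hypothetical sketch of the matrix case: `breaks' is a two-column matrix.
bin_2d <- function(x, breaks) {
  f1 <- cut(x[, 1], breaks = breaks[, 1])
  f2 <- cut(x[, 2], breaks = breaks[, 2])
  if (anyNA(f1) || anyNA(f2)) stop("breaks do not span the range of x")
  freq.table <- table(f1, f2)               # full table, zeros included
  mid <- (breaks[-1, , drop = FALSE] +
          breaks[-nrow(breaks), , drop = FALSE]) / 2
  # expand.grid varies its first argument fastest, which matches the
  # column-major order of as.vector(freq.table).
  grid <- as.matrix(expand.grid(mid[, 1], mid[, 2]))
  counts <- as.vector(freq.table)
  keep <- counts > 0                        # drop empty cells
  list(x = grid[keep, , drop = FALSE], x.freq = counts[keep],
       midpoints = mid, freq.table = freq.table, breaks = breaks)
}

b1 <- bin_1d(c(0.2, 0.4, 2.6, 2.7, 2.9), breaks = 0:3)
b1$freq.table   # c(2, 0, 3): the middle bin is empty
b1$x            # c(0.5, 2.5): midpoints of the non-empty bins only
b2 <- bin_2d(cbind(c(0.5, 0.5, 1.5), c(0.5, 1.5, 1.5)),
             breaks = cbind(0:2, 0:2))
b2$x            # 3 non-empty cells out of 4
```

     Only the non-empty bins enter `x' and `x.freq', so the size of the
     reduced data is bounded by the number of bins rather than by the
     sample size.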

_R_e_f_e_r_e_n_c_e_s:

     Bowman, A.W. and Azzalini, A. (1997).  Applied Smoothing
     Techniques for Data Analysis: the Kernel Approach with S-Plus
     Illustrations. Oxford University Press, Oxford.

_S_e_e _A_l_s_o:

     `sm', `sm.density', `sm.regression', `sm.binomial', `sm.poisson',
     `cut', `table'

_E_x_a_m_p_l_e_s:

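     library(sm)
     # example of 1-d use
     x  <- rnorm(1000)
     xb <- binning(x)
     # user-supplied breaks must span the whole range of x
     xb <- binning(x, breaks = seq(floor(min(x)), ceiling(max(x)), by = 0.5))
     # example of 2-d use: a two-column matrix of correlated data
     x <- rnorm(1000)
     y <- 2*x + 0.5*rnorm(1000)
     x <- cbind(x, y)
     xb <- binning(x, nbins = 12)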

