lokerns                package:lokern                R Documentation

_K_e_r_n_e_l _R_e_g_r_e_s_s_i_o_n _S_m_o_o_t_h_i_n_g _w_i_t_h _L_o_c_a_l _P_l_u_g-_i_n _B_a_n_d_w_i_d_t_h

_D_e_s_c_r_i_p_t_i_o_n:

     Nonparametric estimation of regression functions and their
     derivatives with kernel regression estimators and automatically
     adapted local plug-in bandwidth function.

_U_s_a_g_e:

     lokerns(x, y, deriv = 0, n.out=300, x.out=NULL, korder=NULL,
             ihetero=FALSE, irnd=TRUE, inputb=FALSE, m1=400, xl=NULL, xu=NULL,
             s=NULL, sig=NULL, bandwidth=NULL)

_A_r_g_u_m_e_n_t_s:

       x: vector of design points, not necessarily ordered.

       y: vector of observations of the same length as x.

   deriv: order of derivative of the regression function to be
          estimated.  Only deriv=0,1,2 are allowed for automatic
          smoothing, whereas deriv=0,1,2,3,4 are possible when
          smoothing with an input bandwidth array. The default value is
          deriv=0.

   n.out: number of output design points where the function has to be
          estimated; default is `n.out=300'.

   x.out: vector of output design points where the function has to be
          estimated.  The default is an equidistant grid of n.out
          points from min(x) to max(x).

  korder: nonnegative integer giving the kernel order k; it defaults to
          `korder = deriv+2', i.e. k = nu + 2 with nu = `deriv'.  The
          difference k - nu must be even.  The kernel order is limited
          to k <= 4 for automatic smoothing and to k <= 6 for smoothing
          with an input bandwidth array.

 ihetero: logical: if `TRUE', heteroscedastic error variables are
          assumed for variance estimation; if `FALSE' (default), the
          variance estimation is optimized for homoscedasticity.

    irnd: logical: if `TRUE' (default), random x are assumed and the
          s-array of the convolution estimator is computed as smoothed
          quantile estimators in order to adapt to this variability.
          If `FALSE', the s-array is chosen as the mid-point sequence,
          as in the classical Gasser-Mueller estimator; this is better
          suited to equidistant and fixed designs.

  inputb: logical: if `TRUE', a local input bandwidth array is used; if
          `FALSE' (default), a data-adaptive local plug-in bandwidth
          array is calculated and used.

      m1: integer, the number of grid points for integral approximation
          when estimating the plug-in bandwidth. The default, 400, may
          be increased if a very large number of observations are
          available. 

  xl, xu: numeric (scalars), the lower and upper bounds for integral
          approximation and variance estimation when estimating the
          plug-in bandwidth. By default (when `xl' and `xu' are not
          specified), the middle 87% of [xmin,xmax] is used.

       s: s-array of the convolution kernel estimator. If it is not
          supplied, it is calculated as the midpoint sequence of the
          ordered design points for `irnd=FALSE' or as quantile
          estimators of the design density for `irnd=TRUE'.

     sig: variance of the error variables (no default).  If it is not
          supplied or if `ihetero=TRUE', it is calculated by a
          nonparametric variance estimator.

bandwidth: local bandwidth array for kernel regression estimation.  If
          it is not supplied or if `inputb=FALSE', a data-adaptive
          local plug-in bandwidth array is used instead.
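
     The documented defaults for `x.out', `korder', `xl', `xu' and the
     midpoint form of `s' can be sketched in base R as follows.  This
     is only an illustration, not the package's own code.  Two details
     are assumptions: that the 13% outside [xl,xu] is split evenly
     between both ends, and that the two end values of `s' equal
     min(x) and max(x).

```r
# Sketch of the documented defaults in base R; this is not lokern's own code.
x <- cars$speed            # example design points (built-in 'cars' data)
deriv <- 0
n.out <- 300

# x.out: equidistant grid of n.out points from min(x) to max(x)
x.out <- seq(min(x), max(x), length.out = n.out)

# korder: the documented default is deriv + 2
korder <- deriv + 2

# xl, xu: middle 87% of [xmin, xmax]
# (assumption: the remaining 13% is split evenly between both ends)
margin <- (1 - 0.87) / 2 * diff(range(x))
xl <- min(x) + margin
xu <- max(x) - margin

# s for irnd = FALSE: midpoints of the ordered design points
# (assumption: the two end values are set to min(x) and max(x))
xs <- sort(x)
n  <- length(xs)
s  <- c(xs[1], (xs[-1] + xs[-n]) / 2, xs[n])
```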

_D_e_t_a_i_l_s:

     This function calls an efficient and fast algorithm for
     automatically adaptive nonparametric regression estimation with a
     kernel method.

     Roughly speaking, the method performs a local averaging of the
     observations when estimating the regression function. Analogously,
     one can estimate derivatives of small order of the regression
     function. Crucial for the kernel regression estimation used here
     is the choice of the local bandwidth array. Bandwidths that are
     too small lead to a wiggly curve; bandwidths that are too large
     smooth away important details.  The function lokerns calculates an
     estimator of the regression function or of its derivatives with
     an automatically chosen local plug-in bandwidth function. It is
     also possible to use a local bandwidth array specified by the
     user.
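
     The effect of the bandwidth on a local average can be seen in a
     small base R sketch.  It does not use lokerns: the helper
     `local_kernel_smooth' is hypothetical and uses Nadaraya-Watson
     weights with an Epanechnikov kernel in place of the convolution
     estimator of this package.  It takes one bandwidth per output
     point, in the spirit of a local bandwidth array.

```r
# Kernel-weighted local average with one bandwidth per output point.
# Assumption: Nadaraya-Watson weights with an Epanechnikov kernel,
# used here only for illustration (lokerns uses a convolution estimator).
local_kernel_smooth <- function(x, y, x.out, bandwidth) {
  stopifnot(length(x) == length(y), length(bandwidth) == length(x.out))
  vapply(seq_along(x.out), function(i) {
    u <- (x - x.out[i]) / bandwidth[i]
    w <- ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)   # Epanechnikov weights
    sum(w * y) / sum(w)                             # weighted local mean
  }, numeric(1))
}

x.out  <- seq(min(cars$speed), max(cars$speed), length.out = 300)
narrow <- local_kernel_smooth(cars$speed, cars$dist, x.out, rep(2, 300))
wide   <- local_kernel_smooth(cars$speed, cars$dist, x.out, rep(8, 300))

# Sum of squared second differences as a crude roughness measure
roughness <- function(f) sum(diff(f, differences = 2)^2)
c(narrow = roughness(narrow), wide = roughness(wide))
```

     The narrow bandwidth gives the rougher curve.  Passing a vector of
     varying values as `bandwidth' lets the amount of smoothing change
     along x.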

     The main idea of the plug-in method is to estimate the optimal
     bandwidths by estimating the asymptotically mean squared error
     optimal bandwidths. To do so, one has to estimate the variance
     for homoscedastic error variables, or a functional of a smooth
     variance function for heteroscedastic error variables. One also
     has to estimate an integral functional of the squared k-th
     derivative of the regression function (k=`korder') for the global
     bandwidth and the squared k-th derivative itself for the local
     bandwidths.
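
     For orientation, the target quantities can be written in a common
     textbook form.  The setting is an assumption and is not taken
     from this page: a fixed equidistant design on [0,1] with n
     observations, homoscedastic errors with variance sigma^2, a
     kernel K of order k for the nu-th derivative (nu = `deriv'),
     V = int K(u)^2 du and B = int K(u) u^k du / k!.  The constants
     used by the package may differ.

```latex
h_{\mathrm{glob}}^{\,2k+1}
  = \frac{(2\nu+1)\,\sigma^2\,V}
         {2(k-\nu)\,n\,B^2 \int_0^1 \{m^{(k)}(t)\}^2\,dt},
\qquad
h_{\mathrm{loc}}(x)^{\,2k+1}
  = \frac{(2\nu+1)\,\sigma^2\,V}
         {2(k-\nu)\,n\,B^2\,\{m^{(k)}(x)\}^2}
```

     Both follow from minimizing variance plus squared bias,
     sigma^2 V / (n h^(2 nu + 1)) + h^(2(k - nu)) B^2 {m^(k)}^2,
     over h, integrated over x in the global case.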

     Here, a further kernel estimator for this derivative is used with
     a bandwidth which is adapted iteratively to the regression
     function.  A convolution form of the kernel estimator for the
     regression function and its derivatives is used. Thereby one can
     adapt the s-array for random design. Using this estimator leads to
     an asymptotically minimax efficient estimator for fixed and random
     design.  Polynomial kernels and boundary kernels are used with a
     fast and stable updating algorithm for kernel regression
     estimation.

     More details can be found in the references and on  <URL:
     http://www.unizh.ch/biostat/Software/kernsplus.html>.

_V_a_l_u_e:

     a list containing the parameters used and the estimate, with
     components:

       x: vector of ordered design points.

       y: vector of observations ordered with respect to x.

bandwidth: local bandwidth array which was used for kernel regression
          estimation.

   x.out: vector of ordered output design points.

     est: vector of estimated regression function or its derivative.

     sig: variance estimate which was used for calculating the
          plug-in bandwidths if ihetero=TRUE and either
          inputb=FALSE (default) or irnd=TRUE (default).

   deriv: derivative of the regression function which was estimated.

  korder: order of the kernel function which was used.

      xl: lower bound for integral approximation and variance
          estimation.

      xu: upper bound for integral approximation and variance
          estimation.

       s: vector of midpoint values used for the convolution kernel
          regression estimator.

_R_e_f_e_r_e_n_c_e_s:

     All the references in `glkerns', and additionally, about local
     plug-in bandwidth estimators: 
     B. Seifert, M. Brockmann, J. Engel, and T. Gasser (1994) Fast
     algorithms for nonparametric curve estimation. J. Computational
     and Graphical Statistics 3, 192-213.

_S_e_e _A_l_s_o:

     `glkerns' for global bandwidth computation.

_E_x_a_m_p_l_e_s:

     data(cars)
     ## Fit with automatically chosen local plug-in bandwidths
     lofit <- lokerns(cars$speed, cars$dist)
     (sb <- summary(lofit$bandwidth))

     ## Draw the local bandwidths as a faint background layer
     op <- par(fg = "gray90", tcl = -0.2, mgp = c(3,.5,0))
     plot(lofit$bandwidth, ylim = c(0, 3*sb["Max."]), type = "h",
          ann = FALSE, axes = FALSE)
     ## The boxplot is only drawn for R versions >= 1.3.0
     if(as.numeric(R.version$major) > 1 ||
        as.numeric(R.version$minor) >= 3.0)
         boxplot(lofit$bandwidth, add = TRUE, at = 304, boxwex = 8,
                 col = "gray90", border = "gray",
                 pars = list(axes = FALSE))
     axis(4, at = c(0,pretty(sb)), col.axis = "gray")
     par(op)

     ## Overlay the data and the fitted curve on the same device
     par(new=TRUE)
     plot(dist ~ speed, data = cars,
          main = "Local Plug-In Bandwidth Vector")
     lines(lofit$x.out, lofit$est, col=4)
     mtext(paste("bandwidth in [",
                 paste(format(sb[c(1,6)], digits = 3), collapse=","),
                 "];  Median b.w.=", formatC(sb["Median"])))

