ssanova                 package:gss                 R Documentation

_F_i_t_t_i_n_g _S_m_o_o_t_h_i_n_g _S_p_l_i_n_e _A_N_O_V_A _M_o_d_e_l_s

_D_e_s_c_r_i_p_t_i_o_n:

     Fit smoothing spline ANOVA models with cubic spline, linear
     spline, or thin-plate spline marginals for numerical variables. 
     Factors are also accepted.  The symbolic model specification via
     `formula' follows the same rules as in `lm'.

_U_s_a_g_e:

     ssanova(formula, type="cubic", data=list(), weights, subset,
             offset, na.action=na.omit, partial=NULL, method="v",
             varht=1, prec=1e-7, maxiter=30, ext=.05, order=2)

_A_r_g_u_m_e_n_t_s:

 formula: Symbolic description of the model to be fit.

    type: Type of numerical marginals to be used.  Supported are
          `type="cubic"' for cubic spline marginals, `type="linear"'
          for linear spline marginals, and `type="tp"' for thin-plate
          spline marginals.

    data: Optional data frame containing the variables in the model.

 weights: Optional vector of weights to be used in the fitting process.

  subset: Optional vector specifying a subset of observations to be
          used in the fitting process.

  offset: Optional offset term with known parameter 1.

na.action: Function which indicates what should happen when the data
          contain NAs.

 partial: Optional extra fixed effect terms in partial spline models.

  method: Method for smoothing parameter selection.  Supported are
          `method="v"' for GCV, `method="m"' for GML (REML), and
          `method="u"' for Mallow's CL.

   varht: External variance estimate needed for `method="u"'.  Ignored
          when `method="v"' or `method="m"' are specified.

    prec: Precision requirement in the iteration for multiple smoothing
          parameter selection.  Ignored when only one smoothing
          parameter is involved.

 maxiter: Maximum number of iterations allowed for multiple smoothing
          parameter selection.  Ignored when only one smoothing
          parameter is involved.

     ext: For cubic spline and linear spline marginals, this option
          specifies how far to extend the domain beyond the minimum and
          the maximum as a percentage of the range.  The default
          `ext=.05' specifies marginal domains of lengths 110 percent
          of their respective ranges.  Prediction outside of the domain
          will result in an error.  Ignored if `type="tp"' is
          specified.

   order: For thin-plate spline marginals, this option specifies the
          order of the marginal penalties.  Ignored if `type="cubic"'
          or `type="linear"' are specified.

_D_e_t_a_i_l_s:

     `ssanova' and the affiliated `methods' provide a front end to
     RKPACK, a collection of RATFOR routines for structural
     multivariate nonparametric regression via the penalized least
     squares method.  The algorithms implemented in RKPACK are of the
     orders O(n^3) in execution time and O(n^2) in memory requirement. 
     The constants in front of the orders vary with the complexity of
     the model to be fit.

     The model specification via `formula' is intuitive.  For example,
     `y~x1*x2' yields a model of the form

          y = c + f_{1}(x1) + f_{2}(x2) + f_{12}(x1,x2) + e

     with the terms denoted by `"1"', `"x1"', `"x2"', and `"x1:x2"'. 
     Through the specifications of the side conditions, these terms are
     uniquely defined.  In the current implementation, f_{1} and f_{12}
     integrate to 0 on the `x1' domain for cubic spline and linear
     spline marginals, and add to 0 over the `x1' (marginal) sampling
     points for thin-plate spline marginals.

     The penalized least squares problem is equivalent to a certain
     empirical Bayes model or a mixed effect model, and the model terms
     themselves are generally sums of finer terms of two types, the
     unpenalized fixed effects and the penalized random effects. 
     Attached to every random effect there is a smoothing parameter,
     and the model complexity is largely determined by the number of
     smoothing parameters.

     The method `predict' can be used to evaluate the sum of selected
     or all model terms at arbitrary points within the domain, along
     with standard errors derived from a certain Bayesian calculation. 
     The method `summary' has a flag to request diagnostics for the
     practical identifiability and significance of the model terms.

_V_a_l_u_e:

     `ssanova' returns a list object of `class "ssanova"'.

     The method `summary' is used to obtain summaries of the fits.  The
     method `predict' can be used to evaluate the fits at arbitrary
     points, along with the standard errors to be used in Bayesian
     confidence intervals.  The methods `residuals' and `fitted.values'
     extract the respective traits from the fits.

_F_a_c_t_o_r_s:

     Factors are accepted as predictors.  When a factor has 3 or more
     levels, all terms involving it are treated as random effects, with
     the "level means" being shrunk towards each other.  The shrinking
     is done differently for nominal and ordinal factors; see
     `mkrk.factor' for details.

_N_o_t_e:

     The independent variables appearing in `formula' can be
     multivariate themselves.  In particular,
     `ssanova(y~x,"tp",order=order)' can be used to fit ordinary
     thin-plate splines in any dimension, of any order permissible, and
     with standard errors available for Bayesian confidence intervals.
     Note that thin-plate splines reduce to polynomial splines in one
     dimension.

     For univariate marginals, the additive models using `type="cubic"'
     and `type="tp"' yield identical fit through different internal
     makes.  For example, `ssanova(y~x1+x2,"cubic")' and
     `ssanova(y~x1+x2,"tp")' yield the same fit.  The same is not true
     for models with interactions, however.

     Mathematically, the domain (through `ext' for `type="cubic"') or
     the order (through `order' for `type="tp"') could be specified
     individually for each of the variables.  Such flexibility is not
     provided in our implementation, however, as it would be more a
     source for confusion than a practical utility.

_A_u_t_h_o_r(_s):

     Chong Gu, chong@stat.purdue.edu

_R_e_f_e_r_e_n_c_e_s:

     Wahba, G. (1990), Spline Models for Observational Data.
     Philadelphia: SIAM.

     Gu, C. (2000), Multivariate spline regression. In M.~G. Schimek
     (Ed.), Smoothing and Regression: Approaches, Computation and
     Application.  New York: Wiley.

_S_e_e _A_l_s_o:

     Methods `predict.ssanova', `summary.ssanova', and
     `fitted.ssanova'.

_E_x_a_m_p_l_e_s:

     ## Fit a cubic spline
     x <- runif(100); y <- 5 + 3*sin(2*pi*x) + rnorm(x)
     cubic.fit <- ssanova(y~x,method="m")
     ## The same fit with different internal makes
     tp.fit <- ssanova(y~x,"tp",method="m")
     ## Obtain estimates and standard errors on a grid
     new <- data.frame(x=seq(min(x),max(x),len=50))
     est <- predict(cubic.fit,new,se=TRUE)
     ## Plot the fit and the Bayesian confidence intervals
     plot(x,y,col=1); lines(new$x,est$fit,col=2)
     lines(new$x,est$fit+1.96*est$se,col=3)
     lines(new$x,est$fit-1.96*est$se,col=3)

     data(nox)
     ## Fit a tensor product cubic spline
     nox.fit <- ssanova(log10(nox)~comp*equi,data=nox)
     ## Fit a spline with cubic and nominal marginals
     nox$comp<-as.factor(nox$comp)
     nox.fit.n <- ssanova(log10(nox)~comp*equi,data=nox)
     ## Fit a spline with cubic and ordinal marginals
     nox$comp<-as.ordered(nox$comp)
     nox.fit.o <- ssanova(log10(nox)~comp*equi,data=nox)

