gam                   package:mgcv                   R Documentation

_G_e_n_e_r_a_l_i_z_e_d _A_d_d_i_t_i_v_e _M_o_d_e_l_s _u_s_i_n_g _p_e_n_a_l_i_z_e_d _r_e_g_r_e_s_s_i_o_n _s_p_l_i_n_e_s _a_n_d 
_G_C_V

_D_e_s_c_r_i_p_t_i_o_n:

     Fits the specified generalized additive model (GAM) to data. The GAM
     is represented using one-dimensional penalized regression splines,
     with smoothing parameters selected by GCV or UBRE (see `scale'),
     and/or ordinary regression splines with fixed degrees of freedom.

_U_s_a_g_e:

      gam(formula,family=gaussian(),data=list(),weights=NULL,control=gam.control,scale=0)

_A_r_g_u_m_e_n_t_s:

 formula: A GAM formula. This is exactly like the formula for a glm
          except that smooth terms can be added to the right hand side
          of the formula (and a formula of the form `y ~ .' is not
          allowed). Smooth terms are specified by expressions of the
          form `s(var,knots)', where `var' is the covariate of which
          the smooth is a function and `knots' is the number of knots
          for the spline representing this smooth. `knots' must be a
          literal number, not a variable (i.e. 10, 45 and 13 are all
          fine; `n' is not). If the number of knots is not specified,
          10 are used.

          The formula may also include terms like `s(x,12|f)', which
          specify an unpenalized regression spline with 12 knots. Such
          regression splines have a fixed number of degrees of freedom,
          one fewer than the number of knots (11 in this example).

  family: A family object specifying the distribution and link
          function to use in fitting. See `glm' and `family' for more
          details.

    data: A data frame containing the model response variable and
          covariates required by the formula. If this is missing then
          the frame from which `gam' was called is searched for the
          variables specified in the formula.

 weights: Prior weights on the data.

 control: A list as returned by `gam.control', with three
          user-controllable elements: `maxit' sets the maximum number of
          iterations, `epsilon' sets the convergence tolerance, and
          `trace' controls diagnostic output during fitting.

   scale: If this is zero, GCV is used for all distributions except
          Poisson and binomial, for which UBRE is used with the scale
          parameter assumed to be 1. If it is positive, it is taken to
          be the known scale parameter (variance) and UBRE is used. If
          `scale' is negative, GCV is always used. For binomial models
          in particular, it is probably worth comparing UBRE and GCV
          results; for ``over-dispersed Poisson'' data GCV is probably
          more appropriate than UBRE.

_D_e_t_a_i_l_s:

     Each smooth model term is represented using a cubic penalized
     regression spline or, optionally, an unpenalized regression spline.
     Knots of the spline are placed evenly throughout the covariate
     values to which the term refers. For example, if fitting 101 data
     points with an 11-knot spline of `x', there would be a knot at
     every 10th (ordered) `x' value. The use of penalized regression
     splines turns the GAM fitting problem into a penalized GLM fitting
     problem, which is fitted by `gam.fit', a slight modification of
     `glm.fit'. The penalized GLM approach also allows the smoothing
     parameters of all smooth terms to be selected simultaneously by
     GCV or UBRE. This is done as part of fitting, by calling `mgcv'
     within `gam.fit'.
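     A minimal base-R sketch of these two ideas for the Gaussian,
     identity-link case, where the penalized GLM reduces to penalized
     least squares. Knots are placed at every 10th ordered value of 101
     covariate values, as in the example above. Assumptions: a cubic
     B-spline basis from `splines::bs' with a second-difference
     (P-spline) penalty stands in for gam's own basis and penalty, and a
     simple grid search over the smoothing parameter replaces the
     Newton-type minimization described in Wood (2000).

```r
library(splines)

set.seed(3)
n <- 101
x <- runif(n)
y <- 2 * sin(pi * x) + rnorm(n, sd = 0.3)

# 11 knots at every 10th ordered x value (positions 1, 11, ..., 101)
k <- 11
xs <- sort(x)
idx <- round(seq(1, n, length.out = k))
knots <- xs[idx]

# Swapped-in basis: cubic B-splines on those knots (9 interior knots,
# 13 columns); this is NOT the knot-value parameterization gam uses
B <- bs(x, knots = knots[2:(k - 1)], Boundary.knots = knots[c(1, k)],
        intercept = TRUE)
# Second-difference penalty on adjacent coefficients (P-spline style)
S <- crossprod(diff(diag(ncol(B)), differences = 2))

# GCV score n * RSS / (n - tr(A))^2 for smoothing parameter lambda
gcv_score <- function(lambda) {
  A <- B %*% solve(crossprod(B) + lambda * S, t(B))  # influence matrix
  rss <- sum((y - A %*% y)^2)
  n * rss / (n - sum(diag(A)))^2
}

# Grid search over log10(lambda), for illustration only
lambdas <- 10^seq(-4, 4, length.out = 41)
scores <- sapply(lambdas, gcv_score)
best <- lambdas[which.min(scores)]

# Refit at the GCV-selected smoothing parameter
beta <- solve(crossprod(B) + best * S, crossprod(B, y))
fit <- drop(B %*% beta)
```

     Larger `lambda' pulls the fit towards a straight line (the null
     space of the second-difference penalty); smaller values let it
     follow the data more closely, and GCV trades the two off.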

     The parameterization used represents the spline in terms of its
     values at the knots. Joining these values at neighbouring knots by
     sections of cubic polynomial, constrained to meet at the knots so
     that the curve is continuous up to and including its second
     derivative, yields a natural cubic spline through the values at
     the knots (given two extra conditions specifying that the second
     derivative of the curve is zero at the two end knots). Other
     parameterizations, such as B-splines or the basis that arises
     naturally from the reproducing kernel Hilbert space (r.k.h.s.)
     representation of the spline smoothing problem, are equivalent,
     but the basis used here has the advantage that the parameters of
     each spline term are easily interpretable.
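     To illustrate what the parameters mean, the sketch below takes some
     made-up values at six knots and joins them with a natural cubic
     spline using base R's `splinefun(method = "natural")'. The curve
     passes through the knot values and has zero second derivative at
     the end knots. `splinefun' is an interpolation routine used here
     only to show the parameterization; it is not part of `gam'.

```r
xk <- seq(0, 1, length.out = 6)           # knot locations
vk <- c(0.0, 1.2, 0.8, -0.5, 0.3, 1.0)    # illustrative values at the knots

# natural cubic spline through (xk, vk)
nat <- splinefun(xk, vk, method = "natural")

nat(xk)                    # reproduces vk: parameters are values at knots
nat(c(0, 1), deriv = 2)    # second derivative at the end knots is zero
```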

     Details of the GCV/UBRE minimization method are given in Wood
     (2000).
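     For reference, the two scores in their standard form for a
     penalized least-squares fit (for non-Gaussian families they are
     applied to the weighted working model within the IRLS iteration).
     Here n is the number of data, A the influence (hat) matrix of the
     fit, so that tr(A) is its effective degrees of freedom, D the
     residual sum of squares, and sigma^2 the known scale parameter
     required by UBRE:

```latex
\mathcal{V}_g = \frac{n\,D}{\left[\,n - \operatorname{tr}(\mathbf{A})\,\right]^2},
\qquad
\mathcal{V}_u = \frac{D}{n} - \sigma^2 + \frac{2\,\sigma^2\,\operatorname{tr}(\mathbf{A})}{n}
```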

_V_a_l_u_e:

     The function returns an object of class `"gam"' which has the
     following elements: 

coefficients: the coefficients of the fitted model. Parametric
          coefficients are  first, followed  by coefficients for each
          spline term in turn.

residuals: the deviance residuals for the fitted model.

fitted.values: fitted model predictions of expected value for each
          datum.

  family: family object specifying distribution and link used. 

linear.predictor: fitted model prediction of link function of expected
          value for  each datum.

deviance: the (unpenalized) deviance of the fitted model.

null.deviance: deviance of the single-parameter (intercept-only) model.

 df.null: null degrees of freedom

    iter: number of IRLS iterations taken to reach convergence.

 weights: final weights used in IRLS iteration.

prior.weights: prior weights on observations.

       y: response data.

converged: indicates whether or not the iterative fitting method
          converged.

    sig2: estimated or supplied variance/scale parameter.

     edf: estimated degrees of freedom for each smooth.

boundary: whether the parameters ended up on the boundary of the
          parameter space.

      sp: smoothing parameter for each smooth.

      df: number of knots for each smooth (one more than maximum
          degrees of freedom).

    nsdf: number of parametric, non-smooth, model terms including the
          intercept.

      Vp: estimated covariance matrix of the parameters.

      xp: knot locations for each smooth: `xp[i,]' gives the locations
          for the i-th smooth.

 formula: the model formula.

       x: parametric design matrix columns (including intercept)
          followed by the data that form arguments of the smooths.

    call: an object of mode `call' containing the call to `gam()' that
          produced this `gam' object (useful for constructing model
          frames).

_W_A_R_N_I_N_G_S:

     The code does not check for rank deficiency of the model matrix;
     it will most likely just fail instead!

     You must have more unique combinations of covariates than the
     model has parameters in total. (The total number of parameters is
     the sum of the knots over all spline terms, plus the number of
     non-spline terms, minus the number of spline terms.)
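     As a worked example of this count, the sketch below applies it to
     the model `b1' from the Examples section, where `s(x1)' uses the
     default of 10 knots and the intercept is the only non-spline term.
     As an illustrative check that each spline term contributes one
     fewer column than it has knots, it builds natural cubic spline
     bases with `splines::ns', which is a stand-in for (not the same
     object as) gam's own basis.

```r
library(splines)

# Knots per spline term in b1 <- gam(y ~ s(x0,4|f) + s(x1) + s(x2,15))
knots_per_term <- c(x0 = 4, x1 = 10, x2 = 15)
n_parametric <- 1  # the intercept is the only non-spline term

total_params <- sum(knots_per_term) + n_parametric - length(knots_per_term)
total_params  # 29 + 1 - 3 = 27

# Illustrative check: a natural cubic spline basis with K knots and no
# intercept column has K - 1 columns
set.seed(4)
x <- runif(200)
cols_per_term <- sapply(knots_per_term, function(K) {
  kn <- quantile(x, probs = seq(0, 1, length.out = K), names = FALSE)
  ncol(ns(x, knots = kn[-c(1, K)], Boundary.knots = kn[c(1, K)]))
})
cols_per_term  # x0 = 3, x1 = 9, x2 = 14
```

     With n = 200 distinct covariate values, the example comfortably
     exceeds the 27 parameters.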

     Automatic smoothing parameter selection is unlikely to work well
     when fitting models to very few response data.

_A_u_t_h_o_r(_s):

     Simon N. Wood snw@st-and.ac.uk

_R_e_f_e_r_e_n_c_e_s:

     Gu and Wahba (1991) Minimizing GCV/GML scores with multiple
     smoothing parameters via the Newton method. SIAM J. Sci. Statist.
     Comput. 12:383-398

     Wood (2000) Modelling and Smoothing Parameter Estimation with
     Multiple Quadratic Penalties. JRSSB 62(2):413-428

     <URL: http://www.ruwpa.st-and.ac.uk/simon.html>

_S_e_e _A_l_s_o:

     `predict.gam', `plot.gam'

_E_x_a_m_p_l_e_s:

     library(mgcv)
     set.seed(0)  # for reproducibility
     n <- 200
     sig2 <- 4
     x0 <- runif(n, 0, 1)
     x1 <- runif(n, 0, 1)
     x2 <- runif(n, 0, 1)
     x3 <- runif(n, 0, 1)  # x3 has no effect on the response
     # true additive predictor (pi is built into R)
     f <- 2 * sin(pi * x0)
     f <- f + exp(2 * x1) - 3.75887
     f <- f + 0.2 * x2^11 * (10 * (1 - x2))^6 + 10 * (10 * x2)^3 * (1 - x2)^10 - 1.396
     e <- rnorm(n, 0, sqrt(abs(sig2)))
     y <- f + e
     # four penalized regression spline terms, 10 knots each
     b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3))
     plot(b, pages = 1)
     # now fit GAM with a 3 df unpenalized regression spline term and two penalized terms
     b1 <- gam(y ~ s(x0, 4|f) + s(x1) + s(x2, 15))
     plot(b1, pages = 1)
     # now simulate Poisson data with mean exp(f/5)
     g <- exp(f / 5)
     y <- rpois(n, g)
     b2 <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), family = poisson)
     plot(b2, pages = 1)

