glm                   package:base                   R Documentation

_F_i_t_t_i_n_g _G_e_n_e_r_a_l_i_z_e_d _L_i_n_e_a_r _M_o_d_e_l_s

_D_e_s_c_r_i_p_t_i_o_n:

     `glm' is used to fit generalized linear models, specified by
     giving a symbolic description of the linear predictor and a
     description of the error distribution.

_U_s_a_g_e:

     glm(formula, family = gaussian, data, weights = NULL, subset = NULL,
         na.action, start = NULL, offset = NULL,
         control = glm.control(epsilon=0.0001, maxit=10, trace=FALSE),
         model = TRUE, method = "glm.fit", x = FALSE, y = TRUE,
         contrasts = NULL, ...)
     glm.control(epsilon = 0.0001, maxit = 10, trace = FALSE)
     glm.fit(x, y, weights = rep(1, nrow(x)),
             start = NULL, etastart = NULL, mustart = NULL,
             offset = rep(0, nrow(x)),
             family = gaussian(), control = glm.control(),
             intercept = TRUE)
     weights.glm(object, type = c("prior", "working"), ...)

_A_r_g_u_m_e_n_t_s:

 formula: a symbolic description of the model to be fit. The details of
          model specification are given below.

  family: a description of the error distribution and link function to
          be used in the model. See `family' for details.

    data: an optional data frame containing the variables in the model.
          By default the variables are taken from
          `environment(formula)', typically the environment from which
          `glm' is called.

 weights: an optional vector of weights to be used in the fitting
          process.

  subset: an optional vector specifying a subset of observations to be
          used in the fitting process.

na.action: a function which indicates what should happen when the data
          contain `NA's.  The default is set by the `na.action' setting
          of `options', and is `na.fail' if that is unset.  The
          ``factory-fresh'' default is `na.omit'.

   start: starting values for the parameters in the linear predictor.

etastart: starting values for the linear predictor.

 mustart: starting values for the vector of means.

  offset: this can be used to specify an a priori known component to be
          included in the linear predictor during fitting.

 control: a list of parameters for controlling the fitting process. 
          See the documentation for `glm.control' for details.

   model: a logical value indicating whether the model frame should be
          included as a component of the returned value.

  method: the method to be used in fitting the model. The default (and
          presently only) method `glm.fit' uses iteratively reweighted
          least squares.

    x, y: for `glm', logical values indicating whether the model matrix
          (`x') and the response vector (`y') used in the fitting
          process should be returned as components of the returned
          value.  For `glm.fit', `x' is the design matrix and `y' the
          response vector.

contrasts: an optional list. See the `contrasts.arg' of
          `model.matrix.default'.

  object: an object inheriting from class `"glm"'.

    type: character, partial matching allowed.  Type of weights to
          extract from the fitted model object.
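
     As an illustration of several of the arguments above, the sketch
     below fits a Gamma model to a subset of a data frame, passes
     explicit convergence settings through `glm.control', and asks for
     the model matrix to be returned.  The `epsilon' and `maxit' values
     and the `subset' condition are arbitrary choices made for this
     example, not recommendations.

```r
## Clotting data (the same values as in the Examples section)
clotting <- data.frame(
    u = c(5,10,15,20,30,40,60,80,100),
    lot1 = c(118,58,42,35,27,25,21,19,18))

fit <- glm(lot1 ~ log(u), family = Gamma, data = clotting,
           subset = u > 5,                      # drop the first row
           control = glm.control(epsilon = 1e-8, maxit = 50,
                                 trace = FALSE),
           x = TRUE)                            # keep the model matrix

fit$converged   # was IWLS judged to have converged?
fit$iter        # number of IWLS iterations used
dim(fit$x)      # model matrix for the rows kept by `subset'
```

     Setting `trace = TRUE' instead would print progress at each
     iteration, which can help when a fit fails to converge.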

_D_e_t_a_i_l_s:

     A typical predictor has the form `response ~ terms' where
     `response' is the (numeric) response vector and `terms' is a
     series of terms which specifies a linear predictor for `response'.
     For `binomial' models the response can also be specified as a
     `factor' (when the first level denotes failure and all others
     success) or as a two-column matrix with the columns giving the
     numbers of successes and failures.  A terms specification of the
     form `first + second' indicates all the terms in `first' together
     with all the terms in `second' with duplicates removed.

     A specification of the form `first:second' indicates the set
     of terms obtained by taking the interactions of all terms in
     `first' with all terms in `second'. The specification
     `first*second' indicates the cross of `first' and `second'. This
     is the same as `first + second + first:second'.
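
     The equivalence can be checked without fitting anything by
     expanding formulas with `terms' and comparing the term labels.
     The variable names `y', `a' and `b' below are placeholders; no
     data are needed because only the formulas are inspected.

```r
## Expand both specifications into their individual terms
expanded_star <- attr(terms(y ~ a * b), "term.labels")
expanded_long <- attr(terms(y ~ a + b + a:b), "term.labels")

expanded_star                            # main effects and interaction
identical(expanded_star, expanded_long)  # same set of terms

## `+' removes duplicated terms
attr(terms(y ~ a + b + a), "term.labels")
```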

_V_a_l_u_e:

     `glm' returns an object of class `glm' which inherits from the
     class `lm'. See later in this section.

     The function `summary' (i.e., `summary.glm') can be used to obtain
     or print a summary of the results and the function `anova' (i.e.,
     `anova.glm') to produce an analysis of variance table.

     The generic accessor functions `coefficients', `effects',
     `fitted.values' and `residuals' can be used to extract various
     useful features of the value returned by `glm'.

     `weights' extracts a vector of weights, one for each case in the
     fit (after subsetting and `na.action').

     An object of class `"glm"' is a list containing at least the
     following components:

coefficients: a named vector of coefficients

residuals: the working residuals, that is the residuals in the final
          iteration of the IWLS fit.

fitted.values: the fitted mean values, obtained by transforming the
          linear predictors by the inverse of the link function.

    rank: the numeric rank of the fitted linear model.

  family: the `family' object used.

linear.predictors: the linear fit on link scale.

deviance: up to a constant, minus twice the maximized log-likelihood. 
          Where sensible, the constant is chosen so that a saturated
          model has deviance zero.

     aic: Akaike's An Information Criterion, minus twice the maximized
          log-likelihood plus twice the number of coefficients (so
          assuming that the dispersion is known).

null.deviance: the deviance for the null model, comparable with
          `deviance'. The null model will include the offset, and an
          intercept if there is one in the model.

    iter: the number of iterations of IWLS used.

 weights: the working weights, that is the weights in the final
          iteration of the IWLS fit.

prior.weights: the case weights initially supplied.

df.residual: the residual degrees of freedom.

 df.null: the residual degrees of freedom for the null model.

       y: the `y' vector used. (It is a vector even for a binomial
          model.)

converged: logical. Was the IWLS algorithm judged to have converged?

boundary: logical. Is the fitted value on the boundary of the
          attainable values?

    call: the matched call.

 formula: the formula supplied.

   terms: the `terms' object used.

    data: the `data' argument.

  offset: the offset vector used.

 control: the value of the `control' argument used.

  method: the name of the fitter function used, in R always
          `"glm.fit"'.

contrasts: (where relevant) the contrasts used.

 xlevels: (where relevant) a record of the levels of the factors used
          in fitting.
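
     Some of the relationships among these components can be checked
     directly on a fitted object.  The sketch below uses the Poisson
     fit from the Examples section and recomputes `aic' by hand from
     the Poisson log-likelihood; that hand computation is specific to
     this family and is an illustration rather than a general recipe.

```r
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
fit <- glm(counts ~ outcome + treatment, family = poisson())

## fitted.values: inverse link applied to linear.predictors
all.equal(unname(fit$fitted.values),
          unname(fit$family$linkinv(fit$linear.predictors)))

## aic: minus twice the maximized log-likelihood plus twice the
## number of coefficients
loglik <- sum(dpois(counts, fit$fitted.values, log = TRUE))
all.equal(fit$aic, -2 * loglik + 2 * fit$rank)

## degrees of freedom for the fitted and the null model
fit$df.residual   # number of cases minus rank
fit$df.null       # number of cases minus one (intercept only)
```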


     In addition, non-null fits will have components `qr', `R' and
     `effects' relating to the final weighted linear fit.
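
     To make the role of the weighted linear fit concrete, here is a
     minimal hand-written IWLS loop for a Poisson model with log link.
     It is a sketch of the general technique, not the code of
     `glm.fit': the starting values, the fixed number of iterations
     and the use of `lm.wfit' for the weighted least squares step are
     choices made for this example.

```r
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
X <- model.matrix(~ outcome + treatment)
y <- counts

mu  <- y + 0.1       # starting means (must be positive)
eta <- log(mu)       # starting linear predictor
for (i in 1:25) {    # fixed iteration count, for simplicity
    z <- eta + (y - mu) / mu    # working response (d mu / d eta = mu)
    w <- mu                     # working weights for Poisson, log link
    beta <- lm.wfit(X, z, w)$coefficients  # weighted least squares
    eta <- drop(X %*% beta)
    mu  <- exp(eta)
}

## Compare with the coefficients found by glm
fit <- glm(counts ~ outcome + treatment, family = poisson())
all.equal(unname(beta), unname(coef(fit)), tolerance = 1e-6)
```

     A real fitter would stop when the change in deviance falls below
     a tolerance (compare `epsilon' in `glm.control') rather than run
     a fixed number of iterations.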

     Objects of class `"glm"' are normally of class `c("glm", "lm")',
     that is, they inherit from class `"lm"', and well-designed methods for
     class `"lm"' will be applied to the weighted linear model at the
     final iteration of IWLS.  However, care is needed, as extractor
     functions for class `"glm"' such as `residuals' and `weights' do
     not just pick out the component of the fit with the same name.
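
     The difference between a component and its extractor can be seen
     on a small fit.  The sketch assumes that `residuals' applied to a
     `"glm"' object returns deviance residuals by default, as it does
     in current versions of R.

```r
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
fit <- glm(counts ~ outcome + treatment, family = poisson())

## The `residuals' component holds the working residuals ...
all.equal(unname(fit$residuals),
          unname(residuals(fit, type = "working")))
## ... which differ from what the extractor returns by default
isTRUE(all.equal(unname(fit$residuals), unname(residuals(fit))))

## The `weights' component holds the working weights, whereas
## weights() returns the prior weights unless told otherwise
all.equal(unname(fit$weights), unname(weights(fit, type = "working")))
all.equal(unname(fit$prior.weights), unname(weights(fit)))
```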

     If a `binomial' `glm' model is specified by giving a two-column
     response, the weights returned by `prior.weights' are the total
     numbers of cases (factored by the supplied case weights) and the
     component `y' of the result is the proportion of successes.
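
     A small sketch of the two-column form follows.  The counts are
     made up purely for illustration, and `dose', `succ' and `fail'
     are hypothetical variable names.

```r
dose <- c(1, 2, 3, 4)
succ <- c(2, 5, 9, 14)     # numbers of successes (illustrative)
fail <- c(18, 15, 11, 6)   # numbers of failures (illustrative)

fit <- glm(cbind(succ, fail) ~ dose, family = binomial)

fit$prior.weights   # total number of cases in each row
fit$y               # proportion of successes in each row
all.equal(unname(fit$y), succ / (succ + fail))
```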

_N_o_t_e:

     Offsets specified by `offset' will not be included in predictions
     by `predict.glm', whereas those specified by an offset term in the
     formula will be.
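
     The two ways of supplying an offset are sketched below on made-up
     count data (`events', `exposure' and `x' are hypothetical names).
     Both give the same fitted coefficients; they differ only in how
     the offset is recorded in the fitted object.  The sketch does not
     call `predict.glm', since the prediction behaviour may differ
     between versions of R.

```r
events   <- c(2, 3, 6, 7, 8, 9, 10, 12, 15)        # illustrative counts
exposure <- c(10, 12, 15, 14, 16, 15, 18, 20, 22)  # illustrative exposure
x        <- 1:9

## Offset given as a term in the formula
fit_term <- glm(events ~ x + offset(log(exposure)), family = poisson)

## Offset given through the `offset' argument
fit_arg <- glm(events ~ x, offset = log(exposure), family = poisson)

all.equal(coef(fit_term), coef(fit_arg))
```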

_S_e_e _A_l_s_o:

     `anova.glm', `summary.glm', etc. for `glm' methods, and the
     generic functions `anova', `summary', `effects', `fitted.values',
     and `residuals'. Further, `lm' for non-generalized linear models.

     `esoph', `infert' and `predict.glm' have examples of fitting
     binomial glms.

_E_x_a_m_p_l_e_s:

     ## Annette Dobson (1990) "An Introduction to Generalized Linear Models".
     ## Page 93: Randomized Controlled Trial :
     counts <- c(18,17,15,20,10,20,25,13,12)
     outcome <- gl(3,1,9)
     treatment <- gl(3,3)
     print(d.AD <- data.frame(treatment, outcome, counts))
     glm.D93 <- glm(counts ~ outcome + treatment, family=poisson())
     anova(glm.D93)
     summary(glm.D93)

     ## An example with offsets from Venables & Ripley (1999, pp.217-8).
     ## Need the anorexia data from a recent version of the package MASS:
     library(MASS)
     data(anorexia)

     anorex.1 <- glm(Postwt ~ Prewt + Treat + offset(Prewt),
                 family = gaussian, data = anorexia)
     summary(anorex.1)

     ## A Gamma example, from McCullagh & Nelder (1989, pp. 300-2)
     clotting <- data.frame(
         u = c(5,10,15,20,30,40,60,80,100),
         lot1 = c(118,58,42,35,27,25,21,19,18),
         lot2 = c(69,35,26,21,18,16,13,12,12))
     summary(glm(lot1 ~ log(u), data=clotting, family=Gamma))
     summary(glm(lot2 ~ log(u), data=clotting, family=Gamma))

