polymars              package:polymars              R Documentation

_M_u_l_t_i-_r_e_s_p_o_n_s_e _M_u_l_t_i_v_a_r_i_a_t_e _A_d_a_p_t_i_v_e _R_e_g_r_e_s_s_i_o_n _S_p_l_i_n_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     Fit multi-response Multivariate Adaptive Regression Splines.

_U_s_a_g_e:

     polymars(responses, predictors, maxsize, gcv=4, additive=FALSE,
              startmodel, weights, no.interact, knots, knot.space=3,
              ts.resp, ts.pred, ts.weights, classify, factors, tolerance,
              verbose=FALSE)

_A_r_g_u_m_e_n_t_s:

responses: vector of responses, or a matrix for multiple response
          regression.  In the case of a matrix each column corresponds
          to a response and each row corresponds to an observation.
          Missing values are not allowed.

predictors: matrix of predictor variables for the regression. Each
          column corresponds to a predictor and each row corresponds to
          an observation in the same order as they appear in the
          response argument. Missing values are not allowed.

 maxsize: the maximum number of basis functions that the model is
          allowed to grow to in the stepwise addition procedure. The
          default is `min(6 * (number.of.observations^(1/3)),
          number.of.observations/4, 100)'.
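
          The default quoted above can be evaluated directly. This is
          only a sketch: the helper name `default.maxsize' is
          hypothetical and not part of the package, and any rounding
          that polymars applies internally is not reproduced.

```r
# Hypothetical helper: evaluates the default maxsize formula exactly as
# quoted in the documentation (no rounding applied).
default.maxsize <- function(n) {
  min(6 * n^(1/3), n / 4, 100)
}

default.maxsize(50)     # small sample: the n/4 term is the binding one
default.maxsize(1000)   # medium sample: the cube-root term is binding
default.maxsize(1e6)    # large sample: capped at 100
```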

     gcv: parameter used to find the overall best model from a sequence
          of fitted models. The residual sum of squares of a model is
          penalized by dividing by the square of 1-(gcv x model
          size)/cases. A larger gcv value would tend to produce a
          smaller model. By default gcv is set to 4.
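
          The penalty described above can be written out as a short
          function. This is a sketch under the wording of this page
          only: `gcv.criterion' is a hypothetical name, and any further
          normalization used inside polymars is not shown.

```r
# Hypothetical helper: RSS divided by the square of
# 1 - (gcv * model size) / cases, as described in the text.
gcv.criterion <- function(rss, size, cases, gcv = 4) {
  rss / (1 - gcv * size / cases)^2
}

# Same RSS, two model sizes: the larger model receives the larger penalty.
small <- gcv.criterion(rss = 10, size = 3, cases = 100)
large <- gcv.criterion(rss = 10, size = 6, cases = 100)

# Raising gcv increases the penalty for a fixed model size.
harsher <- gcv.criterion(rss = 10, size = 6, cases = 100, gcv = 8)
```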

additive: Should the fitted model be additive in the predictors?

startmodel: the first model that is to be fit by the polymars function.
          It is either an object of the class polymars or a model
          specified by the user.  In the latter case, it takes the form
          of an n x 4 matrix (n = number of basis functions in the
          starting model excluding the intercept term). Each row
          corresponds to
          one basis function (with two possible components). Column 1
          is the index of the first predictor involved. Column 2 is a
          possible knot in this predictor. If column 2 is NA, the first
          component is linear. Column 3 is the possible second
          predictor involved (if column 3 is NA the basis function only
          depends on one predictor). Column 4 contains the possible
          knot for the predictor in column 3, and it is NA when this
          component is linear.  Example: if a row reads 3 NA 2 4.7, the
          corresponding basis function is [X3 * (X2-4.7)_+]; if a row
          reads 2 4.3 NA NA the corresponding basis function is [(X2 -
          4.3)_+].

          A fifth column can be added with 1's and 0's. The 1's specify
          which basis functions of the startmodel must be in each
          model. Thus, these functions stay in the model during the
          whole stepwise fitting procedure.

          If no startmodel is specified polymars starts with a model
          that only contains  the intercept.
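
          The two example rows given above can be assembled into a
          user-specified starting model and evaluated by hand. This is
          a sketch in base R: the helpers `component' and `basis' are
          hypothetical, handle continuous predictors only, and are not
          how polymars evaluates basis functions internally.

```r
# Two basis functions, one per row: pred1, knot1, pred2, knot2.
start <- rbind(c(3, NA,  2,  4.7),   # X3 * (X2 - 4.7)_+
               c(2, 4.3, NA, NA))    # (X2 - 4.3)_+

# Optional fifth column: 1 keeps the basis function in every model.
start.fixed <- cbind(start, c(1, 0))

# Hypothetical helper: one component, linear when the knot is NA,
# otherwise the truncated linear function (x - knot)_+.
component <- function(x, knot) {
  if (is.na(knot)) x else pmax(x - knot, 0)
}

# Hypothetical helper: evaluate one row of the matrix on predictors X.
basis <- function(row, X) {
  b <- component(X[, row[1]], row[2])
  if (!is.na(row[3])) b <- b * component(X[, row[3]], row[4])
  b
}

# Two observations (rows) on three predictors (columns).
X <- cbind(c(1, 2), c(5, 4), c(2, 3))
b1 <- basis(start[1, ], X)
b2 <- basis(start[2, ], X)
```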

 weights: vector of observation weights; if supplied, the algorithm
          fits to minimize the sum of the weights multiplied by the
          squared residuals.  The length of weights must be the same as
          the number of observations. The weights must be nonnegative.
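
          The weighted criterion described above is the sum of each
          weight times the corresponding squared residual. A minimal
          sketch, with `wrss' as a hypothetical helper name:

```r
# Hypothetical helper: weighted residual sum of squares, with the two
# requirements stated in the text checked up front.
wrss <- function(y, fitted, w) {
  stopifnot(length(w) == length(y), all(w >= 0))
  sum(w * (y - fitted)^2)
}

total <- wrss(y = c(1, 2, 3), fitted = c(1, 1, 1), w = c(1, 2, 0.5))
```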

no.interact: a matrix used if certain predictor interactions are not
          allowed in the model. It is given as a matrix (m x 2), with
          predictor indices as entries.  The two predictors of any row
          cannot have interaction terms with each other.
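
          A matrix of forbidden pairs, one pair per row, might be built
          and queried as follows. The helper `allowed' is hypothetical
          and only illustrates how the rows are read; it is not part of
          the package.

```r
# Each row is one pair of predictor indices that must not interact.
no.int <- rbind(c(1, 2),
                c(1, 3))

# Hypothetical helper: is an interaction between predictors i and j
# allowed? The order of i and j does not matter.
allowed <- function(i, j, no.interact) {
  forbidden <- (no.interact[, 1] == i & no.interact[, 2] == j) |
               (no.interact[, 1] == j & no.interact[, 2] == i)
  !any(forbidden)
}

allowed(2, 1, no.int)   # pair (1, 2) is listed
allowed(2, 3, no.int)   # pair (2, 3) is not listed
```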

   knots: defines how the function is to find potential knots for the
          spline basis functions.  This can be set to the maximum
          number of knots you would like to be considered for each
          predictor. Usually, to avoid the design matrix becoming
          singular the actual number of knots produced is constrained
          to at most every third order statistic in any predictor. This
          constraint can be adjusted using the `knot.space' argument. 
          It can also be a vector with the number of potential knots
          for each predictor.  Again the actual number of knots
          produced is constrained to be at most every third order
          statistic in any predictor.  A third possibility is to provide
          a matrix where each column corresponds to the ordered knots
          you would like to have considered for that predictor. This
          matrix should be filled out to a rectangular data structure
          with NAs. The default is `min(20, round(ncases/4))' knots per
          predictor. 

          When specifying knots as a vector an entry of -1 indicates
          that the predictor is a categorical variable and each unique
          entry in its column is treated as a level.

          When specifying knots as a single number or a matrix and
          there are categorical variables, these are specified
          separately using the `factors' argument.
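
          The matrix and vector forms described above can be
          constructed in base R as sketched here. The knot values and
          variable names are made up for illustration.

```r
# Ordered knots chosen by the user, one set per predictor.
k1 <- c(0.5, 1.0, 1.5, 2.0)
k2 <- c(10, 20)

# Pad the shorter set with NA so both fit one rectangular matrix,
# one column per predictor.
nrows <- max(length(k1), length(k2))
knot.matrix <- cbind(c(k1, rep(NA, nrows - length(k1))),
                     c(k2, rep(NA, nrows - length(k2))))

# Vector form: a knot count per predictor, with -1 marking the third
# predictor as categorical.
knot.vector <- c(15, 15, -1)
```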

knot.space: an integer giving the minimum number of order statistics
          that must separate two knots. Knots should not be too close
          together, to ensure numerical stability.  The default is 3.

 ts.resp: testset responses for model selection. Should have the same
          number of columns as the training set response. A testset can
          be used for the model selection.  Depending on the value of
          classify, either the model with the smallest testset residual
          sum of squares or the smallest testset classification error
          is provided. Overrides gcv.

 ts.pred: testset predictors. Should have the same number of columns as
          the training set predictors.

ts.weights: testset observation weights. A vector of length equal to
          the number of cases of the testset. All weights must be
          non-negative.

classify: when the response is discrete (categorical), polymars can be
          used for classification. In particular, when `classify = TRUE',
          a discrete response with K levels is replaced by K indicator
          variables as response. Model selection is still carried out
          using gcv, except when a testset is provided, in which
          case testset misclassification is used to select the best
          model.
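
          Replacing a K-level response by K indicator variables can be
          sketched in base R as follows. This only illustrates the
          encoding; how polymars builds the indicators internally is
          not shown, and the names `y' and `ind' are made up.

```r
# A discrete response with K = 3 levels.
y <- factor(c("a", "b", "c", "a"))

# One 0/1 indicator column per level, one row per observation.
ind <- outer(as.integer(y), seq_along(levels(y)), "==") * 1
colnames(ind) <- levels(y)
```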

 factors: used to indicate that certain variables in the predictor set
          are categorical variables. Specified as a vector containing
          the appropriate predictor indices (column numbers of
          categorical variables in predictors matrix). Factors can also
          be set when the `knots' argument is given as a vector, with
          -1 as the appropriate entries for factors.

tolerance: for each possible candidate to be added/deleted the
          resulting residual sums of squares of the model, with/without
          this candidate, must be calculated.  The inversion of the
          `X-transpose by X' matrix, X being the design matrix, is done
          by an updating procedure; cf. C. R. Rao, Linear Statistical
          Inference and Its Applications, 2nd edition, page 33.  In
          the inversion the size of the bottom right-hand entry of this
          matrix is critical. If its value is near zero or the value of
          its inverse is almost zero then the inversion procedure
          becomes somewhat inaccurate. The lower the tolerance value
          the more careful the procedure is in selecting candidates for
          addition to the model but it may exclude too conservatively.
          On the other hand, if the tolerance is set too high, a
          spurious result with a singular or otherwise sub-optimal
          model may occur. By default tolerance is set to 1.0e-5.
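
          The kind of update referred to above can be illustrated with
          the standard bordered-matrix (Schur complement) inverse. This
          is a generic sketch, not the package's code: the helper
          `bordered.inverse' is hypothetical, and how polymars compares
          the critical quantity with `tolerance' is not shown.

```r
# Hypothetical helper: given Ainv = solve(A), invert the bordered matrix
# rbind(cbind(A, b), c(b, cc)) without refactoring it from scratch.
# The bottom right-hand entry of the result is 1/s.
bordered.inverse <- function(Ainv, b, cc) {
  u <- Ainv %*% b
  s <- cc - sum(b * u)                  # Schur complement
  inv <- rbind(cbind(Ainv + u %*% t(u) / s, -u / s),
               c(-u / s, 1 / s))
  list(inverse = inv, s = s)
}

# Design matrix: intercept, x, and a candidate column x^2.
X <- cbind(1, c(1, 2, 3, 4), c(1, 4, 9, 16))
Ainv <- solve(crossprod(X[, 1:2]))
upd <- bordered.inverse(Ainv, crossprod(X[, 1:2], X[, 3]), sum(X[, 3]^2))
direct <- solve(crossprod(X))           # full inversion, for comparison

# A candidate that duplicates an existing column drives s to about zero,
# which is the situation the text warns about.
dup <- bordered.inverse(Ainv, crossprod(X[, 1:2], X[, 2]), sum(X[, 2]^2))
```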

 verbose: when set to TRUE, the function prints a line for each
          addition or deletion stage. For example, " + 8 : 5 3.25 2
          NA" means adding an interaction basis function of predictor 5
          with knot at 3.25 and predictor 2 (linear), to make a model
          of size 8, including the intercept.

_D_e_t_a_i_l_s:

     `is.polymars' and `summary.polymars' are available.
     `print.polymars' defaults to `summary.polymars'.

_V_a_l_u_e:

     The returned object contains information about the fitting steps
     and the model selected. The first data frame contains a row for
     each step of the fitting procedure. In the columns are: a 1 for an
     addition step or a 0 for a deletion step, the size of the model at
     each step, the residual sum of squares (RSS), and the generalized
     cross validation value (GCV), testset residual sum of squares, or
     testset misclassification, whichever was used for the model
     selection.

     The second data frame, model, contains a row for each basis
     function of the model. Each row corresponds to one basis function
     (with two possible components). The pred1 column contains the
     indices of the first predictor of the basis function. Column knot1
     is a possible knot in this predictor. If this column is NA, the
     first component is linear. If any of the basis functions of the
     model is categorical then there will be a level1 column. Column
     pred2 is the possible second predictor involved (if it is NA the
     basis function only depends on one predictor). Column knot2
     contains the possible knot for the predictor pred2, and it is NA
     when this component is linear. This is a similar format to the
     startmodel argument together with an additional first row
     corresponding to the intercept but the startmodel doesn't use a
     separate column to specify levels of a categorical variable. If
     any predictor in pred2 is categorical then there will be a level2
     column. The column "coefs" (more than one column in the case of
     multiple response regression) contains the coefficients.

     The returned object also contains the fitted values and residuals
     of the data used in fitting the model.

_R_e_f_e_r_e_n_c_e_s:

     Friedman, J. H. (1991). Multivariate adaptive regression splines
     (with discussion).  The Annals of Statistics, 19, 1-141.

     Kooperberg, C., Bose, S. and Stone, C. J. (1997). Polychotomous
     Regression.  Journal of the American Statistical Association, 92, 
     - .

_S_e_e _A_l_s_o:

     `plot.polymars', `predict.polymars', `summary.polymars'

_E_x_a_m_p_l_e_s:

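     data(state)
     # Classification: state.region (4 levels) on the state.x77 predictors
     state.pm <- polymars(state.region, state.x77, knots = 15, classify = TRUE)
     # auto.stats is assumed to be available in the workspace
     data <- na.omit(auto.stats)
     # Rows 1-40 train the model; rows 41-66 form the testset for selection
     auto.pm <- polymars(data[1:40, 1], data[1:40, -1], factors = c(2, 3),
       ts.pred = data[41:66, -1], ts.resp = data[41:66, 1], maxsize = 40)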

