Why does the output from a model with an N-level factor variable return only N-1 coefficients?
To demonstrate contrasts (the coding of factors) in TIBCO Spotfire S+, consider the following S+ session, in which an analysis of variance model is fit with the aov() function:
-------------------------------------------------------------------------
# Examine the built-in dataset 'guayule'.
> names(guayule)
[1] "variety" "treatment" "reps" "plants" "flats"
> dim(guayule)
[1] 96 5
# Pick two columns from the dataset for an aov() model.
> c(data.class(guayule$variety), data.class(guayule$treatment))
[1] "factor" "factor"
> c(length(levels(guayule$variety)),
+   length(levels(guayule$treatment)))
[1] 8 4
# Example call to aov().
> tgaov <- aov(plants ~ variety * treatment, data = guayule)
> summary(tgaov)
                  Df Sum of Sq  Mean Sq  F Value      Pr(F)
          variety  7    763.16   109.02   2.7058 0.01604076
        treatment  3  30774.28 10258.09 254.5959 0.00000000
variety:treatment 21   2620.14   124.77   3.0966 0.00026666
        Residuals 64   2578.67    40.29
# Note that the degrees of freedom for each factor column (7 and 3) are
# one less than the number of levels in that column (8 and 4). This means
# that for a factor with n levels, only n-1 parameters are fit in the model.
-------------------------------------------------------------------------
This is a result of the way that S+ numerically codes the levels of a factor variable when it includes them in a model computation. Pages 39-40 in Volume 1 of the "TIBCO Spotfire S+ 8.1 Guide to Statistics" provide a general explanation for this choice in fitting factor variables. Basically, overparameterization occurs when each level of a factor variable is fit with its own coefficient in a model that also contains an intercept, because the indicator columns for the levels add up to the intercept column. Instead, linear combinations of the factor levels (contrasts) are fit, and a factor with N levels has only N-1 independent linear combinations. This means that only N-1 coefficients are returned for that factor.
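The overparameterization described above can be checked numerically. The following is a minimal sketch, not taken from the Guide to Statistics: the factor f and the matrices X and X2 are hypothetical names made up for illustration, and the code uses only basic S functions (factor, sapply, cbind, qr).

```r
# Hypothetical 4-level factor with two observations per level.
f <- factor(rep(c("a", "b", "c", "d"), 2))

# Full indicator coding: one 0/1 column per level (8 rows, 4 columns).
ind <- sapply(levels(f), function(lev) as.numeric(as.character(f) == lev))

# Intercept plus all 4 indicator columns: 5 columns in total.
X <- cbind(1, ind)
ncol(X)       # 5
qr(X)$rank    # 4: the indicator columns add up to the intercept column

# Dropping one indicator column leaves the intercept plus N-1 = 3 columns,
# and the design matrix then has full column rank.
X2 <- cbind(1, ind[, -1])
ncol(X2)      # 4
qr(X2)$rank   # 4
```

Dropping the first indicator column is only one way to remove the redundancy; the built-in contrast functions make other choices of N-1 columns.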
By default, most model functions that require contrasts use the value of
> options()$contrasts
to determine their coding scheme. Currently, there are Helmert contrasts, polynomial contrasts, sum contrasts, and treatment contrasts built into S+. Sum contrasts produce coefficients whose sum is zero, and treatment contrasts produce dummy coefficients in which the first level of the factor is zeroed out. For more details on any of these coding schemes, you can type '?contr.helmert' at the S+ prompt to see their common help file.
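As a rough illustration of the coding schemes named above, the sketch below prints properties of the built-in contrast matrices for a 4-level factor and then changes the default coding for a single fit. The variables f, y, and fit are made-up examples rather than part of the guayule session, and the unnamed two-element form of the contrasts option (first element for unordered factors, second for ordered factors) is assumed here.

```r
# Coding matrices for a 4-level factor: 4 rows (levels), 3 columns (coefficients).
helm <- contr.helmert(4)
trt  <- contr.treatment(4)
sm   <- contr.sum(4)

dim(helm)           # 4 3
trt[1, ]            # all zeros: the first level is the baseline
apply(sm, 2, sum)   # all zeros: each column of the sum coding adds to zero

# Save the current setting, switch to treatment contrasts, fit, then restore.
old <- options()$contrasts
options(contrasts = c("contr.treatment", "contr.poly"))

f <- factor(rep(c("a", "b", "c", "d"), 2))   # hypothetical 4-level factor
y <- c(1, 2, 3, 4, 5, 6, 7, 9)               # made-up response values
fit <- lm(y ~ f)
length(coef(fit))   # 4: the intercept plus N-1 = 3 contrast coefficients

options(contrasts = old)
```

Whichever scheme is chosen, the number of coefficients per factor stays at N-1; only their interpretation changes.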
As a supplement to the discussion above, you may want to consult "Modern Applied Statistics with S" by W.N. Venables and B.D. Ripley (Fourth Edition). This text gives a more theoretical discussion of the relationship between the coefficients for contrasts and the coefficients for factor levels.