TIBCO Spotfire FAQ

Contrasts: Coding of Factors in TIBCO Spotfire S+

Why does the output from a model with an N-level factor variable return only N-1 coefficients?

To demonstrate contrasts, or the coding of factors, in TIBCO Spotfire S+, consider the following S+ session, in which an analysis of variance model is fit with the aov() function:

   -------------------------------------------------------------------------
   # Examine the built-in dataset 'guayule'.
   > names(guayule)
   [1] "variety" "treatment" "reps" "plants" "flats"
   > dim(guayule)
   [1] 96 5

   # Pick two columns from the dataset for an aov() model.
   > c(data.class(guayule$variety), data.class(guayule$treatment))
   [1] "factor" "factor"
   > c(length(levels(guayule$variety)), length(levels(guayule$treatment)))
   [1] 8 4

   # Example call to aov().
   > tgaov <- aov(plants ~ variety * treatment, data = guayule)
   > summary(tgaov)
                     Df Sum of Sq  Mean Sq  F Value      Pr(F)
             variety  7    763.16   109.02   2.7058 0.01604076
           treatment  3  30774.28 10258.09 254.5959 0.00000000
   variety:treatment 21   2620.14   124.77   3.0966 0.00026666
           Residuals 64   2578.67    40.29

   # Note that the degrees of freedom for both factor columns is one less
   # than the number of levels in the column. This means that for a
   # factor with n levels, only n-1 parameters are fit in the model.
   -------------------------------------------------------------------------
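The degrees of freedom in the summary table can be checked by hand. The short sketch below is written in Python purely as an arithmetic illustration (it is not S+ code, and the variable names are made up for this example). It assumes the usual rules for a two-factor model with interaction: a main effect with n levels uses n-1 degrees of freedom, and the interaction uses the product of the two main-effect degrees of freedom.

```python
# Degrees-of-freedom bookkeeping for plants ~ variety * treatment.
# All numbers are taken from the S+ session; the names are illustrative only.
n_obs = 96        # rows in guayule, from dim(guayule)
n_variety = 8     # levels of variety
n_treatment = 4   # levels of treatment

df_variety = n_variety - 1                    # main effect: levels - 1
df_treatment = n_treatment - 1                # main effect: levels - 1
df_interaction = df_variety * df_treatment    # product of the main-effect df
# One degree of freedom goes to the intercept; the rest are residual.
df_residual = n_obs - 1 - df_variety - df_treatment - df_interaction

print(df_variety, df_treatment, df_interaction, df_residual)  # 7 3 21 64
```

These values match the Df column of the summary(tgaov) output.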

This is a result of the way that S+ numerically codes the levels of a factor variable when it includes them in a model computation.  Pages 39-40 in Volume 1 of the "TIBCO Spotfire S+ 8.1 Guide to Statistics" provide a general explanation for this choice in fitting factor variables.  Basically, overparameterization can occur when each level of a factor variable is fit with its own coefficient: in a model that also contains an intercept, the indicator columns for all N levels add up to the intercept column, so the coefficients are not uniquely determined.  Instead, linear combinations of the factor levels are fit, and a factor with N levels has only N-1 independent linear combinations of this kind.  This means that only N-1 coefficients will be returned.
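The overparameterization can be seen directly from the rank of the model matrix. The following is a minimal sketch in Python (not S+ code), assuming a made-up 4-level factor and six made-up observations. The rank() helper is a hypothetical utility written for this illustration; it is not part of S+ or of any library.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) via Gauss-Jordan elimination on exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # number of pivots found so far
    n_cols = len(m[0]) if m else 0
    for c in range(n_cols):
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Clear column c in every other row.
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

levels = ["a", "b", "c", "d"]           # hypothetical 4-level factor
obs = ["a", "b", "c", "d", "a", "c"]    # hypothetical observations

# Intercept plus one indicator column per level: 1 + 4 = 5 columns.
full = [[1] + [int(x == lev) for lev in levels] for x in obs]
# Intercept plus indicators for all but the first level: 1 + 3 = 4 columns.
reduced = [[1] + [int(x == lev) for lev in levels[1:]] for x in obs]

print(len(full[0]), rank(full))         # 5 4
print(len(reduced[0]), rank(reduced))   # 4 4
```

With all four indicator columns the matrix has five columns but only rank four, so one coefficient is redundant. Dropping one column (here the first level, as treatment contrasts do) leaves a full-rank matrix with N-1 = 3 factor coefficients.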

By default, most model functions that require contrasts use the value of

   > options()$contrasts

to determine their coding scheme.  Currently, there are Helmert contrasts (contr.helmert), polynomial contrasts (contr.poly), sum contrasts (contr.sum), and treatment contrasts (contr.treatment) built into S+.  Sum contrasts produce coefficients whose sum is zero, and treatment contrasts produce dummy coefficients in which the first level of the factor is zeroed out.  For more details on any of these coding schemes, you can type '?contr.helmert' at the S+ prompt to see their common help file.
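To make the coding schemes concrete, the sketch below builds the treatment, sum, and Helmert coding matrices for a factor with N levels. It is written in Python as an illustration only; the function names treatment(), sum_to_zero(), and helmert() are hypothetical helpers for this example, not the S+ functions themselves, and the layouts follow the standard textbook definitions (one row per level, N-1 columns). Polynomial contrasts are omitted for brevity.

```python
def treatment(n):
    """Treatment coding: first level is all zeros, level i+1 gets a 1 in column i."""
    return [[int(i == j + 1) for j in range(n - 1)] for i in range(n)]

def sum_to_zero(n):
    """Sum coding: identity rows for the first n-1 levels, a row of -1 for the last."""
    identity = [[int(i == j) for j in range(n - 1)] for i in range(n - 1)]
    return identity + [[-1] * (n - 1)]

def helmert(n):
    """Helmert coding: column j has -1 in rows 0..j, j+1 in row j+1, and 0 below."""
    return [[-1 if i <= j else (j + 1 if i == j + 1 else 0)
             for j in range(n - 1)]
            for i in range(n)]

def column_sums(matrix):
    """Sum of each column of a matrix given as a list of rows."""
    return [sum(col) for col in zip(*matrix)]

n_levels = 4  # e.g. the number of levels of 'treatment' in the session above
for name, build in [("treatment", treatment),
                    ("sum", sum_to_zero),
                    ("helmert", helmert)]:
    coding = build(n_levels)
    print(name)
    for row in coding:
        print("  ", row)
    print("   column sums:", column_sums(coding))
```

Each matrix has N rows and N-1 columns, which is why a factor contributes N-1 coefficients under any of these schemes. The columns of the sum and Helmert matrices each add up to zero, while the treatment matrix simply leaves the first level as an all-zero row.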

As a supplement to the discussion above, you may want to consult "Modern Applied Statistics with S" by W.N. Venables and B.D. Ripley (Fourth Edition).  This text gives a more theoretical discussion of the relationship between the coefficients for contrasts and the coefficients for factor levels.


©Copyright 2000-2011 TIBCO Software Inc