Monday, February 25, 2008

mcmcpvalue and contrasts

I thought Steven McKinney had a really interesting message in "mcmcpvalue and contrasts" on the R-sig-ME list today. I'll just reproduce his R session here:

> library(SASmixed)
> names(Semiconductor)
> str(Semiconductor)

'data.frame': 48 obs. of 5 variables:
$ resistance: num 5.22 5.61 6.11 6.33 6.13 6.14 5.6 5.91 5.49 4.6 ...
$ ET : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
$ Wafer : Factor w/ 3 levels "1","2","3": 1 1 1 1 2 2 2 2 3 3 ...
$ position : Factor w/ 4 levels "1","2","3","4": 1 2 3 4 1 2 3 4 1 2 ...
$ Grp : Factor w/ 12 levels "1/1","1/2","1/3",..: 1 1 1 1 2 2 2 2 3 3 ...
...
> options(contrasts=c("contr.treatment", "contr.poly"))
> lm1 <- lm(resistance ~ ET * position, data=Semiconductor)
> summary(lm1)


Call:
lm(formula = resistance ~ ET * position, data = Semiconductor)

Residuals:
Min 1Q Median 3Q Max
-1.01333 -0.25750 0.04333 0.28333 0.74667

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.61333 0.26891 20.874 <2e-16 ***
ET2 0.38000 0.38030 0.999 0.325
ET3 0.52333 0.38030 1.376 0.178
ET4 0.72667 0.38030 1.911 0.065 .
position2 -0.16333 0.38030 -0.429 0.670
position3 -0.06000 0.38030 -0.158 0.876
position4 0.27333 0.38030 0.719 0.478
ET2:position2 0.35667 0.53782 0.663 0.512
ET3:position2 0.37333 0.53782 0.694 0.493
ET4:position2 0.37667 0.53782 0.700 0.489
ET2:position3 -0.16667 0.53782 -0.310 0.759
ET3:position3 -0.30333 0.53782 -0.564 0.577
ET4:position3 -0.38333 0.53782 -0.713 0.481
ET2:position4 -0.35000 0.53782 -0.651 0.520
ET3:position4 -0.31667 0.53782 -0.589 0.560
ET4:position4 -0.07333 0.53782 -0.136 0.892
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4658 on 32 degrees of freedom
Multiple R-squared: 0.4211, Adjusted R-squared: 0.1498
F-statistic: 1.552 on 15 and 32 DF, p-value: 0.1449

> options(contrasts=c("contr.helmert", "contr.poly"))
> lm2 <- lm(resistance ~ ET * position, data=Semiconductor)
> summary(lm2)


Call:
lm(formula = resistance ~ ET * position, data = Semiconductor)

Residuals:
Min 1Q Median 3Q Max
-1.01333 -0.25750 0.04333 0.28333 0.74667

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.00292 0.06723 89.292 < 2e-16 ***
ET1 0.17000 0.09507 1.788 0.08324 .
ET2 0.09722 0.05489 1.771 0.08606 .
ET3 0.10986 0.03881 2.830 0.00797 **
position1 0.05667 0.09507 0.596 0.55535
position2 -0.11000 0.05489 -2.004 0.05360 .
position3 0.03542 0.03881 0.912 0.36834
ET1:position1 0.08917 0.13446 0.663 0.51197
ET2:position1 0.03250 0.07763 0.419 0.67826
ET3:position1 0.01667 0.05489 0.304 0.76337
ET1:position2 -0.05750 0.07763 -0.741 0.46427
ET2:position2 -0.03528 0.04482 -0.787 0.43700
ET3:position2 -0.02444 0.03169 -0.771 0.44617
ET1:position3 -0.05167 0.05489 -0.941 0.35363
ET2:position3 -0.01111 0.03169 -0.351 0.72818
ET3:position3 0.01125 0.02241 0.502 0.61909
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4658 on 32 degrees of freedom
Multiple R-squared: 0.4211, Adjusted R-squared: 0.1498
F-statistic: 1.552 on 15 and 32 DF, p-value: 0.1449

Note that R^2 is 0.4211 in both cases and that the F and corresponding p-values are also the same. "This is good - the
two models are mathematically equivalent, so the overall model fit statistics should not differ." (Well, I suppose I should paraphrase that too.)
> options(contrasts=c("contr.treatment", "contr.poly"))
> lmr1 <- lm(resistance ~ ET + position, data=Semiconductor)
> anova(lm1, lmr1)

Analysis of Variance Table

Model 1: resistance ~ ET * position
Model 2: resistance ~ ET + position
Res.Df RSS Df Sum of Sq F Pr(>F)
1 32 6.9421
2 41 7.7515 -9 -0.8095 0.4146 0.9178
> options(contrasts=c("contr.helmert", "contr.poly"))
> lmr2 <- lm(resistance ~ ET + position, data=Semiconductor)
> anova(lm2, lmr2)

Analysis of Variance Table

Model 1: resistance ~ ET * position
Model 2: resistance ~ ET + position
Res.Df RSS Df Sum of Sq F Pr(>F)
1 32 6.9421
2 41 7.7515 -9 -0.8095 0.4146 0.9178

Note that again the results are identical. If they were not, this would be a sign of a numerical issue. But this won't happen in R anyway because it's only in presenting the results that the differences come into play?


No longer following the e-mail: And what do the model matrices look like?
> a <- as.factor(rep(1:3, each=4))
> b <- as.factor(rep(1:2, each=6))
> y <- rnorm(12)
> options(contrasts=c("contr.treatment", "contr.poly"))
> model.matrix(y ~ a*b)

(Intercept) a2 a3 b2 a2:b2 a3:b2
1 1 0 0 0 0 0
2 1 0 0 0 0 0
3 1 0 0 0 0 0
4 1 0 0 0 0 0
5 1 1 0 0 0 0
6 1 1 0 0 0 0
7 1 1 0 1 1 0
8 1 1 0 1 1 0
9 1 0 1 1 0 1
10 1 0 1 1 0 1
11 1 0 1 1 0 1
12 1 0 1 1 0 1
attr(,"assign")
[1] 0 1 1 2 3 3
attr(,"contrasts")
attr(,"contrasts")$a
[1] "contr.treatment"

attr(,"contrasts")$b
[1] "contr.treatment"

> options(contrasts=c("contr.helmert", "contr.poly"))
> model.matrix(y ~ a*b)

(Intercept) a1 a2 b1 a1:b1 a2:b1
1 1 -1 -1 -1 1 1
2 1 -1 -1 -1 1 1
3 1 -1 -1 -1 1 1
4 1 -1 -1 -1 1 1
5 1 1 -1 -1 -1 1
6 1 1 -1 -1 -1 1
7 1 1 -1 1 1 -1
8 1 1 -1 1 1 -1
9 1 0 2 1 0 2
10 1 0 2 1 0 2
11 1 0 2 1 0 2
12 1 0 2 1 0 2
attr(,"assign")
[1] 0 1 1 2 3 3
attr(,"contrasts")
attr(,"contrasts")$a
[1] "contr.helmert"

attr(,"contrasts")$b
[1] "contr.helmert"

The help page for options reminds us that first element in the list for "contrasts" gives the function to be used with unordered factors while the second gives the function to use with ordered ones.
> b <- as.ordered(b)
> model.matrix(y ~ a*b)

(Intercept) a1 a2 b.L a1:b.L a2:b.L
1 1 -1 -1 -0.7071068 0.7071068 0.7071068
2 1 -1 -1 -0.7071068 0.7071068 0.7071068
3 1 -1 -1 -0.7071068 0.7071068 0.7071068
4 1 -1 -1 -0.7071068 0.7071068 0.7071068
5 1 1 -1 -0.7071068 -0.7071068 0.7071068
6 1 1 -1 -0.7071068 -0.7071068 0.7071068
7 1 1 -1 0.7071068 0.7071068 -0.7071068
8 1 1 -1 0.7071068 0.7071068 -0.7071068
9 1 0 2 0.7071068 0.0000000 1.4142136
10 1 0 2 0.7071068 0.0000000 1.4142136
11 1 0 2 0.7071068 0.0000000 1.4142136
12 1 0 2 0.7071068 0.0000000 1.4142136
attr(,"assign")
[1] 0 1 1 2 3 3
attr(,"contrasts")
attr(,"contrasts")$a
[1] "contr.helmert"

attr(,"contrasts")$b
[1] "contr.poly"


//Also, something to remember: options(contrasts=c("contr.SAS", "contr.poly")) sets the system to use contrasts in the style of SAS.

No comments: