Monday, May 12, 2008

Generalized Linear Mixed Models, etc.

There has recently been a very interesting thread on generalized linear mixed models on R-sig-ME. Ken Beath writes, "For choosing the number of classes for mixture models it has been shown that BIC fails theoretically but works well in practice (proven with simulations) and compares well to results from parametric bootstrapping of the LRT." Where should one look for discussion of this?
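I don't know which reference Ken has in mind, but the practical side is easy to poke at. Here is a small sketch -- my example, not Ken's -- using the mclust package, which selects the number of Gaussian mixture components by BIC:

library(mclust)
# Fit mixtures with 1 to 9 components to the eruption waiting times
# and let BIC pick among them.
fit <- Mclust(faithful$waiting, G = 1:9)
fit$G                    # number of components chosen by BIC
plot(fit, what = "BIC")  # BIC traced across the candidate models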

Dimitris Rizopoulos writes, "Well, in Linear Mixed Models the integral over the random effects can be analytically evaluated and thus no approximation (i.e., AGQ or Laplace) is required. In GLMMs this is not the case and thus the log-likelihood needs to be calculated approximately. One method for approximating the integral is the AGQ, and in fact Laplace is AGQ with one quadrature point." Should a stat computing class cover all this?

Doug Bates writes, "That is, you gain more from using the Laplace approximation at the conditional modes of the random effects (which is the 'adaptive' part) than increasing the number of Gauss-Hermite quadrature points at the marginal modes. The tricky bit of AGQ or Laplace is determining the conditional modes and that part is already done." I think I'll carefully read through the source code for lme4 sometime soon. Doesn't this sound like fun?
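In lme4 this knob is exposed directly for models with a single scalar random effect: nAGQ = 1 is the Laplace approximation, and larger values add quadrature points at the conditional modes. A sketch using the current glmer interface and the cbpp data that ship with lme4 (the API has changed since this thread):

library(lme4)
# Laplace (the default, nAGQ = 1) versus 9-point adaptive Gauss-Hermite.
gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
             data = cbpp, family = binomial)
gm9 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
             data = cbpp, family = binomial, nAGQ = 9)
cbind(laplace = fixef(gm1), agq9 = fixef(gm9))  # typically very close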

Peter Dalgaard describes some issues related to generalized linear mixed models. The Laplace method is a trick from Bayesian computation, based on the observation that the likelihood approaches a Gaussian -- "Approximate using value at peak and Hessian."
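To make "value at peak and Hessian" concrete, here is a toy sketch of my own: the one-dimensional integral that shows up in a Poisson GLMM likelihood, done once by adaptive quadrature and once by the Laplace approximation.

y <- 4; beta <- 1; sigma <- 0.8       # one observation, fixed effect, RE sd
logh <- function(u)                   # log(likelihood x random-effect density)
  dpois(y, exp(beta + u), log=TRUE) + dnorm(u, 0, sigma, log=TRUE)
h <- function(u) exp(logh(u))

exact <- integrate(h, -Inf, Inf)$value                 # reference value

uhat <- optimize(logh, c(-5, 5), maximum=TRUE)$maximum # the peak
d2 <- (logh(uhat + 1e-4) - 2*logh(uhat) + logh(uhat - 1e-4)) / 1e-4^2  # Hessian
laplace <- exp(logh(uhat)) * sqrt(2*pi / -d2)
c(exact=exact, laplace=laplace)       # the two should be close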

Some fun R functions:
> embed(1:5, dim=3)
     [,1] [,2] [,3]
[1,]    3    2    1
[2,]    4    3    2
[3,]    5    4    3
> embed(1:5, dim=2)
     [,1] [,2]
[1,]    2    1
[2,]    3    2
[3,]    4    3
[4,]    5    4
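Each row of embed's result is a backward-looking window onto the series, which makes it handy for quick lagged regressions. A small sketch of the idea (using the LakeHuron series from the datasets package):

x <- as.numeric(LakeHuron)
E <- embed(x, 2)             # column 1 is x[t], column 2 is x[t-1]
coef(lm(E[, 1] ~ E[, 2]))    # a rough AR(1) fit by least squares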
> as.integer(intToBits(as.integer(3)))
[1] 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Using intToBits, we can build up a function to print out the bits of a floating point number (ignoring exceptional cases such as Inf). First, a helper that formats the low bits of an integer, reversing intToBits's least-significant-first order:
intToBitString <- function(x, bits=32) {
  substring(paste(rev(as.integer(intToBits(as.integer(x)))),
                  collapse=""),
            32 - bits + 1, 32)
}
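For example:

> intToBitString(5, 8)
[1] "00000101"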

doubleToBitString <- function(x) {
  stopifnot(is.finite(x), x != 0)
  # Binary exponent; abs() matters because log2 of a negative number is NaN.
  e <- floor(log2(abs(x)))
  m <- abs(x * 2 ^ -e) - 1
  # Peel the mantissa bits off eight at a time.
  shift <- 8
  s.bits <- ""
  remaining <- .Machine$double.digits
  while (remaining > 0) {
    u <- min(remaining, shift)
    m <- m * 2 ^ u
    k <- floor(m)
    s.bits <- paste(s.bits, intToBitString(k, u), sep="")
    m <- m - k
    remaining <- remaining - u
  }

  mexp <- .Machine$double.exponent
  # Sign bit, then the biased exponent, then the mantissa bits.
  paste(if (x > 0) 0 else 1,
        intToBitString(e + 2^(mexp - 1) - 1, mexp),
        s.bits)
}
(We could easily avoid depending on the base being 2, but is it worth it?) The output for pi seems to match the bits given on the Wikipedia page for double-precision floating point. (Note that .Machine$double.digits counts the implicit leading 1, so the function prints 53 mantissa bits, one more than the 52 that are actually stored; the last one is always 0.)


> doubleToBitString(pi)
[1] "0 10000000000 10010010000111111011010101000100010000101101000110000"

Here, just for kicks, is an implementation that uses the hack of dynamically creating a shared library to accomplish the same task. (I wrote this a couple of years ago in response to a class assignment to print out the bits of a floating point number in R. Well, the code I actually turned in was pure R -- and way longer.)
create.so <- function() {
  writeLines(c("void doubleToBitString(const double* x,",
               "                       int* length, char** result)",
               "{",
               "    const char* bytes = (const char*)x;",
               "    int numBytes = sizeof(double);",
               "    static char buffer[1024];",
               "    int i;",
               "    for (i = 0; i != numBytes; ++i) {",
               "        int j;",
               "        for (j = 0; j != 8; ++j) {",
               "            int mask = 1 << j;",
               "            buffer[8*i + j] = (mask & bytes[i]) ? '1' : '0';",
               "        }",
               "    }",
               "    buffer[8*numBytes] = '\\0';",
               "    *result = buffer;",
               "    *length = numBytes;",
               "}"),
             con = "bitstring.c")
  system("R CMD SHLIB bitstring.c", ignore.stderr=TRUE)
}

doubleToBitString <- function(x) {
  # Compile the helper library on first use, then call into it.
  if (!file.exists("bitstring.so")) {
    create.so()
  }
  dyn.load("bitstring.so")
  out <- .C("doubleToBitString",
            x=as.double(x),
            len=as.integer(1),
            bits=as.character('junk'))
  return(out$bits)
}
Whoa. Does this write into memory it shouldn't be touching? As far as I can tell it doesn't -- every store lands inside the static buffer -- but replacing the char* that .C passed in, rather than writing into the string it supplied, is certainly off-label use of the interface.
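Incidentally, if the goal is just the bit pattern and not the C hackery, a pure-R route through writeBin and rawToBits sidesteps the question entirely. A sketch (rawToBits returns bits least-significant first, hence the rev):

doubleToBitStringR <- function(x) {
  # Serialize to 8 little-endian bytes, explode into 64 bits (LSB first),
  # then reverse to get the usual sign / exponent / mantissa order.
  bits <- rev(as.integer(rawToBits(writeBin(x, raw(), endian="little"))))
  paste(bits, collapse="")
}

For pi this gives the same sign, exponent, and mantissa bits as the first doubleToBitString, just without the spaces (and with the 52 stored mantissa bits rather than 53).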
