Sunday, February 10, 2008

correlation on subsets

I don't think anyone's subscribing to this so I'll just make this a new post. (The path to freedom lies in attaining obscurity!)
##Find the correlation between the two columns of d
##other than base, using only the rows whose base value
##is in the corresponding subset.
##
##d -- 3 col data.frame or matrix
##subsets -- list of vectors
##base -- column to use in conditioning
##
##Returns a numeric vector with one correlation per subset.
cor.on.subsets <- function(d, subsets, base=2) {
  stopifnot(dim(d)[2] == 3)
  stopifnot(base %in% 1:3)
  x <- setdiff(1:3, base)
  r <- numeric(length(subsets))
  ##seq_along avoids the 1:0 trap when subsets is empty
  for (i in seq_along(subsets)) {
    ##drop=FALSE keeps d1 two-dimensional even if only one row matches
    d1 <- d[d[, base] %in% subsets[[i]], , drop=FALSE]
    r[i] <- cor(d1[, x[1]], d1[, x[2]])
  }
  r
}

Would it be more stylish to use subset() rather than direct row selection with d[d[, base] %in% subsets[[i]], ]?
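For comparison, here is a rough sketch of what the subset() version might look like. The name cor.on.subsets2 is made up for illustration, and the sketch assumes d can be coerced to a data.frame whose column names do not clash with the local variables d, base, or s (subset() evaluates its condition inside the data frame first, so a column with one of those names would shadow the variable).

```r
## Hypothetical variant using subset(); a sketch, not a drop-in replacement.
## Assumes no column of d is named "d", "base", or "s".
cor.on.subsets2 <- function(d, subsets, base = 2) {
  d <- as.data.frame(d)              # subset() on a matrix behaves differently
  stopifnot(ncol(d) == 3, base %in% 1:3)
  x <- setdiff(1:3, base)            # the two columns to correlate
  sapply(subsets, function(s) {
    d1 <- subset(d, d[[base]] %in% s)  # rows whose base value is in s
    cor(d1[[x[1]]], d1[[x[2]]])
  })
}

## Example call on made-up data
set.seed(1)
toy <- data.frame(a = rnorm(20), g = rep(1:4, 5), b = rnorm(20))
cor.on.subsets2(toy, list(1:2, 3:4), base = 2)
```

Whether this counts as more stylish is debatable. The help page for subset() describes it as a convenience function intended for interactive use and recommends the standard [ indexing for programming, which argues for keeping the direct row selection inside a function like this one.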
