Quotidian: strings again

Saturday, March 8, 2008

strings again

# Split a single string into a vector of one-character strings.
getChars <- function(s) {
  n <- nchar(s)
  if (n > 0) substring(s, 1:n, 1:n) else character(0)
}

# Remove from s every character that appears in the vector chars.
strip <- function(s, chars) {
  s.chars <- getChars(s)
  paste(s.chars[!(s.chars %in% chars)], collapse="")
}

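# Translate characters: each character of s found in from is replaced by
# the element of to at the same position; all others pass through unchanged.
tr <- function(s, from, to) {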
  chars <- getChars(s)
  o <- match(chars, from)
  paste(ifelse(!is.na(o), to[o], chars), collapse="")
}

# Case conversion built on tr; only the unaccented ASCII letters are mapped.
lower <- function(s) {
  tr(s, from=LETTERS, to=letters)
}

upper <- function(s) {
  tr(s, from=letters, to=LETTERS)
}
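For comparison, in R itself the base functions chartr, tolower/toupper and gsub cover roughly the same ground as tr, lower/upper and strip, which makes them handy as a cross-check. This is a sketch run in R only; I haven't checked which of these S-PLUS provides, and the character-class pattern in the gsub line is an assumption that only holds when the stripped characters need no regex escaping.

```r
s <- "Hello, World"

# Analogue of tr(s, from = c("l", "o"), to = c("0", "1")): chartr maps each
# character of old to the character at the same position in new
tr.base <- chartr("lo", "01", s)

# Analogues of lower(s) and upper(s)
lower.base <- tolower(s)
upper.base <- toupper(s)

# Analogue of strip(s, c("l", "o")): delete every character in the class [lo].
# Characters such as "]", "^", "-" or "\\" would need escaping in the class.
strip.base <- gsub("[lo]", "", s)

print(c(tr.base, lower.base, upper.base, strip.base))
# [1] "He001, W1r0d" "hello, world" "HELLO, WORLD" "He, Wrd"
```

One difference to keep in mind: tolower and toupper are locale-aware, so they also convert accented letters, whereas lower and upper above only touch A-Z.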

Another go. S-PLUS doesn't have strsplit, so I use a different (and perhaps more efficient?) method for getting at the characters of a string: substring with vectors of start and stop positions. Timing both in R:

> system.time(replicate(10000, strsplit("1234567890", "")[[1]]))
   user  system elapsed
  0.102   0.004   0.105
> system.time(replicate(10000, substring("1234567890", 1:10, 1:10)))
   user  system elapsed
  0.297   0.003   0.299

That's a surprise: the substring version is nearly three times slower. Maybe I should try to avoid creating the index vector (1:n) twice? Still, strsplit seems like so much heavier machinery.
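One way to test the "index twice" idea is to build the index vector once and pass it as both start and stop. A minimal sketch: getChars2 is a hypothetical name, and the timing lines just mirror the ones above, so the numbers will vary with machine and R version (no results are claimed here).

```r
# Hypothetical variant of getChars that builds the index vector only once
getChars2 <- function(s) {
  n <- nchar(s)
  if (n == 0) return(character(0))
  i <- seq_len(n)       # 1:n, created a single time
  substring(s, i, i)    # start and stop are matched up elementwise
}

# Timing harness in the same shape as the console session above
x <- "1234567890"
i <- 1:10
t.split <- system.time(replicate(10000, strsplit(x, "")[[1]]))
t.sub   <- system.time(replicate(10000, substring(x, 1:10, 1:10)))
t.sub1  <- system.time(replicate(10000, substring(x, i, i)))
print(rbind(strsplit = t.split[1:3], substring = t.sub[1:3],
            substring.once = t.sub1[1:3]))
```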
// The source code for strsplit reveals that it treats the pattern "" as a special case (see src/main/character.c).
// Well, this is documented in the help page for strsplit as well.
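Given that strsplit has a fast path for "", code meant to run under both R and S-PLUS could choose its method when it is loaded. This is only an illustration, not the code from the post: the name getCharsPortable is hypothetical, and it assumes exists("strsplit") is an adequate test for whether strsplit is available.

```r
# Hypothetical portable character splitter: use strsplit where it exists
# (R, which special-cases the "" pattern); otherwise fall back to substring.
getCharsPortable <- if (exists("strsplit")) {
  function(s) if (nchar(s) > 0) strsplit(s, "")[[1]] else character(0)
} else {
  function(s) {
    n <- nchar(s)
    if (n > 0) substring(s, 1:n, 1:n) else character(0)
  }
}

getCharsPortable("abc")
# [1] "a" "b" "c"
```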
