Saturday, May 3, 2008

reading numbers from a file -- less buggy?

My previous readArray implementation contained a bug. I had implemented it in terms of line-oriented input, which meant specifying a maximum line length (in characters). I thought 10000 characters was more than reasonable... well, it turns out my own code routinely prints out lines of numbers that are longer than that. (This is probably itself something that should change.) So fgets was actually hitting this limit in the midst of reading a number, splitting it into two parts, with the second part showing up on the next fgets call, thus occasionally turning one valid number into two invalid ones. If this hadn't introduced an extra number (and my code hadn't been checking that it was getting the right number of numbers), I might have missed this.
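To make the failure concrete, here is a small sketch of the same effect, not the original code. The helper name countNumbersViaFgets and the tiny buffer sizes are assumptions made up for illustration. It writes some text to a temporary file, reads it back through fgets with a fixed-size buffer, and counts the numbers strtod finds in each chunk.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, for illustration only: writes text to a temporary
 * file, reads it back with fgets using a buffer of bufSize bytes, and
 * returns how many numbers strtod finds in the chunks (-1 on error). */
int countNumbersViaFgets(const char* text, size_t bufSize)
{
    FILE* f = tmpfile();
    if (!f) {
        return -1;
    }
    fputs(text, f);
    rewind(f);

    char* buf = malloc(bufSize);
    if (!buf) {
        fclose(f);
        return -1;
    }

    int count = 0;
    /* fgets stores at most bufSize - 1 characters per call, so a long
     * line comes back in several chunks, cut at arbitrary positions. */
    while (fgets(buf, (int)bufSize, f)) {
        char* p = buf;
        for (;;) {
            char* end;
            strtod(p, &end);
            if (end == p) {
                break; /* no further number in this chunk */
            }
            ++count;
            p = end;
        }
    }

    free(buf);
    fclose(f);
    return count;
}
```

With the input "1.5 2.25 3.125\n" and a 64-byte buffer, the whole line fits and three numbers come back. With an 8-byte buffer the first chunk ends in the middle of 2.25, which is then parsed as 2.2 and 5, so the count is four.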

One way to fix this would be to implement a function for line-oriented input that would guarantee the entire line had been read -- very close to the spirit of readArray. Instead, though, I've chosen to give up on line-oriented input and read things in a number at a time. The resulting code is a bit cleaner anyway, I think (although there is still some duplication).
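For comparison, the line-oriented fix that was not taken might look roughly like the sketch below. The name readWholeLine and the starting capacity of 16 are assumptions for illustration. The idea is the same growing-buffer trick as in readArray: keep calling fgets and doubling the buffer until a newline or EOF shows up. (POSIX systems provide getline, which does much the same job.)

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, for illustration only: reads one complete line
 * from in, however long it is. Returns a malloc'd string (including the
 * trailing newline, if there was one) that the caller must free, or
 * NULL at EOF with nothing read or on error. */
char* readWholeLine(FILE* in)
{
    size_t capacity = 16;
    size_t len = 0;
    char* line = malloc(capacity);
    if (!line) {
        return NULL;
    }

    /* Each fgets call appends to what has been read so far. */
    while (fgets(line + len, (int)(capacity - len), in)) {
        len += strlen(line + len);
        if (len > 0 && line[len - 1] == '\n') {
            return line; /* got the whole line */
        }
        /* No newline yet: grow the buffer and keep reading. */
        char* bigger = realloc(line, capacity * 2);
        if (!bigger) {
            free(line);
            return NULL;
        }
        line = bigger;
        capacity *= 2;
    }

    if (len == 0 || ferror(in)) {
        free(line);
        return NULL;
    }
    return line; /* last line of the file, without a trailing newline */
}
```

Each returned line could then be handed to strtod without any risk of a number being cut in half, at the cost of holding a whole line in memory at once.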

(Another fix would be to use a language for which reading a file full of a given data type is a feature of the standard library. I guess we almost have this with C++. Haskell must get this right, though, right?)
#include <stdio.h>
#include <stdlib.h>

/**
 * Reads doubles from in until EOF.
 * If len is not null, *len is set to
 * the number of elements read.
 * The caller is responsible for freeing the
 * memory of the returned object.
 * Returns null in case of an error.
 */
double* readArray(FILE* in, int* len)
{
    int capacity = 100;
    double* y = malloc(sizeof(double) * capacity);
    int n = 0;
    if (!y) {
        fprintf(stderr, "# Out of memory\n");
        return 0;
    }
    while (!ferror(in) && !feof(in)) {
        double v;
        int status = fscanf(in, "%lf", &v);
        if (status == 1) {
            if (capacity == n) {
                /* Grow via a temporary so y is not leaked if realloc fails. */
                double* bigger = realloc(y, 2 * capacity * sizeof(double));
                if (!bigger) {
                    fprintf(stderr,
                            "# Out of memory (read %d values so far)\n",
                            n);
                    free(y);
                    return 0;
                }
                y = bigger;
                capacity *= 2;
            }

            y[n] = v;
            ++n;
        }
        else if (status == 0) {
            /* fscanf found something that is not a number. */
            fprintf(stderr,
                    "# Invalid input? (read %d values so far)\n",
                    n);
            free(y);
            return 0;
        }
        else if (status == EOF && !feof(in)) {
            /* EOF return without end-of-file means a read error. */
            fprintf(stderr,
                    "# Got error (read %d values so far)\n", n);
            perror("");
            free(y);
            return 0;
        }
    }

    if (len) {
        *len = n;
    }

    return y;
}
