Friday, June 6, 2008

partition number according to proportions

/**
* On exit,
* counts[k] = number of points assigned to kth class,
* k = 0:(numClasses - 1) out of numPoints. Ensures that counts[k] > 0
* for every k while also sum(counts) = numPoints.
*/
void proportionsToCounts(const double* proportions,
int numClasses,
int numPoints,
int* counts)
{
int total = 0;
int numEmpty = 0;
int i;

assert(numPoints >= numClasses);

for (i = 0; i < numClasses; ++i) {
counts[i] = (int) round(numPoints * proportions[i]);
total += counts[i];
if (!counts[i]) {
++numEmpty;
}
}

if (numEmpty > 0 || total != numPoints) {
const double remaining = numPoints - numClasses;
int assigned = 0;
for (i = 1; i < numClasses; ++i) {
const int extra = (int) (remaining * proportions[i]);
assigned += extra;
counts[i] = 1 + extra;
}

counts[0] = 1 + numPoints - numClasses - assigned;
}
}
Earlier I had implemented this using only the second loop, now only used conditionally, with difference that I first rounded remaining * proportions[i] before truncating it to an int. However, this caused problems with too many points getting allocated, leading to a negative remainder and counts[0] getting an invalid value. Switching to simply truncating to zero, however, led us to sometimes not get exact results even when they were available. Eg, since (int) (97 * 0.5) = 48, the code under the if statement above divides 100 into (26, 49, 25) given proportions (0.25, 0.5, 0.25) rather (25, 50, 25) as one would hope. Hence, I have added the first loop and delegated the "safer" code to be called only in case of emergencies.

I'm sure there's a better way to do this though, but the above is probably more than good enough for my purposes.

No comments: