rstatd sometimes dies, calling abort(); this is especially lame because the error message is written to stderr, which is connected to /dev/null. An assertion failure occurs when the interrupt count in the intr line of /proc/stat overflows. For example, /proc/stat on one of our machines currently contains: cpu 67762223 2464694 72368167 722550670 cpu0 33726360 1229782 36212508 361404227 cpu1 34035863 1234912 36155659 361146443 disk 127470829 51705841 122453933 0 disk_rio 66873964 42355718 91159870 0 disk_wio 60596865 9350123 31294063 0 disk_rblk 528762800 338845600 729278906 0 disk_wblk 446703432 74800984 250352504 0 page 812719133 365201218 swap 113654 9423 intr 179884353 432572877 ... ctxt 2827166050 btime 962238182 processes 12760425 The following patch works around the problem, and suggests a more thorough fix: --- rstat_proc.c.orig Thu Aug 17 20:57:26 2000 +++ rstat_proc.c Thu Aug 17 21:39:49 2000 @@ -406,10 +406,12 @@ char* b; buff[BUFFSIZE-1] = 0; /* ensure null termination in buffer */ read(stat,buff,BUFFSIZE-1); close(stat); +#ifdef BROKEN_ASSERT *itot = 0; *i1 = 1; /* ensure assert below will fail if the sscanf bombs */ +#endif /* BROKEN_ASSERT */ b = strstr(buff, "cpu "); sscanf(b, "cpu %lu %lu %lu %lu", cuse, cice, csys, cide); b = strstr(buff, "disk "); sscanf(b, "disk %u %u %u %u", d->xfer+0, d->xfer+1, d->xfer+2, d->xfer+3); @@ -428,9 +430,17 @@ b = strstr(buff, "intr "); sscanf(b, "intr %u %u", itot, i1); b = strstr(buff, "ctxt "); sscanf(b, "ctxt %u", ct); +#ifdef BROKEN_ASSERT + /* + * This is bogus: *itot can apparently overflow an unsigned int and + * appear to be smaller than *i1 ... If we really care about potential + * misinterpretation of /proc/stat, then we should check that b = strstr() + * is not null, and the actual return value from each sscanf(). + */ assert(*itot>*i1); +#endif /* BROKEN_ASSERT */ } } static int procnetdev_vsn; ----------------------------------------- This is broken for the rusers-0.16-4 RPM (as shipped with RH 6.2).
Fixed in RH 7.1 (newest netkit-rusers has the patch included). Read ya, Phil
This is *not* fixed in RH 7.1. I am looking at the source code from the rusers-server-0.17-10 RPM right now, and the offending assert() is still present. The new code does check the return values of the scanf calls, which is good. But it still asserts that *itot > *i1, which is false once these counters overflow. The easy fix is simply to remove the assertion, as the patch above does.
This is still not fixed in RedHat 7.3 (rusers-0.17-12) despite the CURRENTRELEASE resolution.
Confirmed still not fixed.
OK, the latest builds of the rusers package simply remove this assert as the kernel interface for the /proc/stat has changed for some of the 2.4 kernels anyway and the assert will freak out if those are encountered. So the current rawhide version (rusers-0.17-21) is fixed. Read ya, Phil