Bug 16487

Summary: rstatd aborts if /proc/stat intr count overflows
Product: [Retired] Red Hat Linux Reporter: Need Real Name <byrnes>
Component: rusersAssignee: Nalin Dahyabhai <nalin>
Status: CLOSED RAWHIDE QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: 6.2CC: lopresti
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2002-05-30 21:05:33 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Need Real Name 2000-08-18 02:06:00 UTC
rstatd sometimes dies, calling abort(); this is especially lame because the
error message is written to stderr, which is connected to /dev/null.

An assertion failure occurs when the interrupt count in the intr line
of /proc/stat overflows.  For example, /proc/stat on one of our machines
currently contains:

cpu  67762223 2464694 72368167 722550670
cpu0 33726360 1229782 36212508 361404227
cpu1 34035863 1234912 36155659 361146443
disk 127470829 51705841 122453933 0
disk_rio 66873964 42355718 91159870 0
disk_wio 60596865 9350123 31294063 0
disk_rblk 528762800 338845600 729278906 0
disk_wblk 446703432 74800984 250352504 0
page 812719133 365201218
swap 113654 9423
intr 179884353 432572877 ...
ctxt 2827166050
btime 962238182
processes 12760425

The following patch works around the problem, and suggests a more thorough
fix:

--- rstat_proc.c.orig   Thu Aug 17 20:57:26 2000
+++ rstat_proc.c        Thu Aug 17 21:39:49 2000
@@ -406,10 +406,12 @@
     char* b;
     buff[BUFFSIZE-1] = 0;  /* ensure null termination in buffer */
     read(stat,buff,BUFFSIZE-1);
     close(stat);
+#ifdef BROKEN_ASSERT
     *itot = 0; 
     *i1 = 1;   /* ensure assert below will fail if the sscanf bombs */
+#endif /* BROKEN_ASSERT */
     b = strstr(buff, "cpu ");
     sscanf(b, "cpu  %lu %lu %lu %lu", cuse, cice, csys, cide);
     b = strstr(buff, "disk ");
     sscanf(b, "disk %u %u %u %u", d->xfer+0, d->xfer+1, d->xfer+2,
d->xfer+3);
@@ -428,9 +430,17 @@
     b = strstr(buff, "intr ");
     sscanf(b, "intr %u %u", itot, i1);
     b = strstr(buff, "ctxt ");
     sscanf(b, "ctxt %u", ct);
+#ifdef BROKEN_ASSERT
+    /*
+     * This is bogus: *itot can apparently overflow an unsigned int and
+     * appear to be smaller than *i1 ...  If we really care about
potential
+     * misinterpretation of /proc/stat, then we should check that b =
strstr()
+     * is not null, and the actual return value from each sscanf().
+     */
     assert(*itot>*i1);
+#endif /* BROKEN_ASSERT */
   }
 }
 
 static int procnetdev_vsn;

-----------------------------------------

This is broken for the rusers-0.16-4 RPM (as shipped with RH 6.2).

Comment 1 Phil Knirsch 2001-06-15 18:24:50 UTC
Fixed in RH 7.1 (newest netkit-rusers has the patch included).

Read ya, Phil

Comment 2 Patrick J. LoPresti 2001-08-01 20:33:58 UTC
This is *not* fixed in RH 7.1.  I am looking at the source code from the
rusers-server-0.17-10 RPM right now, and the offending assert() is still
present.

The new code does check the return values of the scanf calls, which is good. But
it still asserts that *itot > *i1, which is false once these counters overflow. 
The easy fix is simply to remove the assertion, as the patch above does.


Comment 3 Thomas Insel 2002-05-08 18:03:39 UTC
This is still not fixed in RedHat 7.3 (rusers-0.17-12) despite
the CURRENTRELEASE resolution.

Comment 4 Matt Wilson 2002-05-30 21:05:27 UTC
Confirmed still not fixed.


Comment 5 Phil Knirsch 2002-08-20 15:29:51 UTC
OK, the latest builds of the rusers package simply remove this assert as the
kernel interface for the /proc/stat has changed for some of the 2.4 kernels
anyway and the assert will freak out if those are encountered.

So the current rawhide version (rusers-0.17-21) is fixed.

Read ya, Phil