Description of problem:
After a long uptime, the command sar -u reports incorrect values for CPU utilization. The problem occurs on systems running a 2.6 kernel as soon as one of the values in /proc/stat exceeds 2^32.

Version-Release number of selected component (if applicable):
5.0.5-11.rhel4

How reproducible:
Always, once the threshold of 2^32 is reached.

Steps to Reproduce:
1. Wait a certain time (approx. 2 months on a machine with 8 processors, or 4 processors with HT). This is NOT a joke; it really happened on our machines:

uptime:
16:09:57 up >>> 103 days <<<, ...

cat /proc/stat:
cpu 97721466 8392 15181193 >>> 7001411008 <<< 24658269 414919 1863876
cpu0 ...
...
cpu8 ...

2. Run "sar -P ALL 20 1" or similar and, for each column, compare the first line of the result (the average) with the average of the following lines. The first line obviously contains corrupted data.

Additional info:
The problem is caused by the size of the members of struct file_stats, defined in sa.h. In contrast to a 2.4 kernel, the values in /proc/stat increase much faster (probably due to the frequency of 1000Hz). The attached patch fixes the problem. There might be other members of file_stats (or of other structures) that also need to be enlarged.
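
To check whether a machine has already crossed the threshold described above, something like the following could be used. It is only a sketch and not part of sysstat: the helper name oversized_counters is made up for illustration, and the input is the aggregate "cpu" line quoted in this report with the highlight markers removed.

```python
# Sketch: flag /proc/stat CPU counters that no longer fit in 32 bits.
# The function name is hypothetical and not taken from sysstat.
LIMIT_32 = 2**32


def oversized_counters(stat_line):
    """Return (column_index, value) pairs for counters >= 2**32.

    column_index is 0-based and counts the numeric fields after the label.
    """
    _label, *fields = stat_line.split()
    return [(i, int(v)) for i, v in enumerate(fields) if int(v) >= LIMIT_32]


# Aggregate "cpu" line from the report (">>> <<<" markers removed).
line = "cpu 97721466 8392 15181193 7001411008 24658269 414919 1863876"

for index, value in oversized_counters(line):
    print(f"column {index}: {value} exceeds 2^32")
```

For the line above, only column 3 (the idle counter) is flagged. On a live system the same function can be applied to each line of /proc/stat that starts with "cpu".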
Created attachment 136546 [details] use long long variables for cpu utilization within struct file_stats.
Created attachment 136618 [details] use long long variables for cpu utilization within struct file_stats and stats_one_cpu. This patch was generated after rpmbuild's %prep stage and this is to be applied AFTER all usual patches.
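
For illustration, here is what happens to the idle value from the report when it is stored in a 32-bit field, compared with a long long field as used in the patch. This is a minimal sketch using Python's ctypes as a stand-in for the C types; the helper name store is hypothetical, and the sketch does not reproduce sar's actual parsing or calculation code.

```python
import ctypes

# Idle counter of the aggregate "cpu" line quoted in this report.
IDLE_TICKS = 7001411008


def store(value, c_type):
    """Return the value as it would be held in a C variable of c_type.

    ctypes integer types do no overflow checking, so a value that is
    too large is silently reduced modulo 2**bits, as in C.
    """
    return c_type(value).value


stored_32 = store(IDLE_TICKS, ctypes.c_uint32)      # 32-bit unsigned field
stored_64 = store(IDLE_TICKS, ctypes.c_ulonglong)   # unsigned long long field

print("32-bit field:   ", stored_32)
print("long long field:", stored_64)
print("ticks lost:     ", stored_64 - stored_32)
```

The 32-bit field silently drops exactly 2^32 ticks, while the long long field holds the value unchanged.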
Comment on attachment 136618 [details] use long long variables for cpu utilization within struct file_stats and stats_one_cpu. The patch only covers the i386 architecture; the situation on x86_64 and others has to be examined as well.
We are seeing this issue on a number of servers, especially Dell 6850s, which are 8-core systems (16 with hyper-threading). It only takes a few weeks for the problem to show up. This really should be fixed, as it is very misleading when using sar to look at load history and has confused our DBAs several times.
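
A rough estimate of how soon a machine can hit the limit, as a sketch only. It assumes the /proc/stat counters advance at 100 ticks per second per logical CPU and that nearly all CPU time lands in a single counter (a mostly idle machine). Neither assumption is stated in this bug; the tick rate is a parameter, and the function name days_until_overflow is made up for illustration.

```python
SECONDS_PER_DAY = 86400


def days_until_overflow(logical_cpus, ticks_per_second=100, limit=2**32):
    """Earliest uptime, in days, at which one aggregate counter can reach limit.

    Assumes every logical CPU adds ticks_per_second to the same counter
    each second; this is an assumed rate, not a value from the report.
    """
    seconds = limit / (logical_cpus * ticks_per_second)
    return seconds / SECONDS_PER_DAY


for cpus in (8, 16):
    print(f"{cpus} logical CPUs: about {days_until_overflow(cpus):.0f} days")
```

Under these assumptions the result is about 62 days for 8 logical CPUs and about 31 days for 16, which is in line with the "approx. 2 months" and "a few weeks" mentioned in this bug.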
This problem is fixed in sysstat-5.0.5-14.rhel4. If the problem persists after upgrading to 14.rhel4, please reopen this bug.