Bug 206957 - sar gives incorrect values for CPU utilization
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: sysstat
Version: 4.4
Hardware: i386 Linux
Priority: medium
Severity: medium
Assigned To: Ivana Varekova
QA Contact: Brian Brock
Depends On: 196666
Reported: 2006-09-18 10:24 EDT by Thomas Sudbrak
Modified: 2007-11-16 20:14 EST
Fixed In Version: sysstat-5.0.5-14.rhel4
Doc Type: Bug Fix
Last Closed: 2007-05-02 06:59:19 EDT

Attachments
use long long variables for cpu utilization within struct file_stats. (4.23 KB, patch)
2006-09-18 10:24 EDT, Thomas Sudbrak
use long long variables for cpu utilization within struct file_stats and stats_one_cpu. (5.04 KB, patch)
2006-09-19 06:04 EDT, Thomas Sudbrak

Description Thomas Sudbrak 2006-09-18 10:24:49 EDT
Description of problem:

After a long uptime, the command sar -u gives incorrect values for CPU
utilization.  The problem occurs on systems running a 2.6 kernel as soon as one
of the values in /proc/stat exceeds 2^32.

Version-Release number of selected component (if applicable):

5.0.5-11.rhel4

How reproducible:

Always after the threshold of 2^32 is reached.

Steps to Reproduce:

1. Wait a certain time (approx. 2 months on a machine with 8 processors or 4
processors with HT).  This is NOT a joke; it really happened on our machines:

uptime:
  16:09:57 up >>> 103 days <<<, ...

cat /proc/stat:
  cpu  97721466 8392 15181193 >>> 7001411008 <<< 24658269 414919 1863876
  cpu0 ...
  ...
  cpu8 ...

2. Run "sar -P ALL 20 1" or similar and, for each column, compare the first
line (average) with the average of the following lines.  The first line
obviously contains corrupted data.
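
Whether a machine has already crossed the threshold can be read off its
/proc/stat "cpu" line.  The following is a minimal sketch of that check, not
code from sysstat: the function name count_fields_over_32_bits is hypothetical,
and it works on a line passed in as a string rather than reading /proc/stat
itself.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Count the numeric fields on one /proc/stat "cpu" line that no longer
 * fit in 32 bits.  The function name and the use of strtoull are
 * illustrative choices, not taken from sysstat.
 */
int count_fields_over_32_bits(const char *line)
{
    assert(line != NULL);

    /* Skip the label ("cpu", "cpu0", ...) up to the first blank. */
    const char *p = strchr(line, ' ');
    if (p == NULL)
        return 0;

    int over = 0;
    for (;;) {
        char *end;
        /* strtoull skips leading blanks and parses one decimal number. */
        unsigned long long value = strtoull(p, &end, 10);
        if (end == p)           /* no further number on the line */
            break;
        if (value > UINT32_MAX)
            over++;
        p = end;
    }
    return over;
}
```

For the aggregate line quoted in step 1 (without the >>> <<< markers), the
function returns 1, because only the fourth value, 7001411008, is above 2^32.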

Additional info:

The problem is caused by the size of the members of struct file_stats, defined
in sa.h.  On a 2.6 kernel the values in /proc/stat increase much faster than on
a 2.4 kernel (probably due to the frequency of 1000 Hz).
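
To illustrate the effect, here is a small sketch of what happens when the idle
value quoted above is stored in a 32-bit member and in a 64-bit member.  The
struct and function names are hypothetical stand-ins; the real layout of struct
file_stats in sa.h is not reproduced here, and the original member type is
assumed to be 32 bits wide on i386.

```c
#include <assert.h>
#include <stdint.h>

/* Idle counter from the report; it needs more than 32 bits. */
#define REPORTED_IDLE_TICKS 7001411008ULL
static_assert(REPORTED_IDLE_TICKS > UINT32_MAX,
              "the reported idle counter does not fit in 32 bits");

/* Hypothetical stand-ins for one member before and after widening. */
struct stats_narrow { uint32_t cpu_idle; };
struct stats_wide   { uint64_t cpu_idle; };

/* Store a tick count in the 32-bit member; the high bits are dropped. */
uint32_t idle_as_stored_narrow(uint64_t ticks)
{
    struct stats_narrow s;
    s.cpu_idle = (uint32_t)ticks;   /* keeps ticks modulo 2^32 */
    return s.cpu_idle;
}

/* Store the same tick count in the 64-bit member; nothing is lost. */
uint64_t idle_as_stored_wide(uint64_t ticks)
{
    struct stats_wide s;
    s.cpu_idle = ticks;
    return s.cpu_idle;
}
```

With the reported value 7001411008, the narrow member ends up holding
2706443712 (the value minus 2^32), while the wide member keeps the full count.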

The attached patch fixes the problem.  Other members of file_stats (or of other
structures) might also need to be enlarged.
Comment 1 Thomas Sudbrak 2006-09-18 10:24:49 EDT
Created attachment 136546
use long long variables for cpu utilization within struct file_stats.
Comment 2 Thomas Sudbrak 2006-09-19 06:04:02 EDT
Created attachment 136618
use long long variables for cpu utilization within struct file_stats and stats_one_cpu.

This patch was generated after rpmbuild's %prep stage and is to be applied
AFTER all the usual patches.
Comment 3 Thomas Sudbrak 2006-09-19 06:09:29 EDT
Comment on attachment 136618
use long long variables for cpu utilization within struct file_stats and stats_one_cpu.

The patch only covers the i386 architecture; the situation on x86_64 and others
has to be examined as well.
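
One reason the architectures can differ is the width of the integer types.  The
sketch below assumes, without confirmation from the patch, that the affected
members are declared as unsigned long; the helper names are hypothetical.  The
C standard only guarantees 32 bits for unsigned long but at least 64 bits for
unsigned long long, so the result of the last helper depends on the platform.

```c
#include <assert.h>
#include <limits.h>

/* The C standard requires unsigned long long to be at least 64 bits wide. */
static_assert(sizeof(unsigned long long) * CHAR_BIT >= 64,
              "unsigned long long is at least 64 bits wide");

/* Width in bits of unsigned long on the platform this is compiled for. */
unsigned unsigned_long_bits(void)
{
    return (unsigned)(sizeof(unsigned long) * CHAR_BIT);
}

/* Width in bits of unsigned long long on the same platform. */
unsigned unsigned_long_long_bits(void)
{
    return (unsigned)(sizeof(unsigned long long) * CHAR_BIT);
}

/* Nonzero when a counter value can be stored in unsigned long without loss. */
int fits_in_unsigned_long(unsigned long long value)
{
    return value <= ULONG_MAX;
}
```

On a platform where unsigned long is 32 bits wide, fits_in_unsigned_long
reports that the value 7001411008 from the description does not fit; where it
is 64 bits wide, the value fits.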
Comment 4 Tom Sightler 2007-01-24 23:28:17 EST
We are seeing this issue on a number of servers, especially Dell 6850s, which
are 8-core systems (16 with hyper-threading).  It only takes a few weeks for the
problem to show up.  This really should be fixed, as it is very misleading when
using sar to look at load history and has confused our DBAs several times.
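
A rough estimate of the time until the aggregate counter passes 2^32 fits both
the "approx. 2 months" in the description and the "few weeks" seen here.  The
sketch assumes that each logical CPU adds a fixed number of ticks per second to
the aggregate cpu line; the rate of 100 used in the note below is an assumption,
not a figure from this report, and the function name is hypothetical.  As a
cross-check, the reported idle value of 7001411008 after 103 days works out to
roughly 787 ticks per second.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Days until a counter that sums the ticks of all logical CPUs passes 2^32.
 * ticks_per_second_per_cpu is an assumed rate (for example 100); the report
 * does not state the unit of the /proc/stat values.
 */
double days_until_32_bit_overflow(unsigned cpus,
                                  unsigned ticks_per_second_per_cpu)
{
    assert(cpus > 0 && ticks_per_second_per_cpu > 0);

    const double limit = 4294967296.0;                  /* 2^32 */
    const double ticks_per_day =
        (double)cpus * ticks_per_second_per_cpu * 86400.0;
    return limit / ticks_per_day;
}
```

Under that assumption, 8 logical CPUs reach the limit after about 62 days and
16 logical CPUs after about 31 days of mostly idle uptime.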
Comment 5 Ivana Varekova 2007-05-02 06:59:19 EDT
This problem is fixed in sysstat-5.0.5-14.rhel4.  If the problem persists after
the upgrade to 14.rhel4, please reopen this bug.
