Bug 805392 - idle & iowait ticks overflow
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.2
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Prarit Bhargava
QA Contact: Red Hat Kernel QE team
Reported: 2012-03-21 02:44 EDT by colyli
Modified: 2013-01-15 09:46 EST (History)
Doc Type: Bug Fix
Last Closed: 2013-01-15 09:46:02 EST
Description colyli 2012-03-21 02:44:09 EDT
Description of problem:
In /proc/stat, the idle and iowait tick counters overflow, which results in top/sar reporting high CPU utilization.
NOTE: This bug is already fixed; the patch is named in the information below.


Version-Release number of selected component (if applicable):
2.6.32-220.7.1

How reproducible:

This bug was introduced after we merged 3 upstream patches to fix an iowait accounting problem:
1) nohz: Fix update_ts_time_stat idle accounting
2) nohz: Make idle/iowait counter update conditional
3) Consider NO_HZ when printing idle and iowait times

After applying these patches, we observed abnormal CPU utilization numbers:
1) user, sys, idle and iowait are all 0%
2) user utilization is more than 200%
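
To illustrate how a counter that goes backwards can produce numbers like these, here is a minimal sketch of the usual delta-based calculation over two /proc/stat samples. It is not the code of top or sar; the function name `cpu_percentages`, the sample values, and the zero-total guard are assumptions made for illustration.

```python
def cpu_percentages(prev, curr):
    """Return per-field utilization (%) between two samples of tick counters.

    prev and curr map field names (user, system, idle, iowait) to tick counts.
    The zero-total guard is an assumption; real tools may behave differently.
    """
    deltas = {name: curr[name] - prev[name] for name in curr}
    total = sum(deltas.values())
    if total == 0:
        # No net change in the sum of counters: report everything as 0%.
        return {name: 0.0 for name in deltas}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


# Healthy interval: every counter only grows (hypothetical values).
ok = cpu_percentages(
    {"user": 1000, "system": 500, "idle": 8000, "iowait": 100},
    {"user": 1200, "system": 550, "idle": 8700, "iowait": 150},
)

# Faulty interval: idle goes backwards by 150 ticks (hypothetical values).
bad = cpu_percentages(
    {"user": 1000, "system": 500, "idle": 8000, "iowait": 100},
    {"user": 1200, "system": 550, "idle": 7850, "iowait": 100},
)

print(ok["user"], bad["user"])
```

In the faulty interval the negative idle delta shrinks the total, so the user share comes out at 200% even though the user counter advanced by the same amount as in the healthy interval.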

After some debugging, it seems that when uptime is more than 2 hours, the idle ticks of some CPU cores in /proc/stat are observed to decrease. On a 16-core machine, we observe the overflow around 5-8 minutes.
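
One way a cumulative tick counter can appear to decrease is a 64-bit microsecond total passing through a conversion that only keeps 32 bits. The sketch below shows that effect in isolation. It is not the kernel code: `usecs_to_ticks_32` is a hypothetical helper, HZ = 1000 is an assumed tick rate, and whether this is the exact mechanism on the affected kernel is an assumption based on the title of the upstream fix.

```python
HZ = 1000                 # assumed tick rate, for illustration only
USEC_PER_SEC = 1_000_000
U32_MASK = 0xFFFFFFFF


def usecs_to_ticks_32(usecs):
    """Convert microseconds to ticks after truncating the input to 32 bits.

    Hypothetical helper that models a conversion taking a 32-bit argument.
    """
    truncated = usecs & U32_MASK
    return truncated * HZ // USEC_PER_SEC


# Accumulated idle time just below 2**32 microseconds (about 71.6 minutes).
before = usecs_to_ticks_32(2**32 - 1)

# One second of additional idle time later, the 32-bit value has wrapped.
after = usecs_to_ticks_32(2**32 + 1_000_000)

print(before, after)
```

The second sample is far smaller than the first, which is the kind of backwards step described above for the idle column of /proc/stat.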

Actual results:
After checking the code, it seems there is an idle and iowait tick overflow introduced by Michal's 3 fixes above. We also found that this issue is already fixed upstream.

The core fix is:
procfs: do not overflow get_{idle,iowait}_time for nohz

We backported this patch and the corresponding implementation of nsecs_to_jiffies64(), then ran the fix for more than 12 hours: no overflow and no mistaken CPU utilization numbers were reported.
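
As a rough illustration of why a 64-bit nanosecond conversion avoids the problem, the sketch below keeps the full 64-bit input before dividing. `nsecs_to_ticks_64` is a hypothetical stand-in loosely modelled on the name nsecs_to_jiffies64(); it is not the kernel implementation, and HZ = 1000 is an assumed tick rate.

```python
HZ = 1000                     # assumed tick rate, for illustration only
NSEC_PER_SEC = 1_000_000_000
U64_MASK = 0xFFFFFFFFFFFFFFFF


def nsecs_to_ticks_64(nsecs):
    """Convert nanoseconds to ticks while keeping a 64-bit input.

    Hypothetical helper; valid as written when NSEC_PER_SEC is a multiple of HZ.
    """
    return (nsecs & U64_MASK) // (NSEC_PER_SEC // HZ)


# The same two instants that straddle 2**32 microseconds, now in nanoseconds.
a = nsecs_to_ticks_64((2**32 - 1) * 1000)
b = nsecs_to_ticks_64((2**32 + 1_000_000) * 1000)

# Forty days of accumulated idle time still converts without wrapping.
forty_days = nsecs_to_ticks_64(40 * 86400 * NSEC_PER_SEC)

print(a, b, forty_days)
```

A 64-bit nanosecond value only wraps after roughly 584 years of accumulated time, so the converted counter keeps increasing over any realistic uptime.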
Comment 3 colyli 2012-03-28 00:40:28 EDT
Running on 14 servers for 5+ days, NO report of idle & iowait ticks overflow.
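
A check for this kind of regression can be sketched as a comparison of two /proc/stat snapshots. How the servers above were actually monitored is not stated, so the function names and sample snapshots below are assumptions. The only fact relied on is the /proc/stat layout, where idle and iowait are the 4th and 5th values after the CPU name.

```python
def parse_idle_iowait(stat_text):
    """Map each per-CPU name (cpu0, cpu1, ...) to its (idle, iowait) ticks."""
    counters = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Skip non-CPU lines and the aggregate "cpu" line.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        if len(fields) < 6:
            continue
        # Layout: name user nice system idle iowait ...
        counters[fields[0]] = (int(fields[4]), int(fields[5]))
    return counters


def find_regressions(prev_text, curr_text):
    """Return (cpu, field, old, new) for every counter that went backwards."""
    prev = parse_idle_iowait(prev_text)
    curr = parse_idle_iowait(curr_text)
    regressions = []
    for cpu, (idle, iowait) in curr.items():
        if cpu not in prev:
            continue
        prev_idle, prev_iowait = prev[cpu]
        if idle < prev_idle:
            regressions.append((cpu, "idle", prev_idle, idle))
        if iowait < prev_iowait:
            regressions.append((cpu, "iowait", prev_iowait, iowait))
    return regressions


# Hypothetical snapshots; on a live system read /proc/stat twice instead.
SAMPLE_PREV = "cpu  300 0 150 9000 120 0 0 0\ncpu0 100 0 50 5000 60 0 0 0\ncpu1 200 0 100 4000 60 0 0 0\n"
SAMPLE_CURR = "cpu  330 0 165 9000 121 0 0 0\ncpu0 110 0 55 5100 61 0 0 0\ncpu1 220 0 110 3900 60 0 0 0\n"

print(find_regressions(SAMPLE_PREV, SAMPLE_CURR))
```

An empty result over a long run would correspond to the "no report" outcome described here.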
Comment 4 RHEL Product and Program Management 2012-05-03 01:24:49 EDT
Since RHEL 6.3 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.
Comment 5 colyli 2012-05-03 02:16:59 EDT
FYI, up to today, there are 20+ servers running 40+ days, NO report for idle & iowait ticks overflow
Comment 6 Prarit Bhargava 2013-01-15 09:46:02 EST
(In reply to comment #5)
> FYI, up to today, there are 20+ servers running 40+ days, NO report for idle
> & iowait ticks overflow

Since this BZ hasn't been updated recently and this is the last comment from the reporter, I'm closing this as NOTABUG for now.

If this is still an issue, please reopen the BZ.

P.
