Bug 805392

Summary: idle & iowait ticks overflow
Product: Red Hat Enterprise Linux 6 Reporter: colyli
Component: kernel Assignee: Prarit Bhargava <prarit>
Status: CLOSED NOTABUG QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high Docs Contact:
Priority: unspecified    
Version: 6.2 CC: eguan, Jes.Sorensen, prarit
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-01-15 14:46:02 UTC Type: ---

Description colyli 2012-03-21 06:44:09 UTC
Description of problem:
In /proc/stat, the idle and iowait tick counters overflow, which causes top and sar to report high CPU utilization.
NOTE: This bug is already fixed upstream; the patch is named in the information below.


Version-Release number of selected component (if applicable):
2.6.32-220.7.1

How reproducible:

This bug was introduced after we merged 3 upstream patches to fix an iowait accounting problem:
1) nohz: Fix update_ts_time_stat idle accounting
2) nohz: Make idle/iowait counter update conditional
3) Consider NO_HZ when printing idle and iowait times

After applying these patches, we observed abnormal CPU utilization numbers:
1) user, sys, idle, and iowait are all 0%
2) user utilization is more than 200%
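
Both symptoms are consistent with a monitoring tool that computes percentages from tick deltas between two /proc/stat samples. The following is a minimal sketch of that arithmetic, not the actual implementation of top or sar; the function name, the sample values, and the zero-total guard are assumptions made for illustration.

```python
def utilization(prev, cur):
    """Return per-field CPU percentages from two tick samples.

    prev and cur are dicts mapping a field name to its tick count.
    """
    deltas = {field: cur[field] - prev[field] for field in prev}
    total = sum(deltas.values())
    if total <= 0:
        # Assumed guard: a tool that refuses to divide by a non-positive
        # total would print 0% for every field (symptom 1).
        return {field: 0.0 for field in deltas}
    return {field: 100.0 * delta / total for field, delta in deltas.items()}


# Hypothetical samples: idle goes backwards by 80 ticks between reads.
prev = {"user": 1000, "sys": 500, "idle": 100000, "iowait": 200}
cur = {"user": 1100, "sys": 520, "idle": 99920, "iowait": 200}

result = utilization(prev, cur)
print(result["user"])  # 250.0, i.e. more than 200% (symptom 2)
```

When the idle counter goes backwards, the total delta shrinks, so the share attributed to user time is inflated.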

After some debugging, it appears that once uptime exceeds 2 hours, the idle ticks reported in /proc/stat for some CPU cores decrease. On a 16-core machine, we observed the overflow around 5-8 minutes.
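
A decreasing counter of this kind can be spotted by sampling /proc/stat repeatedly and comparing the per-CPU idle and iowait fields. The sketch below is one possible way to do that; the function names, the sampling interval, and the sample count are hypothetical and were not part of the original debugging.

```python
import time


def parse_cpu_ticks(stat_text):
    """Map 'cpuN' -> (idle, iowait) ticks from the text of /proc/stat."""
    ticks = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Per-CPU lines look like: cpu0 user nice system idle iowait ...
        # The aggregate "cpu" line is skipped.
        if len(fields) >= 6 and fields[0].startswith("cpu") and fields[0] != "cpu":
            ticks[fields[0]] = (int(fields[4]), int(fields[5]))
    return ticks


def find_regressions(prev, cur):
    """Return (cpu, field, old, new) for every counter that went backwards."""
    regressions = []
    for cpu, (idle, iowait) in cur.items():
        if cpu not in prev:
            continue
        old_idle, old_iowait = prev[cpu]
        if idle < old_idle:
            regressions.append((cpu, "idle", old_idle, idle))
        if iowait < old_iowait:
            regressions.append((cpu, "iowait", old_iowait, iowait))
    return regressions


def watch(interval=1.0, samples=10, path="/proc/stat"):
    """Sample the file `samples` times and print any backwards counters."""
    with open(path) as stat_file:
        prev = parse_cpu_ticks(stat_file.read())
    for _ in range(samples):
        time.sleep(interval)
        with open(path) as stat_file:
            cur = parse_cpu_ticks(stat_file.read())
        for cpu, field, old, new in find_regressions(prev, cur):
            print(f"{cpu}: {field} decreased from {old} to {new}")
        prev = cur
```

Calling watch() on an affected machine should print a line each time a per-CPU idle or iowait counter drops between two samples.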

Actual results:
After checking the code, it appears that the idle and iowait tick overflow was introduced by Michal's 3 fixes listed above. We also found that this issue has already been fixed upstream.

The core fix is:
procfs: do not overflow get_{idle,iowait}_time for nohz

We backported this patch along with the corresponding implementation of nsecs_to_jiffies64() and ran with the fix for more than 12 hours; no overflow and no incorrect CPU utilization numbers were reported.
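
To illustrate why a full 64-bit conversion avoids this class of problem, here is a small Python model. It is only a sketch under stated assumptions: HZ is taken to be 1000, the 64-bit path is modeled as a plain division of nanoseconds by the tick length, and the lossy path is a hypothetical conversion that narrows a microsecond value to 32 bits first. This report does not say exactly where the overflow occurs, and neither function is kernel code.

```python
NSEC_PER_SEC = 1_000_000_000
USEC_PER_SEC = 1_000_000
HZ = 1000  # assumption for this sketch; the real value depends on the kernel config


def nsecs_to_jiffies64_sketch(nsecs):
    """Model of a 64-bit nanoseconds-to-ticks conversion (simple division)."""
    nsecs &= 0xFFFFFFFFFFFFFFFF  # emulate an unsigned 64-bit input
    return nsecs // (NSEC_PER_SEC // HZ)


def usecs_narrowed_to_ticks(usecs):
    """Hypothetical lossy path: microseconds narrowed to 32 bits before conversion."""
    usecs32 = usecs & 0xFFFFFFFF  # emulate truncation to an unsigned 32-bit value
    return usecs32 * HZ // USEC_PER_SEC


# Accumulated idle time just below and just above 2**32 microseconds.
for seconds in (4294, 4295):
    wide = nsecs_to_jiffies64_sketch(seconds * NSEC_PER_SEC)
    narrow = usecs_narrowed_to_ticks(seconds * USEC_PER_SEC)
    print(seconds, wide, narrow)
```

In this model the narrowed value wraps once the accumulated time passes 2**32 microseconds, so the reported tick count drops sharply, while the 64-bit path keeps increasing.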

Comment 3 colyli 2012-03-28 04:40:28 UTC
Running on 14 servers for 5+ days; no reports of idle or iowait tick overflow.

Comment 4 RHEL Program Management 2012-05-03 05:24:49 UTC
Since RHEL 6.3 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 5 colyli 2012-05-03 06:16:59 UTC
FYI, up to today, there are 20+ servers running 40+ days, NO report for idle & iowait ticks overflow

Comment 6 Prarit Bhargava 2013-01-15 14:46:02 UTC
(In reply to comment #5)
> FYI, up to today, there are 20+ servers running 40+ days, NO report for idle
> & iowait ticks overflow

Since this BZ hasn't been updated recently and this is the last comment from the reporter, I'm closing this as NOTABUG for now.

If this is still an issue, please reopen the BZ.

P.