Bug 473824
| Summary: | wrong CPU idle (ssCpuIdle.0) reported after certain uptime | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Aleksandar Ivanisevic <alex> |
| Component: | net-snmp | Assignee: | Jan Safranek <jsafrane> |
| Status: | CLOSED WORKSFORME | QA Contact: | BaseOS QE <qe-baseos-auto> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 5.2 | CC: | johnny.agarwal |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | i386 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2009-03-30 12:33:13 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Aleksandar Ivanisevic, 2008-11-30 22:28:47 UTC
---

(In reply to comment #0)
> wait for 62 days of uptime and query

Please post the content of your /proc/stat 1) when the bug appears and 2) after a reboot plus one minute of uptime (i.e. when ssCpu* has the correct value). Both should come from the same machine. And is there anything unusual in syslog regarding net-snmp when the bug appears?

It seems to me that some counter overflows 2^32 after 60 days and the computation of the idle percentage breaks. The 80 days of working ssCpuIdle could be explained by higher load (so the 'something' does not overflow), by different hardware (the kernel reports CPU usage in 'ticks', which net-snmp recomputes to seconds), or by a 64-bit architecture. But that is just speculation; please post the required files. Thanks in advance.

RHEL 5.3 should fix some bugs regarding CPU stats, see bug #431439. Please test it when it comes out and report the results. Do not forget to attach the aforementioned files. Thanks!

---

It happened again after 62 days 4 hours of uptime:

```
~ sudo cat /proc/stat
cpu  174327152 423956 10389839 1925373606 35746353 963097 2265005 0
cpu0 16877900 93477 2509464 500166579 17317693 77967 329262 0
cpu1 113958941 160479 4613370 403392730 12709212 876870 1660567 0
cpu2 20374801 85555 1452145 512639830 2706919 2492 110580 0
cpu3 23115509 84444 1814858 509174465 3012526 5767 164593 0
intr 7360095477 1079340996 4 0 0 2216 0 3 0 2 1 0 0 5 0 47796682 ... 1813943925 ... 124044347 ...
ctxt 2940907571
btime 1222935383
processes 8119195
procs_running 2
procs_blocked 2
```

After an snmpd restart:

```
support@BTI05 ~ sudo cat /proc/stat
cpu  174331198 423956 10390614 1925381976 35750924 963122 2265085 0
cpu0 16878400 93477 2509627 500169382 17318687 77967 329268 0
cpu1 113960810 160479 4613488 403394706 12709675 876884 1660594 0
cpu2 20375632 85555 1452344 512642018 2708157 2494 110590 0
cpu3 23116354 84444 1815154 509175868 3014403 5775 164632 0
intr 7360196266 1079385672 4 0 0 2216 0 3 0 2 1 0 0 5 0 47797078 ... 1813968985 ... 124075004 ...
ctxt 2940998632
btime 1222935383
processes 8119463
procs_running 1
procs_blocked 1
```

After one minute, when ssCpuIdle has dropped to 0 again:

```
~ sudo cat /proc/stat
cpu  174333823 423956 10390976 1925401091 35751516 963137 2265115 0
cpu0 16878546 93477 2509693 500174574 17318964 77968 329272 0
cpu1 113962732 160479 4613589 403398197 12709812 876898 1660616 0
cpu2 20375827 85555 1452404 512647351 2708253 2494 110591 0
cpu3 23116717 84444 1815290 509180968 3014487 5775 164635 0
intr 7360280114 1079442529 4 0 0 2216 0 3 0 2 1 0 0 5 0 47797582 ... 1813993511 ... 124076965 ...
ctxt 2941043733
btime 1222935383
processes 8120233
procs_running 1
procs_blocked 1
```

On the previous machine it went back to normal after circa 6 hours; I'll post /proc/stat when it clears.

---

On this machine it went back to normal after 7 days 6 hours, plus or minus 10 minutes for the monitoring interval [03-12-2008 12:38:56] - [10-12-2008 18:49:50]:

```
~ uptime
 21:00:25 up 69 days, 11:44, 1 user, load average: 0.72, 0.71, 0.66
~ cat /proc/stat
cpu  195438207 486644 11589275 2150323340 40120709 1074359 2524496 0
cpu0 18932950 107639 2798908 558639532 19453192 88928 368199 0
cpu1 127586743 184488 5150227 450381671 14260222 976198 1849627 0
cpu2 22875500 97165 1618030 572646408 3026544 2732 122949 0
cpu3 26043012 97351 2022108 568655728 3380750 6500 183720 0
intr 8212593290 1709579674 4 0 0 2266 0 3 0 2 1 0 0 5 0 53402209 ... 2014998756 ... 139643074 ...
ctxt 3258174414
btime 1222935383
processes 8919588
procs_running 2
procs_blocked 1
```

---

Hi, in the hope of solving this problem I tried upgrading to net-snmp 5.4.1 from Fedora 9, only to find out that ssCpuIdle etc. are declared unreliable and have been completely removed. I have since switched to using ssCpuRaw*, as recommended in many places on the net. I guess this bug should be closed, as it is highly unlikely that it will ever be fixed.

---

Jan Safranek, 2009-03-30 08:33:13 EDT

Net-snmp in RHEL 5.3 was upgraded regarding CPU statistics; you might give it a try. You are right, the ssCpu averages are deprecated and the ssCpuRaw counters are a better way to read CPU load.

---

Hello, I see the note from Jan above, but we see many entries with "ssCpuRaw*" in the UCD MIB and could not find which one exactly indicates CPU usage.
---

(In reply to comment #7)
> Hello, I see the below note from Jan but we see many entries with "ssCpuRaw*" in ucd-mib and could not find which one exactly indicates CPU Usage.

It should be the sum of all ssCpuRaw* counters except ssCpuRawIdle, but there may be some exceptions (e.g. ssCpuRawSystem might already include ssCpuRawWait and ssCpuRawKernel). Check the Net-SNMP documentation at http://www.net-snmp.org/docs/mibs/ucdavis.html#ssCpuRawUser.
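The recipe in the last comment can be sketched in Python. This is only an illustration, not net-snmp code: the poll values below are invented, and a real monitor would fetch the UCD-SNMP-MIB::ssCpuRaw* objects with snmpget between the two polls. The modulo-2^32 subtraction matters because the ssCpuRaw* objects are Counter32 values, which wrap at 2^32 just like the overflow suspected earlier in this bug.

```python
# Sketch: CPU usage from two polls of the UCD-SNMP ssCpuRaw* counters.
# Counter values are invented for illustration; in practice they come
# from snmpget of UCD-SNMP-MIB::ssCpuRawUser.0, ssCpuRawIdle.0, etc.

WRAP = 2 ** 32  # ssCpuRaw* objects are Counter32 and wrap at 2^32


def delta(new, old):
    """Counter32-safe difference: survives one wrap of the counter."""
    return (new - old) % WRAP


def busy_percent(old, new):
    """Compute CPU usage from two successive polls of ssCpuRaw* values.

    Usage is the sum of all non-idle deltas divided by the total delta,
    following the advice above that only ssCpuRawIdle is excluded.
    """
    deltas = {name: delta(new[name], old[name]) for name in old}
    total = sum(deltas.values())
    busy = total - deltas["ssCpuRawIdle"]
    return 100.0 * busy / total if total else 0.0


# Two hypothetical polls; ssCpuRawUser wraps past 2^32 between them,
# but the modular delta still yields the correct 1000 ticks.
poll1 = {"ssCpuRawUser": 4294967000, "ssCpuRawNice": 100,
         "ssCpuRawSystem": 5000, "ssCpuRawIdle": 90000}
poll2 = {"ssCpuRawUser": 704, "ssCpuRawNice": 100,
         "ssCpuRawSystem": 5400, "ssCpuRawIdle": 96600}

print(busy_percent(poll1, poll2))  # → 17.5
```

Note the caveat from the comment above: on some agents ssCpuRawSystem may already include wait/kernel time, so naively summing every non-idle counter can double-count; check the MIB documentation for the agent version in use.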