Bug 473824
| Summary: | wrong CPU idle (ssCpuIdle.0) reported after certain uptime | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Aleksandar Ivanisevic <alex> |
| Component: | net-snmp | Assignee: | Jan Safranek <jsafrane> |
| Status: | CLOSED WORKSFORME | QA Contact: | BaseOS QE <qe-baseos-auto> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 5.2 | CC: | johnny.agarwal |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | i386 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2009-03-30 12:33:13 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Aleksandar Ivanisevic, 2008-11-30 22:28:47 UTC
---

(In reply to comment #0)
> wait for 62 days of uptime and query

Please post the content of your /proc/stat 1) when the bug appears and 2) after a reboot plus one minute of uptime (i.e. when ssCpu* has the correct value). Both should come from the same machine. And is there anything unusual in syslog regarding net-snmp when the bug appears?

It seems to me that some counter overflows 2^32 after 60 days and the computation of the idle percentage breaks. The 80 days of working ssCpuIdle could be explained by higher load (so the 'something' does not overflow), by different hardware (the kernel reports CPU usage in 'ticks', which net-snmp recomputes to seconds), or by a 64-bit architecture. But that is just speculation; please post the required files. Thanks in advance.

RHEL 5.3 should fix some bugs regarding CPU stats, see bug #431439. Please test it when it comes out and report the results. Do not forget to attach the aforementioned files. Thanks!

---

It happened again after 62 days 4 hours of uptime:

```
~ sudo cat /proc/stat
cpu  174327152 423956 10389839 1925373606 35746353 963097 2265005 0
cpu0 16877900 93477 2509464 500166579 17317693 77967 329262 0
cpu1 113958941 160479 4613370 403392730 12709212 876870 1660567 0
cpu2 20374801 85555 1452145 512639830 2706919 2492 110580 0
cpu3 23115509 84444 1814858 509174465 3012526 5767 164593 0
intr 7360095477 1079340996 4 0 0 2216 0 3 0 2 1 0 0 5 0 47796682 ... 1813943925 ... 124044347 ...
ctxt 2940907571
btime 1222935383
processes 8119195
procs_running 2
procs_blocked 2
```

After an snmpd restart:

```
support@BTI05 ~ sudo cat /proc/stat
cpu  174331198 423956 10390614 1925381976 35750924 963122 2265085 0
cpu0 16878400 93477 2509627 500169382 17318687 77967 329268 0
cpu1 113960810 160479 4613488 403394706 12709675 876884 1660594 0
cpu2 20375632 85555 1452344 512642018 2708157 2494 110590 0
cpu3 23116354 84444 1815154 509175868 3014403 5775 164632 0
intr 7360196266 1079385672 4 0 0 2216 0 3 0 2 1 0 0 5 0 47797078 ... 1813968985 ... 124075004 ...
ctxt 2940998632
btime 1222935383
processes 8119463
procs_running 1
procs_blocked 1
```

After one minute, when ssCpuIdle has dropped to 0 again:

```
~ sudo cat /proc/stat
cpu  174333823 423956 10390976 1925401091 35751516 963137 2265115 0
cpu0 16878546 93477 2509693 500174574 17318964 77968 329272 0
cpu1 113962732 160479 4613589 403398197 12709812 876898 1660616 0
cpu2 20375827 85555 1452404 512647351 2708253 2494 110591 0
cpu3 23116717 84444 1815290 509180968 3014487 5775 164635 0
intr 7360280114 1079442529 4 0 0 2216 0 3 0 2 1 0 0 5 0 47797582 ... 1813993511 ... 124076965 ...
ctxt 2941043733
btime 1222935383
processes 8120233
procs_running 1
procs_blocked 1
```

On the previous machine it went back to normal after circa 6 hours; I'll post /proc/stat when it clears.

---

On this machine it went back to normal after 7 days 6 hours, plus or minus 10 minutes for the monitoring interval [03-12-2008 12:38:56] - [10-12-2008 18:49:50]:

```
~ uptime
 21:00:25 up 69 days, 11:44, 1 user, load average: 0.72, 0.71, 0.66
~ cat /proc/stat
cpu  195438207 486644 11589275 2150323340 40120709 1074359 2524496 0
cpu0 18932950 107639 2798908 558639532 19453192 88928 368199 0
cpu1 127586743 184488 5150227 450381671 14260222 976198 1849627 0
cpu2 22875500 97165 1618030 572646408 3026544 2732 122949 0
cpu3 26043012 97351 2022108 568655728 3380750 6500 183720 0
intr 8212593290 1709579674 4 0 0 2266 0 3 0 2 1 0 0 5 0 53402209 ... 2014998756 ... 139643074 ...
ctxt 3258174414
btime 1222935383
processes 8919588
procs_running 2
procs_blocked 1
```

---

Hi, in the hope of solving this problem I tried upgrading to net-snmp 5.4.1 from Fedora 9, only to find out that ssCpuIdle etc. are declared unreliable and have been completely removed. I have since switched to using ssCpuRaw*, as recommended in many places on the net. I guess this bug should be closed, as it is highly unlikely that it will ever be fixed.

---

Jan Safranek, 2009-03-30 08:33:13 EDT

Net-snmp in RHEL 5.3 was upgraded regarding CPU statistics; you might give it a try. You are right, the ssCpu averages are deprecated and the ssCpuRaw counters are a better way to read CPU load.

---

Hello, I see the note from Jan above, but we see many entries with "ssCpuRaw*" in the UCD MIB and could not find which one exactly indicates CPU usage.
---

(In reply to comment #7)
> Hello, I see the below note from Jan but we see many entries with "ssCpuRaw*" in ucd-mib and could not find which one exactly indicates CPU Usage.

It should be the sum of all ssCpuRaw* counters except ssCpuRawIdle, but there may be some exceptions (e.g. ssCpuRawSystem might already include ssCpuRawWait and ssCpuRawKernel). Check the Net-SNMP documentation at http://www.net-snmp.org/docs/mibs/ucdavis.html#ssCpuRawUser.
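The recipe in the last comment can be sketched in Python. This is only an illustration, not net-snmp code: the poll values below are invented, and a real monitor would fetch the UCD-SNMP-MIB::ssCpuRaw* objects with snmpget between the two polls. The modulo-2^32 subtraction matters because the ssCpuRaw* objects are Counter32 values, which wrap at 2^32 just like the overflow suspected earlier in this bug.

```python
# Sketch: CPU usage from two polls of the UCD-SNMP ssCpuRaw* counters.
# Counter values are invented for illustration; in practice they come
# from snmpget of UCD-SNMP-MIB::ssCpuRawUser.0, ssCpuRawIdle.0, etc.

WRAP = 2 ** 32  # ssCpuRaw* objects are Counter32 and wrap at 2^32


def delta(new, old):
    """Counter32-safe difference: survives one wrap of the counter."""
    return (new - old) % WRAP


def busy_percent(old, new):
    """Compute CPU usage from two successive polls of ssCpuRaw* values.

    Usage is the sum of all non-idle deltas divided by the total delta,
    following the advice above that only ssCpuRawIdle is excluded.
    """
    deltas = {name: delta(new[name], old[name]) for name in old}
    total = sum(deltas.values())
    busy = total - deltas["ssCpuRawIdle"]
    return 100.0 * busy / total if total else 0.0


# Two hypothetical polls; ssCpuRawUser wraps past 2^32 between them,
# but the modular delta still yields the correct 1000 ticks.
poll1 = {"ssCpuRawUser": 4294967000, "ssCpuRawNice": 100,
         "ssCpuRawSystem": 5000, "ssCpuRawIdle": 90000}
poll2 = {"ssCpuRawUser": 704, "ssCpuRawNice": 100,
         "ssCpuRawSystem": 5400, "ssCpuRawIdle": 96600}

print(busy_percent(poll1, poll2))  # → 17.5
```

Note the caveat from the comment above: on some agents ssCpuRawSystem may already include wait/kernel time, so naively summing every non-idle counter can double-count; check the MIB documentation for the agent version in use.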