Bug 608981

Summary: 'top' reports doubled CPU utilization on dual Opteron 2354 under RHEL 5.5
Product: Red Hat Enterprise Linux 5
Reporter: starlight
Component: kernel
Assignee: Stefan Assmann <sassmann>
Status: CLOSED NEXTRELEASE
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium
Priority: low
Version: 5.5
CC: bmr, bugzilla, jarod, jkysela, mschmidt, prarit, sassmann
Target Milestone: rc
Keywords: Regression
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2010-11-17 14:17:58 UTC
Attachments:
- lspci output
- lspci -vvv output
- 'dmidecode' output from Tyan B2912 server
- full 'dmesg' output
- dmesg_tyan_s2912_2.6.18-194.11.1.el5.txt

Description starlight 2010-06-29 05:41:38 UTC
Description of problem:

'top' reports a doubled value for CPU consumption on a dual-socket Opteron 2354 server.  The server is a Tyan GT24-B2912 with 16GB RAM.  This behavior does not appear on a newer dual-socket Intel X5560 Supermicro.

Version-Release number of selected component (if applicable):

RHEL 5.5
kernel 2.6.18-194.3.1.el5

How reproducible:

Run "while true; do continue; done" from a command prompt in one window and observe 'top' running in Irix mode in another window.  Notice that 200% CPU utilization is reported.  'vmstat' shows the correct 12.5% utilization (one busy core out of eight).
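The reproduction above can be packaged as a small script (a sketch, not part of the original report; it assumes a stock RHEL 5 userland with procps `top`, `vmstat`, and `ps` available):

```shell
#!/bin/sh
# Sketch of the reproducer: spin one core, then sample top and vmstat.
# On a dual quad-core Opteron 2354 (8 cores), one busy loop should show
# ~100% for the process in top's Irix mode and ~12.5% overall in vmstat;
# the bug makes top report ~200% for the process instead.
while true; do :; done &          # busy-loop on one core
SPIN=$!
sleep 2                           # let the loop accumulate CPU time
top -b -n 1 -p "$SPIN" | tail -n 2   # per-process %CPU column (Irix mode)
vmstat 1 2 | tail -n 1               # overall us/sy/id percentages
kill "$SPIN"
```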

Actual results:

200% CPU utilization reported

Expected results:

100% CPU utilization reported

Additional info:

First noticed this after observing 'pbzip' reported as using 1600% CPU rather than 800%.

Comment 1 starlight 2010-07-08 15:53:06 UTC
Also observed extremely poor scheduler latency relative to prior kernel.  Going back to old version.

Comment 2 starlight 2010-07-09 16:39:36 UTC
tested kernel 2.6.18-194.8.1.el5

bug is still present

Comment 3 starlight 2010-08-28 23:47:47 UTC
tested kernel 2.6.18-194.11.1.el5

problem still there

However, I just loaded 2.6.18-194.11.1.el5 on an HP
DL165 and the problem is not present.  The difference
between the Tyan and the HP is that the Tyan has an
Nvidia chipset and the HP has a Broadcom/ServerWorks
chipset.  So it probably has something to do with
the Nvidia chipset support drivers.

Exact chipset is nVIDIA NFP3600.

http://tyan.com/product_board_spec.aspx?pid=157

Comment 4 starlight 2010-08-28 23:49:22 UTC
Created attachment 441744 [details]
lspci output

Comment 5 starlight 2010-08-28 23:49:44 UTC
Created attachment 441745 [details]
lspci -vvv output

Comment 6 Chris Schanzle 2010-10-07 22:32:54 UTC
I found that turning off C2+C3 states in the BIOS eliminated the CPU double-accounting problem on a couple of different AMD servers we have.  I don't know much about the side-effects of doing this, but in three cases performance improved significantly with the ondemand governor, as some cores never increased their clock rate when C2+C3 was enabled.
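The stuck-clock symptom Chris describes can be inspected from sysfs (a sketch; the paths are the standard Linux cpufreq ones and may be absent on systems without cpufreq support):

```shell
#!/bin/sh
# Show the active cpufreq governor and current/max frequency per core,
# to spot cores that never raise their clock under the ondemand governor.
for c in /sys/devices/system/cpu/cpu[0-9]*/cpufreq; do
    [ -d "$c" ] || continue       # skip if cpufreq is not available
    printf '%s: governor=%s cur=%s kHz max=%s kHz\n' \
        "${c%/cpufreq}" \
        "$(cat "$c/scaling_governor")" \
        "$(cat "$c/scaling_cur_freq")" \
        "$(cat "$c/scaling_max_freq")"
done
```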

Comment 7 starlight 2010-10-08 00:21:04 UTC
ACPI power management has always been disabled on our Tyan S2912 where the problem appears.

Hello Red Hat!  Any plans to ever look into this rather serious regression?

Comment 8 Jarod Wilson 2010-10-08 02:16:02 UTC
(In reply to comment #7)
> ACPI power management has always been disabled on our Tyan S2912 where the
> problem appears.
> 
> Hello Red Hat!  Any plans to ever look into this rather serious regression?

Regression versus what? Have you identified a kernel version where this worked?

The 194.x.y kernels aren't likely to have a fix for your issue just suddenly crop up, as these updates are typically for CVEs and panic fixes. The development kernels are located elsewhere. If you're looking to try something newer that might have a fix, wander over here:

http://people.redhat.com/~jwilson/el5/

Comment 9 starlight 2010-10-08 02:39:02 UTC
Hmm.  Seemed so obvious that it never occurred to me
to state it.  Every kernel prior to RHEL 5.5 works fine,
the most recent we tried, and still run in production is

    2.6.18-164.6.1.el5

Don't want non-prod release.  We'll stick with old
kernel till it works.

Comment 10 Jarod Wilson 2010-10-08 14:51:08 UTC
(In reply to comment #9)
> Hmm.  Seemed so obvious that it never occurred to me
> to state it.  Every kernel prior to RHEL 5.5 works fine,
> the most recent we tried, and still run in production is
> 
>     2.6.18-164.6.1.el5

Okay, that's very helpful to know, thank you.

> Don't want non-prod release.  We'll stick with old
> kernel till it works.

Hrm, okay. Well, to have a shot at fixing this, I think we'll have to run some non-prod releases *somewhere*, be that on your end or on ours, if we can dig up a machine that reproduces the problem. Based on comment #3, it would seem to be nvidia-chipset-specific.

Prarit, is there someone in your team who would be an ideal candidate to dig into what seems to be an nvidia chipset issue? Not sure this should still be assigned to Bhavna, since it might not be an AMD issue, per se.

Comment 11 Prarit Bhargava 2010-10-08 15:21:57 UTC
> Prarit, is there someone in your team who would be an ideal candidate to dig
> into what seems to be an nvidia chipset issue? Not sure this should still be
> assigned to Bhavna, since it might not be an AMD issue, per se.

It seems like everyone on the team's had their fingers in NVIDIA ;)  Jaroslav, Michal, me, etc..

I'll cc a few people and ask them to take a look.

P.

Comment 13 Prarit Bhargava 2010-10-08 16:06:47 UTC
Any chance you could do a 'dmidecode' and send us the output?  I'd like to see if we have a similar system available in our test lab.

Thanks,

P.

Comment 14 starlight 2010-10-08 17:43:08 UTC
Created attachment 452391 [details]
'dmidecode' output from Tyan B2912 server

Comment 16 starlight 2010-10-08 17:50:29 UTC
>Hrm, okay. Well, to have a shot at fixing this, I think we'll have to run some
>non-prod releases *somewhere*, be that on your end or on ours, if we can dig up
>a machine that reproduces the problem. Based on comment #3, it would seem to be
>nvidia-chipset-specific.

I am willing to try out fixes that are intended to specifically correct
this issue--it's easy enough to fire up a kernel, run 'top' and 
the simple shell loop during off-hours.  Just don't want to be stabbing
in the dark running random kernels hoping they will work.

Also willing to run a debug/trace setup to help identify the
issue if it's not a huge pain.

Comment 17 Prarit Bhargava 2010-10-08 18:07:36 UTC
(In reply to comment #16)
> >Hrm, okay. Well, to have a shot at fixing this, I think we'll have to run some
> >non-prod releases *somewhere*, be that on your end or on ours, if we can dig up
> >a machine that reproduces the problem. Based on comment #3, it would seem to be
> >nvidia-chipset-specific.
> 
> I am willing to try out fixes that are intended to specifically correct
> this issue--it's easy enough to fire up a kernel, run 'top' and 
> the simple shell loop during off-hours.  Just don't want to be stabbing
> in the dark running random kernels hoping they will work.

We're not planning on asking you to do that.

However, we may ask you boot and run some kernels in order for us to narrow down the specific build that this problem occurred in.

What was the last pre-5.5 kernel you booted that you know worked?

P.

Comment 18 starlight 2010-10-08 18:10:13 UTC
as stated, 2.6.18-164.6.1.el5

Comment 19 Jarod Wilson 2010-10-08 19:00:44 UTC
(In reply to comment #16)
> >Hrm, okay. Well, to have a shot at fixing this, I think we'll have to run some
> >non-prod releases *somewhere*, be that on your end or on ours, if we can dig up
> >a machine that reproduces the problem. Based on comment #3, it would seem to be
> >nvidia-chipset-specific.
> 
> I am willing to try out fixes that are intended to specifically correct
> this issue--it's easy enough to fire up a kernel, run 'top' and 
> the simple shell loop during off-hours.  Just don't want to be stabbing
> in the dark running random kernels hoping they will work.

Yeah, no, not what I was meaning to suggest. But giving the latest 5.6 development kernel a quick spin could be enlightening. It's *possible* the issue has already been fixed, as we're several hundred patches newer than the 5.5 kernel by now, which includes some timer-related fixes and various process accounting patches. None of them, from what I recall, specifically said they'd fix your very issue, but it's possible they did. If so, we may be able to simply cherry-pick the fix for a 5.5 build. If not, we know we've also got to fix 5.6 -- we don't want to fix it for 5.5, only to have the problem crop back up in 5.6.

Comment 20 starlight 2010-10-08 19:28:42 UTC
Ok, I can try the new kernel sometime in next week or so to see if it fixes it.

Are there no Opteron servers at RH with the Nvidia chipset?  Both Supermicro and Tyan made a bunch of them--thousands or tens of thousands I'm sure.  Probably HP did not, but I wouldn't be surprised if Dell or IBM did.

Comment 21 Prarit Bhargava 2010-10-08 19:33:16 UTC
(In reply to comment #20)
> Ok, I can try the new kernel sometime in next week or so to see if it fixes it.
> 
> Are there no Opteron servers at RH with the Nvidia chipset?  

We do have them.  The concern is that it may be a combination of the mobo + chipset + processor that is causing the problem.  That specific set of hardware will likely be difficult to track down.

Thanks for your offer of help with debugging -- we appreciate that we're asking you to test on production servers, and that isn't an easy thing to do.

P.

Comment 22 starlight 2010-10-09 04:49:17 UTC
Tried 2.6.18-225.el5 and the problem is corrected in this kernel.

Sorry for being skeptical here.  The upstream guys will
waste days of one's time with no thought or consideration,
and I've been burned by this a couple of times when
attempting to try out vanilla kernels--truly a bad idea.
The result is that I'm a bit cautious.

Comment 23 starlight 2010-10-09 04:58:59 UTC
Created attachment 452472 [details]
full 'dmesg' output

Curious 'dmesg' entries that might be relevant:

NFORCE-MCP55: IDE controller at PCI slot 0000:00:04.0
NFORCE-MCP55: chipset revision 161
NFORCE-MCP55: not 100% native mode: will probe irqs later
NFORCE-MCP55: BIOS didn't set cable bits correctly. Enabling workaround.
NFORCE-MCP55: 0000:00:04.0 (rev a1) UDMA133 controller

Comment 25 Michal Schmidt 2010-10-11 13:47:45 UTC
A 'dmesg' from the buggy kernel version would be interesting to compare.
Perhaps it's using a different clocksource? Do the contents of /sys/devices/system/clocksource/*/current_clocksource differ between the kernels?
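The comparison Michal asks for can be captured with a one-off script (a sketch; the sysfs path is the standard Linux one, present on RHEL 5 kernels):

```shell
#!/bin/sh
# Print the current and available clocksources so the output from the
# good kernel and the bad kernel can be diffed side by side.
for d in /sys/devices/system/clocksource/clocksource*; do
    printf '%s: current=%s available=%s\n' \
        "$d" \
        "$(cat "$d/current_clocksource")" \
        "$(cat "$d/available_clocksource")"
done
```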

Comment 27 starlight 2010-10-11 13:54:43 UTC
will post it tomorrow

Comment 28 Stefan Assmann 2010-10-11 14:10:06 UTC
How long does it take to reproduce the problem? Does it show up immediately in top, or does it take a while? So far the only thing I see is:
Tasks: 227 total,   2 running, 225 sleeping,   0 stopped,   0 zombie
Cpu(s):  8.3%us,  0.0%sy,  0.0%ni, 91.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   4051336k total,  1537740k used,  2513596k free,   106936k buffers
Swap:  6094840k total,        0k used,  6094840k free,  1177296k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 4584 root      25   0 66068 1628 1272 R  100  0.0  44:06.51 bash


Could be that the machine (MCP55) I'm using doesn't show the problem, but I want to make sure I'm not missing anything.

Comment 29 starlight 2010-10-11 18:21:20 UTC
Bug is binary, either present or not present.  200% or 100%.  Very simple to reproduce when a system is affected.

Comment 30 starlight 2010-10-11 18:49:02 UTC
Created attachment 452774 [details]
dmesg_tyan_s2912_2.6.18-194.11.1.el5.txt

dmesg output from system affected by 'top' CPU doubled reporting issue

Comment 31 Stefan Assmann 2010-10-13 11:59:43 UTC
I've tried all kinds of MCP55+Opteron machines we have available, none of them shows the behaviour described.

Could you please test the kernel from
http://people.redhat.com/jwilson/el5/215.el5/

I hope you're okay with testing some kernels; it cuts down the number of commits we need to consider by a lot. Thanks!

Comment 33 starlight 2010-10-13 17:04:41 UTC
Ok.  Will be a couple of days till I can do it.

Did you check with the other individual who has this problem?  Maybe
they can identify some other systems with the issue.  He did not
appear to subscribe to the bug, so you would have to e-mail him
directly.

Comment 34 starlight 2010-10-16 20:52:43 UTC
Ok, I get the hint.  Bisected it for you:

kernel-2.6.18-225.el5  good
kernel-2.6.18-215.el5  bad
kernel-2.6.18-222.el5  bad
kernel-2.6.18-223.el5  good

So whatever patch fixes the problem was added in 223.

Comment 36 Stefan Assmann 2010-10-18 13:02:04 UTC
Thank you for testing these kernels. Additionally, could you please confirm that the kernel from http://people.redhat.com/jwilson/el5/227.el5/ also works?
Testing with this kernel might isolate the responsible patch.

Comment 39 starlight 2010-10-24 01:28:07 UTC
kernel 2.6.18-227.el5 good

Comment 40 starlight 2010-11-11 04:05:03 UTC
Tried running 2.6.18-229.el5 in production and it works fine.  So I suppose generating a patch for 2.6.18-194 can be skipped unless someone else has the problem and is allergic to unsupported kernels.