Bug 619311

Summary: RHEL 5.4/5.5 i386 guest 100% CPU utilization under RHEV 2.2
Product: Red Hat Enterprise Linux 5
Reporter: Dan Yasny <dyasny>
Component: kvm
Assignee: Glauber Costa <gcosta>
Status: CLOSED DUPLICATE
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Priority: low
Version: 5.5
CC: acathrow, gcosta, gleb, knoel, mkenneth, tao, virt-maint, ykaul
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2010-08-26 12:54:52 UTC
Bug Blocks: 580949

Attachments:

Description	Flags
kvm_stat log	none
Get the average and maximum of each field in kvm_stat	none

Description Dan Yasny 2010-07-29 09:10:42 UTC

Created attachment 435231
kvm_stat log

Description of problem:
A CentOS 5.4 i686 guest reaches 100% CPU utilization a couple of minutes after starting, with no application traffic.
All I/O components are virtio.
Only a hard stop can reset the machine; shutdown, console, and SSH connections are not possible.

A former RHEL 5.5 guest was also affected by the 100% vCPU utilization issue.

Tested with both RHEL and CentOS guests.


When the VM is in this state, the console is completely blank and no key input is accepted.

One of the first guests was RHEL 5.5 i686 with the same issue; we therefore rolled back to CentOS 5.4, but hit the same issue. Apart from that, we have one more CentOS 5.4 i686 guest and one RHEL 5.5 x64 guest, and these two are running fine.


Version-Release number of selected component (if applicable):
# rpm -qa |grep kvm
etherboot-zroms-kvm-5.4.4-13.el5
kvm-debuginfo-83-164.el5_5.12
kvm-83-164.el5_5.12
kvm-qemu-img-83-164.el5_5.12
kmod-kvm-83-164.el5_5.12
kvm-tools-83-164.el5_5.12

RHEV 2.2 GA

How reproducible:
Always

Steps to Reproduce:
1. As described above.
  
Actual results:
VMs hanging

Expected results:
No hanging VMs

Additional info:
info registers:
EAX=69c00000 EBX=00000000 ECX=00000016 EDX=0008093f
ESI=80000000 EDI=000ee6b2 EBP=c068f320 ESP=c0768f58
EIP=c042d02a EFL=00000087 [--S--PC] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =007b 00000000 ffffffff 00c0f300
CS =0060 00000000 ffffffff 00c09b00
SS =0068 00000000 ffffffff 00c09300
DS =007b 00000000 ffffffff 00c0f300
FS =0000 00000000 ffffffff 00000000
GS =0000 00000000 ffffffff 00000000
LDT=0088 c0746020 00000027 00008200
TR =0080 c1803a80 00002073 00008b00
GDT=     c1812000 000000ff
IDT=     c06f6000 000007ff
CR0=8005003b CR2=b7f30000 CR3=37c3b000 CR4=000006d0
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000
DR6=ffff0ff0 DR7=00000400
FCW=037f FSW=0120 [ST=0] FTW=00 MXCSR=00000000
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=e000000000000000 4000 FPR5=d400000000000000 c004
FPR6=f424000000000000 4012 FPR7=0000000000000000 0000
XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000
XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000
XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000
XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000


kvm_stat log Attached
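
As a reading aid for the register dump above, the following is a minimal Python sketch that decodes the EFL, CR0, CR4 and CS values into named bits. The bit positions are the architectural x86 ones; the helper name decode and the table names are made up for this illustration and are not part of any KVM or QEMU tool.

```python
# Architectural x86 bit positions (only the commonly named bits are listed).
EFLAGS_BITS = {0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF",
               8: "TF", 9: "IF", 10: "DF", 11: "OF"}
CR0_BITS = {0: "PE", 1: "MP", 2: "EM", 3: "TS", 4: "ET", 5: "NE",
            16: "WP", 18: "AM", 29: "NW", 30: "CD", 31: "PG"}
CR4_BITS = {0: "VME", 1: "PVI", 2: "TSD", 3: "DE", 4: "PSE", 5: "PAE",
            6: "MCE", 7: "PGE", 8: "PCE", 9: "OSFXSR", 10: "OSXMMEXCPT"}


def decode(value, bits):
    """Return the names of the bits set in value, lowest bit first (hypothetical helper)."""
    return [name for bit, name in sorted(bits.items()) if (value >> bit) & 1]


# Values copied from the "info registers" output in this report.
eflags = decode(0x00000087, EFLAGS_BITS)
cr0 = decode(0x8005003B, CR0_BITS)
cr4 = decode(0x000006D0, CR4_BITS)
cpl = 0x0060 & 3  # the low two bits of the CS selector give the privilege level

print("EFLAGS:", eflags)
print("CR0:", cr0)
print("CR4:", cr4)
print("CPL:", cpl)
```

The decoded EFLAGS bits (CF, PF, SF) agree with the [--S--PC] annotation in the dump, and the CS selector gives CPL 0, matching the CPL=0 field. IF is not set in EFL and PAE is not set in CR4. The sketch only names the bits; it does not say anything about the cause of the hang.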

Comment 2 Avi Kivity 2010-07-29 12:40:19 UTC

Please request a kvmtrace log so we can see what is happening.

What's the exact kernel rpm version in the guest?

Comment 3 Dan Yasny 2010-07-29 12:53:08 UTC

(In reply to comment #2)
> Please request a kvmtrace log so we can see what is happening.
> 
What params do you need me to run kvmtrace with?

> What's the exact kernel rpm version in the guest?    

 2.6.18-194.3.1.el5 #1 SMP Sun May 2 04:17:42 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

Comment 4 Avi Kivity 2010-07-29 13:09:24 UTC

(In reply to comment #3)
> (In reply to comment #2)
> > Please request a kvmtrace log so we can see what is happening.
> > 
> What params do you need me to run kvmtrace with?
> 

kvmtrace -w 1 -o /tmp/aaa

Comment 5 Avi Kivity 2010-07-29 13:31:51 UTC

Please also provide the contents of /proc/modules in the guest.

Comment 6 Dan Yasny 2010-07-29 14:35:37 UTC

(In reply to comment #5)
> Please also provide the contents of /proc/modules in the guest.    

I'll ask for it, but it might be tricky if the VM just hangs

Comment 7 Dan Yasny 2010-08-09 13:13:02 UTC

The SA, Malte Menkhoff, has updated the RHEV-H systems of the affected cluster
(rhevh-b-node03 and rhevh-b-node04) to the current version, 5.5-2.2.5.2.

The systems with the 100% CPU issue were isolated on host rhevh-b-node03.
Host list:
bvbe-luv-xline01 => production CentOS 5.4 system (with VNC)
bvbe-luv-archiv1 => production CentOS 5.4 system (with SPICE)
vrhel55i686 => test system, RHEL 5.5 (with VNC)
vrhel55i686_2 => test system, RHEL 5.5 (with SPICE)
All of these are configured as non-HA systems with rhevh-b-node03 as the preferred host.

The non-Linux systems (one WinXP and one Win2k8 Server) were migrated to host rhevh-b-node04, but with HA active for availability reasons.

For every Linux system, the output of cat /proc/modules was saved; see the tar.bz2 on dropbox.redhat.com.

On host rhevh-b-node03, kvmtrace was run in 600-second intervals to capture the moment a guest enters the 100% CPU state.
The fifth run was successful: between 540 and 600 seconds, the guest vrhel55i686 entered that state.

The kvmtraces were created with:
# kvmtrace -o outfile -w 600
and can be found in the tar.bz2 file on dropbox.

The file on dropbox is called:
2040792_BDB_20100805_full.tar.bz2

The contents are:
bvbe-luv-xline01.log (cat /proc/modules of one CentOS system)
bvbw-luv-archiv01.modules.log (cat /proc/modules of one CentOS system)
vrhel55i686.modules.log (cat /proc/modules of the RHEL system with the 100% CPU issue)
_vrhel55i686_2.modules.log (cat /proc/modules of one RHEL reference system)

info_reg.rhevh-b-node03.bdb.local.20100805.txt => requested info_reg of the affected RHEV
kvm_stat.rhevh-b-node03.bdb.local.20100805.txt => 5 minutes of kvm_stat output from the RHEV-H host rhevh-b-node03, taken after the guest had entered the 100% CPU state

the kvmtraces:
rhevh-b-node03.kvmtrace_600.log.kvmtrace.0
rhevh-b-node03.kvmtrace_600.log.kvmtrace.1
rhevh-b-node03.kvmtrace_600.log.kvmtrace.2
rhevh-b-node03.kvmtrace_600.log.kvmtrace.3
rhevh-b-node03.kvmtrace_600.log.kvmtrace.4
rhevh-b-node03.kvmtrace_600.log.kvmtrace.5
rhevh-b-node03.kvmtrace_600.log.kvmtrace.6
rhevh-b-node03.kvmtrace_600.log.kvmtrace.7
rhevh-b-node03.kvmtrace_600.log.kvmtrace.8
rhevh-b-node03.kvmtrace_600.log.kvmtrace.9
rhevh-b-node03.kvmtrace_600.log.kvmtrace.10
rhevh-b-node03.kvmtrace_600.log.kvmtrace.11

At the moment, one CentOS system is also in the 100% CPU state.

The logs and traces are available at ftp://seg.rdu.redhat.com/dropbox/2040792_BDB_20100805_full.tar.bz2

Comment 8 Mark Wu 2010-08-11 07:20:07 UTC

Created attachment 438094
Get the average and maximum of each field in kvm_stat
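
The attached script itself is not reproduced in this report. As a rough illustration of the kind of summary its title describes, here is a minimal Python sketch. It assumes the kvm_stat log is plain text with a header row of field names followed by rows of whitespace-separated integer counters; the real attachment's format may differ. The function name summarize and the numbers in SAMPLE are invented for illustration only and are not taken from the attached log.

```python
def summarize(lines):
    """Return {field: (average, maximum)} for a kvm_stat-style log (assumed format)."""
    fields = []
    samples = {}
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        if not all(tok.isdigit() for tok in tokens):
            # A non-numeric row is treated as a (possibly repeated) header.
            fields = tokens
            continue
        if len(tokens) != len(fields):
            continue  # skip rows that do not match the current header
        for name, tok in zip(fields, tokens):
            samples.setdefault(name, []).append(int(tok))
    return {name: (sum(vals) / len(vals), max(vals))
            for name, vals in samples.items()}


# Invented sample data, NOT from the attached kvm_stat log.
SAMPLE = """\
exits  irq_exits  halt_exits
100    10         0
300    30         2
"""

result = summarize(SAMPLE.splitlines())
for name, (avg, peak) in result.items():
    print(f"{name}: avg={avg:.1f} max={peak}")
```

To try it on a real log, the SAMPLE text could be replaced with the lines of the log file, provided the file matches the assumed layout.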

Comment 12 Glauber Costa 2010-08-26 12:54:52 UTC


*** This bug has been marked as a duplicate of bug 570824 ***