Bug 619311 - RHEL 5.4/5.5 i386 guest 100% CPU utilization under RHEV 2.2
Summary: RHEL 5.4/5.5 i386 guest 100% CPU utilization under RHEV 2.2
Keywords:
Status: CLOSED DUPLICATE of bug 570824
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kvm
Version: 5.5
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Glauber Costa
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: Rhel5KvmTier1
 
Reported: 2010-07-29 09:10 UTC by Dan Yasny
Modified: 2013-07-02 07:13 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-08-26 12:54:52 UTC
Target Upstream Version:
Embargoed:


Attachments
kvm_stat log (139.89 KB, text/plain)
2010-07-29 09:10 UTC, Dan Yasny
Get the average and maximum of each field in kvm_stat (14.71 KB, text/html)
2010-08-11 07:20 UTC, Mark Wu

Description Dan Yasny 2010-07-29 09:10:42 UTC
Created attachment 435231
kvm_stat log

Description of problem:
A CentOS 5.4 i686 guest reaches 100% CPU utilization a couple of minutes after starting, even though there is no application traffic.
All I/O components are virtio.
Only a hard stop can reset the machine; shutdown does not work, and neither a console nor an SSH connection is possible.

A former RHEL 5.5 guest was also affected by the 100% vCPU utilization issue.

Tested with both RHEL and CentOS guests


When the VM is in this state, the console is completely blank and no key input is accepted.

One of the first guests was RHEL 5.5 i686 with the same issue, so we decided to roll back to CentOS 5.4, but the issue persisted. We also have one more CentOS 5.4 i686 guest and one RHEL 5.5 x64 guest; these two guests are running fine.


Version-Release number of selected component (if applicable):
# rpm -qa |grep kvm
etherboot-zroms-kvm-5.4.4-13.el5
kvm-debuginfo-83-164.el5_5.12
kvm-83-164.el5_5.12
kvm-qemu-img-83-164.el5_5.12
kmod-kvm-83-164.el5_5.12
kvm-tools-83-164.el5_5.12

RHEV 2.2 GA

How reproducible:
Always

Steps to Reproduce:
1. As described above
  
Actual results:
VMs hanging

Expected results:
No hanging VMs

Additional info:
info registers:
EAX=69c00000 EBX=00000000 ECX=00000016 EDX=0008093f

ESI=80000000 EDI=000ee6b2 EBP=c068f320 ESP=c0768f58

EIP=c042d02a EFL=00000087 [--S--PC] CPL=0 II=0 A20=1 SMM=0 HLT=0

ES =007b 00000000 ffffffff 00c0f300

CS =0060 00000000 ffffffff 00c09b00

SS =0068 00000000 ffffffff 00c09300

DS =007b 00000000 ffffffff 00c0f300

FS =0000 00000000 ffffffff 00000000

GS =0000 00000000 ffffffff 00000000

LDT=0088 c0746020 00000027 00008200

TR =0080 c1803a80 00002073 00008b00

GDT=     c1812000 000000ff

IDT=     c06f6000 000007ff

CR0=8005003b CR2=b7f30000 CR3=37c3b000 CR4=000006d0

DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000 

DR6=ffff0ff0 DR7=00000400

FCW=037f FSW=0120 [ST=0] FTW=00 MXCSR=00000000

FPR0=0000000000000000 0000 FPR1=0000000000000000 0000

FPR2=0000000000000000 0000 FPR3=0000000000000000 0000

FPR4=e000000000000000 4000 FPR5=d400000000000000 c004

FPR6=f424000000000000 4012 FPR7=0000000000000000 0000

XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000

XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000

XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000

XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000
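
As a reading aid for the register dump above, here is a minimal sketch that decodes the `EFL` value into the bracketed flag string and checks whether `EIP` lies in kernel space. The flag order (D, O, S, Z, A, P, C) is taken to match the bracket notation in the dump, and the `0xC0000000` boundary is an assumption (the default 3G/1G split of a 32-bit x86 Linux kernel); the function names are illustrative only.

```python
# Flag letters in the order they appear inside the brackets of the dump,
# paired with their bit positions in the x86 EFLAGS register.
EFLAGS_BITS = [
    ("D", 10),  # direction flag
    ("O", 11),  # overflow flag
    ("S", 7),   # sign flag
    ("Z", 6),   # zero flag
    ("A", 4),   # auxiliary carry flag
    ("P", 2),   # parity flag
    ("C", 0),   # carry flag
]


def decode_eflags(value):
    """Render an EFLAGS value as a 7-character string such as '--S--PC'."""
    return "".join(
        letter if (value >> bit) & 1 else "-" for letter, bit in EFLAGS_BITS
    )


def in_kernel_space(eip, page_offset=0xC0000000):
    """Assumption: default 3G/1G split, so kernel text starts at page_offset."""
    return eip >= page_offset


# Values copied from the dump above.
print(decode_eflags(0x00000087))
print(in_kernel_space(0xC042D02A))
```

With CPL=0 and an EIP above that boundary, the vCPU appears to be spinning inside the guest kernel rather than in user space.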


kvm_stat log Attached

Comment 2 Avi Kivity 2010-07-29 12:40:19 UTC
Please request a kvmtrace log so we can see what is happening.

What's the exact kernel rpm version in the guest?

Comment 3 Dan Yasny 2010-07-29 12:53:08 UTC
(In reply to comment #2)
> Please request a kvmtrace log so we can see what is happening.
> 
What params do you need me to run kvmtrace with?

> What's the exact kernel rpm version in the guest?    

 2.6.18-194.3.1.el5 #1 SMP Sun May 2 04:17:42 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

Comment 4 Avi Kivity 2010-07-29 13:09:24 UTC
(In reply to comment #3)
> (In reply to comment #2)
> > Please request a kvmtrace log so we can see what is happening.
> > 
> What params do you need me to run kvmtrace with?
> 

kvmtrace -w 1 -o /tmp/aaa

Comment 5 Avi Kivity 2010-07-29 13:31:51 UTC
Please also provide the contents of /proc/modules in the guest.

Comment 6 Dan Yasny 2010-07-29 14:35:37 UTC
(In reply to comment #5)
> Please also provide the contents of /proc/modules in the guest.    

I'll ask for it, but it might be tricky if the VM just hangs

Comment 7 Dan Yasny 2010-08-09 13:13:02 UTC
The SA, Malte Menkhoff, has updated the RHEV-H systems of the affected cluster
(rhevh-b-node03 + rhev-b-node04) to the current version, 5.5-2.2.5.2.

The systems with the 100% CPU issue were isolated on the host rhev-b-node03.
Host list:
bvbe-luv-xline01 => production CentOS 5.4 system (with VNC)
bvbe-luv-archiv1 => production CentOS 5.4 system (with SPICE)
vrhel55i686 => test system, RHEL 5.5 (with VNC)
vrhel55i686_2 => test system, RHEL 5.5 (with SPICE)
They were configured as non-HA systems with preferred host rhevh-b-node03.

The non-Linux systems (one WinXP + one Win2k8 Server) were migrated to host rhevh-b-node04, but with HA active for availability reasons.

For every Linux system, the output of cat /proc/modules was saved => see the tar.bz2 on dropbox.redhat.com

On the host rhevh-b-node03, a kvmtrace was set up to capture the moment when a guest goes into the 100% CPU situation, with an interval of 600 seconds.
The 5th attempt was successful: between 540 and 600 seconds, the guest vrhel55i686 went into that state.

The necessary kvmtraces were created with:
# kvmtrace -o outfile -w 600
and can be found in the tar.bz2 file on dropbox.

The file on dropbox is called:
2040792_BDB_20100805_full.tar.bz2

The contents are:
bvbe-luv-xline01.log (cat /proc/modules of one CentOS system)
bvbw-luv-archiv01.modules.log (cat /proc/modules of one CentOS system)
vrhel55i686.modules.log (cat /proc/modules of the RHEL system with the 100% CPU issue)
_vrhel55i686_2.modules.log (cat /proc/modules of one RHEL reference system)
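
One way to use the module lists above is to compare the affected guest against the reference guest. The following is only a sketch: it relies on the standard /proc/modules layout, where the module name is the first whitespace-separated field of each line, and the sample lines and function names are made up for illustration, not taken from the attached logs.

```python
def module_names(proc_modules_text):
    """Collect module names (first field of each /proc/modules line)."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def compare_modules(affected_text, reference_text):
    """Return (only_in_affected, only_in_reference) as sorted lists."""
    affected = module_names(affected_text)
    reference = module_names(reference_text)
    return sorted(affected - reference), sorted(reference - affected)


# Hypothetical sample data in /proc/modules format.
affected = (
    "virtio_net 15665 0 - Live 0xf8a3b000\n"
    "virtio_blk 10885 3 - Live 0xf8a1c000\n"
)
reference = (
    "virtio_blk 10885 3 - Live 0xf8a1c000\n"
    "ext3 125513 2 - Live 0xf8a50000\n"
)

only_affected, only_reference = compare_modules(affected, reference)
print("only in affected guest:", only_affected)
print("only in reference guest:", only_reference)
```

In practice the two arguments would be the contents of vrhel55i686.modules.log and _vrhel55i686_2.modules.log.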

info_reg.rhevh-b-node03.bdb.local.20100805.txt => requested info_reg of the affected RHEV
kvm_stat.rhevh-b-node03.bdb.local.20100805.txt => 5 min of kvm_stat on the RHEV-H host rhevh-b-node03 after the guest had gone into the 100% CPU state

the kvmtraces:
rhevh-b-node03.kvmtrace_600.log.kvmtrace.0
rhevh-b-node03.kvmtrace_600.log.kvmtrace.1
rhevh-b-node03.kvmtrace_600.log.kvmtrace.2
rhevh-b-node03.kvmtrace_600.log.kvmtrace.3
rhevh-b-node03.kvmtrace_600.log.kvmtrace.4
rhevh-b-node03.kvmtrace_600.log.kvmtrace.5
rhevh-b-node03.kvmtrace_600.log.kvmtrace.6
rhevh-b-node03.kvmtrace_600.log.kvmtrace.7
rhevh-b-node03.kvmtrace_600.log.kvmtrace.8
rhevh-b-node03.kvmtrace_600.log.kvmtrace.9
rhevh-b-node03.kvmtrace_600.log.kvmtrace.10
rhevh-b-node03.kvmtrace_600.log.kvmtrace.11

At the moment, one CentOS system is also in the 100% CPU state.

ftp://seg.rdu.redhat.com/dropbox/2040792_BDB_20100805_full.tar.bz2 holds the logs and traces

Comment 8 Mark Wu 2010-08-11 07:20:07 UTC
Created attachment 438094
Get the average and maximum of each field in kvm_stat
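
The attachment itself is not reproduced here. As a rough idea of how such a summary could be computed, the sketch below assumes a column-oriented log: a header line of field names followed by rows of integer counters, with the header possibly repeated. That layout and the sample values are assumptions for illustration, not the format of the attached file.

```python
def summarize_kvm_stat(log_text):
    """Return {field: (average, maximum)} for a column-oriented kvm_stat log."""
    header = None
    totals, maxima, counts = {}, {}, {}
    for line in log_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Any line that is not purely numeric is treated as a (repeated) header.
        if not all(f.isdigit() for f in fields):
            header = fields
            continue
        # Skip data rows that do not match the current header.
        if header is None or len(fields) != len(header):
            continue
        for name, raw in zip(header, fields):
            value = int(raw)
            totals[name] = totals.get(name, 0) + value
            counts[name] = counts.get(name, 0) + 1
            maxima[name] = max(maxima.get(name, value), value)
    return {name: (totals[name] / counts[name], maxima[name]) for name in totals}


# Hypothetical two-sample log with two fields.
sample = """exits irq_exits
10 2
30 4
"""

for field, (average, maximum) in summarize_kvm_stat(sample).items():
    print(field, average, maximum)
```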

Comment 12 Glauber Costa 2010-08-26 12:54:52 UTC

*** This bug has been marked as a duplicate of bug 570824 ***

