Created attachment 435231
Description of problem:
A CentOS 5.4 i686 guest reaches 100% CPU utilization a couple of minutes after starting, with no application traffic.
All I/O components are virtio.
Only a hard stop can reset the machine; no shutdown is possible, and neither a console nor an SSH connection works.
A former RHEL 5.5 guest was also affected by the 100% vCPU utilization issue.
Tested with both RHEL and CentOS guests.
When the VM is in this state, the console is completely blank and no key input is accepted.
One of the first guests was RHEL 5.5 i686 with the same issue, so we decided to roll back to CentOS 5.4, but the issue persisted. We also have one more CentOS 5.4 i686 guest and one RHEL 5.5 x64 guest; these two guests are running fine.
Version-Release number of selected component (if applicable):
# rpm -qa |grep kvm
RHEV 2.2 GA
Steps to Reproduce:
1. As described above
Expected results:
No hanging VMs
EAX=69c00000 EBX=00000000 ECX=00000016 EDX=0008093f
ESI=80000000 EDI=000ee6b2 EBP=c068f320 ESP=c0768f58
EIP=c042d02a EFL=00000087 [--S--PC] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =007b 00000000 ffffffff 00c0f300
CS =0060 00000000 ffffffff 00c09b00
SS =0068 00000000 ffffffff 00c09300
DS =007b 00000000 ffffffff 00c0f300
FS =0000 00000000 ffffffff 00000000
GS =0000 00000000 ffffffff 00000000
LDT=0088 c0746020 00000027 00008200
TR =0080 c1803a80 00002073 00008b00
GDT= c1812000 000000ff
IDT= c06f6000 000007ff
CR0=8005003b CR2=b7f30000 CR3=37c3b000 CR4=000006d0
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000
FCW=037f FSW=0120 [ST=0] FTW=00 MXCSR=00000000
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=e000000000000000 4000 FPR5=d400000000000000 c004
FPR6=f424000000000000 4012 FPR7=0000000000000000 0000
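As a reading aid for the dump above (not part of the original report), the following minimal sketch decodes the EFL and CR0 values using the standard x86 bit positions. The names `decode_bits`, `EFLAGS_BITS`, and `CR0_BITS` are illustrative. The kernel-range check assumes the guest uses the common 3G/1G address split of 32-bit Linux, which the report does not confirm.

```python
# Decode selected fields from the register dump above.
# Bit positions follow the x86 architecture definition.

EFLAGS_BITS = {
    0: "CF",   # carry
    2: "PF",   # parity
    4: "AF",   # auxiliary carry
    6: "ZF",   # zero
    7: "SF",   # sign
    8: "TF",   # trap
    9: "IF",   # interrupt enable
    10: "DF",  # direction
    11: "OF",  # overflow
}

CR0_BITS = {
    0: "PE", 1: "MP", 2: "EM", 3: "TS", 4: "ET", 5: "NE",
    16: "WP", 18: "AM", 29: "NW", 30: "CD", 31: "PG",
}


def decode_bits(value, names):
    """Return the names of all set bits, in ascending bit order."""
    return [name for bit, name in sorted(names.items()) if value & (1 << bit)]


eflags = 0x00000087   # EFL from the dump
cr0 = 0x8005003B      # CR0 from the dump
eip = 0xC042D02A      # EIP from the dump

eflags_set = decode_bits(eflags, EFLAGS_BITS)
cr0_set = decode_bits(cr0, CR0_BITS)

# ASSUMPTION: 3G/1G split, so addresses >= 0xC0000000 are kernel space.
in_kernel_range = eip >= 0xC0000000

print("EFLAGS:", eflags_set)
print("CR0:   ", cr0_set)
print("EIP in assumed kernel range:", in_kernel_range)
```

With these values the IF bit is clear, which together with CPL=0 and HLT=0 suggests the vCPU was executing in ring 0 with interrupts disabled at the time of the dump. This is a hint only, not a diagnosis.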
kvm_stat log attached.
Please request a kvmtrace log so we can see what is happening.
What's the exact kernel rpm version in the guest?
(In reply to comment #2)
> Please request a kvmtrace log so we can see what is happening.
What params do you need me to run kvmtrace with?
> What's the exact kernel rpm version in the guest?
2.6.18-194.3.1.el5 #1 SMP Sun May 2 04:17:42 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
(In reply to comment #3)
> (In reply to comment #2)
> > Please request a kvmtrace log so we can see what is happening.
> What params do you need me to run kvmtrace with?
kvmtrace -w 1 -o /tmp/aaa
Please also provide the contents of /proc/modules in the guest.
(In reply to comment #5)
> Please also provide the contents of /proc/modules in the guest.
I'll ask for it, but it might be tricky if the VM just hangs
The SA, Malte Menkhoff, has updated the RHEV-H systems of the affected cluster
(rhevh-b-node03 + rhevh-b-node04) to the current version 5.5-18.104.22.168.
The systems with the 100% CPU issue were isolated on host rhevh-b-node03:
bvbe-luv-xline01 => production CentOS 5.4 system (with VNC)
bvbe-luv-archiv1 => production CentOS 5.4 system (with SPICE)
vrhel55i686 => RHEL 5.5 test system (with VNC)
vrhel55i686_2 => RHEL 5.5 test system (with SPICE)
These guests are configured as non-HA systems with preferred host rhevh-b-node03.
The non-Linux systems (one WinXP + one Win2k8 Server) were migrated to host rhevh-b-node04, but with HA active for availability reasons.
For every Linux system, the output of cat /proc/modules was captured; see the tar.bz2 on dropbox.redhat.com.
On host rhevh-b-node03, kvmtrace was set up to capture the moment a guest enters the 100% CPU state, using an interval of 600 seconds.
The fifth attempt was successful: between 540 and 600 seconds, the guest vrhel55i686 entered this state.
The necessary kvmtraces were created with:
# kvmtrace -o outfile -w 600
and can be found in the tar.bz2 file on dropbox.
The file on dropbox is called:
The content is:
bvbe-luv-xline01.log (cat /proc/modules of one CentOS system)
bvbw-luv-archiv01.modules.log (cat /proc/modules of one CentOS system)
vrhel55i686.modules.log (cat /proc/modules of the RHEL system with the 100% CPU issue)
_vrhel55i686_2.modules.log (cat /proc/modules of one RHEL reference system)
info_reg.rhevh-b-node03.bdb.local.20100805.txt => requested info_reg of the affected RHEV
kvm_stat.rhevh-b-node03.bdb.local.20100805.txt => 5 min of kvm_stat on the RHEV-H host rhevh-b-node03 after the guest had entered the 100% CPU state
At the moment, one CentOS system is also in the 100% CPU state.
ftp://seg.rdu.redhat.com/dropbox/2040792_BDB_20100805_full.tar.bz2 holds the logs and traces
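One possible way to compare the /proc/modules captures of the hung guest and the reference guest is sketched below. It relies only on the fact that the first whitespace-separated field of each /proc/modules line is the module name. The function names and the sample lines are made up for illustration and are not taken from the attached logs.

```python
def module_names(proc_modules_text):
    """Return the set of module names in a /proc/modules capture."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def diff_modules(affected_text, reference_text):
    """Return (only_in_affected, only_in_reference) as sorted lists."""
    affected = module_names(affected_text)
    reference = module_names(reference_text)
    return sorted(affected - reference), sorted(reference - affected)


# Illustrative sample data only; NOT the content of the attached logs.
affected_sample = """\
virtio_net 15665 0 - Live 0xf8a3c000
virtio_blk 10953 3 - Live 0xf8a2b000
example_mod 1234 0 - Live 0xf8a10000
"""
reference_sample = """\
virtio_net 15665 0 - Live 0xf8a3c000
virtio_blk 10953 3 - Live 0xf8a2b000
"""

only_affected, only_reference = diff_modules(affected_sample, reference_sample)
print("only in affected guest: ", only_affected)
print("only in reference guest:", only_reference)
```

In practice the two inputs would be read from vrhel55i686.modules.log and _vrhel55i686_2.modules.log in the tarball.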
Created attachment 438094
Get the average and maximum of each field in kvm_stat
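The attachment itself is not reproduced in this thread. As a rough sketch of what such a summary could look like, the code below computes the average and maximum per field. It assumes a log layout with a whitespace-separated header row of field names followed by rows of integer samples; that layout, the function name `summarize`, and the sample values are assumptions, and the real kvm_stat output may differ.

```python
def summarize(log_text):
    """Return {field: (average, maximum)} for a columnar kvm_stat-style log.

    ASSUMED format: a header row of field names, then rows of integers.
    A repeated header row is tolerated; malformed rows are skipped.
    """
    header = None
    columns = {}
    for line in log_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if all(f.isdigit() for f in fields):
            # Data row: only usable if it matches the current header.
            if header is not None and len(fields) == len(header):
                for name, value in zip(header, fields):
                    columns.setdefault(name, []).append(int(value))
        else:
            header = fields
    return {
        name: (sum(values) / len(values), max(values))
        for name, values in columns.items()
    }


# Illustrative sample only; NOT data from the attached kvm_stat log.
sample = """\
exits irq_exits
10 20
30 40
"""

for name, (avg, peak) in summarize(sample).items():
    print(f"{name}: avg={avg:.1f} max={peak}")
```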
*** This bug has been marked as a duplicate of bug 570824 ***