Created attachment 435231
kvm_stat log

Description of problem:
A CentOS 5.4 i686 guest goes to 100% CPU utilization a couple of minutes after starting, with no application traffic. All I/O components are virtio. Only a hard stop resets the machine; shutdown does not work, and neither a console nor an SSH connection is possible.

An earlier RHEL 5.5 guest was also affected by the 100% vCPU utilization issue. Tested with both RHEL and CentOS guests.

When the VM is in this state, the console is completely blank and no key input is accepted. One of the first guests was RHEL 5.5 i686 with the same issue, so we decided to roll back to CentOS 5.4, but the issue persisted. We also have one more CentOS 5.4 i686 guest and one RHEL 5.5 x64 guest; these two are running fine.

Version-Release number of selected component (if applicable):
# rpm -qa |grep kvm
etherboot-zroms-kvm-5.4.4-13.el5
kvm-debuginfo-83-164.el5_5.12
kvm-83-164.el5_5.12
kvm-qemu-img-83-164.el5_5.12
kmod-kvm-83-164.el5_5.12
kvm-tools-83-164.el5_5.12

RHEV 2.2 GA

How reproducible:
Always

Steps to Reproduce:
1. As described above
2.
3.

Actual results:
VMs hanging

Expected results:
No hanging VMs

Additional info:
info registers:
EAX=69c00000 EBX=00000000 ECX=00000016 EDX=0008093f
ESI=80000000 EDI=000ee6b2 EBP=c068f320 ESP=c0768f58
EIP=c042d02a EFL=00000087 [--S--PC] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =007b 00000000 ffffffff 00c0f300
CS =0060 00000000 ffffffff 00c09b00
SS =0068 00000000 ffffffff 00c09300
DS =007b 00000000 ffffffff 00c0f300
FS =0000 00000000 ffffffff 00000000
GS =0000 00000000 ffffffff 00000000
LDT=0088 c0746020 00000027 00008200
TR =0080 c1803a80 00002073 00008b00
GDT=     c1812000 000000ff
IDT=     c06f6000 000007ff
CR0=8005003b CR2=b7f30000 CR3=37c3b000 CR4=000006d0
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000
DR6=ffff0ff0 DR7=00000400
FCW=037f FSW=0120 [ST=0] FTW=00 MXCSR=00000000
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=e000000000000000 4000 FPR5=d400000000000000 c004
FPR6=f424000000000000 4012 FPR7=0000000000000000 0000
XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000
XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000
XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000
XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000

kvm_stat log attached.
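
For anyone reading the register dump, here is a minimal sketch that decodes the EFL value into flag names. The bit positions are the architectural x86 EFLAGS bits; the function name decode_eflags is made up for this illustration and is not part of any KVM or QEMU tool. The bracketed flag string in the dump covers only some of these bits, so the sketch also reports TF, IF and DF.

```python
# Architectural bit positions of the x86 EFLAGS bits decoded here.
EFLAGS_BITS = {
    "CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7,
    "TF": 8, "IF": 9, "DF": 10, "OF": 11,
}


def decode_eflags(value):
    """Return the names of the flags set in an EFLAGS value, lowest bit first.

    decode_eflags is a hypothetical helper for illustration only.
    """
    ordered = sorted(EFLAGS_BITS.items(), key=lambda item: item[1])
    return [name for name, bit in ordered if (value >> bit) & 1]


if __name__ == "__main__":
    # EFL=00000087 is the value from the dump in this report.
    print(decode_eflags(0x00000087))
```

For the value in this report the set flags are CF, PF and SF, which agrees with the [--S--PC] string; IF (bit 9) is clear in 0x87.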
Please request a kvmtrace log so we can see what is happening. What's the exact kernel rpm version in the guest?
(In reply to comment #2)
> Please request a kvmtrace log so we can see what is happening.

What params do you need me to run kvmtrace with?

> What's the exact kernel rpm version in the guest?

2.6.18-194.3.1.el5 #1 SMP Sun May 2 04:17:42 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
(In reply to comment #3)
> (In reply to comment #2)
> > Please request a kvmtrace log so we can see what is happening.
>
> What params do you need me to run kvmtrace with?

kvmtrace -w 1 -o /tmp/aaa
Please also provide the contents of /proc/modules in the guest.
(In reply to comment #5)
> Please also provide the contents of /proc/modules in the guest.

I'll ask for it, but it might be tricky if the VM just hangs.
The SA, Malte Menkhoff, has updated the RHEV-H systems of the affected cluster (rhevh-b-node03 and rhevh-b-node04) to the current version, 5.5-2.2.5.2.

The systems with the 100% CPU issue were isolated on host rhevh-b-node03. Guest list:
bvbe-luv-xline01 => production CentOS 5.4 system (with VNC)
bvbe-luv-archiv1 => production CentOS 5.4 system (with SPICE)
vrhel55i686 => test RHEL 5.5 system (with VNC)
vrhel55i686_2 => test RHEL 5.5 system (with SPICE)
These are configured as non-HA systems with rhevh-b-node03 as the preferred host.

The non-Linux systems (one WinXP and one Win2k8 Server) were migrated to host rhevh-b-node04, with HA active for availability reasons.

The output of cat /proc/modules was saved for every Linux system => see the tar.bz2 on dropbox.redhat.com.

On host rhevh-b-node03, kvmtrace was set up to capture the moment a guest enters the 100% CPU state, running in intervals of 600 seconds. The fifth run was successful: between seconds 540 and 600 of that run, the guest vrhel55i686 entered the state.

The necessary kvmtraces were created with:
# kvmtrace -o outfile -w 600
and can be found in the tar.bz2 file on dropbox. The file on dropbox is called:
2040792_BDB_20100805_full.tar.bz2

The content is:
bvbe-luv-xline01.log (cat /proc/modules of one CentOS system)
bvbw-luv-archiv01.modules.log (cat /proc/modules of one CentOS system)
vrhel55i686.modules.log (cat /proc/modules of the RHEL system with the 100% CPU issue)
_vrhel55i686_2.modules.log (cat /proc/modules of one RHEL reference system)
info_reg.rhevh-b-node03.bdb.local.20100805.txt => requested info_reg of the affected RHEV
kvm_stat.rhevh-b-node03.bdb.local.20100805.txt => 5 min of kvm_stat on the RHEV-H rhevh-b-node03 after the guest had gone into the 100% CPU state

The kvmtraces:
rhevh-b-node03.kvmtrace_600.log.kvmtrace.0
rhevh-b-node03.kvmtrace_600.log.kvmtrace.1
rhevh-b-node03.kvmtrace_600.log.kvmtrace.2
rhevh-b-node03.kvmtrace_600.log.kvmtrace.3
rhevh-b-node03.kvmtrace_600.log.kvmtrace.4
rhevh-b-node03.kvmtrace_600.log.kvmtrace.5
rhevh-b-node03.kvmtrace_600.log.kvmtrace.6
rhevh-b-node03.kvmtrace_600.log.kvmtrace.7
rhevh-b-node03.kvmtrace_600.log.kvmtrace.8
rhevh-b-node03.kvmtrace_600.log.kvmtrace.9
rhevh-b-node03.kvmtrace_600.log.kvmtrace.10
rhevh-b-node03.kvmtrace_600.log.kvmtrace.11

At the moment one CentOS system is also in the 100% CPU state.

ftp://seg.rdu.redhat.com/dropbox/2040792_BDB_20100805_full.tar.bz2 holds the logs and traces.
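
The repeated 600-second capture described in this comment could be scripted roughly as follows. This is only a sketch under assumptions: the kvmtrace options -o and -w are the ones used in this thread, while capture_until_hung, the output-name prefix, and the is_hung callback (how to detect the 100% CPU state is left to the caller) are hypothetical names for illustration, not what was actually run on the host.

```python
import subprocess


def kvmtrace_cmd(outfile, seconds):
    """Build the kvmtrace command line with the options used in this thread."""
    return ["kvmtrace", "-o", outfile, "-w", str(seconds)]


def capture_until_hung(is_hung, max_runs, seconds=600,
                       prefix="kvmtrace_run", run=subprocess.call):
    """Run kvmtrace back to back until a run ends with the guest hung.

    is_hung is a caller-supplied (hypothetical) check for the 100% CPU state.
    Returns the output name of the run during which the hang appeared,
    or None if max_runs complete without a hang.
    """
    for i in range(1, max_runs + 1):
        outfile = "%s_%d" % (prefix, i)
        run(kvmtrace_cmd(outfile, seconds))  # one trace window
        if is_hung():
            return outfile
    return None
```

Passing the runner as a parameter keeps the loop testable without kvmtrace installed; earlier runs that ended without a hang can be discarded to save space.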
Created attachment 438094
Get the average and maximum of each field in kvm_stat
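
The attachment itself is not reproduced in this thread. As a rough idea of what such a helper might look like, here is a minimal sketch that computes the average and maximum of each column. It assumes a whitespace-separated log whose first line is a header of field names followed by numeric rows (vmstat-style); the real kvm_stat log layout and the attached script may differ, and summarize is a made-up name.

```python
def summarize(lines):
    """Return {field: (average, maximum)} for a header-plus-rows log.

    Assumes the first non-blank line names the fields and later lines hold
    one number per field; repeated header lines and short rows are skipped.
    """
    rows = [line.split() for line in lines if line.strip()]
    if not rows:
        return {}
    header = rows[0]
    columns = {name: [] for name in header}
    for row in rows[1:]:
        if row == header or len(row) != len(header):
            continue  # repeated header or malformed row
        for name, cell in zip(header, row):
            columns[name].append(float(cell))
    return {name: (sum(vals) / len(vals), max(vals))
            for name, vals in columns.items() if vals}


if __name__ == "__main__":
    import sys
    for field, (avg, peak) in sorted(summarize(sys.stdin).items()):
        print("%-24s avg=%.1f max=%.1f" % (field, avg, peak))
```

Run as a filter, it reads the log on standard input and prints one line per field.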
*** This bug has been marked as a duplicate of bug 570824 ***