Bug 619798

Summary: rhel5.5 running as kvm guest hangs randomly
Product: [Fedora] Fedora
Reporter: Jiri Pirko <jpirko>
Component: qemu
Assignee: Justin M. Forbes <jforbes>
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: medium
Version: 13
CC: amit.shah, berrange, bjrosen, clalance, dwmw2, ehabkost, extras-orphan, gcosta, itamar, jaswinder, jforbes, jtluka, knoel, ldoktor, markmc, mschmidt, notting, ondrejj, quintela, rh_bugzilla, rkhan, scottt.tw, virt-maint
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2011-06-29 12:45:41 UTC

Description Jiri Pirko 2010-07-30 14:56:24 UTC
Description of problem:

I'm experiencing random hangs (depending on host load) on a rhel5.5 guest. I'm not sure whether this is a guest or a host issue. I tested this on hosts running F13, F12, Rawhide and RHEL6, and I can reproduce the issue only on F13. It occurs only when the guest has 2 or more cpus assigned and the host is under cpu load.

using kernel 2.6.33.6-147.fc13.x86_64

How reproducible:
always

Steps to Reproduce:
1. Run "stress -c X", where X is the number of cpus on your F13 host.
2. Create a new KVM guest in virt-manager with *two* cpus and install RHEL5.5 in it.
3. Observe the system hang, either during the installation or after it. Reboot a few times if it doesn't appear right away.
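
Step 1 can be scripted so that X always matches the host's CPU count. The following is only a minimal sketch: it assumes the `stress` tool is installed and on PATH, and the helper names `stress_command` and `start_host_load` are made up for illustration. Only the `-c` option from step 1 is used.

```python
import os
import shutil
import subprocess


def stress_command(cpus=None):
    """Build the `stress -c X` command from step 1.

    X defaults to the number of CPUs the host reports.
    """
    if cpus is None:
        cpus = os.cpu_count() or 1
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return ["stress", "-c", str(cpus)]


def start_host_load(cpus=None):
    """Start the load generator in the background.

    Returns the Popen handle, or None if `stress` is not installed.
    """
    cmd = stress_command(cpus)
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.Popen(cmd)


if __name__ == "__main__":
    proc = start_host_load()
    if proc is None:
        print("stress not found; install it or generate CPU load another way")
    else:
        print("host load running as pid", proc.pid)
        print("start the two-cpu guest now; stop the load with: kill", proc.pid)
```

The load keeps running until it is killed, so the guest can be installed and rebooted several times under the same host load.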

The guest system is not *completely* dead; it reacts to keypresses and ctrl-alt-del.

Comment 1 Michal Schmidt 2010-07-30 15:43:23 UTC
You might want to try a Rawhide kernel on F-13 to see if that fixes it.

Comment 2 Jiri Pirko 2010-08-02 07:48:08 UTC
(In reply to comment #1)
> You might want to try a Rawhide kernel on F-13 to see if that fixes it.    

With 2.6.35-0.58.rc6.git6.fc14.x86_64 it works fine.

Comment 3 Patrick 2010-08-10 16:06:08 UTC
I'm running Fedora 13 x86_64, fully updated, on an Intel i7 cpu with 6GB mem and kernel 2.6.35.1-5.rc1.fc14.x86_64. A CentOS 5.5 x86_64 VM with 2GB mem and 2 cpus just experienced a hang.

Hang means: the output of the rpmbuild command has stopped and the build has stopped; ctrl-c stops the rpmbuild process and returns a prompt. Other commands can be typed in, but once I press enter the cursor moves to the next line and just hangs there: the command I issued is not executed, I see no output, I get no new prompt, and ctrl-c no longer kills it. I can no longer ssh to the VM.

The hang does not occur as quickly as with the latest F13 kernel, but it still hung. The VM ran for at least 8 hours, and the hang started while I was compiling a rather large application. A normal shutdown of the VM no longer works: I can type '/sbin/shutdown -h now' on the command line, but when I press enter it hangs and nothing happens. Selecting shutdown from virt-manager does nothing, so I have to force the VM off. Here are the libvirt and kernel messages I could find in /var/log/messages, although they are from hours earlier and nowhere near the time the VM hung.

Aug 10 08:11:17 plato libvirtd: 08:11:17.744: warning : qemudParsePCIDeviceStrs:1411 : Unexpected exit status '1', qemu probably failed
Aug 10 08:11:17 plato libvirtd: 08:11:17.913: error : qemudDomainGetVcpus:5801 : Requested operation is not valid: cannot list vcpu pinning for an inactive domain
Aug 10 08:11:17 plato libvirtd: 08:11:17.917: error : qemudDomainGetVcpus:5801 : Requested operation is not valid: cannot list vcpu pinning for an inactive domain
Aug 10 08:11:17 plato libvirtd: 08:11:17.920: error : qemudDomainGetVcpus:5801 : Requested operation is not valid: cannot list vcpu pinning for an inactive domain
Aug 10 08:11:17 plato libvirtd: 08:11:17.923: error : qemudDomainGetVcpus:5801 : Requested operation is not valid: cannot list vcpu pinning for an inactive domain
Aug 10 08:11:17 plato libvirtd: 08:11:17.926: error : qemudDomainGetVcpus:5801 : Requested operation is not valid: cannot list vcpu pinning for an inactive domain
Aug 10 08:11:21 plato libvirtd: 08:11:21.475: warning : qemudParsePCIDeviceStrs:1411 : Unexpected exit status '1', qemu probably failed

Aug 10 08:11:21 plato kernel: tun: Universal TUN/TAP device driver, 1.6
Aug 10 08:11:21 plato kernel: tun: (C) 1999-2004 Max Krasnyansky <maxk>
Aug 10 08:11:21 plato kernel: device vnet0 entered promiscuous mode
Aug 10 08:11:21 plato kernel: br0: new device vnet0 does not support netpoll (disabling)
Aug 10 08:11:21 plato kernel: br0: port 2(vnet0) entering forwarding state
Aug 10 08:11:21 plato kernel: br0: port 2(vnet0) entering forwarding state
Aug 10 08:11:21 plato qemu-kvm: Could not find keytab file: /etc/qemu/krb5.tab: No such file or directory
Aug 10 08:11:23 plato avahi-daemon[1649]: Registering new address record for fe80::d805:97ff:feb3:9555 on vnet0.*.
Aug 10 08:11:24 plato ntpd[1846]: Listen normally on 8 vnet0 fe80::d805:97ff:feb3:9555 UDP 123
Aug 10 08:11:31 plato kernel: kvm: 2816: cpu0 unimplemented perfctr wrmsr: 0x186 data 0x130079
Aug 10 08:11:31 plato kernel: kvm: 2816: cpu0 unimplemented perfctr wrmsr: 0xc1 data 0xffce9422
Aug 10 08:11:31 plato kernel: kvm: 2816: cpu0 unimplemented perfctr wrmsr: 0x186 data 0x530079
Aug 10 08:11:31 plato kernel: kvm: 2816: cpu1 unimplemented perfctr wrmsr: 0x186 data 0x130079
Aug 10 08:11:31 plato kernel: kvm: 2816: cpu1 unimplemented perfctr wrmsr: 0xc1 data 0xffce9422
Aug 10 08:11:31 plato kernel: kvm: 2816: cpu1 unimplemented perfctr wrmsr: 0x186 data 0x530079

Aug 10 08:38:11 plato kernel: hrtimer: interrupt took 11672 ns

Aug 10 10:42:50 plato kernel: qemu-kvm used greatest stack depth: 3232 bytes left

Please let me know if you need more information. Be happy to help where I can to expedite a fix.
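
When comparing host logs from several affected machines, the repeated kvm lines can be condensed before posting. This is a rough sketch that assumes the message format seen in the excerpt above ("kvm: PID: cpuN unimplemented perfctr wrmsr: 0x... data 0x..."); the function name `summarize_wrmsr` is hypothetical, and nothing here implies these messages are related to the hang.

```python
import re
from collections import Counter

# Matches host kernel lines such as:
#   kvm: 2816: cpu0 unimplemented perfctr wrmsr: 0x186 data 0x130079
WRMSR_RE = re.compile(
    r"kvm: (?P<pid>\d+): cpu(?P<cpu>\d+) unimplemented perfctr wrmsr: "
    r"(?P<msr>0x[0-9a-fA-F]+) data (?P<data>0x[0-9a-fA-F]+)"
)


def summarize_wrmsr(lines):
    """Count unimplemented perfctr MSR writes per (vcpu, MSR) pair."""
    counts = Counter()
    for line in lines:
        match = WRMSR_RE.search(line)
        if match:
            key = (int(match.group("cpu")), int(match.group("msr"), 16))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # The path is the one mentioned in the comment above; adjust as needed.
    with open("/var/log/messages", errors="replace") as log:
        for (cpu, msr), n in sorted(summarize_wrmsr(log).items()):
            print(f"cpu{cpu} msr {msr:#x}: {n} write(s)")
```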

Comment 4 Joshua Rosen 2010-08-10 16:36:47 UTC
This should be merged with Bug 619560, which I opened on 7/29. I'm seeing the same problem with CentOS 5.5 VMs, and I also had problems with an XP VM. My workaround was to build and install a 2.6.35 kernel from kernel.org. I've been running the 2.6.35 kernel on three machines that had hanging VMs: an iCore7 with a CentOS 5.5 VM, a Core2 with a CentOS 5.5 VM, and a Core2 with an XP VM. All machines have been running for several days with the 2.6.35 kernel and I haven't had any problems.

Comment 5 Bug Zapper 2011-06-01 12:28:28 UTC
This message is a reminder that Fedora 13 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 13.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '13'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 13's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 13 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 6 Bug Zapper 2011-06-29 12:45:41 UTC
Fedora 13 changed to end-of-life (EOL) status on 2011-06-25. Fedora 13 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.