Description of problem:
With the latest RHEL 5.8 KVM and kmod-kvm (kvm-83-249.el5), the guest (in this case also a RHEL 5.8 machine) becomes unresponsive after a period of time. No logins via the console or SSH; login attempts simply hang. This issue seems to happen only with machines using the virtio disk driver.

This issue also has info on the RHEL5 mailing list:
https://www.redhat.com/archives/rhelv5-list/2012-February/msg00060.html
https://www.redhat.com/archives/rhelv5-list/2012-March/msg00016.html
https://www.redhat.com/archives/rhelv5-list/2012-March/msg00017.html

The issue is also present in CentOS 5.8, as detailed in this bug:
http://bugs.centos.org/view.php?id=5582
and discussed on the CentOS mailing list in this thread:
http://lists.centos.org/pipermail/centos/2012-March/124043.html

Version-Release number of selected component (if applicable):
kvm-83-249.el5

How reproducible:
Always - after 1-3 days

Steps to Reproduce:
1. Install the new kvm and kmod-kvm on a host server with virtio disks, then wait a period of time (in my case 1-3 days).
Dear Johnny Hughes, Thank you for taking the time to enter a bug report with us. We do appreciate the feedback and look to use reports such as this to guide our efforts at improving our products. That being said, this bug tracking system is not a mechanism for getting support, and as such we are not able to make any guarantees as to the timeliness or suitability of a resolution. If this issue is critical or in any way time sensitive, please raise a ticket through your regular Red Hat support channels to make certain that it gets the proper attention and prioritization to assure a timely resolution. For information on how to contact the Red Hat production support team, please see: https://www.redhat.com/support/process/production/#howto Thanks.
Cannot reproduce this bug with different host kernel and kvm versions.
nic: virtio, blk: virtio

Details:

I. kvm-83-249.el5

1. host kernel: 2.6.18-308.el5
   1) run 5 guests on host ---> run 17 hours
      guest: 2.6.18-308.el5
      3 guests: keep downloading files
      2 guests: idle
   2) run 1 guest on host ----> run 24 hours
      guest: 2.6.18-308.el5
      idle 19 hours, run dd 5 hours
   3) install guest in lvm --> run 12 hours
      guest: 2.6.18-308.el5
      keep dd in guest

2. host kernel: 2.6.18-300.el5
   1) run 6 guests on host ----> 30 hours
      guest: 2.6.18-308.el5
      2 guests: keep downloading files
      1 guest: running httpd inside with 100 concurrent connections
      3 guests: idle

3. host kernel: 2.6.18-274.18.1.el5
   1) run 1 guest on host ----> run 24 hours
      guest: 2.6.18-308.el5
      idle 19 hours, run dd 5 hours

4. host kernel: 2.6.18-305.el5
   1) run 1 guest on host ----> run 24 hours
      guest: 2.6.18-308.el5
      idle 19 hours, run dd 5 hours

II. kvm-83-246.el5

1. host kernel: 2.6.18-305.el5
   1) run 1 guest on host ----> run 24 hours
      guest: 2.6.18-308.el5
      idle 19 hours, run dd 5 hours

III. kvm-83-239.el5

1. host kernel: 2.6.18-274.17.1.el5
   1) run 1 guest on host ----> run 24 hours
      guest: 2.6.18-308.el5
      idle 19 hours, run dd 5 hours

cmd:
/usr/libexec/qemu-kvm -name rhel5.8 -monitor stdio -serial unix:/tmp/serial-20120313-002624-AuCO,server,nowait -drive file=/home/RHEL-Server-5.8-64-virtio.qcow2,index=0,if=virtio,media=disk,cache=none,boot=on,format=qcow2 -net nic,vlan=0,model=virtio,macaddr=a0:01:8a:76:75:00 -net tap,vlan=0,script=/etc/qemu-ifup-switch -m 4096 -smp 2,cores=1,threads=1,sockets=2 -cpu qemu64,+sse2 -soundhw ac97 -vnc :0 -rtc-td-hack -M rhel5.6.0 -boot c -no-kvm-pit-reinjection -usbdevice tablet
I'm using this simpler command line and can reproduce with the RHEL 5.8 release kernel on host and guest:

/usr/libexec/qemu-kvm -m 1024 -boot c -k de \
  -daemonize \
  -drive file=/dev/sdc1,if=virtio,index=0,boot=on \
  -drive file=/dev/mapper/VG00-LVvm2swap,if=virtio,index=1 \
  -net nic,vlan=0,macaddr=xxx,model=virtio \
  -net nic,vlan=1,macaddr=xxx,model=virtio

Maybe memory related?
(In reply to comment #6)
> I'm using this simpler cmdline and can reproduce with rhel5.8 release kernel
> on host and guest:
>
> /usr/libexec/qemu-kvm -m 1024 -boot c -k de \
> -daemonize \
> -drive file=/dev/sdc1,if=virtio,index=0,boot=on \
> -drive file=/dev/mapper/VG00-LVvm2swap,if=virtio,index=1 \
> -net nic,vlan=0,macaddr=xxx,model=virtio \
> -net nic,vlan=1,macaddr=xxx,model=virtio
>
> Maybe memory related?

I didn't attach all the guests' command lines; the following two scenarios were tested with 1024 MB of memory. It seems you test with multiple block devices and NICs - I will try to test with your scenario.

I. kvm-83-249.el5

1. host kernel: 2.6.18-308.el5
   1) run 5 guests on host ---> run 17 hours
      guest: 2.6.18-308.el5
      3 guests: keep downloading files
      2 guests: idle

cmd:
/usr/libexec/qemu-kvm -no-hpet -no-kvm-pit-reinjection -usbdevice tablet -rtc-td-hack -startdate now -name test -smp 1,cores=1 -k en-us -m 1024 -boot dcn -net nic,vlan=1,macaddr=89:12:41:43:2c:52,model=virtio -net tap,vlan=1,ifname=virtio_10_1,script=/etc/qemu-ifup -drive file=/home/rhel5.8GA-copy-4.qcow2,media=disk,if=virtio,serial=7b-8a18-438e1f274bd2,boot=on,format=qcow2,werror=stop -soundhw ac97 -vnc :1 -vga cirrus -cpu qemu64,+sse2 -M rhel5.5.0 -notify all -vga cirrus -balloon none -monitor stdio

2. host kernel: 2.6.18-300.el5
   1) run 6 guests on host ----> 30 hours
      guest: 2.6.18-308.el5
      2 guests: keep downloading files
      1 guest: running httpd inside with 100 concurrent connections
      3 guests: idle

cmd:
/usr/libexec/qemu-kvm -no-hpet -no-kvm-pit-reinjection -usbdevice tablet -rtc-td-hack -startdate now -name test -smp 1,cores=1 -k en-us -m 1024 -boot dcn -net nic,vlan=1,macaddr=83:13:35:43:a3:32,model=virtio -net tap,vlan=1,ifname=virtio_10_6,script=/etc/qemu-ifup -drive file=/home/rhel5.8GA.qcow2,media=disk,if=virtio,serial=73-ad18-438e1f2a4aa2,boot=on,format=qcow2,werror=stop -soundhw ac97 -vnc :6 -cpu qemu64,+sse2 -M rhel5.4.0 -notify all -balloon none
Another thing that seems common, at least to my scenario and Rainer's, is that we are both using LVM on the host for our images, whereas the tests in comment #5 seem to be using local qcow2 images on the host.
(In reply to comment #8)
> Another thing that seems common at least to my scenario and Rainer's is that
> we both are using LVM on the host for our images where the tests in comment
> #5 seem to be using qcow2 local images on the host.

We also tested a guest installed on LVM:

I. kvm-83-249.el5

1. host kernel: 2.6.18-308.el5
   1) run 5 guests on host ---> run 17 hours
      guest: 2.6.18-308.el5
      3 guests: keep downloading files
      2 guests: idle
   2) run 1 guest on host ----> run 24 hours
      guest: 2.6.18-308.el5
      idle 19 hours, run dd 5 hours
   -------------------- lvm guest --------------------
   3) install guest in lvm --> run 12 hours
      guest: 2.6.18-308.el5
      keep dd in guest
I would also point out that I have not had an issue after switching from the virtio disk driver to the IDE driver.

Does not work (crashes, normally within 24-36 hours):

/usr/libexec/qemu-kvm -S -M rhel5.4.0 -m 1024 -smp 1,sockets=1,cores=1,threads=1 -name testbox -uuid 636e9e51-f14f-b895-0753-2df877cafa8e -monitor unix:/var/lib/libvirt/qemu/testbox.monitor,server,nowait -no-kvm-pit-reinjection -boot c -drive file=/dev/VG_VirtHosts/LV_testbox,if=virtio,boot=on,format=raw,cache=none -drive if=ide,media=cdrom,bus=1,unit=0,readonly=on,format=raw -net nic,macaddr=54:52:00:78:68:ee,vlan=0,model=virtio -net tap,fd=19,vlan=0 -serial pty -parallel none -usb -vnc 127.0.0.1:0 -k en-us -vga cirrus -balloon virtio

Works (up longer than 48 hours, no issues):

/usr/libexec/qemu-kvm -S -M rhel5.4.0 -m 1024 -smp 1,sockets=1,cores=1,threads=1 -name testbox -uuid 636e9e51-f14f-b895-0753-2df877cafa8e -monitor unix:/var/lib/libvirt/qemu/testbox.monitor,server,nowait -no-kvm-pit-reinjection -boot c -drive file=/dev/VG_VirtHosts/LV_testbox,if=ide,bus=0,unit=0,boot=on,format=raw,cache=none -drive if=ide,media=cdrom,bus=1,unit=0,readonly=on,format=raw -net nic,macaddr=54:52:00:78:68:ee,vlan=0,model=virtio -net tap,fd=19,vlan=0 -serial pty -parallel none -usb -vnc 127.0.0.1:0 -k en-us -vga cirrus -balloon virtio
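A minimal sketch of the delta between the two command lines above: only the boot drive's `-drive` interface option changes (everything else is identical). The `sed` rewrite below is purely illustrative, not a supported tool; the `DRIVE` string is copied from the failing command.

```shell
# Sketch: the virtio -> IDE workaround is just a rewrite of the boot
# drive's -drive option in the qemu-kvm command line.
DRIVE='file=/dev/VG_VirtHosts/LV_testbox,if=virtio,boot=on,format=raw,cache=none'
echo "$DRIVE" | sed 's/if=virtio/if=ide,bus=0,unit=0/'
# -> file=/dev/VG_VirtHosts/LV_testbox,if=ide,bus=0,unit=0,boot=on,format=raw,cache=none
```

Note that after this change the guest will see the disk as /dev/hda instead of /dev/vda, so /etc/fstab and the bootloader configuration may need matching edits (an assumption about a typical setup, not something stated in this report).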
Similar problem here with both 5.7 and 5.8, but the guest hang could be immediate or take a few days. Dropping back to kvm 83-239 fixed it for me. I have an open support case, and there is a fixed test version of 83-249.

John Nebel
Experiencing the same issue (the VM is unresponsive after 3-10 hours) and am using the same workaround (changed the disk from vda/virtio to hda/ide). Both host and guest are running 5.8 with the 2.6.18-308.1.1.el5 kernel.

Command for the guest:

/usr/libexec/qemu-kvm -S -M rhel5.4.0 -m 8192 -smp 4,sockets=4,cores=1,threads=1 -name radm002p -uuid fe57b3f0-74be-ee69-26fa-a6e5c14d8e24 -monitor unix:/var/lib/libvirt/qemu/radm002p.monitor,server,nowait -no-kvm-pit-reinjection -boot c -drive file=/dev/vg0/radm002p,if=ide,bus=0,unit=0,boot=on,format=raw,cache=none -net nic,macaddr=54:52:00:43:43:06,vlan=0,model=virtio -net tap,fd=19,vlan=0 -net nic,macaddr=54:52:00:14:b7:6f,vlan=1,model=virtio -net tap,fd=20,vlan=1 -serial pty -parallel none -usb -vnc 127.0.0.1:0 -k en-us -vga cirrus -balloon virtio
Hi Johnny, could you please capture a core dump with the following steps? Thanks.

1. While the bug is reproducing, attach gdb: gdb -p <pid>, then continue the process with (gdb) c
2. From another shell: kill -ABRT <pid>
3. Generate the core file with (gdb) generate-core-file

You can then attach the core file here.
We believe the current issue is a duplicate of bug #782631. Please verify the fix that resides on http://people.redhat.com/myamazak/.kvm-83-249.affinity_fix.el5_8/
I installed the kvm-83-249.affinity_fix.el5_8 packages, switched back to the vda/virtio driver, and my VMs have been running for 4 days.
Same here. No hangs reported over the weekend with affinity_fix package. Is this indeed a duplicate of 782631? Can we get an ETA? Thank you
I have installed the affinity_fix package today and shifted my test VM's drive from IDE back to virtio. Before this, I had 8 days with no issues after shifting from virtio to IDE. I will post if I hit the issue again (or after 3 days, since it never took longer than that with the non-affinity packages).
(In reply to comment #16)
> Same here. No hangs reported over the weekend with affinity_fix package. Is
> this indeed a duplicate of 782631? Can we get an ETA?
>
> Thank you

Thanks for the report and the confirmation that the patch solves it. I'll mark this bug as a duplicate of bug #782631. We have already composed a Z-stream rpm for it - kvm-83-249.el5_8 - and it is in the ON_QA state.

*** This bug has been marked as a duplicate of bug 782631 ***
Any chance you can allow access to 782631 so we can follow it? Referring us to a private bug does not help those without access. Thanks
The SRPM for this is on the public FTP site, but there does not seem to be an announcement on the Errata page for Red Hat Enterprise Linux (v. 5 *) yet (where * is Server, Client, etc.)
This is fixed by the RPMS in this announcement: https://rhn.redhat.com/errata/RHBA-2012-0398.html