Bug 794658 - VMs occasionally become unresponsive and it is impossible to type into other applications until virt-manager is killed
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: qemu-kvm
Version: 6.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: David Blechter
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On: 747464
Blocks:
Reported: 2012-02-17 08:43 UTC by Alon Levy
Modified: 2014-08-04 22:09 UTC (History)
CC List: 17 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 747464
Environment:
Last Closed: 2012-04-22 15:36:59 UTC
Target Upstream Version:



Description Alon Levy 2012-02-17 08:43:50 UTC
The problem is that the qxl device itself still blocks on the red_worker thread. This is the block we meant to remove by introducing the new async io ports, but we neglected to fix the device's own usage. Cloning to qemu-kvm to resolve this. Patches are already posted upstream.

+++ This bug was initially created as a clone of Bug #747464 +++

For the last few months on F16 I've been dealing with this frustrating bug. The nature of the VM doesn't seem to matter, I've had this happen on F16, F15 and even Windows VMs.

What happens is that the view of the VM - and all of virt-manager - will become unresponsive, essentially hung. The VM itself is running fine in the background. What's especially odd is that it becomes impossible to type into any other running application while this is happening: I can switch between and manipulate other apps with the mouse, but I cannot type into them.

If I switch to a VT and kill virt-manager it clears things: I can now type into apps on the desktop again, and I can re-run virt-manager and pick up where I left off with the VM. But it's a frustrating bug.

I haven't found a 100% reproducer of the bug yet, but one place where it does seem to trigger quite often, for whatever reason, is at anaconda's timezone selection screen - I click my city (Vancouver), and the hang happens.

--- Additional comment from marcandre.lureau on 2011-10-19 18:36:15 EDT ---

Most probably a deadlock between spice-gtk, libvirt and qemu, caused by qemu/qxl doing some IO synchronously. A nasty kind of bug that involves several components.

You need a fairly recent version of spice & qemu (I think the one in f16 is recent enough) but the xorg-qxl driver fix is not yet committed, afaik.

See also:
https://bugzilla.redhat.com/show_bug.cgi?id=700134
https://bugs.freedesktop.org/show_bug.cgi?id=41622

Alon, care to share the status of the various components in f16?

--- Additional comment from marcandre.lureau on 2011-10-19 18:37:41 EDT ---

Adam, please get a backtrace when the hang happens, to be sure it's the same bug. Thanks.

--- Additional comment from awilliam on 2011-10-19 19:02:28 EDT ---

okay, I'll try.

[adamw@adam grub2 (f16)]$ rpm -q spice-gtk libvirt qemu-kvm xorg-x11-drv-qxl
spice-gtk-0.7.39-1.fc16.x86_64
libvirt-0.9.6-2.fc16.x86_64
qemu-kvm-0.15.0-5.fc16.x86_64
xorg-x11-drv-qxl-0.0.21-5.fc16.x86_64

--- Additional comment from awilliam on 2011-10-19 19:03:49 EDT ---

note that when I talk about 'other apps' I am of course talking about other apps running *on the host* besides virt-manager, in case that wasn't clear.

--- Additional comment from awilliam on 2011-10-19 21:54:56 EDT ---

wasn't entirely sure what you want a backtrace from, but here's the one from virt-manager. is it any use? virt-manager is python, right?

--- Additional comment from awilliam on 2011-10-19 21:56:30 EDT ---

Created attachment 529158 [details]
backtrace from virt-manager while it's hung

--- Additional comment from marcandre.lureau on 2011-10-20 06:23:33 EDT ---

(In reply to comment #5)
> wasn't entirely sure what you want a backtrace from, but here's the one from
> virt-manager. is it any use? virt-manager is python, right?

try the one from qemu while the client hangs. Thanks

--- Additional comment from awilliam on 2011-10-20 14:38:00 EDT ---

crap. I got the qemu one first then decided you probably wanted the virt-manager one. sigh =) okay, will get qemu next time it happens.

--- Additional comment from marcandre.lureau on 2011-10-20 16:52:20 EDT ---

(In reply to comment #8)
> crap. I got the qemu one first then decided you probably wanted the
> virt-manager one. sigh =) okay, will get qemu next time it happens.

Sorry, and btw, we need all threads :)
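
For anyone reproducing this, a backtrace of every thread can be captured by attaching gdb in batch mode. This is a minimal sketch, assuming gdb is installed on the host; `dump_qemu_threads` and the `GDB` override variable are hypothetical names introduced here for illustration, not part of any tool.

```sh
# Hypothetical helper (not from the original report): print a backtrace of
# every thread in a running process by attaching gdb in batch mode.
# $1 is the PID of the hung qemu-kvm process.
# GDB is an assumed override so the command can be previewed with GDB=echo.
dump_qemu_threads() {
    "${GDB:-gdb}" -p "$1" -batch -ex "thread apply all bt"
}
```

Typical use would be something like `dump_qemu_threads "$(pgrep -f qemu-kvm | head -n 1)" > qemu-backtrace.txt`, run as root or as the user that owns the process. The frames are only readable if matching debug symbols are installed.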

--- Additional comment from awilliam on 2011-10-24 15:06:34 EDT ---

Created attachment 529945 [details]
qemu backtrace

here's the backtrace from qemu while the hang is happening

--- Additional comment from alevy on 2011-10-25 06:31:55 EDT ---

Thanks Adam. The hang is on an io write (update area) from the VM, which is waiting on a read from the spice server, which in turn is waiting on a read that strangely appears in libpthread. This may be the missing async support in the X qxl driver. Marc-Andre, have you done a build with Gerd's patch that I sent you? If not, I can do one.

Alon

--- Additional comment from awilliam on 2012-01-23 21:04:11 EST ---

I'm still hitting this in current Rawhide. Any updates?



-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

--- Additional comment from crobinso on 2012-02-14 07:15:12 EST ---

There's an f15 bug with multiple stack traces attached:

https://bugzilla.redhat.com/show_bug.cgi?id=768404

Though at least my report in that bug was from an f16. Any word on this? I can reproduce fairly regularly if more info is needed.

--- Additional comment from awilliam on 2012-02-14 12:25:25 EST ---

I've heard nothing further.

You can use 'spicec' as a kind of workaround: use virt-manager to launch the VM, then run spicec and connect to localhost, port 5900. virt-manager bugs out so often it's kind of unusable at present.
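
The workaround above can be sketched as a one-line wrapper. This assumes the legacy `spicec` client is installed and that the VM's SPICE server listens on 127.0.0.1:5900 without a ticket, as in the setup described here; `connect_spice` and the `SPICEC` override variable are hypothetical names used only for illustration.

```sh
# Hypothetical wrapper (not from the original report): connect the legacy
# spicec client to a SPICE server.
# $1 is the host (default: localhost), $2 is the port (default: 5900).
# SPICEC is an assumed override so the command can be previewed with SPICEC=echo.
connect_spice() {
    "${SPICEC:-spicec}" -h "${1:-localhost}" -p "${2:-5900}"
}
```

Running `connect_spice` with no arguments after starting the VM from virt-manager should open the display in a separate window, bypassing the virt-manager console.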




--- Additional comment from alevy on 2012-02-14 12:56:04 EST ---

Adam,

 Sorry for the long delay.

 Can you please install spice-server debug symbols and reproduce the qemu backtrace?

 Can you update the qxl driver in the F16 VM (not sure it's in F15) to 0.0.21-13 and retest? That's when Marc-Andre added the async patches.

Thanks for the patience,
Alon

--- Additional comment from alevy on 2012-02-15 03:30:38 EST ---

Adam, I missed the "async = QXL_SYNC" in the qemu stack trace. Even without an updated stack trace, I'm sure this results from using a driver that is too old, or from qemu otherwise telling the guest that the device is too old. So:

 1. update the driver as noted in comment 15 to xorg-x11-drv-qxl-0.0.21-13.fc17 or newer
 2. please provide the qemu command line, although as long as there is no "revision=1" property for the qxl-vga device, it should be fine.
 3. tell me if it still reproduces. If so please have the spice-server debug symbols installed.
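
The check in step 2 can be scripted. This is a rough sketch, assuming the revision would appear on the command line either as a `-device qxl-vga,...,revision=N` property or as `-global qxl-vga.revision=N`; `has_qxl_revision` is a hypothetical helper name, not an existing tool.

```sh
# Hypothetical helper (not from the original report): succeed if a QEMU
# command line pins a revision on a qxl device.
# $1 is the full command line as a single string.
has_qxl_revision() {
    # Split on spaces so each option value is tested on its own, then look
    # for a token that names a qxl device and sets revision=<number>.
    printf '%s\n' "$1" | tr ' ' '\n' | grep -Eq 'qxl[^ ]*revision=[0-9]+'
}
```

For a running guest, the command line can be passed in with `has_qxl_revision "$(ps -o args= -p "$pid")"`, where `$pid` is the qemu-kvm process ID. If the function succeeds, the guest may be seeing an older device revision than the driver expects.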

Thanks,
Alon

--- Additional comment from awilliam on 2012-02-15 12:31:21 EST ---

well, 'too old' seems...well, it's one way of putting it. I've been experiencing this bug since F16. If there's a fix for the bug in the latest bleeding-edge qxl or whatever, you should probably backport it to F16, since that's our actual current release and all.




--- Additional comment from awilliam on 2012-02-16 17:44:28 EST ---

okay, I did hit this yesterday performing an install of F17 Alpha RC2, which has xorg-x11-drv-qxl-0.0.21-16.fc17. The qemu command line is:

qemu     28580 23.5  6.2 6717988 1033380 ?     Sl   14:40   0:56 /usr/bin/qemu-kvm -S -M pc-0.14 -enable-kvm -m 2048 -smp 1,sockets=1,cores=1,threads=1 -name Test_1 -uuid 1b76a7fb-b6a4-251e-f415-e2a5ff08404b -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/Test_1.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/media/Sea500/images/Fedora-17-Alpha-x86_64-DVD.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 -drive file=/media/Sea500/images/Test_1.img,if=none,id=drive-virtio-disk0,format=raw -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -netdev tap,fd=21,id=hostnet0,vhost=on,vhostfd=22 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:03:ad:59,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -device usb-tablet,id=input0 -spice port=5900,addr=127.0.0.1,disable-ticketing -vga qxl -global qxl-vga.vram_size=67108864 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7

I'll try and reproduce again and get a new trace.




--- Additional comment from alevy on 2012-02-17 02:04:27 EST ---

Hi Adam.

 Could you try http://people.freedesktop.org/~alon/qemu-1.0-7.fc18.src.rpm ?
 I think it should fix your hangs. I'll attach a binary rpm if I manage to build it; the build failed during tests for some reason, and I don't have permissions on the qemu package to do a scratch build.

Alon

Comment 2 Alon Levy 2012-04-22 15:36:59 UTC
The original bug was closed following a qxl driver (Linux) update; see https://bugzilla.redhat.com/show_bug.cgi?id=747464#c32. Closing this clone accordingly.

