Description of problem: Over the past many months, I have observed that session (haven't tested against system) libvirtd stops to send domain state change events. So far I didn't care to report this issue because the issue mysteriously disappears each time I try to investigate. I was just able to reproduce it again and before it vanishes again because of me investigating it, I thought I should file it here first. How reproducible: sometimes Steps to Reproduce: Don't know exactly but try the following a few times: 1. Run gnome-boxes as 'G_MESSAGES_DEBUG=Libvirt.GObject gnome-boxes'. 1. Run a VM in gnome-boxes and play/click around a bit on the desktop. 2. Save the VM (just quit Boxes). 3. Restore the VM (start Boxes again and select the VM). Actual results: There is no appropriate messages on the console from 'Libvirt.GObject' in the format: %s: %s event:%d, detail:%d (These messages are emitted from libvirt-glib's event handler function) on each event. Expected results: There is at least one message on the console from 'Libvirt.GObject' in the format: %s: %s event:%d, detail:%d Additional info: Don't know if these errors I see in the log are relevant or not but thats the only thing I see there: $ tail ~/.libvirt/libvirtd.log 2012-05-07 18:07:18.461+0000: 28350: warning : qemuMonitorJSONHandleGraphics:683 : missing auth scheme in graphics event 2012-05-07 18:07:18.518+0000: 28350: warning : qemuMonitorJSONHandleGraphics:683 : missing auth scheme in graphics event 2012-05-07 18:07:18.707+0000: 28350: warning : qemuMonitorJSONHandleGraphics:683 : missing auth scheme in graphics event 2012-05-07 18:07:27.602+0000: 28350: error : virNetSocketReadWire:991 : Cannot recv data: Connection reset by peer 2012-05-07 18:07:27.604+0000: 28350: warning : qemuMonitorJSONHandleGraphics:683 : missing auth scheme in graphics event 2012-05-07 18:07:27.604+0000: 28350: warning : qemuMonitorJSONHandleGraphics:683 : missing auth scheme in graphics event 2012-05-07 18:07:27.605+0000: 28350: warning : qemuMonitorJSONHandleGraphics:683 : missing auth scheme in graphics event 2012-05-07 18:07:27.605+0000: 28350: warning : qemuMonitorJSONHandleGraphics:683 : missing auth scheme in graphics event 2012-05-07 18:07:27.606+0000: 28350: warning : qemuMonitorJSONHandleGraphics:683 : missing auth scheme in graphics event 2012-05-07 18:07:27.606+0000: 28350: warning : qemuMonitorJSONHandleGraphics:683 : missing auth scheme in graphics event
I've noticed weird thing. gnome-boxes sometimes hangs, stop responding for more than 30 seconds. However, it's not waiting in libvirt API. But since it does not reply to server's keep alive requests, the connection is closed meanwhile and thus all events won't be delivered any longer. NB, the hang - there are two symptoms which make me think of that: 1) I cannot click/type into the guest, even the clock on guest's taskbar is not updated 2) the debug output stops as well Zeeshan, do you experience the same?
I've taken look into libvirt-glib sources and I think I see the problem. It's not updating timeouts properly: http://libvirt.org/git/?p=libvirt-glib.git;a=blob;f=libvirt-glib/libvirt-glib-event.c;h=94f4de8e306474dd66ae2cea45c10f19fb6fe218;hb=HEAD#l369 This should look like this: http://fpaste.org/XUYj/ because libvirt internal apis allow timeout=0; but libvirt-glib mapping into glib mainloop doesn't. I've tested locally and it fixes the problem. Therefore changing the component to libvirt-glib.
I've proposed patch upstream: https://www.redhat.com/archives/libvir-list/2012-May/msg01098.html
(In reply to comment #1) > I've noticed weird thing. gnome-boxes sometimes hangs, stop responding for > more than 30 seconds. However, it's not waiting in libvirt API. But since it > does not reply to server's keep alive requests, the connection is closed > meanwhile and thus all events won't be delivered any longer. > > NB, the hang - there are two symptoms which make me think of that: > 1) I cannot click/type into the guest, even the clock on guest's taskbar is > not updated > 2) the debug output stops as well > > Zeeshan, do you experience the same? Yes, I have noticed this lately but I'll need to try your patch for a while to be sure if this is related to original issue I reported. We don't need to keep the bug open after your patch goes in, I can re-open if I see the isee again. Thanks so much for looking into and fixing this!
So, from backtrace taken at the moment of stuck we can see gnome-boxes is calling libvirt API within an event loop. The problem is, libvirt-glib is mapping libvirt event loop into gmainloop. Therefore if an libvirt API called within gmainloop stuck, and client receive an event (e.g. keepalive request from the daemon) such event cannot be processed - it will be in the next iteration which, obviously, won't come until we jump from stuck API (e.g. after deamon kills off connection due to keepalive timeout). Backtrace: http://pastebin.test.redhat.com/90223 I've proposed patch: https://www.redhat.com/archives/libvir-list/2012-May/msg01157.html IOW, it's gnome boxes what's broken actually. But since (nearly) all glib applications are event driven - actual work is done within callbacks called from the gmainloop it's worth fixing libvirt-glib instead of all it's consumers.
> IOW, it's gnome boxes what's broken actually. I disagree. The libvirt keepalive code is broken. Boxes' API call is waiting in this function: #1 0x000000340a324fe2 in virNetClientIOEventLoop (client=client@entry=0x7f5a60afb010, thiscall=thiscall@entry=0x3748b20) at rpc/virnetclient.c:1352 which should be responsible for processing the keepalive packets.
This has been fixed by http://www.redhat.com/archives/libvir-list/2012-June/msg00445.html which will be part of the soon-to-be-released libvirt 0.9.13
Yeah, Christophe thanks for updating this. Moving to POST per Comment 10.
(In reply to comment #10) > This has been fixed by > http://www.redhat.com/archives/libvir-list/2012-June/msg00445.html which > will be part of the soon-to-be-released libvirt 0.9.13 That unfortunately is not true. Last week at GUADEC, me and Marc-Andre were able to reproduce this bug against git master of libvirt and libvirt-glib many times. Marc-Andre tried to trace the issue but then he couldn't reproduce it.
The keepalive issue mentioned in this bug has been fixed, there seems to be a much harder bug to reproduce still lurking, but this can be dealt with in a separate bug report.
(In reply to comment #13) > The keepalive issue mentioned in this bug has been fixed, there seems to be > a much harder bug to reproduce still lurking, but this can be dealt with in > a separate bug report. Well the original bug I filed was not about the 'keep alive' issue but rather libvirt events not received but yes, we can open another one for this new issue..
(In reply to comment #14) > Well the original bug I filed was not about the 'keep alive' issue but > rather libvirt events not received but yes, we can open another one for this > new issue.. Yes, please do.
libvirt-0.9.11.5-1.fc17 has been submitted as an update for Fedora 17. https://admin.fedoraproject.org/updates/libvirt-0.9.11.5-1.fc17
Package libvirt-0.9.11.5-1.fc17: * should fix your issue, * was pushed to the Fedora 17 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing libvirt-0.9.11.5-1.fc17' as soon as you are able to. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2012-11838/libvirt-0.9.11.5-1.fc17 then log in and leave karma (feedback).
libvirt-0.9.11.5-2.fc17 has been submitted as an update for Fedora 17. https://admin.fedoraproject.org/updates/libvirt-0.9.11.5-2.fc17
libvirt-0.9.11.5-3.fc17 has been submitted as an update for Fedora 17. https://admin.fedoraproject.org/updates/libvirt-0.9.11.5-3.fc17
libvirt-0.9.11.5-3.fc17 fixed the error "missing auth scheme in graphics event" for me in addition to the autostart of the guests: [root@host ~]# rpm -qa|grep -i libvirt libvirt-daemon-config-nwfilter-0.9.11.5-3.fc17.x86_64 libvirt-daemon-config-network-0.9.11.5-3.fc17.x86_64 libvirt-python-0.9.11.5-3.fc17.x86_64 libvirt-client-0.9.11.5-3.fc17.x86_64 libvirt-daemon-0.9.11.5-3.fc17.x86_64 libvirt-0.9.11.5-3.fc17.x86_64 Thanks.
libvirt-0.9.11.5-3.fc17 has been pushed to the Fedora 17 stable repository. If problems still persist, please make note of it in this bug report.