Bug 752718
| Summary: | virt-manager occasionally closes libvirt connection when a guest shuts down | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Boris Derzhavets <bderzhavets> |
| Component: | virt-manager | Assignee: | Cole Robinson <crobinso> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 16 | CC: | aaron.toponce, berrange, clalancette, crobinso, dongsu.park, dougsland, dpierce, eblake, frankly3d, hbrock, ipilcher, itamar, jforbes, jpopelka, kparal, laine, loganjerry, matrixs.zero, pruan, sgordon, twaugh, veillard, virt-maint |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | virt-manager-0.9.1-1.fc16 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2012-07-07 17:35:46 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Boris Derzhavets
2011-11-10 08:28:53 UTC
I've been subconsciously noticing this after the fact for a while as well, but on a Fedora 14 host with the latest virt-manager and libvirt from upstream git (every once in a while I'll notice that virt-manager is no longer connected, wonder about it for a second, then reconnect and go on with my work). I just ran some experiments now, and found that virt-manager never disconnected if I had more than a single guest running and shut one down, but it consistently disconnected when shutting down 3 of 4 different guests *if they were the only guest running at the time*. Here is the list of guests:

* Windows XP - causes disconnect
* RHEL5 - causes disconnect
* Fedora 15 - causes disconnect
* Fedora 16 - *no disconnect*

Of course there are other differences in the config. If anyone has ideas of things to try, or wants to see the config of the various guests, I'd be happy to try things / provide access to the machine / supply the configs. (Since this seems not to be an F16-specific bug, should we move it somewhere else / clone it to upstream?)

Just +1ing this for now; will need to try to reproduce more consistently though.

Rebuild based on libvirt-0.9.7-1.fc14.src.rpm:

```
. . . . . . .
Wrote: /home/boris/rpmbuild/SRPMS/libvirt-0.9.7-1.fc16.src.rpm
Wrote: /home/boris/rpmbuild/RPMS/x86_64/libvirt-0.9.7-1.fc16.x86_64.rpm
Wrote: /home/boris/rpmbuild/RPMS/x86_64/libvirt-client-0.9.7-1.fc16.x86_64.rpm
Wrote: /home/boris/rpmbuild/RPMS/x86_64/libvirt-devel-0.9.7-1.fc16.x86_64.rpm
Wrote: /home/boris/rpmbuild/RPMS/x86_64/libvirt-lock-sanlock-0.9.7-1.fc16.x86_64.rpm
Wrote: /home/boris/rpmbuild/RPMS/x86_64/libvirt-python-0.9.7-1.fc16.x86_64.rpm
Wrote: /home/boris/rpmbuild/RPMS/x86_64/libvirt-debuginfo-0.9.7-1.fc16.x86_64.rpm
Executing(%clean): /bin/sh -e /var/tmp/rpm-tmp.qHVgyQ
+ umask 022
+ cd /home/boris/rpmbuild/BUILD
+ cd libvirt-0.9.7
+ rm -fr /home/boris/rpmbuild/BUILDROOT/libvirt-0.9.7-1.fc16.x86_64
+ exit 0
[boris@fedora16 SPECS]$ cd ..
[boris@fedora16 rpmbuild]$ ls -l
total 24
drwxr-xr-x 3 boris boris 4096 Nov 12 21:59 BUILD
drwxr-xr-x 2 boris boris 4096 Nov 12 22:01 BUILDROOT
drwxr-xr-x 3 boris boris 4096 Nov 12 21:46 RPMS
drwxr-xr-x 2 boris boris 4096 Nov 12 21:35 SOURCES
drwxr-xr-x 2 boris boris 4096 Nov 12 21:37 SPECS
drwxr-xr-x 2 boris boris 4096 Nov 12 22:01 SRPMS
[boris@fedora16 rpmbuild]$ cd SRPMS
[boris@fedora16 SRPMS]$ ls -l
total 17416
-rw-rw-r-- 1 boris boris 17831963 Nov 12 22:01 libvirt-0.9.7-1.fc16.src.rpm
```

Works fine on F16. Thank you.

I see this very often. libvirt-0.9.6-2.fc16.x86_64

Upgrade libvirt up to 0.9.7-X. It fixes the issue on F16, Ubuntu 11.10, and 12.04 (daily builds).

This package has changed ownership in the Fedora Package Database. Reassigning to the new owner of this component.

*** Bug 758160 has been marked as a duplicate of this bug. ***

Can this fix be backported to F15? Same intermittent problem. Nothing in updates-testing or koji.

(In reply to comment #11)
> Can this fix be backported to F15?,
> same intermittent problem.
> Nothing in updates-testing\koji

Rebuild libvirt-0.9.7-1.fc14.src.rpm (downloaded from the Net) on F15 and install the generated RPMS. In other words, upgrade libvirt up to 0.9.7.1 on F15. It's a bug fix.

I've not seen libvirt-0.9.7-1.fc16(fc15).src.rpm

(In reply to comment #12)
> (In reply to comment #11)
> > Can this fix be backported to F15?,
> > same intermittent problem.
> > Nothing in updates-testing\koji
>
> Rebuild libvirt-0.9.7-1.fc14.src.rpm (download from Net) on F15
> and install generated RPMS. In other words, upgrade Libvirt up to 0.9.7.1
> on F15. It's a bug fix.
>
> I've not seen libvirt-0.9.7-1.fc16(fc15).src.rpm

While you can build 0.9.7 yourself on F16, the distro policy is that we will only backport specific commits into F16's 0.9.6 baseline (likewise for F15's 0.8.7 baseline), rather than rebasing. For this fix to be backported, we have to identify the specific commits that make the problem go away, and evaluate whether they are easy enough to cherry-pick in for the next time we build for F15 or F16.

That's fine. But how is the usual customer supposed to manage in the meantime?

Fedora is an open project, with no paid customer support. Help us identify which commits to backport, and the process will be that much faster. Or, on F16, use the fedora-virt-preview repo, which provides the latest libvirt release (currently 0.9.7, but soon to be 0.9.8) while still having the stability of the rest of F16 rather than the risk of a complete rawhide system. If you need paid customer support, consider using RHEL; then your paid support can prioritize the time of developers to specifically focus on this bug on your behalf.

I installed the three fc14 pkgs from http://libvirt.org/sources/ :

```
yum localinstall --nogpgcheck libvirt-python-0.9.7-1.fc14.x86_64.rpm libvirt-client-0.9.7-1.fc14.x86_64.rpm libvirt-0.9.7-1.fc14.x86_64.rpm
```

Didn't bother rebuilding as a test; F15 host. Will see if anything hits the proverbial fan. If it works, F15 people may be able to do likewise.

(In reply to comment #16)
> I installed the three fc14 pkgs from: http://libvirt.org/sources/
>
> yum localinstall --nogpgcheck libvirt-python-0.9.7-1.fc14.x86_64.rpm
> libvirt-client-0.9.7-1.fc14.x86_64.rpm libvirt-0.9.7-1.fc14.x86_64.rpm
>
> Didn't bother rebuild as a test, F15 host.
> Will see if anything hits the proverbial fan.
>
> If it work F15 ppl, may be able to do like wise.

Apologies for bw. Tested on 4 guests, mix of 14/15/16/rawhide. Simultaneously closed three of them, one by force (deliberate). No loss of connection on the 4th.
(In reply to comment #15)
> Fedora is an open project, with no paid customer support. Help us identify
> which commits to backport, and the process will be that much faster. Or, on
> F16, use the fedora-virt-preview repo, which provides the latest libvirt
> release (currently 0.9.7, but soon to be 0.9.8) while still having the
> stability of the rest of F16 rather than the risk of a complete rawhide system.
>
> If you need paid customer support, consider using RHEL; then, your paid support
> can prioritize the time of developers to specifically focus on this bug on your
> behalf.

The fedora-virt-preview repo won't work on F15; it may help only on F16. That's why I suggested rebuilding the fc14.src.rpm as the shortest way to a fix.

There were a bunch of fixes in the RPC code that I suspect we need to cherry-pick back to the Fedora libvirt. There shouldn't be too many conflicts here & they should get rid of this class of annoying bugs, without us rebasing the entire codebase.

I'm getting exactly the same error on Debian Squeeze, with libvirt-0.9.7-2. Does anyone know exactly which patches I need to cherry-pick from upstream? As there is a huge amount of RPC code changes between 0.9.7 and upstream, it's difficult for me to find out which one is actually the solution.

I'm working on a patch list now which Dan will double-check before I post it here. His rough description was "all patchsets that touch src/rpc or daemon/stream.[ch]", but that's a fairly large list, and many aren't applicable.

(In reply to comment #20)
> I'm getting exactly the same error on Debian Squeeze, with libvirt-0.9.7-2.
>
> Does anyone know exactly, which patches I need to cherry-pick from upstream?
> As there are huge amount of RPC code changes between 0.9.7 and upstream, it's
> difficult for me to find out which one is actually the solution.

Could you post a stack trace generated by virt-manager?
(In reply to comment #22)
> (In reply to comment #20)
> > I'm getting exactly the same error on Debian Squeeze, with libvirt-0.9.7-2.
> >
> > Does anyone know exactly, which patches I need to cherry-pick from upstream?
> > As there are huge amount of RPC code changes between 0.9.7 and upstream, it's
> > difficult for me to find out which one is actually the solution.
>
> Could you post a stack trace generated by virt-manager ?

Sorry, I cannot post it, because I'm not using virt-manager. I'm executing the migrate command in virsh:

```
virsh migrate --live --persistent domain_name qemu+ssh://username@target/system
```

Here is the debug output when I turn on the LIBVIRT_DEBUG option while running virsh: http://pastebin.com/FsZaXxut

There you can find the line:

```
error: Unable to read from monitor: Connection reset by peer
```

The main issue here isn't really a libvirt bug; it's a virt-manager problem. virt-manager will close a libvirt connection if it catches a SYSTEM_ERROR exception in certain cases. This is to prevent virt-manager from keeping a busted connection around if someone restarts libvirtd. The problem is that there is a kinda common case where a transient SYSTEM_ERROR is thrown: when a VM is being shut down and another thread calls virDomainGetInfo at the same time. The QEMU monitor has hung up, the info call raises a SYSTEM_ERROR, and the connection gets shut down. Now it would be nice if there were some way libvirt could avoid that error in the first place, but that's a more subtle problem. Regardless, virt-manager needs to handle this situation with a bit more smarts. Upstream now does this: http://git.fedorahosted.org/git?p=virt-manager.git;a=commit;h=5bf341052d1f8c1ee5499b6a59c4b237b071523a

Reassigning to virt-manager.

virt-manager-0.9.1-1.fc16 has been submitted as an update for Fedora 16.
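To make Cole's analysis concrete, here is a minimal Python sketch of the decision a client like virt-manager has to make: distinguish a transient SYSTEM_ERROR raised by a single dying domain from a genuinely dead libvirtd connection. This is an illustration, not the actual upstream commit; `FakeLibvirtError`, `should_close_connection`, and the numeric error-code value are invented for the sketch (real code would use `libvirt.VIR_ERR_SYSTEM_ERROR` and probe the live connection).

```python
# Hedged sketch, not the real virt-manager patch. The error-code constant is
# an assumed placeholder; in real code it would be libvirt.VIR_ERR_SYSTEM_ERROR.
VIR_ERR_SYSTEM_ERROR = 38  # illustrative value only


class FakeLibvirtError(Exception):
    """Stands in for libvirt.libvirtError, which carries an error code."""

    def __init__(self, code):
        super().__init__("Unable to read from monitor: Connection reset by peer")
        self._code = code

    def get_error_code(self):
        return self._code


def should_close_connection(err, conn_is_alive):
    """Close the connection only if the error is a SYSTEM_ERROR *and* the
    connection itself no longer responds. A SYSTEM_ERROR raised by a single
    domain call (e.g. virDomainGetInfo racing a guest shutdown) while the
    connection still answers is treated as transient and ignored."""
    if err.get_error_code() != VIR_ERR_SYSTEM_ERROR:
        return False
    # Probe the connection instead of assuming it is broken.
    return not conn_is_alive()


# Domain-level hiccup (monitor hangup during shutdown): keep the connection.
transient = FakeLibvirtError(VIR_ERR_SYSTEM_ERROR)
print(should_close_connection(transient, conn_is_alive=lambda: True))   # False

# libvirtd really went away: drop the connection.
print(should_close_connection(transient, conn_is_alive=lambda: False))  # True
```

The design point is simply that the error code alone is not enough evidence to tear down the connection; the fix has to add a second signal (here, a liveness probe) before giving up.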
https://admin.fedoraproject.org/updates/virt-manager-0.9.1-1.fc16

Package virt-manager-0.9.1-1.fc16:

* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.

Update it with:

```
# su -c 'yum update --enablerepo=updates-testing virt-manager-0.9.1-1.fc16'
```

as soon as you are able to. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2012-1206/virt-manager-0.9.1-1.fc16 then log in and leave karma (feedback).

virt-manager-0.9.1-1.fc16 has been pushed to the Fedora 16 stable repository. If problems still persist, please make note of it in this bug report.

Unable to complete install: 'Unable to read from monitor: Connection reset by peer'

```
Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 45, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/create.py", line 1911, in do_install
    guest.start_install(False, meter=meter)
  File "/usr/lib/python2.7/site-packages/virtinst/Guest.py", line 1239, in start_install
    noboot)
  File "/usr/lib/python2.7/site-packages/virtinst/Guest.py", line 1307, in _create_guest
    dom = self.conn.createLinux(start_xml or final_xml, 0)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2077, in createLinux
    if ret is None:raise libvirtError('virDomainCreateLinux() failed', conn=self)
libvirtError: Unable to read from monitor: Connection reset by peer
```

Peter, if you reopen a bug, please provide the offending package version.

I have met this problem too.
fedora 16
libvirt: libvirt-0.9.10-1.fc16.x86_64
virt-common: virt-manager-common-0.9.1-3.fc16.noarch
virt-manager: virt-manager-0.9.1-3.fc16.noarch

Unable to complete install: 'Unable to read from monitor: Connection reset by peer'

```
Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 45, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/create.py", line 1911, in do_install
    guest.start_install(False, meter=meter)
  File "/usr/lib/python2.7/site-packages/virtinst/Guest.py", line 1239, in start_install
    noboot)
  File "/usr/lib/python2.7/site-packages/virtinst/Guest.py", line 1307, in _create_guest
    dom = self.conn.createLinux(start_xml or final_xml, 0)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2413, in createLinux
    if ret is None:raise libvirtError('virDomainCreateLinux() failed', conn=self)
libvirtError: Unable to read from monitor: Connection reset by peer
```

That 'unable to complete install' issue is separate from this one. If people can reliably reproduce it, please open a new bug with the following info:

* virt-manager --debug output when reproducing
* /var/log/libvirt/qemu/<vmname>.log of the guest being created
* Fedora distro and virt-manager version number

Closing this bug, as the original issue should have been fixed for a while.

This is not a virt-manager bug. It's a libvirt one. I'm not using virt-manager at all to live-migrate my domains, and I see this regularly on Ubuntu. The bug should probably be reopened and assigned to libvirt.
Here is what I run:

```
virsh # migrate --live --copy-storage-all --persistent --verbose vm qemu+ssh://aaaron@kvm/system
aaron@kvm's password:
Migration: [ 10 %]error: Unable to read from monitor: Connection reset by peer
virsh # migrate --live --copy-storage-all --persistent --verbose vm qemu+ssh://atoponce@kvm04/system
aaron@kvm's password:
Migration: [  6 %]error: Unable to read from monitor: Connection reset by peer
```

```
$ dpkg -l | grep libvirt
ii  libvirt-bin            0.9.2-4ubuntu15.2  the programs for the libvirt library
ii  libvirt0               0.9.2-4ubuntu15.2  library for interfacing with different virtualization systems
ii  munin-libvirt-plugins  0.0.6-1            Munin plugins using libvirt
ii  python-libvirt         0.9.2-4ubuntu15.2  libvirt Python bindings
$ cat /etc/issue
Ubuntu 11.10 \n \l
```

Aaron,

This bug was originally filed against the Fedora 16 build of libvirt-0.9.6, and was later moved to the Fedora 16 build of virt-manager. According to your own information, you are running libvirt-0.9.2 on Ubuntu. Even if the bug described here were a libvirt bug (Cole's analysis indicates it was not), the fact that you are experiencing it has no bearing on whether or not this BZ should still be open - you are running an older version of libvirt on a different platform, not libvirt-0.9.6 on Fedora 16.

Aside from that, the error message being issued here is a very general message, and could be caused by any number of things. The original bug reported in this BZ had a partially similar symptom to the problem you are encountering, but according to Cole's analysis was caused by a bug in virt-manager. The fact that you are now encountering the same error message under different circumstances where virt-manager isn't involved doesn't mean that the bug reported here wasn't caused by virt-manager; it means that you have encountered a different bug that happens to have similar symptoms.
Bugs filed under Fedora are intended for problems encountered with Fedora-originated builds of packages for specific versions of Fedora. Since you are running Ubuntu, you should be filing bugs in Ubuntu's bug tracker at https://bugs.launchpad.net/ubuntu/ (unless you are running the latest upstream libvirt (currently at 0.9.13) which you have built yourself, and in that case you should file them here at bugzilla.redhat.com, under "Community Projects"-->"Virtualization" with the component set to libvirt).

I appreciate the response, but I'm only clarifying that the bug does in fact affect instances without virt-manager even installed. I'm aware of Launchpad. libvirt isn't an Ubuntu/Canonical product. It's a Red Hat one, thus the reason I posted here. I'd rather post upstream, and according to http://libvirt.org/bugs.html, that's here.

(In reply to comment #34)
> I appreciate the response, but I'm only clarifying that the bug does in fact
> affect instances without virt-manager even installed. I'm aware of
> Launchpad. Libvirt isn't an Ubuntu/Canonical product. It's a Red Hat one,
> thus the reason I posted here. I'd rather post upstream, and according to
> http://libvirt.org/bugs.html, that's here.

This is the correct bugzilla umbrella, but remember that there is more than one component managed by this bugzilla. You are replying to a bug associated with RHEL, but you are complaining about an issue observed with upstream libvirt. Therefore, you should open a new BZ against the "Virtualization" component, rather than continuing on about this BZ against the "RHEL" component.

(In reply to comment #35)
> This is the correct bugzilla umbrella, but remember that there is more than
> one component managed by this bugzilla. You are replying to a bug
> associated with RHEL, but you are complaining about an issue observed with
> upstream libvirt. Therefore, you should open a new BZ against the
> "Virtualization" component, rather than continuing on about this BZ against
> the "RHEL" component.

Correction - this BZ is tied to the "Fedora" component instead of the "RHEL" component (and yes, this bugzilla really does have RHEL, Fedora, and upstream all managed under one umbrella).