Bug 752718 - virt-manager occasionally closes libvirt connection when a guest shuts down
Summary: virt-manager occasionally closes libvirt connection when a guest shuts down
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: virt-manager
Version: 16
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Cole Robinson
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 758160
Depends On:
Blocks:
Reported: 2011-11-10 08:28 UTC by Boris Derzhavets
Modified: 2012-08-08 16:04 UTC

Fixed In Version: virt-manager-0.9.1-1.fc16
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-07-07 17:35:46 UTC
Type: ---
Embargoed:


Description Boris Derzhavets 2011-11-10 08:28:53 UTC
Description of problem:

A Spice session initiated via virt-manager is connected via spicy.
After the domain shuts down:

 Error polling connection 'qemu:///system': Unable to read from monitor:
Connection reset by peer

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/engine.py", line 440, in _tick
    conn.tick()
  File "/usr/share/virt-manager/virtManager/connection.py", line 1507, in tick
    vm.tick(now)
  File "/usr/share/virt-manager/virtManager/domain.py", line 1531, in tick
    info = self._backend.info()
  File "/usr/lib/python2.7/dist-packages/libvirt.py", line 1411, in info
    if ret is None: raise libvirtError ('virDomainGetInfo() failed', dom=self)
libvirtError: Unable to read from monitor: Connection reset by peer

Just one guest is running. Restarting the libvirtd daemon allows 2-3 (at most)
successful shutdowns.
Then:
 Error polling connection 'qemu:///system': Unable to read from monitor:
Connection reset by peer

Version-Release number of selected component (if applicable):

Fedora 16


How reproducible:

Shut down an F16 KVM guest several (3-4) times.

Steps to Reproduce:
1.
2.
3.
  
Actual results:

Stack trace above

Expected results:

The guest's status in virt-manager changes to shut down.


Additional info:

Comment 1 Laine Stump 2011-11-10 16:52:25 UTC
I've been subconsciously noticing this after the fact for a while as well, but on a Fedora 14 host with the latest virt-manager and libvirt from upstream git (every once in a while I'll notice that virt-manager is no longer connected, wonder about it for a second, then reconnect and go on with my work).

I just ran some experiments now, and found that virt-manager never disconnected if I had more than a single guest running and shut one down, but it consistently disconnected when shutting down 3 of 4 different guests *if they were the only guest running at the time*. Here is the list of guests:

 * Windows XP - causes disconnect
 * RHEL5 - causes disconnect
 * Fedora 15 - causes disconnect
 * Fedora 16 - *no disconnect*

Of course there are other differences in the config.

If anyone has ideas of things to try, or wants to see the config of the various guests, I'd be happy to try things / provide access to the machine / supply the configs.

(Since this seems to not be a F16-specific bug, should we move it somewhere else / clone it to upstream?)

Comment 2 Stephen Gordon 2011-11-11 14:49:22 UTC
Just +1ing this for now, will need to try to reproduce more consistently though.

Comment 3 Boris Derzhavets 2011-11-12 18:08:00 UTC
Rebuild based on libvirt-0.9.7-1.fc14.src.rpm
.  .  .  .  .  .  .

Wrote: /home/boris/rpmbuild/SRPMS/libvirt-0.9.7-1.fc16.src.rpm
Wrote: /home/boris/rpmbuild/RPMS/x86_64/libvirt-0.9.7-1.fc16.x86_64.rpm
Wrote: /home/boris/rpmbuild/RPMS/x86_64/libvirt-client-0.9.7-1.fc16.x86_64.rpm
Wrote: /home/boris/rpmbuild/RPMS/x86_64/libvirt-devel-0.9.7-1.fc16.x86_64.rpm
Wrote: /home/boris/rpmbuild/RPMS/x86_64/libvirt-lock-sanlock-0.9.7-1.fc16.x86_64.rpm
Wrote: /home/boris/rpmbuild/RPMS/x86_64/libvirt-python-0.9.7-1.fc16.x86_64.rpm
Wrote: /home/boris/rpmbuild/RPMS/x86_64/libvirt-debuginfo-0.9.7-1.fc16.x86_64.rpm
Executing(%clean): /bin/sh -e /var/tmp/rpm-tmp.qHVgyQ
+ umask 022
+ cd /home/boris/rpmbuild/BUILD
+ cd libvirt-0.9.7
+ rm -fr /home/boris/rpmbuild/BUILDROOT/libvirt-0.9.7-1.fc16.x86_64
+ exit 0
[boris@fedora16 SPECS]$ cd ..
[boris@fedora16 rpmbuild]$ ls -l
total 24
drwxr-xr-x 3 boris boris 4096 Nov 12 21:59 BUILD
drwxr-xr-x 2 boris boris 4096 Nov 12 22:01 BUILDROOT
drwxr-xr-x 3 boris boris 4096 Nov 12 21:46 RPMS
drwxr-xr-x 2 boris boris 4096 Nov 12 21:35 SOURCES
drwxr-xr-x 2 boris boris 4096 Nov 12 21:37 SPECS
drwxr-xr-x 2 boris boris 4096 Nov 12 22:01 SRPMS
[boris@fedora16 rpmbuild]$ cd SRPMS
[boris@fedora16 SRPMS]$ ls -l
total 17416
-rw-rw-r-- 1 boris boris 17831963 Nov 12 22:01 libvirt-0.9.7-1.fc16.src.rpm

Works fine on F16.
Thank you.

Comment 4 Kamil Páral 2011-11-25 13:58:56 UTC
I see this very often.

libvirt-0.9.6-2.fc16.x86_64

Comment 5 Boris Derzhavets 2011-11-25 14:46:10 UTC
Upgrade libvirt to 0.9.7-X.
It fixes the issue on F16 and on Ubuntu 11.10 and 12.04 (daily builds).

Comment 6 Fedora Admin XMLRPC Client 2011-11-30 19:33:14 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 7 Fedora Admin XMLRPC Client 2011-11-30 19:36:50 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 8 Fedora Admin XMLRPC Client 2011-11-30 19:44:16 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 9 Fedora Admin XMLRPC Client 2011-11-30 19:54:48 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 10 Tim Waugh 2011-12-01 16:38:14 UTC
*** Bug 758160 has been marked as a duplicate of this bug. ***

Comment 11 Frank Murphy 2011-12-01 16:42:17 UTC
Can this fix be backported to F15?
I see the same intermittent problem.
There is nothing in updates-testing\koji.

Comment 12 Boris Derzhavets 2011-12-01 17:03:54 UTC
(In reply to comment #11)
> Can this fix be backported to F15?,
> same intermittent problem.
> Nothing in updates-testing\koji

Rebuild libvirt-0.9.7-1.fc14.src.rpm (downloaded from the net) on F15
and install the generated RPMs. In other words, upgrade libvirt to 0.9.7-1
on F15. It's a bug fix. I've not seen libvirt-0.9.7-1.fc16(fc15).src.rpm.

Comment 13 Eric Blake 2011-12-01 18:18:15 UTC
(In reply to comment #12)
> (In reply to comment #11)
> > Can this fix be backported to F15?,
> > same intermittent problem.
> > Nothing in updates-testing\koji
> 
> Rebuild libvirt-0.9.7-1.fc14.src.rpm (download from Net) on F15
> and install generated RPMS. In other words, upgrade Libvirt up to 0.9.7.1
> on F15. It's a bug fix. I've not seen libvirt-0.9.7-1.fc16(fc15).src.rpm

While you can build 0.9.7 yourself on F16, the distro policy is that we will only backport specific commits into F16's 0.9.6 baseline (likewise for F15's 0.8.7 baseline), rather than rebasing.  For this fix to be backported, we have to identify the specific commits that make the problem go away, and evaluate whether they are easy enough to cherry-pick in for the next time we build for F15 or F16.

Comment 14 Boris Derzhavets 2011-12-01 18:42:44 UTC
That's fine.  But how is a usual customer supposed to manage in the meantime?

Comment 15 Eric Blake 2011-12-01 18:49:21 UTC
Fedora is an open project, with no paid customer support.  Help us identify which commits to backport, and the process will be that much faster.  Or, on F16, use the fedora-virt-preview repo, which provides the latest libvirt release (currently 0.9.7, but soon to be 0.9.8) while still having the stability of the rest of F16 rather than the risk of a complete rawhide system.

If you need paid customer support, consider using RHEL; then, your paid support can prioritize the time of developers to specifically focus on this bug on your behalf.

Comment 16 Frank Murphy 2011-12-01 18:52:04 UTC
I installed the three fc14 pkgs from: http://libvirt.org/sources/

yum localinstall --nogpgcheck  libvirt-python-0.9.7-1.fc14.x86_64.rpm 
 libvirt-client-0.9.7-1.fc14.x86_64.rpm  libvirt-0.9.7-1.fc14.x86_64.rpm

I didn't bother rebuilding, since this is just a test on an F15 host.
Will see if anything hits the proverbial fan.

If it works, F15 people may be able to do likewise.

Comment 17 Frank Murphy 2011-12-01 19:29:33 UTC
(In reply to comment #16)
> I installed the three fc14 pkgs from: http://libvirt.org/sources/
> 
> yum localinstall --nogpgcheck  libvirt-python-0.9.7-1.fc14.x86_64.rpm 
>  libvirt-client-0.9.7-1.fc14.x86_64.rpm  libvirt-0.9.7-1.fc14.x86_64.rpm
> 
> Didn't bother rebuild as a test, F15 host.
> Will see if anything hits the proverbial fan.
> 
> If it work F15 ppl, may be able to do like wise.

Apologies for bw.

Tested on 4 guests, a mix of F14, F15, F16 and rawhide.
Simultaneously closed three of them, one by force (deliberately).
No loss of connection on the 4th.

Comment 18 Boris Derzhavets 2011-12-01 19:37:08 UTC
(In reply to comment #15)
> Fedora is an open project, with no paid customer support.  Help us identify
> which commits to backport, and the process will be that much faster.  Or, on
> F16, use the fedora-virt-preview repo, which provides the latest libvirt
> release (currently 0.9.7, but soon to be 0.9.8) while still having the
> stability of the rest of F16 rather than the risk of a complete rawhide system.
> 
> If you need paid customer support, consider using RHEL; then, your paid support
> can prioritize the time of developers to specifically focus on this bug on your
> behalf.

The fedora-virt-preview repo won't work on F15; it can help only on F16.
That's why I suggested rebuilding fc14.src.rpm as the shortest way to a fix.

Comment 19 Daniel Berrangé 2011-12-01 20:03:23 UTC
There were a bunch of fixes in the RPC code that I suspect we need to cherry-pick back to the Fedora libvirt. There shouldn't be too many conflicts here, and they should get rid of this class of annoying bugs without us rebasing the entire codebase.

Comment 20 dongsu.park 2011-12-05 17:46:02 UTC
I'm getting exactly the same error on Debian Squeeze, with libvirt-0.9.7-2.

Does anyone know exactly, which patches I need to cherry-pick from upstream?
As there is a huge number of RPC code changes between 0.9.7 and upstream, it's difficult for me to find out which one is actually the solution.

Comment 21 Laine Stump 2011-12-05 18:40:31 UTC
I'm working on a patch list now, which Dan will double-check before I post it here. His rough description was "all patchsets that touch src/rpc or daemon/stream.[ch]", but that's a fairly large list, and many aren't applicable.
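
For anyone who wants to build a first-pass candidate list themselves, the rough description above can be turned into a `git log` query. The following is only a sketch in Python 3: it assumes the RPC sources live under `src/rpc` in a libvirt checkout and that releases are tagged `v0.9.6` and `v0.9.7`. The helper names are made up for illustration. The output is a starting point for review, not the vetted backport list.

```python
import subprocess


def candidate_commits_cmd(old_ref, new_ref, paths):
    """Build a `git log` command for commits in old_ref..new_ref touching paths."""
    return ["git", "log", "--reverse", "--format=%h %s",
            f"{old_ref}..{new_ref}", "--", *paths]


def list_candidate_commits(repo_dir, old_ref, new_ref, paths):
    """Run the command inside a checkout; return one line per commit, oldest first."""
    result = subprocess.run(
        candidate_commits_cmd(old_ref, new_ref, paths),
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return result.stdout.splitlines()


# Paths taken from the rough description; the src/rpc location is an assumption.
RPC_PATHS = ["src/rpc", "daemon/stream.c", "daemon/stream.h"]
```

Calling `list_candidate_commits("/path/to/libvirt", "v0.9.6", "v0.9.7", RPC_PATHS)` would print nothing by itself; it returns the abbreviated hash and subject of each matching commit, which still have to be checked one by one for applicability.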

Comment 22 Boris Derzhavets 2011-12-06 07:50:28 UTC
(In reply to comment #20)
> I'm getting exactly the same error on Debian Squeeze, with libvirt-0.9.7-2.
> 
> Does anyone know exactly, which patches I need to cherry-pick from upstream?
> As there are huge amount of RPC code changes between 0.9.7 and upstream, it's
> difficult for me to find out which one is actually the solution.

Could you post a stack trace generated by virt-manager ?

Comment 23 dongsu.park 2011-12-06 09:38:02 UTC
(In reply to comment #22)
> (In reply to comment #20)
> > I'm getting exactly the same error on Debian Squeeze, with libvirt-0.9.7-2.
> > 
> > Does anyone know exactly, which patches I need to cherry-pick from upstream?
> > As there are huge amount of RPC code changes between 0.9.7 and upstream, it's
> > difficult for me to find out which one is actually the solution.
> 
> Could you post a stack trace generated by virt-manager ?

Sorry, I cannot post one, because I'm not using virt-manager.
I'm running the migrate command in virsh:
"virsh migrate --live --persistent domain_name qemu+ssh://username@target/system"

Here is the debug output with the LIBVIRT_DEBUG option turned on for virsh.

http://pastebin.com/FsZaXxut

There you can find the line:
"error: Unable to read from monitor: Connection reset by peer"

Comment 24 Cole Robinson 2012-01-29 16:39:48 UTC
The main issue here isn't really a libvirt bug, it's a virt-manager problem.

virt-manager will close a libvirt connection if it catches a SYSTEM_ERROR exception in certain cases. This is to prevent virt-manager from keeping a busted connection around if someone restarts libvirtd.

Problem is that there is a kinda common case where a transient SYSTEM_ERROR is thrown: when a VM is being shut down and another thread calls virDomainGetInfo at the same time. QEMU monitor is hung up, info call raises a SYSTEM_ERROR, connection gets shut down.

Now it would be nice if there was some way libvirt could avoid that error in the first place, but that's a more subtle problem.

Regardless, virt-manager needs to handle this situation with a bit more smarts. Upstream now does this:

http://git.fedorahosted.org/git?p=virt-manager.git;a=commit;h=5bf341052d1f8c1ee5499b6a59c4b237b071523a

Reassigning to virt-manager
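
To illustrate the idea described above, here is a minimal sketch in Python 3 of one way a polling loop could tell a transient SYSTEM_ERROR from a genuinely dead connection: when a per-domain poll fails, make one cheap connection-level call and only give up if that fails too. This is not the code from the linked commit. `StandInLibvirtError`, `SYSTEM_ERROR`, `poll_domains`, `probe_connection` and `FakeDomain` are all hypothetical names, used so the sketch runs without libvirt installed.

```python
class StandInLibvirtError(Exception):
    """Hypothetical stand-in for libvirt.libvirtError."""

    def __init__(self, message, code):
        super().__init__(message)
        self.code = code


# Placeholder for libvirt.VIR_ERR_SYSTEM_ERROR (the real constant is an integer).
SYSTEM_ERROR = "system-error"


def poll_domains(domains, probe_connection):
    """Tick every domain and return the names whose poll failed transiently.

    A system error from one domain is tolerated only if probe_connection(),
    a cheap connection-level call, still succeeds. If the probe raises too,
    the exception propagates so the caller can close the connection.
    """
    transient = []
    for dom in domains:
        try:
            dom.tick()
        except StandInLibvirtError as err:
            if err.code != SYSTEM_ERROR:
                raise  # unrelated failure: do not mask it
            probe_connection()  # raises if the daemon really went away
            transient.append(dom.name)
    return transient


class FakeDomain:
    """Hypothetical test double: tick() fails while the guest shuts down."""

    def __init__(self, name, shutting_down=False):
        self.name = name
        self.shutting_down = shutting_down

    def tick(self):
        if self.shutting_down:
            raise StandInLibvirtError(
                "Unable to read from monitor: Connection reset by peer",
                SYSTEM_ERROR,
            )
```

With a healthy probe, a guest that is mid-shutdown is simply reported as skipped for this cycle; with a failing probe, the error reaches the caller, which keeps the original protection against a restarted libvirtd.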

Comment 25 Fedora Update System 2012-02-01 17:05:52 UTC
virt-manager-0.9.1-1.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/virt-manager-0.9.1-1.fc16

Comment 26 Fedora Update System 2012-02-02 17:30:39 UTC
Package virt-manager-0.9.1-1.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing virt-manager-0.9.1-1.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-1206/virt-manager-0.9.1-1.fc16
then log in and leave karma (feedback).

Comment 27 Fedora Update System 2012-02-05 21:50:34 UTC
virt-manager-0.9.1-1.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 28 Peter Ruan 2012-06-21 18:18:49 UTC
Unable to complete install: 'Unable to read from monitor: Connection reset by peer'

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 45, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/create.py", line 1911, in do_install
    guest.start_install(False, meter=meter)
  File "/usr/lib/python2.7/site-packages/virtinst/Guest.py", line 1239, in start_install
    noboot)
  File "/usr/lib/python2.7/site-packages/virtinst/Guest.py", line 1307, in _create_guest
    dom = self.conn.createLinux(start_xml or final_xml, 0)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2077, in createLinux
    if ret is None:raise libvirtError('virDomainCreateLinux() failed', conn=self)
libvirtError: Unable to read from monitor: Connection reset by peer

Comment 29 Kamil Páral 2012-06-22 13:43:24 UTC
Peter, if you reopen a bug, please provide the offending package version.

Comment 30 Lei Li 2012-07-06 06:24:33 UTC
I have met this problem too.

fedora 16
libvirt: libvirt-0.9.10-1.fc16.x86_64
virt-common: virt-manager-common-0.9.1-3.fc16.noarch
virt-manager: virt-manager-0.9.1-3.fc16.noarch

Unable to complete install: "Unable to complete install: 'Unable to read from monitor: Connection reset by peer'"


Unable to complete install: 'Unable to read from monitor: Connection reset by peer'

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 45, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/create.py", line 1911, in do_install
    guest.start_install(False, meter=meter)
  File "/usr/lib/python2.7/site-packages/virtinst/Guest.py", line 1239, in start_install
    noboot)
  File "/usr/lib/python2.7/site-packages/virtinst/Guest.py", line 1307, in _create_guest
    dom = self.conn.createLinux(start_xml or final_xml, 0)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2413, in createLinux
    if ret is None:raise libvirtError('virDomainCreateLinux() failed', conn=self)
libvirtError: Unable to read from monitor: Connection reset by peer

Comment 31 Cole Robinson 2012-07-07 17:35:46 UTC
That 'unable to complete install' issue is separate from this. If people can reliably reproduce, please open a new bug with the following info:

virt-manager --debug when reproducing
/var/log/libvirt/qemu/<vmname>.log of the guest being created
fedora distro and virt-manager version number

Closing this bug as the original issue should have been fixed for a while

Comment 32 Aaron Toponce 2012-07-20 15:31:58 UTC
This is not a virt-manager bug. It's a libvirt one. I'm not using virt-manager at all to live-migrate my domains, and I see this regularly on Ubuntu. The bug should probably be reopened, and assigned to libvirt. Here is what I run:

virsh # migrate --live --copy-storage-all --persistent --verbose vm qemu+ssh://aaaron@kvm/system
aaron@kvm's password:
Migration: [ 10 %]error: Unable to read from monitor: Connection reset by peer
virsh # migrate --live --copy-storage-all --persistent --verbose vm qemu+ssh://atoponce@kvm04/system
aaron@kvm's password: 
Migration: [  6 %]error: Unable to read from monitor: Connection reset by peer

$ dpkg -l | grep libvirt
ii  libvirt-bin                      0.9.2-4ubuntu15.2                            the programs for the libvirt library
ii  libvirt0                         0.9.2-4ubuntu15.2                            library for interfacing with different virtualization systems
ii  munin-libvirt-plugins            0.0.6-1                                      Munin plugins using libvirt
ii  python-libvirt                   0.9.2-4ubuntu15.2                            libvirt Python bindings

$ cat /etc/issue
Ubuntu 11.10 \n \l

Comment 33 Laine Stump 2012-07-20 21:09:44 UTC
Aaron,

This bug was originally filed against the Fedora 16 build of libvirt-0.9.6, and was later moved to the Fedora 16 build of virt-manager. According to your own information, you are running libvirt-0.9.2 on Ubuntu. Even if the bug described here were a libvirt bug (Cole's analysis indicates it was not), the fact that you are experiencing it has no bearing on whether or not this BZ should be reopened - you are running an older version of libvirt on a different platform, not libvirt-0.9.6 on Fedora 16.

Aside from that, the error message being issued here is a very general message, and could be caused by any number of things. The original bug reported in this BZ had a partially similar symptom to the problem you are encountering, but according to Cole's analysis was caused by a bug in virt-manager. The fact that you are now encountering the same error message under different circumstances where virt-manager isn't involved doesn't mean that the bug reported here wasn't caused by virt-manager; it means that you have encountered a different bug that happens to have similar symptoms.

Bugs filed under Fedora are intended for problems encountered with Fedora-originated builds of packages for specific versions of Fedora.

Since you are running Ubuntu, you should be filing bugs in Ubuntu's bug tracker at https://bugs.launchpad.net/ubuntu/ (unless you are running the latest upstream libvirt (currently at 0.9.13) which you have built yourself, and in that case, you should file them here at bugzilla.redhat.com, under "Community Projects"-->"Virtualization" with component set to libvirt).

Comment 34 Aaron Toponce 2012-08-08 15:04:12 UTC
I appreciate the response, but I'm only clarifying that the bug does in fact affect instances without virt-manager even installed. I'm aware of Launchpad. Libvirt isn't an Ubuntu/Canonical product. It's a Red Hat one, thus the reason I posted here. I'd rather post upstream, and according to http://libvirt.org/bugs.html, that's here.

Comment 35 Eric Blake 2012-08-08 16:02:41 UTC
(In reply to comment #34)
> I appreciate the response, but I'm only clarifying that the bug does in fact
> affect instances without virt-manager even installed. I'm aware of
> Launchpad. Libvirt isn't an Ubuntu/Canonical product. It's a Red Hat one,
> thus the reason I posted here. I'd rather post upstream, and according to
> http://libvirt.org/bugs.html, that's here.

This is the correct bugzilla umbrella, but remember that there is more than one component managed by this bugzilla.  You are replying to a bug associated with RHEL, but you are complaining about an issue observed with upstream libvirt.  Therefore, you should open a new BZ against the "Virtualization" component, rather than continuing on about this BZ against the "RHEL" component.

Comment 36 Eric Blake 2012-08-08 16:04:27 UTC
(In reply to comment #35)
> (In reply to comment #34)
> > I appreciate the response, but I'm only clarifying that the bug does in fact
> > affect instances without virt-manager even installed. I'm aware of
> > Launchpad. Libvirt isn't an Ubuntu/Canonical product. It's a Red Hat one,
> > thus the reason I posted here. I'd rather post upstream, and according to
> > http://libvirt.org/bugs.html, that's here.
> 
> This is the correct bugzilla umbrella, but remember that there is more than
> one component managed by this bugzilla.  You are replying to a bug
> associated with RHEL, but you are complaining about an issue observed with
> upstream libvirt.  Therefore, you should open a new BZ against the
> "Virtualization" component, rather than continuing on about this BZ against
> the "RHEL" component.

Correction - this BZ is tied to the "Fedora" component instead of "RHEL" component (and yes, this bugzilla really does have RHEL, Fedora, and upstream all managed under one umbrella)

