Bug 892007 - virt-manager error message for a guest that cannot resume is ambiguous and unhelpful
Summary: virt-manager error message for a guest that cannot resume is ambiguous and unhelpful
Keywords:
Status: CLOSED DUPLICATE of bug 765733
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: virt-manager
Version: 6.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Eric Blake
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-01-04 20:09 UTC by John Poelstra
Modified: 2013-07-12 19:21 UTC
CC List: 10 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-12 19:21:04 UTC
Target Upstream Version:
Embargoed:



Description John Poelstra 2013-01-04 20:09:38 UTC
Description of problem:

virt-manager error message for a guest that cannot resume is confusing and unhelpful

I'm not sure of the status of this problem in RHEL 6.4, but I continue to get a consistent stream of traffic to a blog post I wrote about it:
http://johnpoelstra.com/resuming-corrupted-suspended-guests

Feel free to close this as NOTABUG if you believe it has been addressed. I just wanted to raise it in case it helps make our product better.

----

Error restoring domain: Unable to read from monitor: Connection reset by peer

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 44, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 65, in tmpcb
    callback(*args, **kwargs)
  File "/usr/share/virt-manager/virtManager/domain.py", line 1050, in startup
    self._backend.create()
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 510, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirtError: Unable to read from monitor: Connection reset by peer

Comment 2 Dave Allan 2013-01-04 20:32:46 UTC
It would be great if we could find out from qemu that it was refusing to start because it couldn't restore from the file; perhaps that's possible.

Comment 4 Martin Kletzander 2013-01-04 20:47:27 UTC
That would help with the error reporting, of course, but the creation of the machine shouldn't fail at all.

However, virt-manager could realize there is a managed save for the machine and offer resetting it to the default shutoff state.  I'll take it as a non-critical RFE.

In the meantime, I'll try to reproduce the issue, as it seems very common, and will see what we can do about the failing start of suspended domains, especially since the libvirt-guests service handles them properly.
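
For reference, the fallback described here can be approximated from the shell today.  A minimal sketch, assuming a guest named "rhel" (a placeholder) and using only standard virsh commands; virt-manager would do the equivalent through the libvirt API:

if ! virsh start rhel; then
    # A failed resume often means a corrupt managed save image:
    # drop it and boot the guest from the clean shutoff state.
    virsh managedsave-remove rhel
    virsh start rhel
fi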

Comment 6 CongDong 2013-04-07 06:01:11 UTC
Can you tell me how to reproduce this bug?

Comment 7 CongDong 2013-04-07 06:28:06 UTC
The steps are not clear; can you tell me how to reproduce this bug?

Comment 8 John Poelstra 2013-04-08 00:20:28 UTC
(In reply to comment #7)
> The steps are not clear; can you tell me how to reproduce this bug?

I don't know exactly how to reproduce it.  All I know is that somehow the suspend data for a guest got corrupted during suspend or thereafter; I couldn't resume it and ended up with the confusing error message noted above.

Comment 9 John Poelstra 2013-04-10 18:45:55 UTC
One user reports http://johnpoelstra.com/resuming-corrupted-suspended-guests/#comment-2618 

"This same message occurs if one overallocates memory to the VM. I got this message and it took a while before I realized that the issue was not a suspended machine, but overallocation of memory!!"

Comment 10 Dave Allan 2013-04-10 18:54:13 UTC
Eric, you're looking into the libvirt and qemu changes that might be required to produce a more helpful message, so I'm handing this BZ over to you as well.

Comment 11 Martin Kletzander 2013-04-11 07:51:13 UTC
Reproduction should be easy.  Try these steps:

1. Create a simple machine (any startable machine will do)
2. run 'virsh managedsave <machinename>'
3. edit '/var/lib/libvirt/qemu/save/<machinename>.save' in a manner that will cause the startup to fail (I tried changing the XML so that qemu has to allocate an impossible amount of memory; 32768 GiB was enough)
4. try starting the machine

Let me know if that works for you and reproduces the error.  The error message might differ from 'Unable to read from monitor: Connection reset by peer'; it depends on how you edit the file.  I just described the easiest way.
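
Spelled out as commands, a sketch of those steps ("rhel" is a placeholder guest name; 'virsh save-image-edit', where available, opens the XML embedded in the save file in $EDITOR, otherwise edit the file by hand as described above):

virsh start rhel                  # 1. any startable guest will do
virsh managedsave rhel            # 2. suspend to a managed save file
# 3. break the embedded XML, e.g. raise <memory> to an impossible value
virsh save-image-edit /var/lib/libvirt/qemu/save/rhel.save
virsh start rhel                  # 4. should now fail with the vague error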

Comment 12 CongDong 2013-04-11 12:26:34 UTC
I changed '/var/lib/libvirt/qemu/save/rhel.save', but the error message I got is not unhelpful.

[root@localhost ~]# uname -a
Linux localhost.localdomain 2.6.32-369.el6.x86_64 #1 SMP Thu Mar 28 21:49:55 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux
[root@localhost ~]# rpm -qa qemu-kvm  virt-manager libvirt
virt-manager-0.9.0-18.el6.x86_64
libvirt-0.10.2-18.el6.x86_64
qemu-kvm-0.12.1.2-2.360.el6.x86_64

1. Create a simple machine (my guest's name is rhel)
2. #virsh managedsave rhel
3. I edited the domain's name from 'rhel' to 'rhel.' in the rhel.save file and saved the file
4. After that I couldn't start the VM; the error message tells me about the domain name mismatch.

---------------------------------------

Error restoring domain: operation failed: cannot restore domain 'rhel' uuid 27ace8b6-0afc-35a6-c73a-02f428f2aa08 from a file which belongs to domain 'rhel.' uuid 27ace8b6-0afc-35a6-c73a-02f428f2aa08

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 44, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 65, in tmpcb
    callback(*args, **kwargs)
  File "/usr/share/virt-manager/virtManager/domain.py", line 1063, in startup
    self._backend.create()
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 678, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirtError: operation failed: cannot restore domain 'rhel' uuid 27ace8b6-0afc-35a6-c73a-02f428f2aa08 from a file which belongs to domain 'rhel.' uuid 27ace8b6-0afc-35a6-c73a-02f428f2aa08

---------------------------------------

Comment 13 John Poelstra 2013-04-11 14:54:00 UTC
Sorry, I can't give a reproducer... I don't have a RHEL 6 machine any more.

Comment 14 Dave Allan 2013-04-11 15:28:47 UTC
I think it's actually a general problem: we need better information transfer between qemu and libvirt to indicate the cause of startup failures.  That's going to require quite a bit of coordination, though.

Comment 15 Sean Flanigan 2013-05-28 00:48:15 UTC
I think this is how we triggered it using Virtual Machine Manager:

1. Shut down/save the guest
2. Remove the virtual disk from the guest
3. Add a new virtual disk to the guest
4. Try to resume the guest

This led to the error message:

Error restoring domain: Unable to read from monitor: Connection reset by peer

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 44, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 65, in tmpcb
    callback(*args, **kwargs)
  File "/usr/share/virt-manager/virtManager/domain.py", line 1063, in startup
    self._backend.create()
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 619, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirtError: Unable to read from monitor: Connection reset by peer
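
The same trigger can be approximated from the command line; a sketch, with the guest name, disk target, and image path all as placeholders:

virsh managedsave rhel                  # save the running guest
virsh detach-disk rhel vda --config     # remove its virtual disk
virsh attach-disk rhel /var/lib/libvirt/images/new.img vda --config
virsh start rhel    # the resume fails: saved state no longer matches config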

Comment 16 Eric Blake 2013-07-12 19:21:04 UTC

*** This bug has been marked as a duplicate of bug 765733 ***

