Bug 892745

Summary: Boxes fails to restore state saved by older qemu versions
Product: [Fedora] Fedora
Reporter: Matthew Garrett <mjg59>
Component: qemu
Assignee: Fedora Virtualization Maintainers <virt-maint>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: high
Version: 18
CC: amit.shah, berrange, cfergeau, clalancette, crobinso, dallan, dwmw2, itamar, jforbes, jhrozek, jyang, laine, libvirt-maint, marcandre.lureau, pbonzini, rjones, scottt.tw, veillard, virt-maint, zali
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2013-01-24 18:02:54 EST
Type: Bug

Description Matthew Garrett 2013-01-07 13:03:24 EST
After an update, boxes started failing to restore a VM. From the logs:

gnome-boxes:9731): Boxes-DEBUG: app.vala:707: connect display failed: Unable to start domain: Unable to read from monitor: Connection reset by peer

and from the qemu logs:

qemu: warning: error while loading state for instance 0x0 of device '0000:00:02.0/qxl'

This seems to be caused if there's any kind of qxl ABI bump. Boxes doesn't seem to provide any kind of UI to handle this - I could delete the saved state with virsh, but Boxes would seem to require me to delete and recreate the entire VM.
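
A minimal sketch of the virsh workaround mentioned above. It assumes the state was stored as a libvirt managed save image and that the VM lives under the `qemu:///session` connection; neither is confirmed in this report. The domain name `fedora18` and both helper function names are hypothetical.

```python
import subprocess


def managedsave_remove_cmd(domain, uri="qemu:///session"):
    """Build the virsh command that discards a domain's managed save image.

    The default URI is an assumption (per-user session libvirtd).
    """
    return ["virsh", "--connect", uri, "managedsave-remove", domain]


def discard_saved_state(domain, uri="qemu:///session"):
    """Run the command; afterwards the domain cold-boots instead of restoring.

    Raises subprocess.CalledProcessError if virsh exits with an error.
    """
    return subprocess.run(managedsave_remove_cmd(domain, uri), check=True)
```

Calling `discard_saved_state("fedora18")` throws away the guest's saved RAM and device state only; the disk image and the domain definition are left alone, so the VM does not have to be recreated.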
Comment 1 Jakub Hrozek 2013-01-07 14:01:00 EST
Wrong boxes. Reassigning.
Comment 2 Zeeshan Ali 2013-01-08 09:24:46 EST
As discussed on IRC, this is most likely a libvirt issue since you were not able to start the domain from virsh either. I don't see why Boxes should provide a UI to port VMs from one version of libvirt to another.
Comment 3 Dave Allan 2013-01-08 11:56:36 EST
(In reply to comment #2)
> As discussed on IRC, this is most likely a libvirt issue since you were not
> able to start the domain from virsh either. I don't see why Boxes should
> provide a UI to port VMs from one version of libvirt to another.

Um, let's not blame libvirt or anyone else until we have root cause, ok?
Comment 4 Zeeshan Ali 2013-01-08 19:33:33 EST
(In reply to comment #3)
> (In reply to comment #2)
> > As discussed on IRC, this is most likely a libvirt issue since you were not
> > able to start the domain from virsh either. I don't see why Boxes should
> > provide a UI to port VMs from one version of libvirt to another.
> 
> Um, let's not blame libvirt or anyone else until we have root cause, ok?

Could be qemu for all we know. "Most likely a libvirt issue" is hardly blame. :) What I was trying to say is that this is unlikely to be a Boxes issue (for the reasons mentioned), so it is either libvirt or something further down the stack.
Comment 5 Zeeshan Ali 2013-01-19 10:04:07 EST
This is a major loss of functionality.
Comment 6 Cole Robinson 2013-01-19 14:28:45 EST
Zeeshan, can you reproduce? If so, can you narrow down what specific package and versions need to be updated to reproduce?
Comment 7 Zeeshan Ali 2013-01-21 10:01:59 EST
(In reply to comment #6)
> Zeeshan, can you reproduce? If so, can you narrow down what specific package
> and versions need to be updated to reproduce?

No, this happens on a qemu upgrade, so it is hard to reproduce on demand. However, either Matthew or Mattias Bengtsson (who encountered this over the weekend) could provide the needed info: https://bugzilla.gnome.org/show_bug.cgi?id=692062
Comment 8 Zeeshan Ali 2013-01-21 10:07:53 EST
Actually this bug has been seen by others before: https://bugzilla.gnome.org/show_bug.cgi?id=687626
Comment 9 Zeeshan Ali 2013-01-21 10:08:52 EST
Meh, didn't mean to remove NEEDINFO altogether.
Comment 10 Dave Allan 2013-01-21 12:11:51 EST
(In reply to comment #0)
> After an update, boxes started failing to restore a VM. From the logs:
> 
> gnome-boxes:9731): Boxes-DEBUG: app.vala:707: connect display failed: Unable
> to start domain: Unable to read from monitor: Connection reset by peer
> 
> and from the qemu logs:
> 
> qemu: warning: error while loading state for instance 0x0 of device
> '0000:00:02.0/qxl'
> 
> This seems to be caused if there's any kind of qxl ABI bump. Boxes doesn't
> seem to provide any kind of UI to handle this - I could delete the saved
> state with virsh, but Boxes would seem to require me to delete and recreate
> the entire VM.

This situation appears to be related to a qemu update, not a libvirt update, so I'm changing the component to qemu.  If there's anything libvirt can do to help, please let me know.  Specifically, it looks to me like the old save file is not compatible with the new qemu, but I'm really just speculating there and we should wait for the qemu folks to give their opinion.
Comment 11 Christophe Fergeau 2013-01-23 11:40:53 EST
https://bugzilla.gnome.org/show_bug.cgi?id=687626#c1 has more info about the package versions I could reproduce this bug with:
"Fwiw, it is currently very easy to get in such a state by saving with
qemu-1.2.0-18 and trying to restore with qemu-1.2.0-19"
(saving with the former and trying to restore with the latter fails).

However, I've just run some more tests, and if I use qemu-1.2.0-25 instead of -19 I still get the same bug. With qemu-1.2.2-1 I can properly restore the VM.
Comment 12 Cole Robinson 2013-01-24 18:02:54 EST
(In reply to comment #11)
> 
> However, I've just run some more tests, and if I use qemu-1.2.0-25 instead
> of -19 I still get the same bug. With qemu-1.2.2-1 I can properly restore
> the VM.

Hmm, so sounds like this bug is gone with current qemu in F18.

1.2.0-19 added a bunch of spice + qxl patches, so something likely messed up the migration format (notice the 'qxl' bit in the qemu error message). The 1.2.2 update pulled in even more fixes which probably undid the damage.
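
As a rough illustration of reading the device out of that error message, here is a small sketch that parses the warning line quoted in the description. The regular expression is written against that one line only and is an assumption about the message shape, not a documented qemu log format; the function name is hypothetical.

```python
import re

# Pattern modelled on the single warning line quoted in this report.
LOAD_STATE_ERROR = re.compile(
    r"error while loading state for instance (0x[0-9a-fA-F]+) of device '([^']+)'"
)


def failing_device(log_line):
    """Return (instance_id, device_path, device_name), or None if no match."""
    match = LOAD_STATE_ERROR.search(log_line)
    if match is None:
        return None
    instance, path = match.groups()
    # The component after the last '/' names the device, e.g. 'qxl'.
    return int(instance, 16), path, path.rsplit("/", 1)[-1]
```

For the line in the description this yields the device name `qxl`, which is the hint that points at the spice/qxl patches rather than at libvirt or Boxes.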

However, people who saved a guest on qemu -19 to -25 might still have issues, though I wouldn't be surprised if that is basically not fixable in a forward-compatible way. Given that those versions only existed in pre-release Fedora versions, I'm closing this.

If I've misunderstood and anyone is still affected by this issue, please reopen.
Comment 13 Cole Robinson 2013-01-24 18:09:08 EST
Okay, I missed comment #7 mentioning that someone hit this bug as recently as this weekend. My guess is that he either has not updated to F18 GA, or is hitting the issue I described, where his guest was saved between qemu -19 and qemu -25.

Without digging into the root issue, my answer would still be that he has to throw away his saved state since those qemu versions generated incompatible migration data.
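
The version window described above can be written down as a small check. This is only a sketch of the guess made in this thread (state saved by qemu 1.2.0-19 through 1.2.0-25 is suspect), not a verified compatibility rule; the function names are hypothetical, and the package strings follow the `qemu-<version>-<release>` form used in comment 11.

```python
import re

# Matches e.g. "qemu-1.2.0-19" or "qemu-1.2.0-19.fc18" (dist tag ignored).
NVR = re.compile(r"^qemu-(\d+(?:\.\d+)*)-(\d+)")


def parse_qemu_nvr(nvr):
    """Split a package string into ((major, minor, ...), release)."""
    match = NVR.match(nvr)
    if match is None:
        raise ValueError(f"unrecognised qemu package string: {nvr!r}")
    version = tuple(int(part) for part in match.group(1).split("."))
    return version, int(match.group(2))


def saved_state_is_suspect(saved_with):
    """True if the state was saved by a build in the suspected window.

    The window (1.2.0-19 .. 1.2.0-25) is the guess from this thread.
    """
    version, release = parse_qemu_nvr(saved_with)
    return version == (1, 2, 0) and 19 <= release <= 25
```

A saved state flagged by this check would, per the comments above, most likely have to be discarded rather than restored.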