Bug 671100

Summary: possible migration failure due to erroneous interpretation of subsection
Product: Red Hat Enterprise Linux 6
Reporter: Paolo Bonzini <pbonzini>
Component: qemu-kvm
Assignee: Paolo Bonzini <pbonzini>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Priority: low
Version: 6.0
CC: bcao, fyang, mjenner, mkenneth, shu, virt-maint
Target Milestone: rc
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: qemu-kvm-0.12.1.2-2.143.el6
Doc Type: Bug Fix
Doc Text:
Cause: a possible ambiguity in the migration format is handled incorrectly by the receiving end.
Consequence: in very rare cases migration may fail.
Fix: the ambiguity is resolved correctly.
Result: incoming migration data is interpreted correctly and migration succeeds.
Last Closed: 2011-05-19 11:34:39 UTC
Bug Blocks: 580954

Description Paolo Bonzini 2011-01-20 10:36:30 UTC
See http://permalink.gmane.org/gmane.comp.emulators.qemu/87767 and thread.

> Although it's rare to happen in live migration, when the head of a
> byte stream contains 0x05 which is the marker of subsection, the
> loader gets corrupted because vmstate_subsection_load() continues even
> the device doesn't require it.  This patch adds a checker whether
> subsection is needed, and skips following routines if not needed.

This was reported with Kemari, but it is not limited to it.  When a 0x05 byte follows a VMS_STRUCT, it is part of the parent's data stream, but the loader parses it as the start of a subsection.

Having subsections nested under a VMS_STRUCT is simply not going to work, so the patch linked above is the right solution.
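
To make the ambiguity concrete, here is a minimal toy model in Python. It is not QEMU's code: the stream layout, the Stream class, load_struct(), load_parent() and the check_needed flag are all hypothetical names invented for illustration. The only detail taken from this report is the 0x05 subsection marker. The sketch assumes a nested struct with no declared subsections, followed by a parent field whose first byte happens to be 0x05.

```python
SUBSECTION_MARKER = 0x05  # marker value quoted in this report


class Stream:
    """Hypothetical byte stream with a one-byte lookahead."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def peek(self) -> int:
        # Return the next byte without consuming it, or -1 at end of stream.
        return self.data[self.pos] if self.pos < len(self.data) else -1

    def read(self, n: int) -> bytes:
        chunk = self.data[self.pos:self.pos + n]
        if len(chunk) != n:
            raise ValueError("short read")
        self.pos += n
        return chunk


def load_struct(stream, field_size, subsections, check_needed):
    """Load a nested struct: fixed-size fields, then optional subsections.

    subsections maps a subsection name to its payload size (toy format).
    """
    fields = stream.read(field_size)
    if check_needed and not subsections:
        # Patched behaviour (as modelled here): the struct declares no
        # subsections, so the next byte is left for the parent.
        return fields
    while stream.peek() == SUBSECTION_MARKER:
        stream.read(1)                      # consume the marker
        name_len = stream.read(1)[0]
        name = stream.read(name_len).decode("ascii", errors="replace")
        if name not in subsections:
            raise ValueError(f"unknown subsection {name!r}")
        stream.read(subsections[name])      # consume the payload
    return fields


def load_parent(data, check_needed):
    """Load one nested struct, then one byte that belongs to the parent."""
    stream = Stream(data)
    child = load_struct(stream, field_size=2, subsections={},
                        check_needed=check_needed)
    parent_field = stream.read(1)
    return child, parent_field
```

With data = b"\xaa\xbb\x05", load_parent(data, check_needed=True) returns the child fields and leaves the 0x05 byte for the parent, while load_parent(data, check_needed=False) consumes the 0x05 as a marker and then fails. In a real stream the failure depends on the bytes that follow, which is consistent with the bug being rare.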

Comment 3 Paolo Bonzini 2011-02-01 09:01:13 UTC
The bug is very rare; we can provide a patched package that will always fail without this patch and always pass with it.  Is that okay?

Comment 7 Mike Cao 2011-02-15 05:26:59 UTC
(In reply to comment #3)
> The bug is very rare; we can provide a patched package that will always fail
> without this patch and always pass with it.  Is that okay?

I will use this workaround to verify this bug. Could you provide me with the scratch build?
Mike

Comment 8 Paolo Bonzini 2011-02-15 16:31:37 UTC
I placed them at http://people.redhat.com/pbonzini/bz671100/

The ".pbtest" rpms won't pass a save/restore (virsh save/virsh restore), the ".pbfixed" rpms will.

Unfortunately, due to bug 677712, you won't be able to restore a ".pbtest" vm with the ".pbfixed" rpms, which would also be a nice test.

Comment 9 Shaolong Hu 2011-02-18 08:42:04 UTC
Reproduced on qemu-kvm-0.12.1.2-2.113.el6.pbtest.x86_64.rpm at http://people.redhat.com/pbonzini/bz671100/ with the following steps.

Reproduce Procedure:
---------------------
1. boot guest A with:

/usr/libexec/qemu-kvm -M rhel6.0.0 -enable-kvm -m 2G -smp 2 -name RHEL-Server-6.0_64_raw -uuid `uuidgen` -rtc base=utc -boot order=cd -drive file=./RHEL-Server-6.0_64_raw,if=none,id=drive-virtio-disk0,format=raw,cache=none,werror=stop,rerror=stop -device virtio-blk-pci,drive=drive-virtio-disk0,id=virtio-disk0 -netdev tap,script=/etc/qemu-ifup,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:7b:a2:fa -usb -device usb-tablet,id=input0 -vnc :10 -monitor stdio

2. boot guest B with:

/usr/libexec/qemu-kvm -M rhel6.0.0 -enable-kvm -m 2G -smp 2 -name RHEL-Server-6.0_64_raw -uuid `uuidgen` -rtc base=utc -boot order=cd -drive file=./RHEL-Server-6.0_64_raw,if=none,id=drive-virtio-disk0,format=raw,cache=none,werror=stop,rerror=stop -device virtio-blk-pci,drive=drive-virtio-disk0,id=virtio-disk0 -netdev tap,script=/etc/qemu-ifup,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:7b:a2:fa -usb -device usb-tablet,id=input0 -vnc :10 -monitor stdio -incoming tcp:0:5555

3. in guest A:
   (qemu) migrate -d tcp:xx.xx.xx.xx:5555
4. in guest A:
   (qemu) info migrate

Actual results:
----------------
After step 4, info migrate reports that the migration failed; there is no related dmesg output on guest A or B.


Verified this bug on qemu-kvm-0.12.1.2-2.113.el6.pbfixed.x86_64.rpm at http://people.redhat.com/pbonzini/bz671100/ and on qemu-kvm-0.12.1.2-2.146.el6, using the same steps as above.

Actual results:
----------------
After step 4, info migrate reports that the migration completed.

Conclusion:
-------------
According to the above results, this bug has been resolved.

Comment 11 Dor Laor 2011-03-24 10:31:29 UTC
*** Bug 629453 has been marked as a duplicate of this bug. ***

Comment 12 Paolo Bonzini 2011-05-05 13:18:02 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: a possible ambiguity in the migration format is handled incorrectly by the receiving end.

Consequence: in very rare cases migration may fail.

Fix: the ambiguity is resolved correctly.

Result: incoming migration data is interpreted correctly and migration succeeds.

Comment 13 errata-xmlrpc 2011-05-19 11:34:39 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0534.html

Comment 14 errata-xmlrpc 2011-05-19 13:01:07 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0534.html