Bug 671100 - possible migration failure due to erroneous interpretation of subsection
Summary: possible migration failure due to erroneous interpretation of subsection
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: qemu-kvm
Version: 6.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Paolo Bonzini
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Duplicates: 629453
Depends On:
Blocks: 580954
 
Reported: 2011-01-20 10:36 UTC by Paolo Bonzini
Modified: 2011-05-19 13:01 UTC (History)

Fixed In Version: qemu-kvm-0.12.1.2-2.143.el6
Doc Type: Bug Fix
Doc Text:
Cause: a possible ambiguity in the migration format is handled incorrectly by the receiving end. Consequence: in very rare cases migration may fail. Fix: the ambiguity is resolved correctly. Result: incoming migration data is interpreted correctly and migration succeeds.
Clone Of:
Environment:
Last Closed: 2011-05-19 11:34:39 UTC
Target Upstream Version:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2011:0534 | 0 | normal | SHIPPED_LIVE | Important: qemu-kvm security, bug fix, and enhancement update | 2011-05-19 11:20:36 UTC

Description Paolo Bonzini 2011-01-20 10:36:30 UTC
See http://permalink.gmane.org/gmane.comp.emulators.qemu/87767 and thread.

> Although it's rare to happen in live migration, when the head of a
> byte stream contains 0x05 which is the marker of subsection, the
> loader gets corrupted because vmstate_subsection_load() continues even
> the device doesn't require it.  This patch adds a checker whether
> subsection is needed, and skips following routines if not needed.

This was reported with Kemari, but it is not limited to it.  When a 0x05 byte follows a VMS_STRUCT field, it belongs to the parent's data stream, but it is parsed as the start of a subsection of the struct.

Having subsections nested under a VMS_STRUCT is simply not going to work, so the patch linked above is the right solution.
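
A toy model of the ambiguity described above, written in Python rather than QEMU's C. Every name here (load, CHILD, PARENT, STREAM, check_needed) is hypothetical, the stream layout is simplified (no version field), and "subsection is needed" is approximated as "the description declares subsections". It is a sketch of the idea, not the actual patch.

```python
SUBSECTION_MARKER = 0x05  # subsection marker byte named in this report

# Hypothetical descriptions: the child struct declares no subsections.
CHILD = {"fields": [("u8", "irq")]}
PARENT = {"fields": [("struct", "child", CHILD), ("u8", "count"), ("u8", "flags")]}

# child.irq = 0x01, parent.count = 0x05 (looks like a marker), parent.flags = 0x02
STREAM = bytes([0x01, 0x05, 0x02])


def load(desc, buf, pos, check_needed):
    """Load `desc` from `buf` starting at `pos`; return (state, new_pos)."""
    state = {}
    for field in desc["fields"]:
        if field[0] == "u8":
            state[field[1]] = buf[pos]
            pos += 1
        else:  # ("struct", name, child_desc)
            state[field[1]], pos = load(field[2], buf, pos, check_needed)

    subs = desc.get("subsections", {})
    if check_needed and not subs:
        # With the check: no subsections declared, so leave the next byte alone.
        return state, pos

    # Without the check this loop is reached even when `subs` is empty.
    while pos < len(buf) and buf[pos] == SUBSECTION_MARKER:
        if pos + 2 > len(buf):
            raise ValueError("truncated subsection header")
        name_len = buf[pos + 1]
        start = pos + 2
        name = buf[start:start + name_len].decode("ascii", "replace")
        if name not in subs:
            raise ValueError("unknown subsection %r" % name)
        state[name], pos = load(subs[name], buf, start + name_len, check_needed)
    return state, pos


if __name__ == "__main__":
    print(load(PARENT, STREAM, 0, check_needed=True))
    try:
        load(PARENT, STREAM, 0, check_needed=False)
    except ValueError as err:
        print("without the check:", err)
```

Without the check, the loader treats the parent's count byte as a subsection marker while it is still inside the child struct, consumes bytes that belong to the parent, and fails. With the check, the same stream loads cleanly.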

Comment 3 Paolo Bonzini 2011-02-01 09:01:13 UTC
The bug is very rare; we can provide a patched package that will always fail without this patch and always pass with it.  Is that okay?

Comment 7 Mike Cao 2011-02-15 05:26:59 UTC
(In reply to comment #3)
> The bug is very rare; we can provide a patched package that will always fail
> without this patch and always pass with it.  Is that okay?

I will use this workaround to verify this bug. Could you provide me with the scratch build?
Mike

Comment 8 Paolo Bonzini 2011-02-15 16:31:37 UTC
I placed them at http://people.redhat.com/pbonzini/bz671100/

The ".pbtest" rpms won't pass a save/restore (virsh save/virsh restore); the ".pbfixed" rpms will.

Unfortunately, due to bug 677712, you won't be able to restore a ".pbtest" vm with the ".pbfixed" rpms, which would also be a nice test.

Comment 9 Shaolong Hu 2011-02-18 08:42:04 UTC
Reproduced on qemu-kvm-0.12.1.2-2.113.el6.pbtest.x86_64.rpm from http://people.redhat.com/pbonzini/bz671100/ with the following steps.

Reproduce Procedure:
---------------------
1. boot guest A with:

/usr/libexec/qemu-kvm -M rhel6.0.0 -enable-kvm -m 2G -smp 2 -name RHEL-Server-6.0_64_raw -uuid `uuidgen` -rtc base=utc -boot order=cd -drive file=./RHEL-Server-6.0_64_raw,if=none,id=drive-virtio-disk0,format=raw,cache=none,werror=stop,rerror=stop -device virtio-blk-pci,drive=drive-virtio-disk0,id=virtio-disk0 -netdev tap,script=/etc/qemu-ifup,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:7b:a2:fa -usb -device usb-tablet,id=input0 -vnc :10 -monitor stdio

2. boot guest B with:

/usr/libexec/qemu-kvm -M rhel6.0.0 -enable-kvm -m 2G -smp 2 -name RHEL-Server-6.0_64_raw -uuid `uuidgen` -rtc base=utc -boot order=cd -drive file=./RHEL-Server-6.0_64_raw,if=none,id=drive-virtio-disk0,format=raw,cache=none,werror=stop,rerror=stop -device virtio-blk-pci,drive=drive-virtio-disk0,id=virtio-disk0 -netdev tap,script=/etc/qemu-ifup,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:7b:a2:fa -usb -device usb-tablet,id=input0 -vnc :10 -monitor stdio -incoming tcp:0:5555

3. in guest A:
   (qemu) migrate -d tcp:xx.xx.xx.xx:5555
4. in guest A:
   (qemu) info migrate

Actual results:
----------------
After step 4, info migrate reports that the migration failed; there is no dmesg output on guest A or B.


Verified this bug on qemu-kvm-0.12.1.2-2.113.el6.pbfixed.x86_64.rpm from http://people.redhat.com/pbonzini/bz671100/ and on qemu-kvm-0.12.1.2-2.146.el6, using the same steps as above.

Actual results:
----------------
After step 4, info migrate reports that the migration completed.

Conclusion:
-------------
According to the above results, this bug has been resolved.
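
The step 4 check in the procedure above could be scripted when the monitor output is captured as text. This is a minimal sketch that assumes the monitor prints a line of the form "Migration status: <state>"; that format is not shown in this report, and the helper name migration_status is hypothetical.

```python
import re


def migration_status(info_migrate_output):
    """Return the lowercased state from text resembling `info migrate` output.

    Assumes a line like "Migration status: completed"; returns None when
    no such line is present.
    """
    match = re.search(
        r"^Migration status:\s*(\S+)",
        info_migrate_output,
        re.MULTILINE | re.IGNORECASE,
    )
    return match.group(1).lower() if match else None


if __name__ == "__main__":
    sample = "(qemu) info migrate\nMigration status: failed\n"
    print(migration_status(sample))
```

A reproduction run would expect "failed" on the ".pbtest" build and "completed" on the ".pbfixed" build.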

Comment 11 Dor Laor 2011-03-24 10:31:29 UTC
*** Bug 629453 has been marked as a duplicate of this bug. ***

Comment 12 Paolo Bonzini 2011-05-05 13:18:02 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: a possible ambiguity in the migration format is handled incorrectly by the receiving end.

Consequence: in very rare cases migration may fail.

Fix: the ambiguity is resolved correctly.

Result: incoming migration data is interpreted correctly and migration succeeds.

Comment 13 errata-xmlrpc 2011-05-19 11:34:39 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0534.html

Comment 14 errata-xmlrpc 2011-05-19 13:01:07 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0534.html

