Red Hat Bugzilla – Bug 671100
possible migration failure due to erroneous interpretation of subsection
Last modified: 2011-05-19 09:01:07 EDT
See http://permalink.gmane.org/gmane.comp.emulators.qemu/87767 and the thread:

> Although it's rare to happen in live migration, when the head of a
> byte stream contains 0x05 which is the marker of subsection, the
> loader gets corrupted because vmstate_subsection_load() continues even
> the device doesn't require it. This patch adds a checker whether
> subsection is needed, and skips following routines if not needed.

This was reported with Kemari, but it is not limited to it. After a VMS_STRUCT field, a 0x05 byte that belongs to the parent's data stream is wrongly parsed as the start of a subsection of the nested struct. Having subsections nested under a VMS_STRUCT is simply not going to work, so the patch linked above is the right solution.
The bug is very rare; we can provide a patched package that will always fail without this patch and always pass with it. Is that okay?
(In reply to comment #3)
> The bug is very rare; we can provide a patched package that will always fail
> without this patch and always pass with it. Is that okay?

I will use this workaround to verify this bug. Could you provide me with the scratch build?

Mike
I placed them at http://people.redhat.com/pbonzini/bz671100/

The ".pbtest" rpms won't pass a save/restore (virsh save/virsh restore); the ".pbfixed" rpms will. Unfortunately, due to bug 677712, you won't be able to restore a ".pbtest" VM with the ".pbfixed" rpms, which would also have been a nice test.
Reproduced on qemu-kvm-0.12.1.2-2.113.el6.pbtest.x86_64.rpm from http://people.redhat.com/pbonzini/bz671100/ with the following steps.

Reproduce Procedure:
---------------------
1. Boot guest A with:
/usr/libexec/qemu-kvm -M rhel6.0.0 -enable-kvm -m 2G -smp 2 -name RHEL-Server-6.0_64_raw -uuid `uuidgen` -rtc base=utc -boot order=cd -drive file=./RHEL-Server-6.0_64_raw,if=none,id=drive-virtio-disk0,format=raw,cache=none,werror=stop,rerror=stop -device virtio-blk-pci,drive=drive-virtio-disk0,id=virtio-disk0 -netdev tap,script=/etc/qemu-ifup,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:7b:a2:fa -usb -device usb-tablet,id=input0 -vnc :10 -monitor stdio

2. Boot guest B with:
/usr/libexec/qemu-kvm -M rhel6.0.0 -enable-kvm -m 2G -smp 2 -name RHEL-Server-6.0_64_raw -uuid `uuidgen` -rtc base=utc -boot order=cd -drive file=./RHEL-Server-6.0_64_raw,if=none,id=drive-virtio-disk0,format=raw,cache=none,werror=stop,rerror=stop -device virtio-blk-pci,drive=drive-virtio-disk0,id=virtio-disk0 -netdev tap,script=/etc/qemu-ifup,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:7b:a2:fa -usb -device usb-tablet,id=input0 -vnc :10 -monitor stdio -incoming tcp:0:5555

3. In the monitor of guest A:
(qemu) migrate -d tcp:xx.xx.xx.xx:5555

4. In the monitor of guest A:
(qemu) info migrate

Actual results:
----------------
After step 4, "info migrate" reports that migration failed; there is no dmesg output on guest A or guest B.

Verified this bug on qemu-kvm-0.12.1.2-2.113.el6.pbfixed.x86_64.rpm from http://people.redhat.com/pbonzini/bz671100/ and on qemu-kvm-0.12.1.2-2.146.el6, using the same steps as above.

Actual results:
----------------
After step 4, "info migrate" reports that migration completed.

Conclusion:
-------------
According to the above results, this bug has been resolved.
*** Bug 629453 has been marked as a duplicate of this bug. ***
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause: a possible ambiguity in the migration format is handled incorrectly by the receiving end.
Consequence: in very rare cases, migration may fail.
Fix: the ambiguity is resolved correctly.
Result: incoming migration data is interpreted correctly and migration succeeds.
An advisory has been issued which should resolve the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0534.html