Bug 725565

Summary: migration subsections are still broken
Product: Red Hat Enterprise Linux 6
Reporter: Paolo Bonzini <pbonzini>
Component: qemu-kvm
Assignee: Juan Quintela <quintela>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified
Priority: unspecified
Version: 6.2
CC: anderson, juzhang, kwolf, mkenneth, qzhang, shu, tburke, virt-maint
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Fixed In Version: qemu-kvm-0.12.1.2-2.206.el6
Doc Type: Bug Fix
Last Closed: 2011-12-06 15:54:32 UTC
Type: Bug
Bug Blocks: 748554

Description Paolo Bonzini 2011-07-25 20:49:10 UTC
While a few subsection problems involving bogus migration failures have been fixed in 6.1, others remain in the form of silent migration success when only part of the data stream has been read.  The problem occurs when the destination expects no subsections but the source sends one: the destination reads the subsection's bytes as the subsequent fields of the data stream. If a zero byte happens to fall at the right position, it is interpreted as end-of-data and migration erroneously succeeds.

The fix is to add explicit end-of-subsections markers so that unexpected subsections can cause migration to fail.
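
To illustrate the failure mode and the proposed fix, here is a toy Python model. It is only a sketch: the byte values, the one-device layout, the subsection id, and the END_SUBSECTIONS marker are all invented for this example and are not QEMU's actual wire format or code.

```python
# Toy model only: all byte values and the layout are assumptions made for
# illustration, not QEMU's real migration format.
SECTION = 0x04          # start of a device section
SUBSECTION = 0x05       # start of an optional subsection
END_SUBSECTIONS = 0xFF  # hypothetical explicit end-of-subsections marker
EOF = 0x00              # end of the migration stream


class MigrationError(Exception):
    pass


def write_stream(x, y, subsections=(), with_marker=False):
    """Serialize one device: field x, optional subsections, then field y."""
    out = bytearray([SECTION, x])
    for sub_id, payload in subsections:
        out += bytes([SUBSECTION, sub_id, payload])
    if with_marker:
        out.append(END_SUBSECTIONS)
    out += bytes([y, EOF])
    return bytes(out)


def load_legacy(stream):
    """Destination that expects no subsections and has no marker to check."""
    pos, state = 0, {}
    while True:
        kind = stream[pos]
        pos += 1
        if kind == EOF:
            return state, pos  # pos = number of bytes actually consumed
        if kind != SECTION:
            raise MigrationError("unknown section type 0x%02x" % kind)
        # Reads two fields back to back; subsection bytes land in y.
        state["x"], state["y"] = stream[pos], stream[pos + 1]
        pos += 2


def load_patched(stream, known=frozenset()):
    """Destination that requires the explicit end-of-subsections marker."""
    pos, state = 0, {}
    while True:
        kind = stream[pos]
        pos += 1
        if kind == EOF:
            return state, pos
        if kind != SECTION:
            raise MigrationError("unknown section type 0x%02x" % kind)
        state["x"] = stream[pos]
        pos += 1
        while True:
            tag = stream[pos]
            pos += 1
            if tag == END_SUBSECTIONS:
                break
            if tag != SUBSECTION:
                raise MigrationError("missing end-of-subsections marker")
            sub_id, payload = stream[pos], stream[pos + 1]
            pos += 2
            if sub_id not in known:
                raise MigrationError("unknown subsection %d" % sub_id)
            state[("sub", sub_id)] = payload
        state["y"] = stream[pos]
        pos += 1


old = write_stream(0x11, 0x22, subsections=[(0x00, 0x01)])
state, consumed = load_legacy(old)
print(state, consumed, len(old))  # y is wrong and the stream is only partly read
```

In this model the legacy loader reports success after consuming 4 of 7 bytes, because the subsection id byte (zero here) lands where it expects the next section type. With a non-zero id it would fail instead, which corresponds to the "zero byte at the right time" condition. The marker-aware loader rejects the same subsection unless it is listed in `known`.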

Kevin reproduced this on a 6.2 to 6.1 migration, but no testcase is available right now. It is possible to construct one artificially; if required, patched packages triggering the bug can be provided to QE.

Comment 1 Kevin Wolf 2011-07-26 08:14:54 UTC
(In reply to comment #0)
> Kevin reproduced this on a 6.2 to 6.1 migration, but no testcase is available
> right now. It is possible to construct one artificially; if required, patched
> packages triggering the bug can be provided to QE.

For the record: this was an IDE disk with werror=stop,rerror=stop. The VM stopped after an artificially provoked I/O error and was migrated while it was still stopped. This means that the error_status subsection is transferred, which 6.1 doesn't know.

I think this way of reproducing it should still work.

Comment 2 Paolo Bonzini 2011-07-26 09:16:23 UTC
Yes, the problem is that you need the rhel6.2.0 machine in order to trigger the new migration format.  So migrating to 6.1 would fail anyway due to the unknown machine type.  I was thinking of giving QE a patched 6.1 package which also defines the rhel6.2.0 machine type, so that they can reproduce the bug.

Comment 7 Shaolong Hu 2011-10-20 13:06:06 UTC
Reproduced on qemu-kvm-0.12.1.2-2.196.el6.x86_64:

Steps:
--------
1. Boot the guest on src with:
/usr/libexec/qemu-kvm -enable-kvm -M rhel6.2.0 -smp 4 -m 6G -name rhel6.2 -uuid 3f2ea5cd-3d29-48ff-aab2-23df1b6ae213 -drive file=RHEL-Server-6.2-64-virtio.qcow2,cache=none,if=none,rerror=stop,werror=stop,id=drive-virtio-disk0,format=qcow2 -device ide-drive,drive=drive-virtio-disk0,id=device-virtio-disk0 -boot order=cd -monitor stdio -vnc :10

2. Boot the guest on des with the same command line, except with "-M rhel6.1.0 -incoming tcp:0:5555".

3. Consume the guest's block space until the guest pauses:
(qemu) block I/O error in device 'drive-virtio-disk0': No space left on device (28)

4. Migrate to des; the migration succeeds.


Verified on:
---------------
src host: qemu-kvm-0.12.1.2-2.199.el6.x86_64
des host: qemu-kvm-0.12.1.2-2.160.el6.shu.bz725565.x86_64

http://brewweb.devel.redhat.com/brew/taskinfo?taskID=3726275

*This scratch build is based on qemu-kvm-0.12.1.2-2.160.el6_1.8, patched with the patches in Comment 4 and the patch "pc: add rhel 6.2 pc and make it the default".

After step 4, migration fails. On des:
(qemu) qemu: warning: error while loading state for instance 0x0 of device '0000:00:01.1/ide'
load of migration failed

On src:
(qemu) info migrate
Migration status: completed

It would be better if info migrate on src also showed that the migration failed; however, this does not keep the bug from being verified.
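
Because src can report "completed" while des fails to load the state, a test harness may want to check both sides. The following is a minimal Python sketch, assuming the two monitor outputs have already been captured as strings. The function name is hypothetical, and the matched strings are simply the ones quoted in this comment.

```python
def migration_really_succeeded(src_info_migrate, dest_output):
    """Return True only if src reports completion AND des did not fail to load.

    Both arguments are captured monitor/console text; the matched strings
    are taken from the output quoted in this bug, not from a stable API.
    """
    src_completed = "Migration status: completed" in src_info_migrate
    dest_failed = "load of migration failed" in dest_output
    return src_completed and not dest_failed


src = "(qemu) info migrate\nMigration status: completed\n"
des = ("(qemu) qemu: warning: error while loading state for instance 0x0 "
       "of device '0000:00:01.1/ide'\nload of migration failed\n")
print(migration_really_succeeded(src, des))
```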


Additional info:
--------------------
Migration from "-M rhel6.2.0" to "-M rhel6.1.0" fails if an added device has a feature unknown to the destination; for example, adding a virtio NIC to the command line in step 1 makes migration fail. So I think that as long as migration can succeed, this is enough to reproduce the bug.

However, to verify the bug, the scratch build must be used on des.

Comment 10 Eduardo Habkost 2011-10-28 17:59:10 UTC
Moving to ON_QA because Errata Tool did not do it

Comment 12 errata-xmlrpc 2011-12-06 15:54:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2011-1531.html