Bug 1452148

Summary: Op blockers don't work after postcopy migration
Product: Red Hat Enterprise Linux 7 Reporter: Kevin Wolf <kwolf>
Component: qemu-kvm-rhev Assignee: Kevin Wolf <kwolf>
Status: CLOSED ERRATA QA Contact: xianwang <xianwang>
Severity: high Docs Contact:
Priority: high    
Version: 7.4 CC: bugproxy, chayang, coli, hannsj_uhl, hreitz, jkachuck, juzhang, kwolf, michen, mrezanin, ngu, qzhang, virt-bugs, virt-maint
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.9.0-6.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1441684 Environment:
Last Closed: 2017-08-02 04:41:00 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1440030, 1441684, 1446211    
Attachments:
Description Flags
qemu log none

Description Kevin Wolf 2017-05-18 12:36:28 UTC
This clone of the originally reported bug deals only with the part concerning
postcopy migration, for which patches have been merged upstream. The original
bug is left open so that the rest can be fixed later.


+++ This bug was initially created as a clone of Bug #1441684 +++

In commit e3e0003a, upstream qemu disabled the op blocker assertions for the
2.9 release because some bugs could not be fixed in time. After rebasing to
2.9, we'll want to revert the commit and include proper fixes for the bugs.
Without the bugs fixed, op blockers can't keep the promises they are making.

Known problems with op blockers so far that need to be fixed before the commit
can be safely reverted:

* Old style block migration (migrate -b) triggers an assertion because it
  reuses the guest device's BlockBackend. During migration, this BlockBackend
  is not ready to be used yet (its real permissions are only enabled in
  blk_resume_after_migration() immediately before the guest starts to run).
  Block migration needs to use its own BlockBackend here.

* Postcopy migration. Commit d35ff5e6 added blk_resume_after_migration() in two
  places, but postcopy migration uses loadvm_postcopy_handle_run_bh(), which is
  the third one. In order to avoid assertion failures, the call needs to be
  added there as well. Without this fix, the guest device's op blockers are
  ineffective after postcopy migration.
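
The postcopy problem described above can be illustrated with a small toy model. This is only a sketch and not QEMU code: the Node and BlockBackend classes, the permission names, and the with_fix switch are all assumptions made up for illustration. Only the idea that the guest device's real permissions are enabled by a resume call, and that the postcopy path originally lacked that call, comes from the description.

```python
# Toy model of permission-based op blockers. NOT QEMU code; every class and
# function here is a hypothetical stand-in used only for illustration.
ALL = frozenset({"consistent-read", "write"})


class Node:
    def __init__(self, name):
        self.name = name
        self.users = []  # (user name, permissions used, permissions shared)

    def attach(self, user, perms, shared):
        perms, shared = frozenset(perms), frozenset(shared)
        for other, o_perms, o_shared in self.users:
            # Conflict if the new user needs something an existing user does
            # not share, or an existing user needs something we do not share.
            if (perms - o_shared) or (o_perms - shared):
                raise PermissionError(
                    f"Conflicts with use by {other} on {self.name}")
        self.users.append((user, perms, shared))


class BlockBackend:
    def __init__(self, name, node, perms, shared):
        self.name, self.node = name, node
        self.perms, self.shared = perms, shared
        self.active = False  # permissions are not in effect during migration

    def resume_after_migration(self):
        # Stand-in for blk_resume_after_migration(): enable real permissions.
        if not self.active:
            self.node.attach(self.name, self.perms, self.shared)
            self.active = True


def finish_incoming_migration(blk, *, postcopy, with_fix):
    # Assumption: the non-postcopy paths already call the resume function;
    # the postcopy path only does so once the fix is applied.
    if not postcopy or with_fix:
        blk.resume_after_migration()


def backup_is_blocked(*, postcopy, with_fix):
    node = Node("drive1")
    guest = BlockBackend("disk1", node,
                         perms={"consistent-read", "write"},
                         shared={"consistent-read"})  # no 'write' for others
    finish_incoming_migration(guest, postcopy=postcopy, with_fix=with_fix)
    try:
        node.attach("backup-job", {"write"}, ALL)  # job writes to the node
    except PermissionError:
        return True
    return False
```

In this model, backup_is_blocked(postcopy=True, with_fix=False) returns False because the guest device never registered its permissions, while the other combinations return True.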

Comment 1 xianwang 2017-05-22 03:41:20 UTC
Hi, Kevin,
After reviewing the bug description, I am not clear on how to reproduce this bug. Could you give the steps for reproducing or verifying it? Thanks.

Comment 2 Kevin Wolf 2017-05-22 08:49:11 UTC
Essentially just do the same op blocker tests as in bug 1293975, just after
postcopy live migration.

Comment 3 Miroslav Rezanina 2017-05-23 08:16:22 UTC
Fix included in qemu-kvm-rhev-2.9.0-6.el7

Comment 5 xianwang 2017-05-26 09:15:01 UTC
Verification passed for the two scenarios below, but I am not sure whether the result of the second scenario is expected.

Host:
3.10.0-671.el7.x86_64
qemu-kvm-rhev-2.9.0-6.el7.x86_64
seabios-1.10.2-3.el7.x86_64

scenario I:
# /usr/libexec/qemu-kvm -name vm1 -m 4096 -smp 2 -drive node-name=disk1,if=none,cache=none,media=disk,format=qcow2,werror=stop,rerror=stop,file=/root/mount_point/rhel73-64-virtio.qcow2 -device virtio-blk-pci,drive=disk1,id=virtio-blk-0 -device virtio-blk-pci,drive=disk1,id=virtio-blk-1 -monitor stdio
QEMU 2.9.0 monitor - type 'help' for more information
(qemu) qemu-kvm: -device virtio-blk-pci,drive=disk1,id=virtio-blk-1: Conflicts with use by /machine/peripheral/virtio-blk-0/virtio-backend as 'root', which does not allow 'write' on disk1

# /usr/libexec/qemu-kvm -name vm1 -m 4096 -smp 2 -drive node-name=disk1,if=none,cache=none,media=disk,format=qcow2,werror=stop,rerror=stop,file=/root/mount_point/rhel73-64-virtio.qcow2 -device virtio-blk-pci,drive=disk1,id=virtio-blk-0,share-rw=on -device virtio-blk-pci,drive=disk1,id=virtio-blk-1,share-rw=on -monitor stdio
QEMU 2.9.0 monitor - type 'help' for more information
(qemu) VNC server running on ::1:5900

scenario II:
src host:
# /usr/libexec/qemu-kvm -name vm1 -m 4096 -smp 2 -drive id=drive0,if=none,cache=none,media=disk,format=qcow2,werror=stop,rerror=stop,file=/root/rhel74-64-virtio.qcow2 -device virtio-blk-pci,drive=drive0,id=disk0 -drive id=drive1,if=none,cache=none,media=disk,format=qcow2,werror=stop,rerror=stop,file=/root/r1.qcow2 -device virtio-blk-pci,drive=drive1,id=disk1 -monitor stdio -qmp tcp:0:8881,server,nowait -vnc :1 -netdev tap,id=hostnet0,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=70:e2:84:14:0e:15

dst host:
# /usr/libexec/qemu-kvm -name vm1 -m 4096 -smp 2 -drive id=drive0,if=none,cache=none,media=disk,format=qcow2,werror=stop,rerror=stop,file=/root/mount_point/rhel74-64-virtio.qcow2 -device virtio-blk-pci,drive=drive0,id=disk0 -drive id=drive1,if=none,cache=none,media=disk,format=qcow2,werror=stop,rerror=stop,file=/root/mount_point/r1.qcow2 -device virtio-blk-pci,drive=drive1,id=disk1 -monitor stdio -qmp tcp:0:8881,server,nowait -vnc :1 -netdev tap,id=hostnet0,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=70:e2:84:14:0e:15 -incoming tcp:0:5801

1. In guest:
(1) execute a program to generate dirty pages
(2) stress --cpu 8 --io 8 --vm 5 --vm-bytes 256M
2. In src host:
(qemu) migrate_set_capability postcopy-ram on
(qemu) migrate -d tcp:10.66.10.208:5801
After generating some dirty pages, switch to postcopy mode:
(qemu) migrate_start_postcopy
At the same time, in src host, execute the following operations, regardless of whether migration has completed:
{"execute":"qmp_capabilities"}
{"return": {}}
{"execute": "blockdev-backup", "arguments": {"device": "drive0", "target": "drive1","sync": "full"}}
{"error": {"class": "GenericError", "desc": "Conflicts with use by drive1 as 'root', which does not allow 'write' on #block333"}}
{"execute": "blockdev-mirror", "arguments": {"device": "drive0", "target": "drive1","sync": "full"}}
{"error": {"class": "GenericError", "desc": "Conflicts with use by drive1 as 'root', which does not allow 'write' on #block333"}}
{"execute": "blockdev-snapshot-sync","arguments":{"device":"drive1","snapshot-file":"sn1","mode":"absolute-paths","format":"qcow2"}}
{"return": {}}
{"execute":"drive-mirror","arguments":{"device":"drive1","target":"m_top","format":"qcow2","mode":"absolute-paths","sync":"top"}}
{"timestamp": {"seconds": 1495788291, "microseconds": 250747}, "event": "BLOCK_JOB_READY", "data": {"device": "drive1", "len": 0, "offset": 0, "speed": 0, "type": "mirror"}}
{"return": {}}
{"execute": "block-job-complete", "arguments":{"device": "drive1"}}
{"return": {}}
{"timestamp": {"seconds": 1495788308, "microseconds": 4628}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive1", "len": 0, "offset": 0, "speed": 0, "type": "mirror"}}

{"execute": "blockdev-backup", "arguments": {"device": "drive0", "target": "drive1","sync": "full"}}
{"error": {"class": "GenericError", "desc": "Conflicts with use by drive1 as 'root', which does not allow 'write' on #block1885"}}
{"execute": "blockdev-mirror", "arguments": {"device": "drive0", "target": "drive1","sync": "full"}}
{"error": {"class": "GenericError", "desc": "Conflicts with use by drive1 as 'root', which does not allow 'write' on #block1885"}}
{"timestamp": {"seconds": 1495788418, "microseconds": 574931}, "event": "SHUTDOWN"}

For scenario II, the "blockdev-backup" and "blockdev-mirror" operations fail, but "blockdev-snapshot-sync" and "drive-mirror" succeed.
Kevin, is this test result expected? And is this bug fixed?
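
For anyone scripting the checks above, the QMP replies could be classified along these lines. This is only a sketch: the helper names first_reply and is_op_blocker_error are made up here, and matching on the "Conflicts with use by" prefix is an assumption based solely on the error text seen in this transcript.

```python
import json


def first_reply(lines):
    """Return the first command reply in `lines`, skipping async events.

    QMP events carry an "event" key, while command replies carry either
    "return" or "error".
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        msg = json.loads(line)
        if "return" in msg or "error" in msg:
            return msg
    return None


def is_op_blocker_error(reply):
    """True if the reply looks like the op blocker conflict seen above."""
    err = reply.get("error") if reply else None
    # Assumption: the description prefix is stable enough to match on.
    return bool(err) and err.get("desc", "").startswith("Conflicts with use by")
```

A reply such as the one for "blockdev-backup" above would be classified as an op blocker error, while the {"return": {}} that follows the BLOCK_JOB_READY event would not.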

Comment 6 Kevin Wolf 2017-05-26 09:26:30 UTC
Yes, this is correct behaviour. There is no reason for blockdev-snapshot-sync or
drive-mirror to be blocked because both commands do not change the disk content
that the guest sees.
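
As a rough rule of thumb for test scripts, the expectation in this comment could be written down as follows. This is a sketch derived only from the scenario II results and the explanation here, not from QEMU's actual permission logic; the function name expected_blocked and the guest_nodes argument are hypothetical.

```python
# Commands whose "target" argument names an existing node that the job
# writes into. For drive-mirror, "target" is a file name instead, and
# blockdev-snapshot-sync has no "target" argument at all.
WRITES_TO_TARGET_NODE = {"blockdev-backup", "blockdev-mirror"}


def expected_blocked(command, arguments, guest_nodes):
    """True if the command would overwrite a node used by a guest device."""
    return (command in WRITES_TO_TARGET_NODE
            and arguments.get("target") in guest_nodes)
```

With guest_nodes set to {"drive0", "drive1"}, this marks the two "blockdev-backup" and "blockdev-mirror" calls from scenario II as expected failures and the other two commands as expected successes.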

Comment 7 xianwang 2017-05-26 09:40:47 UTC
(In reply to Kevin Wolf from comment #6)
> Yes, this is correct behaviour. There is no reason for blockdev-snapshot-sync
> or drive-mirror to be blocked because both commands do not change the disk
> content that the guest sees.

OK, thanks for the timely reply, Kevin. I will set this bug to verified.

Comment 8 Kevin Wolf 2017-05-31 09:59:47 UTC
*** Bug 1455986 has been marked as a duplicate of this bug. ***

Comment 9 IBM Bug Proxy 2017-05-31 10:20:20 UTC
Created attachment 1283692
qemu log

Comment 11 errata-xmlrpc 2017-08-02 04:41:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392