Bug 1506494 - Use pause-before-switchover migration capability
Summary: Use pause-before-switchover migration capability
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.5
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: rc
: ---
Assignee: Jiri Denemark
QA Contact: zhe peng
URL:
Whiteboard:
Keywords:
Depends On: 1497120
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-10-26 08:28 UTC by Jiri Denemark
Modified: 2018-04-10 11:00 UTC (History)
15 users (show)

(edit)
Clone Of: 1497120
(edit)
Last Closed: 2018-04-10 10:59:09 UTC


Attachments (Terms of Use)


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2018:0704 None None None 2018-04-10 11:00 UTC

Description Jiri Denemark 2017-10-26 08:28:41 UTC
The following QEMU bug can only be fixed in cooperation with libvirt. This BZ tracks the libvirt changes needed to utilize the new migration capability and states.

+++ This bug was initially created as a clone of Bug #1497120 +++

Description of problem:
There's a race between the end of migration (and it inactivating disks) and the end of drive-mirroring that can lead to a fatal assert of the source.
It seems rare to trigger, but the theory is that it happens if there's write activity right near the end of migration as we stop the CPUs then those writes will take a little time to bubble through the drive-mirror, but if we hit the end of the migration code before then we end up calling bdrv_inactivate_all before those writes are done.

Version-Release number of selected component (if applicable):
Found upstream on >2.10, bz 1456086 might be this and hit it on 2.9

How reproducible:
Reported upstream, possible we've seen it once

Steps to Reproduce:
1. Not yet known

Actual results:
Source crashes with bdrv_co_do_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed

Expected results:
A successful migration

Additional info:
This was reported upstream by wangjie88@huawei.com but could have also been https://bugzilla.redhat.com/show_bug.cgi?id=1456086  which was closed because we didn't find a way to reproduce it

Current discussions with kwolf are that we're going to need some changes at the visible interface to libvirt to fix this; e.g. a new migration state and extra command.

--- Additional comment from Dr. David Alan Gilbert on 2017-10-25 16:25:54 UTC ---

upstream commits 93fbd0314ec060ffaf90..0331c8cabf6168aa263a

Comment 1 Jiri Denemark 2017-10-26 08:48:40 UTC
This is now implemented upstream by

commit b5ec33b832d8e3703cddc6598314de24344999c3
Refs: v3.8.0-243-gb5ec33b83
Author:     Jiri Denemark <jdenemar@redhat.com>
AuthorDate: Fri Oct 20 09:17:09 2017 +0200
Commit:     Jiri Denemark <jdenemar@redhat.com>
CommitDate: Thu Oct 26 10:36:02 2017 +0200

    qemu: Add support for migrate-continue QMP command

    Signed-off-by: Jiri Denemark <jdenemar@redhat.com>

commit 6addde24bee54f2502a942b6cc4a5e61f5e6ba4a
Refs: v3.8.0-244-g6addde24b
Author:     Jiri Denemark <jdenemar@redhat.com>
AuthorDate: Fri Oct 20 10:11:32 2017 +0200
Commit:     Jiri Denemark <jdenemar@redhat.com>
CommitDate: Thu Oct 26 10:36:02 2017 +0200

    qemu: Add pause-before-switchover migration capability

    This new capability enables a pause before device state serialization so
    that we can finish all block jobs without racing with the end of the
    migration. The pause is indicated by "pre-switchover" state. Once we're
    done QEMU enters "device" migration state.

    This patch just defines the new capability and QEMU migration states and
    their mapping to our job states.

    Signed-off-by: Jiri Denemark <jdenemar@redhat.com>

commit 32c29f10db3f7e847508564d8d91848f41f43abb
Refs: v3.8.0-245-g32c29f10d
Author:     Jiri Denemark <jdenemar@redhat.com>
AuthorDate: Fri Oct 20 10:12:21 2017 +0200
Commit:     Jiri Denemark <jdenemar@redhat.com>
CommitDate: Thu Oct 26 10:36:02 2017 +0200

    qemu: Enabled pause-before-switchover migration capability

    QEMU identified a race condition between the device state serialization
    and the end of storage migration. Both QEMU and libvirt needs to be
    updated to fix this.

    Our migration work flow is modified so that after starting the migration
    we to wait for QEMU to enter "pre-switchover", "postcopy-active", or
    "completed" state. Once there, we cancel all block jobs as usual. But if
    QEMU is in "pre-switchover", we need to resume the migration afterwards
    and wait again for the real end (either "postcopy-active" or
    "completed" state).

    Old QEMU will just enter either "postcopy-active" or "completed"
    directly, which is still correctly handled even by new libvirt. The
    "pre-switchover" state will only be entered if QEMU supports it and the
    pause-before-switchover capability was enabled. Thus all combinations of
    libvirt and QEMU will work, but only new QEMU with new libvirt will
    avoid the race condition.

    Signed-off-by: Jiri Denemark <jdenemar@redhat.com>

Comment 3 zhe peng 2017-12-20 10:05:07 UTC
I can't reproduce this bug.

test with build:
libvirt-3.9.0-5.el7.x86_64
qemu-kvm-rhev-2.10.0-9.el7.x86_64

check the work flow:
The precopy flow is:
active->pre-switchover->device->completed

The postcopy flow is:
active->pre-switchover->postcopy-active->completed

s1: pre-copy migration
# virsh migrate --live rhel7.3 qemu+ssh://$target_host/system --verbose
Migration: [100 %]

check source libvirtd.log
2017-12-20 08:10:51.870+0000: 1712: info : qemuMonitorIOProcess:439 : QEMU_MONITOR_IO_PROCESS: mon=0x7fe65c000fb0 buf={"return": [{"state": false, "capability": "xbzrle"}, {"state": false, "capab      ility": "rdma-pin-all"}, {"state": false, "capability": "auto-converge"}, {"state": false, "capability": "zero-blocks"}, {"state": false, "capability": "compress"}, {"state": true, "capability":       "events"}, {"state": false, "capability": "postcopy-ram"}, {"state": false, "capability": "x-colo"}, {"state": false, "capability": "release-ram"}, {"state": false, "capability": "return-path"},       {"state": false, "capability": "pause-before-switchover"}], "id": "libvirt-2"}

-----
can see default capability "pause-before-switchover" is disable
-----

2017-12-20 08:10:51.870+0000: 1782: debug : qemuDomainCheckMigrationCapabilities:10741 : Found migration capability: 'pause-before-switchover'
-----
can found "pause-before-switchover"
-----
.....

2017-12-20 08:11:47.010+0000: 1714: debug : qemuMonitorSetMigrationCapability:3962 : capability=pause-before-switchover, state=1

2017-12-20 08:11:47.010+0000: 1714: info : qemuMonitorSend:1064 : QEMU_MONITOR_SEND_MSG: mon=0x7fe65c000fb0 msg={"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"pause-before-switchover","state":true}]},"id":"libvirt-20"}

.....

2017-12-20 08:11:53.711+0000: 1712: debug : qemuMonitorJSONIOProcessLine:193 : Line [{"timestamp": {"seconds": 1513757513, "microseconds": 711606}, "event": "MIGRATION", "data": {"status": "pre-switchover"}}]

2017-12-20 08:11:53.711+0000: 1712: debug : virJSONValueToString:1914 : result={"status":"pre-switchover"}

2017-12-20 08:11:53.711+0000: 1712: debug : qemuProcessHandleMigrationStatus:1644 : Migration of domain 0x7fe66437e190 rhel7.3 changed state to pre-switchover

-----
change to pre-switchover
-----

2017-12-20 08:11:53.712+0000: 1712: info : qemuMonitorIOProcess:439 : QEMU_MONITOR_IO_PROCESS: mon=0x7fe65c000fb0 buf={"return": {"expected-downtime": 300, "status": "pre-switchover", "setup-time      ": 7, "total-time": 6691, "ram": {"total": 1183662080, "postcopy-requests": 0, "dirty-sync-count": 2, "page-size": 4096, "remaining": 8675328, "mbps": 939.060971, "transferred": 788080284, "duplicate": 97166, "dirty-pages-rate": 43584, "skipped": 0, "normal-bytes": 785670144, "normal": 191814}}, "id": "libvirt-38"}

2017-12-20 08:11:53.712+0000: 1714: debug : qemuMonitorJSONCommandWithFd:298 : Send command '{"execute":"migrate-continue","arguments":{"state":"pre-switchover"},"id":"libvirt-39"}' for write with FD -1

2017-12-20 08:11:53.712+0000: 1714: info : qemuMonitorSend:1064 : QEMU_MONITOR_SEND_MSG: mon=0x7fe65c000fb0 msg={"execute":"migrate-continue","arguments":{"state":"pre-switchover"},"id":"libvirt-39"}

s2: post-copy migration
# virsh migrate --live rhel7.3 qemu+ssh://$target_host/system --postcopy --postcopy-after-precopy --verbose 
Migration: [100 %]

2017-12-20 08:56:54.821+0000: 1716: debug : qemuMonitorSetMigrationCapability:3962 : capability=postcopy-ram, state=1
4175 2017-12-20 08:56:54.821+0000: 1716: info : qemuMonitorSend:1064 : QEMU_MONITOR_SEND_MSG: mon=0x7fe6c000f740 msg={"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"postcopy-ram","state":true}]},"id":"libvirt-27"}

2017-12-20 08:56:54.821+0000: 1712: info : qemuMonitorIOWrite:544 : QEMU_MONITOR_IO_WRITE: mon=0x7fe6c000f740 buf={"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability": "postcopy-ram","state":true}]},"id":"libvirt-27"}
4320 2017-12-20 08:56:54.822+0000: 1716: debug : qemuMonitorSetMigrationCapability:3962 : capability=pause-before-switchover, state=1

2017-12-20 08:56:54.822+0000: 1716: debug : qemuMonitorJSONCommandWithFd:298 : Send command '{"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"pause-before-switchover","state":true}]},"id":"libvirt-28"}' for write with FD -1

2017-12-20 08:56:54.822+0000: 1716: info : qemuMonitorSend:1064 : QEMU_MONITOR_SEND_MSG: mon=0x7fe6c000f740 msg={"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"pause-before-switchover","state":true}]},"id":"libvirt-28"}

2017-12-20 08:56:54.823+0000: 1712: info : qemuMonitorIOWrite:544 : QEMU_MONITOR_IO_WRITE: mon=0x7fe6c000f740 buf={"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"pause-before-switchover","state":true}]},"id":"libvirt-28"}

2017-12-20 08:57:01.400+0000: 1714: debug : qemuDomainMigrateStartPostCopy:13690 : Starting post-copy

2017-12-20 08:57:01.400+0000: 1714: info : qemuMonitorSend:1064 : QEMU_MONITOR_SEND_MSG: mon=0x7fe6c000f740 msg={"execute":"migrate-start-postcopy","id":"libvirt-46"}

2017-12-20 08:57:01.457+0000: 1712: info : qemuMonitorIOProcess:439 : QEMU_MONITOR_IO_PROCESS: mon=0x7fe6c000f740 buf={"timestamp": {"seconds": 1513760221, "microseconds": 457307}, "event": "MIGRATION", "data": {"status": "pre-switchover"}}

-----
start postcopy, then hit the pre-switchover
-----

2017-12-20 08:57:01.457+0000: 1712: debug : virJSONValueToString:1914 : result={"status":"pre-switchover"}

2017-12-20 08:57:01.457+0000: 1712: debug : qemuMonitorEmitMigrationStatus:1599 : mon=0x7fe6c000f740, status=pre-switchover

2017-12-20 08:57:01.457+0000: 1712: debug : qemuProcessHandleMigrationStatus:1644 : Migration of domain 0x7fe66437e190 rhel7.3 changed state to pre-switchover

2017-12-20 08:57:01.457+0000: 1716: debug : qemuMigrationCompleted:1534 : Migration paused before switchover


2017-12-20 08:57:01.458+0000: 1712: info : qemuMonitorIOProcess:439 : QEMU_MONITOR_IO_PROCESS: mon=0x7fe6c000f740 buf={"return": {"expected-downtime": 1567, "status": "pre-switchover", "setup-time": 7, "total-time": 6625, "ram": {"total": 1183662080, "postcopy-requests": 0, "dirty-sync-count": 2, "page-size": 4096, "remaining": 68734976, "mbps": 979.37469, "transferred": 780253359, "duplicate": 100746, "dirty-pages-rate": 46863, "skipped": 0, "normal-bytes": 777826304, "normal": 189899}}, "id": "libvirt-47"}

2017-12-20 08:57:01.458+0000: 1716: debug : qemuMonitorMigrateContinue:4277 : status=pre-switchover

2017-12-20 08:57:01.458+0000: 1716: info : qemuMonitorSend:1064 : QEMU_MONITOR_SEND_MSG: mon=0x7fe6c000f740 msg={"execute":"migrate-continue","arguments":{"state":"pre-switchover"},"id":"libvirt-48"}

2017-12-20 08:57:01.458+0000: 1712: info : qemuMonitorIOWrite:544 : QEMU_MONITOR_IO_WRITE: mon=0x7fe6c000f740 buf={"execute":"migrate-continue","arguments":{"state":"pre-switchover"},"id":"libvirt-48"}

2017-12-20 08:57:01.459+0000: 1712: debug : qemuProcessHandleMigrationStatus:1644 : Migration of domain 0x7fe66437e190 rhel7.3 changed state to postcopy-active


2017-12-20 08:57:01.459+0000: 1716: debug : qemuMigrationCompleted:1544 : Migration switched to post-copy

s3: storage migration
# virsh migrate --live rhel7.3 qemu+ssh://$target_host/system --copy-storage-all --verbose 
Migration: [100 %]

check source libvirtd log
2017-12-20 09:32:32.853+0000: 1715: debug : qemuMonitorDriveMirror:3344 : device=drive-virtio-disk0, file=nbd:$target_host:49153:exportname=drive-virtio-disk0, format=raw, bandwidth=9223372036853727232, granularity=0, buf_size=0, flags=0x2

2017-12-20 09:32:32.853+0000: 1715: info : qemuMonitorSend:1064 : QEMU_MONITOR_SEND_MSG: mon=0x7fe6c8011460 msg={"execute":"drive-mirror","arguments":{"device":"drive-virtio-disk0","target":"nbd:$target_host:49153:exportname=drive-virtio-disk0","speed":9223372036853727232,"sync":"full","mode":"existing","format":"raw"},"id":"libvirt-42"}

2017-12-20 09:32:32.853+0000: 1712: info : qemuMonitorIOWrite:544 : QEMU_MONITOR_IO_WRITE: mon=0x7fe6c8011460 buf={"execute":"drive-mirror","arguments":{"device":"drive-virtio-disk0","target":"nbd:$target_host:49153:exportname=drive-virtio-disk0","speed":9223372036853727232,"sync":"full","mode":"existing","format":"raw"},"id":"libvirt-42"}


2017-12-20 09:33:55.225+0000: 1715: debug : qemuMonitorSetMigrationCapability:3962 : capability=pause-before-switchover, state=1


2017-12-20 09:33:55.225+0000: 1715: info : qemuMonitorSend:1064 : QEMU_MONITOR_SEND_MSG: mon=0x7fe6c8011460 msg={"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"pause-before-switchover","state":true}]},"id":"libvirt-213"}


2017-12-20 09:33:55.225+0000: 1712: info : qemuMonitorIOWrite:544 : QEMU_MONITOR_IO_WRITE: mon=0x7fe6c8011460 buf={"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"pause-before-switchover","state":true}]},"id":"libvirt-213"}

2017-12-20 09:34:01.965+0000: 1712: debug : qemuProcessHandleMigrationStatus:1644 : Migration of domain 0x7fe66437e190 rhel7.3 changed state to pre-switchover

2017-12-20 09:34:01.965+0000: 1715: debug : qemuMigrationCompleted:1534 : Migration paused before switchover

2017-12-20 09:34:01.966+0000: 1712: info : qemuMonitorIOProcess:439 : QEMU_MONITOR_IO_PROCESS: mon=0x7fe6c8011460 buf={"return": {"expected-downtime": 300, "status": "pre-switchover", "setup-time": 7, "total-time": 6730, "ram": {"total": 1183662080, "postcopy-requests": 0, "dirty-sync-count": 2, "page-size": 4096, "remaining": 29597696, "mbps": 942.506621, "transferred": 792048355, "dupl      icate": 96197, "dirty-pages-rate": 44124, "skipped": 0, "normal-bytes": 789639168, "normal": 192783}}, "id": "libvirt-232"}

2017-12-20 09:34:01.967+0000: 1712: info : qemuMonitorIOWrite:544 : QEMU_MONITOR_IO_WRITE: mon=0x7fe6c8011460 buf={"execute":"block-job-cancel","arguments":{"device":"drive-virtio-disk0"},"id":"libvirt-233"}

2017-12-20 09:34:01.985+0000: 1715: info : qemuMonitorSend:1064 : QEMU_MONITOR_SEND_MSG: mon=0x7fe6c8011460 msg={"execute":"migrate-continue","arguments":{"state":"pre-switchover"},"id":"libvirt-236"}

2017-12-20 09:34:02.249+0000: 1712: info : qemuMonitorIOProcess:439 : QEMU_MONITOR_IO_PROCESS: mon=0x7fe6c8011460 buf={"timestamp": {"seconds": 1513762442, "microseconds": 248806}, "event": "MIGRATION", "data": {"status": "completed"}}

Comment 4 zhe peng 2017-12-26 03:06:54 UTC
Do some free testing, no issues found, move to verified.

Comment 8 errata-xmlrpc 2018-04-10 10:59:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0704


Note You need to log in before you can comment on or make changes to this bug.