Bug 1568407 - Guest is sometimes left paused on the source host if the source libvirtd is killed during live migration, due to QEMU image locking
Summary: Guest is sometimes left paused on the source host if the source libvirtd is killed during live migration, due to QEMU image locking
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.5
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Jiri Denemark
QA Contact: Fangge Jin
URL:
Whiteboard:
Depends On: 1560854
Blocks:
 
Reported: 2018-04-17 12:40 UTC by Jiri Denemark
Modified: 2018-10-30 09:55 UTC
CC List: 23 users

Fixed In Version: libvirt-4.5.0-1.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1560854
Environment:
Last Closed: 2018-10-30 09:53:26 UTC
Target Upstream Version:
Embargoed:


Attachments:


Links:
Red Hat Product Errata RHSA-2018:3113 - Last Updated: 2018-10-30 09:55:50 UTC

Description Jiri Denemark 2018-04-17 12:40:15 UTC
+++ This bug was initially created as a clone of Bug #1560854 +++

Description of problem:
Guest is sometimes left paused on the source host if the source libvirtd is killed during live migration

Version-Release number of selected component (if applicable):
libvirt-3.9.0-14.el7_5.2.x86_64
qemu-kvm-rhev-2.10.0-21.el7_5.1.x86_64

How reproducible:
>50%

Steps to Reproduce:
1. Start a guest on the source host.

2. Monitor guest events:
# virsh event foo --all --loop --timestamp

3. Do live migration:
# virsh -c qemu+ssh://intel-i72600-03.qe.lab.eng.nay.redhat.com/system "migrate-setmaxdowntime foo 20000;migrate foo qemu+ssh://intel-e52650-16-2.englab.nay.redhat.com/system --live --verbose"

4. Check the guest events; when the guest reports "suspended migrated", kill the source libvirtd.

5. Check the migration status; the migration fails:

error: Disconnected from qemu+ssh://intel-i72600-03.qe.lab.eng.nay.redhat.com/system due to end of file
error: End of file while reading data: Ncat: Broken pipe.: Input/output error

6. Check the guest status on the source; the guest is paused:

# virsh list

# virsh qemu-monitor-command foo '{"execute":"query-status"}'; date
{"return":{"status":"postmigrate","singlestep":false,"running":false},"id":"libvirt-23"}

7. Check the guest status on the target; the guest does not exist.

Actual results:
As in step 6, the guest is paused on the source host after the source libvirtd is killed during migration.

Expected results:
The guest is running on the source host after the source libvirtd is killed during migration.

Additional info:
Error info from libvirtd log:
2018-03-26 14:45:29.794+0000: 17083: error : qemuMonitorJSONCheckError:389 : internal error: unable to execute QEMU command 'cont': Failed to get "write" lock

--- Additional comment from Jiri Denemark on 2018-03-27 11:55:48 UTC ---

The source libvirtd was killed just after migration entered "device" state:

2018-03-26 14:45:26.335+0000: 12956: info : qemuMonitorJSONIOProcessLine:208 : QEMU_MONITOR_RECV_EVENT: mon=0x7f050431a410 event={"timestamp": {"seconds": 1522075526, "microseconds": 334814}, "event": "MIGRATION", "data": {"status": "pre-switchover"}}
2018-03-26 14:45:26.342+0000: 12960: info : qemuMonitorSend:1061 : QEMU_MONITOR_SEND_MSG: mon=0x7f050431a410 msg={"execute":"migrate-continue","arguments":{"state":"pre-switchover"},"id":"libvirt-52"}
2018-03-26 14:45:26.345+0000: 12956: info : qemuMonitorJSONIOProcessLine:208 : QEMU_MONITOR_RECV_EVENT: mon=0x7f050431a410 event={"timestamp": {"seconds": 1522075526, "microseconds": 343420}, "event": "MIGRATION", "data": {"status": "device"}}
2018-03-26 14:45:26.347+0000: 12956: info : qemuMonitorJSONIOProcessLine:208 : QEMU_MONITOR_RECV_EVENT: mon=0x7f050431a410 event={"timestamp": {"seconds": 1522075526, "microseconds": 344015}, "event": "MIGRATION_PASS", "data": {"pass": 3}}

that is, before libvirtd finished all its work on the source and before it
could notify the destination that the domain can be started there. In other
words, the domain is not able to run on the destination because, e.g., the
associated sanlock/virtlockd locks had not been transferred yet.

Once libvirtd started again on the source a bit later

2018-03-26 14:45:28.685+0000: 17002: info : libvirt version: 3.9.0, package: 14.virtcov.el7_5.2 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2018-03-20-15:13:45, x86-034.build.eng.bos.redhat.com)

it sees an unfinished migration and knows it can just cancel the migration and
resume the domain on the source because the destination domain was not started
yet:

2018-03-26 14:45:29.669+0000: 17083: debug : qemuProcessReconnect:7224 : Reconnect monitor to 0x7f8268225870 'foo'
2018-03-26 14:45:29.765+0000: 17083: info : qemuMonitorSend:1061 : QEMU_MONITOR_SEND_MSG: mon=0x55e32471cc50 msg={"execute":"query-cpus","id":"libvirt-6"}
2018-03-26 14:45:29.761+0000: 17002: info : qemuMonitorJSONIOProcessLine:213 : QEMU_MONITOR_RECV_REPLY: mon=0x55e32471cc50 reply={"return": {"status": "postmigrate", "singlestep": false, "running": false}, "id": "libvirt-4"}
2018-03-26 14:45:29.782+0000: 17083: debug : qemuProcessRecoverMigrationOut:3110 : Cancelling unfinished migration of domain foo
2018-03-26 14:45:29.782+0000: 17083: info : qemuMonitorSend:1061 : QEMU_MONITOR_SEND_MSG: mon=0x55e32471cc50 msg={"execute":"migrate_cancel","id":"libvirt-13"}
2018-03-26 14:45:29.783+0000: 17002: info : qemuMonitorJSONIOProcessLine:213 : QEMU_MONITOR_RECV_REPLY: mon=0x55e32471cc50 reply={"return": {}, "id": "libvirt-13"}
2018-03-26 14:45:29.785+0000: 17083: info : qemuMonitorSend:1061 : QEMU_MONITOR_SEND_MSG: mon=0x55e32471cc50 msg={"execute":"cont","id":"libvirt-14"}
2018-03-26 14:45:29.793+0000: 17002: info : qemuMonitorJSONIOProcessLine:213 : QEMU_MONITOR_RECV_REPLY: mon=0x55e32471cc50 reply={"id": "libvirt-14", "error": {"class": "GenericError", "desc": "Failed to get \"write\" lock"}}
2018-03-26 14:45:29.794+0000: 17083: warning : qemuProcessRecoverMigrationOut:3158 : Could not resume domain foo

QEMU finished the migration in the meantime and the QEMU process on the
destination apparently acquired write locks for the disk images; because it
is still running, the QEMU process on the source cannot reacquire them. The
logs on the destination confirm this:

2018-03-26 14:45:25.961+0000: 13131: info : qemuMonitorSend:1061 : QEMU_MONITOR_SEND_MSG: mon=0x7f8ba40116d0 msg={"execute":"migrate-incoming","arguments":{"uri":"tcp:[::]:49152"},"id":"libvirt-18"}
2018-03-26 14:45:25.962+0000: 13129: info : qemuMonitorJSONIOProcessLine:208 : QEMU_MONITOR_RECV_EVENT: mon=0x7f8ba40116d0 event={"timestamp": {"seconds": 1522075525, "microseconds": 962068}, "event": "MIGRATION", "data": {"status": "setup"}}
2018-03-26 14:45:26.018+0000: 13129: info : qemuMonitorJSONIOProcessLine:208 : QEMU_MONITOR_RECV_EVENT: mon=0x7f8ba40116d0 event={"timestamp": {"seconds": 1522075526, "microseconds": 18660}, "event": "MIGRATION", "data": {"status": "active"}}
2018-03-26 14:45:28.558+0000: 13129: info : qemuMonitorJSONIOProcessLine:208 : QEMU_MONITOR_RECV_EVENT: mon=0x7f8ba40116d0 event={"timestamp": {"seconds": 1522075528, "microseconds": 558129}, "event": "MIGRATION", "data": {"status": "completed"}}

After a while the client controlling this non-p2p migration notices the
migration was interrupted and tells the destination to kill the QEMU process:

2018-03-26 14:45:31.726+0000: 13132: debug : virDomainMigrateFinish3Params:5010 : dconn=0x7f8b9c016bf0, params=0x7f8b9c005b00, nparams=3, cookiein=(nil), cookieinlen=0, cookieout=0x7f8bb26bdb00, cookieoutlen=0x7f8bb26bdaf4, flags=0x1, cancelled=1
2018-03-26 14:45:31.727+0000: 13132: info : qemuMonitorSend:1061 : QEMU_MONITOR_SEND_MSG: mon=0x7f8ba40116d0 msg={"execute":"query-status","id":"libvirt-19"}
2018-03-26 14:45:31.728+0000: 13129: info : qemuMonitorJSONIOProcessLine:213 : QEMU_MONITOR_RECV_REPLY: mon=0x7f8ba40116d0 reply={"return": {"status": "paused", "singlestep": false, "running": false}, "id": "libvirt-19"}
2018-03-26 14:45:31.728+0000: 13132: debug : qemuProcessStop:6422 : Shutting down vm=0x7f8ba400a8b0 name=foo id=92 pid=6051, reason=failed, asyncJob=migration in, flags=0x1
2018-03-26 14:45:31.730+0000: 13132: debug : qemuProcessKill:6338 : vm=0x7f8ba400a8b0 name=foo pid=6051 flags=0x5
2018-03-26 14:45:31.954+0000: 13132: error : qemuMigrationFinish:5511 : migration successfully aborted

The end result is that the domain stays paused on the source even though it
was killed on the destination and libvirt knows it is safe to resume the
domain on the source. The domain on the source should stay paused only if
libvirt already told the destination that the migration finished but did not
receive any reply. In this split-brain case the domain can be left paused on
both sides, or it may be paused on the source and running on the destination.
It is up to the higher management layer to decide what to do with both
domains.

In other words, QEMU should never acquire write locks by itself before we
tell it the migration completed (either by the "cont" command or by a special
command introduced for this case if needed), since it is not allowed to start
or write to the disks before then. As I said, until "cont" is sent to QEMU,
sanlock locks may not have been transferred yet and the destination process
can be killed at any time while the source still holds the current state and
can be safely resumed.

And before anyone comes up with the idea: we can't do some kind of coordinated
killing of the domains to make sure the domain on the destination is killed
before the one on the source is resumed. This whole scenario is about
recovering from network (or other) failures, in which case there is no
connection (either direct or indirect) between the source and the destination
libvirt daemons, which would be needed for such coordination.

--- Additional comment from Dr. David Alan Gilbert on 2018-03-28 10:43:27 UTC ---

OK, after discussions with Fam I think I'm reasonably happy.
'cont' already includes a bdrv_invalidate_cache_all which will grab the lock - so that means we don't need to change cont which makes me happier.
All I have to do is avoid calling it at the end of migration in the case where we don't autostart.

Comment 1 Jiri Denemark 2018-04-17 12:42:45 UTC
We'll need to enable a new migration capability (late-block-activate) to request delayed activation of block devices.
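
For illustration only (libvirt enables this automatically on the destination
once it detects support; the manual QMP call below is just a sketch of what
gets requested, with "foo" as an example domain name):

# virsh qemu-monitor-command foo '{"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"late-block-activate","state":true}]}}'

The capability has to be set on the destination QEMU before the incoming
migration starts; with it enabled, QEMU leaves block-device activation to the
later "cont".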

Comment 3 Jiri Denemark 2018-06-04 14:52:22 UTC
Patch sent upstream for review: https://www.redhat.com/archives/libvir-list/2018-June/msg00228.html

Comment 4 Jiri Denemark 2018-06-05 07:44:49 UTC
Fixed upstream by

commit 4370ac84f86e8f2d71a1d1e1cc65f7238359c36e
Refs: v4.4.0-89-g4370ac84f8
Author:     Jiri Denemark <jdenemar>
AuthorDate: Tue Apr 17 14:46:29 2018 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Tue Jun 5 09:39:24 2018 +0200

	qemu: Fix domain resume after failed migration

    Libvirt relies on being able to kill the destination domain and resume
    the source one during migration until we called "cont" on the
    destination. Unfortunately, QEMU automatically activates block devices
    at the end of migration even when it's called with -S. This wasn't a big
    issue in the past since the guest is not running and thus no data are
    written to the block devices. However, when QEMU introduced its internal
    block device locks, we can no longer resume the source domain once the
    destination domain already activated the block devices (and thus
    acquired all locks) unless the destination domain is killed first.

    Since it's impossible to synchronize the destination and the source
    libvirt daemons after a failed migration, QEMU introduced a new
    migration capability called "late-block-activate" which ensures QEMU
    won't activate block devices until it gets "cont". The only thing we
    need to do is to enable this capability whenever QEMU supports it.

    https://bugzilla.redhat.com/show_bug.cgi?id=1568407

    QEMU commit implementing the capability: v2.12.0-952-g0f073f44df

    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Ján Tomko <jtomko>
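
As a rough manual check (libvirt probes this itself; "foo" is again just an
example domain name), a QEMU new enough for the fix lists the capability in
the output of:

# virsh qemu-monitor-command foo '{"execute":"query-migrate-capabilities"}'

The returned array should contain an entry with "capability":
"late-block-activate"; older QEMU builds simply do not list it.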

Comment 6 Fangge Jin 2018-07-26 11:34:17 UTC
Verified with libvirt-4.5.0-4.el7.x86_64 and qemu-kvm-rhev-2.12.0-8.el7.x86_64.
Steps are the same as https://bugzilla.redhat.com/show_bug.cgi?id=1560854#c10, except that the breakpoint changed to qemu/qemu_migration.c:4695.
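
For reference, one plausible way to hold libvirtd at that point so it can be
killed there (a sketch only; the exact procedure is in bug 1560854 comment 10,
which is not reproduced here, and the breakpoint location assumes this libvirt
build with debuginfo installed):

# gdb -p $(pidof libvirtd)
(gdb) break qemu_migration.c:4695
(gdb) continue

Once the breakpoint fires during the migration, the source libvirtd process
can be killed at exactly that point.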

Comment 7 Fangge Jin 2018-07-27 01:44:55 UTC
Also did a regression test for cross-version migration; migration succeeds:
7.6<->7.5
7.6<->7.4

Comment 9 errata-xmlrpc 2018-10-30 09:53:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3113

