Bug 1560854 - Guest is left paused on source host sometimes if kill source libvirtd during live migration due to QEMU image locking
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.5
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Dr. David Alan Gilbert
QA Contact: Yumei Huang
URL:
Whiteboard:
Depends On:
Blocks: 1568407
 
Reported: 2018-03-27 06:43 UTC by Fangge Jin
Modified: 2018-11-01 11:07 UTC
CC List: 23 users

Fixed In Version: qemu-kvm-rhev-2.12.0-7.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1568407 (view as bug list)
Environment:
Last Closed: 2018-11-01 11:07:15 UTC
Target Upstream Version:


Attachments
libvirtd log on both hosts (212.17 KB, application/x-bzip)
2018-03-27 06:43 UTC, Fangge Jin
domain XML extracted from the logs (4.36 KB, text/plain)
2018-03-27 08:16 UTC, Jiri Denemark

Description Fangge Jin 2018-03-27 06:43:10 UTC
Created attachment 1413567 [details]
libvirtd log on both hosts

Description of problem:
Guest is left paused on source host sometimes if kill source libvirtd during live migration

Version-Release number of selected component (if applicable):
libvirt-3.9.0-14.el7_5.2.x86_64
qemu-kvm-rhev-2.10.0-21.el7_5.1.x86_64

How reproducible:
>50%

Steps to Reproduce:
1.Start a guest on source host

2.Monitor guest event:
# virsh event foo --all --loop --timestamp

3.Do live migration
# virsh -c qemu+ssh://intel-i72600-03.qe.lab.eng.nay.redhat.com/system "migrate-setmaxdowntime foo 20000; migrate foo qemu+ssh://intel-e52650-16-2.englab.nay.redhat.com/system --live --verbose"

4.Check guest event, when guest is "suspended migrated", kill source libvirtd

5.Check migration status, migration fails.

error: Disconnected from qemu+ssh://intel-i72600-03.qe.lab.eng.nay.redhat.com/system due to end of file
error: End of file while reading data: Ncat: Broken pipe.: Input/output error

6. Check guest status on source, guest is paused 

# virsh list

# virsh qemu-monitor-command foo '{"execute":"query-status"}'; date
{"return":{"status":"postmigrate","singlestep":false,"running":false},"id":"libvirt-23"}

7. Check guest status on target, guest doesn't exist

Actual results:
As shown in step 6, the guest is left paused on the source host after killing the source libvirtd during migration

Expected results:
The guest is running on the source host after killing the source libvirtd during migration

Additional info:
Error info from libvirtd log:
2018-03-26 14:45:29.794+0000: 17083: error : qemuMonitorJSONCheckError:389 : internal error: unable to execute QEMU command 'cont': Failed to get "write" lock

Comment 2 Jiri Denemark 2018-03-27 08:16:09 UTC
Created attachment 1413602 [details]
domain XML extracted from the logs

Comment 3 Jiri Denemark 2018-03-27 11:55:48 UTC
The source libvirtd was killed just after migration entered "device" state:

2018-03-26 14:45:26.335+0000: 12956: info : qemuMonitorJSONIOProcessLine:208 : QEMU_MONITOR_RECV_EVENT: mon=0x7f050431a410 event={"timestamp": {"seconds": 1522075526, "microseconds": 334814}, "event": "MIGRATION", "data": {"status": "pre-switchover"}}
2018-03-26 14:45:26.342+0000: 12960: info : qemuMonitorSend:1061 : QEMU_MONITOR_SEND_MSG: mon=0x7f050431a410 msg={"execute":"migrate-continue","arguments":{"state":"pre-switchover"},"id":"libvirt-52"}
2018-03-26 14:45:26.345+0000: 12956: info : qemuMonitorJSONIOProcessLine:208 : QEMU_MONITOR_RECV_EVENT: mon=0x7f050431a410 event={"timestamp": {"seconds": 1522075526, "microseconds": 343420}, "event": "MIGRATION", "data": {"status": "device"}}
2018-03-26 14:45:26.347+0000: 12956: info : qemuMonitorJSONIOProcessLine:208 : QEMU_MONITOR_RECV_EVENT: mon=0x7f050431a410 event={"timestamp": {"seconds": 1522075526, "microseconds": 344015}, "event": "MIGRATION_PASS", "data": {"pass": 3}}

that is before libvirtd finished all its work on the source and before it
could notify the destination that the domain can be started there. In other
words, the domain is not able to run on the destination because, e.g., the
associated sanlock/virtlockd locks were not transferred yet.

Once libvirtd started again on the source a bit later

2018-03-26 14:45:28.685+0000: 17002: info : libvirt version: 3.9.0, package: 14.virtcov.el7_5.2 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2018-03-20-15:13:45, x86-034.build.eng.bos.redhat.com)

it sees an unfinished migration and knows it can just cancel the migration and
resume the domain on the source because the destination domain was not started
yet:

2018-03-26 14:45:29.669+0000: 17083: debug : qemuProcessReconnect:7224 : Reconnect monitor to 0x7f8268225870 'foo'
2018-03-26 14:45:29.765+0000: 17083: info : qemuMonitorSend:1061 : QEMU_MONITOR_SEND_MSG: mon=0x55e32471cc50 msg={"execute":"query-cpus","id":"libvirt-6"}
2018-03-26 14:45:29.761+0000: 17002: info : qemuMonitorJSONIOProcessLine:213 : QEMU_MONITOR_RECV_REPLY: mon=0x55e32471cc50 reply={"return": {"status": "postmigrate", "singlestep": false, "running": false}, "id": "libvirt-4"}
2018-03-26 14:45:29.782+0000: 17083: debug : qemuProcessRecoverMigrationOut:3110 : Cancelling unfinished migration of domain foo
2018-03-26 14:45:29.782+0000: 17083: info : qemuMonitorSend:1061 : QEMU_MONITOR_SEND_MSG: mon=0x55e32471cc50 msg={"execute":"migrate_cancel","id":"libvirt-13"}
2018-03-26 14:45:29.783+0000: 17002: info : qemuMonitorJSONIOProcessLine:213 : QEMU_MONITOR_RECV_REPLY: mon=0x55e32471cc50 reply={"return": {}, "id": "libvirt-13"}
2018-03-26 14:45:29.785+0000: 17083: info : qemuMonitorSend:1061 : QEMU_MONITOR_SEND_MSG: mon=0x55e32471cc50 msg={"execute":"cont","id":"libvirt-14"}
2018-03-26 14:45:29.793+0000: 17002: info : qemuMonitorJSONIOProcessLine:213 : QEMU_MONITOR_RECV_REPLY: mon=0x55e32471cc50 reply={"id": "libvirt-14", "error": {"class": "GenericError", "desc": "Failed to get \"write\" lock"}}
2018-03-26 14:45:29.794+0000: 17083: warning : qemuProcessRecoverMigrationOut:3158 : Could not resume domain foo

QEMU finished the migration in the meantime, and the QEMU process on the
destination apparently acquired the write locks for the disk images; because
it is still running, the QEMU process on the source cannot reacquire them.
The logs on the destination confirm this:

2018-03-26 14:45:25.961+0000: 13131: info : qemuMonitorSend:1061 : QEMU_MONITOR_SEND_MSG: mon=0x7f8ba40116d0 msg={"execute":"migrate-incoming","arguments":{"uri":"tcp:[::]:49152"},"id":"libvirt-18"}
2018-03-26 14:45:25.962+0000: 13129: info : qemuMonitorJSONIOProcessLine:208 : QEMU_MONITOR_RECV_EVENT: mon=0x7f8ba40116d0 event={"timestamp": {"seconds": 1522075525, "microseconds": 962068}, "event": "MIGRATION", "data": {"status": "setup"}}
2018-03-26 14:45:26.018+0000: 13129: info : qemuMonitorJSONIOProcessLine:208 : QEMU_MONITOR_RECV_EVENT: mon=0x7f8ba40116d0 event={"timestamp": {"seconds": 1522075526, "microseconds": 18660}, "event": "MIGRATION", "data": {"status": "active"}}
2018-03-26 14:45:28.558+0000: 13129: info : qemuMonitorJSONIOProcessLine:208 : QEMU_MONITOR_RECV_EVENT: mon=0x7f8ba40116d0 event={"timestamp": {"seconds": 1522075528, "microseconds": 558129}, "event": "MIGRATION", "data": {"status": "completed"}}

After a while, the client controlling this non-p2p migration notices that the
migration was interrupted and tells the destination to kill the QEMU process:

2018-03-26 14:45:31.726+0000: 13132: debug : virDomainMigrateFinish3Params:5010 : dconn=0x7f8b9c016bf0, params=0x7f8b9c005b00, nparams=3, cookiein=(nil), cookieinlen=0, cookieout=0x7f8bb26bdb00, cookieoutlen=0x7f8bb26bdaf4, flags=0x1, cancelled=1
2018-03-26 14:45:31.727+0000: 13132: info : qemuMonitorSend:1061 : QEMU_MONITOR_SEND_MSG: mon=0x7f8ba40116d0 msg={"execute":"query-status","id":"libvirt-19"}
2018-03-26 14:45:31.728+0000: 13129: info : qemuMonitorJSONIOProcessLine:213 : QEMU_MONITOR_RECV_REPLY: mon=0x7f8ba40116d0 reply={"return": {"status": "paused", "singlestep": false, "running": false}, "id": "libvirt-19"}
2018-03-26 14:45:31.728+0000: 13132: debug : qemuProcessStop:6422 : Shutting down vm=0x7f8ba400a8b0 name=foo id=92 pid=6051, reason=failed, asyncJob=migration in, flags=0x1
2018-03-26 14:45:31.730+0000: 13132: debug : qemuProcessKill:6338 : vm=0x7f8ba400a8b0 name=foo pid=6051 flags=0x5
2018-03-26 14:45:31.954+0000: 13132: error : qemuMigrationFinish:5511 : migration successfully aborted

The end result is that the domain stays paused on the source even though it is
killed on the destination and libvirt knows it is safe to resume the domain on
the source. The domain on the source should stay paused only if libvirt already
told the destination that the migration finished, but didn't receive any
reply. In this split-brain case the domain can be left paused on both sides or
it may be paused on the source and running on the destination. It's up to the
higher management layer to decide what to do with both domains.

In other words, QEMU should never acquire write locks by itself before we tell
it migration completed (either by the "cont" command or by a special command
introduced for this case if needed), since it is not allowed to start or
write into the disks before then. As I said, until "cont" is sent to QEMU,
sanlock locks may not be transferred yet and the destination process can be
killed at any time while the source still holds the current state and can be
safely resumed.

And before anyone comes up with the idea: we can't do some kind of coordinated
killing of the domains to make sure the domain on the destination is killed
before the one on the source is resumed. This whole scenario is about
recovering from network (or other) failures in which case there is no
connection (either direct or indirect) between the source and the destination
libvirt daemons which would be needed for such coordination.

Comment 4 Dr. David Alan Gilbert 2018-03-27 12:24:59 UTC
Adding in kwolf, since I think he has the best grip on the locking.

What I'm not sure of is whether it's actually safe to 'cont' the source until we know the destination really is dead; I'm thinking it still has the qcow2 files open at that point - so I'm not sure.
I'd thought 'device' state would have simplified this problem since the destination now knows it's in that last phase.

I don't understand how, in this situation, you made the decision to kill the destination and restart the source, vs the opposite case of allowing the destination to finish.
Avoiding the two of them both running is the more critical case.

Comment 5 Dr. David Alan Gilbert 2018-03-27 12:30:20 UTC
Also adding in Fam

Comment 6 Jiri Denemark 2018-03-27 18:14:39 UTC
> What I'm not sure of is whether it's actually safe to 'cont' the source
> until we know the destination really is dead

Well, that's the whole point of starting the domain on the destination in the
way it does not automatically run once migration is finished, isn't it? The
destination process will have files open, but it's not supposed to touch them
until it's instructed to do so.

> I'd thought 'device' state would have simplified this problem since the
> destination now knows it's in that last phase.

This state does not really help here. Sure, we could start using it to note
that it is no longer possible to resume the source. But this is not a solution
to anything, it would just turn this bug into "can't fix" and libvirt would
just intentionally leave the domain paused instead of trying to resume it.

> I don't understand how, in this situation, you made the decision to kill the
> destination and restart the source, vs the opposite case of allowing the
> destination to finish.

It's quite simple, each side can decide this on its own depending on the
current phase of the migration.

- the source resumes the domain iff it didn't send the confirmation that the
  migration finished
- the destination kills the domain as long as it didn't get the confirmation
  from the source
- if the destination gets the confirmation, it will resume the domain (at
  which point a confirmation about it is sent back to the source)
- the source kills the domain if it got the confirmation from the
  destination

- the domain remains paused on the source if the source sent a confirmation,
  but didn't get a reply from the destination
- the domain never remains paused on the destination, it's either resumed or
  killed

Of course, a lot of these actions change into "leave the domain paused" when
the migration was in post-copy phase. And just for clarification, this is all
about recovering from a migration started earlier by a no longer running
instance of libvirtd.
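The decision rules above can be sketched as a small model. This is purely illustrative (the function names and return strings are invented here, not libvirt code), assuming boolean flags for the confirmations each side has sent or received:

```python
# Illustrative model of the recovery rules listed above -- NOT libvirt's
# actual implementation, just the decision table expressed as code.

def recover_source(sent_confirm: bool, got_reply: bool,
                   post_copy: bool = False) -> str:
    """What the source does with its paused domain after libvirtd restarts."""
    if post_copy:
        return "leave paused"   # post-copy: resuming on the source is unsafe
    if not sent_confirm:
        return "resume"         # destination was never told migration finished
    if got_reply:
        return "kill"           # destination confirmed it resumed the domain
    return "leave paused"       # split-brain: defer to the management layer

def recover_destination(got_confirm: bool) -> str:
    """The destination either resumes or kills; it never stays paused."""
    return "resume" if got_confirm else "kill"
```

The bug in this report corresponds to `recover_source(False, False)`: no confirmation was ever sent, so the source is entitled to resume, yet the `cont` fails because the destination QEMU holds the image locks.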

Comment 7 Dr. David Alan Gilbert 2018-03-28 10:43:27 UTC
OK, after discussions with Fam I think I'm reasonably happy.
'cont' already includes a bdrv_invalidate_cache_all which will grab the lock - so that means we don't need to change 'cont', which makes me happier.
All I have to do is avoid calling it at the end of migration in the case where we don't autostart.

Comment 8 Dr. David Alan Gilbert 2018-03-28 17:03:25 UTC
Posted upstream:
migration: Don't activate block devices if using -S

Comment 10 Fangge Jin 2018-03-29 03:16:46 UTC
A reliable way to reproduce this bug:

1. Attach source libvirtd to gdb and set breakpoint in function qemuMigrationPerformPhase:

# gdb -p `pidof libvirtd`
(gdb) b qemu/qemu_migration.c:5014
Breakpoint 1 at 0x7f0b71e30c1d: file qemu/qemu_migration.c, line 5014.
(gdb) c
Continuing.

2. Start a guest on source host

3. Do migration and Ctrl+Z when it is in perform phase:
# virsh -c qemu+ssh://intel-i72600-03.qe.lab.eng.nay.redhat.com/system  "migrate foo qemu+ssh://intel-e52650-16-2.englab.nay.redhat.com/system --live --verbose"
Migration: [  3 %]^Z
[1]+  Stopped

4. When source libvirtd reaches breakpoint, kill it and quit gdb
1)
Breakpoint 1, qemuMigrationPerformPhase (resource=0, flags=257, cookieoutlen=0x7f0b8364da7c, cookieout=0x7f0b8364da88, cookieinlen=299, 
    cookiein=0x7f0b5c0008c0 "<qemu-migration>\n  <name>foo</name>\n  <uuid>f16dd5c5-f0ca-40b2-82a2-38089cc0e12b</uuid>\n  <hostname>intel-e52650-16-2.englab.nay.redhat.com</hostname>\n  <hostuuid>cf1a4a00-4f88-11e3-b7a6-2c44fd2e29dc<"..., migParams=0x7f0b5c000b50, compression=0x7f0b5c000ba0, migrate_disks=0x0, nmigrate_disks=0, graphicsuri=0x0, 
    uri=0x7f0b5c000b10 "tcp:intel-e52650-16-2.englab.nay.redhat.com:49152", persist_xml=0x0, vm=0x7f0b6433f700, conn=0x7f0b60000a00, driver=0x7f0b6410b2f0) at qemu/qemu_migration.c:5014
5014	    if (ret < 0) {
(gdb) 

2) # pkill -9 libvirtd

3) (gdb) q

5. Resume the stopped "virsh" job:
# fg
virsh -c qemu+ssh://intel-i72600-03.qe.lab.eng.nay.redhat.com/system "migrate foo qemu+ssh://intel-e52650-16-2.englab.nay.redhat.com/system --live --verbose "
Migration: [100 %]error: Disconnected from qemu+ssh://intel-i72600-03.qe.lab.eng.nay.redhat.com/system due to end of file
error: End of file while reading data: Ncat: Broken pipe.: Input/output error

6. Check guest on source host:
# virsh list
 Id    Name                           State
----------------------------------------------------
 7     foo                            paused

Comment 12 Dr. David Alan Gilbert 2018-04-10 15:07:14 UTC
Just reverted the upstream fix; so we need to think about this a bit more for a larger fix after discussion with kwolf and jdenemar.

Comment 13 Dr. David Alan Gilbert 2018-04-16 17:18:31 UTC
v2 posted, now tied to new migration capability 'late-block-activate'
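Putting comments 7 and 13 together, the behaviour of the v2 fix can be modelled as a single condition (an illustrative sketch, not QEMU source): with the capability set, completing incoming migration no longer activates the block devices, and therefore does not take the image write locks; activation happens later in 'cont', which already invalidates caches and grabs the locks.

```python
def activate_block_devices_at_migration_end(late_block_activate: bool) -> bool:
    # Sketch of the v2 fix: when the 'late-block-activate' migration
    # capability is set, the destination defers block-device activation
    # (and thus the "write" lock acquisition) until 'cont' is issued.
    return not late_block_activate
```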

Comment 15 Miroslav Rezanina 2018-07-04 08:24:02 UTC
Fix included in qemu-kvm-rhev-2.12.0-7.el7

Comment 20 Yumei Huang 2018-07-05 10:11:40 UTC
Verify:
qemu-kvm-rhev-2.12.0-7.el7

Scenario 1: Don't set late-block-activate on

Boot the guest with "-S" and migrate to the destination. After migration, the guests on both src and dst are in paused status.

Src: 
(qemu) info status 
VM status: paused (postmigrate)

Dst:
(qemu) info status 
VM status: paused (prelaunch)


Then resume src guest, get error message:

(qemu) c
Failed to get "write" lock
Is another process using the image?


Scenario 2: Set late-block-activate on before migrate

Boot the guest with "-S" and set late-block-activate on:

(qemu) migrate_set_capability late-block-activate on

Then do the migration. After migration, resume the guest on src; the guest boots up successfully.
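Management software driving QEMU over QMP rather than the HMP monitor sets the same capability with the QMP command "migrate-set-capabilities" (a real command; the newline-joined wire framing below is just for illustration):

```python
import json

# QMP payloads for the verification flow above. "migrate-set-capabilities"
# and "cont" are real QMP commands; how they are framed and delivered to
# the monitor socket is left out of this sketch.
set_cap = {
    "execute": "migrate-set-capabilities",
    "arguments": {"capabilities": [
        {"capability": "late-block-activate", "state": True}
    ]},
}
cont = {"execute": "cont"}  # resumes the guest, activating block devices

wire = "\n".join(json.dumps(cmd) for cmd in (set_cap, cont))
```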

Comment 24 errata-xmlrpc 2018-11-01 11:07:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3443

