Bug 1593137

Summary: libvirtd crashes if the guest is destroyed on the source host during the perform phase of live migration
Product: Red Hat Enterprise Linux 7
Reporter: yafu <yafu>
Component: libvirt
Assignee: Jiri Denemark <jdenemar>
Status: CLOSED ERRATA
QA Contact: Fangge Jin <fjin>
Severity: unspecified
Priority: unspecified
Version: 7.6
CC: dyuan, fjin, lmen, xuzhang
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Fixed In Version: libvirt-4.5.0-7.el7
Last Closed: 2018-10-30 09:56:58 UTC
Type: Bug
Bug Blocks: 1615854
Attachments:
libvirtd log on source and target host (flags: none)

Description yafu 2018-06-20 07:42:14 UTC
Description of problem:
libvirtd crashes if the guest is destroyed on the source host while migrating back with --persistent

Version-Release number of selected component (if applicable):
libvirt-4.4.0-2.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Migrate the guest from A to B:
# virsh migrate iommu1 qemu+ssh://10.73.130.49/system --live --verbose --p2p --tunnelled --persistent

2. Set the migration speed on host B:
# virsh migrate-setspeed iommu1 5

3. Migrate the guest back from B to A:
# virsh migrate iommu1 qemu+ssh://10.66.5.76/system --live --verbose --p2p --tunnelled

4. Open another terminal and destroy the guest on host B while the migration is running:
# virsh destroy iommu1
error: Disconnected from qemu:///system due to end of file
error: Failed to destroy domain iommu1
error: End of file while reading data: Input/output error

5. Check the migration status in the first terminal:
# virsh migrate iommu1 qemu+ssh://10.66.5.76/system --live --verbose --p2p --tunnelled
Migration: [ 28 %]error: Disconnected from qemu:///system due to end of file
error: End of file while reading data: Input/output error

Actual results:
libvirtd crashes if the guest is destroyed on the source host while migrating back with --persistent

Expected results:
libvirtd should not crash, and the migration should complete successfully


Additional info:
1. If this does not reproduce, migrate A->B->A and destroy the guest on host A.

2. The backtrace of libvirtd:
(gdb) bt
#0  0x00007f38b029d845 in _int_malloc () from /lib64/libc.so.6
#1  0x00007f38b02a078c in malloc () from /lib64/libc.so.6
#2  0x00007f38b02996bf in __GI__IO_str_overflow () from /lib64/libc.so.6
#3  0x00007f38b0297d41 in __GI__IO_default_xsputn () from /lib64/libc.so.6
#4  0x00007f38b0267e13 in vfprintf () from /lib64/libc.so.6
#5  0x00007f38b0332fc5 in __vasprintf_chk () from /lib64/libc.so.6
#6  0x00007f38b32c91fe in vasprintf (__ap=0x7f38a359ed20, __ap@entry=0x7f38a359ec30, __fmt=__fmt@entry=0x7f38b35667a2 "%s: %s", __ptr=0x7f38a359ee20) at /usr/include/bits/stdio2.h:210
#7  virVasprintfInternal (report=report@entry=false, domcode=0, filename=0x0, funcname=0x0, linenr=0, strp=0x7f38a359ee20, fmt=fmt@entry=0x7f38b35667a2 "%s: %s", list=list@entry=0x7f38a359ed20)
    at util/virstring.c:744
#8  0x00007f38b32c9323 in virAsprintfInternal (report=report@entry=false, domcode=domcode@entry=0, filename=filename@entry=0x0, funcname=funcname@entry=0x0, linenr=linenr@entry=0, 
    strp=strp@entry=0x7f38a359ee20, fmt=fmt@entry=0x7f38b35667a2 "%s: %s") at util/virstring.c:765
#9  0x00007f38b327b48f in virLogOutputToFd (source=<optimized out>, priority=<optimized out>, filename=<optimized out>, linenr=<optimized out>, funcname=<optimized out>, timestamp=<optimized out>, 
    metadata=0x0, flags=0, rawstr=0x7f3894024620 "OBJECT_DISPOSE: obj=0x7f38940267a0", str=0x7f38940263a0 "124776: info : virObjectUnref:346 : OBJECT_DISPOSE: obj=0x7f38940267a0\n", data=0x4)
    at util/virlog.c:730
#10 0x00007f38b327c97a in virLogVMessage (source=0x7f38b3868410 <virLogSelf>, priority=VIR_LOG_INFO, filename=0x7f38b355abe2 "util/virobject.c", linenr=346, 
    funcname=0x7f38b355ae75 <__func__.8245> "virObjectUnref", metadata=0x0, fmt=fmt@entry=0x7f38b355ac39 "OBJECT_DISPOSE: obj=%p", vargs=vargs@entry=0x7f38a359ef70) at util/virlog.c:651
#11 0x00007f38b327cf6f in virLogMessage (source=source@entry=0x7f38b3868410 <virLogSelf>, priority=priority@entry=VIR_LOG_INFO, filename=filename@entry=0x7f38b355abe2 "util/virobject.c", 
    linenr=linenr@entry=346, funcname=funcname@entry=0x7f38b355ae75 <__func__.8245> "virObjectUnref", metadata=metadata@entry=0x0, fmt=fmt@entry=0x7f38b355ac39 "OBJECT_DISPOSE: obj=%p") at util/virlog.c:551
#12 0x00007f38b329e7a6 in virObjectUnref (anyobj=<optimized out>) at util/virobject.c:346
#13 0x00007f38b330dae1 in virDomainChrSourceDefFree (def=<optimized out>) at conf/domain_conf.c:2322
#14 0x00007f38b330db59 in virDomainChrDefFree (def=0x7f3894026710) at conf/domain_conf.c:2416
#15 0x00007f38b332ee7c in virDomainDefFree (def=0x7f3894023710) at conf/domain_conf.c:3013
#16 0x00007f38790b2604 in qemuMigrationSrcRun (driver=driver@entry=0x7f384c114390, vm=vm@entry=0x7f384c1c5330, persist_xml=persist_xml@entry=0x0, 
    cookiein=cookiein@entry=0x7f38940009a0 "<qemu-migration>\n  <name>iommu1</name>\n  <uuid>1b3268d6-b59c-406b-a14c-33b000b15b6c</uuid>\n  <hostname>yafu-laptop</hostname>\n  <hostuuid>5cd9f881-5529-11cb-b989-b4c8b0f5dd17</hostuuid>\n  <graphics ty"..., cookieinlen=cookieinlen@entry=557, cookieout=cookieout@entry=0x7f38a359f5c0, cookieoutlen=cookieoutlen@entry=0x7f38a359f59c, flags=flags@entry=11, 
    resource=resource@entry=0, spec=spec@entry=0x7f38a359f3f0, dconn=dconn@entry=0x7f389400ac00, graphicsuri=graphicsuri@entry=0x0, nmigrate_disks=nmigrate_disks@entry=0, migrate_disks=migrate_disks@entry=0x0, 
    migParams=migParams@entry=0x7f3894009280) at qemu/qemu_migration.c:3680
#17 0x00007f38790b4335 in qemuMigrationSrcPerformNative (driver=driver@entry=0x7f384c114390, vm=vm@entry=0x7f384c1c5330, persist_xml=persist_xml@entry=0x0, uri=uri@entry=0x7f389400a490 "tcp:yafu-laptop:49152", 
    cookiein=0x7f38940009a0 "<qemu-migration>\n  <name>iommu1</name>\n  <uuid>1b3268d6-b59c-406b-a14c-33b000b15b6c</uuid>\n  <hostname>yafu-laptop</hostname>\n  <hostuuid>5cd9f881-5529-11cb-b989-b4c8b0f5dd17</hostuuid>\n  <graphics ty"..., cookieinlen=557, cookieout=cookieout@entry=0x7f38a359f5c0, cookieoutlen=cookieoutlen@entry=0x7f38a359f59c, flags=11, resource=resource@entry=0, dconn=dconn@entry=0x7f389400ac00, 
    graphicsuri=graphicsuri@entry=0x0, nmigrate_disks=nmigrate_disks@entry=0, migrate_disks=migrate_disks@entry=0x0, migParams=migParams@entry=0x7f3894009280) at qemu/qemu_migration.c:3791
#18 0x00007f38790b6a0c in qemuMigrationSrcPerformPeer2Peer3 (flags=11, useParams=true, bandwidth=0, migParams=0x7f3894009280, nbdPort=0, migrate_disks=0x0, nmigrate_disks=0, listenAddress=<optimized out>, 
    graphicsuri=0x0, uri=<optimized out>, dname=0x0, persist_xml=0x0, xmlin=<optimized out>, vm=0x7f384c1c5330, dconnuri=0x7f38940476e0 "qemu+ssh://10.66.5.76/system", dconn=0x7f389400ac00, 
    sconn=0x7f389c000c90, driver=0x7f384c114390) at qemu/qemu_migration.c:4213
#19 qemuMigrationSrcPerformPeer2Peer (v3proto=<synthetic pointer>, resource=0, dname=0x0, flags=11, migParams=0x7f3894009280, nbdPort=0, migrate_disks=0x0, nmigrate_disks=0, listenAddress=<optimized out>, 
    graphicsuri=0x0, uri=0x0, dconnuri=0x7f38940476e0 "qemu+ssh://10.66.5.76/system", persist_xml=0x0, xmlin=<optimized out>, vm=0x7f384c1c5330, sconn=0x7f389c000c90, driver=0x7f384c114390)
    at qemu/qemu_migration.c:4517
#20 qemuMigrationSrcPerformJob (driver=driver@entry=0x7f384c114390, conn=conn@entry=0x7f389c000c90, vm=vm@entry=0x7f384c1c5330, xmlin=xmlin@entry=0x0, persist_xml=persist_xml@entry=0x0, 
    dconnuri=dconnuri@entry=0x7f38940476e0 "qemu+ssh://10.66.5.76/system", uri=uri@entry=0x0, graphicsuri=graphicsuri@entry=0x0, listenAddress=listenAddress@entry=0x0, nmigrate_disks=nmigrate_disks@entry=0, 
    migrate_disks=migrate_disks@entry=0x0, nbdPort=nbdPort@entry=0, migParams=migParams@entry=0x7f3894009280, cookiein=cookiein@entry=0x0, cookieinlen=cookieinlen@entry=0, 
    cookieout=cookieout@entry=0x7f38a359fa88, cookieoutlen=cookieoutlen@entry=0x7f38a359fa7c, flags=flags@entry=11, dname=dname@entry=0x0, resource=resource@entry=0, v3proto=<optimized out>, v3proto@entry=true)
    at qemu/qemu_migration.c:4594
#21 0x00007f38790b74f4 in qemuMigrationSrcPerform (driver=driver@entry=0x7f384c114390, conn=0x7f389c000c90, vm=0x7f384c1c5330, xmlin=0x0, persist_xml=0x0, 
    dconnuri=dconnuri@entry=0x7f38940476e0 "qemu+ssh://10.66.5.76/system", uri=0x0, graphicsuri=0x0, listenAddress=0x0, nmigrate_disks=nmigrate_disks@entry=0, migrate_disks=0x0, nbdPort=0, 
    migParams=migParams@entry=0x7f3894009280, cookiein=cookiein@entry=0x0, cookieinlen=cookieinlen@entry=0, cookieout=cookieout@entry=0x7f38a359fa88, cookieoutlen=cookieoutlen@entry=0x7f38a359fa7c, 
    flags=flags@entry=11, dname=0x0, resource=0, v3proto=v3proto@entry=true) at qemu/qemu_migration.c:4777
#22 0x00007f38790f6c55 in qemuDomainMigratePerform3Params (dom=0x7f38940476a0, dconnuri=0x7f38940476e0 "qemu+ssh://10.66.5.76/system", params=<optimized out>, nparams=<optimized out>, cookiein=0x0, 
    cookieinlen=0, cookieout=0x7f38a359fa88, cookieoutlen=0x7f38a359fa7c, flags=11) at qemu/qemu_driver.c:12863
#23 0x00007f38b34fedcd in virDomainMigratePerform3Params (domain=domain@entry=0x7f38940476a0, dconnuri=0x7f38940476e0 "qemu+ssh://10.66.5.76/system", params=0x7f3894009260, nparams=0, cookiein=0x0, 
    cookieinlen=0, cookieout=cookieout@entry=0x7f38a359fa88, cookieoutlen=cookieoutlen@entry=0x7f38a359fa7c, flags=11) at libvirt-domain.c:4976
#24 0x00005645af63ce46 in remoteDispatchDomainMigratePerform3Params (server=0x5645b180ffa0, msg=0x5645b185fe10, ret=0x7f3894047620, args=0x7f3894047640, rerr=0x7f38a359fbc0, client=<optimized out>)
    at remote/remote_daemon_dispatch.c:5436
#25 remoteDispatchDomainMigratePerform3ParamsHelper (server=0x5645b180ffa0, client=<optimized out>, msg=0x5645b185fe10, rerr=0x7f38a359fbc0, args=0x7f3894047640, ret=0x7f3894047620)
    at remote/remote_daemon_dispatch_stubs.h:8128
#26 0x00007f38b33e9a45 in virNetServerProgramDispatchCall (msg=0x5645b185fe10, client=0x5645b1860fa0, server=0x5645b180ffa0, prog=0x5645b185e630) at rpc/virnetserverprogram.c:437
#27 virNetServerProgramDispatch (prog=0x5645b185e630, server=server@entry=0x5645b180ffa0, client=client@entry=0x5645b1860fa0, msg=msg@entry=0x5645b185fe10) at rpc/virnetserverprogram.c:304
#28 0x00007f38b33f27aa in virNetServerProcessMsg (srv=srv@entry=0x5645b180ffa0, client=0x5645b1860fa0, prog=<optimized out>, msg=0x5645b185fe10) at rpc/virnetserver.c:145
#29 0x00007f38b33f2bf8 in virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x5645b180ffa0) at rpc/virnetserver.c:166
#30 0x00007f38b32d4ec1 in virThreadPoolWorker (opaque=opaque@entry=0x5645b180f820) at util/virthreadpool.c:167
#31 0x00007f38b32d3c90 in virThreadHelper (data=<optimized out>) at util/virthread.c:206
#32 0x00007f38b05efdd5 in start_thread () from /lib64/libpthread.so.0
#33 0x00007f38b0319aed in clone () from /lib64/libc.so.6

Comment 2 yafu 2018-06-20 07:49:58 UTC
Created attachment 1453142
libvirtd log on source and target host

Comment 3 Jiri Denemark 2018-08-02 15:07:50 UTC
I was able to reproduce this bug even without any flags. That is, any
migration is affected, although sometimes libvirtd doesn't crash. The crash is
caused by writing to freed memory, so depending on what was later placed at
the same address, the daemon may crash, abort with heap corruption, or just
keep running happily. The bug can be seen in valgrind; however, you still need
to be lucky enough to kill the domain while the migration is in virCondWait
called from qemuMigrationSrcWaitForCompletion. In other words, the reproducer
is not 100% reliable, although sometimes it appears so.

    ==21535== Invalid write of size 4
    ==21535==    at 0x31DEDBF6: qemuMigrationSrcWaitForCompletion (qemu_migration.c:1587)
    ==21535==    by 0x31DF3C99: qemuMigrationSrcRun (qemu_migration.c:3588)
    ...

The following code in qemuMigrationSrcWaitForCompletion does the invalid write
to jobInfo->status in case virDomainObjWait returned -1 because the domain is
not running anymore and libvirt removed all run-time state including
priv->job.current:

    if (virDomainObjWait(vm) < 0) {
        jobInfo->status = QEMU_DOMAIN_JOB_STATUS_FAILED;
        return -2;
    }
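
The hazard above can be modelled in a few lines of standalone C. This is only
a sketch under stated assumptions, not libvirt code and not necessarily what
the eventual patch does: struct domain, struct job_info, simulate_wait and
wait_for_completion are hypothetical stand-ins for the domain object,
priv->job.current, virDomainObjWait and qemuMigrationSrcWaitForCompletion.
It shows one possible guard, namely touching the cached pointer only when the
domain is still active after the wait fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these are libvirt types or functions. */
enum job_status { JOB_STATUS_ACTIVE = 0, JOB_STATUS_FAILED };

struct job_info {
    enum job_status status;
};

struct domain {
    bool active;               /* false once the guest has been destroyed */
    struct job_info *current;  /* run-time job state, freed on teardown */
};

/* Models a wait that fails. If the guest was destroyed in the meantime, the
 * run-time state is released before the waiter gets control back. */
static int simulate_wait(struct domain *dom, bool destroyed)
{
    if (destroyed) {
        free(dom->current);
        dom->current = NULL;
        dom->active = false;
    }
    return -1;
}

/* Caches dom->current before waiting, like jobInfo in the snippet above. */
static int wait_for_completion(struct domain *dom, bool destroyed)
{
    assert(dom != NULL && dom->current != NULL);
    struct job_info *job = dom->current;   /* cached pointer */

    if (simulate_wait(dom, destroyed) < 0) {
        /* After a failed wait, 'job' may point to freed memory. Write
         * through it only if the domain, and therefore its run-time
         * state, is still alive. */
        if (dom->active)
            job->status = JOB_STATUS_FAILED;
        return -2;
    }
    return 0;
}
```

Calling wait_for_completion(&dom, true) returns -2 without writing through the
stale pointer, while wait_for_completion(&dom, false) still records the
failure in the job state. Removing the dom->active check reintroduces the
invalid write that valgrind reports.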

Comment 4 Jiri Denemark 2018-08-02 15:08:54 UTC
Patch sent upstream for review: https://www.redhat.com/archives/libvir-list/2018-August/msg00106.html

Comment 5 Jiri Denemark 2018-08-13 09:29:48 UTC
This is now fixed upstream by

commit dddcb601ebf97ef222a03bb27b2357e831e8a0cc
Refs: v4.6.0-93-gdddcb601eb
Author:     Jiri Denemark <jdenemar>
AuthorDate: Thu Aug 2 16:56:02 2018 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Mon Aug 13 11:29:09 2018 +0200

    qemu_migration: Avoid writing to freed memory

    When a domain is killed on the source host while it is being migrated
    and libvirtd is waiting for the migration to finish (waiting for the
    domain condition in qemuMigrationSrcWaitForCompletion), the run-time
    state including priv->job.current may already be freed once
    virDomainObjWait returns with -1. Thus the priv->job.current pointer
    cached in jobInfo is no longer valid and setting jobInfo->status may
    crash the daemon.

    https://bugzilla.redhat.com/show_bug.cgi?id=1593137

    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Ján Tomko <jtomko>

Comment 8 Fangge Jin 2018-09-04 09:25:26 UTC
Reproduced this bug with libvirt-4.5.0-6.virtcov.el7.x86_64

Steps:
1. Start a guest and migrate it:
# virsh start rhel7-min; sleep 5; virsh migrate-setspeed rhel7-min 10;virsh migrate rhel7-min qemu+ssh://10.66.5.190/system --live --verbose --p2p

2. While the migration is in the perform phase (see the progress percentage), destroy the guest on the source host:
# virsh destroy rhel7-min
error: Disconnected from qemu:///system due to end of file
error: Failed to destroy domain rhel7-min
error: End of file while reading data: Input/output error

3. Check the migration status:
Migration: [ 28 %]error: Disconnected from qemu:///system due to end of file
error: End of file while reading data: Input/output error

The reproduction rate is >= 80%.

Comment 9 Fangge Jin 2018-09-04 09:30:46 UTC
Verified with libvirt-4.5.0-8.virtcov.el7.x86_64

Repeated the steps in comment 8 more than 10 times; no crash.

Comment 11 errata-xmlrpc 2018-10-30 09:56:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3113