Bug 1593137
| Summary: | libvirtd crashed if destroy the guest on the source host in perform phase of live migration |
|---|---|
| Product: | Red Hat Enterprise Linux 7 |
| Component: | libvirt |
| Version: | 7.6 |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Status: | CLOSED ERRATA |
| Severity: | unspecified |
| Priority: | unspecified |
| Reporter: | yafu <yafu> |
| Assignee: | Jiri Denemark <jdenemar> |
| QA Contact: | Fangge Jin <fjin> |
| CC: | dyuan, fjin, lmen, xuzhang |
| Target Milestone: | rc |
| Target Release: | --- |
| Fixed In Version: | libvirt-4.5.0-7.el7 |
| Doc Type: | If docs needed, set a value |
| Story Points: | --- |
| Clones: | 1615854 (view as bug list) |
| Bug Blocks: | 1615854 |
| Last Closed: | 2018-10-30 09:56:58 UTC |
| Type: | Bug |
| Regression: | --- |
| Mount Type: | --- |
| Documentation: | --- |
| Category: | --- |
| oVirt Team: | --- |
| Cloudforms Team: | --- |
Created attachment 1453142 [details]
libvirtd log on source and target host
I was able to reproduce this bug even without any flags; that is, any
migration is affected, although sometimes libvirtd doesn't crash. It's caused
by writing to freed memory, so depending on what was later placed at the same
address, the daemon may either crash, abort with heap corruption,
or just keep running happily. The bug can be seen in valgrind; however, you
still need to be lucky enough to kill the domain while the migration is waiting in
virCondWait, called from qemuMigrationSrcWaitForCompletion. That said, the
reproducer is not 100% reliable, although it sometimes appears to be.
==21535== Invalid write of size 4
==21535== at 0x31DEDBF6: qemuMigrationSrcWaitForCompletion (qemu_migration.c:1587)
==21535== by 0x31DF3C99: qemuMigrationSrcRun (qemu_migration.c:3588)
...
The following code in qemuMigrationSrcWaitForCompletion does the invalid write
to jobInfo->status when virDomainObjWait returns -1 because the domain is
no longer running and libvirt has removed all run-time state, including
priv->job.current:

    if (virDomainObjWait(vm) < 0) {
        jobInfo->status = QEMU_DOMAIN_JOB_STATUS_FAILED;
        return -2;
    }
Patch sent upstream for review: https://www.redhat.com/archives/libvir-list/2018-August/msg00106.html

This is now fixed upstream by
commit dddcb601ebf97ef222a03bb27b2357e831e8a0cc
Refs: v4.6.0-93-gdddcb601eb
Author: Jiri Denemark <jdenemar>
AuthorDate: Thu Aug 2 16:56:02 2018 +0200
Commit: Jiri Denemark <jdenemar>
CommitDate: Mon Aug 13 11:29:09 2018 +0200
qemu_migration: Avoid writing to freed memory
When a domain is killed on the source host while it is being migrated
and libvirtd is waiting for the migration to finish (waiting for the
domain condition in qemuMigrationSrcWaitForCompletion), the run-time
state including priv->job.current may already be freed once
virDomainObjWait returns with -1. Thus the priv->job.current pointer
cached in jobInfo is no longer valid and setting jobInfo->status may
crash the daemon.
https://bugzilla.redhat.com/show_bug.cgi?id=1593137
Signed-off-by: Jiri Denemark <jdenemar>
Reviewed-by: Ján Tomko <jtomko>
Reproduced this bug with libvirt-4.5.0-6.virtcov.el7.x86_64.

Steps:
1. Start a guest and begin migration:
# virsh start rhel7-min; sleep 5; virsh migrate-setspeed rhel7-min 10; virsh migrate rhel7-min qemu+ssh://10.66.5.190/system --live --verbose --p2p
2. When the migration is in the perform phase (see the progress percentage), destroy the guest on the source host:
# virsh destroy rhel7-min
error: Disconnected from qemu:///system due to end of file
error: Failed to destroy domain rhel7-min
error: End of file while reading data: Input/output error
3. Check the migration status:
Migration: [ 28 %]error: Disconnected from qemu:///system due to end of file
error: End of file while reading data: Input/output error

The reproduce rate is >=80%.

Verified with libvirt-4.5.0-8.virtcov.el7.x86_64: repeated the steps in comment 8 more than 10 times, no crash.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3113
Description of problem:
libvirtd crashes if the guest is destroyed on the source host while migrating back with --persistent

Version-Release number of selected component (if applicable):
libvirt-4.4.0-2.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Do migration from host A to host B:
# virsh migrate iommu1 qemu+ssh://10.73.130.49/system --live --verbose --p2p --tunnelled --persistent
2. Set the migration speed on host B:
# virsh migrate-setspeed iommu1 5
3. Migrate back from B to A:
# virsh migrate iommu1 qemu+ssh://10.66.5.76/system --live --verbose --p2p --tunnelled
4. In another terminal, destroy the guest on host B while the migration is running:
# virsh destroy iommu1
error: Disconnected from qemu:///system due to end of file
error: Failed to destroy domain iommu1
error: End of file while reading data: Input/output error
5. Check the output of the migration started in step 3:
# virsh migrate iommu1 qemu+ssh://10.66.5.76/system --live --verbose --p2p --tunnelled
Migration: [ 28 %]error: Disconnected from qemu:///system due to end of file
error: End of file while reading data: Input/output error

Actual results:
libvirtd crashes if the guest is destroyed on the source host while migrating back with --persistent

Expected results:
libvirtd should not crash and the migration should succeed

Additional info:
1. If you cannot reproduce, just do migration A->B->A, and destroy the guest on host A.
2. The backtrace of libvirtd:
(gdb) bt
#0 0x00007f38b029d845 in _int_malloc () from /lib64/libc.so.6
#1 0x00007f38b02a078c in malloc () from /lib64/libc.so.6
#2 0x00007f38b02996bf in __GI__IO_str_overflow () from /lib64/libc.so.6
#3 0x00007f38b0297d41 in __GI__IO_default_xsputn () from /lib64/libc.so.6
#4 0x00007f38b0267e13 in vfprintf () from /lib64/libc.so.6
#5 0x00007f38b0332fc5 in __vasprintf_chk () from /lib64/libc.so.6
#6 0x00007f38b32c91fe in vasprintf (__ap=0x7f38a359ed20, __ap@entry=0x7f38a359ec30, __fmt=__fmt@entry=0x7f38b35667a2 "%s: %s", __ptr=0x7f38a359ee20) at /usr/include/bits/stdio2.h:210
#7 virVasprintfInternal (report=report@entry=false, domcode=0, filename=0x0, funcname=0x0, linenr=0, strp=0x7f38a359ee20, fmt=fmt@entry=0x7f38b35667a2 "%s: %s", list=list@entry=0x7f38a359ed20) at util/virstring.c:744
#8 0x00007f38b32c9323 in virAsprintfInternal (report=report@entry=false, domcode=domcode@entry=0, filename=filename@entry=0x0, funcname=funcname@entry=0x0, linenr=linenr@entry=0, strp=strp@entry=0x7f38a359ee20, fmt=fmt@entry=0x7f38b35667a2 "%s: %s") at util/virstring.c:765
#9 0x00007f38b327b48f in virLogOutputToFd (source=<optimized out>, priority=<optimized out>, filename=<optimized out>, linenr=<optimized out>, funcname=<optimized out>, timestamp=<optimized out>, metadata=0x0, flags=0, rawstr=0x7f3894024620 "OBJECT_DISPOSE: obj=0x7f38940267a0", str=0x7f38940263a0 "124776: info : virObjectUnref:346 : OBJECT_DISPOSE: obj=0x7f38940267a0\n", data=0x4) at util/virlog.c:730
#10 0x00007f38b327c97a in virLogVMessage (source=0x7f38b3868410 <virLogSelf>, priority=VIR_LOG_INFO, filename=0x7f38b355abe2 "util/virobject.c", linenr=346, funcname=0x7f38b355ae75 <__func__.8245> "virObjectUnref", metadata=0x0, fmt=fmt@entry=0x7f38b355ac39 "OBJECT_DISPOSE: obj=%p", vargs=vargs@entry=0x7f38a359ef70) at util/virlog.c:651
#11 0x00007f38b327cf6f in virLogMessage (source=source@entry=0x7f38b3868410 <virLogSelf>, priority=priority@entry=VIR_LOG_INFO, filename=filename@entry=0x7f38b355abe2 "util/virobject.c", linenr=linenr@entry=346, funcname=funcname@entry=0x7f38b355ae75 <__func__.8245> "virObjectUnref", metadata=metadata@entry=0x0, fmt=fmt@entry=0x7f38b355ac39 "OBJECT_DISPOSE: obj=%p") at util/virlog.c:551
#12 0x00007f38b329e7a6 in virObjectUnref (anyobj=<optimized out>) at util/virobject.c:346
#13 0x00007f38b330dae1 in virDomainChrSourceDefFree (def=<optimized out>) at conf/domain_conf.c:2322
#14 0x00007f38b330db59 in virDomainChrDefFree (def=0x7f3894026710) at conf/domain_conf.c:2416
#15 0x00007f38b332ee7c in virDomainDefFree (def=0x7f3894023710) at conf/domain_conf.c:3013
#16 0x00007f38790b2604 in qemuMigrationSrcRun (driver=driver@entry=0x7f384c114390, vm=vm@entry=0x7f384c1c5330, persist_xml=persist_xml@entry=0x0, cookiein=cookiein@entry=0x7f38940009a0 "<qemu-migration>\n <name>iommu1</name>\n <uuid>1b3268d6-b59c-406b-a14c-33b000b15b6c</uuid>\n <hostname>yafu-laptop</hostname>\n <hostuuid>5cd9f881-5529-11cb-b989-b4c8b0f5dd17</hostuuid>\n <graphics ty"..., cookieinlen=cookieinlen@entry=557, cookieout=cookieout@entry=0x7f38a359f5c0, cookieoutlen=cookieoutlen@entry=0x7f38a359f59c, flags=flags@entry=11, resource=resource@entry=0, spec=spec@entry=0x7f38a359f3f0, dconn=dconn@entry=0x7f389400ac00, graphicsuri=graphicsuri@entry=0x0, nmigrate_disks=nmigrate_disks@entry=0, migrate_disks=migrate_disks@entry=0x0, migParams=migParams@entry=0x7f3894009280) at qemu/qemu_migration.c:3680
#17 0x00007f38790b4335 in qemuMigrationSrcPerformNative (driver=driver@entry=0x7f384c114390, vm=vm@entry=0x7f384c1c5330, persist_xml=persist_xml@entry=0x0, uri=uri@entry=0x7f389400a490 "tcp:yafu-laptop:49152", cookiein=0x7f38940009a0 "<qemu-migration>\n <name>iommu1</name>\n <uuid>1b3268d6-b59c-406b-a14c-33b000b15b6c</uuid>\n <hostname>yafu-laptop</hostname>\n <hostuuid>5cd9f881-5529-11cb-b989-b4c8b0f5dd17</hostuuid>\n <graphics ty"..., cookieinlen=557, cookieout=cookieout@entry=0x7f38a359f5c0, cookieoutlen=cookieoutlen@entry=0x7f38a359f59c, flags=11, resource=resource@entry=0, dconn=dconn@entry=0x7f389400ac00, graphicsuri=graphicsuri@entry=0x0, nmigrate_disks=nmigrate_disks@entry=0, migrate_disks=migrate_disks@entry=0x0, migParams=migParams@entry=0x7f3894009280) at qemu/qemu_migration.c:3791
#18 0x00007f38790b6a0c in qemuMigrationSrcPerformPeer2Peer3 (flags=11, useParams=true, bandwidth=0, migParams=0x7f3894009280, nbdPort=0, migrate_disks=0x0, nmigrate_disks=0, listenAddress=<optimized out>, graphicsuri=0x0, uri=<optimized out>, dname=0x0, persist_xml=0x0, xmlin=<optimized out>, vm=0x7f384c1c5330, dconnuri=0x7f38940476e0 "qemu+ssh://10.66.5.76/system", dconn=0x7f389400ac00, sconn=0x7f389c000c90, driver=0x7f384c114390) at qemu/qemu_migration.c:4213
#19 qemuMigrationSrcPerformPeer2Peer (v3proto=<synthetic pointer>, resource=0, dname=0x0, flags=11, migParams=0x7f3894009280, nbdPort=0, migrate_disks=0x0, nmigrate_disks=0, listenAddress=<optimized out>, graphicsuri=0x0, uri=0x0, dconnuri=0x7f38940476e0 "qemu+ssh://10.66.5.76/system", persist_xml=0x0, xmlin=<optimized out>, vm=0x7f384c1c5330, sconn=0x7f389c000c90, driver=0x7f384c114390) at qemu/qemu_migration.c:4517
#20 qemuMigrationSrcPerformJob (driver=driver@entry=0x7f384c114390, conn=conn@entry=0x7f389c000c90, vm=vm@entry=0x7f384c1c5330, xmlin=xmlin@entry=0x0, persist_xml=persist_xml@entry=0x0, dconnuri=dconnuri@entry=0x7f38940476e0 "qemu+ssh://10.66.5.76/system", uri=uri@entry=0x0, graphicsuri=graphicsuri@entry=0x0, listenAddress=listenAddress@entry=0x0, nmigrate_disks=nmigrate_disks@entry=0, migrate_disks=migrate_disks@entry=0x0, nbdPort=nbdPort@entry=0, migParams=migParams@entry=0x7f3894009280, cookiein=cookiein@entry=0x0, cookieinlen=cookieinlen@entry=0, cookieout=cookieout@entry=0x7f38a359fa88, cookieoutlen=cookieoutlen@entry=0x7f38a359fa7c, flags=flags@entry=11, dname=dname@entry=0x0, resource=resource@entry=0, v3proto=<optimized out>, v3proto@entry=true) at qemu/qemu_migration.c:4594
#21 0x00007f38790b74f4 in qemuMigrationSrcPerform (driver=driver@entry=0x7f384c114390, conn=0x7f389c000c90, vm=0x7f384c1c5330, xmlin=0x0, persist_xml=0x0, dconnuri=dconnuri@entry=0x7f38940476e0 "qemu+ssh://10.66.5.76/system", uri=0x0, graphicsuri=0x0, listenAddress=0x0, nmigrate_disks=nmigrate_disks@entry=0, migrate_disks=0x0, nbdPort=0, migParams=migParams@entry=0x7f3894009280, cookiein=cookiein@entry=0x0, cookieinlen=cookieinlen@entry=0, cookieout=cookieout@entry=0x7f38a359fa88, cookieoutlen=cookieoutlen@entry=0x7f38a359fa7c, flags=flags@entry=11, dname=0x0, resource=0, v3proto=v3proto@entry=true) at qemu/qemu_migration.c:4777
#22 0x00007f38790f6c55 in qemuDomainMigratePerform3Params (dom=0x7f38940476a0, dconnuri=0x7f38940476e0 "qemu+ssh://10.66.5.76/system", params=<optimized out>, nparams=<optimized out>, cookiein=0x0, cookieinlen=0, cookieout=0x7f38a359fa88, cookieoutlen=0x7f38a359fa7c, flags=11) at qemu/qemu_driver.c:12863
#23 0x00007f38b34fedcd in virDomainMigratePerform3Params (domain=domain@entry=0x7f38940476a0, dconnuri=0x7f38940476e0 "qemu+ssh://10.66.5.76/system", params=0x7f3894009260, nparams=0, cookiein=0x0, cookieinlen=0, cookieout=cookieout@entry=0x7f38a359fa88, cookieoutlen=cookieoutlen@entry=0x7f38a359fa7c, flags=11) at libvirt-domain.c:4976
#24 0x00005645af63ce46 in remoteDispatchDomainMigratePerform3Params (server=0x5645b180ffa0, msg=0x5645b185fe10, ret=0x7f3894047620, args=0x7f3894047640, rerr=0x7f38a359fbc0, client=<optimized out>) at remote/remote_daemon_dispatch.c:5436
#25 remoteDispatchDomainMigratePerform3ParamsHelper (server=0x5645b180ffa0, client=<optimized out>, msg=0x5645b185fe10, rerr=0x7f38a359fbc0, args=0x7f3894047640, ret=0x7f3894047620) at remote/remote_daemon_dispatch_stubs.h:8128
#26 0x00007f38b33e9a45 in virNetServerProgramDispatchCall (msg=0x5645b185fe10, client=0x5645b1860fa0, server=0x5645b180ffa0, prog=0x5645b185e630) at rpc/virnetserverprogram.c:437
#27 virNetServerProgramDispatch (prog=0x5645b185e630, server=server@entry=0x5645b180ffa0, client=client@entry=0x5645b1860fa0, msg=msg@entry=0x5645b185fe10) at rpc/virnetserverprogram.c:304
#28 0x00007f38b33f27aa in virNetServerProcessMsg (srv=srv@entry=0x5645b180ffa0, client=0x5645b1860fa0, prog=<optimized out>, msg=0x5645b185fe10) at rpc/virnetserver.c:145
#29 0x00007f38b33f2bf8 in virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x5645b180ffa0) at rpc/virnetserver.c:166
#30 0x00007f38b32d4ec1 in virThreadPoolWorker (opaque=opaque@entry=0x5645b180f820) at util/virthreadpool.c:167
#31 0x00007f38b32d3c90 in virThreadHelper (data=<optimized out>) at util/virthread.c:206
#32 0x00007f38b05efdd5 in start_thread () from /lib64/libpthread.so.0
#33 0x00007f38b0319aed in clone () from /lib64/libc.so.6