Bug 1949869
Field | Value
---|---
Summary | Migration hangs if vm is shutdown during live migration
Product | Red Hat Enterprise Linux Advanced Virtualization
Component | libvirt
Version | 8.4
Hardware | Unspecified
OS | Unspecified
Status | CLOSED ERRATA
Severity | high
Priority | high
Reporter | Fangge Jin <fjin>
Assignee | Jiri Denemark <jdenemar>
QA Contact | Fangge Jin <fjin>
CC | jdenemar, lmen, mzamazal, virt-maint, xuzhang, ymankad
Keywords | Regression, Triaged, VerifiedUpstream, ZStream
Target Milestone | rc
Target Release | 8.4
Fixed In Version | libvirt-7.6.0-1.module+el8.5.0+12097+2c77910b
Target Upstream Version | 7.6.0
Doc Type | If docs needed, set a value
Clones | 1983694 (view as bug list)
Bug Blocks | 1983694
Last Closed | 2021-11-16 07:52:40 UTC
Type | Bug
Created attachment 1772108 [details]
logs from both src and dest hosts
It can also be reproduced with "virsh destroy vm". libvirtd still responds to other virsh commands; only the migration process hangs.

A fix was sent upstream for review: https://listman.redhat.com/archives/libvir-list/2021-July/msg00451.html

*** Bug 1967715 has been marked as a duplicate of this bug. ***

This issue is fixed upstream by commit 364995ed5708b71f2cab09c0416a66013f0a283f:

Refs: v7.5.0-117-g364995ed57
Author: Jiri Denemark <jdenemar>
AuthorDate: Fri Jul 16 15:52:50 2021 +0200
Commit: Jiri Denemark <jdenemar>
CommitDate: Mon Jul 19 15:49:16 2021 +0200

qemu: Signal domain condition in qemuProcessStop a bit later

Signaling the condition before vm->def->id is reset to -1 is dangerous: in case a waiting thread wakes up, it does not see anything interesting (the domain is still marked as running) and just enters virDomainObjWait, where it waits forever because the condition will never be signalled again.

Originally it was impossible to get into such a situation because the vm object was locked all the time between signaling the condition and resetting vm->def->id, but after commit 860a999802, released in 6.8.0, qemuDomainObjStopWorker, called in qemuProcessStop between virDomainObjBroadcast and setting vm->def->id to -1, unlocks the vm object, giving other threads a chance to wake up and possibly hang.

In the real world, this can be easily reproduced by killing, destroying, or just shutting down (from the guest OS) a domain while it is being migrated somewhere else. The migration job would never finish.

So let's make sure we delay signaling the domain condition to the point when a woken-up thread can detect that the domain is not active anymore.
https://bugzilla.redhat.com/show_bug.cgi?id=1949869
Signed-off-by: Jiri Denemark <jdenemar>
Reviewed-by: Michal Privoznik <mprivozn>

Pre-verified with libvirt-7.6.0-1.fc34.x86_64.

Verified with libvirt-7.6.0-1.module+el8.5.0+12097+2c77910b.x86_64.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4684
Created attachment 1772107 [details]
libvirtd backtrace

Description of problem:
Do a live migration and shut down the VM from inside the guest before the migration completes; the migration job in the source libvirtd then hangs.

Version-Release number of selected component (if applicable):
libvirt-7.0.0-13

How reproducible:
100%

Steps to Reproduce:
1. Start a vm and do live migration:
   # virsh migrate vm1 qemu+ssh://***/system --live --verbose --p2p
2. Shutdown vm from inside vm:
   [in vm] # shutdown -h now
3. Check migration result; it hangs:
   # virsh migrate vm1 qemu+ssh://***/system --live --verbose --p2p
   Migration: [ 80 %]^C^C^C^C^C

Actual results:
The migration command hangs and never returns.

Expected results:
The migration fails with an error and returns.

Additional info:
1. Can't reproduce it with libvirt-6.6.0-13.1; there it reports an error and returns:
   Migration: [ 85 %]error: operation failed: domain is not running
2. Backtrace of the hung migration thread:

```
Thread 5 (Thread 0x7fba47b04700 (LWP 245983)):
#0  0x00007fba632982fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fba66f9922a in virCondWait (c=c@entry=0x7fb9ec3a60e0, m=m@entry=0x7fb9ec3a60b8) at ../src/util/virthread.c:148
#2  0x00007fba66fcf485 in virDomainObjWait (vm=vm@entry=0x7fb9ec3a60a0) at ../src/conf/domain_conf.c:3758
#3  0x00007fba1d8fc05d in qemuMigrationSrcWaitForCompletion (driver=driver@entry=0x7fb9ec11d7f0, vm=vm@entry=0x7fb9ec3a60a0, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_MIGRATION_OUT, dconn=dconn@entry=0x7fb9ec016490, flags=flags@entry=8) at ../src/qemu/qemu_migration.c:1878
#4  0x00007fba1d902844 in qemuMigrationSrcRun (driver=0x7fb9ec11d7f0, vm=0x7fb9ec3a60a0, persist_xml=<optimized out>, cookiein=<optimized out>, cookieinlen=<optimized out>, cookieout=0x7fba47b03558, cookieoutlen=0x7fba47b03528, flags=3, resource=0, spec=0x7fba47b03380, dconn=0x7fb9ec016490, graphicsuri=<optimized out>, nmigrate_disks=0, migrate_disks=0x0, migParams=<optimized out>, nbdURI=<optimized out>) at ../src/qemu/qemu_migration.c:4261
#5  0x00007fba1d9035d4 in qemuMigrationSrcPerformNative (driver=0x7fb9ec11d7f0, vm=0x7fb9ec3a60a0, persist_xml=0x0, uri=<optimized out>, cookiein=0x7fba38010a70 "<qemu-migration>\n <name>vm1</name>\n <uuid>2907ada3-fb7b-43e1-be05-004bc37f2df3</uuid>\n <hostname>fjin-3-vgpu</hostname>\n <hostuuid>df9986b0-c0f7-11e6-9c43-bc0000b40000</hostuuid>\n <graphics type="..., cookieinlen=639, cookieout=0x7fba47b03558, cookieoutlen=0x7fba47b03528, flags=3, resource=0, dconn=0x7fb9ec016490, graphicsuri=0x0, nmigrate_disks=0, migrate_disks=0x0, migParams=0x7fba38061840, nbdURI=0x0) at ../src/qemu/qemu_migration.c:4471
#6  0x00007fba1d9051f3 in qemuMigrationSrcPerformPeer2Peer3 (flags=<optimized out>, useParams=true, bandwidth=<optimized out>, migParams=0x7fba38061840, nbdURI=0x0, nbdPort=0, migrate_disks=0x0, nmigrate_disks=<optimized out>, listenAddress=<optimized out>, graphicsuri=0x0, uri=<optimized out>, dname=0x0, persist_xml=0x0, xmlin=<optimized out>, vm=0x7fb9ec3a60a0, dconnuri=0x7fba38011b20 "qemu+ssh://fjin3-vgpu.usersys.redhat.com/system", dconn=0x7fb9ec016490, sconn=0x7fba3400a270, driver=0x7fb9ec11d7f0) at ../src/qemu/qemu_migration.c:4888
#7  qemuMigrationSrcPerformPeer2Peer (v3proto=<synthetic pointer>, resource=<optimized out>, dname=0x0, flags=3, migParams=0x7fba38061840, nbdURI=0x0, nbdPort=0, migrate_disks=0x0, nmigrate_disks=<optimized out>, listenAddress=<optimized out>, graphicsuri=0x0, uri=<optimized out>, dconnuri=0x7fba38011b20 "qemu+ssh://fjin3-vgpu.usersys.redhat.com/system", persist_xml=0x0, xmlin=<optimized out>, vm=0x7fb9ec3a60a0, sconn=0x7fba3400a270, driver=0x7fb9ec11d7f0) at ../src/qemu/qemu_migration.c:5197
#8  qemuMigrationSrcPerformJob (driver=0x7fb9ec11d7f0, conn=0x7fba3400a270, vm=0x7fb9ec3a60a0, xmlin=<optimized out>, persist_xml=0x0, dconnuri=0x7fba38011b20 "qemu+ssh://fjin3-vgpu.usersys.redhat.com/system", uri=<optimized out>, graphicsuri=<optimized out>, listenAddress=<optimized out>, nmigrate_disks=<optimized out>, migrate_disks=<optimized out>, nbdPort=0, nbdURI=<optimized out>, migParams=<optimized out>, cookiein=<optimized out>, cookieinlen=0, cookieout=<optimized out>, cookieoutlen=<optimized out>, flags=<optimized out>, dname=<optimized out>, resource=<optimized out>, v3proto=<optimized out>) at ../src/qemu/qemu_migration.c:5272
#9  0x00007fba1d90592f in qemuMigrationSrcPerform (driver=driver@entry=0x7fb9ec11d7f0, conn=0x7fba3400a270, vm=0x7fb9ec3a60a0, xmlin=0x0, persist_xml=0x0, dconnuri=dconnuri@entry=0x7fba38011b20 "qemu+ssh://fjin3-vgpu.usersys.redhat.com/system", uri=0x0, graphicsuri=0x0, listenAddress=0x0, nmigrate_disks=0, migrate_disks=0x0, nbdPort=0, nbdURI=0x0, migParams=0x7fba38061840, cookiein=0x0, cookieinlen=0, cookieout=0x7fba47b038f8, cookieoutlen=0x7fba47b038ec, flags=3, dname=0x0, resource=0, v3proto=true) at ../src/qemu/qemu_migration.c:5453
#10 0x00007fba1d8bdb12 in qemuDomainMigratePerform3Params (dom=0x7fba38009500, dconnuri=0x7fba38011b20 "qemu+ssh://fjin3-vgpu.usersys.redhat.com/system", params=<optimized out>, nparams=0, cookiein=0x0, cookieinlen=0, cookieout=0x7fba47b038f8, cookieoutlen=0x7fba47b038ec, flags=3) at ../src/qemu/qemu_driver.c:11840
#11 0x00007fba6715c645 in virDomainMigratePerform3Params (domain=domain@entry=0x7fba38009500, dconnuri=0x7fba38011b20 "qemu+ssh://fjin3-vgpu.usersys.redhat.com/system", params=0x0, nparams=0, cookiein=0x0, cookieinlen=0, cookieout=0x7fba47b038f8, cookieoutlen=0x7fba47b038ec, flags=3) at ../src/libvirt-domain.c:5120
#12 0x0000561583e35333 in remoteDispatchDomainMigratePerform3Params (server=<optimized out>, msg=0x5615852ac9c0, ret=0x7fba3805acd0, ret=0x7fba3805acd0, args=0x7fba38012030, rerr=0x7fba47b039f0, client=<optimized out>) at ../src/remote/remote_daemon_dispatch.c:5722
#13 remoteDispatchDomainMigratePerform3ParamsHelper (server=<optimized out>, client=<optimized out>, msg=0x5615852ac9c0, rerr=0x7fba47b039f0, args=0x7fba38012030, ret=0x7fba3805acd0) at src/remote/remote_daemon_dispatch_stubs.h:8734
#14 0x00007fba6705bee7 in virNetServerProgramDispatchCall (msg=0x5615852ac9c0, client=0x5615852401e0, server=0x5615851f6080, prog=0x561585257810) at ../src/rpc/virnetserverprogram.c:428
#15 virNetServerProgramDispatch (prog=0x561585257810, server=server@entry=0x5615851f6080, client=0x5615852401e0, msg=0x5615852ac9c0) at ../src/rpc/virnetserverprogram.c:302
#16 0x00007fba67061276 in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x5615851f6080) at ../src/rpc/virnetserver.c:137
#17 virNetServerHandleJob (jobOpaque=0x56158520da20, opaque=0x5615851f6080) at ../src/rpc/virnetserver.c:154
#18 0x00007fba66f99d0f in virThreadPoolWorker (opaque=<optimized out>) at ../src/util/virthreadpool.c:163
#19 0x00007fba66f9937b in virThreadHelper (data=<optimized out>) at ../src/util/virthread.c:233
#20 0x00007fba6329214a in start_thread () from /lib64/libpthread.so.0
#21 0x00007fba65a40db3 in clone () from /lib64/libc.so.6
```