Bug 1589730
| Summary: | libvirtd sometimes crashes if the guest crashes on the source host at the end of migration |
|---|---|
| Product: | Red Hat Enterprise Linux 7 |
| Component: | libvirt |
| Version: | 7.6 |
| Status: | CLOSED ERRATA |
| Severity: | unspecified |
| Priority: | unspecified |
| Target Milestone: | rc |
| Target Release: | --- |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Reporter: | yafu <yafu> |
| Assignee: | Jiri Denemark <jdenemar> |
| QA Contact: | Fangge Jin <fjin> |
| CC: | dyuan, fjin, jdenemar, lmen, xuzhang |
| Fixed In Version: | libvirt-4.5.0-1.el7 |
| Doc Type: | If docs needed, set a value |
| Type: | Bug |
| Last Closed: | 2018-10-30 09:56:58 UTC |
Created attachment 1449966: libvirtd log on source and target host
Steps to reproduce this bug without relying on QEMU to crash:

1. Start a new domain.
2. Attach gdb to libvirtd on the destination host and set a breakpoint on qemuMigrationDstFinish.
3. Run: migrate $DOM --p2p ...
4. Wait until gdb hits the breakpoint.
5. Kill the QEMU process on the source host.
6. Run the "continue" command in gdb.

Caused by commit v3.4.0-36-g49d30bc2e2, specifically by

    +    if (flags & QEMU_MIGRATION_COOKIE_STATS && mig->jobInfo)
    +        mig->jobInfo->operation = priv->job.current->operation;

in qemuMigrationEatCookie, because once libvirtd sees EOF from the QEMU monitor while it is executing a remote API, it will clear the migration job.

Patches sent upstream for review: https://www.redhat.com/archives/libvir-list/2018-June/msg01815.html

Fixed upstream by
commit 5f998219111327dc4f66586ce8953a6d7ecfb619
Refs: v4.5.0-rc2-2-g5f99821911
Author: Jiri Denemark <jdenemar>
AuthorDate: Thu Jun 28 11:38:52 2018 +0200
Commit: Jiri Denemark <jdenemar>
CommitDate: Mon Jul 2 11:53:21 2018 +0200
qemu_migration: Check for active domain after talking to remote daemon
Once we called qemuDomainObjEnterRemote to talk to the destination
daemon during a peer to peer migration, the vm lock is released and we
only hold an async job. If the source domain dies at this point the
monitor EOF callback is allowed to do its job and (among other things)
clear all private data irrelevant for a stopped domain. Thus when we call
qemuDomainObjExitRemote, the domain may already be gone and we should
avoid touching runtime private data (such as current job info).
In other words, after acquiring the lock in qemuDomainObjExitRemote, we
need to check that the domain is still alive, unless we're doing an
offline migration.
https://bugzilla.redhat.com/show_bug.cgi?id=1589730
Signed-off-by: Jiri Denemark <jdenemar>
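To make the control flow described by the commit easier to follow, here is a minimal, self-contained C sketch of the pattern: re-check that the domain is still active after the remote call returns, before touching runtime-only job data. The types and helpers used here (vm_t, job_info_t, enter_remote, exit_remote) are simplified stand-ins for illustration only, not libvirt's internal API, and this is not the actual patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef struct {
    int operation;            /* stand-in for the current job's operation */
} job_info_t;

typedef struct {
    bool active;              /* cleared when the monitor EOF handler runs */
    job_info_t *current_job;  /* cleared together with the migration job */
} vm_t;

/* Stand-ins for qemuDomainObjEnterRemote/ExitRemote: the real helpers drop
 * and re-acquire the per-domain lock around the call to the remote daemon. */
static void enter_remote(vm_t *vm) { (void)vm; /* unlock vm */ }
static void exit_remote(vm_t *vm)  { (void)vm; /* re-lock vm */ }

static int peer2peer_step(vm_t *vm, bool offline)
{
    enter_remote(vm);
    /* ...talk to the destination daemon; the source QEMU may die here,
     * letting the EOF callback clear vm->current_job... */
    exit_remote(vm);

    /* The commit's point: after getting the lock back, verify the domain is
     * still alive (unless this is an offline migration) before touching
     * runtime-only data such as the current job info. */
    if (!offline && !vm->active) {
        fprintf(stderr, "domain disappeared during migration\n");
        return -1;
    }

    printf("job operation = %d\n", vm->current_job->operation);
    return 0;
}

int main(void)
{
    job_info_t job = { .operation = 5 };
    vm_t vm = { .active = true, .current_job = &job };

    peer2peer_step(&vm, false);      /* normal case */

    /* Simulate the source domain crashing while the lock was released. */
    vm.active = false;
    vm.current_job = NULL;
    peer2peer_step(&vm, false);      /* fails cleanly instead of segfaulting */
    return 0;
}
```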
Reproduced with libvirt-4.3.0-1.el7.x86_64.

Verified with libvirt-4.5.0-4.el7.x86_64, following the steps in comment 3: migration succeeds and the source libvirtd doesn't crash.

Also tested offline migration with --p2p; migration succeeds:

# virsh -k0 migrate rhel7-min qemu+ssh://dell-per730-37.lab.eng.pek2.redhat.com/system --offline --persistent --p2p

But I found that sometimes the guest image ownership is changed to root:root after several rounds of migration. I'm not sure whether it's caused by the patch.

Source to target (kill source QEMU during migration as in comment 3):
[root@fjin-5-190 ~]# virsh -k0 migrate rhel7-min qemu+ssh://dell-per730-37.lab.eng.pek2.redhat.com/system --verbose --live --p2p

Target to source:
# virsh migrate rhel7-min qemu+ssh://10.66.5.190/system --live --verbose --p2p --migrateuri tcp://10.66.5.190
Migration: [100 %]

Source to target (kill source QEMU during migration as in comment 3):
# virsh -k0 migrate rhel7-min qemu+ssh://dell-per730-37.lab.eng.pek2.redhat.com/system --verbose --live --p2p
Migration: [100 %]

# ll /nfs/RHEL-7.5-x86_64-latest.qcow2
-rw-r--r--. 1 root root 1403650048 Jul 25 23:49 /nfs/RHEL-7.5-x86_64-latest.qcow2

Created attachment 1470583: libvirtd log - for comment 9

Please file a separate bug for this and explain what steps you took; it's not very clear from comment 9.

I filed new Bug 1608931 for the issue in comment 9.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3113
Description of problem:
libvirtd sometimes crashes if the guest crashes on the source host at the end of migration.

Version-Release number of selected component (if applicable):
libvirt-4.3.0-1.el7.x86_64

How reproducible:
occasional

Steps to Reproduce:
1. Do migration (I use a problematic guest which sometimes crashes during migration):
#virsh migrate iommu1 qemu+ssh://10.66.5.76/system --live --verbose --p2p --tunnelled
Migration: [100 %]error: Disconnected from qemu:///system due to end of file
error: End of file while reading data: Input/output error

2. Check the qemu log on the source host:
#cat /var/log/libvirt/qemu/iommu1.log
2018-06-11T03:42:28.152233Z qemu-kvm: warning: TSC frequency mismatch between VM (2593995 kHz) and host (3311999 kHz), and TSC scaling unavailable
2018-06-11T03:42:28.152322Z qemu-kvm: warning: TSC frequency mismatch between VM (2593995 kHz) and host (3311999 kHz), and TSC scaling unavailable
2018-06-11T03:42:28.152382Z qemu-kvm: warning: TSC frequency mismatch between VM (2593995 kHz) and host (3311999 kHz), and TSC scaling unavailable
2018-06-11T03:42:28.152454Z qemu-kvm: warning: TSC frequency mismatch between VM (2593995 kHz) and host (3311999 kHz), and TSC scaling unavailable
main_channel_handle_message: agent start
2018-06-11 03:42:47.825+0000: shutting down, reason=crashed

3. The backtrace of libvirtd:
Core was generated by `/usr/sbin/libvirtd'.
Program terminated with signal 11, Segmentation fault.
#0 0x00007fa573ca819b in qemuMigrationEatCookie (driver=driver@entry=0x7fa5441115f0, dom=dom@entry=0x7fa5441b9f20, cookiein=cookiein@entry=0x7fa594015af0 "<qemu-migration>\n <name>iommu1</name>\n <uuid>1b3268d6-b59c-406b-a14c-33b000b15b6c</uuid>\n <hostname>fjin-4-101</hostname>\n <hostuuid>600aa6af-c0f5-11e6-9c43-bc0000340000</hostuuid>\n <statistics>\n"..., cookieinlen=cookieinlen@entry=1119, flags=flags@entry=32) at qemu/qemu_migration_cookie.c:1534
1534        mig->jobInfo->operation = priv->job.current->operation;
(gdb) bt
#0 0x00007fa573ca819b in qemuMigrationEatCookie (driver=driver@entry=0x7fa5441115f0, dom=dom@entry=0x7fa5441b9f20, cookiein=cookiein@entry=0x7fa594015af0 "<qemu-migration>\n <name>iommu1</name>\n <uuid>1b3268d6-b59c-406b-a14c-33b000b15b6c</uuid>\n <hostname>fjin-4-101</hostname>\n <hostuuid>600aa6af-c0f5-11e6-9c43-bc0000340000</hostuuid>\n <statistics>\n"..., cookieinlen=cookieinlen@entry=1119, flags=flags@entry=32) at qemu/qemu_migration_cookie.c:1534
#1 0x00007fa573c98a71 in qemuMigrationSrcConfirmPhase (driver=driver@entry=0x7fa5441115f0, vm=vm@entry=0x7fa5441b9f20, cookiein=0x7fa594015af0 "<qemu-migration>\n <name>iommu1</name>\n <uuid>1b3268d6-b59c-406b-a14c-33b000b15b6c</uuid>\n <hostname>fjin-4-101</hostname>\n <hostuuid>600aa6af-c0f5-11e6-9c43-bc0000340000</hostuuid>\n <statistics>\n"..., cookieinlen=1119, flags=7, retcode=retcode@entry=0) at qemu/qemu_migration.c:2805
#2 0x00007fa573ca4527 in qemuMigrationSrcPerformPeer2Peer3 (flags=7, useParams=<optimized out>, bandwidth=0, migParams=0x7fa5940008c0, nbdPort=0, migrate_disks=<optimized out>, nmigrate_disks=0, listenAddress=<optimized out>, graphicsuri=<optimized out>, uri=<optimized out>, dname=0x0, persist_xml=0x0, xmlin=<optimized out>, vm=0x7fa5441b9f20, dconnuri=0x7fa59401e010 "qemu+ssh://10.66.4.101/system", dconn=0x7fa59402d520, sconn=0x7fa5900347a0, driver=0x7fa5441115f0) at qemu/qemu_migration.c:4207
#3 qemuMigrationSrcPerformPeer2Peer (v3proto=<synthetic pointer>, resource=0, dname=<optimized out>, flags=7, migParams=0x7fa5940008c0, nbdPort=0, migrate_disks=<optimized out>, nmigrate_disks=0, listenAddress=<optimized out>, graphicsuri=<optimized out>, uri=0x0, dconnuri=0x7fa59401e010 "qemu+ssh://10.66.4.101/system", persist_xml=0x0, xmlin=<optimized out>, vm=0x7fa5441b9f20, sconn=0x7fa5900347a0, driver=0x7fa5441115f0) at qemu/qemu_migration.c:4398
#4 qemuMigrationSrcPerformJob (driver=driver@entry=0x7fa5441115f0, conn=conn@entry=0x7fa5900347a0, vm=vm@entry=0x7fa5441b9f20, xmlin=xmlin@entry=0x0, persist_xml=persist_xml@entry=0x0, dconnuri=dconnuri@entry=0x7fa59401e010 "qemu+ssh://10.66.4.101/system", uri=uri@entry=0x0, graphicsuri=graphicsuri@entry=0x0, listenAddress=listenAddress@entry=0x0, nmigrate_disks=nmigrate_disks@entry=0, migrate_disks=migrate_disks@entry=0x0, nbdPort=nbdPort@entry=0, migParams=migParams@entry=0x7fa5940008c0, cookiein=cookiein@entry=0x0, cookieinlen=cookieinlen@entry=0, cookieout=cookieout@entry=0x7fa5a22c4a88, cookieoutlen=cookieoutlen@entry=0x7fa5a22c4a7c, flags=flags@entry=7, dname=dname@entry=0x0, resource=resource@entry=0, v3proto=<optimized out>, v3proto@entry=true) at qemu/qemu_migration.c:4478
#5 0x00007fa573ca5394 in qemuMigrationSrcPerform (driver=driver@entry=0x7fa5441115f0, conn=0x7fa5900347a0, vm=0x7fa5441b9f20, xmlin=0x0, persist_xml=0x0, dconnuri=dconnuri@entry=0x7fa59401e010 "qemu+ssh://10.66.4.101/system", uri=0x0, graphicsuri=0x0, listenAddress=0x0, nmigrate_disks=nmigrate_disks@entry=0, migrate_disks=0x0, nbdPort=0, migParams=migParams@entry=0x7fa5940008c0, cookiein=cookiein@entry=0x0, cookieinlen=cookieinlen@entry=0, cookieout=cookieout@entry=0x7fa5a22c4a88, cookieoutlen=cookieoutlen@entry=0x7fa5a22c4a7c, flags=flags@entry=7, dname=0x0, resource=0, v3proto=v3proto@entry=true) at qemu/qemu_migration.c:4661
#6 0x00007fa573cea065 in qemuDomainMigratePerform3Params (dom=0x7fa59401dfd0, dconnuri=0x7fa59401e010 "qemu+ssh://10.66.4.101/system", params=<optimized out>, nparams=<optimized out>, cookiein=0x0, cookieinlen=0, cookieout=0x7fa5a22c4a88, cookieoutlen=0x7fa5a22c4a7c, flags=7) at qemu/qemu_driver.c:12781
#7 0x00007fa5b242e26d in virDomainMigratePerform3Params (domain=domain@entry=0x7fa59401dfd0, dconnuri=0x7fa59401e010 "qemu+ssh://10.66.4.101/system", params=0x7fa594001e20, nparams=0, cookiein=0x0, cookieinlen=0, cookieout=cookieout@entry=0x7fa5a22c4a88, cookieoutlen=cookieoutlen@entry=0x7fa5a22c4a7c, flags=7) at libvirt-domain.c:4976
#8 0x000055a77cbf6276 in remoteDispatchDomainMigratePerform3Params (server=0x55a77de12f20, msg=0x55a77de703b0, ret=0x7fa594001d50, args=0x7fa594001d70, rerr=0x7fa5a22c4bc0, client=<optimized out>) at remote/remote_daemon_dispatch.c:5436
#9 remoteDispatchDomainMigratePerform3ParamsHelper (server=0x55a77de12f20, client=<optimized out>, msg=0x55a77de703b0, rerr=0x7fa5a22c4bc0, args=0x7fa594001d70, ret=0x7fa594001d50) at remote/remote_daemon_dispatch_stubs.h:7944
#10 0x00007fa5b23193b5 in virNetServerProgramDispatchCall (msg=0x55a77de703b0, client=0x55a77de71ab0, server=0x55a77de12f20, prog=0x55a77de68c70) at rpc/virnetserverprogram.c:437
#11 virNetServerProgramDispatch (prog=0x55a77de68c70, server=server@entry=0x55a77de12f20, client=client@entry=0x55a77de71ab0, msg=msg@entry=0x55a77de703b0) at rpc/virnetserverprogram.c:304
#12 0x00007fa5b232217a in virNetServerProcessMsg (srv=srv@entry=0x55a77de12f20, client=0x55a77de71ab0, prog=<optimized out>, msg=0x55a77de703b0) at rpc/virnetserver.c:145
#13 0x00007fa5b23225c8 in virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x55a77de12f20) at rpc/virnetserver.c:166
#14 0x00007fa5b22069d1 in virThreadPoolWorker (opaque=opaque@entry=0x55a77de127a0) at util/virthreadpool.c:167
#15 0x00007fa5b22057a0 in virThreadHelper (data=<optimized out>) at util/virthread.c:206
#16 0x00007fa5af523dd5 in start_thread (arg=0x7fa5a22c5700) at pthread_create.c:308
#17 0x00007fa5af24db3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Actual results:

Expected results:

Additional info:
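For reference, a minimal self-contained sketch of the dereference that faults in frame #0 above (qemu_migration_cookie.c:1534): priv->job.current is NULL once the monitor EOF handler has cleared the migration job. The structures here (job_t, priv_t, mig_cookie_t) are simplified stand-ins for illustration, not libvirt's real internal types, and the NULL guard shown is only illustrative; the actual upstream fix (see the commit quoted above) checks for an active domain after talking to the remote daemon instead.

```c
#include <stddef.h>
#include <stdio.h>

typedef struct { int operation; } job_t;
typedef struct { struct { job_t *current; } job; } priv_t;   /* mimics priv->job.current */
typedef struct { job_t *jobInfo; } mig_cookie_t;

static void eat_cookie_stats(mig_cookie_t *mig, priv_t *priv)
{
    /* The crashing line from the backtrace dereferences priv->job.current,
     * which is NULL after the EOF callback tears the job down.  The extra
     * NULL check below is only for illustration of why the crash happens. */
    if (mig->jobInfo && priv->job.current)
        mig->jobInfo->operation = priv->job.current->operation;
    else
        fprintf(stderr, "migration job already gone, skipping stats\n");
}

int main(void)
{
    job_t info = { .operation = 0 };
    mig_cookie_t mig = { .jobInfo = &info };
    priv_t priv = { .job = { .current = NULL } };   /* job cleared on QEMU EOF */

    eat_cookie_stats(&mig, &priv);   /* would segfault without the guard */
    return 0;
}
```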