Bug 1593137
Summary: | libvirtd crashed if destroy the guest on the source host in perform phase of live migration
---|---
Product: | Red Hat Enterprise Linux 7
Component: | libvirt
Version: | 7.6
Status: | CLOSED ERRATA
Severity: | unspecified
Priority: | unspecified
Hardware: | Unspecified
OS: | Unspecified
Target Milestone: | rc
Reporter: | yafu <yafu>
Assignee: | Jiri Denemark <jdenemar>
QA Contact: | Fangge Jin <fjin>
CC: | dyuan, fjin, lmen, xuzhang
Fixed In Version: | libvirt-4.5.0-7.el7
Last Closed: | 2018-10-30 09:56:58 UTC
Type: | Bug
Bug Blocks: | 1615854
Description
yafu
2018-06-20 07:42:14 UTC
Created attachment 1453142
libvirtd log on source and target host
I was able to reproduce this bug even without any flags. That is, any migration is affected, although sometimes libvirtd doesn't crash. It's caused by writing to freed memory, so depending on what was later placed at the same address in memory the daemon may either crash, abort with heap corruption, or just keep running happily.

The bug can be seen in valgrind; however, you still need to be lucky enough to kill the domain while migration is in virCondWait called from qemuMigrationSrcWaitForCompletion. That said, the reproducer is not 100% reliable, although sometimes it appears so.

    ==21535== Invalid write of size 4
    ==21535==    at 0x31DEDBF6: qemuMigrationSrcWaitForCompletion (qemu_migration.c:1587)
    ==21535==    by 0x31DF3C99: qemuMigrationSrcRun (qemu_migration.c:3588)
    ...

The following code in qemuMigrationSrcWaitForCompletion does the invalid write to jobInfo->status in case virDomainObjWait returned -1 because the domain is not running anymore and libvirt removed all run-time state including priv->job.current:

    if (virDomainObjWait(vm) < 0) {
        jobInfo->status = QEMU_DOMAIN_JOB_STATUS_FAILED;
        return -2;
    }

Patch sent upstream for review: https://www.redhat.com/archives/libvir-list/2018-August/msg00106.html

This is now fixed upstream by

commit dddcb601ebf97ef222a03bb27b2357e831e8a0cc
Refs: v4.6.0-93-gdddcb601eb
Author:     Jiri Denemark <jdenemar>
AuthorDate: Thu Aug 2 16:56:02 2018 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Mon Aug 13 11:29:09 2018 +0200

    qemu_migration: Avoid writing to freed memory

    When a domain is killed on the source host while it is being migrated
    and libvirtd is waiting for the migration to finish (waiting for the
    domain condition in qemuMigrationSrcWaitForCompletion), the run-time
    state including priv->job.current may already be freed once
    virDomainObjWait returns with -1. Thus the priv->job.current pointer
    cached in jobInfo is no longer valid and setting jobInfo->status may
    crash the daemon.

    https://bugzilla.redhat.com/show_bug.cgi?id=1593137

    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Ján Tomko <jtomko>

Reproduced this bug with libvirt-4.5.0-6.virtcov.el7.x86_64

Steps:
1. Start a guest and migrate it:
# virsh start rhel7-min; sleep 5; virsh migrate-setspeed rhel7-min 10; virsh migrate rhel7-min qemu+ssh://10.66.5.190/system --live --verbose --p2p

2. When the migration is in the perform phase (see the progress percentage), destroy the guest on the source host:
# virsh destroy rhel7-min
error: Disconnected from qemu:///system due to end of file
error: Failed to destroy domain rhel7-min
error: End of file while reading data: Input/output error

3. Check the migration status:
Migration: [ 28 %]error: Disconnected from qemu:///system due to end of file
error: End of file while reading data: Input/output error

The reproduction rate is >=80%

Verified with libvirt-4.5.0-8.virtcov.el7.x86_64
Repeated the steps in comment 8 more than 10 times; no crash.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2018:3113
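For readers unfamiliar with this failure pattern, the following is a minimal, self-contained sketch of the situation described in the analysis above: a pointer to run-time state is cached before a wait, the state is freed while waiting, and the failure path then writes through the stale pointer. All names here (struct domain, struct job_info, domain_wait, wait_for_completion, the WAIT_* events) are hypothetical stand-ins and not libvirt APIs. The guard shown is only one plausible way to avoid the invalid write under these assumptions; it is not claimed to be the upstream patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these are libvirt types. */
enum job_status { JOB_STATUS_ACTIVE, JOB_STATUS_FAILED };
enum wait_event { WAIT_OK, WAIT_ERROR, WAIT_DOMAIN_KILLED };

struct job_info {
    enum job_status status;
};

struct domain {
    bool active;               /* false once the guest has been destroyed */
    struct job_info *current;  /* run-time state, freed on destroy */
};

/* Allocate the run-time job state and mark the domain as running. */
void domain_start(struct domain *dom)
{
    dom->current = calloc(1, sizeof(*dom->current));
    assert(dom->current != NULL);
    dom->current->status = JOB_STATUS_ACTIVE;
    dom->active = true;
}

/* Models destroying the guest: all run-time state is released. */
void domain_destroy(struct domain *dom)
{
    free(dom->current);
    dom->current = NULL;
    dom->active = false;
}

/* Models a condition wait. It fails (-1) either on a plain error or
 * because the domain was killed, and therefore freed, while waiting. */
int domain_wait(struct domain *dom, enum wait_event event)
{
    if (event == WAIT_DOMAIN_KILLED)
        domain_destroy(dom);
    return event == WAIT_OK ? 0 : -1;
}

/* Returns 0 on success, -2 if the wait failed. */
int wait_for_completion(struct domain *dom, enum wait_event event)
{
    struct job_info *job = dom->current;  /* cached before waiting */

    if (domain_wait(dom, event) < 0) {
        /* The cached pointer is only trustworthy while the domain is
         * still active; otherwise the state behind it is already gone.
         * Writing job->status unconditionally here would be the
         * use-after-free reported by valgrind. */
        if (dom->active)
            job->status = JOB_STATUS_FAILED;
        return -2;
    }
    return 0;
}
```

In this sketch, a wait that fails while the domain is still active marks the job as failed, whereas a wait that fails because the domain was destroyed leaves the freed state untouched and only reports the error through the return value.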