Bug 1162208 - libvirtd occasionally crashes at the end of migration
Summary: libvirtd occasionally crashes at the end of migration
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Jiri Denemark
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Duplicates: 1174869
Depends On:
Blocks: 1151953 1171124
Reported: 2014-11-10 14:16 UTC by Tomas Jamrisko
Modified: 2015-03-05 07:47 UTC (History)

Fixed In Version: libvirt-1.2.8-1.el7
Doc Type: Bug Fix
Doc Text:
Cause: Libvirt did not properly check whether a DAC security label is non-NULL before trying to parse user/group ownership from it.
Consequence: When the virDomainGetBlockInfo API is called on a transient domain that has just finished migrating to another host, its DAC security label may already be NULL, which crashes libvirtd. Since RHEV uses transient domains and periodically calls virDomainGetBlockInfo, whether libvirtd crashes is purely a matter of the call landing in that window.
Fix: Libvirt now properly checks the DAC label before trying to parse it.
Result: Libvirtd no longer crashes in the described scenario.
Clone Of:
Clones: 1171124
Environment:
Last Closed: 2015-03-05 07:47:20 UTC
Target Upstream Version:


Attachments
libvirtd log (90.60 KB, text/plain)
2014-11-10 14:18 UTC, Tomas Jamrisko
backtrace (14.36 KB, text/plain)
2014-11-10 15:11 UTC, Tomas Jamrisko
patch for reproducing the bug on libvirt from 7.0.z (423 bytes, patch)
2014-11-20 10:09 UTC, Jiri Denemark
patch for reproducing the bug on libvirt from 7.1 (468 bytes, patch)
2014-11-20 10:10 UTC, Jiri Denemark


Links
System: Red Hat Product Errata
ID: RHSA-2015:0323
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Low: libvirt security, bug fix, and enhancement update
Last Updated: 2015-03-05 12:10:54 UTC

Description Tomas Jamrisko 2014-11-10 14:16:27 UTC
Description of problem:
Migration of VMs in RHEV3.5 on RHEL7.0 hosts occasionally fails because of a libvirtd crash (segfault) happening near the end.

Version-Release number of selected component (if applicable):
libvirt-1.1.1-29.el7_0.3.x86_64

How reproducible:
about 1/10th of the time

Steps to Reproduce:
1. Connect to a VM running in RHEV3.5 environment on RHEL7.0 hosts using spice
   (tested just with 64 bit windows 7 as a guest, default configuration)
2. Migrate the VM
3. Repeat until error appears.

Actual results:
The VM shuts down, libvirtd segfaults and is restarted on source.

Comment 1 Tomas Jamrisko 2014-11-10 14:18:00 UTC
Created attachment 955828 [details]
libvirtd log

coredump too large to attach, you can find it here: http://download.eng.brq.redhat.com/scratch/tjamrisk/libvirt-coredump

Comment 3 Jiri Denemark 2014-11-10 15:06:29 UTC
Could you install the required debuginfo packages, and get the output of "thread apply all backtrace" gdb command from the coredump? That wouldn't be so big...

Comment 4 Tomas Jamrisko 2014-11-10 15:11:17 UTC
Created attachment 955852 [details]
backtrace

Comment 5 Jiri Denemark 2014-11-12 13:09:22 UTC
So this is what happens in the two threads involved in this crash:

Thread 10                               Thread 1
qemuMigrationPerform
  qemuMigrationPerformJob
    doPeer2PeerMigrate
      doPeer2PeerMigrate3
        qemuMigrationConfirmPhase
          qemuProcessStop
            qemuProcessKill
            /* Clear out dynamically assigned labels */
      qemuDomainObjEnterRemote
        virObjectRef(vm)
        virObjectUnlock(vm)
                                        qemuDomainGetBlockInfo
      virConnectClose                     qemuDomObjFromDomain
                                            virDomainObjListFindByUUID
                                              virObjectLock(vm)
                                          qemuOpenFile
                                            /* use cleared labels */
                                            SIGSEGV
      qemuDomainObjExitRemote
    qemuDomainRemoveInactive

The problem is in qemuProcessStop, which clears out dynamically assigned labels but leaves the seclabel structures, now full of NULL pointers, in vm->def->seclabels. The second thread then tries to use one of these seclabels and dereferences a NULL label. Persistent domains are not affected because vm->def is completely removed and replaced with the persistent version of def pointed to by vm->newdef.
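
The state left behind by the stop path can be modelled in a few lines of C. This is only a rough sketch under stated assumptions: the types demo_seclabel and demo_domain_def, the helper names, and the "107:107" label value are hypothetical stand-ins, not libvirt's real structures or functions, and the two threads are replayed sequentially instead of racing.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; NOT libvirt's real definitions. */
struct demo_seclabel {
    char *model;   /* e.g. "dac" */
    char *label;   /* dynamically assigned label; NULL once cleared */
};

struct demo_domain_def {
    size_t nseclabels;
    struct demo_seclabel *seclabels;   /* array of nseclabels entries */
};

/* Duplicate a string with malloc (strdup is POSIX, not ISO C). */
static char *dup_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy)
        memcpy(copy, s, len);
    return copy;
}

/* "Thread 10" on stop: free every label but keep the entries in place. */
static void clear_dynamic_labels(struct demo_domain_def *def)
{
    for (size_t i = 0; i < def->nseclabels; i++) {
        free(def->seclabels[i].label);
        def->seclabels[i].label = NULL;
    }
}

/* "Thread 1" later: look an entry up by model name. */
static struct demo_seclabel *find_seclabel(struct demo_domain_def *def,
                                           const char *model)
{
    assert(def != NULL && model != NULL);
    for (size_t i = 0; i < def->nseclabels; i++) {
        if (def->seclabels[i].model &&
            strcmp(def->seclabels[i].model, model) == 0)
            return &def->seclabels[i];
    }
    return NULL;
}

/* Replays the interleaving sequentially.
 * Returns 1 if the "dac" entry is still found but its label is NULL,
 * 0 otherwise, -1 on allocation failure. */
int simulate_interleaving(void)
{
    struct demo_seclabel labels[1];
    struct demo_domain_def def = { 1, labels };
    struct demo_seclabel *found;
    int cleared;

    labels[0].model = dup_string("dac");
    labels[0].label = dup_string("107:107");   /* illustrative value */
    if (!labels[0].model || !labels[0].label) {
        free(labels[0].model);
        free(labels[0].label);
        return -1;
    }

    clear_dynamic_labels(&def);          /* thread 10: stop path */
    found = find_seclabel(&def, "dac");  /* thread 1: block info path */
    cleared = (found != NULL && found->label == NULL);

    free(labels[0].model);
    return cleared;
}
```

In this model simulate_interleaving() reports that the lookup still succeeds while the label itself is gone, which is the condition a caller has to check for before using the label.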

Comment 6 Jiri Denemark 2014-11-20 09:56:45 UTC
This is already fixed upstream by v1.2.5-112-g7eb0ee1 (the scenario described in the commit message is obviously not the only possible way to hit the crash):

commit 7eb0ee175b278a4439cee65a7a554767f0be9cd1
Author: Ján Tomko <jtomko>
Date:   Thu Jun 12 10:50:43 2014 +0200

    Fix crash when saving a domain with type none dac label
    
    qemuDomainGetImageIds did not check if there was a label
    in the seclabel, thus crashing on
    <seclabel type='none' model='dac'/>
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1108590
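
For illustration, the kind of check the commit message describes might look like the sketch below. It is not the upstream patch: the dac_seclabel type and get_image_ids function are hypothetical, and the label is assumed to be a plain decimal "UID:GID" string, which is simpler than what libvirt actually accepts.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a seclabel entry; not libvirt's real type. */
struct dac_seclabel {
    const char *model;
    const char *label;   /* assumed format: "UID:GID" in decimal, or NULL */
};

/* Parse "UID:GID" from the label into *uid and *gid.
 * Returns 1 if ids were parsed, 0 if there is no label to parse
 * (outputs left untouched), -1 if the label is malformed. */
int get_image_ids(const struct dac_seclabel *sec,
                  unsigned long *uid, unsigned long *gid)
{
    const char *gid_start;
    char *end;
    unsigned long u, g;

    assert(uid != NULL && gid != NULL);

    /* The guard: a missing entry or a cleared label means there is
     * nothing to parse, so return before touching the string. */
    if (sec == NULL || sec->label == NULL)
        return 0;

    errno = 0;
    u = strtoul(sec->label, &end, 10);
    if (errno != 0 || end == sec->label || *end != ':')
        return -1;

    gid_start = end + 1;
    g = strtoul(gid_start, &end, 10);
    if (errno != 0 || end == gid_start || *end != '\0')
        return -1;

    *uid = u;
    *gid = g;
    return 1;
}
```

Returning 0 without modifying the outputs lets a caller keep whatever default ownership it already had, both for a seclabel with no label at all and for one whose label was cleared by a concurrent stop.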

Comment 7 Jiri Denemark 2014-11-20 10:00:00 UTC
The patch is included in 7.1, feel free to request a backport for 7.0.z...

Comment 8 Jiri Denemark 2014-11-20 10:06:21 UTC
Steps to reliably reproduce this crash:

0. patch libvirt to make the race condition window bigger (see attached patches), build it, and start the modified libvirtd
1. create a transient domain
2. start a peer-to-peer migration of the domain
3. once a "doPeer2PeerMigrate:??? : SLEEPING" debug log appears, run "virsh domblkinfo $DOMAIN $DISK"

libvirtd just segfaults without the fix mentioned in comment 6.

Comment 9 Jiri Denemark 2014-11-20 10:09:04 UTC
Created attachment 959275 [details]
patch for reproducing the bug on libvirt from 7.0.z

Comment 10 Jiri Denemark 2014-11-20 10:10:58 UTC
Created attachment 959276 [details]
patch for reproducing the bug on libvirt from 7.1

Comment 14 Luyao Huang 2014-12-16 08:30:55 UTC
I can reproduce this bug with libvirt-1.1.1-29.el7.x86_64:

Steps:
1. Rebuild libvirt with the patch Jiri attached in comment 9.

2. Install the libvirt built in step 1 and restart libvirtd on the source host:
# service libvirtd start
Redirecting to /bin/systemctl start  libvirtd.service

3. Prepare a transient VM on the source host (do not set <seclabel type='none' model='dac'/> in the guest XML, because that causes a different crash):

# virsh create test6.xml
Domain test6 created from test6.xml

4. Migrate the VM from the source host (running the rebuilt libvirtd) to the target host using p2p migration:

# virsh migrate test6 qemu+ssh://lhuang/system --p2p

5. Before the migration completes (during the sleep), run domblkinfo on the source host from another terminal:

# virsh domblkinfo test6 hda
error: End of file while reading data: Input/output error
error: One or more references were leaked after disconnect from the hypervisor
error: Failed to reconnect to the hypervisor

6. Check the coredump in gdb on the source host; it shows the same cause as described in comment 5.



And cannot verify this bug with libvirt-1.2.8-10.el7.x86_64 :

1. Rebuild libvirt with the patch Jiri attached in comment 9.

2. Install the libvirt built in step 1 and restart libvirtd on the source host:
# service libvirtd start
Redirecting to /bin/systemctl start  libvirtd.service

3. Prepare a transient VM on the source host (do not set <seclabel type='none' model='dac'/> in the guest XML, because that causes a different crash):

# virsh create test6.xml
Domain test6 created from test6.xml

4. Migrate the VM from the source host (running the rebuilt libvirtd) to the target host using p2p migration:

# virsh migrate test6 qemu+ssh://lhuang/system --p2p

5. Before the migration completes (during the sleep), run domblkinfo on the source host from another terminal:

# virsh domblkinfo test6 hda
Capacity:       4294967296
Allocation:     4294967296
Physical:       4294967296

Comment 16 Luyao Huang 2014-12-16 08:57:10 UTC
Sorry, I made a mistake in comment 14.

>And cannot verify this bug with libvirt-1.2.8-10.el7.x86_64 :

s/cannot verify this bug/cannot reproduce

Comment 21 Jiri Denemark 2014-12-17 08:11:01 UTC
*** Bug 1174869 has been marked as a duplicate of this bug. ***

Comment 25 dyuan 2015-01-14 02:18:27 UTC
Thanks, Tomas. Moving the bug to VERIFIED.
We also re-verified it (PASS) with the latest libvirt version.

Comment 27 errata-xmlrpc 2015-03-05 07:47:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0323.html

