Bug 1171124
| Summary: | libvirtd occasionally crashes at the end of migration | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Jan Kurik <jkurik> |
| Component: | libvirt | Assignee: | Jiri Denemark <jdenemar> |
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 7.0 | CC: | dyuan, jdenemar, mzhan, pm-eus, pzhang, rbalakri, tdosek, tjamrisk, wzhang, xuzhang, zpeng |
| Target Milestone: | rc | Keywords: | ZStream |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | libvirt-1.1.1-29.el7_0.4 | Doc Type: | Bug Fix |
Doc Text:
Cause: Libvirt did not properly check whether a DAC security label is non-NULL before trying to parse user/group ownership from it.
Consequence: When the virDomainGetBlockInfo API is called on a transient domain that has just finished migrating to another host, its DAC security label may already be NULL, which crashes libvirtd. Because RHEV uses transient domains and calls virDomainGetBlockInfo periodically, it is only a matter of time before the API is called at exactly the wrong moment and libvirtd crashes.
Fix: The DAC label is now properly checked before libvirt tries to parse it.
Result: Libvirtd no longer crashes in the described scenario.
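The fix described above boils down to refusing to parse a missing label. The following is a minimal, self-contained sketch of that pattern; the function below is a simplified stand-in written for this note, not libvirt's actual virParseOwnershipIds() or the upstream patch:

```c
/* Simplified stand-in for the kind of check the fix adds: parse "uid:gid"
 * ownership out of a DAC label, but only after making sure the label exists.
 * Written for illustration; not libvirt's actual code or the upstream patch. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int parse_dac_ownership(const char *label, long *uid, long *gid)
{
    const char *sep;

    /* The crash happened because a NULL label reached strchr(): a transient
     * domain that has just migrated away no longer has a DAC label. */
    if (!label)
        return -1;

    sep = strchr(label, ':');
    if (!sep)
        return -1;

    *uid = strtol(label, NULL, 10);
    *gid = strtol(sep + 1, NULL, 10);
    return 0;
}

int main(void)
{
    long uid, gid;

    if (parse_dac_ownership("107:107", &uid, &gid) == 0)
        printf("uid=%ld gid=%ld\n", uid, gid);

    if (parse_dac_ownership(NULL, &uid, &gid) < 0)
        printf("no label to parse; caller falls back instead of crashing\n");

    return 0;
}
```

With the early NULL check, a caller such as the block-info path simply falls back (or reports an error) instead of dereferencing a missing label.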
| Story Points: | --- | | |
| Clone Of: | 1162208 | Environment: | |
| Last Closed: | 2015-01-05 20:30:15 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1162208 | | |
| Bug Blocks: | | | |
Description
Jan Kurik
2014-12-05 12:33:40 UTC
Reproduce:

Version:

    libvirt-1.1.1-29.el7_0.3.x86_64
    qemu-kvm-rhev-1.5.3-60.el7_0.11
    kernel-3.10.0-123.17.1.el7.x86_64

According to the Doc Text in comment 4 (Cause: Libvirt did not properly check whether a DAC security label is non-NULL before trying to parse user/group ownership from it. Consequence: When the virDomainGetBlockInfo API is called on a transient domain that has just finished migration to another host, its DAC security label may already be NULL, which crashes libvirtd.) and the title of the patch ("Fix crash when saving a domain with type none dac label"), the issue can be reproduced with the following steps:

1> Try to migrate

1. Create a transient domain whose seclabel has type 'none' and model 'dac':

    # virsh dumpxml r7 | grep seclabel
    <seclabel model='selinux' labelskip='yes'/>
    <seclabel type='none' model='dac'/>

2. Do the migration.

Migrate to the destination host:

    # virsh migrate r7 qemu+ssh://$ip/system
    root@$ip's password:
    # virsh list --all
     Id    Name                           State
    ----------------------------------------------------

Migrate back to the source host:

    # virsh list --all
     Id    Name                           State
    ----------------------------------------------------
     9     r7                             running

Check domblkinfo; libvirtd crashes:

    # virsh domblklist r7
    Target     Source
    ------------------------------------------------
    vda        /tmp/zp/r7.img

    # virsh domblkinfo r7 vda
    error: End of file while reading data: Input/output error
    error: One or more references were leaked after disconnect from the hypervisor
    error: Failed to reconnect to the hypervisor

2> Try to save / restore

    # virsh list
     Id    Name                           State
    ----------------------------------------------------
     17    r7                             running

    # virsh save r7 r7.save
    error: Failed to save domain r7 to r7.save
    error: End of file while reading data: Input/output error
    error: One or more references were leaked after disconnect from the hypervisor
    error: Failed to reconnect to the hypervisor

    # service libvirtd status
    Redirecting to /bin/systemctl status libvirtd.service
    libvirtd.service - Virtualization daemon
       Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled)
       Active: failed (Result: signal) since Thu 2014-12-11 18:10:06 CST; 15s ago
      Process: 11911 ExecStart=/usr/sbin/libvirtd $LIBVIRTD_ARGS (code=killed, signal=SEGV)
     Main PID: 11911 (code=killed, signal=SEGV)
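For context on why the reproducer is timing-sensitive: per the Doc Text, RHEV triggers this by polling virDomainGetBlockInfo periodically while domains are migrating. A minimal client along those lines, using only the public libvirt C API, might look like the sketch below; the domain name "r7", disk target "vda", and polling interval are assumptions chosen for illustration, not anything RHEV actually does verbatim.

```c
/* Minimal sketch of a client that periodically polls block info, the way a
 * management application (e.g. RHEV, per the Doc Text) does. The domain name
 * "r7", disk target "vda", and 5-second interval are illustrative assumptions.
 * Build with: gcc poll-blkinfo.c -o poll-blkinfo -lvirt */
#include <stdio.h>
#include <unistd.h>
#include <libvirt/libvirt.h>

int main(void)
{
    virConnectPtr conn = virConnectOpen("qemu:///system");
    if (!conn) {
        fprintf(stderr, "failed to connect to libvirtd\n");
        return 1;
    }

    for (int i = 0; i < 60; i++) {
        virDomainPtr dom = virDomainLookupByName(conn, "r7");
        if (dom) {
            virDomainBlockInfo info;

            /* On an unfixed libvirtd, this call can arrive just after the
             * transient domain finished migrating away, when its DAC label
             * is already NULL, and the daemon segfaults. */
            if (virDomainGetBlockInfo(dom, "vda", &info, 0) == 0)
                printf("capacity=%llu allocation=%llu physical=%llu\n",
                       info.capacity, info.allocation, info.physical);
            virDomainFree(dom);
        }
        sleep(5);
    }

    virConnectClose(conn);
    return 0;
}
```

Any such poller that happens to query the domain in the short window after migration completes hits the same qemuDomainGetBlockInfo -> virParseOwnershipIds path shown in the backtrace further below.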
Verify:

Version:

    libvirt-1.1.1-29.el7_0.4.x86_64
    qemu-kvm-rhev-1.5.3-60.el7_0.11
    kernel-3.10.0-123.17.1.el7.x86_64

Steps:

1> Try to migrate

1. Create a transient domain whose seclabel has type 'none' and model 'dac':

    # virsh dumpxml r7 | grep seclabel
    <seclabel model='selinux' labelskip='yes'/>
    <seclabel type='none' model='dac'/>

2. Do the migration.

Migrate to the destination host:

    # virsh migrate r7 qemu+ssh://$ip/system
    root@$ip's password:
    # virsh list --all
     Id    Name                           State
    ----------------------------------------------------

Migrate back to the source host:

    # virsh list --all
     Id    Name                           State
    ----------------------------------------------------
     11    r7                             running

Check domblkinfo; libvirt works well and the domain block info is retrieved successfully:

    # virsh domblklist r7
    Target     Source
    ------------------------------------------------
    vda        /tmp/zp/r7.img

    # virsh domblkinfo r7 vda
    Capacity:       8589934592
    Allocation:     8589938688
    Physical:       8589938688

2> Try to save / restore; the domain can be saved and restored successfully:

    # virsh list
     Id    Name                           State
    ----------------------------------------------------
     16    r7                             running

    # virsh save r7 r7.save
    Domain r7 saved to r7.save

    # virsh restore r7.save
    Domain restored from r7.save

    # virsh list
     Id    Name                           State
    ----------------------------------------------------
     17    r7                             running

Test in another scenario:

Reproduce:

Version:

    libvirt-1.1.1-29.el7_0.3.x86_64
    qemu-kvm-rhev-1.5.3-60.el7_0.11
    kernel-3.10.0-123.17.1.el7.x86_64

Steps to reproduce:

1. Get libvirt-1.1.1-29.el7_0.3.src.rpm, rebuild libvirt, and add a patch (the patch from bug #1162208).

2. Install the new libvirt RPM packages produced in step 1 and restart the daemon:

    # service libvirtd restart

3. Prepare a domain XML and create a transient domain:

    # virsh list --all
     Id    Name                           State
    ----------------------------------------------------
     4     r7                             running

    # virsh dumpxml r7 | grep seclabel
    <seclabel model='selinux' relabel='yes'/>
    <seclabel type='none' model='dac'/>

4. In one terminal, do a --p2p migration of the domain from the source to the destination host:

    # virsh migrate r7 --live --p2p qemu+ssh://$ip/system

5. In another terminal, check the libvirt debug log for the sleep flag:

    # tailf libvirt.log | grep SLEEPING
    2014-12-17 07:29:56.488+0000: 30884: debug : doPeer2PeerMigrate:4070 : SLEEPING

Then run domblkinfo:

    # virsh domblkinfo r7 vda
    error: End of file while reading data: Input/output error
    error: One or more references were leaked after disconnect from the hypervisor
    error: Failed to reconnect to the hypervisor

6. Use gdb to get the crash backtrace and check the two threads involved in this crash (compare with bug #1162208 comment 5):

    ......
    (gdb) bt
    #0  0x00007f189311f158 in __strchr_sse42 () from /lib64/libc.so.6
    #1  0x00007f1895f194d0 in virParseOwnershipIds (label=0x0, uidPtr=uidPtr@entry=0x7f1886b8a818, gidPtr=gidPtr@entry=0x7f1886b8a81c) at util/virutil.c:2072
    #2  0x00007f187f39129e in qemuOpenFile (driver=driver@entry=0x7f18781567e0, vm=vm@entry=0x7f186c000e90, path=path@entry=0x7f186c00c7e0 "/tmp/zp/r7raw.img", oflags=oflags@entry=0, needUnlink=needUnlink@entry=0x0, bypassSecurityDriver=bypassSecurityDriver@entry=0x0) at qemu/qemu_driver.c:2780
    #3  0x00007f187f39a29e in qemuDomainGetBlockInfo (dom=0x7f18781fbba0, path=0x7f186c00c7e0 "/tmp/zp/r7raw.img", info=0x7f1886b8ab40, flags=<optimized out>) at qemu/qemu_driver.c:10124
    #4  0x00007f1895f99734 in virDomainGetBlockInfo (domain=domain@entry=0x7f18781fbba0, disk=0x7f18781fe5e0 "vda", info=info@entry=0x7f1886b8ab40, flags=0) at libvirt.c:9110
    #5  0x00007f1896992b04 in remoteDispatchDomainGetBlockInfo (server=<optimized out>, msg=<optimized out>, ret=0x7f18781fbf10, args=0x7f18781fbf30, rerr=0x7f1886b8ac80, client=<optimized out>) at remote_dispatch.h:3487
    #6  remoteDispatchDomainGetBlockInfoHelper (server=<optimized out>, client=<optimized out>, msg=<optimized out>, rerr=0x7f1886b8ac80, args=0x7f18781fbf30, ret=0x7f18781fbf10) at remote_dispatch.h:3463
    #7  0x00007f1895ff21ba in virNetServerProgramDispatchCall (msg=0x7f1898781700, client=0x7f1898787500, server=0x7f18987723d0, prog=0x7f189877e150) at rpc/virnetserverprogram.c:435
    #8  virNetServerProgramDispatch (prog=0x7f189877e150, server=server@entry=0x7f18987723d0, client=0x7f1898787500, msg=0x7f1898781700) at rpc/virnetserverprogram.c:305
    #9  0x00007f1895fecd28 in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x7f18987723d0) at rpc/virnetserver.c:166
    #10 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x7f18987723d0) at rpc/virnetserver.c:187
    #11 0x00007f1895f115e5 in virThreadPoolWorker (opaque=opaque@entry=0x7f18987568e0) at util/virthreadpool.c:144
    #12 0x00007f1895f10f7e in virThreadHelper (data=<optimized out>) at util/virthreadpthread.c:194
    #13 0x00007f18937bcdf3 in start_thread () from /lib64/libpthread.so.0
    #14 0x00007f18930e33dd in clone () from /lib64/libc.so.6

    (gdb) info thread
      Id   Target Id                                        Frame
      11   Thread 0x7f188738c700 (LWP 30882) "libvirtd"     0x00007f18937c0705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
    * 10   Thread 0x7f1886b8b700 (LWP 30883) "libvirtd"     0x00007f189311f158 in __strchr_sse42 () from /lib64/libc.so.6
      9    Thread 0x7f188638a700 (LWP 30884) "libvirtd"     0x00007f18930aa8ad in nanosleep () from /lib64/libc.so.6

    (gdb) thread 9
    [Switching to thread 9 (Thread 0x7f188638a700 (LWP 30884))]
    #0  0x00007f18930aa8ad in nanosleep () from /lib64/libc.so.6

    (gdb) bt
    #0  0x00007f18930aa8ad in nanosleep () from /lib64/libc.so.6
    #1  0x00007f18930aa744 in sleep () from /lib64/libc.so.6
    #2  0x00007f187f3710f1 in doPeer2PeerMigrate (v3proto=<synthetic pointer>, resource=0, dname=<optimized out>, flags=<optimized out>, listenAddress=<optimized out>, graphicsuri=<optimized out>, uri=<optimized out>, dconnuri=<optimized out>, xmlin=<optimized out>, vm=0x7f186c000e90, sconn=0x7f18781fed00, driver=0x7f18781567e0) at qemu/qemu_migration.c:4071
    #3  qemuMigrationPerformJob (driver=driver@entry=0x7f18781567e0, conn=0x7f18781fed00, conn@entry=0x0, vm=vm@entry=0x7f186c000e90, xmlin=xmlin@entry=0x0, dconnuri=<optimized out>, uri=<optimized out>, graphicsuri=0x0, listenAddress=0x0, cookiein=0x0, cookieinlen=0, cookieout=0x7f1886389b58, cookieoutlen=0x7f1886389b54, flags=3, dname=0x0, resource=0, v3proto=true) at qemu/qemu_migration.c:4129
    #4  0x00007f187f3723d9 in qemuMigrationPerform (driver=driver@entry=0x7f18781567e0, conn=0x0, vm=vm@entry=0x7f186c000e90, xmlin=0x0, dconnuri=dconnuri@entry=0x7f1858000d80 "qemu+ssh://$ip/system", uri=0x7f18580008c0 "\220\037", graphicsuri=0x0, listenAddress=0x0, cookiein=cookiein@entry=0x0, cookieinlen=cookieinlen@entry=0, cookieout=cookieout@entry=0x7f1886389b58, cookieoutlen=cookieoutlen@entry=0x7f1886389b54, flags=flags@entry=3, dname=0x0, resource=0, v3proto=v3proto@entry=true) at qemu/qemu_migration.c:4313
    #5  0x00007f187f393b0d in qemuDomainMigratePerform3Params (dom=0x7f1858000d40, dconnuri=0x7f1858000d80 "qemu+ssh://$ip/system", params=<optimized out>, nparams=0, cookiein=0x0, cookieinlen=0, cookieout=0x7f1886389b58, cookieoutlen=0x7f1886389b54, flags=3) at qemu/qemu_driver.c:10910
    #6  0x00007f1895f94bdf in virDomainMigratePerform3Params (domain=domain@entry=0x7f1858000d40, dconnuri=0x7f1858000d80 "qemu+ssh://$ip/system", params=params@entry=0x7f1858000ee0, nparams=0, cookiein=0x0, cookieinlen=0, cookieout=cookieout@entry=0x7f1886389b58, cookieoutlen=cookieoutlen@entry=0x7f1886389b54, flags=3) at libvirt.c:7401
    #7  0x00007f18969866ef in remoteDispatchDomainMigratePerform3Params (server=<optimized out>, msg=<optimized out>, ret=0x7f1858000cc0, args=0x7f1858000ce0, rerr=0x7f1886389c80, client=<optimized out>) at remote.c:4978
    #8  remoteDispatchDomainMigratePerform3ParamsHelper (server=<optimized out>, client=<optimized out>, msg=<optimized out>, rerr=0x7f1886389c80, args=0x7f1858000ce0, ret=0x7f1858000cc0) at remote_dispatch.h:5631
    #9  0x00007f1895ff21ba in virNetServerProgramDispatchCall (msg=0x7f18987734b0, client=0x7f1898781e30, server=0x7f18987723d0, prog=0x7f189877e150) at rpc/virnetserverprogram.c:435
    #10 virNetServerProgramDispatch (prog=0x7f189877e150, server=server@entry=0x7f18987723d0, client=0x7f1898781e30, msg=0x7f18987734b0) at rpc/virnetserverprogram.c:305
    #11 0x00007f1895fecd28 in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x7f18987723d0) at rpc/virnetserver.c:166
    #12 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x7f18987723d0) at rpc/virnetserver.c:187
    #13 0x00007f1895f115e5 in virThreadPoolWorker (opaque=opaque@entry=0x7f1898756780) at util/virthreadpool.c:144
    #14 0x00007f1895f10f7e in virThreadHelper (data=<optimized out>) at util/virthreadpthread.c:194
    #15 0x00007f18937bcdf3 in start_thread () from /lib64/libpthread.so.0
    #16 0x00007f18930e33dd in clone () from /lib64/libc.so.6
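The two backtraces show the race directly: thread 9 (LWP 30884) is the peer-to-peer migration job, held in sleep() inside doPeer2PeerMigrate() by the debug patch from bug #1162208 (matching the SLEEPING log line above), while thread 10 services the virDomainGetBlockInfo RPC and passes an already-cleared DAC label (label=0x0) into virParseOwnershipIds(), which dies in strchr(). The program below is only a toy model of that interaction, not libvirt code; every name in it is made up for illustration:

```c
/* Toy model of the race shown in the backtraces above; all names are
 * hypothetical and this is not libvirt code. One thread plays the role of
 * the migration job that clears the transient domain's DAC label, the other
 * plays the RPC worker that parses the label for the block-info query.
 * Build with: gcc race-model.c -o race-model -lpthread */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static char *dac_label;                 /* e.g. "107:107"; NULL once migration finishes */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *migration_job(void *arg)
{
    (void)arg;
    sleep(1);                           /* migration completes after a while */
    pthread_mutex_lock(&lock);
    free(dac_label);
    dac_label = NULL;                   /* transient domain's label is gone */
    pthread_mutex_unlock(&lock);
    return NULL;
}

static void *block_info_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 5; i++) {       /* periodic block-info style polling */
        pthread_mutex_lock(&lock);
        /* Unfixed behaviour: calling strchr(dac_label, ':') here without the
         * NULL check would segfault exactly like frame #0 (__strchr_sse42)
         * once the other thread has cleared the label. The fixed behaviour
         * checks first: */
        if (dac_label && strchr(dac_label, ':'))
            printf("parsed ownership from label '%s'\n", dac_label);
        else
            printf("no DAC label to parse, skipping\n");
        pthread_mutex_unlock(&lock);
        sleep(1);
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    dac_label = strdup("107:107");
    pthread_create(&t1, NULL, migration_job, NULL);
    pthread_create(&t2, NULL, block_info_worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    free(dac_label);                    /* free(NULL) is a no-op, so this is safe */
    return 0;
}
```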
Verify:

Version:

    libvirt-1.1.1-29.el7_0.4.x86_64
    qemu-kvm-rhev-1.5.3-60.el7_0.11
    kernel-3.10.0-123.17.1.el7.x86_64

Verify steps:

1. Get libvirt-1.1.1-29.el7_0.4.src.rpm, rebuild libvirt, and add a patch (the patch from bug #1162208).

2. Install the new libvirt RPM packages produced in step 1 and restart the daemon:

    # service libvirtd restart

3. Prepare a domain XML and create a transient domain:

    # virsh list --all
     Id    Name                           State
    ----------------------------------------------------
     4     r7                             running

    # virsh dumpxml r7 | grep seclabel
    <seclabel model='selinux' relabel='yes'/>
    <seclabel type='none' model='dac'/>

4. In one terminal, do a --p2p migration of the domain from the source to the destination host:

    # virsh migrate r7 --live --p2p qemu+ssh://$ip/system

5. In another terminal, check the libvirt debug log for the sleep flag:

    # tailf libvirt.log | grep SLEEPING
    2014-12-17 07:29:56.488+0000: 30884: debug : doPeer2PeerMigrate:4070 : SLEEPING

Then run domblkinfo; it now succeeds:

    # virsh domblkinfo r7 vda
    Capacity:       8589934592
    Allocation:     8589934592
    Physical:       8589934592

6. Check on the destination host; the domain is running well.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0008.html