Bug 1277781
| Summary: | Libvirtd segmentation fault when creating and destroying an fc_host pool with a short pause | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Han Han <hhan> |
| Component: | libvirt | Assignee: | John Ferlan <jferlan> |
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 7.2 | CC: | dyuan, jferlan, rbalakri, xuzhang, yanyang, yisun |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | libvirt-1.3.1-1.el7 | Doc Type: | Bug Fix |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2016-11-03 18:29:36 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
While you're correct that it's not a normal scenario, it does point out a flaw that could happen at other times. As you note, if you wait 3 seconds the bug doesn't happen, and if you wait 5 seconds there's even less chance (if any at all). Long story short, FC/NPIV/SCSI depends on udev to create the necessary infrastructure. That happens asynchronously, so rather than blocking until it finishes, a thread is created to handle it; the thread runs once a second over the next 5 seconds until the work is done (a minimal sketch of this pattern appears after the verification scenarios below). See the following: https://bugzilla.redhat.com/show_bug.cgi?id=1152382 and the upstream commit message: http://www.redhat.com/archives/libvir-list/2014-November/msg00695.html

I do have a couple of patches ready to post - I just wanted to test that they worked before sending them. I didn't want to interrupt anything you were doing, though.

Patches posted upstream: http://www.redhat.com/archives/libvir-list/2015-November/msg00139.html

Patches pushed.
$ git describe d3fa510a759b180a2a87b11d9ed57e437d1914e1
v1.2.21-62-gd3fa510
$

Verified on libvirt-1.3.2-1.el7.x86_64 and PASSED.
Scenario 1: using the pool name.
# cat pool.xml
<pool type="scsi">
<name>p1</name>
<source>
<adapter type='fc_host' wwnn='2101001b32a9da4e' wwpn='2101001b32a90001' managed='yes'/>
</source>
<target>
<path>/dev/disk/by-path</path>
</target>
</pool>
============ do not sleep ===============
# date +%s
1458101818
# for i in {1..100}; do virsh pool-create pool.xml; virsh pool-list | grep p1; virsh pool-destroy p1; virsh pool-list --all | grep p1; done
Pool p1 created from pool.xml
p1 active no
Pool p1 destroyed
Pool p1 created from pool.xml
p1 active no
Pool p1 destroyed
Pool p1 created from pool.xml
p1 active no
Pool p1 destroyed
...
# abrt-cli list --since 1458101818
// no output
============= sleep for a short time (0.1 sec) =================
# date +%s
1458101534
# for i in {1..100}; do virsh pool-create pool.xml; virsh pool-list | grep p1; sleep 0.1; virsh pool-destroy p1; virsh pool-list --all | grep p1; done
Pool p1 created from pool.xml
p1 active no
Pool p1 destroyed
Pool p1 created from pool.xml
p1 active no
Pool p1 destroyed
Pool p1 created from pool.xml
p1 active no
Pool p1 destroyed
Pool p1 created from pool.xml
p1 active no
Pool p1 destroyed
...
# abrt-cli list --since 1458101534
// no output
============= sleep for a longer time (3 sec) =================
# date +%s
1458101908
# for i in {1..100}; do virsh pool-create pool.xml; virsh pool-list | grep p1; sleep 3; virsh pool-destroy p1; virsh pool-list --all | grep p1; done
Pool p1 created from pool.xml
p1 active no
Pool p1 destroyed
Pool p1 created from pool.xml
p1 active no
Pool p1 destroyed
Pool p1 created from pool.xml
p1 active no
Pool p1 destroyed
Pool p1 created from pool.xml
p1 active no
Pool p1 destroyed
Pool p1 created from pool.xml
p1 active no
Pool p1 destroyed
...
# abrt-cli list --since 1458101908
// no output
=======================
Scenario 2: using the pool UUID (testing the no-sleep case is enough).
# cat pool_uuid.xml
<pool type='scsi'>
<name>p1</name>
<uuid>823de2fd-2e24-4eea-a1ca-888888888888</uuid>
<source>
<adapter type='fc_host' managed='yes' wwnn='2101001b32a9da4e' wwpn='2101001b32a90001'/>
</source>
<target>
<path>/dev/disk/by-path</path>
</target>
</pool>
# date +%s
1458102689
# for i in {1..100}; do virsh pool-create pool_uuid.xml; virsh pool-list | grep p1; virsh pool-destroy 823de2fd-2e24-4eea-a1ca-888888888888; virsh pool-list --all | grep p1; done
Pool p1 created from pool_uuid.xml
p1 active no
Pool 823de2fd-2e24-4eea-a1ca-888888888888 destroyed
Pool p1 created from pool_uuid.xml
p1 active no
Pool 823de2fd-2e24-4eea-a1ca-888888888888 destroyed
Pool p1 created from pool_uuid.xml
p1 active no
Pool 823de2fd-2e24-4eea-a1ca-888888888888 destroyed
Pool p1 created from pool_uuid.xml
p1 active no
Pool 823de2fd-2e24-4eea-a1ca-888888888888 destroyed
# abrt-cli list --since 1458102689
// no output
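For readers following the timing argument in the comments above, here is a minimal, self-contained C sketch (compile with `gcc -pthread`) of the pattern described there: a worker thread spawned at pool creation that polls once a second for up to 5 seconds, while a fast pool-destroy may tear down the state the worker still uses. This is an illustration only, not libvirt's code; `pool_state`, `find_adapter_worker`, and `adapter_present` are names invented for the example.

```c
/*
 * Minimal illustration of the create/destroy timing described above.
 * NOT libvirt code: pool_state, find_adapter_worker and adapter_present
 * are invented names for the example.
 */
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

struct pool_state {
    char *wwnn;                 /* adapter identifiers the worker needs */
    char *wwpn;
};

/* Stand-in for the sysfs scan that looks for the fc_host created by udev. */
static int adapter_present(const struct pool_state *st)
{
    (void)st;
    return 0;                   /* pretend udev has not created it yet */
}

/* Worker spawned at pool creation: retry once a second, up to 5 times. */
static void *find_adapter_worker(void *opaque)
{
    struct pool_state *st = opaque;

    for (int tries = 0; tries < 5; tries++) {
        if (adapter_present(st))    /* dereferences st on every pass */
            break;
        sleep(1);
    }
    return NULL;
}

int main(void)
{
    struct pool_state *st = calloc(1, sizeof(*st));
    st->wwnn = strdup("2101001b32a9da4e");
    st->wwpn = strdup("2101001b32a90001");

    /* pool-create: kick off the asynchronous retry thread */
    pthread_t worker;
    pthread_create(&worker, NULL, find_adapter_worker, st);

    /* the reproducer's short pause between create and destroy */
    usleep(100 * 1000);

    /*
     * pool-destroy: wait for (or cancel) the worker before freeing the
     * state it still dereferences.  Freeing first, while the worker is
     * still inside its 5-second retry window, is a use-after-free.
     */
    pthread_join(worker, NULL);
    free(st->wwnn);
    free(st->wwpn);
    free(st);
    return 0;
}
```

As written, the teardown joins the worker before freeing; reversing that order during a quick pool-destroy is the kind of use-after-free that would corrupt the heap in the way the backtrace in the problem description below suggests.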
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2016-2577.html
Description of problem:
As subject.

Version-Release number of selected component (if applicable):
libvirt-1.2.17-13.el7.x86_64
qemu-kvm-rhev-2.3.0-31.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Prepare an fc_host pool XML:
<pool type="scsi">
<name>p1</name>
<source>
<adapter type='fc_host' wwnn='2101001b32a9da4e' wwpn='2101001b32a90001' managed='yes'/>
</source>
<target>
<path>/dev/disk/by-path</path>
</target>
</pool>

2. Run the following command:
# for i in 1 2 3; do virsh pool-create test.xml; sleep 1; virsh pool-destroy p1; sleep 1; done
Pool p1 created from test.xml
Pool p1 destroyed
Pool p1 created from test.xml
error: failed to connect to the hypervisor
error: no valid connection
error: Cannot recv data: Connection reset by peer
error: Failed to create pool from test.xml
error: operation failed: pool 'p1' already exists with uuid ec55efe5-2f6e-49c5-98c2-330d4b0b4e8f
Pool p1 destroyed

3. Show the abrt reports:
# abrt-cli list --since 1446450941
id 38617217512ee894c089bd641f937e10711b0815
reason: libvirtd killed by SIGSEGV
time: Mon 02 Nov 2015 04:12:42 PM CST
cmdline: /usr/sbin/libvirtd
package: libvirt-daemon-1.2.17-13.el7
uid: 0 (root)
Directory: /var/spool/abrt/ccpp-2015-11-02-16:12:42-28812
Run 'abrt-cli report /var/spool/abrt/ccpp-2015-11-02-16:12:42-28812' for creating a case in Red Hat Customer Portal

id f1894e50fc0e46d0bd8fe3109b4769e5236b9a2c
reason: libvirtd killed by SIGSEGV
time: Mon 02 Nov 2015 03:39:26 PM CST
cmdline: /usr/sbin/libvirtd
package: libvirt-daemon-1.2.17-13.el7
uid: 0 (root)
count: 6
Directory: /var/spool/abrt/ccpp-2015-11-02-15:39:26-1494
Run 'abrt-cli report /var/spool/abrt/ccpp-2015-11-02-15:39:26-1494' for creating a case in Red Hat Customer Portal

id 9155fb4aee1b0001caf3e3c6c97837fc68eb6a80
reason: libvirtd killed by SIGSEGV
time: Mon 02 Nov 2015 03:40:57 PM CST
cmdline: /usr/sbin/libvirtd
package: libvirt-daemon-1.2.17-13.el7
uid: 0 (root)
count: 3
Directory: /var/spool/abrt/ccpp-2015-11-02-15:40:57-3654
Run 'abrt-cli report /var/spool/abrt/ccpp-2015-11-02-15:40:57-3654' for creating a case in Red Hat Customer Portal

Actual results:
As above.

Expected results:
No segmentation fault.

Additional info:
Without a sleep, or with a sleep of more than 2 seconds, the bug is not reproduced.

The gdb backtrace:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f81202af700 (LWP 18853)]
0x00007f812c9d5aad in malloc_consolidate () from /lib64/libc.so.6
(gdb) bt
#0  0x00007f812c9d5aad in malloc_consolidate () from /lib64/libc.so.6
#1  0x00007f812c9d7e35 in _int_malloc () from /lib64/libc.so.6
#2  0x00007f812c9d987c in malloc () from /lib64/libc.so.6
#3  0x00007f812ca13381 in __alloc_dir () from /lib64/libc.so.6
#4  0x00007f812f6c0cda in virGetFCHostNameByWWN (sysfs_prefix=sysfs_prefix@entry=0x0, wwnn=wwnn@entry=0x7f8110006f30 "2101001b32a9da4e", wwpn=wwpn@entry=0x7f8110001260 "2101001b32a90005") at util/virutil.c:2113
#5  0x00007f81160fd353 in deleteVport (conn=<optimized out>, adapter=...) at storage/storage_backend_scsi.c:841
#6  virStorageBackendSCSIStopPool (conn=0x7f81080039d0, pool=<optimized out>) at storage/storage_backend_scsi.c:959
#7  0x00007f81160ed487 in storagePoolDestroy (obj=0x7f81180d6270) at storage/storage_driver.c:976
#8  0x00007f812f779738 in virStoragePoolDestroy (pool=pool@entry=0x7f81180d6270) at libvirt-storage.c:736
#9  0x00007f81303a29cc in remoteDispatchStoragePoolDestroy (server=0x7f8132121f30, msg=0x7f81321b6f70, args=<optimized out>, rerr=0x7f81202aec30, client=0x7f81321b7f00) at remote_dispatch.h:14607
#10 remoteDispatchStoragePoolDestroyHelper (server=0x7f8132121f30, client=0x7f81321b7f00, msg=0x7f81321b6f70, rerr=0x7f81202aec30, args=<optimized out>, ret=0x7f81180d5e10) at remote_dispatch.h:14583
#11 0x00007f812f7c2342 in virNetServerProgramDispatchCall (msg=0x7f81321b6f70, client=0x7f81321b7f00, server=0x7f8132121f30, prog=0x7f813213e180) at rpc/virnetserverprogram.c:437
#12 virNetServerProgramDispatch (prog=0x7f813213e180, server=server@entry=0x7f8132121f30, client=0x7f81321b7f00, msg=0x7f81321b6f70) at rpc/virnetserverprogram.c:307
#13 0x00007f812f7bd5bd in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x7f8132121f30) at rpc/virnetserver.c:135
#14 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x7f8132121f30) at rpc/virnetserver.c:156
#15 0x00007f812f6b84c5 in virThreadPoolWorker (opaque=opaque@entry=0x7f8132121a70) at util/virthreadpool.c:145
#16 0x00007f812f6b79e8 in virThreadHelper (data=<optimized out>) at util/virthread.c:206
#17 0x00007f812cd22dc5 in start_thread () from /lib64/libpthread.so.0
#18 0x00007f812ca501cd in clone () from /lib64/libc.so.6
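For reference, frame #4 is virGetFCHostNameByWWN(), which resolves a WWNN/WWPN pair to a SCSI host name by scanning sysfs; the opendir() allocation visible in frames #0-#3 comes from that scan. The following is only a rough, self-contained approximation of that kind of lookup, not the actual util/virutil.c implementation: `read_attr` and `fc_host_by_wwn` are invented names, and the handling of a leading "0x" in sysfs attribute values is an assumption made for the example.

```c
/*
 * Rough approximation (NOT the actual libvirt code) of the kind of
 * lookup frame #4 performs: walk /sys/class/fc_host and return the
 * "hostN" entry whose node_name/port_name match the given WWNN/WWPN.
 */
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read a sysfs attribute of an fc_host entry, stripping newline and a
 * leading "0x" (assumed format of the attribute value). */
static char *read_attr(const char *host, const char *attr)
{
    char path[256], buf[64];
    snprintf(path, sizeof(path), "/sys/class/fc_host/%s/%s", host, attr);

    FILE *fp = fopen(path, "r");
    if (!fp)
        return NULL;
    if (!fgets(buf, sizeof(buf), fp)) {
        fclose(fp);
        return NULL;
    }
    fclose(fp);

    buf[strcspn(buf, "\n")] = '\0';
    const char *val = (strncmp(buf, "0x", 2) == 0) ? buf + 2 : buf;
    return strdup(val);
}

/* Return the fc_host name ("hostN") matching wwnn/wwpn, or NULL. */
static char *fc_host_by_wwn(const char *wwnn, const char *wwpn)
{
    DIR *dir = opendir("/sys/class/fc_host");  /* the opendir() in the trace */
    if (!dir)
        return NULL;

    struct dirent *ent;
    char *result = NULL;
    while (!result && (ent = readdir(dir))) {
        if (strncmp(ent->d_name, "host", 4) != 0)
            continue;
        char *nn = read_attr(ent->d_name, "node_name");
        char *pn = read_attr(ent->d_name, "port_name");
        if (nn && pn && strcmp(nn, wwnn) == 0 && strcmp(pn, wwpn) == 0)
            result = strdup(ent->d_name);
        free(nn);
        free(pn);
    }
    closedir(dir);
    return result;
}

int main(void)
{
    char *host = fc_host_by_wwn("2101001b32a9da4e", "2101001b32a90001");
    printf("%s\n", host ? host : "not found");
    free(host);
    return 0;
}
```

Note that the fault itself lands in malloc_consolidate() inside that opendir() allocation, which typically indicates the heap had already been corrupted earlier rather than pointing to a bug in the directory scan itself.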