Bug 1654696
Summary: segmentation fault when "qemu-img convert" a local image to a rbd server
Product: Red Hat Enterprise Linux Advanced Virtualization
Component: qemu-kvm
qemu-kvm sub component: General
Reporter: yisun
Assignee: Stefano Garzarella <sgarzare>
QA Contact: Tingting Mao <timao>
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: urgent
CC: aliang, areis, chayang, coli, ddepaula, dzheng, eblake, fweimer, hhan, juzhang, knoel, meili, mtessun, mwest, rbalakri, sgarzare, timao, virt-maint, yanqzhan, yisun, zhenyzha
Keywords: Automation, Regression, TestBlocker, TestOnly
Target Milestone: rc
Hardware: x86_64
OS: Linux
Last Closed: 2020-02-19 20:34:53 UTC
Type: Bug
Bug Depends On: 1680963, 1690500
Bug Blocks: 1652753
Description
yisun
2018-11-29 12:55:06 UTC
Created attachment 1510055 [details]
The full backtrace
Created attachment 1510063 [details]
The full backtrace - v2
Reproduced this issue:
Tested packages:
qemu-kvm-2.12.0-42.module+el8+2173+537e5cb5
ceph-common-12.2.7-9.el8

Steps:
# qemu-img create -f qcow2 source.qcow2 1G
# qemu-img convert -f qcow2 -O raw source.qcow2 rbd:rbd/convert.img -p
    (100.00/100%)
Segmentation fault (core dumped)

Hi Ademar,
I also hit the issue[1] in the scenario below in fast train. (Because this is a basic scenario for the whole luks+rbd test, it will block all of the remaining cases in the test run.) So I would like to know whether we need to clone the bug to fast train, or just move it to fast train. Thanks in advance.

Version-Release number of selected component:
qemu-kvm-3.1.0-1.module+el8+2538+1516be75
ceph-common-12.2.7-9.el8

[1]# qemu-img create -f luks --object secret,id=sec0,data=base -o key-secret=sec0 rbd:rbd/base.luks 20G
Formatting 'rbd:rbd/base.luks', fmt=luks size=21474836480 key-secret=sec0
Segmentation fault (core dumped)

[1]# cat gdb.txt
Thread 2 (Thread 0x7efee0544700 (LWP 21707)):
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
        No locals.
#1  0x000055bb6baa532f in qemu_futex_wait (val=<optimized out>, f=<optimized out>) at util/qemu-thread-posix.c:438
        No locals.
#2  qemu_event_wait (ev=ev@entry=0x55bb6bd2f248 <rcu_call_ready_event>) at util/qemu-thread-posix.c:442
        value = <optimized out>
        __PRETTY_FUNCTION__ = "qemu_event_wait"
#3  0x000055bb6babc982 in call_rcu_thread (opaque=<optimized out>) at util/rcu.c:261
        tries = 0
        n = <optimized out>
        node = <optimized out>
#4  0x000055bb6baa4b04 in qemu_thread_start (args=0x55bb6d700a70) at util/qemu-thread-posix.c:498
        __clframe = {__cancel_routine = <optimized out>, __cancel_arg = 0x0, __do_it = 1, __cancel_type = <optimized out>}
        qemu_thread_args = 0x55bb6d700a70
        start_routine = 0x55bb6babc900 <call_rcu_thread>
        arg = 0x0
        r = <optimized out>
#5  0x00007efee1f6c2de in start_thread (arg=<optimized out>) at pthread_create.c:486
        ret = <optimized out>
        pd = <optimized out>
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139633150412544, 1310286024443784465, 140731147635390, 140731147635391, 140731147635536, 139633150409664, -1165637794549109487, -1165638593702826735}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#6  0x00007efee1c9ca03 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
        No locals.
Thread 1 (Thread 0x7efee46eeb40 (LWP 21706)):
#0  0x00007efee16bd9f3 in next_entry (iter=0x7ffe861049e0) at common/dict.c:70
        bucket = 0x0
        bucket = <optimized out>
#1  p11_dict_next (iter=0x7ffe861049e0, key=0x7ffe861049d8, value=0x0) at common/dict.c:84
        bucket = <optimized out>
#2  0x00007efee163f404 in free_modules_when_no_refs_unlocked () at p11-kit/modules.c:789
        mod = 0x6498c125274d7600
        iter = {dict = 0x0, next = 0x0, index = 0}
#3  0x00007efee16413b9 in p11_modules_release_inlock_reentrant (modules=modules@entry=0x55bb6d832140) at p11-kit/modules.c:1883
        ret = <optimized out>
        rv = <optimized out>
        i = <optimized out>
        __PRETTY_FUNCTION__ = "p11_modules_release_inlock_reentrant"
#4  0x00007efee16419cd in p11_kit_modules_release (modules=0x55bb6d832140) at p11-kit/modules.c:2278
        __func__ = "p11_kit_modules_release"
        __PRETTY_FUNCTION__ = "p11_kit_modules_release"
#5  0x00007efee162aa7b in p11_proxy_module_cleanup () at p11-kit/proxy.c:1733
        state = 0x55bb6d8327b0
        next = 0x0
#6  0x00007efee1627b2d in _p11_kit_fini () at ./common/init.h:61
        No locals.
#7  0x00007efee44f3106 in _dl_fini () at dl-fini.c:138
        array = 0x7efee191ae90
        i = <optimized out>
        l = 0x7efee46f8b30
        maps = 0x7ffe86104aa0
        i = <optimized out>
        l = <optimized out>
        nmaps = <optimized out>
        nloaded = <optimized out>
        ns = 0
        do_audit = <optimized out>
        __PRETTY_FUNCTION__ = "_dl_fini"
#8  0x00007efee1bda0cc in __run_exit_handlers (status=0, listp=0x7efee1f5e738 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:108
        atfct = <optimized out>
        onfct = <optimized out>
        cxafct = <optimized out>
        f = <optimized out>
        new_exitfn_called = 573
        cur = 0x7efee1f5fda0 <initial>
#9  0x00007efee1bda200 in __GI_exit (status=<optimized out>) at exit.c:139
        No locals.
#10 0x00007efee1bc381a in __libc_start_main (main=0x55bb6b9f0d00 <main>, argc=10, argv=0x7ffe86104e18, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe86104e08) at ../csu/libc-start.c:342
        result = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {0, -5090820863364600559, 94263452833472, 140731147636240, 0, 0, -1309951474999923439, -1165639237262938863}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x7ffe86104e70, 0x7efee470d150}, data = {prev = 0x0, cleanup = 0x0, canceltype = -2045751696}}}
        not_first_call = <optimized out>
#11 0x000055bb6b9f16ee in _start ()
No symbol table info available.

I also hit this issue when creating a qcow2 image on the rbd backend:
# qemu-img create -f qcow2 rbd:rbd/win10-32:mon_host=10.66.144.31 40G
Formatting 'rbd:rbd/win10-32:mon_host=10.66.144.31', fmt=qcow2 size=42949672960 cluster_size=65536 lazy_refcounts=off refcount_bits=16
qemu-img: rbd:rbd/win10-32:mon_host=10.66.144.31: Could not write qcow2 header: Invalid argument
Segmentation fault (core dumped)

The core dump is the same as in #c7 (Thread 1).
qemu-kvm version: qemu-kvm-3.1.0-3.module+el8+2614+d714d2bb.x86_64

Still hit this issue with the latest qemu package:
# qemu-img create -f luks --object secret,id=sec0,data=base -o key-secret=sec0 rbd:rbd/timao_base.luks 20G
Formatting 'rbd:rbd/timao_base.luks', fmt=luks size=21474836480 key-secret=sec0
Segmentation fault (core dumped)
# qemu-img --version
qemu-img version 3.1.0 (qemu-kvm-3.1.0-7.module+el8+2715+f4b84bed)
Copyright (c) 2003-2018 Fabrice Bellard and the QEMU Project developers

I am able to reproduce it, and I discovered that it crashes only when QEMU is compiled with --enable-gnutls.
The QEMU rbd backend creates two connections to the server: one to create the file and then close it, and a second one to write to the file.

With --disable-gnutls, the rados library loads the p11-kit library while connecting to the server, and then unloads it when the connection is shut down.
When QEMU is compiled with --enable-gnutls, the p11-kit dynamic library is also linked into the executable (the gnutls pkg-config file requires the p11-kit one). In this case, the finalization code of the p11-kit library runs twice (once for each of the two connections) when the application exits, instead of during the connection shutdown as before, and the second run crashes.

I'm investigating where the issue is: rados, p11-kit, or NSS (which loads the p11-kit library).

All of the changes needed are in nss-pk11, Bug 1680963 for RHEL8 0day z-stream. The BZ is in POST status.

Done

Can the QA or the ASSIGNEE check whether the state of this BZ is correct? I saw that it was moved to ON_QA without the fixed-in field defined. Is this fixed already? Will this get fixed in a rebase? The

(In reply to Stefano Garzarella from comment #14)
> I am able to reproduce it and I discovered that it crashes only when QEMU is
> compiled with --enable-gnutls.
> The QEMU rbd backend creates two connections to the server: one to create
> the file and then close it, and the second one to write to the file.
>
> With --disable-gnutls, the rados library loads the p11-kit library during
> the connection to the server, and then unload it during the shutdown of the
> connection.
> When QEMU is compiled with --enable-gnutls, the p11-kit dynamic library is
> also linked (gnutls pkg-config requires p11-kit pkg-config) to the
> executable.
> In this case, the finalize code of p11-kit library is called two times
> (because of two connection) when the application exit (not during the
> shutdown like before) and the second time it crashes.
>
> I'm investigating where is the issue: rados, p11-kit or NSS (it loads the
> p11-kit library)

(In reply to Ademar Reis from comment #15)
> All of the changes needed are in nss-pk11, Bug 1680963 for RHEL8 0day
> z-stream. The BZ is in POST status.

Hi Danilo,
Based on comment 14 and comment 15, there is no fix on the qemu side; it's blocked on the nss-pk11 fix.
We have marked this bug as 'TestOnly'; QE will try it once the nss-pk11 package is ready. Thanks.

Hi Tingting,
The nss/p11-kit issue has already been fixed in a 0-day erratum (BZ1690500). Could you please give it a try? Thanks.

Tried to verify this issue as follows:
Tested with:
qemu-kvm-4.0.0-1.module+el8.1.0+3225+a8268fde
p11-kit-0.23.14-5.el8_0.x86_64

Scenario 1
# qemu-img create -f luks --object secret,id=sec0,data=base -o key-secret=sec0 rbd:kvmtest-pool/base.luks 20G
Formatting 'rbd:kvmtest-pool/base.luks', fmt=luks size=21474836480 key-secret=sec0

Scenario 2
# qemu-img create -f qcow2 test.qcow2 2G
Formatting 'test.qcow2', fmt=qcow2 size=2147483648 cluster_size=65536 lazy_refcounts=off refcount_bits=16
# qemu-img convert -O raw test.qcow2 rbd:kvmtest-pool/tgt.img -p
    (100.00/100%)
# echo $?
0

Result: As shown above, both scenarios work normally and there are no more core dumps.

QEMU has recently been split into sub-components, and as a one-time operation to avoid breaking tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks

This one seems to be fixed already.
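The verification scenarios in this report come down to two questions: did each qemu-img command exit normally or get killed by a signal, and is p11-kit loaded together with the qemu-img executable (the condition identified in the --enable-gnutls analysis)? Below is a minimal POSIX shell sketch for checking both. It is not the QE tooling used in this bug, and the helper names `run_and_check` and `links_p11_kit` are hypothetical. The sketch assumes the usual shell convention that a child killed by a signal reports status 128 plus the signal number, which gives 139 for SIGSEGV (signal 11 on Linux). It also assumes that `ldd` lists every shared library loaded at start-up, including indirect dependencies such as gnutls pulling in p11-kit.

```shell
#!/bin/sh
# Hypothetical helpers for re-checking this bug; names are illustrative only.

# Run a command and classify how it ended. A shell reports a child killed by
# a signal as 128 + signal number; SIGSEGV is signal 11 on Linux, hence 139.
# Note: a program that deliberately calls exit(139) is indistinguishable here.
run_and_check() {
    "$@"
    status=$?
    if [ "$status" -eq 139 ]; then
        echo "SEGFAULT: $*"
    elif [ "$status" -ne 0 ]; then
        echo "FAILED ($status): $*"
    else
        echo "OK: $*"
    fi
    return "$status"
}

# Read `ldd` output on stdin and succeed if libp11-kit is among the libraries
# loaded at program start-up (directly or as a dependency, e.g. of gnutls),
# rather than only being dlopen()ed later by librados/NSS.
links_p11_kit() {
    grep -q 'libp11-kit\.so'
}

# Example usage (assumes qemu-img is installed and a Ceph cluster with an
# "rbd" pool is reachable; commands and image names come from the report):
#   run_and_check qemu-img create -f qcow2 source.qcow2 1G
#   run_and_check qemu-img convert -f qcow2 -O raw source.qcow2 rbd:rbd/convert.img -p
#   ldd "$(command -v qemu-img)" | links_p11_kit && echo "p11-kit is loaded with qemu-img"
```

Checking the exit status, not grepping for "Segmentation fault (core dumped)", keeps the check independent of whether the parent shell prints that message.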