Bug 1654696 - segmentation fault when "qemu-img convert" a local image to a rbd server
Summary: segmentation fault when "qemu-img convert" a local image to a rbd server
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: ---
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Stefano Garzarella
QA Contact: Tingting Mao
URL:
Whiteboard:
Depends On: 1680963 1690500
Blocks: 1652753
 
Reported: 2018-11-29 12:55 UTC by yisun
Modified: 2020-02-19 20:34 UTC
CC List: 20 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-02-19 20:34:53 UTC
Type: Bug
Target Upstream Version:


Attachments (Terms of Use)
The full backtrace (55.97 KB, text/plain), 2018-11-30 02:35 UTC, Han Han
The full backtrace - v2 (5.08 KB, text/plain), 2018-11-30 03:26 UTC, Han Han

Description yisun 2018-11-29 12:55:06 UTC
Description of problem:
Segmentation fault when converting a local image to an RBD server with "qemu-img convert".

Version-Release number of selected component (if applicable):
ceph-common-12.2.7-9.el8.x86_64
qemu-img-2.12.0-42.module+el8+2173+537e5cb5.x86_64


How reproducible:
100%

Steps to Reproduce:
1. Create a local qcow2 image:
[root@dell-per730-66 /]# qemu-img create -f qcow2 /tmp/img.qcow2 1G

2. Try to convert the image to an RBD server:
[root@dell-per730-66 /]# qemu-img convert -O raw /tmp/img.qcow2 rbd:yisun-pool/rbd.img:mon_host=10.73.75.75
2018-11-29 07:49:09.853314 7fbe04f45b00 -1 Errors while parsing config file!
2018-11-29 07:49:09.853316 7fbe04f45b00 -1 parse_file: cannot open /etc/ceph/ceph.conf: (2) No such file or directory
2018-11-29 07:49:09.853317 7fbe04f45b00 -1 parse_file: cannot open ~/.ceph/ceph.conf: (2) No such file or directory
2018-11-29 07:49:09.853317 7fbe04f45b00 -1 parse_file: cannot open ceph.conf: (2) No such file or directory
2018-11-29 07:49:10.373648 7fbe04f45b00 -1 Errors while parsing config file!
2018-11-29 07:49:10.373651 7fbe04f45b00 -1 parse_file: cannot open /etc/ceph/ceph.conf: (2) No such file or directory
2018-11-29 07:49:10.373651 7fbe04f45b00 -1 parse_file: cannot open ~/.ceph/ceph.conf: (2) No such file or directory
2018-11-29 07:49:10.373652 7fbe04f45b00 -1 parse_file: cannot open ceph.conf: (2) No such file or directory
Segmentation fault (core dumped)




Actual results:
qemu-img crashes with a segmentation fault.

Expected results:
The conversion succeeds.

Additional info:
Not reproduced with 
qemu-img-rhev-2.12.0-13.el7.x86_64
ceph-common-12.2.4-30.el7cp.x86_64

I am having trouble generating the core dump file. This reproduces on all of our RHEL 8 automation environments, so please try there first. The Ceph server info is provided in the test steps; please restore it if you make any changes. Thanks.

Comment 1 Han Han 2018-11-30 02:35:54 UTC
Created attachment 1510055 [details]
The full backtrace

Comment 4 Han Han 2018-11-30 03:26:44 UTC
Created attachment 1510063 [details]
The full backtrace - v2

Comment 5 Tingting Mao 2018-11-30 03:54:18 UTC
Reproduced this issue:


Tested packages:
qemu-kvm-2.12.0-42.module+el8+2173+537e5cb5
ceph-common-12.2.7-9.el8


Steps:

# qemu-img create -f qcow2 source.qcow2 1G

# qemu-img convert -f qcow2 -O raw source.qcow2 rbd:rbd/convert.img -p
    (100.00/100%)
Segmentation fault (core dumped)

Comment 7 Tingting Mao 2018-12-18 09:04:06 UTC
Hi Ademar,

I also hit the issue[1] in the scenario below on the fast train. (Since this is a basic scenario for the whole luks+rbd test, it blocks all remaining cases of the test run.)

Should we clone this bug to the fast train, or just move it there?

Thanks in advance.



Version-Release number of selected component:
qemu-kvm-3.1.0-1.module+el8+2538+1516be75
ceph-common-12.2.7-9.el8


[1]# qemu-img create -f luks --object secret,id=sec0,data=base -o key-secret=sec0 rbd:rbd/base.luks 20G
Formatting 'rbd:rbd/base.luks', fmt=luks size=21474836480 key-secret=sec0
Segmentation fault (core dumped)

[1]# cat gdb.txt 
Thread 2 (Thread 0x7efee0544700 (LWP 21707)):
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
No locals.
#1  0x000055bb6baa532f in qemu_futex_wait (val=<optimized out>, f=<optimized out>) at util/qemu-thread-posix.c:438
No locals.
#2  qemu_event_wait (ev=ev@entry=0x55bb6bd2f248 <rcu_call_ready_event>) at util/qemu-thread-posix.c:442
        value = <optimized out>
        __PRETTY_FUNCTION__ = "qemu_event_wait"
#3  0x000055bb6babc982 in call_rcu_thread (opaque=<optimized out>) at util/rcu.c:261
        tries = 0
        n = <optimized out>
        node = <optimized out>
#4  0x000055bb6baa4b04 in qemu_thread_start (args=0x55bb6d700a70) at util/qemu-thread-posix.c:498
        __clframe = {__cancel_routine = <optimized out>, __cancel_arg = 0x0, __do_it = 1, __cancel_type = <optimized out>}
        qemu_thread_args = 0x55bb6d700a70
        start_routine = 0x55bb6babc900 <call_rcu_thread>
        arg = 0x0
        r = <optimized out>
#5  0x00007efee1f6c2de in start_thread (arg=<optimized out>) at pthread_create.c:486
        ret = <optimized out>
        pd = <optimized out>
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139633150412544, 1310286024443784465, 140731147635390, 140731147635391, 140731147635536, 139633150409664, -1165637794549109487, 
                -1165638593702826735}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#6  0x00007efee1c9ca03 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.

Thread 1 (Thread 0x7efee46eeb40 (LWP 21706)):
#0  0x00007efee16bd9f3 in next_entry (iter=0x7ffe861049e0) at common/dict.c:70
        bucket = 0x0
        bucket = <optimized out>
#1  p11_dict_next (iter=0x7ffe861049e0, key=0x7ffe861049d8, value=0x0) at common/dict.c:84
        bucket = <optimized out>
#2  0x00007efee163f404 in free_modules_when_no_refs_unlocked () at p11-kit/modules.c:789
        mod = 0x6498c125274d7600
        iter = {dict = 0x0, next = 0x0, index = 0}
#3  0x00007efee16413b9 in p11_modules_release_inlock_reentrant (modules=modules@entry=0x55bb6d832140) at p11-kit/modules.c:1883
        ret = <optimized out>
        rv = <optimized out>
        i = <optimized out>
        __PRETTY_FUNCTION__ = "p11_modules_release_inlock_reentrant"
#4  0x00007efee16419cd in p11_kit_modules_release (modules=0x55bb6d832140) at p11-kit/modules.c:2278
        __func__ = "p11_kit_modules_release"
        __PRETTY_FUNCTION__ = "p11_kit_modules_release"
#5  0x00007efee162aa7b in p11_proxy_module_cleanup () at p11-kit/proxy.c:1733
        state = 0x55bb6d8327b0
        next = 0x0
#6  0x00007efee1627b2d in _p11_kit_fini () at ./common/init.h:61
No locals.
#7  0x00007efee44f3106 in _dl_fini () at dl-fini.c:138
        array = 0x7efee191ae90
        i = <optimized out>
        l = 0x7efee46f8b30
        maps = 0x7ffe86104aa0
        i = <optimized out>
        l = <optimized out>
        nmaps = <optimized out>
        nloaded = <optimized out>
        ns = 0
        do_audit = <optimized out>
        __PRETTY_FUNCTION__ = "_dl_fini"
#8  0x00007efee1bda0cc in __run_exit_handlers (status=0, listp=0x7efee1f5e738 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:108
        atfct = <optimized out>
        onfct = <optimized out>
        cxafct = <optimized out>
        f = <optimized out>
        new_exitfn_called = 573
        cur = 0x7efee1f5fda0 <initial>
#9  0x00007efee1bda200 in __GI_exit (status=<optimized out>) at exit.c:139
No locals.
#10 0x00007efee1bc381a in __libc_start_main (main=0x55bb6b9f0d00 <main>, argc=10, argv=0x7ffe86104e18, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7ffe86104e08) at ../csu/libc-start.c:342
        result = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {0, -5090820863364600559, 94263452833472, 140731147636240, 0, 0, -1309951474999923439, -1165639237262938863}, mask_was_saved = 0}}, 
          priv = {pad = {0x0, 0x0, 0x7ffe86104e70, 0x7efee470d150}, data = {prev = 0x0, cleanup = 0x0, canceltype = -2045751696}}}
        not_first_call = <optimized out>
#11 0x000055bb6b9f16ee in _start ()
No symbol table info available.

Comment 11 aihua liang 2019-01-09 06:28:07 UTC
When creating a qcow2 image on an RBD backend, I also hit this issue:
# qemu-img create -f qcow2 rbd:rbd/win10-32:mon_host=10.66.144.31 40G
Formatting 'rbd:rbd/win10-32:mon_host=10.66.144.31', fmt=qcow2 size=42949672960 cluster_size=65536 lazy_refcounts=off refcount_bits=16
qemu-img: rbd:rbd/win10-32:mon_host=10.66.144.31: Could not write qcow2 header: Invalid argument
Segmentation fault (core dumped)

The core dump is the same as in comment 7 (Thread 1).

qemu-kvm version:qemu-kvm-3.1.0-3.module+el8+2614+d714d2bb.x86_64

Comment 13 Tingting Mao 2019-01-28 06:14:20 UTC
Still hitting this issue with the latest qemu package:

# qemu-img create -f luks --object secret,id=sec0,data=base -o key-secret=sec0 rbd:rbd/timao_base.luks 20G
Formatting 'rbd:rbd/timao_base.luks', fmt=luks size=21474836480 key-secret=sec0
Segmentation fault (core dumped)

# qemu-img --version
qemu-img version 3.1.0 (qemu-kvm-3.1.0-7.module+el8+2715+f4b84bed)
Copyright (c) 2003-2018 Fabrice Bellard and the QEMU Project developers

Comment 14 Stefano Garzarella 2019-02-20 10:31:43 UTC
I am able to reproduce it, and I discovered that it crashes only when QEMU is compiled with --enable-gnutls.
The QEMU rbd backend creates two connections to the server: one to create the file (which is then closed), and a second one to write to the file.

With --disable-gnutls, the rados library loads the p11-kit library when connecting to the server and unloads it when the connection shuts down.
When QEMU is compiled with --enable-gnutls, the p11-kit dynamic library is also linked into the executable (the gnutls pkg-config file requires the p11-kit one).
In this case, the finalization code of the p11-kit library is called twice (once per connection) when the application exits (not at connection shutdown as before), and the second call crashes.

I'm investigating where the issue lies: rados, p11-kit, or NSS (which loads the p11-kit library).

Comment 15 Ademar Reis 2019-03-05 15:46:46 UTC
All of the changes needed are in nss-pk11 (Bug 1680963, RHEL 8 0-day z-stream). That BZ is in POST status.

Comment 21 Danilo Cesar de Paula 2019-03-15 01:48:40 UTC
Done

Comment 23 Danilo Cesar de Paula 2019-04-30 00:10:44 UTC
Can the QA or the ASSIGNEE check if the state of this BZ is correct? 

I saw that it was moved to ON_QA without a fixed-in field defined. Is this fixed already? Will this get fixed in a rebase?

Comment 24 CongLi 2019-04-30 02:46:41 UTC
(In reply to Stefano Garzarella from comment #14)
> I am able to reproduce it and I discovered that it crashes only when QEMU is
> compiled with --enable-gnutls.
> The QEMU rbd backend creates two connections to the server: one to create
> the file and then close it, and the second one to write to the file.
> 
> With --disable-gnutls, the rados library loads the p11-kit library during
> the connection to the server, and then unload it during the shutdown of the
> connection.
> When QEMU is compiled with --enable-gnutls, the p11-kit dynamic library is
> also linked (gnutls pkg-config requires p11-kit pkg-config) to the
> executable.
> In this case, the finalize code of p11-kit library is called two times
> (because of two connection) when the application exit (not during the
> shutdown like before) and the second time it crashes.
> 
> I'm investigating where is the issue: rados, p11-kit or NSS (it loads the
> p11-kit library)

(In reply to Ademar Reis from comment #15)
> All of the changes needed are in nss-pk11, Bug 1680963 for RHEL8 0day
> z-stream. The BZ is in POST status.


Hi Danilo,

Based on comment 14 and comment 15, there is no fix needed on the qemu side; this bug is blocked on the nss-pk11 fix.
We have marked this bug as 'TestOnly'; QE will retest once the nss-pk11 package is ready.

Thanks.

Comment 25 CongLi 2019-05-28 07:47:35 UTC
Hi Tingting,

The nss/p11-kit issue has already been fixed in the 0-day errata (BZ1690500).

Could you please have a try?

Thanks.

Comment 26 Tingting Mao 2019-05-29 11:44:58 UTC
Tried to verify this issue as follows:


Tested with:
qemu-kvm-4.0.0-1.module+el8.1.0+3225+a8268fde
p11-kit-0.23.14-5.el8_0.x86_64


Scenario 1
# qemu-img create -f luks --object secret,id=sec0,data=base -o key-secret=sec0 rbd:kvmtest-pool/base.luks 20G
Formatting 'rbd:kvmtest-pool/base.luks', fmt=luks size=21474836480 key-secret=sec0


Scenario 2
# qemu-img create -f qcow2 test.qcow2 2G
Formatting 'test.qcow2', fmt=qcow2 size=2147483648 cluster_size=65536 lazy_refcounts=off refcount_bits=16
# qemu-img convert -O raw test.qcow2 rbd:kvmtest-pool/tgt.img -p 
    (100.00/100%)
# echo $?
0


Result:
As shown above, both scenarios work normally and no longer core dump.

Comment 28 Ademar Reis 2020-02-05 22:51:56 UTC
QEMU has been recently split into sub-components and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks

Comment 29 Danilo Cesar de Paula 2020-02-19 20:34:53 UTC
This one seems to be fixed already.

