Bug 1328840 - [RBD] seeing a crash in 'rbd ls -l <pool>' command
Summary: [RBD] seeing a crash in 'rbd ls -l <pool>' command
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RBD
Version: 2.0
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: 2.0
Assignee: Jason Dillaman
QA Contact: ceph-qe-bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-04-20 12:05 UTC by Tejas
Modified: 2017-07-30 15:29 UTC (History)

Fixed In Version: RHEL: ceph-10.2.1-1.el7cp Ubuntu: ceph_10.2.1-2redhat1xenial
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-08-23 19:36:42 UTC
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Ceph Project Bug Tracker 15574 0 None None None 2016-04-27 19:31:10 UTC
Red Hat Product Errata RHBA-2016:1755 0 normal SHIPPED_LIVE Red Hat Ceph Storage 2.0 bug fix and enhancement update 2016-08-23 23:23:52 UTC

Description Tejas 2016-04-20 12:05:24 UTC
Description of problem:
During ad hoc RBD testing, the 'rbd ls -l <pool>' command crashed.

Version-Release number of selected component (if applicable):
ceph 10.1.1-1

How reproducible:
Seen only once.

Steps to Reproduce:
1. Performed a series of rbd operations on images, including export, import, and flatten.
2. Deleted all the images in the pool.
3. Ran 'rbd ls -l <pool>' and saw the crash.

Actual results:
The command crashed and dumped core.

Expected results:
No crash should be seen

Additional info:
Here is some information from the core:

[root@magna009 ~]# gdb /usr/bin/rbd core.8274
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-80.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/rbd...Reading symbols from /usr/lib/debug/usr/bin/rbd.debug...done.
done.
[New LWP 8289]
[New LWP 8292]
[New LWP 8295]
[New LWP 8279]
[New LWP 8278]
[New LWP 8293]
[New LWP 8274]
[New LWP 8286]
[New LWP 8283]
[New LWP 8284]
[New LWP 8288]
[New LWP 8281]
[New LWP 8290]
[New LWP 8294]
[New LWP 8275]
[New LWP 8276]
[New LWP 8285]
[New LWP 8282]
[New LWP 8277]
[New LWP 8280]
[New LWP 8287]
[New LWP 8291]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `rbd ls -l Tejas'.
Program terminated with signal 6, Aborted.
#0  0x00007f12287fffcb in raise () from /lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install boost-iostreams-1.53.0-25.el7.x86_64 boost-program-options-1.53.0-25.el7.x86_64 boost-random-1.53.0-25.el7.x86_64 boost-regex-1.53.0-25.el7.x86_64 boost-system-1.53.0-25.el7.x86_64 boost-thread-1.53.0-25.el7.x86_64 bzip2-libs-1.0.6-13.el7.x86_64 elfutils-libelf-0.163-3.el7.x86_64 elfutils-libs-0.163-3.el7.x86_64 glibc-2.17-106.el7_2.4.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 libattr-2.4.46-12.el7.x86_64 libblkid-2.23.2-26.el7_2.2.x86_64 libcap-2.22-8.el7.x86_64 libgcc-4.8.5-4.el7.x86_64 libicu-50.1.2-15.el7.x86_64 libstdc++-4.8.5-4.el7.x86_64 libuuid-2.23.2-26.el7_2.2.x86_64 lttng-ust-2.4.1-1.el7cp.x86_64 nspr-4.10.8-2.el7_1.x86_64 nss-3.19.1-18.el7.x86_64 nss-softokn-3.16.2.3-13.el7_1.x86_64 nss-softokn-freebl-3.16.2.3-13.el7_1.x86_64 nss-util-3.19.1-4.el7_1.x86_64 sqlite-3.7.17-8.el7.x86_64 systemd-libs-219-19.el7.x86_64 userspace-rcu-0.7.9-2.el7rhgs.x86_64 xz-libs-5.1.2-12alpha.el7.x86_64 zlib-1.2.7-15.el7.x86_64
(gdb) bt
#0  0x00007f12287fffcb in raise () from /lib64/libpthread.so.0
#1  0x00007f123c403eb5 in reraise_fatal (signum=6) at global/signal_handler.cc:71
#2  handle_fatal_signal (signum=6) at global/signal_handler.cc:133
#3  <signal handler called>
#4  0x00007f122684d5f7 in raise () from /lib64/libc.so.6
#5  0x00007f122684ece8 in abort () from /lib64/libc.so.6
#6  0x00007f123295e057 in ceph::__ceph_assert_fail (assertion=<optimized out>, file=<optimized out>, line=212, 
    func=0x7f1232c14ee0 <librbd::AioImageRequestWQ::shut_down(Context*)::__PRETTY_FUNCTION__> "void librbd::AioImageRequestWQ::shut_down(Context*)") at common/assert.cc:78
#7  0x00007f1232764bf4 in librbd::AioImageRequestWQ::shut_down (this=0x7f11fc002450, on_shutdown=0x7f11fc0039e0) at librbd/AioImageRequestWQ.cc:212
#8  0x00007f12327f1830 in librbd::image::CloseRequest<librbd::ImageCtx>::send_shut_down_aio_queue (this=0x7f11fc0038c0) at librbd/image/CloseRequest.cc:74
#9  0x00007f12327f1c85 in librbd::image::CloseRequest<librbd::ImageCtx>::send_unregister_image_watcher (this=this@entry=0x7f11fc0038c0) at librbd/image/CloseRequest.cc:42
#10 0x00007f12327f1cd5 in librbd::image::CloseRequest<librbd::ImageCtx>::send (this=this@entry=0x7f11fc0038c0) at librbd/image/CloseRequest.cc:36
#11 0x00007f12327f9079 in librbd::image::RefreshParentRequest<librbd::ImageCtx>::send_close_parent (this=this@entry=0x7f11fc000bc0) at librbd/image/RefreshParentRequest.cc:199
#12 0x00007f12327fa3a8 in librbd::image::RefreshParentRequest<librbd::ImageCtx>::handle_open_parent (this=0x7f11fc000bc0, result=result@entry=0x7f1203ffb1ec) at librbd/image/RefreshParentRequest.cc:135
#13 0x00007f12327fa66a in librbd::util::detail::C_StateCallbackAdapter<librbd::image::RefreshParentRequest<librbd::ImageCtx>, &librbd::image::RefreshParentRequest<librbd::ImageCtx>::handle_open_parent, false>::complete (this=0x7f11fc0027a0, r=-2) at librbd/Utils.h:68
#14 0x00007f12327f66df in librbd::util::detail::C_StateCallbackAdapter<librbd::image::OpenRequest<librbd::ImageCtx>, &librbd::image::OpenRequest<librbd::ImageCtx>::handle_close_image, true>::complete (
    this=0x7f11ec002160, r=-2) at librbd/Utils.h:70
#15 0x00007f12327f36c9 in librbd::image::CloseRequest<librbd::ImageCtx>::finish (this=0x7f11ec000af0) at librbd/image/CloseRequest.cc:259
#16 0x00007f12327f3a16 in librbd::image::CloseRequest<librbd::ImageCtx>::send_flush_image_watcher (this=<optimized out>) at librbd/image/CloseRequest.cc:236
#17 0x00007f12327f3c55 in librbd::image::CloseRequest<librbd::ImageCtx>::send_close_parent (this=this@entry=0x7f11ec000af0) at librbd/image/CloseRequest.cc:208
#18 0x00007f12327f3d0b in librbd::image::CloseRequest<librbd::ImageCtx>::handle_flush_op_work_queue (this=0x7f11ec000af0, r=0) at librbd/image/CloseRequest.cc:202
#19 0x00007f123275bf29 in Context::complete (this=0x7f11fc004f60, r=<optimized out>) at include/Context.h:64
#20 0x00007f123278c014 in ContextWQ::process (this=0x7f11fc0025f0, ctx=0x7f11fc004f60) at common/WorkQueue.h:603
#21 0x00007f123294e9fe in ThreadPool::worker (this=0x7f12477dc1b0, wt=0x7f12477dc490) at common/WorkQueue.cc:128
#22 0x00007f123294f8d0 in ThreadPool::WorkThread::entry (this=<optimized out>) at common/WorkQueue.h:440
#23 0x00007f12287f8dc5 in start_thread () from /lib64/libpthread.so.0
#24 0x00007f122690e28d in clone () from /lib64/libc.so.6
(gdb) f 7
#7  0x00007f1232764bf4 in librbd::AioImageRequestWQ::shut_down (this=0x7f11fc002450, on_shutdown=0x7f11fc0039e0) at librbd/AioImageRequestWQ.cc:212
212	    assert(!m_shutdown);
(gdb) p *this
$1 = {<ThreadPool::PointerWQ<librbd::AioImageRequest<librbd::ImageCtx> >> = {<ThreadPool::WorkQueue_> = {_vptr.WorkQueue_ = 0x7f1232fda590 <vtable for librbd::AioImageRequestWQ+16>, 
      name = "librbd::aio_work_queue", timeout_interval = 60, suicide_interval = 0}, m_pool = 0x7f12477dc1b0, m_items = empty std::list, m_processing = 0}, m_image_ctx = @0x7f11fc000ec0, m_lock = {L = {
      __data = {__lock = 0, __nr_readers = 0, __readers_wakeup = 0, __writer_wakeup = 0, __nr_readers_queued = 0, __nr_writers_queued = 0, __writer = 8289, __shared = 0, __pad1 = 0, __pad2 = 0, __flags = 0}, 
      __size = '\000' <repeats 24 times>, "a ", '\000' <repeats 29 times>, __align = 0}, name = "AioImageRequestWQ::m_lock (0x7f11fc002450)", id = -1, nrlock = {val = 0}, nwlock = {val = 1}, track = true, 
    lockdep = true}, m_write_blocker_contexts = empty std::list, m_write_blockers = 0, m_require_lock_on_read = false, m_in_progress_writes = {val = 0}, m_queued_reads = {val = 0}, m_queued_writes = {val = 0}, 
  m_in_flight_ops = {val = 0}, m_refresh_in_progress = false, m_shutdown = true, m_on_shutdown = 0x0}
(gdb) bt full
#0  0x00007f12287fffcb in raise () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00007f123c403eb5 in reraise_fatal (signum=6) at global/signal_handler.cc:71
        ret = <optimized out>
        buf = '\000' <repeats 40 times>, "\260|\377\003\022\177\000\000X\r\206&\022\177", '\000' <repeats 26 times>, "\364\377\377\377\377\377\377\377\000\000\000\000\020\000\000\000\215\342\220&\022\177\000\000\000\000\000\000\000\000\000\000 ", '\000' <repeats 19 times>, "\022\177\000\000\000\000\000\000\000\000\000\000\354\200\377\003\022\177", '\000' <repeats 14 times>, "\022\177\000\000\001", '\000' <repeats 15 times>, "\377\377\377\377\377\377\377\377X\r\206&\022\177\000\000"...
#2  handle_fatal_signal (signum=6) at global/signal_handler.cc:133
        buf = "*** Caught signal (Aborted) **\n in thread 7f1203fff700 thread_name:tp_librbd\n", '\000' <repeats 811 times>...
        pthread_name = "tp_librbd\000\000\000\000\000\000"
#3  <signal handler called>
No symbol table info available.
#4  0x00007f122684d5f7 in raise () from /lib64/libc.so.6
No symbol table info available.
#5  0x00007f122684ece8 in abort () from /lib64/libc.so.6
No symbol table info available.
#6  0x00007f123295e057 in ceph::__ceph_assert_fail (assertion=<optimized out>, file=<optimized out>, line=212, 
    func=0x7f1232c14ee0 <librbd::AioImageRequestWQ::shut_down(Context*)::__PRETTY_FUNCTION__> "void librbd::AioImageRequestWQ::shut_down(Context*)") at common/assert.cc:78
        tss = <incomplete type>
        buf = "librbd/AioImageRequestWQ.cc: In function 'void librbd::AioImageRequestWQ::shut_down(Context*)' thread 7f1203fff700 time 2016-04-20 09:35:40.643544\nlibrbd/AioImageRequestWQ.cc: 212: FAILED assert(!m_sh"...
        bt = 0x7f11fc004fe0
        oss = <incomplete type>
#7  0x00007f1232764bf4 in librbd::AioImageRequestWQ::shut_down (this=0x7f11fc002450, on_shutdown=0x7f11fc0039e0) at librbd/AioImageRequestWQ.cc:212
        locker = {m_lock = @0x7f11fc002498, locked = true}
        cct = <optimized out>
        __PRETTY_FUNCTION__ = "void librbd::AioImageRequestWQ::shut_down(Context*)"
        __func__ = "shut_down"
#8  0x00007f12327f1830 in librbd::image::CloseRequest<librbd::ImageCtx>::send_shut_down_aio_queue (this=0x7f11fc0038c0) at librbd/image/CloseRequest.cc:74
        __func__ = "send_shut_down_aio_queue"
        cct = <optimized out>
        owner_locker = {m_lock = @0x7f11fc001058, locked = true}
#9  0x00007f12327f1c85 in librbd::image::CloseRequest<librbd::ImageCtx>::send_unregister_image_watcher (this=this@entry=0x7f11fc0038c0) at librbd/image/CloseRequest.cc:42
        __func__ = "send_unregister_image_watcher"
        cct = <optimized out>
#10 0x00007f12327f1cd5 in librbd::image::CloseRequest<librbd::ImageCtx>::send (this=this@entry=0x7f11fc0038c0) at librbd/image/CloseRequest.cc:36
No locals.
#11 0x00007f12327f9079 in librbd::image::RefreshParentRequest<librbd::ImageCtx>::send_close_parent (this=this@entry=0x7f11fc000bc0) at librbd/image/RefreshParentRequest.cc:199
        __func__ = "send_close_parent"
        __PRETTY_FUNCTION__ = "void librbd::image::RefreshParentRequest<ImageCtxT>::send_close_parent() [with ImageCtxT = librbd::ImageCtx]"
        ctx = 0x7f11fc004c80
        cct = <optimized out>
#12 0x00007f12327fa3a8 in librbd::image::RefreshParentRequest<librbd::ImageCtx>::handle_open_parent (this=0x7f11fc000bc0, result=result@entry=0x7f1203ffb1ec) at librbd/image/RefreshParentRequest.cc:135
        __func__ = "handle_open_parent"
        cct = 0x7f1247771e30
#13 0x00007f12327fa66a in librbd::util::detail::C_StateCallbackAdapter<librbd::image::RefreshParentRequest<librbd::ImageCtx>, &librbd::image::RefreshParentRequest<librbd::ImageCtx>::handle_open_parent, false>::complete (this=0x7f11fc0027a0, r=-2) at librbd/Utils.h:68
        on_finish = <optimized out>
#14 0x00007f12327f66df in librbd::util::detail::C_StateCallbackAdapter<librbd::image::OpenRequest<librbd::ImageCtx>, &librbd::image::OpenRequest<librbd::ImageCtx>::handle_close_image, true>::complete (
    this=0x7f11ec002160, r=-2) at librbd/Utils.h:70
        on_finish = <optimized out>
#15 0x00007f12327f36c9 in librbd::image::CloseRequest<librbd::ImageCtx>::finish (this=0x7f11ec000af0) at librbd/image/CloseRequest.cc:259
No locals.
#16 0x00007f12327f3a16 in librbd::image::CloseRequest<librbd::ImageCtx>::send_flush_image_watcher (this=<optimized out>) at librbd/image/CloseRequest.cc:236
No locals.
#17 0x00007f12327f3c55 in librbd::image::CloseRequest<librbd::ImageCtx>::send_close_parent (this=this@entry=0x7f11ec000af0) at librbd/image/CloseRequest.cc:208
        __func__ = "send_close_parent"
        cct = <optimized out>
#18 0x00007f12327f3d0b in librbd::image::CloseRequest<librbd::ImageCtx>::handle_flush_op_work_queue (this=0x7f11ec000af0, r=0) at librbd/image/CloseRequest.cc:202
        __func__ = "handle_flush_op_work_queue"
        cct = 0x7f1247771e30
#19 0x00007f123275bf29 in Context::complete (this=0x7f11fc004f60, r=<optimized out>) at include/Context.h:64
No locals.
#20 0x00007f123278c014 in ContextWQ::process (this=0x7f11fc0025f0, ctx=0x7f11fc004f60) at common/WorkQueue.h:603
        result = 0
#21 0x00007f123294e9fe in ThreadPool::worker (this=0x7f12477dc1b0, wt=0x7f12477dc490) at common/WorkQueue.cc:128
        tp_handle = {cct = 0x7f1247771e30, hb = 0x7f11fc000b40, grace = 60, suicide_grace = 0}
        item = 0x7f11fc004f60
        wq = 0x7f11fc0025f0
        tries = <optimized out>
        did = <optimized out>
        ss = <incomplete type>
        hb = 0x7f11fc000b40
#22 0x00007f123294f8d0 in ThreadPool::WorkThread::entry (this=<optimized out>) at common/WorkQueue.h:440
No locals.
#23 0x00007f12287f8dc5 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#24 0x00007f122690e28d in clone () from /lib64/libc.so.6
No symbol table info available.
(gdb) q
[root@magna009 ~]# 

I will update this bug with the location of the core file.
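
Note on the backtrace: one plausible reading, based only on the frames above, is that the same image is closed twice. Frames #14 to #18 show OpenRequest closing an image after an open that ends with r=-2 (-ENOENT), and frames #7 to #11 show RefreshParentRequest starting a second CloseRequest, which reaches AioImageRequestWQ::shut_down() while m_shutdown is already true. The Python sketch below is a toy model of that sequence, not librbd code. The names WorkQueueModel, ImageModel, and open_parent_then_cleanup are hypothetical and exist only for illustration.

```python
class WorkQueueModel:
    """Toy stand-in for a work queue whose shut_down() must run only once."""

    def __init__(self):
        # Plays the role of m_shutdown in the 'p *this' dump.
        self.shutdown = False

    def shut_down(self):
        # Models the failed check at AioImageRequestWQ.cc:212: assert(!m_shutdown)
        if self.shutdown:
            raise AssertionError("shut_down() called on an already shut-down queue")
        self.shutdown = True


class ImageModel:
    """Toy image that owns one work queue (hypothetical, not librbd::ImageCtx)."""

    def __init__(self):
        self.queue = WorkQueueModel()

    def close(self):
        self.queue.shut_down()


def open_parent_then_cleanup(parent):
    """Assumed flow: a failed open closes the image, then the caller closes it again."""
    result = -2  # -ENOENT, the r=-2 value seen in frames #13 and #14
    parent.close()  # first close, done as part of the failed open
    if result < 0:
        parent.close()  # second close, done by the caller's error handling
    return result
```

In this model, calling open_parent_then_cleanup(ImageModel()) raises AssertionError on the second close, which corresponds to the abort reported at AioImageRequestWQ.cc:212.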

Comment 2 Tejas 2016-04-20 12:19:03 UTC
Core is located at magna002.ceph.redhat.com:/home/tchandra/core.8274

Comment 3 Jason Dillaman 2016-04-27 20:05:58 UTC
Upstream PR: https://github.com/ceph/ceph/pull/8791

Comment 4 Ken Dreyer (Red Hat) 2016-05-10 13:21:38 UTC
This is undergoing review upstream (https://github.com/ceph/ceph/pull/8867) and will be in v10.2.1.

Comment 5 Ken Dreyer (Red Hat) 2016-05-16 15:30:45 UTC
The above PR was merged to jewel and is present in v10.2.1.

Comment 8 Tanay Ganguly 2016-05-30 11:34:42 UTC
Unable to reproduce with ceph version 10.2.1-6.el7cp.

Marking it as verified.

Comment 10 errata-xmlrpc 2016-08-23 19:36:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1755.html

