Description of problem:
During ad hoc rbd testing, the 'rbd ls -l' command crashed.

Version-Release number of selected component (if applicable):
ceph 10.1.1-1

How reproducible:
Seen only once

Steps to Reproduce:
1. Performed a series of rbd operations, including export and import of images and flatten operations.
2. Deleted all the images in the pool.
3. Ran 'rbd ls -l' and saw the crash.

Actual results:
Crash seen and core dumped

Expected results:
No crash should be seen

Additional info:
Here is some info about the core:

[root@magna009 ~]# gdb /usr/bin/rbd core.8274
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-80.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/rbd...Reading symbols from /usr/lib/debug/usr/bin/rbd.debug...done.
done.
[New LWP 8289]
[New LWP 8292]
[New LWP 8295]
[New LWP 8279]
[New LWP 8278]
[New LWP 8293]
[New LWP 8274]
[New LWP 8286]
[New LWP 8283]
[New LWP 8284]
[New LWP 8288]
[New LWP 8281]
[New LWP 8290]
[New LWP 8294]
[New LWP 8275]
[New LWP 8276]
[New LWP 8285]
[New LWP 8282]
[New LWP 8277]
[New LWP 8280]
[New LWP 8287]
[New LWP 8291]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `rbd ls -l Tejas'.
Program terminated with signal 6, Aborted.
#0 0x00007f12287fffcb in raise () from /lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install boost-iostreams-1.53.0-25.el7.x86_64 boost-program-options-1.53.0-25.el7.x86_64 boost-random-1.53.0-25.el7.x86_64 boost-regex-1.53.0-25.el7.x86_64 boost-system-1.53.0-25.el7.x86_64 boost-thread-1.53.0-25.el7.x86_64 bzip2-libs-1.0.6-13.el7.x86_64 elfutils-libelf-0.163-3.el7.x86_64 elfutils-libs-0.163-3.el7.x86_64 glibc-2.17-106.el7_2.4.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 libattr-2.4.46-12.el7.x86_64 libblkid-2.23.2-26.el7_2.2.x86_64 libcap-2.22-8.el7.x86_64 libgcc-4.8.5-4.el7.x86_64 libicu-50.1.2-15.el7.x86_64 libstdc++-4.8.5-4.el7.x86_64 libuuid-2.23.2-26.el7_2.2.x86_64 lttng-ust-2.4.1-1.el7cp.x86_64 nspr-4.10.8-2.el7_1.x86_64 nss-3.19.1-18.el7.x86_64 nss-softokn-3.16.2.3-13.el7_1.x86_64 nss-softokn-freebl-3.16.2.3-13.el7_1.x86_64 nss-util-3.19.1-4.el7_1.x86_64 sqlite-3.7.17-8.el7.x86_64 systemd-libs-219-19.el7.x86_64 userspace-rcu-0.7.9-2.el7rhgs.x86_64 xz-libs-5.1.2-12alpha.el7.x86_64 zlib-1.2.7-15.el7.x86_64
(gdb) bt
#0 0x00007f12287fffcb in raise () from /lib64/libpthread.so.0
#1 0x00007f123c403eb5 in reraise_fatal (signum=6) at global/signal_handler.cc:71
#2 handle_fatal_signal (signum=6) at global/signal_handler.cc:133
#3 <signal handler called>
#4 0x00007f122684d5f7 in raise () from /lib64/libc.so.6
#5 0x00007f122684ece8 in abort () from /lib64/libc.so.6
#6 0x00007f123295e057 in ceph::__ceph_assert_fail (assertion=<optimized out>, file=<optimized out>, line=212, func=0x7f1232c14ee0 <librbd::AioImageRequestWQ::shut_down(Context*)::__PRETTY_FUNCTION__> "void librbd::AioImageRequestWQ::shut_down(Context*)") at common/assert.cc:78
#7 0x00007f1232764bf4 in librbd::AioImageRequestWQ::shut_down (this=0x7f11fc002450, on_shutdown=0x7f11fc0039e0) at librbd/AioImageRequestWQ.cc:212
#8 0x00007f12327f1830 in librbd::image::CloseRequest<librbd::ImageCtx>::send_shut_down_aio_queue (this=0x7f11fc0038c0) at librbd/image/CloseRequest.cc:74
#9 0x00007f12327f1c85 in librbd::image::CloseRequest<librbd::ImageCtx>::send_unregister_image_watcher (this=this@entry=0x7f11fc0038c0) at librbd/image/CloseRequest.cc:42
#10 0x00007f12327f1cd5 in librbd::image::CloseRequest<librbd::ImageCtx>::send (this=this@entry=0x7f11fc0038c0) at librbd/image/CloseRequest.cc:36
#11 0x00007f12327f9079 in librbd::image::RefreshParentRequest<librbd::ImageCtx>::send_close_parent (this=this@entry=0x7f11fc000bc0) at librbd/image/RefreshParentRequest.cc:199
#12 0x00007f12327fa3a8 in librbd::image::RefreshParentRequest<librbd::ImageCtx>::handle_open_parent (this=0x7f11fc000bc0, result=result@entry=0x7f1203ffb1ec) at librbd/image/RefreshParentRequest.cc:135
#13 0x00007f12327fa66a in librbd::util::detail::C_StateCallbackAdapter<librbd::image::RefreshParentRequest<librbd::ImageCtx>, &librbd::image::RefreshParentRequest<librbd::ImageCtx>::handle_open_parent, false>::complete (this=0x7f11fc0027a0, r=-2) at librbd/Utils.h:68
#14 0x00007f12327f66df in librbd::util::detail::C_StateCallbackAdapter<librbd::image::OpenRequest<librbd::ImageCtx>, &librbd::image::OpenRequest<librbd::ImageCtx>::handle_close_image, true>::complete (this=0x7f11ec002160, r=-2) at librbd/Utils.h:70
#15 0x00007f12327f36c9 in librbd::image::CloseRequest<librbd::ImageCtx>::finish (this=0x7f11ec000af0) at librbd/image/CloseRequest.cc:259
#16 0x00007f12327f3a16 in librbd::image::CloseRequest<librbd::ImageCtx>::send_flush_image_watcher (this=<optimized out>) at librbd/image/CloseRequest.cc:236
#17 0x00007f12327f3c55 in librbd::image::CloseRequest<librbd::ImageCtx>::send_close_parent (this=this@entry=0x7f11ec000af0) at librbd/image/CloseRequest.cc:208
#18 0x00007f12327f3d0b in librbd::image::CloseRequest<librbd::ImageCtx>::handle_flush_op_work_queue (this=0x7f11ec000af0, r=0) at librbd/image/CloseRequest.cc:202
#19 0x00007f123275bf29 in Context::complete (this=0x7f11fc004f60, r=<optimized out>) at include/Context.h:64
#20 0x00007f123278c014 in ContextWQ::process (this=0x7f11fc0025f0, ctx=0x7f11fc004f60) at common/WorkQueue.h:603
#21 0x00007f123294e9fe in ThreadPool::worker (this=0x7f12477dc1b0, wt=0x7f12477dc490) at common/WorkQueue.cc:128
#22 0x00007f123294f8d0 in ThreadPool::WorkThread::entry (this=<optimized out>) at common/WorkQueue.h:440
#23 0x00007f12287f8dc5 in start_thread () from /lib64/libpthread.so.0
#24 0x00007f122690e28d in clone () from /lib64/libc.so.6
(gdb) f 7
#7 0x00007f1232764bf4 in librbd::AioImageRequestWQ::shut_down (this=0x7f11fc002450, on_shutdown=0x7f11fc0039e0) at librbd/AioImageRequestWQ.cc:212
212       assert(!m_shutdown);
(gdb) p *this
$1 = {<ThreadPool::PointerWQ<librbd::AioImageRequest<librbd::ImageCtx> >> = {<ThreadPool::WorkQueue_> = {_vptr.WorkQueue_ = 0x7f1232fda590 <vtable for librbd::AioImageRequestWQ+16>, name = "librbd::aio_work_queue", timeout_interval = 60, suicide_interval = 0},
    m_pool = 0x7f12477dc1b0, m_items = empty std::list, m_processing = 0}, m_image_ctx = @0x7f11fc000ec0, m_lock = {L = {
      __data = {__lock = 0, __nr_readers = 0, __readers_wakeup = 0, __writer_wakeup = 0, __nr_readers_queued = 0, __nr_writers_queued = 0, __writer = 8289, __shared = 0, __pad1 = 0, __pad2 = 0, __flags = 0},
      __size = '\000' <repeats 24 times>, "a ", '\000' <repeats 29 times>, __align = 0}, name = "AioImageRequestWQ::m_lock (0x7f11fc002450)", id = -1, nrlock = {val = 0}, nwlock = {val = 1}, track = true, lockdep = true},
  m_write_blocker_contexts = empty std::list, m_write_blockers = 0, m_require_lock_on_read = false, m_in_progress_writes = {val = 0}, m_queued_reads = {val = 0}, m_queued_writes = {val = 0}, m_in_flight_ops = {val = 0},
  m_refresh_in_progress = false, m_shutdown = true, m_on_shutdown = 0x0}
(gdb) bt full
#0 0x00007f12287fffcb in raise () from /lib64/libpthread.so.0
No symbol table info available.
#1 0x00007f123c403eb5 in reraise_fatal (signum=6) at global/signal_handler.cc:71
        ret = <optimized out>
        buf = '\000' <repeats 40 times>, "\260|\377\003\022\177\000\000X\r\206&\022\177", '\000' <repeats 26 times>, "\364\377\377\377\377\377\377\377\000\000\000\000\020\000\000\000\215\342\220&\022\177\000\000\000\000\000\000\000\000\000\000 ", '\000' <repeats 19 times>, "\022\177\000\000\000\000\000\000\000\000\000\000\354\200\377\003\022\177", '\000' <repeats 14 times>, "\022\177\000\000\001", '\000' <repeats 15 times>, "\377\377\377\377\377\377\377\377X\r\206&\022\177\000\000"...
#2 handle_fatal_signal (signum=6) at global/signal_handler.cc:133
        buf = "*** Caught signal (Aborted) **\n in thread 7f1203fff700 thread_name:tp_librbd\n", '\000' <repeats 811 times>...
        pthread_name = "tp_librbd\000\000\000\000\000\000"
#3 <signal handler called>
No symbol table info available.
#4 0x00007f122684d5f7 in raise () from /lib64/libc.so.6
No symbol table info available.
#5 0x00007f122684ece8 in abort () from /lib64/libc.so.6
No symbol table info available.
#6 0x00007f123295e057 in ceph::__ceph_assert_fail (assertion=<optimized out>, file=<optimized out>, line=212, func=0x7f1232c14ee0 <librbd::AioImageRequestWQ::shut_down(Context*)::__PRETTY_FUNCTION__> "void librbd::AioImageRequestWQ::shut_down(Context*)") at common/assert.cc:78
        tss = <incomplete type>
        buf = "librbd/AioImageRequestWQ.cc: In function 'void librbd::AioImageRequestWQ::shut_down(Context*)' thread 7f1203fff700 time 2016-04-20 09:35:40.643544\nlibrbd/AioImageRequestWQ.cc: 212: FAILED assert(!m_sh"...
        bt = 0x7f11fc004fe0
        oss = <incomplete type>
#7 0x00007f1232764bf4 in librbd::AioImageRequestWQ::shut_down (this=0x7f11fc002450, on_shutdown=0x7f11fc0039e0) at librbd/AioImageRequestWQ.cc:212
        locker = {m_lock = @0x7f11fc002498, locked = true}
        cct = <optimized out>
        __PRETTY_FUNCTION__ = "void librbd::AioImageRequestWQ::shut_down(Context*)"
        __func__ = "shut_down"
#8 0x00007f12327f1830 in librbd::image::CloseRequest<librbd::ImageCtx>::send_shut_down_aio_queue (this=0x7f11fc0038c0) at librbd/image/CloseRequest.cc:74
        __func__ = "send_shut_down_aio_queue"
        cct = <optimized out>
        owner_locker = {m_lock = @0x7f11fc001058, locked = true}
#9 0x00007f12327f1c85 in librbd::image::CloseRequest<librbd::ImageCtx>::send_unregister_image_watcher (this=this@entry=0x7f11fc0038c0) at librbd/image/CloseRequest.cc:42
        __func__ = "send_unregister_image_watcher"
        cct = <optimized out>
#10 0x00007f12327f1cd5 in librbd::image::CloseRequest<librbd::ImageCtx>::send (this=this@entry=0x7f11fc0038c0) at librbd/image/CloseRequest.cc:36
No locals.
#11 0x00007f12327f9079 in librbd::image::RefreshParentRequest<librbd::ImageCtx>::send_close_parent (this=this@entry=0x7f11fc000bc0) at librbd/image/RefreshParentRequest.cc:199
        __func__ = "send_close_parent"
        __PRETTY_FUNCTION__ = "void librbd::image::RefreshParentRequest<ImageCtxT>::send_close_parent() [with ImageCtxT = librbd::ImageCtx]"
        ctx = 0x7f11fc004c80
        cct = <optimized out>
#12 0x00007f12327fa3a8 in librbd::image::RefreshParentRequest<librbd::ImageCtx>::handle_open_parent (this=0x7f11fc000bc0, result=result@entry=0x7f1203ffb1ec) at librbd/image/RefreshParentRequest.cc:135
        __func__ = "handle_open_parent"
        cct = 0x7f1247771e30
#13 0x00007f12327fa66a in librbd::util::detail::C_StateCallbackAdapter<librbd::image::RefreshParentRequest<librbd::ImageCtx>, &librbd::image::RefreshParentRequest<librbd::ImageCtx>::handle_open_parent, false>::complete (this=0x7f11fc0027a0, r=-2) at librbd/Utils.h:68
        on_finish = <optimized out>
#14 0x00007f12327f66df in librbd::util::detail::C_StateCallbackAdapter<librbd::image::OpenRequest<librbd::ImageCtx>, &librbd::image::OpenRequest<librbd::ImageCtx>::handle_close_image, true>::complete (this=0x7f11ec002160, r=-2) at librbd/Utils.h:70
        on_finish = <optimized out>
#15 0x00007f12327f36c9 in librbd::image::CloseRequest<librbd::ImageCtx>::finish (this=0x7f11ec000af0) at librbd/image/CloseRequest.cc:259
No locals.
#16 0x00007f12327f3a16 in librbd::image::CloseRequest<librbd::ImageCtx>::send_flush_image_watcher (this=<optimized out>) at librbd/image/CloseRequest.cc:236
No locals.
#17 0x00007f12327f3c55 in librbd::image::CloseRequest<librbd::ImageCtx>::send_close_parent (this=this@entry=0x7f11ec000af0) at librbd/image/CloseRequest.cc:208
        __func__ = "send_close_parent"
        cct = <optimized out>
#18 0x00007f12327f3d0b in librbd::image::CloseRequest<librbd::ImageCtx>::handle_flush_op_work_queue (this=0x7f11ec000af0, r=0) at librbd/image/CloseRequest.cc:202
        __func__ = "handle_flush_op_work_queue"
        cct = 0x7f1247771e30
#19 0x00007f123275bf29 in Context::complete (this=0x7f11fc004f60, r=<optimized out>) at include/Context.h:64
No locals.
#20 0x00007f123278c014 in ContextWQ::process (this=0x7f11fc0025f0, ctx=0x7f11fc004f60) at common/WorkQueue.h:603
        result = 0
#21 0x00007f123294e9fe in ThreadPool::worker (this=0x7f12477dc1b0, wt=0x7f12477dc490) at common/WorkQueue.cc:128
        tp_handle = {cct = 0x7f1247771e30, hb = 0x7f11fc000b40, grace = 60, suicide_grace = 0}
        item = 0x7f11fc004f60
        wq = 0x7f11fc0025f0
        tries = <optimized out>
        did = <optimized out>
        ss = <incomplete type>
        hb = 0x7f11fc000b40
#22 0x00007f123294f8d0 in ThreadPool::WorkThread::entry (this=<optimized out>) at common/WorkQueue.h:440
No locals.
#23 0x00007f12287f8dc5 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#24 0x00007f122690e28d in clone () from /lib64/libc.so.6
No symbol table info available.
(gdb) q
[root@magna009 ~]#

I will update this bug about the core location
Core is located at magna002.ceph.redhat.com:/home/tchandra/core.8274
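For readers who want to see the failing condition in isolation: frame #7 of the backtrace stops at assert(!m_shutdown), and the dump of *this shows m_shutdown = true, which suggests shut_down() was reached on a queue that had already been shut down. The following is only a minimal sketch of that condition, not librbd code. The class MiniWorkQueue and its methods are hypothetical names invented for illustration, and the sketch returns false on the second call where the real code asserts. It makes no claim about the root cause or about how the upstream fix works.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for a work queue with a one-shot shutdown flag.
// This is NOT the librbd AioImageRequestWQ implementation; the names and
// structure are assumptions made purely for illustration.
class MiniWorkQueue {
public:
  // Returns true if this call performed the shutdown, and false if the
  // queue had already been shut down. The code in the backtrace asserts
  // on the equivalent of the "already shut down" branch instead.
  bool shut_down() {
    std::lock_guard<std::mutex> locker(m_lock);
    if (m_shutdown) {
      return false;  // second shutdown request on the same queue
    }
    m_shutdown = true;
    return true;
  }

  // Reports whether a shutdown has already been performed.
  bool is_shut_down() const {
    std::lock_guard<std::mutex> locker(m_lock);
    return m_shutdown;
  }

private:
  mutable std::mutex m_lock;
  bool m_shutdown = false;
};
```

Calling shut_down() twice on the same MiniWorkQueue returns true the first time and false the second time; the second call corresponds to the state captured in the core (m_shutdown already true on entry).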
Upstream PR: https://github.com/ceph/ceph/pull/8791
This is undergoing review upstream (https://github.com/ceph/ceph/pull/8867) and will be in v10.2.1.
The above PR was merged to jewel and is present in v10.2.1.
Unable to reproduce the crash with ceph version 10.2.1-6.el7cp. Marking it as verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-1755.html