A reproducible segmentation fault was found during RGW testing:

RHCS 2.5: Ceph 10.2.10-17.el7cp (9865b1b203321435cc7128257833dca28bd779aa)

The segfault is caused here on line 3272 in gc_iterate_entries:

    3267
    3268     if (max_entries && (i >= max_entries)) {
    3269       if (truncated)
    3270         *truncated = true;
    3271       --iter;
    3272       key_iter = iter->first;
    3273       return 0;
    3274     }
    3275
    3276     ret = cb(hctx, key, e, param);

It looks like iter here may be out of bounds? key_iter == "", i == 800 (equal to max_entries); there are 32 (GC_NUM_KEYS) items in the keys map.

It looks like this code was substantially altered by 5334622a8365520fa4247241f97422c044cbf5b2, but the commit message doesn’t indicate it was a bug fix.

#0  std::string::assign (this=this@entry=0x7fffd25862b0, __str=<error reading variable: Cannot access memory at address 0x8>)
    at /usr/src/debug/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/basic_string.tcc:249
#1  0x00007fffe34eae4d in operator= (__str=..., this=0x7fffd25862b0)
    at /usr/include/c++/4.8.2/bits/basic_string.h:547
#2  gc_iterate_entries (
    cb=0x7fffe34e8ff0 <gc_list_cb(cls_method_context_t, std::string const&, cls_rgw_gc_obj_info&, void*)>,
    param=0x7fffd25862a0, truncated=0x7fffd25862b8, max_entries=800, key_iter="",
    expired_only=<optimized out>, marker="", hctx=0x7fffd2586688)
    at cls/rgw/cls_rgw.cc:3272
#3  gc_list_entries (next_marker="", truncated=0x7fffd25862b8, entries=std::list = {...},
    expired_only=<optimized out>, max=800, marker="", hctx=0x7fffd2586688)
    at cls/rgw/cls_rgw.cc:3301
#4  rgw_cls_gc_list (hctx=0x7fffd2586688, in=<optimized out>, out=0x7fffd2586c90)
    at cls/rgw/cls_rgw.cc:3319
#5  0x00005555559da4a4 in ClassHandler::ClassMethod::exec (
    this=this@entry=0x555560ae4438, ctx=ctx@entry=0x7fffd2586688, indata=..., outdata=...)
    at osd/ClassHandler.cc:287
#6  0x0000555555ad3e13 in ReplicatedPG::do_osd_ops (
    this=this@entry=0x55556ed6d000, ctx=ctx@entry=0x55557614a000,
    ops=std::vector of length 1, capacity 1 = {...})
    at osd/ReplicatedPG.cc:4461
#7  0x0000555555ae896f in ReplicatedPG::prepare_transaction (
    this=this@entry=0x55556ed6d000, ctx=ctx@entry=0x55557614a000)
    at osd/ReplicatedPG.cc:6604
#8  0x0000555555ae9850 in ReplicatedPG::execute_ctx (
    this=this@entry=0x55556ed6d000, ctx=0x55557614a000)
    at osd/ReplicatedPG.cc:2970
#9  0x0000555555aed7c3 in ReplicatedPG::do_op (this=<optimized out>, op=...)
    at osd/ReplicatedPG.cc:2170
#10 0x0000555555aa8be7 in ReplicatedPG::do_request (this=0x55556ed6d000,
    op=std::shared_ptr (count 5, weak 0) 0x555561770400, handle=...)
    at osd/ReplicatedPG.cc:1520
#11 0x00005555559559c5 in OSD::dequeue_op (this=0x5555600d4000, pg=...,
    op=std::shared_ptr (count 5, weak 0) 0x555561770400, handle=...)
    at osd/OSD.cc:8964
#12 0x0000555555955bed in PGQueueable::RunVis::operator() (
    this=0x7fffd25884f0, op=...)
    at osd/OSD.cc:167
#13 0x00005555559596b9 in OSD::ShardedOpWQ::_process (this=0x5555600d51e8,
    thread_index=<optimized out>, hb=<optimized out>)
    at /usr/include/boost/variant/variant.hpp:1017
#14 0x0000555556040727 in ShardedThreadPool::shardedthreadpool_worker (
    this=0x5555600d4768, thread_index=<optimized out>)
    at common/WorkQueue.cc:340
#15 0x0000555556042690 in ShardedThreadPool::WorkThreadSharded::entry (
    this=<optimized out>)
    at common/WorkQueue.h:687
#16 0x00007ffff67b6dd5 in start_thread (arg=0x7fffd258a700)
    at pthread_create.c:308
#17 0x00007ffff4e3db3d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
This appears to have been triggered by running radosgw-admin gc list --include-all on a cluster with a very large number of pending GC entries.
Adding information from mail thread for future reference:

- - - - - 8< - - - - -
Seems like the issue originated higher in the stack at #4 rgw_cls_gc_list ->

#3  gc_list_entries (next_marker="", truncated=0x7fffd25862b8, entries=std::list = {...},
    expired_only=<optimized out>, max=800, marker="", hctx=0x7fffd2586688)
    at cls/rgw/cls_rgw.cc:3301
- - - - - 8< - - - - -

Comment from Yehuda - quote:

>> 3267
>> 3268     if (max_entries && (i >= max_entries)) {
>> 3269       if (truncated)
>> 3270         *truncated = true;
>> 3271       --iter;
              ^^^
This is dangerous. need to check if (iter != keys.end())
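To illustrate the concern about stepping the iterator backwards before dereferencing it, here is a minimal stand-alone sketch of one way to track the continuation marker without any --iter. It is not the Ceph code and not necessarily the change shipped in the advisory: the function name list_entries, its signature, and the use of a plain std::map<std::string, std::string> are assumptions made for illustration only.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for the listing loop (not cls_rgw.cc).
// Visits at most max_entries keys (0 means no limit), sets *truncated when
// entries remain unvisited, and leaves the last visited key in next_marker.
int list_entries(const std::map<std::string, std::string>& keys,
                 uint32_t max_entries,
                 bool* truncated,
                 std::string& next_marker)
{
  if (truncated)
    *truncated = false;

  uint32_t i = 0;
  for (auto iter = keys.begin(); iter != keys.end(); ++iter, ++i) {
    if (max_entries && i >= max_entries) {
      // The limit was reached and at least one more key exists.
      // next_marker already holds the last key that was visited, so the
      // iterator never has to be decremented or dereferenced here.
      if (truncated)
        *truncated = true;
      return 0;
    }
    // Remember the key as soon as it is visited.
    next_marker = iter->first;
  }
  return 0;
}
```

Because the marker is updated while each key is visited, the truncation branch never touches the iterator, so an empty map or an iterator sitting at begin() or end() cannot lead to an invalid dereference in this sketch.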
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2651