Bug 1599842
| Summary: | segfault in cls_rgw | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Douglas Fuller <dfuller> |
| Component: | RGW | Assignee: | Matt Benjamin (redhat) <mbenjamin> |
| Status: | CLOSED ERRATA | QA Contact: | Tejas <tchandra> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 2.5 | CC: | cbodley, ceph-eng-bugs, frival, kbader, kdreyer, mbenjamin, mkogan, owasserm, sweil, tchandra, tserlin, tunguyen, vakulkar |
| Target Milestone: | z2 | Flags: | vakulkar: automate_bug? |
| Target Release: | 2.5 | ||
| Hardware: | Unspecified | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | RHEL: ceph-10.2.10-36.el7cp Ubuntu: ceph_10.2.10-31redhat1 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2018-09-05 19:39:32 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1581350 | ||
This appears to have been triggered by radosgw-admin gc list --include-all on a cluster with a very large number of pending GCs. Adding information from the mail thread for future reference:
- - - - - 8< - - - - -
Seems like the issue originated higher in the stack at #4 rgw_cls_gc_list ->
#3 gc_list_entries (next_marker="", truncated=0x7fffd25862b8,
entries=std::list = {...}, expired_only=<optimized out>, max=800,
marker="", hctx=0x7fffd2586688) at cls/rgw/cls_rgw.cc:3301
- - - - - 8< - - - - -
Comment from Yehuda - quote:
>> 3267
>> 3268 if (max_entries && (i >= max_entries)) {
>> 3269 if (truncated)
>> 3270 *truncated = true;
>> 3271 --iter;
^^^ This is dangerous. Need to check if (iter != keys.end())
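A guard of the kind suggested above could look roughly like the following. This is a minimal sketch on a plain std::map standing in for the keys map in cls_rgw.cc; step_back_key is a hypothetical helper, not a Ceph function, and this is not necessarily the fix that shipped in the advisory. The sketch also checks against begin(), since decrementing an iterator that already points at begin() is undefined behaviour.

```cpp
#include <cassert>
#include <map>
#include <string>

// Assumption: a plain std::map stands in for the keys map from the bug report.
using KeyMap = std::map<std::string, std::string>;

// Hypothetical helper: copy the key of the element *before* `iter` into `out`.
// Returns false, leaving `out` untouched, when there is no previous element,
// so the caller never decrements begin() or dereferences an invalid iterator.
bool step_back_key(const KeyMap& keys, KeyMap::const_iterator iter,
                   std::string& out) {
  if (keys.empty() || iter == keys.begin()) {
    return false;  // nothing precedes iter; --iter would be undefined
  }
  --iter;             // safe: iter is not begin(), so a predecessor exists
  out = iter->first;  // safe: the decremented iterator is dereferenceable
  return true;
}
```

A caller would set the next marker only when the helper returns true, and otherwise keep whatever marker it already had.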
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2018:2651
A reproducible segmentation fault was found during RGW testing:

RHCS 2.5: Ceph 10.2.10-17.el7cp (9865b1b203321435cc7128257833dca28bd779aa)

The segfault is caused here on line 3272 in gc_iterate_entries:

3267
3268       if (max_entries && (i >= max_entries)) {
3269         if (truncated)
3270           *truncated = true;
3271         --iter;
3272         key_iter = iter->first;
3273         return 0;
3274       }
3275
3276       ret = cb(hctx, key, e, param);

It looks like iter here may be out of bounds? key_iter == "", i == 800 (equal to max_entries); there are 32 (GC_NUM_KEYS) items in the keys map.

It looks like this code was substantially altered by 5334622a8365520fa4247241f97422c044cbf5b2, but the commit message doesn’t indicate it was a bug fix.

#0  std::string::assign (this=this@entry=0x7fffd25862b0, __str=<error reading variable: Cannot access memory at address 0x8>) at /usr/src/debug/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/basic_string.tcc:249
#1  0x00007fffe34eae4d in operator= (__str=..., this=0x7fffd25862b0) at /usr/include/c++/4.8.2/bits/basic_string.h:547
#2  gc_iterate_entries (cb=0x7fffe34e8ff0 <gc_list_cb(cls_method_context_t, std::string const&, cls_rgw_gc_obj_info&, void*)>, param=0x7fffd25862a0, truncated=0x7fffd25862b8, max_entries=800, key_iter="", expired_only=<optimized out>, marker="", hctx=0x7fffd2586688) at cls/rgw/cls_rgw.cc:3272
#3  gc_list_entries (next_marker="", truncated=0x7fffd25862b8, entries=std::list = {...}, expired_only=<optimized out>, max=800, marker="", hctx=0x7fffd2586688) at cls/rgw/cls_rgw.cc:3301
#4  rgw_cls_gc_list (hctx=0x7fffd2586688, in=<optimized out>, out=0x7fffd2586c90) at cls/rgw/cls_rgw.cc:3319
#5  0x00005555559da4a4 in ClassHandler::ClassMethod::exec (this=this@entry=0x555560ae4438, ctx=ctx@entry=0x7fffd2586688, indata=..., outdata=...) at osd/ClassHandler.cc:287
#6  0x0000555555ad3e13 in ReplicatedPG::do_osd_ops (this=this@entry=0x55556ed6d000, ctx=ctx@entry=0x55557614a000, ops=std::vector of length 1, capacity 1 = {...}) at osd/ReplicatedPG.cc:4461
#7  0x0000555555ae896f in ReplicatedPG::prepare_transaction (this=this@entry=0x55556ed6d000, ctx=ctx@entry=0x55557614a000) at osd/ReplicatedPG.cc:6604
#8  0x0000555555ae9850 in ReplicatedPG::execute_ctx (this=this@entry=0x55556ed6d000, ctx=0x55557614a000) at osd/ReplicatedPG.cc:2970
#9  0x0000555555aed7c3 in ReplicatedPG::do_op (this=<optimized out>, op=...) at osd/ReplicatedPG.cc:2170
#10 0x0000555555aa8be7 in ReplicatedPG::do_request (this=0x55556ed6d000, op=std::shared_ptr (count 5, weak 0) 0x555561770400, handle=...) at osd/ReplicatedPG.cc:1520
#11 0x00005555559559c5 in OSD::dequeue_op (this=0x5555600d4000, pg=..., op=std::shared_ptr (count 5, weak 0) 0x555561770400, handle=...) at osd/OSD.cc:8964
#12 0x0000555555955bed in PGQueueable::RunVis::operator() (this=0x7fffd25884f0, op=...) at osd/OSD.cc:167
#13 0x00005555559596b9 in OSD::ShardedOpWQ::_process (this=0x5555600d51e8, thread_index=<optimized out>, hb=<optimized out>) at /usr/include/boost/variant/variant.hpp:1017
#14 0x0000555556040727 in ShardedThreadPool::shardedthreadpool_worker (this=0x5555600d4768, thread_index=<optimized out>) at common/WorkQueue.cc:340
#15 0x0000555556042690 in ShardedThreadPool::WorkThreadSharded::entry (this=<optimized out>) at common/WorkQueue.h:687
#16 0x00007ffff67b6dd5 in start_thread (arg=0x7fffd258a700) at pthread_create.c:308
#17 0x00007ffff4e3db3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
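One way to sidestep the backwards step entirely is to remember the last key that was handed to the callback and use that as the next marker. The sketch below is a simplified model, not the cls_rgw code. It assumes, as a guess about how the reported state (i == max_entries with only 32 keys in the map) could arise, that entries are fetched in batches and that the entry counter persists across batches, so the limit can be hit on the first element of a fresh batch. fetch_batch, list_entries and ListResult are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using KeyMap = std::map<std::string, std::string>;

struct ListResult {
  std::vector<std::string> keys;  // keys handed to the "callback"
  std::string next_marker;        // where a follow-up listing would resume
  bool truncated = false;
};

// Hypothetical stand-in for a batched key read: returns up to batch_size
// entries whose key sorts strictly after `after`.
KeyMap fetch_batch(const KeyMap& store, const std::string& after,
                   std::size_t batch_size) {
  KeyMap batch;
  for (auto it = store.upper_bound(after);
       it != store.end() && batch.size() < batch_size; ++it) {
    batch.insert(*it);
  }
  return batch;
}

// Lists at most max_entries keys (0 means no limit), reading in batches.
ListResult list_entries(const KeyMap& store, std::size_t max_entries,
                        std::size_t batch_size) {
  ListResult res;
  std::string last_key;  // last key actually processed; no --iter needed
  while (true) {
    KeyMap batch = fetch_batch(store, last_key, batch_size);
    if (batch.empty()) {
      return res;  // nothing left to list
    }
    for (const auto& kv : batch) {
      if (max_entries && res.keys.size() >= max_entries) {
        // The limit may be hit on the first element of a batch; using
        // last_key avoids decrementing an iterator that sits at begin().
        res.truncated = true;
        res.next_marker = last_key;
        return res;
      }
      res.keys.push_back(kv.first);
      last_key = kv.first;
    }
  }
}
```

With five keys, a limit of four and a batch size of two, the limit is reached on the first element of the third batch, which is the situation where a bare --iter would step before begin().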