1599842 – segfault in cls_rgw

Summary: segfault in cls_rgw

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Ceph Storage
Classification:	Red Hat Storage
Component:	RGW
Sub Component:
Version:	2.5
Hardware:	Unspecified
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	z2
Target Release:	2.5
Assignee:	Matt Benjamin (redhat)
QA Contact:	Tejas
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1581350

Reported:	2018-07-10 17:47 UTC by Douglas Fuller
Modified:	2018-09-05 19:40 UTC
CC List:	13 users
Fixed In Version:	RHEL: ceph-10.2.10-36.el7cp Ubuntu: ceph_10.2.10-31redhat1
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-09-05 19:39:32 UTC
Embargoed:
Dependent Products:
Flags:	vakulkar: automate_bug?

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Ceph Project Bug Tracker	26882	0	None	None	None	2018-08-08 18:24:51 UTC
Red Hat Product Errata	RHBA-2018:2651	0	None	None	None	2018-09-05 19:40:25 UTC

Description Douglas Fuller 2018-07-10 17:47:46 UTC

A reproducible segmentation fault was found during RGW testing:

RHCS 2.5: Ceph 10.2.10-17.el7cp (9865b1b203321435cc7128257833dca28bd779aa)

The segfault occurs at line 3272 of cls/rgw/cls_rgw.cc, in gc_iterate_entries:

3267	
3268	      if (max_entries && (i >= max_entries)) {
3269	        if (truncated)
3270	          *truncated = true;
3271	        --iter;
3272	        key_iter = iter->first;
3273	        return 0;
3274	      }
3275	
3276	      ret = cb(hctx, key, e, param);

It looks like iter may be out of bounds here: key_iter == "", i == 800 (equal to max_entries), and there are 32 (GC_NUM_KEYS) items in the keys map.
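
One way to read these numbers: 800 is an exact multiple of 32 (25 x 32). If the keys map is refilled 32 entries at a time while i keeps counting across refills (an assumption, since the surrounding loop is not shown in this report), the max_entries check would first fire with iter still at keys.begin(), and decrementing a begin iterator of a std::map is undefined behavior. The sketch below models only that arithmetic. limit_fires_at_batch_begin and its parameters are hypothetical names, not code from cls_rgw.cc.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical model of a batched listing: keys arrive in chunks of
// `batch_size`, while `i` counts visited entries across all chunks.
// Returns true if the max_entries check first fires while the iterator
// still points at the first element of a freshly fetched chunk, which is
// the position where `--iter` would be undefined behavior.
bool limit_fires_at_batch_begin(uint32_t max_entries, uint32_t batch_size,
                                uint32_t total_keys) {
  assert(batch_size > 0);
  uint32_t i = 0;        // entries visited so far, across chunks
  uint32_t fetched = 0;  // keys pulled from the (simulated) store
  while (fetched < total_keys) {
    // Build the next chunk; key contents are irrelevant, only positions matter.
    std::map<std::string, int> keys;
    for (uint32_t k = 0; k < batch_size && fetched < total_keys; ++k, ++fetched) {
      keys.emplace("k" + std::to_string(fetched), 0);
    }
    for (auto iter = keys.begin(); iter != keys.end(); ++iter) {
      if (max_entries && i >= max_entries) {
        return iter == keys.begin();
      }
      ++i;
    }
  }
  return false;  // the limit was never reached
}
```

Under this model, a limit of 800 with chunks of 32 trips the check at the start of a chunk, whereas a chunk size that does not divide the limit (30, for example) trips it mid-chunk, where stepping back one element is safe.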

It looks like this code was substantially altered by 5334622a8365520fa4247241f97422c044cbf5b2, but the commit message doesn’t indicate it was a bug fix.

#0  std::string::assign (this=this@entry=0x7fffd25862b0, 
   __str=<error reading variable: Cannot access memory at address 0x8>)
   at /usr/src/debug/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/basic_string.tcc:249
#1  0x00007fffe34eae4d in operator= (__str=..., this=0x7fffd25862b0)
   at /usr/include/c++/4.8.2/bits/basic_string.h:547
#2  gc_iterate_entries (
   cb=0x7fffe34e8ff0 <gc_list_cb(cls_method_context_t, std::string const&, cls_rgw_gc_obj_info&, void*)>, param=0x7fffd25862a0, truncated=0x7fffd25862b8, 
   max_entries=800, key_iter="", expired_only=<optimized out>, marker="", 
   hctx=0x7fffd2586688) at cls/rgw/cls_rgw.cc:3272
#3  gc_list_entries (next_marker="", truncated=0x7fffd25862b8, 
   entries=std::list = {...}, expired_only=<optimized out>, max=800, 
   marker="", hctx=0x7fffd2586688) at cls/rgw/cls_rgw.cc:3301
#4  rgw_cls_gc_list (hctx=0x7fffd2586688, in=<optimized out>, 
   out=0x7fffd2586c90) at cls/rgw/cls_rgw.cc:3319
#5  0x00005555559da4a4 in ClassHandler::ClassMethod::exec (
   this=this@entry=0x555560ae4438, ctx=ctx@entry=0x7fffd2586688, indata=..., 
   outdata=...) at osd/ClassHandler.cc:287
#6  0x0000555555ad3e13 in ReplicatedPG::do_osd_ops (
   this=this@entry=0x55556ed6d000, ctx=ctx@entry=0x55557614a000, 
   ops=std::vector of length 1, capacity 1 = {...})
   at osd/ReplicatedPG.cc:4461
#7  0x0000555555ae896f in ReplicatedPG::prepare_transaction (
   this=this@entry=0x55556ed6d000, ctx=ctx@entry=0x55557614a000)
   at osd/ReplicatedPG.cc:6604
#8  0x0000555555ae9850 in ReplicatedPG::execute_ctx (
   this=this@entry=0x55556ed6d000, ctx=0x55557614a000)
   at osd/ReplicatedPG.cc:2970
#9  0x0000555555aed7c3 in ReplicatedPG::do_op (this=<optimized out>, op=...)
   at osd/ReplicatedPG.cc:2170
#10 0x0000555555aa8be7 in ReplicatedPG::do_request (this=0x55556ed6d000, op=
   std::shared_ptr (count 5, weak 0) 0x555561770400, handle=...)
   at osd/ReplicatedPG.cc:1520
#11 0x00005555559559c5 in OSD::dequeue_op (this=0x5555600d4000, pg=..., 
   op=std::shared_ptr (count 5, weak 0) 0x555561770400, handle=...)
   at osd/OSD.cc:8964
#12 0x0000555555955bed in PGQueueable::RunVis::operator() (
   this=0x7fffd25884f0, op=...) at osd/OSD.cc:167
#13 0x00005555559596b9 in OSD::ShardedOpWQ::_process (this=0x5555600d51e8, 
   thread_index=<optimized out>, hb=<optimized out>)
   at /usr/include/boost/variant/variant.hpp:1017
#14 0x0000555556040727 in ShardedThreadPool::shardedthreadpool_worker (
   this=0x5555600d4768, thread_index=<optimized out>)
   at common/WorkQueue.cc:340
#15 0x0000555556042690 in ShardedThreadPool::WorkThreadSharded::entry (
   this=<optimized out>) at common/WorkQueue.h:687
#16 0x00007ffff67b6dd5 in start_thread (arg=0x7fffd258a700)
   at pthread_create.c:308
#17 0x00007ffff4e3db3d in clone ()
   at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Comment 4 Douglas Fuller 2018-07-10 17:57:05 UTC

This appears to have been triggered by running

radosgw-admin gc list --include-all

on a cluster with a very large number of pending GCs.

Comment 5 Mark Kogan 2018-07-11 08:16:34 UTC

Adding information from the mail thread for future reference.

- - - - -  8<  - - - - - 
Seems like the issue originated higher in the stack at #4 rgw_cls_gc_list ->
#3  gc_list_entries (next_marker="", truncated=0x7fffd25862b8, 
    entries=std::list = {...}, expired_only=<optimized out>, max=800, 
    marker="", hctx=0x7fffd2586688) at cls/rgw/cls_rgw.cc:3301
- - - - -  8<  - - - - - 
Comment from Yehuda - quote:

>> 3267
>> 3268          if (max_entries && (i >= max_entries)) {
>> 3269            if (truncated)
>> 3270              *truncated = true;
>> 3271            --iter;

^^^ This is dangerous. Need to check if (iter != keys.end())
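
A minimal sketch of one way to avoid the unchecked decrement altogether: remember the last key handed to the caller and report that as the next marker when the limit is hit, so the iterator is never stepped backwards. This is not the fix shipped in the erratum. PageResult, make_store and list_page are hypothetical names, a std::map stands in for the object's omap, and the marker is assumed to be exclusive (listing resumes strictly after it).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical result of one listing call.
struct PageResult {
  std::vector<std::string> delivered;  // keys handed to the caller
  std::string next_marker;             // set only when truncated
  bool truncated = false;
};

// Hypothetical helper: n zero-padded keys so that map order matches insertion order.
std::map<std::string, int> make_store(uint32_t n) {
  std::map<std::string, int> store;
  for (uint32_t k = 0; k < n; ++k) {
    std::string digits = std::to_string(k);
    store.emplace("key." + std::string(8 - digits.size(), '0') + digits, 0);
  }
  return store;
}

// Lists up to max_entries keys after `marker`, fetching batch_size keys at a time.
PageResult list_page(const std::map<std::string, int>& store,
                     const std::string& marker,
                     uint32_t max_entries, uint32_t batch_size) {
  assert(batch_size > 0);
  PageResult out;
  std::string last_key = marker;  // last key delivered so far
  // An empty marker sorts before every non-empty key, so this starts at the front.
  auto pos = store.upper_bound(marker);
  while (pos != store.end()) {
    // Copy the next chunk, mimicking a batched fetch.
    std::map<std::string, int> keys;
    for (uint32_t k = 0; k < batch_size && pos != store.end(); ++k, ++pos) {
      keys.insert(*pos);
    }
    for (auto iter = keys.begin(); iter != keys.end(); ++iter) {
      if (max_entries && out.delivered.size() >= max_entries) {
        out.truncated = true;
        out.next_marker = last_key;  // no --iter, so keys.begin() is harmless
        return out;
      }
      out.delivered.push_back(iter->first);
      last_key = iter->first;
    }
  }
  return out;
}
```

With 1000 keys, a limit of 800 and chunks of 32, the first call stops exactly at a chunk boundary and still returns a valid marker; passing that marker back in yields the remaining 200 keys.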

Comment 16 errata-xmlrpc 2018-09-05 19:39:32 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2651