Bug 1393665 - Multisite error handling leads to segfaults
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RGW
Version: 2.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 2.1
Assignee: Matt Benjamin (redhat)
QA Contact: ceph-qe-bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-11-10 06:19 UTC by shilpa
Modified: 2017-07-30 15:50 UTC

Fixed In Version: RHEL: ceph-10.2.3-13.el7cp Ubuntu: ceph_10.2.3-14redhat1
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-22 19:33:30 UTC
Target Upstream Version:


Links
System | ID | Priority | Status | Summary | Last Updated
Ceph Project Bug Tracker | 17568 | None | None | None | 2016-11-10 15:35:58 UTC
Ceph Project Bug Tracker | 17569 | None | None | None | 2016-11-10 06:19:20 UTC
Ceph Project Bug Tracker | 17570 | None | None | None | 2016-11-10 06:20:05 UTC
Ceph Project Bug Tracker | 17571 | None | None | None | 2016-11-10 06:20:43 UTC
Red Hat Product Errata | RHSA-2016:2815 | normal | SHIPPED_LIVE | Moderate: Red Hat Ceph Storage security, bug fix, and enhancement update | 2017-03-22 02:06:33 UTC

Description shilpa 2016-11-10 06:19:21 UTC
Description of problem:
While running an S3tests workload, the rgw process crashes with an assertion failure:

in thread 7f6bf6ffd700 thread_name:radosgw

 ceph version 10.2.3-12.el7cp (120ddb2dc963bbd3fe12b13c19f7a69422e2d039)
 1: (()+0x5709ca) [0x7f6da3b929ca]
 2: (()+0xf100) [0x7f6da2fa1100]
 3: (gsignal()+0x37) [0x7f6da24e25f7]
 4: (abort()+0x148) [0x7f6da24e3ce8]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x267) [0x7f6da3d84e47]
 6: (Mutex::Lock(bool)+0x19c) [0x7f6da3d0e8dc]
 7: (RGWRemoteDataLog::wakeup(int, std::set<std::string, std::less<std::string>, std::allocator<std::string> >&)+0x9f) [0x7f6da39a344f]
 8: (RGWRados::wakeup_data_sync_shards(std::string const&, std::map<int, std::set<std::string, std::less<std::string>, std::allocator<std::string> >, std::less<int>, std::allocator<std::pair<int const, std::set<std::string, std::less<std::string>, std::allocator<std::string> > > > >&)+0x28f) [0x7f6da3a0cabf]
 9: (RGWOp_DATALog_Notify::execute()+0x495) [0x7f6da3ab5ff5]
 10: (process_request(RGWRados*, RGWREST*, RGWRequest*, RGWStreamIO*, OpsLogSocket*)+0xd7f) [0x7f6da39ff7df]
 11: (()+0x192b3) [0x7f6dad4b92b3]
 12: (()+0x2327f) [0x7f6dad4c327f]
 13: (()+0x25298) [0x7f6dad4c5298]
 14: (()+0x7dc5) [0x7f6da2f99dc5]
 15: (clone()+0x6d) [0x7f6da25a3ced]
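
For triage, it can help to split frames like the ones above into index, symbol, offset, and address. The following is a minimal sketch, assuming only the frame layout seen in this report (`N: (symbol+0xOFFSET) [0xADDRESS]`). The names `FRAME_RE` and `parse_frames` are illustrative and are not part of any Ceph tooling.

```python
import re

# Assumed frame layout, taken from the trace in this report:
#   " 6: (Mutex::Lock(bool)+0x19c) [0x7f6da3d0e8dc]"
# The symbol is "()" when the frame could not be resolved to a name.
FRAME_RE = re.compile(
    r"^\s*(?P<index>\d+):\s+"
    r"\((?P<symbol>.*)\+0x(?P<offset>[0-9a-f]+)\)\s+"
    r"\[0x(?P<addr>[0-9a-f]+)\]\s*$"
)


def parse_frames(text):
    """Return one dict per recognizable frame line; other lines are skipped."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line)
        if not m:
            continue
        symbol = m.group("symbol")
        frames.append({
            "index": int(m.group("index")),
            # None marks a frame that has an offset but no symbol name.
            "symbol": None if symbol == "()" else symbol,
            "offset": int(m.group("offset"), 16),
            "addr": int(m.group("addr"), 16),
        })
    return frames


# Three frames copied from the trace above.
SAMPLE = """\
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x267) [0x7f6da3d84e47]
 6: (Mutex::Lock(bool)+0x19c) [0x7f6da3d0e8dc]
 14: (()+0x7dc5) [0x7f6da2f99dc5]
"""

frames = parse_frames(SAMPLE)
for frame in frames:
    name = frame["symbol"] or "<unresolved>"
    print(f'{frame["index"]:>2}  {name}  +{frame["offset"]:#x}')
```

Frames reported as unresolved carry only an offset and an address, so naming them requires the debug symbols that match the installed build.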

The related fixes are upstream:

http://tracker.ceph.com/issues/17569 
http://tracker.ceph.com/issues/17570
http://tracker.ceph.com/issues/17571


Version-Release number of selected component (if applicable):
10.2.3-12

How reproducible:
One out of three times

Steps to Reproduce:
1. Create a multisite configuration with two zones.
2. Run the S3tests workload.
3. At some point during the run, the race condition is hit and the rgw process crashes.

Comment 8 shilpa 2016-11-14 14:35:51 UTC
Tested and verified on ceph-10.2.3-13

Comment 10 errata-xmlrpc 2016-11-22 19:33:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2815.html

