Description of problem:
While running the S3tests workload, the rgw process crashes:

 in thread 7f6bf6ffd700 thread_name:radosgw

 ceph version 10.2.3-12.el7cp (120ddb2dc963bbd3fe12b13c19f7a69422e2d039)
 1: (()+0x5709ca) [0x7f6da3b929ca]
 2: (()+0xf100) [0x7f6da2fa1100]
 3: (gsignal()+0x37) [0x7f6da24e25f7]
 4: (abort()+0x148) [0x7f6da24e3ce8]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x267) [0x7f6da3d84e47]
 6: (Mutex::Lock(bool)+0x19c) [0x7f6da3d0e8dc]
 7: (RGWRemoteDataLog::wakeup(int, std::set<std::string, std::less<std::string>, std::allocator<std::string> >&)+0x9f) [0x7f6da39a344f]
 8: (RGWRados::wakeup_data_sync_shards(std::string const&, std::map<int, std::set<std::string, std::less<std::string>, std::allocator<std::string> >, std::less<int>, std::allocator<std::pair<int const, std::set<std::string, std::less<std::string>, std::allocator<std::string> > > > >&)+0x28f) [0x7f6da3a0cabf]
 9: (RGWOp_DATALog_Notify::execute()+0x495) [0x7f6da3ab5ff5]
 10: (process_request(RGWRados*, RGWREST*, RGWRequest*, RGWStreamIO*, OpsLogSocket*)+0xd7f) [0x7f6da39ff7df]
 11: (()+0x192b3) [0x7f6dad4b92b3]
 12: (()+0x2327f) [0x7f6dad4c327f]
 13: (()+0x25298) [0x7f6dad4c5298]
 14: (()+0x7dc5) [0x7f6da2f99dc5]
 15: (clone()+0x6d) [0x7f6da25a3ced]

The related fixes are upstream:
http://tracker.ceph.com/issues/17569
http://tracker.ceph.com/issues/17570
http://tracker.ceph.com/issues/17571

Version-Release number of selected component (if applicable):
10.2.3-12

How reproducible:
One out of three times

Steps to Reproduce:
1. Create a multisite configuration with two zones
2. Run the S3tests workload
3. At some point the race condition is hit
Tested and verified on ceph-10.2.3-13
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2016-2815.html