Bug 1353972
Summary: | Master zone radosgw process segfaults during I/O and sync operations | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | shilpa <smanjara> |
Component: | RGW | Assignee: | Casey Bodley <cbodley> |
Status: | CLOSED ERRATA | QA Contact: | shilpa <smanjara> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 2.0 | CC: | cbodley, ceph-eng-bugs, ceph-qe-bugs, hnallurv, kbader, kdreyer, mbenjamin, owasserm, sweil, yehuda |
Target Milestone: | rc | ||
Target Release: | 2.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | RHEL: ceph-10.2.2-18.el7cp, Ubuntu: ceph_10.2.2-14redhat1xenial | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2016-08-23 19:43:47 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1354156 | ||
Bug Blocks: |
Description
shilpa
2016-07-08 14:50:01 UTC
Upstream fix pending review: https://github.com/ceph/ceph/pull/10157

While continuing to test after restarting rgw on the two nodes, I noticed that the non-master zone segfaults with a different stack trace a few seconds after the master zone segfaults.

    2016-07-09 08:06:39.522101 7fc10d7e2700 -1 *** Caught signal (Segmentation fault) **
     in thread 7fc10d7e2700 thread_name:radosgw

     ceph version 10.2.2-15.el7cp (60cd52496ca02bdde9c2f4191e617f75166d87b6)
     1: (()+0x54e22a) [0x7fc192d5d22a]
     2: (()+0xf100) [0x7fc19218e100]
     3: (std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&)+0x1b) [0x7fc191d30f3b]
     4: (rgw_bucket::rgw_bucket(rgw_bucket const&)+0x23) [0x7fc192ae60e3]
     5: (RGWRadosRemoveOmapKeysCR::RGWRadosRemoveOmapKeysCR(RGWRados*, rgw_bucket const&, std::string const&, std::set<std::string, std::less<std::string>, std::allocator<std::string> > const&)+0x128) [0x7fc192ae2968]
     6: (RGWDataSyncSingleEntryCR::operate()+0xa96) [0x7fc192b9dbb6]
     7: (RGWCoroutinesStack::operate(RGWCoroutinesEnv*)+0x7e) [0x7fc192ad8a2e]
     8: (RGWCoroutinesManager::run(std::list<RGWCoroutinesStack*, std::allocator<RGWCoroutinesStack*> >&)+0x3f1) [0x7fc192ada9d1]
     9: (RGWCoroutinesManager::run(RGWCoroutine*)+0x70) [0x7fc192adb590]
     10: (RGWRemoteDataLog::run_sync(int, rgw_data_sync_status&)+0x352) [0x7fc192b82912]
     11: (RGWDataSyncProcessorThread::process()+0x49) [0x7fc192c45289]
     12: (RGWRadosThread::Worker::entry()+0x133) [0x7fc192bea083]
     13: (()+0x7dc5) [0x7fc192186dc5]
     14: (clone()+0x6d) [0x7fc191790ced]

Not sure if this is related to the original segfault on master.

(In reply to shilpa from comment #6)
> Not sure if this is related to the original segfault on master.

The fix will address this segfault as well.

Running on 10.2.2-18, I hit this stack trace again, this time on the non-master node during object upload and sync operations.
    2016-07-12 13:07:27.107029 7f3879ffb700 -1 *** Caught signal (Segmentation fault) **
     in thread 7f3879ffb700 thread_name:radosgw

     ceph version 10.2.2-18.el7cp (408019449adec8263014b356737cf326544ea7c6)
     1: (()+0x54e2ba) [0x7f39102ab2ba]
     2: (()+0xf100) [0x7f390f6dc100]
     3: (RGWCoroutinesStack::wakeup()+0xe) [0x7f39100274ce]
     4: (RGWBucketShardIncrementalSyncCR::operate()+0xfed) [0x7f39100d639d]
     5: (RGWCoroutinesStack::operate(RGWCoroutinesEnv*)+0x7e) [0x7f3910026a4e]
     6: (RGWCoroutinesManager::run(std::list<RGWCoroutinesStack*, std::allocator<RGWCoroutinesStack*> >&)+0x3f1) [0x7f39100289f1]
     7: (RGWCoroutinesManager::run(RGWCoroutine*)+0x70) [0x7f39100295b0]
     8: (RGWRemoteDataLog::run_sync(int, rgw_data_sync_status&)+0x352) [0x7f39100d0932]
     9: (RGWDataSyncProcessorThread::process()+0x49) [0x7f3910193319]
     10: (RGWRadosThread::Worker::entry()+0x133) [0x7f3910138113]
     11: (()+0x7dc5) [0x7f390f6d4dc5]
     12: (clone()+0x6d) [0x7f390ecdeced]

I haven't seen this occur since 10.2.2-23. Moving to verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1755.html