Bug 1355641
Summary: | RGW Segfaults during I/O and sync operations on non-master node | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | shilpa <smanjara> |
Component: | RGW | Assignee: | Casey Bodley <cbodley> |
Status: | CLOSED ERRATA | QA Contact: | shilpa <smanjara> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 2.0 | CC: | cbodley, ceph-eng-bugs, hnallurv, kbader, kdreyer, mbenjamin, owasserm, smanjara, sweil, vumrao |
Target Milestone: | rc | ||
Target Release: | 2.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Fixed In Version: | RHEL: ceph-10.2.2-23.el7cp Ubuntu: ceph_10.2.2-18redhat1xenial | Doc Type: | If docs needed, set a value |
Last Closed: | 2016-08-23 19:44:04 UTC | Type: | Bug |
Description
shilpa
2016-07-12 07:01:24 UTC
So far unable to reproduce this one. The log doesn't have debug information other than the ~30 seconds leading up to the segfault, so it's hard to see what's happening with the long-running RGWMetaSyncShardControlCR that's being woken up here.

I do see a potential issue that could lead to this stack trace. RGWMetaSyncCR holds a reference to its RGWMetaSyncShardControlCRs to guarantee that the coroutines won't be freed before it tries to call wakeup() on them. However, we don't hold references to the RGWCoroutinesStacks associated with the RGWMetaSyncShardControlCRs. So if a coroutine was to finish early, its stack would be freed and a later call to RGWCoroutinesStack::wakeup() would segfault.

I'll prepare and test a patch that holds a reference to the stack instead of the coroutine itself. In the meantime, Shilpa, if you're able to reproduce this with --debug-rgw=20 and --debug-ms=1, I'd love to see the logs.

PR undergoing review (assigned to Yehuda) https://github.com/ceph/ceph/pull/10301

The fix has been cherry-picked to ceph-2-rhel-patches.

Verified in 10.2.2-23

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-1755.html
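
The lifetime problem suspected in the analysis above (a parent that keeps a coroutine alive but not the stack it runs on) can be illustrated with a small standalone sketch. This is not Ceph code: `Stack`, `ShardCR`, `ParentHoldingCR` and `ParentHoldingStack` are hypothetical stand-ins, and `std::shared_ptr`/`std::weak_ptr` are used here only as an assumption to model reference counting. A `weak_ptr` lets the sketch detect the freed stack safely, where the real code would dereference freed memory.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a coroutine stack (not RGWCoroutinesStack).
struct Stack {
    int wakeups = 0;
    void wakeup() { ++wakeups; }
};

// Hypothetical stand-in for a per-shard control coroutine.
// It refers to its stack but does not keep it alive.
struct ShardCR {
    std::weak_ptr<Stack> stack;
};

// Pattern described as risky: the parent keeps only the coroutine alive.
struct ParentHoldingCR {
    std::shared_ptr<ShardCR> cr;
    bool wakeup() {
        auto s = cr->stack.lock();
        if (!s) {
            // Stack already freed. With raw pointers this would be a
            // use-after-free, which is the suspected cause of the segfault.
            return false;
        }
        s->wakeup();
        return true;
    }
};

// Proposed direction of the fix: the parent keeps the stack itself alive.
struct ParentHoldingStack {
    std::shared_ptr<Stack> stack;
    bool wakeup() {
        stack->wakeup();
        return true;
    }
};

// The coroutine finishes early and the last other stack reference is dropped.
bool wakeup_after_early_finish_holding_cr() {
    auto stack = std::make_shared<Stack>();
    auto cr = std::make_shared<ShardCR>();
    cr->stack = stack;
    ParentHoldingCR parent{cr};
    stack.reset();  // models the scheduler releasing the finished stack
    return parent.wakeup();
}

// Same sequence, but the parent's own reference keeps the stack valid.
bool wakeup_after_early_finish_holding_stack() {
    auto stack = std::make_shared<Stack>();
    ParentHoldingStack parent{stack};
    stack.reset();  // scheduler releases its reference; parent still holds one
    return parent.wakeup();
}
```

In this model the first helper reports that the wakeup could not be delivered because the stack is gone, while the second succeeds because the parent's reference extends the stack's lifetime past the coroutine's early completion.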