Bug 1235571
Summary: | snapd crashed due to stack overflow | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Anil Shah <ashah>
Component: | core | Assignee: | krishnan parthasarathi <kparthas>
Status: | CLOSED ERRATA | QA Contact: | Anil Shah <ashah>
Severity: | urgent | Docs Contact: |
Priority: | unspecified | |
Version: | rhgs-3.1 | CC: | amukherj, asrivast, kparthas, nsathyan, rgowdapp, rhs-bugs, skoduri, storage-qa-internal, vagarwal
Target Milestone: | --- | Keywords: | ZStream
Target Release: | RHGS 3.1.1 | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | glusterfs-3.7.1-12 | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | |
Cloned To: | 1235582 (view as bug list) | Environment: |
Last Closed: | 2015-10-05 07:14:58 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1216951, 1235582, 1251815, 1253212 | |
Description
Anil Shah
2015-06-25 08:30:59 UTC
I think once http://review.gluster.org/10824 is backported to downstream, this issue won't be seen.

RCA
----
The stack overflow was seen when older snapshots were being deleted while new ones were being created concurrently. In the setup detailed above, the snapshot scheduler creates snapshots periodically and auto-delete of snapshots is enabled. When the number of snapshots of the volume exceeds the configured soft-limit, snapshots are auto-deleted. The crash happened when a scheduled snapshot-create coincided with the auto-delete-triggered snapshot-delete operation.

Implementation detail
----------------------
The snapshot daemon uses the gfapi interface to serve user-serviceable snapshots. The gfapi interface creates a new glfs object for every snapshot (volume) serviced. This object is 'linked' with a global xlator object until the glfs object is fully initialized (i.e., until the set-volume operation is complete). The global xlator object's ctx (glusterfs_ctx_t) is modified in a thread-unsafe manner and could end up referring to a destroyed ctx (one belonging to the glfs object of a snapshot that has already been deleted).

Fix outline
------------
All initialisation management operations (e.g., RPCs like DUMP_VERSION, SET_VOLUME, etc.) must refer to the corresponding translator objects in the glfs object's own graph.

Atin, the patch you refer to in comment #2 doesn't fix the issue. Please refer to the RCA in comment #3.

Doc text is edited. Please sign off so that it can be included in Known Issues.

Upstream patch on master branch: http://review.gluster.com/11436

Bug verified on build glusterfs-3.7.1-15.el7rhgs.x86_64.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1845.html
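As a rough illustration of the pattern described in the RCA and fix outline above, here is a minimal, self-contained C sketch. It is not GlusterFS code: every name in it (fake_ctx, fake_glfs, global_active_ctx, the handshake functions) is invented for this example. It models two concurrent snapshot lifecycles whose management handshake goes through one shared global slot that is reassigned without synchronisation, versus resolving state through each per-snapshot object, which is the direction the fix outline describes.

```c
/*
 * Illustrative sketch only -- NOT GlusterFS source.
 * Models the RCA: per-snapshot objects are initialised and torn down
 * concurrently, but the "unsafe" handshake reads a process-wide slot
 * that any other in-flight initialisation may have overwritten (or
 * whose target may already have been freed by a snapshot-delete).
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

struct fake_ctx {
    char volname[64];               /* which snapshot volume this ctx serves */
};

/* UNSAFE: one shared slot, written by every in-flight initialisation. */
static struct fake_ctx *global_active_ctx;

/* SAFER (the fix outline): each per-snapshot object keeps its own ctx, and
 * management operations (DUMP_VERSION, SET_VOLUME, ...) resolve state
 * through this object, never through the global slot. */
struct fake_glfs {
    struct fake_ctx *ctx;
    pthread_mutex_t  lock;
};

static void unsafe_handshake(struct fake_ctx *ctx)
{
    global_active_ctx = ctx;        /* racy write to the shared slot */
    usleep(1000);                   /* window for another thread to interfere */
    /* May read another snapshot's ctx, or one that was already freed. */
    printf("handshake (unsafe) for %s\n", global_active_ctx->volname);
}

static void safe_handshake(struct fake_glfs *fs)
{
    pthread_mutex_lock(&fs->lock);
    /* Always resolves state via its own object. */
    printf("handshake (safe) for %s\n", fs->ctx->volname);
    pthread_mutex_unlock(&fs->lock);
}

static void *snapshot_worker(void *arg)
{
    const char *name = arg;

    struct fake_glfs fs;
    fs.ctx = calloc(1, sizeof(*fs.ctx));
    snprintf(fs.ctx->volname, sizeof(fs.ctx->volname), "%s", name);
    pthread_mutex_init(&fs.lock, NULL);

    unsafe_handshake(fs.ctx);       /* demonstrates the race */
    safe_handshake(&fs);            /* demonstrates the per-object approach */

    pthread_mutex_destroy(&fs.lock);
    free(fs.ctx);                   /* snapshot deleted: its ctx is destroyed */
    return NULL;
}

int main(void)
{
    /* Two overlapping snapshot lifecycles, as in the scheduler + auto-delete
     * scenario from the RCA. */
    pthread_t t1, t2;
    pthread_create(&t1, NULL, snapshot_worker, "snap-old");
    pthread_create(&t2, NULL, snapshot_worker, "snap-new");
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```

Compile with `gcc -pthread sketch.c`; building with `-fsanitize=thread` reports the conflicting accesses on global_active_ctx, which is the analogue of the thread-unsafe update of the global xlator's ctx described in the RCA.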