Bug 1235582
| Summary: | snapd crashed due to stack overflow | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | krishnan parthasarathi <kparthas> |
| Component: | protocol | Assignee: | krishnan parthasarathi <kparthas> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | mainline | CC: | bugs, gluster-bugs, nsathyan, rgowdapp |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-3.8rc2 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1235571 | | |
| | 1253212 (view as bug list) | Environment: | |
| Last Closed: | 2016-06-16 13:16:31 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1235571 | | |
| Bug Blocks: | 1253212 | | |
Description (krishnan parthasarathi, 2015-06-25 09:01:18 UTC)
REVIEW: http://review.gluster.org/11399 (core: check for xl->mem_acct only if memory accounting is enabled) posted (#2) for review on master by Krishnan Parthasarathi (kparthas)

RCA
----
The stack overflow was seen when older snapshots were being deleted while new ones were being created concurrently. In the reported setup, the snapshot scheduler creates snapshots periodically and auto-delete of snapshots is enabled. When the number of snapshots of the volume exceeds the configured soft-limit, snapshots are auto-deleted. The crash happened when a scheduled snapshot-create coincided with an auto-delete-triggered snapshot-delete operation.

Implementation detail
----------------------
The snapshot daemon uses the gfapi interface to serve user-serviceable snapshots. The gfapi interface creates a new glfs object for every snapshot (volume) it services. This object is 'linked' with a global xlator object until the glfs object is fully initialized (i.e., the set-volume operation is complete). The global xlator object's ctx (glusterfs_ctx_t) is modified in a thread-unsafe manner and could end up referring to a destroyed ctx (one belonging to the glfs of a deleted snapshot).

Fix outline
------------
All initialisation management operations (e.g., RPCs like DUMP_VERSION, SET_VOLUME, etc.) must refer to the corresponding translator objects in the glfs' graph.

REVIEW: http://review.gluster.org/11436 (client: set THIS to client's xl in non-FOP RPCs) posted (#1) for review on master by Krishnan Parthasarathi (kparthas)

REVIEW: http://review.gluster.org/11436 (rpc: add owner xlator argument to rpc_clnt_new) posted (#2) for review on master by Krishnan Parthasarathi (kparthas)

REVIEW: http://review.gluster.org/11436 (rpc: add owner xlator argument to rpc_clnt_new) posted (#3) for review on master by Krishnan Parthasarathi (kparthas)

COMMIT: http://review.gluster.org/11436 committed in master by Raghavendra G (rgowdapp)
------
commit f7668938cd7745d024f3d2884e04cd744d0a69ab
Author: Krishnan Parthasarathi <kparthas>
Date: Sat Jun 27 11:04:25 2015 +0530

rpc: add owner xlator argument to rpc_clnt_new

The @owner argument tells the RPC layer which xlator owns the connection and to which xlator THIS needs to be set during network notifications such as CONNECT and DISCONNECT.

Code paths that originate from the head of a (volume) graph and use STACK_WIND ensure that the RPC local endpoint has the right xlator saved in the frame of the call (callback pair). This guarantees that the callback is executed in the right xlator context.

The client handshake process, which includes fetching brick ports from glusterd and setting the lk-version on the brick for the session, does not have the correct xlator set in its frames. The problem lies with RPC notifications: they have no provision to set THIS to the xlator that is registered with the corresponding RPC programs. For example, the RPC_CLNT_CONNECT event received by protocol/client does not have THIS set to its xlator, which implies that calls (and callbacks) originating from that thread do not have the right xlator set either.

The fix is to save the xlator registered with the RPC connection during rpc_clnt_new. For example, protocol/client's xlator is saved with the RPC connection that it 'owns'. RPC notifications such as CONNECT, DISCONNECT, etc. then inherit THIS from the RPC connection's xlator.
Change-Id: I9dea2c35378c511d800ef58f7fa2ea5552f2c409
BUG: 1235582
Signed-off-by: Krishnan Parthasarathi <kparthas>
Reviewed-on: http://review.gluster.org/11436
Tested-by: Gluster Build System <jenkins.com>
Tested-by: NetBSD Build System <jenkins.org>
Reviewed-by: Raghavendra G <rgowdapp>

The fix for this BZ is already present in a GlusterFS release. A clone of this BZ has been fixed in a GlusterFS release and closed, hence this mainline BZ is being closed as well.

This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
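For readers unfamiliar with the xlator/THIS machinery, the sketch below models the ownership pattern the commit message describes. It is not the actual GlusterFS code: the names fake_xlator, fake_rpc_clnt, rpc_clnt_new_model, notify_model and client_notify are made up for illustration. It only shows the idea that, once the connection remembers its owning xlator, CONNECT/DISCONNECT style notifications can run with the per-thread context switched to that owner instead of a shared global xlator whose ctx may already have been destroyed.

```c
/*
 * Simplified, hypothetical model of the "owner xlator" pattern.
 * NOT the real GlusterFS code; all type and function names are made up.
 */
#include <stdio.h>

/* Stand-in for xlator_t: a translator with a name and a per-graph ctx. */
typedef struct fake_xlator {
        const char *name;
        void       *ctx;          /* per-graph context; must not be shared */
} fake_xlator;

/* Stand-in for the per-thread THIS pointer (a macro in GlusterFS). */
static __thread fake_xlator *this_xl;

typedef enum { EV_CONNECT, EV_DISCONNECT } rpc_event;

/* Stand-in for the RPC client: the connection remembers its owner. */
typedef struct fake_rpc_clnt {
        fake_xlator *owner;       /* xlator that created this connection */
        void (*notify)(fake_xlator *xl, rpc_event ev);
} fake_rpc_clnt;

/* Analogue of the patched rpc_clnt_new(): the caller passes itself in. */
static fake_rpc_clnt
rpc_clnt_new_model(fake_xlator *owner,
                   void (*notify)(fake_xlator *, rpc_event))
{
        fake_rpc_clnt rpc = { .owner = owner, .notify = notify };
        return rpc;
}

/*
 * Analogue of the RPC notification path: before the CONNECT/DISCONNECT
 * callback runs, the per-thread context is switched to the connection's
 * owner, so anything the callback originates runs in the right xlator.
 */
static void
notify_model(fake_rpc_clnt *rpc, rpc_event ev)
{
        fake_xlator *saved = this_xl;

        this_xl = rpc->owner;     /* inherit THIS from the owner */
        rpc->notify(this_xl, ev);
        this_xl = saved;          /* restore the previous context */
}

static void
client_notify(fake_xlator *xl, rpc_event ev)
{
        printf("%s: %s\n", xl->name,
               ev == EV_CONNECT ? "CONNECT" : "DISCONNECT");
}

int
main(void)
{
        /* Two independent graphs, e.g. two snapshot volumes in snapd. */
        fake_xlator snap_a = { .name = "snap-a-client", .ctx = &snap_a };
        fake_xlator snap_b = { .name = "snap-b-client", .ctx = &snap_b };

        fake_rpc_clnt rpc_a = rpc_clnt_new_model(&snap_a, client_notify);
        fake_rpc_clnt rpc_b = rpc_clnt_new_model(&snap_b, client_notify);

        /* Each notification runs with the context of its own xlator,
         * never a shared global one. */
        notify_model(&rpc_a, EV_CONNECT);
        notify_model(&rpc_b, EV_CONNECT);
        notify_model(&rpc_a, EV_DISCONNECT);
        return 0;
}
```

In the actual patch, per the commit message above, protocol/client passes its own xlator to rpc_clnt_new as the @owner argument, and the RPC notification path sets THIS from that owner before delivering events such as RPC_CLNT_CONNECT.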