+++ This bug was initially created as a clone of Bug #1229658 +++

Description of problem:
A statedump request, which traverses the call frames of every call stack in execution, may race with a STACK_RESET on one of those stacks. This could crash the corresponding glusterfs process. For example, we recently observed this in the regression test case tests/basic/afr/sparse-self-heal.t.

Version-Release number of selected component (if applicable):
N/A

How reproducible:
Intermittent

Steps to Reproduce:
1. Maintain constant I/O on a GlusterFS volume.
2. Concurrently, issue a statedump request using kill -SIGUSR1 <process-pid>.

Actual results:
The glusterfs process may crash.

Expected results:
The glusterfs process shouldn't crash, and the statedump must be logged successfully.

Additional info:

--- Additional comment from Anand Avati on 2015-06-09 07:43:20 EDT ---

REVIEW: http://review.gluster.org/11095 (stack: use list_head for managing frames) posted (#5) for review on master by Krishnan Parthasarathi (kparthas)
REVIEW: http://review.gluster.org/11352 (stack: use list_head for managing frames) posted (#1) for review on release-3.7 by Krishnan Parthasarathi (kparthas)
COMMIT: http://review.gluster.org/11352 committed in release-3.7 by Pranith Kumar Karampuri (pkarampu)
------
commit 8ad92bbde3a17ce9aa44e32ae42df5db259fa2ce
Author: Krishnan Parthasarathi <kparthas>
Date:   Fri Jun 5 10:33:11 2015 +0530

    stack: use list_head for managing frames

    PROBLEM
    --------

    statedump requests that traverse call frames of all call stacks in
    execution may race with a STACK_RESET on a stack. This could crash the
    corresponding glusterfs process. For e.g, recently we observed this in a
    regression test case tests/basic/afr/sparse-self-heal.t.

    FIX
    ---

    gf_proc_dump_pending_frames takes a (TRY_LOCK) call_pool->lock before
    iterating through call frames of all call stacks in progress. With this
    fix, STACK_RESET removes its call frames under the same lock.

    Additional info
    ----------------

    This fix makes call_stack_t to use struct list_head in place of custom
    doubly-linked list implementation. This makes call_frame_t manipulation
    easier to maintain in the context of STACK_WIND et al.

    BUG: 1234408
    Change-Id: I7e43bccd3994cd9184ab982dba3dbc10618f0d94
    Signed-off-by: Krishnan Parthasarathi <kparthas>
    Reviewed-on: http://review.gluster.org/11095
    Reviewed-by: Niels de Vos <ndevos>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    Tested-by: NetBSD Build System <jenkins.org>
    (cherry picked from commit 79e4c7b2fad6db15863efb4e979525b1bd4862ea)
    Reviewed-on: http://review.gluster.org/11352
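
For readers unfamiliar with the pattern, the locking scheme described in the commit message above can be sketched roughly as follows. This is an illustration only, not the GlusterFS code: every demo_* and list_* name here is hypothetical, a plain pthread mutex stands in for call_pool->lock, and the dumper merely counts frames instead of printing them. It assumes a default (non-recursive) mutex.

```c
/* Sketch only: all demo_* names are hypothetical, not GlusterFS identifiers. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal circular doubly-linked list in the style of struct list_head. */
struct list_node {
    struct list_node *next;
    struct list_node *prev;
};

static void list_init(struct list_node *head)
{
    head->next = head;
    head->prev = head;
}

static void list_add_tail(struct list_node *node, struct list_node *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

static void list_del(struct list_node *node)
{
    node->prev->next = node->next;
    node->next->prev = node->prev;
    node->next = node->prev = NULL;
}

struct demo_pool {
    pthread_mutex_t lock;    /* shared by the dumper and by reset */
};

struct demo_frame {
    struct list_node link;   /* first member, so a node pointer is a frame pointer */
};

struct demo_stack {
    struct demo_pool *pool;
    struct list_node frames; /* list head for this stack's frames */
};

static void demo_stack_init(struct demo_stack *stack, struct demo_pool *pool)
{
    stack->pool = pool;
    list_init(&stack->frames);
}

/* Append a new frame under the pool lock. Returns 0, or -1 if out of memory. */
static int demo_stack_push(struct demo_stack *stack)
{
    struct demo_frame *frame = calloc(1, sizeof(*frame));
    if (frame == NULL)
        return -1;
    pthread_mutex_lock(&stack->pool->lock);
    list_add_tail(&frame->link, &stack->frames);
    pthread_mutex_unlock(&stack->pool->lock);
    return 0;
}

/* Reset: unlink and free every frame while holding the pool lock, so a
 * concurrent dump can never walk a half-removed frame. */
static void demo_stack_reset(struct demo_stack *stack)
{
    pthread_mutex_lock(&stack->pool->lock);
    while (stack->frames.next != &stack->frames) {
        struct list_node *node = stack->frames.next;
        list_del(node);
        free((struct demo_frame *)node);
    }
    assert(stack->frames.prev == &stack->frames); /* list is empty again */
    pthread_mutex_unlock(&stack->pool->lock);
}

/* Dump: count the frames, but only if the lock is free right now.
 * Returns -1 when the lock is busy, i.e. the dump is skipped. */
static int demo_dump_pending_frames(struct demo_stack *stack)
{
    int count = 0;
    if (pthread_mutex_trylock(&stack->pool->lock) != 0)
        return -1;
    for (struct list_node *n = stack->frames.next; n != &stack->frames; n = n->next)
        count++;
    pthread_mutex_unlock(&stack->pool->lock);
    return count;
}
```

The try-lock means a dump request that arrives while the lock is held gives up instead of blocking, which seems a reasonable trade-off for a diagnostic path triggered by a signal. The sketch frees frames while holding the lock for brevity; a real implementation might detach them under the lock and release them afterwards.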
This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-3.7.3, please open a new bug report.

glusterfs-3.7.3 has been announced on the Gluster mailing lists [1]. Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/12078
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user