Description of problem:
During snapshot creation, releasing the big lock before the operation completes can cause problems such as deadlock or memory corruption. Bricks are started as part of the snapshot create operation. brick_start releases the big_lock while doing brick_connect, and this might crash glusterd. There is a similar issue in bug# 1088355.

Currently, rpc_connect calls the notification function on failure in the same thread. The glusterd notification function acquires the big_lock, and hence the big_lock is released before rpc_connect is called.

Version-Release number of selected component (if applicable): 3.5.0

How reproducible:

Solution: Let the event handler handle the failure rather than handling it in rpc_connect.
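To illustrate why the lock ends up being released around the connect call, here is a minimal Python sketch. It is not glusterd code: `big_lock`, `notify_disconnect`, `rpc_connect_sync`, and both `brick_start_*` functions are hypothetical stand-ins, and the sketch assumes a non-reentrant lock and a connect that always fails. A non-blocking acquire is used so that the deadlock case is reported instead of hanging.

```python
import threading

# Hypothetical stand-in for glusterd's big lock (non-reentrant).
big_lock = threading.Lock()


def notify_disconnect(events):
    """Failure notification that needs the big lock, like glusterd's handler."""
    # Non-blocking acquire so the sketch reports a deadlock instead of hanging.
    if not big_lock.acquire(blocking=False):
        events.append("would deadlock")
        return
    try:
        events.append("handled")
    finally:
        big_lock.release()


def rpc_connect_sync(events):
    """Models the old behaviour: a failed connect notifies in the caller's thread."""
    notify_disconnect(events)


def brick_start_holding_lock():
    """Caller keeps the big lock across connect: the inline notification cannot take it."""
    events = []
    with big_lock:
        rpc_connect_sync(events)
    return events


def brick_start_releasing_lock():
    """Caller drops the big lock around connect: no deadlock, but the lock is
    free for other threads while the operation is still in progress."""
    events = []
    with big_lock:
        big_lock.release()      # unprotected window opens here
        rpc_connect_sync(events)
        big_lock.acquire()      # window closes; shared state may have changed
    return events
```

The second variant avoids the deadlock, but only by opening a window in which another thread could take the lock and modify shared state partway through the operation, which is the risk this report describes.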
Patch http://review.gluster.org/#/c/7843 posted
REVIEW: http://review.gluster.org/7843 (glusterd: Handle rpc_connect failure in the event handler) posted (#12) for review on master by Vijaikumar Mallikarjuna (vmallika)
REVIEW: http://review.gluster.org/7843 (glusterd: Handle rpc_connect failure in the event handler) posted (#13) for review on master by Vijaikumar Mallikarjuna (vmallika)
REVIEW: http://review.gluster.org/7843 (glusterd: Handle rpc_connect failure in the event handler) posted (#14) for review on master by Vijaikumar Mallikarjuna (vmallika)
REVIEW: http://review.gluster.org/7843 (glusterd: Handle rpc_connect failure in the event handler) posted (#15) for review on master by Vijaikumar Mallikarjuna (vmallika)
REVIEW: http://review.gluster.org/7843 (glusterd: Handle rpc_connect failure in the event handler) posted (#16) for review on master by Vijaikumar Mallikarjuna (vmallika)
REVIEW: http://review.gluster.org/7843 (glusterd: Handle rpc_connect failure in the event handler) posted (#17) for review on master by Vijaikumar Mallikarjuna (vmallika)
REVIEW: http://review.gluster.org/7843 (glusterd: Handle rpc_connect failure in the event handler) posted (#18) for review on master by Vijaikumar Mallikarjuna (vmallika)
REVIEW: http://review.gluster.org/7843 (glusterd: Handle rpc_connect failure in the event handler) posted (#19) for review on master by Vijaikumar Mallikarjuna (vmallika)
REVIEW: http://review.gluster.org/7843 (glusterd: Handle rpc_connect failure in the event handler) posted (#20) for review on master by Vijaikumar Mallikarjuna (vmallika)
COMMIT: http://review.gluster.org/7843 committed in master by Raghavendra G (rgowdapp)
------
commit 42b956971c47fd0708cbbd17ce8c78c2ed79bfba
Author: Vijaikumar M <vmallika>
Date: Fri May 23 14:42:08 2014 +0530

glusterd: Handle rpc_connect failure in the event handler

Currently rpc_connect calls the notification function on failure in the same thread, glusterd notification holds the big_lock and hence big_lock is released before rpc_connect

In snapshot creation, releasing the big-lock before completing operation can cause problem like deadlock or memory corruption.

Bricks are started as part of snapshot created operation. brick_start releases the big_lock when doing brick_connect and this might cause glusterd crash. There is a similar issue in bug# 1088355.

Solution is let the event handler handle the failure than doing it in the rpc_connect.

Change-Id: I088d44092ce845a07516c1d67abd02b220e08b38
BUG: 1101507
Signed-off-by: Vijaikumar M <vmallika>
Reviewed-on: http://review.gluster.org/7843
Reviewed-by: Krishnan Parthasarathi <kparthas>
Reviewed-by: Jeff Darcy <jdarcy>
Tested-by: Gluster Build System <jenkins.com>
Reviewed-by: Raghavendra G <rgowdapp>
Tested-by: Raghavendra G <rgowdapp>
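The approach described in the commit message can be sketched roughly as follows. This is an illustration in Python, not the actual patch: `rpc_connect_deferred`, `event_handler`, `brick_start`, and the queue are hypothetical names, and a plain queue plus one worker thread stands in for the real event-handling machinery. The point is only that a failed connect records an event instead of calling the handler inline, so the caller never needs to release the lock.

```python
import queue
import threading


def rpc_connect_deferred(event_queue):
    """Connect fails: queue the failure for the event handler instead of
    invoking the notification in the caller's thread."""
    event_queue.put("DISCONNECT")


def event_handler(big_lock, event_queue, log):
    """Runs in its own thread and takes the big lock only when it handles an event."""
    event = event_queue.get()
    with big_lock:
        log.append(("handler", event))


def brick_start(big_lock, event_queue, log):
    """Holds the big lock for the whole operation; no release around connect."""
    with big_lock:
        rpc_connect_deferred(event_queue)
        log.append(("brick_start", "done"))


def run():
    big_lock = threading.Lock()   # hypothetical stand-in for the big lock
    event_queue = queue.Queue()
    log = []
    worker = threading.Thread(target=event_handler,
                              args=(big_lock, event_queue, log))
    worker.start()
    brick_start(big_lock, event_queue, log)
    worker.join()
    return log
```

Because the event is queued while `brick_start` still holds the lock, the handler can only run after the operation has finished and the lock has been released, so `brick_start` always completes first in this sketch.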
A beta release for GlusterFS 3.6.0 has been released [1]. Please verify whether the release solves this bug report for you. If the glusterfs-3.6.0beta1 release does not resolve this issue, leave a comment in this bug and move the status to ASSIGNED. If this release fixes the problem for you, leave a note and change the status to VERIFIED.

Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure (possibly an "updates-testing" repository) for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-September/018836.html
[2] http://supercolony.gluster.org/pipermail/gluster-users/
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.6.1, please reopen this bug report.

glusterfs-3.6.1 has been announced [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-November/019410.html
[2] http://supercolony.gluster.org/mailman/listinfo/gluster-users