Description of problem:
When creating a volume snapshot, the back-end operation of 'taking an lvm_snapshot and starting the brick' is executed in parallel for each brick using the synctask framework. brick_start releases the big_lock around brick_connect and then re-acquires it. Under a race condition this causes a deadlock: the main thread waits for one of the synctask threads to finish while that synctask thread waits for the big_lock.

Version-Release number of selected component (if applicable):
3.5.0

How reproducible:
Not always

Steps to Reproduce:
1. Run the test case 'tests/bugs/bug-1090042.t' in a loop: for i in {1..100}; do ./tests/bugs/bug-1090042.t ; done

Actual results:
glusterd hangs

Expected results:
glusterd should not hang

Additional info:
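The following is a minimal, hypothetical sketch of the deadlock pattern described above, using plain pthreads instead of glusterd's synctask framework; the names (big_lock, worker_brick_op) are illustrative only and are not the actual glusterd symbols. The main thread waits for the worker while holding the lock, and the worker cannot finish without the lock, so the program hangs exactly the way glusterd does.

```c
/* Sketch only: pthreads stand in for the synctask framework. */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for brick_start()/brick_connect(): the worker needs big_lock,
 * but the main thread already holds it and will not release it until
 * this worker exits. */
static void *
worker_brick_op (void *arg)
{
        pthread_mutex_lock (&big_lock);   /* blocks forever */
        printf ("worker: got big_lock\n");
        pthread_mutex_unlock (&big_lock);
        return NULL;
}

int
main (void)
{
        pthread_t worker;

        pthread_mutex_lock (&big_lock);                      /* main holds big_lock */
        pthread_create (&worker, NULL, worker_brick_op, NULL);
        pthread_join (worker, NULL);                         /* waits while locked -> deadlock */
        pthread_mutex_unlock (&big_lock);                    /* never reached */
        return 0;
}
```

Running this sketch hangs on pthread_join, mirroring the hang seen when the test case is run in a loop.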
REVIEW: http://review.gluster.org/7842 (glusterd/snapshot: brick_start shouldn't be done from child thread) posted (#4) for review on master by Vijaikumar Mallikarjuna (vmallika)
COMMIT: http://review.gluster.org/7842 committed in master by Krishnan Parthasarathi (kparthas)
------
commit 15f698833de54793880505a1f8e549b956eca137
Author: Vijaikumar M <vmallika>
Date: Thu May 22 11:58:06 2014 +0530

glusterd/snapshot: brick_start shouldn't be done from child thread

When creating a volume snapshot, the back-end operation of 'taking an lvm_snapshot and starting the brick' is executed in parallel for each brick using the synctask framework. brick_start releases the big_lock around brick_connect and then re-acquires it. Under a race condition this causes a deadlock: the main thread waits for one of the synctask threads to finish while the synctask thread waits for the big_lock.

Solution: do not start the brick from the synctask.

Change-Id: Iaaf0be3070fb71e63c2de8fc2938d2b77d40057d
BUG: 1100218
Signed-off-by: Vijaikumar M <vmallika>
Reviewed-on: http://review.gluster.org/7842
Tested-by: Gluster Build System <jenkins.com>
Reviewed-by: Atin Mukherjee <amukherj>
Reviewed-by: Krishnan Parthasarathi <kparthas>
Tested-by: Krishnan Parthasarathi <kparthas>
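The structure of the fix can be sketched the same way, again with pthreads rather than the synctask framework and with illustrative names only (take_lvm_snapshot, brick_start, NUM_BRICKS are placeholders, not the real glusterd code). Only the lock-neutral per-brick work runs in the parallel workers; brick_start runs on the main thread after every worker has been joined, so no child thread ever needs the big_lock while the main thread is waiting.

```c
/* Sketch only: the corrected ordering, under the assumptions above. */
#include <pthread.h>
#include <stdio.h>

#define NUM_BRICKS 3

static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* Lock-free per-brick work done in parallel (placeholder for the
 * LVM snapshot step). */
static void *
take_lvm_snapshot (void *arg)
{
        printf ("worker %ld: snapshot taken\n", (long) arg);
        return NULL;
}

/* Runs only on the main thread, which already owns big_lock, so there
 * is no drop-and-reacquire race. */
static void
brick_start (long brick)
{
        printf ("main: started brick %ld\n", brick);
}

int
main (void)
{
        pthread_t workers[NUM_BRICKS];
        long      i;

        pthread_mutex_lock (&big_lock);

        for (i = 0; i < NUM_BRICKS; i++)
                pthread_create (&workers[i], NULL, take_lvm_snapshot, (void *) i);
        for (i = 0; i < NUM_BRICKS; i++)
                pthread_join (workers[i], NULL);

        for (i = 0; i < NUM_BRICKS; i++)   /* brick_start only after all joins */
                brick_start (i);

        pthread_mutex_unlock (&big_lock);
        return 0;
}
```

The design point is simply that the thread which holds the big_lock is the same thread that performs brick_start, so the lock is never contended between a waiting parent and a blocked child.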
A beta release for GlusterFS 3.6.0 has been made available [1]. Please verify whether this release resolves the issue reported here. If the glusterfs-3.6.0beta1 release does not resolve this issue, leave a comment in this bug and move the status to ASSIGNED. If this release fixes the problem for you, leave a note and change the status to VERIFIED. Packages for several distributions should become available in the near future; keep an eye on the Gluster Users mailing list [2] and the update infrastructure (possibly an "updates-testing" repository) for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-September/018836.html
[2] http://supercolony.gluster.org/pipermail/gluster-users/
This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-3.6.1, please reopen this bug report. glusterfs-3.6.1 has been announced [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-November/019410.html
[2] http://supercolony.gluster.org/mailman/listinfo/gluster-users