Bug 1100218

Summary: [SNAPSHOT] brick should not be started in a child thread when creating snapshot
Product: [Community] GlusterFS
Component: glusterd
Version: 3.5.0
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Reporter: Vijaikumar Mallikarjuna <vmallika>
Assignee: Vijaikumar Mallikarjuna <vmallika>
CC: bugs, gluster-bugs, smohan
Hardware: Unspecified
OS: Unspecified
Fixed In Version: glusterfs-3.6.0beta1
Doc Type: Bug Fix
Clones: 1101961 (view as bug list)
Bug Blocks: 1101961
Last Closed: 2014-11-11 08:33:04 UTC
Type: Bug

Description Vijaikumar Mallikarjuna 2014-05-22 10:02:48 UTC
Description of problem:
When creating a volume snapshot, the back-end operations for each brick,
taking an LVM snapshot and starting the brick, are executed in parallel
using the synctask framework.

brick_start was releasing the big_lock around brick_connect and then
re-acquiring it.

This can deadlock under a race condition: the main thread waits for one
of the synctask threads to finish while that synctask thread waits for
the big_lock.
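
For illustration only, here is a minimal standalone sketch of the hang in
plain pthreads (not the actual glusterd synctask code; big_lock below is
just a mutex standing in for glusterd's big_lock). The main thread joins
a worker while still holding the lock the worker needs, which is the same
ordering the race produces:

    /* deadlock_sketch.c -- hypothetical illustration, not glusterd code.
     * Compile: gcc deadlock_sketch.c -lpthread
     * Running it hangs forever, mirroring the glusterd hang. */
    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Stand-in for the synctask worker that ends up in brick_start(). */
    static void *worker(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&big_lock);    /* blocks: main still holds it */
        printf("worker: got big_lock\n"); /* never reached */
        pthread_mutex_unlock(&big_lock);
        return NULL;
    }

    int main(void)
    {
        pthread_t t;

        pthread_mutex_lock(&big_lock);    /* main thread takes big_lock */
        pthread_create(&t, NULL, worker, NULL);
        pthread_join(t, NULL);            /* waits for worker: deadlock */
        pthread_mutex_unlock(&big_lock);
        return 0;
    }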


Version-Release number of selected component (if applicable):
3.5.0

How reproducible:
Not always

Steps to Reproduce:
1. Run the test case 'tests/bugs/bug-1090042.t' in a loop:
       for i in {1..100}; do ./tests/bugs/bug-1090042.t; done

Actual results:
glusterd hangs

Expected results:
glusterd should not hang

Additional info:

Comment 1 Anand Avati 2014-05-22 10:04:03 UTC
REVIEW: http://review.gluster.org/7842 (glusterd/snapshot: brick_start shouldn't be done from child thread) posted (#4) for review on master by Vijaikumar Mallikarjuna (vmallika)

Comment 2 Anand Avati 2014-05-22 12:21:06 UTC
COMMIT: http://review.gluster.org/7842 committed in master by Krishnan Parthasarathi (kparthas) 
------
commit 15f698833de54793880505a1f8e549b956eca137
Author: Vijaikumar M <vmallika>
Date:   Thu May 22 11:58:06 2014 +0530

    glusterd/snapshot: brick_start shouldn't be done from child thread
    
    When creating a volume snapshot, the back-end operations for each
    brick, taking an LVM snapshot and starting the brick, are executed
    in parallel using the synctask framework.
    
    brick_start was releasing the big_lock around brick_connect and then
    re-acquiring it.
    This can deadlock under a race condition: the main thread waits for
    one of the synctask threads to finish while that synctask thread
    waits for the big_lock.
    
    The solution is to not call brick_start from a synctask.
    
    Change-Id: Iaaf0be3070fb71e63c2de8fc2938d2b77d40057d
    BUG: 1100218
    Signed-off-by: Vijaikumar M <vmallika>
    Reviewed-on: http://review.gluster.org/7842
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Atin Mukherjee <amukherj>
    Reviewed-by: Krishnan Parthasarathi <kparthas>
    Tested-by: Krishnan Parthasarathi <kparthas>
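
The fix in outline, as a hypothetical standalone sketch (take_lvm_snapshot() and start_brick() below are stand-ins, not the real glusterd functions): the per-brick LVM snapshots still run in parallel workers, but every brick is started from the main thread only after all workers have been joined, so brick_start's big_lock release and reacquire can no longer race the join.

    /* fix_sketch.c -- hypothetical outline of the fix, not the actual patch. */
    #include <pthread.h>

    #define NBRICKS 4

    static void take_lvm_snapshot(int brick) { (void)brick; /* parallel-safe work */ }
    static void start_brick(int brick)       { (void)brick; /* lock-sensitive work */ }

    /* The worker now does only the part that is safe to run in parallel. */
    static void *snapshot_worker(void *arg)
    {
        take_lvm_snapshot(*(int *)arg);
        return NULL;
    }

    int main(void)
    {
        pthread_t tasks[NBRICKS];
        int bricks[NBRICKS];
        int i;

        for (i = 0; i < NBRICKS; i++) {   /* parallel phase: snapshots */
            bricks[i] = i;
            pthread_create(&tasks[i], NULL, snapshot_worker, &bricks[i]);
        }
        for (i = 0; i < NBRICKS; i++)
            pthread_join(tasks[i], NULL);

        for (i = 0; i < NBRICKS; i++)     /* serial phase: brick starts */
            start_brick(i);               /* runs in the main thread */
        return 0;
    }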

Comment 3 Niels de Vos 2014-09-22 12:40:52 UTC
A beta release for GlusterFS 3.6.0 has been released [1]. Please verify whether this release solves the bug reported here. If the glusterfs-3.6.0beta1 release does not resolve this issue, leave a comment on this bug and move the status to ASSIGNED. If this release fixes the problem for you, leave a note and change the status to VERIFIED.

Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure (possibly an "updates-testing" repository) for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-September/018836.html
[2] http://supercolony.gluster.org/pipermail/gluster-users/

Comment 4 Niels de Vos 2014-11-11 08:33:04 UTC
This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-3.6.1, please reopen this bug report.

glusterfs-3.6.1 has been announced [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-November/019410.html
[2] http://supercolony.gluster.org/mailman/listinfo/gluster-users