Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1336381

Summary: ENOTCONN error during parallel rmdir
Product: [Community] GlusterFS
Reporter: Ravishankar N <ravishankar>
Component: replicate
Assignee: Ravishankar N <ravishankar>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: mainline
CC: bugs
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.9.0
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1338051 1339446 (view as bug list)
Environment:
Last Closed: 2017-03-27 18:11:33 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1338051, 1339446

Description Ravishankar N 2016-05-16 10:18:39 UTC
Description of problem:

Reported by Sakshi Bansal sabansal

Parallel rmdir operations from multiple clients result in the application receiving "Transport endpoint is not connected" (ENOTCONN) errors even though there were no network disconnects.


Steps to Reproduce:
1. Create a 1x2 replica volume and fuse-mount it from 2 clients.
2. Run the script below from both clients simultaneously, with the same argument (or none), so that both clients operate on the same directory tree:

-------------------------
#!/bin/bash

# Directory containing this script (informational only).
dir=$(dirname $(readlink -f $0))
echo 'Script in '$dir

# Loop forever, creating and removing the same directory tree.
# When both clients use the same $1, their rm -rf calls race
# against each other's mkdir/rmdir on the shared foo tree.
while :
do
        mkdir -p foo$1/bar/gee
        mkdir -p foo$1/bar/gne
        mkdir -p foo$1/lna/gme
        rm -rf foo$1
done
-------------------------

Comment 1 Vijay Bellur 2016-05-16 10:19:36 UTC
REVIEW: http://review.gluster.org/14358 (cluster/afr: Return correct op_errno in pre-op) posted (#1) for review on master by Ravishankar N (ravishankar)

Comment 2 Vijay Bellur 2016-05-18 09:27:03 UTC
REVIEW: http://review.gluster.org/14358 (cluster/afr: Check for required number of entrylks) posted (#2) for review on master by Ravishankar N (ravishankar)

Comment 3 Vijay Bellur 2016-05-20 11:28:18 UTC
REVIEW: http://review.gluster.org/14358 (cluster/afr: Check for required number of entrylks) posted (#3) for review on master by Ravishankar N (ravishankar)

Comment 4 Vijay Bellur 2016-05-20 15:19:42 UTC
REVIEW: http://review.gluster.org/14358 (cluster/afr: Check for required number of entrylks) posted (#4) for review on master by Ravishankar N (ravishankar)

Comment 5 Vijay Bellur 2016-05-23 06:08:37 UTC
REVIEW: http://review.gluster.org/14358 (cluster/afr: Check for required number of entrylks) posted (#6) for review on master by Ravishankar N (ravishankar)

Comment 6 Vijay Bellur 2016-05-23 10:04:56 UTC
REVIEW: http://review.gluster.org/14358 (cluster/afr: Check for required number of entrylks) posted (#7) for review on master by Ravishankar N (ravishankar)

Comment 7 Vijay Bellur 2016-05-24 05:28:24 UTC
REVIEW: http://review.gluster.org/14358 (cluster/afr: Check for required number of entrylks) posted (#8) for review on master by Ravishankar N (ravishankar)

Comment 8 Vijay Bellur 2016-05-24 08:23:49 UTC
COMMIT: http://review.gluster.org/14358 committed in master by Pranith Kumar Karampuri (pkarampu) 
------
commit 86a87a2ec0984f450b36ae6414c2d6d66870af73
Author: Ravishankar N <ravishankar>
Date:   Wed May 18 14:37:46 2016 +0530

    cluster/afr: Check for required number of entrylks
    
    Problem:
    Parallel rmdir operations on the same directory result in ENOTCONN
    messages even though there was no network disconnect.
    
    In blocking entry lock during rmdir, AFR takes two sets of locks on all
    its children: one on (parent dir, name of the dir to be deleted), the
    other a full lock on the dir being deleted. We proceed to the pre-op
    stage even if only a single lock (but not all the needed locks) was
    obtained, only to fail it with ENOTCONN because afr_locked_nodes_get()
    returns zero nodes in afr_changelog_pre_op().
    
    Fix:
    After we get replies for all blocking lock requests, if we don't have
    the minimum number of locks required to carry out the FOP, unlock and
    fail the FOP. The op_errno will be that of the last failed reply we
    got, i.e. whatever is set in afr_lock_cbk().
    
    Change-Id: Ibef25e65b468ebb5ea6ae1f5121a5f1201072293
    BUG: 1336381
    Signed-off-by: Ravishankar N <ravishankar>
    Reviewed-on: http://review.gluster.org/14358
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    Tested-by: Pranith Kumar Karampuri <pkarampu>
    Smoke: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
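In essence, the fix counts granted entry locks after all blocking-lock replies arrive, and fails the FOP early with a meaningful op_errno instead of letting pre-op report a misleading ENOTCONN. A minimal standalone sketch of that check follows; the struct and function names are hypothetical illustrations, not the actual AFR code (the real patch is at http://review.gluster.org/14358):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* One reply per child (brick) to a blocking entrylk request. */
struct lock_reply {
    int granted;   /* 1 if this child granted the lock */
    int op_errno;  /* errno from this child if it failed */
};

/* Returns 0 if enough locks were obtained to proceed with the FOP,
 * otherwise the errno to fail the FOP with: the op_errno of the last
 * failed reply, falling back to EAGAIN if no reply carried an errno. */
int check_entrylk_count(const struct lock_reply *replies,
                        size_t n_children, size_t required)
{
    size_t granted = 0;
    int last_errno = EAGAIN;

    for (size_t i = 0; i < n_children; i++) {
        if (replies[i].granted)
            granted++;
        else if (replies[i].op_errno)
            last_errno = replies[i].op_errno;
    }

    return (granted >= required) ? 0 : last_errno;
}
```

With this check in place, a client that loses the lock race to a peer's rmdir fails the operation with the lock-contention errno rather than proceeding to pre-op with zero locked nodes.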

Comment 9 Shyamsundar 2017-03-27 18:11:33 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed in glusterfs-3.9.0, please open a new bug report.

glusterfs-3.9.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2016-November/029281.html
[2] https://www.gluster.org/pipermail/gluster-users/