Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1360576

Summary: [Disperse volume]: IO hang seen on mount with file ops
Product: [Community] GlusterFS Reporter: Pranith Kumar K <pkarampu>
Component: disperse Assignee: Pranith Kumar K <pkarampu>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.8.1 CC: amukherj, aspandey, asrivast, bugs, kramdoss, nchilaka, pkarampu, rcyriac, rhinduja, sarumuga, skoduri
Target Milestone: --- Keywords: Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.8.2 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1344836
: 1361402 (view as bug list) Environment:
Last Closed: 2016-08-12 09:48:11 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1329466, 1330132, 1330997, 1342426, 1344836    
Bug Blocks: 1361402    

Comment 1 Pranith Kumar K 2016-07-27 05:20:23 UTC
This is an issue we observed in internal testing:
Locks were being acquired at the moment bricks were going down because of ping timeouts. 4 of the 6 bricks went down at that time. The remaining 2 bricks hold locks that, for some reason, were never released and were left stale.

Steps to recreate the issue:
1) Create a plain disperse volume.
2) Put a breakpoint at ec_wind_inodelk.
3) From the fuse mount, issue ls -laR <mount>.
4) As soon as the breakpoint is hit in gdb, kill 4 of the 6 bricks from another terminal.
5) Quit gdb.
6) Wait a second or two, then check whether there are stale locks on the remaining bricks.
7) In my case there were, so I issued ls -laR on the mount again and it hung.
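
The failure mode in the steps above can be illustrated with a toy model. This is only a sketch and not the actual cluster/ec code: the `Brick` class, the `acquire`, `stale_locks` and `run` helpers, the `cleanup_on_failure` flag and the brick names are all hypothetical, and the threshold of 4 granted locks is an assumption taken from the 4+2 layout. It shows a lock request that is granted on only 2 of 6 bricks, and how releasing those partial grants on failure avoids leaving stale locks behind.

```python
from typing import List, Optional, Tuple


class Brick:
    """Hypothetical stand-in for one brick's lock table (one lock only)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True
        self.lock_owner: Optional[str] = None

    def lock(self, owner: str) -> bool:
        # Grant only if the brick is reachable and the lock is free.
        # A real blocking lock would wait here; the model just refuses.
        if not self.up or self.lock_owner is not None:
            return False
        self.lock_owner = owner
        return True

    def unlock(self, owner: str) -> None:
        if self.up and self.lock_owner == owner:
            self.lock_owner = None


def acquire(bricks: List[Brick], owner: str, needed: int,
            cleanup_on_failure: bool) -> bool:
    """Try to lock every brick; succeed only if `needed` bricks grant it."""
    granted = [b for b in bricks if b.lock(owner)]
    if len(granted) >= needed:
        return True
    if cleanup_on_failure:
        # Release the partial grants so they do not go stale.
        for b in granted:
            b.unlock(owner)
    return False


def stale_locks(bricks: List[Brick]) -> List[str]:
    """Names of reachable bricks that still hold a lock."""
    return [b.name for b in bricks if b.up and b.lock_owner is not None]


def run(cleanup_on_failure: bool) -> Tuple[bool, List[str]]:
    bricks = [Brick(f"brick-{i}") for i in range(6)]
    for b in bricks[:4]:
        b.up = False  # 4 of the 6 bricks go down while the lock is in flight
    ok = acquire(bricks, "op-1", needed=4,
                 cleanup_on_failure=cleanup_on_failure)
    return ok, stale_locks(bricks)


if __name__ == "__main__":
    print("without cleanup:", run(cleanup_on_failure=False))
    print("with cleanup:   ", run(cleanup_on_failure=True))
```

In this model the lock attempt fails either way, but without the cleanup step the two surviving bricks keep the lock of an operation that has already given up, which is the condition under which a later ls -laR would wait indefinitely.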

Relevant logs that led to this conclusion (these failures were on disperse-2 of a 6=4+2 setup):
[2016-06-10 17:21:44.690734] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7feed0cd5c32] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7feed0aa084e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7feed0aa095e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7feed0aa22ea] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7feed0aa2b18] ))))) 0-ec-nfsganesha-client-15: forced unwinding frame type(GlusterFS 3.3) op(INODELK(29)) called at 2016-06-10 17:21:44.537422 (xid=0x274d7)

[2016-06-10 17:21:44.771235] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7feed0cd5c32] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7feed0aa084e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7feed0aa095e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7feed0aa22ea] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7feed0aa2b18] ))))) 0-ec-nfsganesha-client-17: forced unwinding frame type(GlusterFS 3.3) op(INODELK(29)) called at 2016-06-10 17:21:44.537520 (xid=0x2740b)

[2016-06-10 17:21:44.773164] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7feed0cd5c32] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7feed0aa084e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7feed0aa095e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7feed0aa22ea] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7feed0aa2b18] ))))) 0-ec-nfsganesha-client-16: forced unwinding frame type(GlusterFS 3.3) op(INODELK(29)) called at 2016-06-10 17:21:44.537487 (xid=0x2740b)

[2016-06-10 17:21:44.808576] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7feed0cd5c32] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7feed0aa084e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7feed0aa095e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7feed0aa22ea] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7feed0aa2b18] ))))) 0-ec-nfsganesha-client-14: forced unwinding frame type(GlusterFS 3.3) op(INODELK(29)) called at 2016-06-10 17:21:44.537377 (xid=0x2740d)
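
When many such entries have to be compared, it can help to pull out the client, operation and xid from each "forced unwinding frame" line. The following is a minimal sketch that assumes the exact line layout seen in the four entries above; the `UNWIND_RE` pattern and the `parse_unwind` helper are hypothetical names, not part of any Gluster tool, and the sample line is abbreviated from the first entry (only one backtrace frame is kept).

```python
import re
from typing import Dict, Optional

# Pattern written against the log lines quoted in this bug; other
# GlusterFS versions may format these messages differently.
UNWIND_RE = re.compile(
    r"^\[(?P<logged>[\d\- :.]+)\] E "
    r"\[rpc-clnt\.c:\d+:saved_frames_unwind\].*?"
    r" 0-(?P<client>\S+): forced unwinding frame "
    r"type\((?P<prog>[^)]*)\) "
    r"op\((?P<op>\w+)\((?P<opnum>\d+)\)\) "
    r"called at (?P<called>[\d\- :.]+) "
    r"\(xid=(?P<xid>0x[0-9a-f]+)\)"
)


def parse_unwind(line: str) -> Optional[Dict[str, str]]:
    """Return the named fields of an unwind log line, or None if no match."""
    m = UNWIND_RE.search(line.strip())
    return m.groupdict() if m else None


if __name__ == "__main__":
    # Abbreviated copy of the first log entry (backtrace shortened).
    sample = (
        "[2016-06-10 17:21:44.690734] E [rpc-clnt.c:362:saved_frames_unwind] "
        "(--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7feed0aa2b18] ) "
        "0-ec-nfsganesha-client-15: forced unwinding frame "
        "type(GlusterFS 3.3) op(INODELK(29)) "
        "called at 2016-06-10 17:21:44.537422 (xid=0x274d7)"
    )
    fields = parse_unwind(sample)
    if fields:
        print(fields["client"], fields["op"], fields["xid"])
```

Grouping the parsed entries by `client` makes it easier to see which bricks had an INODELK in flight when their connection was torn down.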

Comment 2 Vijay Bellur 2016-07-27 05:53:04 UTC
REVIEW: http://review.gluster.org/15025 (cluster/ec: Unlock stale locks when inodelk/entrylk/lk fails) posted (#1) for review on release-3.8 by Pranith Kumar Karampuri (pkarampu)

Comment 3 Vijay Bellur 2016-07-29 10:50:52 UTC
COMMIT: http://review.gluster.org/15025 committed in release-3.8 by Xavier Hernandez (xhernandez) 
------
commit e641ac9444d04399761a46ac6b05f28e5231c66e
Author: Pranith Kumar K <pkarampu>
Date:   Sat Jun 11 18:43:42 2016 +0530

    cluster/ec: Unlock stale locks when inodelk/entrylk/lk fails
    
    Thanks to Rafi for hinting a while back that this kind of
    problem he saw once. I didn't think the theory was valid.
    Could have caught it earlier if I had tested his theory.
    
     >Change-Id: Iac6ffcdba2950aa6f8cf94f8994adeed6e6a9c9b
     >BUG: 1344836
     >Signed-off-by: Pranith Kumar K <pkarampu>
     >Reviewed-on: http://review.gluster.org/14703
     >Reviewed-by: Xavier Hernandez <xhernandez>
     >Smoke: Gluster Build System <jenkins.org>
     >Tested-by: mohammed rafi  kc <rkavunga>
     >NetBSD-regression: NetBSD Build System <jenkins.org>
     >CentOS-regression: Gluster Build System <jenkins.org>
    
    BUG: 1360576
    Change-Id: If9ccf0b3db7159b87ddcdc7b20e81cde8c3c76f0
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/15025
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Xavier Hernandez <xhernandez>
    CentOS-regression: Gluster Build System <jenkins.org>

Comment 4 Niels de Vos 2016-08-12 09:48:11 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.2, please open a new bug report.

glusterfs-3.8.2 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://www.gluster.org/pipermail/announce/2016-August/000058.html
[2] https://www.gluster.org/pipermail/gluster-users/