Bug 1247959

Summary: Statfs is hung because of frame loss in quota
Product: [Community] GlusterFS
Component: quota
Version: 3.6.4
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Reporter: Vijaikumar Mallikarjuna <vmallika>
Assignee: Vijaikumar Mallikarjuna <vmallika>
CC: bugs, gluster-bugs, jdarcy, nbalacha, rabhat, rgowdapp, smohan, vmallika
Keywords: Triaged
Hardware: Unspecified
OS: Unspecified
Fixed In Version: glusterfs-3.6.5
Doc Type: Bug Fix
Type: Bug
Clone Of: 1178619
Last Closed: 2015-08-27 13:06:37 UTC
Bug Depends On: 1178619
Bug Blocks: 1250544

Description Vijaikumar Mallikarjuna 2015-07-29 10:50:47 UTC
+++ This bug was initially created as a clone of Bug #1178619 +++

Description of problem:
The rebalance process hangs in the statfs call in quota, which fails only after timing out.
###################################################################
1. Created a 6x2 distributed-replicate volume.
2. Ran the ACA script, which creates deep directory trees and renames directories and files.
3. While the script was running, performed add-brick and started rebalance.

Result:
Rebalance hangs for 1800 seconds (the call-bail timeout), after which it runs to completion.


statedump:
--------------
[global.callpool.stack.1.frame.1]
ref_count=1
translator=test-server
complete=0

[global.callpool.stack.1.frame.2]
ref_count=0
translator=test-quota
complete=0
parent=/brick2/test7
wind_from=io_stats_statfs
wind_to=FIRST_CHILD(this)->fops->statfs
unwind_to=io_stats_statfs_cbk

[global.callpool.stack.1.frame.3]
ref_count=1
translator=/brick2/test7
complete=0
parent=test-server
wind_from=server_statfs_resume
wind_to=bound_xl->fops->statfs
unwind_to=server_statfs_cbk


From rebalance logs
===========
[2015-01-03 14:49:59.065353] E [rpc-clnt.c:201:call_bail] 0-test-client-1: bailing out frame type(GlusterFS 3.3) op(STATFS(14)) xid = 0x794 sent = 2015-01-03 14:19:58.397959. timeout = 1800 for 10.70.44.70:49152
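The timestamps in the call_bail log line show that the STATFS request was outstanding for just over the 1800-second timeout before it was bailed out. The Python snippet below checks that arithmetic under a simplified assumption: a request is bailed once its age exceeds the timeout. The real check lives in call_bail() in rpc-clnt.c and is driven by a timer, so its exact condition and timing may differ.

```python
from datetime import datetime

# Timestamps copied from the call_bail log line.
FMT = "%Y-%m-%d %H:%M:%S.%f"
sent = datetime.strptime("2015-01-03 14:19:58.397959", FMT)
bailed = datetime.strptime("2015-01-03 14:49:59.065353", FMT)
TIMEOUT_S = 1800  # "timeout = 1800" in the log

# How long the STATFS request waited without a reply.
elapsed = (bailed - sent).total_seconds()

# Simplified model (assumption): bail out once the request is older than the timeout.
should_bail = elapsed > TIMEOUT_S

print(f"elapsed = {elapsed:.6f} s, bail = {should_bail}")
```

The request waited about 1800.67 seconds. That matches the report that rebalance stalls for the full call-bail timeout rather than failing quickly.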

How reproducible:
When building the ancestry fails, the error is not handled properly and the statfs frame is lost. The brick log showed an error saying that open failed on the same GFID on which statfs was issued. This open was most likely issued by the ancestry-building code in quota.
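The failure mode can be modelled with a short Python sketch. This is a language-neutral model, not GlusterFS code: Frame, unwind, build_ancestry, statfs_buggy and statfs_fixed are hypothetical names, and the simulated open failure with ENOENT is an assumption. The point it illustrates is that every exit from the ancestry callback, including the error path, must either continue the saved statfs or unwind it. This sketch unwinds with the error, which is not necessarily what the actual patch does.

```python
import errno
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Frame:
    """Hypothetical stand-in for a call frame; 'complete' mirrors the
    complete=0/1 field seen in a statedump."""
    fop: str
    complete: bool = False
    op_ret: Optional[int] = None
    op_errno: Optional[int] = None


def unwind(frame: Frame, op_ret: int, op_errno: int) -> None:
    """Reply to the caller; after this the frame is no longer pending."""
    frame.complete = True
    frame.op_ret = op_ret
    frame.op_errno = op_errno


def build_ancestry(gfid: str, cbk: Callable[[int, int], None]) -> None:
    """Hypothetical ancestry builder that always fails, as if the open
    on `gfid` had failed on the brick (ENOENT is an assumption)."""
    cbk(-1, errno.ENOENT)


def statfs_buggy(frame: Frame, gfid: str) -> None:
    def ancestry_cbk(op_ret: int, op_errno: int) -> None:
        if op_ret < 0:
            return  # BUG: error path neither continues nor unwinds the fop
        unwind(frame, 0, 0)  # success path, simplified to an immediate reply
    build_ancestry(gfid, ancestry_cbk)


def statfs_fixed(frame: Frame, gfid: str) -> None:
    def ancestry_cbk(op_ret: int, op_errno: int) -> None:
        if op_ret < 0:
            unwind(frame, -1, op_errno)  # always answer the caller
            return
        unwind(frame, 0, 0)
    build_ancestry(gfid, ancestry_cbk)


lost = Frame("statfs")
statfs_buggy(lost, "gfid-1")
print(lost.complete)  # False: no reply is ever sent, so the client waits for call-bail

ok = Frame("statfs")
statfs_fixed(ok, "gfid-1")
print(ok.complete, ok.op_errno == errno.ENOENT)  # True True
```

In the buggy variant the frame stays incomplete forever, which corresponds to the complete=0 quota frame in the statedump and to the client-side call-bail after 1800 seconds.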

--- Additional comment from Anand Avati on 2015-01-05 01:52:17 EST ---

REVIEW: http://review.gluster.org/9380 (features/quota: prevent statfs frame-loss when an error happens during ancestry building.) posted (#4) for review on master by Raghavendra G (rgowdapp)

--- Additional comment from Anand Avati on 2015-04-30 03:42:31 EDT ---

REVIEW: http://review.gluster.org/9380 (features/quota: prevent statfs frame-loss when an error happens during ancestry building.) posted (#5) for review on master by Vijaikumar Mallikarjuna (vmallika)

--- Additional comment from Niels de Vos on 2015-05-22 06:21:36 EDT ---

I've dropped this bug from the glusterfs-3.7.1 tracker. Please clone this bug and have the clone depend on 1178619 (this bug) and block "glusterfs-3.7.1".

--- Additional comment from Anand Avati on 2015-05-28 00:23:31 EDT ---

REVIEW: http://review.gluster.org/9380 (features/quota: prevent statfs frame-loss when an error happens during ancestry building.) posted (#6) for review on master by Raghavendra G (rgowdapp)

Comment 1 Anand Avati 2015-07-29 10:52:34 UTC
REVIEW: http://review.gluster.org/11790 (features/quota: prevent statfs frame-loss when an error happens during ancestry building.) posted (#1) for review on release-3.6 by Vijaikumar Mallikarjuna (vmallika)

Comment 2 Anand Avati 2015-08-20 09:04:58 UTC
COMMIT: http://review.gluster.org/11790 committed in release-3.6 by Raghavendra Bhat (raghavendra) 
------
commit dfa2bfb289cc73ade0e441f2e2ee88d0d819d48d
Author: vmallika <vmallika>
Date:   Wed Jul 29 16:19:12 2015 +0530

    features/quota: prevent statfs frame-loss when an error happens during
    ancestry building.
    
    This is a backport of http://review.gluster.org/#/c/9380/
    
    quota_get_limit_dir calls quota_build_ancestry. If
    quota_build_ancestry fails, no frame is saved to continue the
    statfs FOP, and the client can hang.
    
    > Change-Id: I92e25c1510d09444b9d4810afdb6b2a69dcd92c0
    > BUG: 1178619
    > Signed-off-by: Raghavendra G <rgowdapp>
    > Signed-off-by: vmallika <vmallika>
    > Reviewed-on: http://review.gluster.org/9380
    > Tested-by: Gluster Build System <jenkins.com>
    
    Change-Id: Ia25cf738250fdc2c766f96c26e3c31093d534aba
    BUG: 1247959
    Signed-off-by: vmallika <vmallika>
    Reviewed-on: http://review.gluster.org/11790
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Raghavendra Bhat <raghavendra>
    Reviewed-by: Raghavendra G <rgowdapp>

Comment 3 Raghavendra Bhat 2015-08-27 13:06:37 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.6.5, please open a new bug report.

glusterfs-3.6.5 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://www.gluster.org/pipermail/gluster-devel/2015-August/046570.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user