Bug 1412890

Summary: Extra lookup/fstats are sent over the network when a brick is down.
Product: [Community] GlusterFS
Component: replicate
Version: 3.7.15
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Reporter: Ravishankar N <ravishankar>
Assignee: Ravishankar N <ravishankar>
CC: bugs
Keywords: Triaged
Hardware: Unspecified
OS: Unspecified
Fixed In Version: glusterfs-3.7.20
Clone Of: 1409206
Type: Bug
Last Closed: 2017-02-01 11:38:22 UTC
Bug Depends On: 1409206
Bug Blocks: 1410025, 1412886, 1412888

Description Ravishankar N 2017-01-13 04:21:16 UTC
+++ This bug was initially created as a clone of Bug #1409206 +++

Description of problem:
When a brick is killed in a replica volume and `dd` is run on the mount, a large number of FSTATs is sent over the network, along with a small (but very real) reduction in write throughput. A minimal reproduction sketch follows; the full profile output is shown below.
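
Reproduction sketch (the volume name and brick PID are placeholders; the
gluster commands themselves are standard):

    # enable per-brick io-stats counters
    gluster volume profile <volname> start
    # find the PID of one brick via 'gluster volume status <volname>', then kill it
    kill -9 <brick-pid>
    # write from the fuse mount and inspect the per-fop counters
    dd if=/dev/zero of=/mnt/<volname>/FILE bs=1024 count=10240
    gluster volume profile <volname> info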

Throughput and profile info on a random brick when all bricks are up:
==================================================================
root@tuxpad fuse_mnt$ dd if=/dev/zero of=FILE bs=1024 count=10240
10240+0 records in
10240+0 records out
10485760 bytes (10 MB) copied, 2.9847 s, 3.5 MB/s
Brick: 127.0.0.2:/home/ravi/bricks/brick1
-----------------------------------------
Cumulative Stats:
   Block Size:               1024b+ 
 No. of Reads:                    0 
No. of Writes:                10240 
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              2     RELEASE
      0.00       0.00 us       0.00 us       0.00 us              6  RELEASEDIR
      0.01      30.50 us      27.00 us      34.00 us              2     INODELK
      0.01      70.00 us      70.00 us      70.00 us              1        OPEN
      0.01      35.50 us      19.00 us      52.00 us              2       FLUSH
      0.03      96.50 us      83.00 us     110.00 us              2    GETXATTR
      0.04     253.00 us     253.00 us     253.00 us              1    TRUNCATE
      0.07     225.50 us     202.00 us     249.00 us              2    FXATTROP
      0.15     153.17 us      47.00 us     656.00 us              6      STATFS
      0.17     537.00 us     207.00 us     867.00 us              2     XATTROP
      0.22     685.00 us      22.00 us    1348.00 us              2    FINODELK
      0.62     255.67 us     104.00 us     928.00 us             15      LOOKUP
     98.66      59.72 us      35.00 us    4772.00 us          10240       WRITE
                                                                                                                                                                                             
    Duration: 673 seconds
   Data Read: 0 bytes
Data Written: 10485760 bytes


Throughput and profile info of one of the 'up' bricks when one brick is down
===========================================================================
(Note the FSTATs below: 10240, i.e. one per WRITE, an inode refresh for every write.)
root@tuxpad fuse_mnt$ dd if=/dev/zero of=FILE bs=1024 count=10240
10240+0 records in
10240+0 records out
10485760 bytes (10 MB) copied, 4.24494 s, 2.5 MB/s

Brick: 127.0.0.2:/home/ravi/bricks/brick1
-----------------------------------------
Cumulative Stats:
   Block Size:               1024b+
 No. of Reads:                    0
No. of Writes:                10240
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              2     RELEASE
      0.00       0.00 us       0.00 us       0.00 us              5  RELEASEDIR
      0.01      98.00 us      98.00 us      98.00 us              1        OPEN
      0.01      57.50 us      43.00 us      72.00 us              2     INODELK
      0.01     126.00 us     126.00 us     126.00 us              1    GETXATTR
      0.02     184.00 us     184.00 us     184.00 us              1    TRUNCATE
      0.02     113.00 us     109.00 us     117.00 us              2    FXATTROP
      0.02     122.00 us      16.00 us     228.00 us              2       FLUSH
      0.02     132.00 us      38.00 us     226.00 us              2    FINODELK
      0.08     418.00 us     283.00 us     553.00 us              2     XATTROP
      0.21     763.00 us     122.00 us    1630.00 us              3      LOOKUP
     41.23      44.83 us      36.00 us     490.00 us          10240       WRITE
     58.38      63.47 us      46.00 us     888.00 us          10240       FSTAT

    Duration: 75 seconds
   Data Read: 0 bytes
Data Written: 10485760 bytes

--- Additional comment from Worker Ant on 2016-12-30 04:58:52 EST ---

REVIEW: http://review.gluster.org/16309 (afr: Avoid resetting event_gen when brick is always down) posted (#1) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Worker Ant on 2017-01-09 12:12:20 EST ---

REVIEW: http://review.gluster.org/16309 (afr: Avoid resetting event_gen when brick is always down) posted (#2) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Worker Ant on 2017-01-12 00:14:55 EST ---

REVIEW: http://review.gluster.org/16309 (afr: Avoid resetting event_gen when brick is always down) posted (#3) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Worker Ant on 2017-01-12 01:56:57 EST ---

REVIEW: http://review.gluster.org/16309 (afr: Avoid resetting event_gen when brick is always down) posted (#4) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Worker Ant on 2017-01-12 21:54:07 EST ---

COMMIT: http://review.gluster.org/16309 committed in master by Jeff Darcy (jdarcy) 
------
commit 522640be476a3f97dac932f7046f0643ec0ec2f2
Author: Ravishankar N <ravishankar>
Date:   Fri Dec 30 14:57:17 2016 +0530

    afr: Avoid resetting event_gen when brick is always down
    
    Problem:
    __afr_set_in_flight_sb_status(), which resets event_gen to zero, is
    called if failed_subvols[i] is non-zero for any brick. But failed_subvols[i]
    is true even if the brick was down *before* the transaction started.
    Hence, if one brick is down in a replica-3 volume, every writev that
    comes in triggers an inode refresh because of this resetting, as seen
    from the number of FSTATs in the profile info in the BZ.
    
    Fix:
    Reset event_gen only if the brick was previously a valid read child and
    the FOP failed on it for the first time.
    
    Also `s/afr_inode_read_subvol_reset/afr_inode_event_gen_reset/`, because
    the function only resets the event_gen and not the data/metadata readable
    status.
    
    Change-Id: I603ae646cbde96995c35db77916e2ed80b602a91
    BUG: 1409206
    Signed-off-by: Ravishankar N <ravishankar>
    Reviewed-on: http://review.gluster.org/16309
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    Tested-by: Pranith Kumar Karampuri <pkarampu>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
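
In outline, the change amounts to guarding the reset. The following C sketch
is illustrative only: the types are simplified and was_readable_before_fop is
a hypothetical helper, not the actual glusterfs source (see the review link
above for the real patch). failed_subvols and afr_inode_event_gen_reset are
the names used in the commit message.

    /* Idea from the commit message: reset the inode's event generation,
     * which forces an inode refresh (LOOKUP/FSTAT) on the next FOP, only
     * for a brick that was a valid read child and has now failed for the
     * first time in this transaction. A brick that was already down
     * before the transaction began must not trigger the reset. */
    static void
    sketch_set_in_flight_sb_status (afr_local_t *local, afr_private_t *priv,
                                    inode_t *inode, xlator_t *this)
    {
            int i = 0;

            for (i = 0; i < priv->child_count; i++) {
                    if (!local->transaction.failed_subvols[i])
                            continue;
                    /* Hypothetical helper: was brick i data/metadata
                     * readable (a valid read child) before this FOP? */
                    if (was_readable_before_fop (inode, this, i))
                            afr_inode_event_gen_reset (inode, this);
            }
    }

With this guard, a brick that is down from the start no longer zeroes
event_gen on every write, which is why the per-write FSTATs disappear
from the profile output.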

Comment 1 Worker Ant 2017-01-13 04:28:53 UTC
REVIEW: http://review.gluster.org/16387 (afr: Avoid resetting event_gen when brick is always down) posted (#1) for review on release-3.7 by Ravishankar N (ravishankar)

Comment 2 Worker Ant 2017-01-13 06:29:20 UTC
REVIEW: http://review.gluster.org/16387 (afr: Avoid resetting event_gen when brick is always down) posted (#2) for review on release-3.7 by Ravishankar N (ravishankar)

Comment 3 Worker Ant 2017-01-14 04:00:00 UTC
COMMIT: http://review.gluster.org/16387 committed in release-3.7 by Pranith Kumar Karampuri (pkarampu) 
------
commit ef341819eaedb703bc2f7bc1cd2e5ac855fed42b
Author: Ravishankar N <ravishankar>
Date:   Fri Jan 13 09:55:49 2017 +0530

    afr: Avoid resetting event_gen when brick is always down
    
    Backport of http://review.gluster.org/#/c/16309/
    
    Problem:
    __afr_set_in_flight_sb_status(), which resets event_gen to zero, is
    called if failed_subvols[i] is non-zero for any brick. But failed_subvols[i]
    is true even if the brick was down *before* the transaction started.
    Hence, if one brick is down in a replica-3 volume, every writev that
    comes in triggers an inode refresh because of this resetting, as seen
    from the number of FSTATs in the profile info in the BZ.
    
    Fix:
    Reset event_gen only if the brick was previously a valid read child and
    the FOP failed on it for the first time.
    
    Also `s/afr_inode_read_subvol_reset/afr_inode_event_gen_reset/`, because
    the function only resets the event_gen and not the data/metadata readable
    status.
    
    Change-Id: I7840f7123d3b3e0404743988088ec349391ca980
    BUG: 1412890
    Signed-off-by: Ravishankar N <ravishankar>
    Reviewed-on: http://review.gluster.org/16387
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Krutika Dhananjay <kdhananj>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Tested-by: Pranith Kumar Karampuri <pkarampu>
    CentOS-regression: Gluster Build System <jenkins.org>

Comment 4 Kaushal 2017-02-01 11:38:22 UTC
This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-3.7.20, please open a new bug report.

glusterfs-3.7.20 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and on the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/gluster-devel/2017-January/052010.html
[2] https://www.gluster.org/pipermail/gluster-users/