Bug 1412888 - Extra lookup/fstats are sent over the network when a brick is down.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Ravishankar N
QA Contact:
URL:
Whiteboard:
Depends On: 1409206 1412890
Blocks: 1410025 1412886
Reported: 2017-01-13 04:15 UTC by Ravishankar N
Modified: 2017-02-20 12:33 UTC (History)

Fixed In Version: glusterfs-3.8.9
Doc Text:
Clone Of: 1409206
Environment:
Last Closed: 2017-02-20 12:33:40 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Ravishankar N 2017-01-13 04:15:57 UTC
+++ This bug was initially created as a clone of Bug #1409206 +++

Description of problem:
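When a brick in a replica volume is killed and `dd` is run, an FSTAT is sent over the network for every WRITE (10240 of each in the second profile below, versus no FSTATs when all bricks are up), with a small but measurable reduction in write throughput (3.5 MB/s down to 2.5 MB/s in the runs below).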

Throughput and profile info on a random brick when all bricks are up:
==================================================================
root@tuxpad fuse_mnt$ dd if=/dev/zero of=FILE bs=1024 count=10240
10240+0 records in
10240+0 records out
10485760 bytes (10 MB) copied, 2.9847 s, 3.5 MB/s
Brick: 127.0.0.2:/home/ravi/bricks/brick1
-----------------------------------------
Cumulative Stats:
   Block Size:               1024b+ 
 No. of Reads:                    0 
No. of Writes:                10240 
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              2     RELEASE
      0.00       0.00 us       0.00 us       0.00 us              6  RELEASEDIR
      0.01      30.50 us      27.00 us      34.00 us              2     INODELK
      0.01      70.00 us      70.00 us      70.00 us              1        OPEN
      0.01      35.50 us      19.00 us      52.00 us              2       FLUSH
      0.03      96.50 us      83.00 us     110.00 us              2    GETXATTR
      0.04     253.00 us     253.00 us     253.00 us              1    TRUNCATE
      0.07     225.50 us     202.00 us     249.00 us              2    FXATTROP
      0.15     153.17 us      47.00 us     656.00 us              6      STATFS
      0.17     537.00 us     207.00 us     867.00 us              2     XATTROP
      0.22     685.00 us      22.00 us    1348.00 us              2    FINODELK
      0.62     255.67 us     104.00 us     928.00 us             15      LOOKUP
     98.66      59.72 us      35.00 us    4772.00 us          10240       WRITE
                                                                                                                                                                                             
    Duration: 673 seconds
   Data Read: 0 bytes
Data Written: 10485760 bytes


Throughput and profile info of one of the 'up' bricks when one brick is down
===========================================================================
0:root@tuxpad fuse_mnt$ dd if=/dev/zero of=FILE bs=1024 count=10240
10240+0 records in
10240+0 records out
10485760 bytes (10 MB) copied, 4.24494 s, 2.5 MB/s

Brick: 127.0.0.2:/home/ravi/bricks/brick1
-----------------------------------------
Cumulative Stats:
   Block Size:               1024b+
 No. of Reads:                    0
No. of Writes:                10240
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              2     RELEASE
      0.00       0.00 us       0.00 us       0.00 us              5  RELEASEDIR
      0.01      98.00 us      98.00 us      98.00 us              1        OPEN
      0.01      57.50 us      43.00 us      72.00 us              2     INODELK
      0.01     126.00 us     126.00 us     126.00 us              1    GETXATTR
      0.02     184.00 us     184.00 us     184.00 us              1    TRUNCATE
      0.02     113.00 us     109.00 us     117.00 us              2    FXATTROP
      0.02     122.00 us      16.00 us     228.00 us              2       FLUSH
      0.02     132.00 us      38.00 us     226.00 us              2    FINODELK
      0.08     418.00 us     283.00 us     553.00 us              2     XATTROP
      0.21     763.00 us     122.00 us    1630.00 us              3      LOOKUP
     41.23      44.83 us      36.00 us     490.00 us          10240       WRITE
     58.38      63.47 us      46.00 us     888.00 us          10240       FSTAT

    Duration: 75 seconds
   Data Read: 0 bytes
Data Written: 10485760 bytes
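
For reference, the two runs above can be compared with a few lines of arithmetic. This is only a convenience sketch in Python: the constants are copied from the `dd` output and the brick profiles in this report, and the variable and function names are made up for illustration.

```python
# Figures copied from the two dd runs and brick profiles in this report.
BYTES_WRITTEN = 10485760
SECONDS_ALL_UP = 2.9847
SECONDS_ONE_DOWN = 4.24494
WRITES_ONE_DOWN = 10240
FSTATS_ONE_DOWN = 10240


def throughput_mb_s(nbytes, seconds):
    """Throughput in MB/s, using decimal megabytes (10**6 bytes) as dd does."""
    return nbytes / seconds / 1e6


up = throughput_mb_s(BYTES_WRITTEN, SECONDS_ALL_UP)
down = throughput_mb_s(BYTES_WRITTEN, SECONDS_ONE_DOWN)
drop_pct = (1 - down / up) * 100
fstats_per_write = FSTATS_ONE_DOWN / WRITES_ONE_DOWN

print(f"all bricks up : {up:.1f} MB/s")
print(f"one brick down: {down:.1f} MB/s ({drop_pct:.0f}% lower)")
print(f"FSTATs per WRITE with one brick down: {fstats_per_write:.0f}")
```

With these inputs the drop works out to roughly 30%, and the brick sees exactly one FSTAT per WRITE. A single pair of 10 MB runs is a small sample, so the percentage should be read as indicative only.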

--- Additional comment from Worker Ant on 2016-12-30 04:58:52 EST ---

REVIEW: http://review.gluster.org/16309 (afr: Avoid resetting event_gen when brick is always down) posted (#1) for review on master by Ravishankar N (ravishankar@redhat.com)

--- Additional comment from Worker Ant on 2017-01-09 12:12:20 EST ---

REVIEW: http://review.gluster.org/16309 (afr: Avoid resetting event_gen when brick is always down) posted (#2) for review on master by Ravishankar N (ravishankar@redhat.com)

--- Additional comment from Worker Ant on 2017-01-12 00:14:55 EST ---

REVIEW: http://review.gluster.org/16309 (afr: Avoid resetting event_gen when brick is always down) posted (#3) for review on master by Ravishankar N (ravishankar@redhat.com)

--- Additional comment from Worker Ant on 2017-01-12 01:56:57 EST ---

REVIEW: http://review.gluster.org/16309 (afr: Avoid resetting event_gen when brick is always down) posted (#4) for review on master by Ravishankar N (ravishankar@redhat.com)

--- Additional comment from Worker Ant on 2017-01-12 21:54:07 EST ---

COMMIT: http://review.gluster.org/16309 committed in master by Jeff Darcy (jdarcy@redhat.com) 
------
commit 522640be476a3f97dac932f7046f0643ec0ec2f2
Author: Ravishankar N <ravishankar@redhat.com>
Date:   Fri Dec 30 14:57:17 2016 +0530

    afr: Avoid resetting event_gen when brick is always down
    
    Problem:
    __afr_set_in_flight_sb_status(), which resets event_gen to zero, is
    called if failed_subvols[i] is non-zero for any brick. But failed_subvols[i]
    is true even if the brick was down *before* the transaction started.
    Hence say if 1 brick is down in a replica-3, every writev that comes
    will trigger an inode refresh because of this resetting, as seen from
    the no. of FSTATs in the profile info in the BZ.
    
    Fix:
    Reset event gen only if the brick was previously a valid read child and
    the FOP failed on it the first time.
    
    Also `s/afr_inode_read_subvol_reset/afr_inode_event_gen_reset` because
    the function only resets event gen and not the data/metadata readable.
    
    Change-Id: I603ae646cbde96995c35db77916e2ed80b602a91
    BUG: 1409206
    Signed-off-by: Ravishankar N <ravishankar@redhat.com>
    Reviewed-on: http://review.gluster.org/16309
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
    Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
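
The behaviour described in the commit message can be illustrated with a toy model. This is not the AFR code (which is C) and not the patch itself: it is a minimal Python sketch that assumes an inode refresh is triggered whenever `event_gen` is zero at the start of a write, and that a refresh costs one FSTAT per up brick. Only `event_gen` and `failed_subvols` are names from the commit message; `simulate`, `readable`, `brick_up` and `fixed` are hypothetical.

```python
def simulate(writes, brick_up, readable, fixed):
    """Count inode refreshes over a run of writes in a toy model of AFR.

    brick_up[i] -- whether brick i is reachable during the run
    readable[i] -- whether brick i was a valid read child before the run
    fixed       -- False: reset event_gen on any failed subvol (old behaviour)
                   True:  reset only when a previously readable brick fails
    """
    readable = list(readable)   # work on a copy, do not mutate the caller's list
    event_gen = 1               # non-zero: cached inode state is trusted
    refreshes = 0
    for _ in range(writes):
        if event_gen == 0:      # assumption: zero forces a refresh (FSTAT) first
            refreshes += 1
            event_gen = 1
        # The write fails on every brick that is down, including bricks
        # that were already down before the transaction started.
        failed_subvols = [not up for up in brick_up]
        if fixed:
            newly_failed = [i for i, failed in enumerate(failed_subvols)
                            if failed and readable[i]]
            for i in newly_failed:
                readable[i] = False   # first failure: no longer a read child
            reset = bool(newly_failed)
        else:
            reset = any(failed_subvols)
        if reset:
            event_gen = 0
    return refreshes


down_before = [True, True, False]   # replica-3 with the third brick down
old = simulate(10240, down_before, readable=[True, True, False], fixed=False)
new = simulate(10240, down_before, readable=[True, True, False], fixed=True)
first_failure = simulate(10240, down_before, readable=[True, True, True], fixed=True)
print(old, new, first_failure)  # prints: 10239 0 1
```

In this model the old logic refreshes on essentially every write, which lines up with the FSTAT count in the profile, while the fixed logic refreshes only once, after the first failure on a brick that had been readable. The exact counts are a property of the model, not measurements from GlusterFS.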

Comment 1 Worker Ant 2017-01-13 04:16:56 UTC
REVIEW: http://review.gluster.org/16386 (afr: Avoid resetting event_gen when brick is always down) posted (#1) for review on release-3.8 by Ravishankar N (ravishankar@redhat.com)

Comment 2 Worker Ant 2017-01-17 10:32:01 UTC
COMMIT: http://review.gluster.org/16386 committed in release-3.8 by Pranith Kumar Karampuri (pkarampu@redhat.com) 
------
commit 7969610cd129dacf3074dfec67abf1871e04c82c
Author: Ravishankar N <ravishankar@redhat.com>
Date:   Fri Dec 30 14:57:17 2016 +0530

    afr: Avoid resetting event_gen when brick is always down
    
    Problem:
    __afr_set_in_flight_sb_status(), which resets event_gen to zero, is
    called if failed_subvols[i] is non-zero for any brick. But failed_subvols[i]
    is true even if the brick was down *before* the transaction started.
    Hence say if 1 brick is down in a replica-3, every writev that comes
    will trigger an inode refresh because of this resetting, as seen from
    the no. of FSTATs in the profile info in the BZ.
    
    Fix:
    Reset event gen only if the brick was previously a valid read child and
    the FOP failed on it the first time.
    
    Also `s/afr_inode_read_subvol_reset/afr_inode_event_gen_reset` because
    the function only resets event gen and not the data/metadata readable.
    
    > Reviewed-on: http://review.gluster.org/16309
    > Smoke: Gluster Build System <jenkins@build.gluster.org>
    > Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
    > Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
    > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    > CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    (cherry picked from commit 522640be476a3f97dac932f7046f0643ec0ec2f2)
    
    Change-Id: I603ae646cbde96995c35db77916e2ed80b602a91
    BUG: 1412888
    Signed-off-by: Ravishankar N <ravishankar@redhat.com>
    Reviewed-on: http://review.gluster.org/16386
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>

Comment 3 Niels de Vos 2017-02-20 12:33:40 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.9, please open a new bug report.

glusterfs-3.8.9 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2017-February/000066.html
[2] https://www.gluster.org/pipermail/gluster-users/

