Description of problem:
When a brick is killed in a replica and `dd` is run, we see a lot of FSTATs being sent over the network, with a small (but very real) reduction in write throughput.

Throughput and profile info on a random brick when all bricks are up:
======================================================================
root@tuxpad fuse_mnt$ dd if=/dev/zero of=FILE bs=1024 count=10240
10240+0 records in
10240+0 records out
10485760 bytes (10 MB) copied, 2.9847 s, 3.5 MB/s

Brick: 127.0.0.2:/home/ravi/bricks/brick1
-----------------------------------------
Cumulative Stats:
   Block Size:               1024b+
 No. of Reads:                    0
No. of Writes:                10240

 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              2     RELEASE
      0.00       0.00 us       0.00 us       0.00 us              6  RELEASEDIR
      0.01      30.50 us      27.00 us      34.00 us              2     INODELK
      0.01      70.00 us      70.00 us      70.00 us              1        OPEN
      0.01      35.50 us      19.00 us      52.00 us              2       FLUSH
      0.03      96.50 us      83.00 us     110.00 us              2    GETXATTR
      0.04     253.00 us     253.00 us     253.00 us              1    TRUNCATE
      0.07     225.50 us     202.00 us     249.00 us              2    FXATTROP
      0.15     153.17 us      47.00 us     656.00 us              6      STATFS
      0.17     537.00 us     207.00 us     867.00 us              2     XATTROP
      0.22     685.00 us      22.00 us    1348.00 us              2    FINODELK
      0.62     255.67 us     104.00 us     928.00 us             15      LOOKUP
     98.66      59.72 us      35.00 us    4772.00 us          10240       WRITE

    Duration: 673 seconds
   Data Read: 0 bytes
Data Written: 10485760 bytes

Throughput and profile info of one of the 'up' bricks when one brick is down:
==============================================================================
root@tuxpad fuse_mnt$ dd if=/dev/zero of=FILE bs=1024 count=10240
10240+0 records in
10240+0 records out
10485760 bytes (10 MB) copied, 4.24494 s, 2.5 MB/s

Brick: 127.0.0.2:/home/ravi/bricks/brick1
-----------------------------------------
Cumulative Stats:
   Block Size:               1024b+
 No. of Reads:                    0
No. of Writes:                10240

 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              2     RELEASE
      0.00       0.00 us       0.00 us       0.00 us              5  RELEASEDIR
      0.01      98.00 us      98.00 us      98.00 us              1        OPEN
      0.01      57.50 us      43.00 us      72.00 us              2     INODELK
      0.01     126.00 us     126.00 us     126.00 us              1    GETXATTR
      0.02     184.00 us     184.00 us     184.00 us              1    TRUNCATE
      0.02     113.00 us     109.00 us     117.00 us              2    FXATTROP
      0.02     122.00 us      16.00 us     228.00 us              2       FLUSH
      0.02     132.00 us      38.00 us     226.00 us              2    FINODELK
      0.08     418.00 us     283.00 us     553.00 us              2     XATTROP
      0.21     763.00 us     122.00 us    1630.00 us              3      LOOKUP
     41.23      44.83 us      36.00 us     490.00 us          10240       WRITE
     58.38      63.47 us      46.00 us     888.00 us          10240       FSTAT

    Duration: 75 seconds
   Data Read: 0 bytes
Data Written: 10485760 bytes
REVIEW: http://review.gluster.org/16309 (afr: Avoid resetting event_gen when brick is always down) posted (#1) for review on master by Ravishankar N (ravishankar)
REVIEW: http://review.gluster.org/16309 (afr: Avoid resetting event_gen when brick is always down) posted (#2) for review on master by Ravishankar N (ravishankar)
REVIEW: http://review.gluster.org/16309 (afr: Avoid resetting event_gen when brick is always down) posted (#3) for review on master by Ravishankar N (ravishankar)
REVIEW: http://review.gluster.org/16309 (afr: Avoid resetting event_gen when brick is always down) posted (#4) for review on master by Ravishankar N (ravishankar)
COMMIT: http://review.gluster.org/16309 committed in master by Jeff Darcy (jdarcy)
------
commit 522640be476a3f97dac932f7046f0643ec0ec2f2
Author: Ravishankar N <ravishankar>
Date:   Fri Dec 30 14:57:17 2016 +0530

    afr: Avoid resetting event_gen when brick is always down

    Problem:
    __afr_set_in_flight_sb_status(), which resets event_gen to zero, is
    called if failed_subvols[i] is non-zero for any brick. But
    failed_subvols[i] is true even if the brick was down *before* the
    transaction started. Hence, say, if 1 brick is down in a replica-3,
    every writev that comes will trigger an inode refresh because of this
    resetting, as seen from the no. of FSTATs in the profile info in the BZ.

    Fix:
    Reset event_gen only if the brick was previously a valid read child and
    the FOP failed on it the first time.

    Also `s/afr_inode_read_subvol_reset/afr_inode_event_gen_reset` because
    the function only resets the event gen and not the data/metadata
    readable.

    Change-Id: I603ae646cbde96995c35db77916e2ed80b602a91
    BUG: 1409206
    Signed-off-by: Ravishankar N <ravishankar>
    Reviewed-on: http://review.gluster.org/16309
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    Tested-by: Pranith Kumar Karampuri <pkarampu>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
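The gist of the fix can be illustrated with a small, self-contained C model of the event_gen reset decision. The struct and function names below are illustrative only and do not mirror the actual AFR code; see the patch at http://review.gluster.org/16309 for the real change.

/* Simplified model of the event_gen reset decision described in the commit
 * message above. Names here are illustrative, not the actual AFR symbols. */
#include <stdbool.h>
#include <stdio.h>

#define REPLICA_COUNT 3

struct brick_state {
    bool failed_in_fop; /* FOP failed on this brick (failed_subvols[i]) */
    bool was_readable;  /* brick was a valid read child before the FOP */
};

/* Old behaviour: any failed subvol triggers the reset, even if that brick
 * was already down before the transaction started. */
static bool old_should_reset_event_gen(const struct brick_state *b, int n)
{
    for (int i = 0; i < n; i++)
        if (b[i].failed_in_fop)
            return true;
    return false;
}

/* Fixed behaviour: reset only when a previously readable brick fails for
 * the first time, so a brick that was always down does not force an inode
 * refresh (and hence an FSTAT) on every write. */
static bool new_should_reset_event_gen(const struct brick_state *b, int n)
{
    for (int i = 0; i < n; i++)
        if (b[i].failed_in_fop && b[i].was_readable)
            return true;
    return false;
}

int main(void)
{
    /* Brick 0 was down before dd started: it fails the FOP but was never
     * a valid read child for this inode. */
    struct brick_state bricks[REPLICA_COUNT] = {
        { .failed_in_fop = true,  .was_readable = false },
        { .failed_in_fop = false, .was_readable = true  },
        { .failed_in_fop = false, .was_readable = true  },
    };

    printf("old logic resets event_gen: %s\n",
           old_should_reset_event_gen(bricks, REPLICA_COUNT) ? "yes" : "no");
    printf("new logic resets event_gen: %s\n",
           new_should_reset_event_gen(bricks, REPLICA_COUNT) ? "yes" : "no");
    return 0;
}

With the old check, the permanently-down brick trips the reset on every WRITE, forcing an inode refresh and the extra FSTAT per WRITE seen in the profile output above; with the new check it does not.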
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.10.0, please open a new bug report.

glusterfs-3.10.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-February/030119.html
[2] https://www.gluster.org/pipermail/gluster-users/