+++ This bug was initially created as a clone of Bug #1409206 +++

Description of problem:
When a brick is killed in a replica and `dd` is run, we see a lot of FSTATs being sent over the network, with a small (but very real) reduction in write throughput.

Throughput and profile info on a random brick when all bricks are up:
=====================================================================
root@tuxpad fuse_mnt$ dd if=/dev/zero of=FILE bs=1024 count=10240
10240+0 records in
10240+0 records out
10485760 bytes (10 MB) copied, 2.9847 s, 3.5 MB/s

Brick: 127.0.0.2:/home/ravi/bricks/brick1
-----------------------------------------
Cumulative Stats:
   Block Size:               1024b+
 No. of Reads:                    0
No. of Writes:                10240
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              2     RELEASE
      0.00       0.00 us       0.00 us       0.00 us              6  RELEASEDIR
      0.01      30.50 us      27.00 us      34.00 us              2     INODELK
      0.01      70.00 us      70.00 us      70.00 us              1        OPEN
      0.01      35.50 us      19.00 us      52.00 us              2       FLUSH
      0.03      96.50 us      83.00 us     110.00 us              2    GETXATTR
      0.04     253.00 us     253.00 us     253.00 us              1    TRUNCATE
      0.07     225.50 us     202.00 us     249.00 us              2    FXATTROP
      0.15     153.17 us      47.00 us     656.00 us              6      STATFS
      0.17     537.00 us     207.00 us     867.00 us              2     XATTROP
      0.22     685.00 us      22.00 us    1348.00 us              2    FINODELK
      0.62     255.67 us     104.00 us     928.00 us             15      LOOKUP
     98.66      59.72 us      35.00 us    4772.00 us          10240       WRITE

    Duration: 673 seconds
   Data Read: 0 bytes
Data Written: 10485760 bytes

Throughput and profile info of one of the 'up' bricks when one brick is down:
=============================================================================
root@tuxpad fuse_mnt$ dd if=/dev/zero of=FILE bs=1024 count=10240
10240+0 records in
10240+0 records out
10485760 bytes (10 MB) copied, 4.24494 s, 2.5 MB/s

Brick: 127.0.0.2:/home/ravi/bricks/brick1
-----------------------------------------
Cumulative Stats:
   Block Size:               1024b+
 No. of Reads:                    0
No. of Writes:                10240
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              2     RELEASE
      0.00       0.00 us       0.00 us       0.00 us              5  RELEASEDIR
      0.01      98.00 us      98.00 us      98.00 us              1        OPEN
      0.01      57.50 us      43.00 us      72.00 us              2     INODELK
      0.01     126.00 us     126.00 us     126.00 us              1    GETXATTR
      0.02     184.00 us     184.00 us     184.00 us              1    TRUNCATE
      0.02     113.00 us     109.00 us     117.00 us              2    FXATTROP
      0.02     122.00 us      16.00 us     228.00 us              2       FLUSH
      0.02     132.00 us      38.00 us     226.00 us              2    FINODELK
      0.08     418.00 us     283.00 us     553.00 us              2     XATTROP
      0.21     763.00 us     122.00 us    1630.00 us              3      LOOKUP
     41.23      44.83 us      36.00 us     490.00 us          10240       WRITE
     58.38      63.47 us      46.00 us     888.00 us          10240       FSTAT

    Duration: 75 seconds
   Data Read: 0 bytes
Data Written: 10485760 bytes

--- Additional comment from Worker Ant on 2016-12-30 04:58:52 EST ---

REVIEW: http://review.gluster.org/16309 (afr: Avoid resetting event_gen when brick is always down) posted (#1) for review on master by Ravishankar N (ravishankar)
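For reference, the profile numbers in the description above can be regenerated with GlusterFS's built-in profiling. A minimal sketch, assuming a replicate volume named `testvol` mounted at /mnt/fuse_mnt (both names are placeholders, not taken from this report):

# Enable per-brick FOP accounting (volume name is hypothetical).
gluster volume profile testvol start

# Baseline run with all bricks up, then dump cumulative stats per brick.
dd if=/dev/zero of=/mnt/fuse_mnt/FILE bs=1024 count=10240
gluster volume profile testvol info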
Upstream patch http://review.gluster.org/16309
Hi Karan,

As discussed, please try out the test case (`dd`) in a replicate volume. For this bug to be considered for 3.2.0, we need to see whether there is a significant drop in I/O throughput when one brick of the replica is down (see the sketch below).
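A possible way to run the brick-down half of that comparison (a sketch under the same assumed `testvol`/mount names as above; killing the brick PID is just one convenient way to take a brick offline in a test):

# Find the PID of one brick process and kill it to simulate a brick going down.
gluster volume status testvol
kill -9 <brick-pid>

# Re-run the same dd and compare the throughput and the FSTAT count in the profile output.
dd if=/dev/zero of=/mnt/fuse_mnt/FILE bs=1024 count=10240
gluster volume profile testvol info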
Downstream patch: https://code.engineering.redhat.com/gerrit/#/c/94930
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html