1419825 – Sequential and Random Writes are off target by 12% and 22% respectively on EC backed volumes over FUSE

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	disperse
Sub Component:
Version:	3.10
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Assignee:	Xavi Hernandez
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:	1409191
Blocks:

Reported:	2017-02-07 07:57 UTC by Xavi Hernandez
Modified:	2017-03-06 17:45 UTC
CC List:	9 users
Fixed In Version:	glusterfs-3.10.0
Clone Of:	1409191
Environment:
Last Closed:	2017-03-06 17:45:31 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Description Xavi Hernandez 2017-02-07 07:57:39 UTC

+++ This bug was initially created as a clone of Bug #1409191 +++

Please provide a public description of the problem.

--- Additional comment from Ashish Pandey on 2017-01-09 08:33:43 CET ---

Description of problem:
------------------------

Testbed : 12*(4+2),6 servers,6 workload generating clients.

Benchmark : 3.1.3 with io-threads enabled.

3.2 testing was done with io-threads enabled and mdcache parameters set.

It looks like we have regressed with 3.2 on large-file sequential and random writes:

******************
Sequential Writes
******************

3.1.3 : 2838601.16 kB/sec
3.2   : 2506687.55 kB/sec

Regression : ~ 12%


******************
Random Writes
******************

3.1.3 : 617384.17 kB/sec
3.2   : 480226.17 kB/sec

Regression : ~22%

Version-Release number of selected component (if applicable):
-------------------------------------------------------------

glusterfs-3.8.4-10.el7rhgs.x86_64


How reproducible:
------------------

100%

Actual results:
----------------

Regressions on sequential and random large file writes.

Expected results:
-----------------

Regression Threshold is within +-10%
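
The regression percentages quoted above can be re-derived from the reported throughput figures. The short Python sketch below is only an illustration: the function name and the dictionary layout are made up for this example, and the 10% limit is taken from the "Expected results" line.

```python
def regression_pct(baseline: float, candidate: float) -> float:
    """Percentage drop of candidate relative to baseline (positive means slower)."""
    return (baseline - candidate) / baseline * 100.0


THRESHOLD_PCT = 10.0  # the +-10% threshold stated under "Expected results"

# Throughput in kB/sec as reported in this bug: (3.1.3, 3.2)
results = {
    "Sequential Writes": (2838601.16, 2506687.55),
    "Random Writes": (617384.17, 480226.17),
}

report = {}
for name, (base, cand) in results.items():
    pct = regression_pct(base, cand)
    # Store the rounded percentage and whether it falls outside the threshold.
    report[name] = (round(pct, 1), abs(pct) > THRESHOLD_PCT)

for name, (pct, exceeded) in report.items():
    print(f"{name}: {pct}% regression, exceeds threshold: {exceeded}")
```

Both workloads fall outside the threshold, which matches the ~12% and ~22% figures in the report.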

--- Additional comment from Worker Ant on 2017-01-11 18:33:30 CET ---

REVIEW: http://review.gluster.org/16377 (cluster/ec: Do not start heal on good file while IO is going on) posted (#1) for review on master by Ashish Pandey (aspandey)

--- Additional comment from Ashish Pandey on 2017-01-11 18:48:44 CET ---

I forgot to mention this earlier.

Possible RCA:

After patch http://review.gluster.org/#/c/13733/, we set the dirty flag before writing to a file and remove it once the write completes.
This creates an index entry in .glusterfs/indices/xattrop/
that remains there throughout the write fop. Every 60 seconds shd scans this entry and starts a heal. The heal in turn takes a lot of locks to find and heal the file.

This raises the inodelk fop count and could be the culprit.
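
As a rough illustration of the RCA above, the following Python sketch counts how many shd crawls would land while the index entry exists. It is a toy model, not GlusterFS code: the class name, the `locks_per_heal` parameter and the crawl offset are assumptions made for this example. Only the 60-second interval comes from the comment.

```python
import math
from dataclasses import dataclass


@dataclass
class LongWrite:
    write_secs: float                  # time the dirty flag / index entry stays set
    crawl_interval_secs: float = 60.0  # shd scan period mentioned in the RCA
    locks_per_heal: int = 1            # hypothetical inodelk calls per heal attempt

    def heal_attempts(self, first_crawl_offset: float = 0.0) -> int:
        """Crawls at offset, offset + interval, ... that start before the write ends."""
        remaining = self.write_secs - first_crawl_offset
        if remaining <= 0:
            return 0
        return math.ceil(remaining / self.crawl_interval_secs)

    def extra_inodelk(self, first_crawl_offset: float = 0.0) -> int:
        """Extra lock calls caused by heal attempts during the write (toy estimate)."""
        return self.heal_attempts(first_crawl_offset) * self.locks_per_heal


short = LongWrite(write_secs=30)
long_ = LongWrite(write_secs=600)
print(short.heal_attempts(first_crawl_offset=45))  # crawl arrives after the write ended
print(long_.heal_attempts(first_crawl_offset=45))  # crawls keep hitting the long write
```

In this model a short write may never be seen by shd, while a long write is hit on every crawl, so the extra locking grows with the duration of the IO.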

I disabled shd and wrote a file:

time dd if=/dev/urandom of=a1 count=1024 bs=1M conv=fdatasync

The profile shows only 4 inodelk calls.

Brick: apandey:/brick/gluster/testvol-6
---------------------------------------
Cumulative Stats:
   Block Size:              32768b+               65536b+ 
 No. of Reads:                    0                     0 
No. of Writes:                 8188                     2 
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              1     RELEASE
      0.00      47.00 us      47.00 us      47.00 us              1      STATFS
      0.00      49.50 us      46.00 us      53.00 us              2       FLUSH
      0.00      38.00 us      26.00 us      52.00 us              4     INODELK
      0.00      92.50 us      85.00 us     100.00 us              2     XATTROP
      0.00     305.00 us     305.00 us     305.00 us              1      CREATE
      0.00     138.00 us      32.00 us     395.00 us              4    FXATTROP
      0.00     164.14 us     119.00 us     212.00 us              7      LOOKUP
      0.92      72.73 us      43.00 us    8431.00 us           8190       WRITE
     99.08 64142355.00 us 64142355.00 us 64142355.00 us              1       FSYNC


With shd enabled, the count is around 54:


Brick: apandey:/brick/gluster/testvol-1
---------------------------------------
Cumulative Stats:
   Block Size:              32768b+               65536b+ 
 No. of Reads:                    0                     0 
No. of Writes:                 8190                     1 
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              7     RELEASE
      0.00       0.00 us       0.00 us       0.00 us             21  RELEASEDIR
      0.00      30.00 us      30.00 us      30.00 us              1      STATFS
      0.00       5.76 us       2.00 us       9.00 us             21     OPENDIR
      0.00      64.50 us      30.00 us      99.00 us              2       FLUSH
      0.00      23.17 us      20.00 us      27.00 us              6       FSTAT
      0.00      95.50 us      89.00 us     102.00 us              2     XATTROP
      0.00     272.00 us     272.00 us     272.00 us              1      CREATE
      0.00      61.67 us      42.00 us      85.00 us              6        OPEN
      0.00      98.94 us      31.00 us     428.00 us             16    FXATTROP
      0.00      79.92 us      22.00 us     190.00 us             38      LOOKUP
      0.12    2379.48 us    1376.00 us    4600.00 us             42     READDIR
      0.74      74.70 us      42.00 us   49556.00 us           8191       WRITE
     10.29  163490.19 us      19.00 us 1405941.00 us             52     INODELK
     19.02  320668.04 us      26.00 us 15705174.00 us             49    GETXATTR
     69.83 57700430.00 us 57700430.00 us 57700430.00 us              1       FSY
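
To compare the two profiles without reading them by eye, the per-fop call counts can be extracted with a small parser. The Python sketch below assumes the row layout shown in the tables above (%-latency, three latencies each followed by "us", call count, fop name). The regular expression and function name are written for this example and are not part of any Gluster tool.

```python
import re

# One fop row: %-latency, avg/min/max latency (each followed by "us"), calls, fop name.
ROW = re.compile(
    r"^\s*(?P<pct>\d+\.\d+)\s+"
    r"(?P<avg>\d+\.\d+) us\s+(?P<min>\d+\.\d+) us\s+(?P<max>\d+\.\d+) us\s+"
    r"(?P<calls>\d+)\s+(?P<fop>[A-Z]+)\s*$"
)


def fop_calls(profile_text: str) -> dict:
    """Map fop name to call count for every row that matches the table layout."""
    calls = {}
    for line in profile_text.splitlines():
        m = ROW.match(line)
        if m:
            calls[m.group("fop")] = int(m.group("calls"))
    return calls


# Rows copied from the two profiles in this comment.
SHD_OFF = """\
      0.00      38.00 us      26.00 us      52.00 us              4     INODELK
      0.92      72.73 us      43.00 us    8431.00 us           8190       WRITE
"""
SHD_ON = """\
     10.29  163490.19 us      19.00 us 1405941.00 us             52     INODELK
      0.74      74.70 us      42.00 us   49556.00 us           8191       WRITE
"""

off = fop_calls(SHD_OFF)
on = fop_calls(SHD_ON)
print("extra INODELK calls with shd enabled:", on["INODELK"] - off["INODELK"])
```

Header and separator lines do not match the pattern and are skipped, so the full profile text can be passed in unchanged.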

--- Additional comment from Worker Ant on 2017-01-16 09:25:05 CET ---

REVIEW: http://review.gluster.org/16377 (cluster/ec: Do not start heal on good file while IO is going on) posted (#2) for review on master by Ashish Pandey (aspandey)

--- Additional comment from Worker Ant on 2017-01-19 13:56:35 CET ---

REVIEW: http://review.gluster.org/16377 (cluster/ec: Do not start heal on good file while IO is going on) posted (#3) for review on master by Ashish Pandey (aspandey)

--- Additional comment from Worker Ant on 2017-01-20 13:29:39 CET ---

COMMIT: http://review.gluster.org/16377 committed in master by Pranith Kumar Karampuri (pkarampu) 
------
commit 578e9b5b5b45245ed044bab066533411e2141db6
Author: Ashish Pandey <aspandey>
Date:   Wed Jan 11 17:19:30 2017 +0530

    cluster/ec: Do not start heal on good file while IO is going on
    
    Problem:
    Write on a file has been slowed down significantly after
    http://review.gluster.org/#/c/13733/
    
    RC: When an update fop starts on a file, it sets the dirty flag at
    the start and removes it at the end, which creates an index entry
    in indices/xattrop. During IO, SHD scans this, finds the
    index and starts heal even if all the fragments are healthy
    and up to date. This heal takes inodelk for different types of
    heal. If the IO runs for a long time, this happens every 60 seconds.
    Due to this extra, unnecessary locking, IO gets slowed down.
    
    Solution:
    Before starting any type of heal, check whether the file needs heal.
    
    Change-Id: Ib9519a43e7e4b2565d3f3153f9ca0fb92174fe51
    BUG: 1409191
    Signed-off-by: Ashish Pandey <aspandey>
    Reviewed-on: http://review.gluster.org/16377
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    Reviewed-by: Xavier Hernandez <xhernandez>
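
The idea in the "Solution" line, checking whether a file needs heal before starting one, can be sketched as a simple predicate. The Python below is a hypothetical model and not the code from review 16377: the `Fragment` fields, the `io_in_progress` argument and the decision rules are assumptions chosen to illustrate the idea, and the real check in cluster/ec may use different criteria.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Fragment:
    version: int  # hypothetical per-brick version counter
    dirty: bool   # hypothetical per-brick dirty flag


def needs_heal(fragments: List[Fragment], io_in_progress: bool) -> bool:
    """Assumed rule set: heal only when fragments disagree or a dirty flag is stale."""
    versions = {f.version for f in fragments}
    if len(versions) > 1:
        return True  # fragments disagree: some brick missed an update
    if any(f.dirty for f in fragments) and not io_in_progress:
        return True  # dirty flag with no active writer: an update was interrupted
    return False


def process_index_entry(fragments: List[Fragment], io_in_progress: bool,
                        heal: Callable[[List[Fragment]], None]) -> bool:
    """Run heal (and its locking) only when the file actually needs it."""
    if not needs_heal(fragments, io_in_progress):
        return False
    heal(fragments)
    return True


healed = []
good_file = [Fragment(version=5, dirty=True) for _ in range(6)]  # 4+2 layout
bad_file = good_file[:5] + [Fragment(version=4, dirty=True)]

print(process_index_entry(good_file, True, healed.append))  # healthy file under IO
print(process_index_entry(bad_file, True, healed.append))   # one fragment is behind
```

With such a check in place, a healthy file that is merely being written to is skipped on each crawl instead of going through the heal path and its locks.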

--- Additional comment from Worker Ant on 2017-01-20 13:59:00 CET ---

REVIEW: http://review.gluster.org/16444 (cluster/ec: Do not start heal on good file while IO is going on) posted (#1) for review on release-3.9 by Ashish Pandey (aspandey)

Comment 1 Worker Ant 2017-02-07 08:00:46 UTC

REVIEW: https://review.gluster.org/16552 (cluster/ec: Do not start heal on good file while IO is going on) posted (#1) for review on release-3.10 by Xavier Hernandez (xhernandez)

Comment 2 Worker Ant 2017-02-07 11:49:30 UTC

COMMIT: https://review.gluster.org/16552 committed in release-3.10 by Shyamsundar Ranganathan (srangana) 
------
commit e404eae2c22fbac2fdbfd4cb695b692a8c1ff81a
Author: Ashish Pandey <aspandey>
Date:   Wed Jan 11 17:19:30 2017 +0530

    cluster/ec: Do not start heal on good file while IO is going on
    
    Problem:
    Write on a file has been slowed down significantly after
    http://review.gluster.org/#/c/13733/
    
    RC: When an update fop starts on a file, it sets the dirty flag at
    the start and removes it at the end, which creates an index entry
    in indices/xattrop. During IO, SHD scans this, finds the
    index and starts heal even if all the fragments are healthy
    and up to date. This heal takes inodelk for different types of
    heal. If the IO runs for a long time, this happens every 60 seconds.
    Due to this extra, unnecessary locking, IO gets slowed down.
    
    Solution:
    Before starting any type of heal, check whether the file needs heal.
    
    > Change-Id: Ib9519a43e7e4b2565d3f3153f9ca0fb92174fe51
    > BUG: 1409191
    > Signed-off-by: Ashish Pandey <aspandey>
    > Reviewed-on: http://review.gluster.org/16377
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > Smoke: Gluster Build System <jenkins.org>
    > Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    > Reviewed-by: Xavier Hernandez <xhernandez>
    
    Change-Id: I1a66aca626156164555a7a99a4f715c54c87e9a9
    BUG: 1419825
    Signed-off-by: Ashish Pandey <aspandey>
    Reviewed-on: https://review.gluster.org/16552
    Tested-by: Xavier Hernandez <xhernandez>
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>

Comment 3 Shyamsundar 2017-03-06 17:45:31 UTC

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.10.0, please open a new bug report.

glusterfs-3.10.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-February/030119.html
[2] https://www.gluster.org/pipermail/gluster-users/
