Please provide a public description of the problem.
Description of problem:
------------------------
Testbed : 12*(4+2), 6 servers, 6 workload-generating clients.
Benchmark : 3.1.3 with io-threads enabled. 3.2 testing was done with io-threads enabled and mdcache parameters set.

It looks like we have regressed with 3.2 on large-file writes/random writes:

****************** Sequential Writes ******************
3.1.3 : 2838601.16 kB/sec
3.2   : 2506687.55 kB/sec
Regression : ~12%

****************** Random Writes ******************
3.1.3 : 617384.17 kB/sec
3.2   : 480226.17 kB/sec
Regression : ~22%

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
glusterfs-3.8.4-10.el7rhgs.x86_64

How reproducible:
------------------
100%

Actual results:
----------------
Regressions on sequential and random large-file writes.

Expected results:
-----------------
Regression threshold is within +-10%.
REVIEW: http://review.gluster.org/16377 (cluster/ec: Do not start heal on good file while IO is going on) posted (#1) for review on master by Ashish Pandey (aspandey)
I just missed mentioning this info.

Possible RCA - After patch http://review.gluster.org/#/c/13733/, before writing on a file we set a dirty flag and at the end we remove this flag. This creates an index entry in .glusterfs/indices/xattrop/ which remains there throughout the write fop. Every 60 seconds shd will come up, scan this entry and start heal. Heal in turn takes a lot of locks to find and heal the file, which raises the inodelk fop count and could be a possible culprit.

I disabled the shd and wrote a file:

time dd if=/dev/urandom of=a1 count=1024 bs=1M conv=fdatasync

The profile shows only 4 INODELK calls:

Brick: apandey:/brick/gluster/testvol-6
---------------------------------------
Cumulative Stats:
   Block Size:              32768b+               65536b+
 No. of Reads:                    0                     0
No. of Writes:                 8188                     2

 %-latency   Avg-latency     Min-Latency     Max-Latency     No. of calls         Fop
 ---------   -----------     -----------     -----------     ------------        ----
      0.00         0.00 us         0.00 us         0.00 us              1     RELEASE
      0.00        47.00 us        47.00 us        47.00 us              1      STATFS
      0.00        49.50 us        46.00 us        53.00 us              2       FLUSH
      0.00        38.00 us        26.00 us        52.00 us              4     INODELK
      0.00        92.50 us        85.00 us       100.00 us              2     XATTROP
      0.00       305.00 us       305.00 us       305.00 us              1      CREATE
      0.00       138.00 us        32.00 us       395.00 us              4    FXATTROP
      0.00       164.14 us       119.00 us       212.00 us              7      LOOKUP
      0.92        72.73 us        43.00 us      8431.00 us           8190       WRITE
     99.08  64142355.00 us  64142355.00 us  64142355.00 us              1       FSYNC

With shd enabled it is around 54:

Brick: apandey:/brick/gluster/testvol-1
---------------------------------------
Cumulative Stats:
   Block Size:              32768b+               65536b+
 No. of Reads:                    0                     0
No. of Writes:                 8190                     1

 %-latency   Avg-latency     Min-Latency     Max-Latency     No. of calls         Fop
 ---------   -----------     -----------     -----------     ------------        ----
      0.00         0.00 us         0.00 us         0.00 us              7     RELEASE
      0.00         0.00 us         0.00 us         0.00 us             21  RELEASEDIR
      0.00        30.00 us        30.00 us        30.00 us              1      STATFS
      0.00         5.76 us         2.00 us         9.00 us             21     OPENDIR
      0.00        64.50 us        30.00 us        99.00 us              2       FLUSH
      0.00        23.17 us        20.00 us        27.00 us              6       FSTAT
      0.00        95.50 us        89.00 us       102.00 us              2     XATTROP
      0.00       272.00 us       272.00 us       272.00 us              1      CREATE
      0.00        61.67 us        42.00 us        85.00 us              6        OPEN
      0.00        98.94 us        31.00 us       428.00 us             16    FXATTROP
      0.00        79.92 us        22.00 us       190.00 us             38      LOOKUP
      0.12      2379.48 us      1376.00 us      4600.00 us             42     READDIR
      0.74        74.70 us        42.00 us     49556.00 us           8191       WRITE
     10.29    163490.19 us        19.00 us   1405941.00 us             52     INODELK
     19.02    320668.04 us        26.00 us  15705174.00 us             49    GETXATTR
     69.83  57700430.00 us  57700430.00 us  57700430.00 us              1       FSYNC
REVIEW: http://review.gluster.org/16377 (cluster/ec: Do not start heal on good file while IO is going on) posted (#2) for review on master by Ashish Pandey (aspandey)
REVIEW: http://review.gluster.org/16377 (cluster/ec: Do not start heal on good file while IO is going on) posted (#3) for review on master by Ashish Pandey (aspandey)
COMMIT: http://review.gluster.org/16377 committed in master by Pranith Kumar Karampuri (pkarampu)
------
commit 578e9b5b5b45245ed044bab066533411e2141db6
Author: Ashish Pandey <aspandey>
Date:   Wed Jan 11 17:19:30 2017 +0530

    cluster/ec: Do not start heal on good file while IO is going on

    Problem:
    Writes on a file have slowed down significantly after
    http://review.gluster.org/#/c/13733/

    RC:
    When an update fop starts on a file, it sets a dirty flag at the
    start and removes it at the end, which makes an index entry in
    indices/xattrop. During IO, SHD scans this, finds the index and
    starts heal even if all the fragments are healthy and up to date.
    This heal takes inodelk for different types of heal. If the IO runs
    for a long time, this happens every 60 seconds. Due to this extra,
    unnecessary locking, IO gets slowed down.

    Solution:
    Before starting any type of heal, check whether the file needs heal
    or not.

    Change-Id: Ib9519a43e7e4b2565d3f3153f9ca0fb92174fe51
    BUG: 1409191
    Signed-off-by: Ashish Pandey <aspandey>
    Reviewed-on: http://review.gluster.org/16377
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    Reviewed-by: Xavier Hernandez <xhernandez>
REVIEW: http://review.gluster.org/16444 (cluster/ec: Do not start heal on good file while IO is going on) posted (#1) for review on release-3.9 by Ashish Pandey (aspandey)
REVIEW: https://review.gluster.org/16551 (cluster/ec: Do not start heal on good file while IO is going on) posted (#1) for review on release-3.10 by Xavier Hernandez (xhernandez)
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.11.0, please open a new bug report.

glusterfs-3.11.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-May/000073.html
[2] https://www.gluster.org/pipermail/gluster-users/