Description of problem:
-----------------------
With 3.2 bits, it seems a regression was introduced on large file sequential reads, specific to replica 3.

*3.1.3 GA bits*  : 1568891.33 kB/sec
*3.8.4-13 bits*  : 1093736.45 kB/sec

Regression : -30%

I do not see a problem with 2-way replica, though.

Version-Release number of selected component (if applicable):
--------------------------------------------------------------
glusterfs-server-3.8.4-13.el7rhgs.x86_64

How reproducible:
----------------
Every which way I try.

Actual results:
---------------
30% regression on large file reads on 3.2.

Expected results:
-----------------
Regression threshold is within 10%.

Additional info:
----------------
Volume Name: rep3
Type: Distributed-Replicate
Volume ID: 390622f1-9ff1-4d04-b342-5492254f25fe
Status: Started
Snapshot Count: 0
Number of Bricks: 12 x 3 = 36
Transport-type: tcp
Bricks:
Brick1: gqas005.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick2: gqas006.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick3: gqas008.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick4: gqas005.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick5: gqas006.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick6: gqas008.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick7: gqas005.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick8: gqas006.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick9: gqas008.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick10: gqas005.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick11: gqas006.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick12: gqas008.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick13: gqas005.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick14: gqas006.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick15: gqas008.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick16: gqas005.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick17: gqas006.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick18: gqas008.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick19: gqas005.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick20: gqas006.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick21: gqas008.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick22: gqas005.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick23: gqas006.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick24: gqas008.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick25: gqas005.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick26: gqas006.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick27: gqas008.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick28: gqas005.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick29: gqas006.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick30: gqas008.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick31: gqas005.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick32: gqas006.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick33: gqas008.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick34: gqas005.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick35: gqas006.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick36: gqas008.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
nfs.disable: off
performance.readdir-ahead: on
transport.address-family: inet
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.cache-samba-metadata: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
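For reference, the per-brick FOP counts and average latencies discussed in the comments below come from gluster's built-in volume profiling (the diagnostics.count-fop-hits / diagnostics.latency-measurement options above are what profiling toggles). A minimal sketch of how such a cumulative profile can be collected; the output file name is only an example:

$ gluster volume profile rep3 start
  ... run the large-file sequential read workload ...
$ gluster volume profile rep3 info cumulative > cumu-profile.txt   # per-brick FOP counts and avg/min/max latencies
$ gluster volume profile rep3 stop

Note: the 'cumulative' keyword to 'profile info' may not be present on all builds; plain 'gluster volume profile rep3 info' also reports the cumulative stats per brick.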
On switching off io-threads, the magnitude of the regression came down, but it is still not within acceptable limits:

*3.1.3*                    : 1568891.33 kB/sec
*3.8.4-13, io-threads off* : 1192683.07 kB/sec

Regression : -23%
I stripped the cumulative profile info numbers into separate files (cumu-3.1.3.txt, cumu-3.2-iot-on.txt and cumu-3.2-iot-off.txt).

Some observations:
-------------------------------------------------------------------------
1) The sum total of READ calls made on all bricks: 3.1.3 > 3.2-without-io-threads > 3.2-with-io-threads. Despite the same workload and file size, 3.1.3 made more reads yet still delivered better performance.

$ grep -re "No. of Reads" cumu-3.1.3.txt|cut -d ":" -f2|tr -d [:blank:]|awk '{ sum+=$0} END {print sum}'
691474
$ grep -re "No. of Reads" cumu-3.2-iot-off.txt|cut -d ":" -f2|tr -d [:blank:]|awk '{ sum+=$0} END {print sum}'
546177
$ grep -re "No. of Reads" cumu-3.2-iot-on.txt|cut -d ":" -f2|tr -d [:blank:]|awk '{ sum+=$0} END {print sum}'
491279
-------------------------------------------------------------------------
2) The average latency (in microseconds) of READs in the 3.2 setups appears to be higher, which could be what is reducing performance. 3.1.3 had a single occurrence above 2500 us, while the 3.2 runs have several.

$ grep -w READ cumu-3.1.3.txt | awk '{print $2}'|sort
2660.47
482.76
485.19
524.39
632.65
654.08
654.81
673.18
674.24
685.41
864.87
$ grep -w READ cumu-3.2-iot-off.txt | awk '{print $2}'|sort
2735.70
2801.96
2856.74
566.85
641.55
675.00
678.89
681.29
699.62
$ grep -w READ cumu-3.2-iot-on.txt | awk '{print $2}'|sort
2712.74
2737.09
2751.98
2764.30
560.78
583.90
617.89
788.29

We need to find out why READ latency is higher in the 3.2 runs.
-------------------------------------------------------------------------
Ambarish,

1. Could you provide the strace output of all the bricks while you run the workload (on 3.1.3 and 3.2)? This can give us some idea of the time spent in syscalls etc.
2. For the sake of easier debugging, is it possible to see if we get similar regressions on a smaller setup (say a 2x3 volume, single-reader workload)?
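A small consolidation of the two checks in the observations above, assuming the stripped files sit in the current directory and keep the "No. of Reads" / READ-row layout used in the one-liners; sort -n is used so the per-brick READ average latencies order numerically rather than lexically:

$ for f in cumu-3.1.3.txt cumu-3.2-iot-off.txt cumu-3.2-iot-on.txt; do
      echo "== $f =="
      # total READ calls across all bricks
      grep -e "No. of Reads" "$f" | cut -d ":" -f2 | awk '{ sum+=$1 } END { print "total READs:", sum }'
      # per-brick READ average latency (usec), numerically sorted
      grep -w READ "$f" | awk '{print $2}' | sort -n | tr '\n' ' '; echo
  done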
Created attachment 1246197 [details]
Stripped cumulative profile info from the original attachments.
Cleared needinfo on me by mistake.
A solid 27% regression on large file reads on 2x3:

3.1.3 : 801896 kB/sec
3.2   : 583641 kB/sec

IOT was off in both cases, FYI.
Observations from the attachments in comment #12 and comment #13:

0. volinfo was not available, but I've assumed the 2 replica-3 subvols are made up of the following bricks:

Brick: gqas010.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick: gqas014.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick: gqas015.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick: gqas010.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick: gqas014.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick: gqas015.sbu.lab.eng.bos.redhat.com:/bricks2/brick

1. In this run also, the no. of READs (from the profile info) for the same test is higher in 3.1.3 than in 3.2:

$ grep -re "No. of Reads" 3.1.3-iot-off.txt |awk '{print $4}'|awk '{ sum+=$0} END {print sum}'
1118115
$ grep -re "No. of Reads" 3.2-iot-off.txt |awk '{print $4}'|awk '{ sum+=$0} END {print sum}'
674470

2. The time taken by the posix xlator for the pread syscall is significantly higher on gqas014 for 3.2 compared to 3.1.3.

Highest 10 values for 3.1.3:
$ grep -w pread 313/gqas010/*.txt*|awk '{print $8}'|tr '<' ' '|tr '>' ' '|sort|tail -10
0.032043
0.032882
0.033726
0.033868
0.034844
0.039520
0.039931
0.046005
0.055675
0.069946

Highest 10 values for 3.2:
$ grep -w pread 32/gqas010/*.txt*|awk '{print $8}'|tr '<' ' '|tr '>' ' '|sort|tail -10
0.417160
0.434892
0.436256
0.440567
0.457439
0.457820
0.494122
0.495383
0.503378
0.504748

The values on the other 2 nodes, gqas014 and gqas015, are in the same 'range', although a bit higher for 3.2:

$ grep -w pread 313/gqas014/*.txt*|awk '{print $8}'|tr '<' ' '|tr '>' ' '|sort|tail -10
0.247469
0.250882
0.266613
0.282624
0.283825
0.284675
0.304581
0.305631
0.323572
0.323947
$ grep -w pread 32/gqas014/*.txt*|awk '{print $8}'|tr '<' ' '|tr '>' ' '|sort|tail -10
0.531700
0.537332
0.549611
0.569431
0.576682
0.583793
0.587856
0.590194
0.590830
0.663191
$ grep -w pread 313/gqas015/*.txt*|awk '{print $8}'|tr '<' ' '|tr '>' ' '|sort|tail -10
0.640470
0.701599
0.704544
0.710213
0.710533
0.726115
0.726428
0.770418
0.976238
0.976581
$ grep -w pread 32/gqas015/*.txt*|awk '{print $8}'|tr '<' ' '|tr '>' ' '|sort|tail -10
0.800597
0.800757
0.881200
0.884361
0.986186
0.986745
1.005609
1.005773
1.011861
1.011967

----------------------------------------------------------------------
I have requested Ambarish to try out the test on a different set of machines, both with and without the perf xlators, for single- and multi-writer scenarios, to rule out possible hardware issues.
(In reply to Ravishankar N from comment #14)
> 2. The time taken by posix xlator for the pread syscall is significantly
> more on gqas014 for 3.2 compared to 3.1.3:

Sorry, typo. I meant gqas010 instead of gqas014.
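For anyone reproducing the pread timing comparison in comment #14: the <...> durations come from strace's -T option. A hedged sketch of capturing them on one brick; the pgrep pattern and output path are assumptions, not necessarily how the original traces were taken:

$ BRICK_PID=$(pgrep -f "glusterfsd.*bricks1/brick" | head -1)   # pick one brick process (hypothetical match pattern)
$ strace -f -T -e trace=pread64 -o /tmp/brick1.strace -p "$BRICK_PID" &
  ... run the read workload, then stop the strace (e.g. kill %1) ...
$ grep -w pread64 /tmp/brick1.strace | awk '{print $NF}' | tr -d '<>' | sort -n | tail -10   # ten slowest preads (seconds)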
I could not repro this on another setup.
Based on comment #16, I am closing the bug; please reopen if the issue is hit again.