Bug 1417044 - [Perf] : Smallfile appends are 25% slower on replica 3 over FUSE
Summary: [Perf] : Smallfile appends are 25% slower on replica 3 over FUSE
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: replicate
Version: rhgs-3.2
Hardware: x86_64
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Ravishankar N
QA Contact: Nag Pavan Chilakam
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-01-27 05:23 UTC by Ambarish
Modified: 2018-09-18 08:00 UTC (History)
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-05-30 11:25:08 UTC
Embargoed:


Attachments
Results for comment #18 (8.94 KB, text/plain)
2017-02-26 13:08 UTC, Ravishankar N

Description Ambarish 2017-01-27 05:23:33 UTC
Description of problem:
------------------------

Smallfile appends are about 25% slower on the latest 3.2 bits:

*3.1.3* : 1708 files/sec
*3.2/3.8.4-13* : 1265 files/sec

Regression: ~25% drop

Version-Release number of selected component (if applicable):
------------------------------------------------------------

glusterfs-server-3.8.4-13.el7rhgs.x86_64

How reproducible:
----------------

Consistently - every way I've tried.

Actual results:
---------------

smallfile appends are 25% slower.

Expected results:
-----------------

Regression Threshold : 25%

Additional info:
----------------

Volume Name: rep3
Type: Distributed-Replicate
Volume ID: 390622f1-9ff1-4d04-b342-5492254f25fe
Status: Started
Snapshot Count: 0
Number of Bricks: 12 x 3 = 36
Transport-type: tcp
Bricks:
Brick1: gqas005.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick2: gqas006.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick3: gqas008.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick4: gqas005.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick5: gqas006.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick6: gqas008.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick7: gqas005.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick8: gqas006.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick9: gqas008.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick10: gqas005.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick11: gqas006.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick12: gqas008.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick13: gqas005.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick14: gqas006.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick15: gqas008.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick16: gqas005.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick17: gqas006.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick18: gqas008.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick19: gqas005.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick20: gqas006.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick21: gqas008.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick22: gqas005.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick23: gqas006.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick24: gqas008.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick25: gqas005.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick26: gqas006.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick27: gqas008.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick28: gqas005.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick29: gqas006.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick30: gqas008.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick31: gqas005.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick32: gqas006.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick33: gqas008.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick34: gqas005.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick35: gqas006.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick36: gqas008.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
nfs.disable: off
performance.readdir-ahead: on
transport.address-family: inet
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.cache-samba-metadata: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600

Comment 3 Ambarish 2017-01-27 06:53:18 UTC
Created attachment 1245018 [details]
3.2_ServerProfile

Comment 5 Ambarish 2017-01-27 11:36:10 UTC
*EXACT WORKLOAD* :

*Drop kernel caches*

python /small-files/smallfile/smallfile_cli.py --operation append --threads 8 --file-size 64 --files 10000 --top <mount point> --host-set <comma-separated host list>

A create run is done first to build the data set:

python /small-files/smallfile/smallfile_cli.py --operation create --threads 8 --file-size 64 --files 10000 --top <mount point> --host-set <comma-separated host list>
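For reference, a minimal driver for this create-then-append sequence could look like the sketch below. It is not from the original report: the smallfile path, mount point and host list are placeholders, and dropping caches is assumed to require root on the node it runs on.

#!/usr/bin/env python
# Hypothetical reproduction driver for the workload above.
# The smallfile path, mount point and host list are placeholders.
import subprocess

SMALLFILE = "/small-files/smallfile/smallfile_cli.py"
MOUNT = "/gluster-mount"              # FUSE mount point (placeholder)
HOSTS = "client1,client2,client3"     # comma-separated client list (placeholder)

def drop_caches():
    # Flush dirty pages, then drop page cache, dentries and inodes.
    subprocess.check_call(["sync"])
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")

def run_smallfile(operation):
    subprocess.check_call([
        "python", SMALLFILE,
        "--operation", operation,
        "--threads", "8",
        "--file-size", "64",
        "--files", "10000",
        "--top", MOUNT,
        "--host-set", HOSTS,
    ])

if __name__ == "__main__":
    run_smallfile("create")   # build the data set first
    drop_caches()             # drop kernel caches before the timed run
    run_smallfile("append")   # the operation showing the regression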

Comment 6 Ambarish 2017-01-30 05:59:20 UTC
Switching io-threads off made appends slightly worse: 1192 files/sec.

[I thought I had already added this to the description, for some reason.]

Comment 9 Ambarish 2017-02-08 03:10:46 UTC
I know this is not what was asked, but I could reproduce this on another setup as well, and I see high LOOKUP latencies there too.

I'll fetch the straces as soon as I can.

Comment 13 Ambarish 2017-02-08 13:52:09 UTC
I see an increase in the number of GETXATTR calls between 3.1.3 and 3.2 - from 6-8 to almost 600 - but I am not sure if it is big enough to cause a perf regression.
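As a rough way to re-derive such counts (not part of the original comment), one could total the GETXATTR lines in a saved "gluster volume profile <vol> info" dump. The sketch below assumes the usual profile layout, where the FOP name is the last column and the call count is the column just before it; the input file name is a placeholder.

#!/usr/bin/env python
# Hypothetical helper: total the GETXATTR call counts found in a saved
# "gluster volume profile <vol> info" dump (sums across all bricks and
# intervals present in the file; the file name is a placeholder).
import sys

def count_fop(path, fop="GETXATTR"):
    total = 0
    with open(path) as f:
        for line in f:
            fields = line.split()
            # Expected layout: ... <No. of calls> <Fop>
            if len(fields) >= 2 and fields[-1] == fop and fields[-2].isdigit():
                total += int(fields[-2])
    return total

if __name__ == "__main__":
    print(count_fop(sys.argv[1]))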

Comment 14 Pranith Kumar K 2017-02-09 09:37:50 UTC
Ambarish,
   I analysed the strace files you gave. Based on my analysis, the number of syscalls is lower in 3.2.0, but the overall time taken to execute them is higher in 3.2.0 than in 3.1.3. Further, the time taken to do lgetxattr of trusted.gfid is much higher in 3.2.0 than in 3.1.3.

Here is the comparison:
On 3.1.3, lgetxattr of trusted.gfid:
  62.3767 seconds total
  208291 calls

On 3.2.0, lgetxattr of trusted.gfid:
  136.881 seconds total
  195520 calls

So, for some reason, even though the number of calls is lower in 3.2.0, the time spent in getxattr is higher on the same RHEL version. We should probably involve Brian as well?
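For reference, the per-version numbers above can be re-derived from a trace captured with "strace -f -T" using something like the sketch below (not from the original comment; the trace file name is a placeholder). With -T, strace appends the time spent in each syscall in angle brackets.

#!/usr/bin/env python
# Hypothetical helper: sum the time spent in lgetxattr() calls on
# "trusted.gfid" in an strace log captured with "strace -f -T".
import re
import sys

CALL_RE = re.compile(r'lgetxattr\(.*"trusted\.gfid".*<(\d+\.\d+)>')

def gfid_getxattr_time(path):
    total, calls = 0.0, 0
    with open(path) as f:
        for line in f:
            m = CALL_RE.search(line)
            if m:
                total += float(m.group(1))
                calls += 1
    return total, calls

if __name__ == "__main__":
    secs, n = gfid_getxattr_time(sys.argv[1])
    print("%d calls, %.3f seconds" % (n, secs))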

Brian,
     What should be debugged further to find the root cause of the extra delay? Could you advise?

Comment 15 Brian Foster 2017-02-09 12:36:21 UTC
(In reply to Pranith Kumar K from comment #14)
...
> 
> Brian,
>      What should be debugged further to find RC as to why there is more
> delay? Could you advise?

If the same RHEL version is involved, I assume the tests use the exact same volume configuration as well? E.g., there are no default volumes being added or removed between the two tests?

That aside, there are probably a few approaches you could use to narrow this down. One is to look at ftrace or perf data; I'm not sure how much that would help if the difference is in userspace, however. I suppose you could pull translators out of the newer configuration (or actually both) one at a time and compare, to isolate where the regression lives. Finally, if you can't get anywhere with that, I suspect a git bisect will point out where the regression was introduced, given you can consistently reproduce.
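For the bisect approach, a test script along the lines of the sketch below could be driven by "git bisect run". This is only a sketch, not from the original comment: it assumes the bisected glusterfs bits have already been rebuilt and redeployed by some external step before each run, and the smallfile path, mount point and pass/fail threshold are placeholders.

#!/usr/bin/env python
# Hypothetical "git bisect run" test script: exit 0 (good) if append
# throughput is above a chosen threshold, 1 (bad) otherwise.  Rebuilding
# and redeploying the bisected glusterfs bits is assumed to happen before
# this script runs; paths and the threshold are placeholders.
import re
import subprocess
import sys

THRESHOLD = 1500.0  # files/sec; roughly midway between the 3.1.3 and 3.2 numbers

def append_files_per_sec():
    out = subprocess.check_output([
        "python", "/small-files/smallfile/smallfile_cli.py",
        "--operation", "append", "--threads", "8",
        "--file-size", "64", "--files", "10000",
        "--top", "/gluster-mount",
    ]).decode()
    m = re.search(r"([\d.]+) files/sec", out)
    return float(m.group(1)) if m else 0.0

if __name__ == "__main__":
    sys.exit(0 if append_files_per_sec() >= THRESHOLD else 1)

After marking a known-good (3.1.3) and a known-bad (3.2) commit, this would be invoked as "git bisect run python bisect_test.py" (script name hypothetical).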

Comment 16 Ravishankar N 2017-02-24 04:23:52 UTC
I ran a scaled-down test (--files=1000 instead of 10000) on Fedora VMs on my laptop and saw slightly better performance for rhgs 3.2 compared to rhgs 3.1.3.

Volume: 2x3 dist-rep volume using 3 Fedora 22 VMs (default volume options).
Client: FUSE mount on a Fedora 24 VM.

Test ran:
1. Create the files:
# python smallfile_cli.py --operation create --threads 8 --file-size 64 --files 1000 --top /mnt/fuse_mount
2. umount, drop caches and remount.
3. Append to the files:
# python smallfile_cli.py --operation append --threads 8 --file-size 64 --files 1000 --top /mnt/fuse_mnt
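A small helper for step 2 might look like the sketch below (not from the original comment; the server name, volume name and mount point are placeholders, and it assumes root on the client).

#!/usr/bin/env python
# Hypothetical helper for step 2: unmount the FUSE client, drop the kernel
# caches, and remount.  Server, volume and mount point are placeholders.
import subprocess

def remount(server="server1", volume="testvol", mountpoint="/mnt/fuse_mount"):
    subprocess.check_call(["umount", mountpoint])
    subprocess.check_call(["sync"])
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")
    subprocess.check_call([
        "mount", "-t", "glusterfs",
        "%s:/%s" % (server, volume), mountpoint,
    ])

if __name__ == "__main__":
    remount()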


3.1.3 Results from 2 test runs for append:
------------------------------------------
100.00% of requested files processed, minimum is  90.00
159.136881 sec elapsed time
50.271188 files/sec
50.271188 IOPS
3.141949 MB/sec


100.00% of requested files processed, minimum is  90.00
161.230330 sec elapsed time
49.618456 files/sec
49.618456 IOPS
3.101153 MB/sec

3.2 Results from 2 test runs for append:
----------------------------------------
100.00% of requested files processed, minimum is  90.00
137.299699 sec elapsed time
58.266697 files/sec
58.266697 IOPS
3.641669 MB/sec


100.00% of requested files processed, minimum is  90.00
137.465230 sec elapsed time
58.196535 files/sec
58.196535 IOPS
3.637283 MB/sec

So it is around 50 files/sec in 3.1.3 vs 58 files/sec on 3.2

Comment 17 Ravishankar N 2017-02-25 13:11:32 UTC
I ran the tests on the perf machines using a 2x3 setup.

1) 3.1.3 with default volume options
==================================
Did 3 runs:
426.198634 files/sec <---This can be ignored since XFS was freshly formatted.
291.988836 files/sec
299.597521 files/sec



2) rhgs-3.2 with default volume options (HEAD at "ganesha/scripts : restart pcs cluster during add node")
======================================
Did 3 runs:
300.858783 files/sec
278.304994 files/sec
278.231024 files/sec

3) rhgs-3.2 with the following options:
===========================================
  gluster volume set testvol features.cache-invalidation on
  gluster volume set testvol features.cache-invalidation-timeout 600
  gluster volume set testvol performance.stat-prefetch on
  gluster volume set testvol performance.cache-samba-metadata on     
  gluster volume set testvol performance.cache-invalidation on
  gluster volume set testvol performance.md-cache-timeout 600

Did 1 run:
296.241372 files/sec


Between 1 and 2, the perf drop is ~7%. 
Between 1 and 3, there is no drop.

Will also test on a 12x3 setup tomorrow.

Comment 18 Ravishankar N 2017-02-26 13:01:33 UTC
Ran the tests on a 12x3 dist-rep volume with the following options set:
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
server.event-threads: 4
client.event-threads: 4
cluster.lookup-optimize: on
transport.address-family: inet
performance.readdir-ahead: on

1) rhgs-3.1.3: Ran 3 iterations:
1197.813533 files/sec
1101.542113 files/sec
1084.906356 files/sec

Avg=1128

2) rhgs-3.2: Ran 3 iterations:
1162.215698 files/sec
1075.400852 files/sec
1028.074938 files/sec

Avg=1089

3) rhgs-3.2 with the 2 extra md-cache options (2 iterations):
performance.cache-invalidation: on
performance.md-cache-timeout: 600
1053.610636 files/sec
1044.966673 files/sec

Avg=1049

Perf drop between the average values of 1 and 2 = 3.5%
Perf drop between the average values of 1 and 3 = 7%

Comment 19 Ravishankar N 2017-02-26 13:08:54 UTC
Created attachment 1257821 [details]
Results for comment #18

