Bug 1406723
Summary:          [Perf]: Significant performance regression seen with disperse volume when compared with 3.1.3
Product:          [Red Hat Storage] Red Hat Gluster Storage
Component:        disperse
Status:           CLOSED ERRATA
Severity:         urgent
Priority:         unspecified
Version:          rhgs-3.2
Target Release:   RHGS 3.2.0
Fixed In Version: glusterfs-3.8.4-17
Hardware:         Unspecified
OS:               Unspecified
Keywords:         Regression
Type:             Bug
Reporter:         Nag Pavan Chilakam <nchilaka>
Assignee:         Ashish Pandey <aspandey>
QA Contact:       Ambarish <asoman>
CC:               amukherj, asoman, aspandey, pkarampu, rcyriac, rhinduja, rhs-bugs, storage-qa-internal
Doc Type:         If docs needed, set a value
Clones:           1408809 (view as bug list)
Bug Blocks:       1351528, 1408809
Last Closed:      2017-03-23 05:58:45 UTC
Description (Nag Pavan Chilakam, 2016-12-21 10:37:56 UTC)
Below are the numbers (in both cases quota was enabled and USS was turned on) for a 2x(4+2) volume on RHEL 7.3.

File operation                              3.1.3         3.2
touch to create 10000 new files             1min 12sec    3min 9sec
Linux untar of kernel image 4.9             25min 23sec   43min 15sec
ls -lRt of untarred directory               51sec         59.3sec
rm -rf of the 10k files                     50sec         1min 15sec
stat * of the folder hosting 10000 files    8sec          14sec

Operation   3.1.3 (sec)   3.2 (sec)   Drop in performance (%)
touch       72            189         61.90
untar       1523          2595        41.31
ls          51            59          13.56
rm -rf      50            75          33.33
stat        8             14          42.86

(The drop is computed relative to the 3.2 time, i.e. (t_3.2 - t_3.1.3) / t_3.2 * 100.)

3.1.3 numbers (run in the 3.2 time frame):

Setup info:
- EC volume, build 3.7.9-12 (2nd async build after GA)
- RHEL 7.3, 6 VMs of 8GB each
- Client: 16GB, RHEL 7.3

[root@dhcp35-37 ~]# gluster v info

Volume Name: disperse
Type: Distributed-Disperse
Volume ID: ccede272-2cde-4b55-be94-51581289eb56
Status: Started
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.35.37:/rhs/brick2/disperse
Brick2: 10.70.35.116:/rhs/brick2/disperse
Brick3: 10.70.35.239:/rhs/brick2/disperse
Brick4: 10.70.35.135:/rhs/brick2/disperse
Brick5: 10.70.35.8:/rhs/brick2/disperse
Brick6: 10.70.35.196:/rhs/brick2/disperse
Brick7: 10.70.35.37:/rhs/brick3/disperse
Brick8: 10.70.35.116:/rhs/brick3/disperse
Brick9: 10.70.35.239:/rhs/brick3/disperse
Brick10: 10.70.35.135:/rhs/brick3/disperse
Brick11: 10.70.35.8:/rhs/brick3/disperse
Brick12: 10.70.35.196:/rhs/brick3/disperse
Options Reconfigured:
features.uss: enable
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
performance.readdir-ahead: on

[root@dhcp35-37 ~]# gluster v status
Status of volume: disperse
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.37:/rhs/brick2/disperse      49154     0          Y       22200
Brick 10.70.35.116:/rhs/brick2/disperse     49154     0          Y       21974
Brick 10.70.35.239:/rhs/brick2/disperse     49154     0          Y       21982
Brick 10.70.35.135:/rhs/brick2/disperse     49154     0          Y       21966
Brick 10.70.35.8:/rhs/brick2/disperse       49154     0          Y       21998
Brick 10.70.35.196:/rhs/brick2/disperse     49154     0          Y       21999
Brick 10.70.35.37:/rhs/brick3/disperse      49155     0          Y       22219
Brick 10.70.35.116:/rhs/brick3/disperse     49155     0          Y       21993
Brick 10.70.35.239:/rhs/brick3/disperse     49155     0          Y       22001
Brick 10.70.35.135:/rhs/brick3/disperse     49155     0          Y       21985
Brick 10.70.35.8:/rhs/brick3/disperse       49155     0          Y       22017
Brick 10.70.35.196:/rhs/brick3/disperse     49155     0          Y       22018
Snapshot Daemon on localhost                49156     0          Y       22343
NFS Server on localhost                     2049      0          Y       22353
Self-heal Daemon on localhost               N/A       N/A        Y       22244
Quota Daemon on localhost                   N/A       N/A        Y       22298
Snapshot Daemon on 10.70.35.135             49156     0          Y       22089
NFS Server on 10.70.35.135                  2049      0          Y       22097
Self-heal Daemon on 10.70.35.135            N/A       N/A        Y       22012
Quota Daemon on 10.70.35.135                N/A       N/A        Y       22055
Snapshot Daemon on 10.70.35.196             49156     0          Y       22123
NFS Server on 10.70.35.196                  2049      0          Y       22131
Self-heal Daemon on 10.70.35.196            N/A       N/A        Y       22045
Quota Daemon on 10.70.35.196                N/A       N/A        Y       22086
Snapshot Daemon on 10.70.35.239             49156     0          Y       22105
NFS Server on 10.70.35.239                  2049      0          Y       22114
Self-heal Daemon on 10.70.35.239            N/A       N/A        Y       22028
Quota Daemon on 10.70.35.239                N/A       N/A        Y       22069
Snapshot Daemon on 10.70.35.116             49156     0          Y       22097
NFS Server on 10.70.35.116                  2049      0          Y       22105
Self-heal Daemon on 10.70.35.116            N/A       N/A        Y       22020
Quota Daemon on 10.70.35.116                N/A       N/A        Y       22061
Snapshot Daemon on 10.70.35.8               49156     0          Y       22121
NFS Server on 10.70.35.8                    2049      0          Y       22129
Self-heal Daemon on 10.70.35.8              N/A       N/A        Y       22048
Quota Daemon on 10.70.35.8                  N/A       N/A        Y       22088

Task Status of Volume disperse
------------------------------------------------------------------------------
There are no active volume tasks
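For completeness, here is a sketch (not from the report) of the gluster CLI commands that would produce an equivalent 2x(4+2) distributed-disperse volume with the options shown above. The brick paths are the ones from this bug; the exact option set applied on the original testbed is an assumption reconstructed from the "Options Reconfigured" listing.

# 2 x (4+2) layout: disperse-data 4, redundancy 2, 12 bricks
gluster volume create disperse disperse-data 4 redundancy 2 \
    10.70.35.37:/rhs/brick2/disperse 10.70.35.116:/rhs/brick2/disperse \
    10.70.35.239:/rhs/brick2/disperse 10.70.35.135:/rhs/brick2/disperse \
    10.70.35.8:/rhs/brick2/disperse 10.70.35.196:/rhs/brick2/disperse \
    10.70.35.37:/rhs/brick3/disperse 10.70.35.116:/rhs/brick3/disperse \
    10.70.35.239:/rhs/brick3/disperse 10.70.35.135:/rhs/brick3/disperse \
    10.70.35.8:/rhs/brick3/disperse 10.70.35.196:/rhs/brick3/disperse
gluster volume start disperse
gluster volume quota disperse enable          # enables features.quota / inode-quota
gluster volume set disperse features.quota-deem-statfs on
gluster volume set disperse features.uss enable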
[root@dhcp35-37 ~]# cat /etc/redhat-*
cat: /etc/redhat-access-insights: Is a directory
Red Hat Enterprise Linux Server release 7.3 (Maipo)
Red Hat Gluster Storage Server 3.1 Update 3

[root@dhcp35-37 ~]# rpm -qa|grep gluster
glusterfs-libs-3.7.9-12.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-fuse-3.7.9-12.el7rhgs.x86_64
glusterfs-client-xlators-3.7.9-12.el7rhgs.x86_64
glusterfs-server-3.7.9-12.el7rhgs.x86_64
python-gluster-3.7.9-12.el7rhgs.noarch
glusterfs-api-3.7.9-12.el7rhgs.x86_64
glusterfs-geo-replication-3.7.9-12.el7rhgs.x86_64
vdsm-gluster-4.17.33-1.el7rhgs.noarch
glusterfs-3.7.9-12.el7rhgs.x86_64
glusterfs-cli-3.7.9-12.el7rhgs.x86_64
gluster-nagios-addons-0.2.7-1.el7rhgs.x86_64

####### FINDINGS ###########################
FUSE mount (all bricks are up):

touch of 10k files in /rootOFvol/dir1/ ===> took 1min 12sec
[root@rhs-client45 dir1]# date;touch file{1..10000};date
Wed Dec 21 15:40:31 IST 2016
Wed Dec 21 15:41:43 IST 2016

Immediately afterwards, ls -lRt on the volume root ===> took less than 1sec to start displaying, and the output completed in 1sec
real 0m0.920s
user 0m0.084s
sys 0m0.126s

Immediately afterwards, find * on the volume root ===> took less than 1sec to start displaying, and the output completed in 1sec
real 0m0.863s
user 0m0.014s
sys 0m0.021s

Immediately afterwards, stat * on /rootOfvol/dir1/ ===> took about 1sec to respond and 8sec to complete the output
real 0m8.358s
user 0m0.482s
sys 0m0.651s

Then rm -rf on the volume root ===> took about 50.7sec
[root@rhs-client45 disperse]# ls
dir1
[root@rhs-client45 disperse]# time rm -rf *
real 0m50.711s
user 0m0.045s
sys 0m0.841s

Linux untar: downloaded the 4.9 kernel image (89MB) into /rootOFvol/dir2/; the untarred folder size was 695MB ===> the untar took about 25min 23sec:
real 25m23.259s
user 0m14.129s
sys 0m23.510s

ls -lRt of the untarred folder took in total:
real 0m50.975s
user 0m0.701s
sys 0m1.292s
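The same sequence of operations can be driven as a single script; a minimal sketch follows (not part of the original report; /mnt/disperse is a hypothetical FUSE mount point of the test volume).

#!/bin/bash
# Sketch of the workload measured above, run against a FUSE mount.
MNT=/mnt/disperse                     # hypothetical mount point
mkdir -p "$MNT/dir1" "$MNT/dir2"

cd "$MNT/dir1"
time touch file{1..10000}             # small-file creates
cd "$MNT"
time ls -lRt > /dev/null              # recursive listing
time find * > /dev/null               # namespace crawl
time stat dir1/* > /dev/null          # per-file metadata reads
time rm -rf dir1                      # small-file deletes

cd "$MNT/dir2"
# kernel tarball fetched beforehand, e.g. linux-4.9.tar.xz (~89MB)
time tar xf linux-4.9.tar.xz          # untar workload
time ls -lRt linux-4.9 > /dev/null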
3.2 numbers:

Setup info:
- EC volume, build 3.8.4-9
- RHEL 7.3, 6 VMs of 8GB each
- Client: 16GB, RHEL 7.3

Volume Name: disperse
Type: Distributed-Disperse
Volume ID: ef4f768e-4b10-4c81-8053-adafaa1183db
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.35.37:/rhs/brick2/disperse
Brick2: 10.70.35.116:/rhs/brick2/disperse
Brick3: 10.70.35.239:/rhs/brick2/disperse
Brick4: 10.70.35.135:/rhs/brick2/disperse
Brick5: 10.70.35.8:/rhs/brick2/disperse
Brick6: 10.70.35.196:/rhs/brick2/disperse
Brick7: 10.70.35.37:/rhs/brick3/disperse
Brick8: 10.70.35.116:/rhs/brick3/disperse
Brick9: 10.70.35.239:/rhs/brick3/disperse
Brick10: 10.70.35.135:/rhs/brick3/disperse
Brick11: 10.70.35.8:/rhs/brick3/disperse
Brick12: 10.70.35.196:/rhs/brick3/disperse
Options Reconfigured:
features.uss: enable
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on

####### FINDINGS ###########################
FUSE mount (all bricks are up):

touch of 10k files in /rootOFvol/dir1/ ===> took 3min 09sec
Immediately afterwards, ls -lRt on the volume root ===> took less than 1sec to start displaying, and the output completed in 1sec
Immediately afterwards, find * on the volume root ===> took less than 1sec to start displaying, and the output completed in 1sec
Immediately afterwards, stat * on /rootOfvol/dir1/ ===> took about 1sec to respond and 14sec to complete the output
Then rm -rf on the volume root ===> took about 1min 15.7sec

Linux untar: downloaded the 4.9 kernel image (89MB) into /rootOFvol/dir2/; the untarred folder size was 695MB ===> the untar took about 43min 15sec:
real 43m15.552s
user 0m16.014s
sys 0m35.983s

ls -lRt of the untarred folder took in total:
real 0m59.373s

For the actual numbers, refer to https://docs.google.com/spreadsheets/d/1T0pqXuL8mnIMMATwNGVWVTuJLSwziQvwi7LGr0LSgnk/edit#gid=0
Also attaching the 3.2 and 3.1.3 numbers separately.

Created attachment 1234395 [details]
3.2 numbers
Created attachment 1234396 [details]
3.1.3 numbers
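For reference, the drop-in-performance column in the comparison table above follows directly from the raw per-operation seconds; a minimal sketch of the arithmetic (values hard-coded from the table, not read from the attachments):

#!/bin/bash
# Recompute "drop in performance (%)" = (t_3.2 - t_3.1.3) / t_3.2 * 100
# from the per-operation timings in seconds quoted above.
while read op old new; do
    awk -v op="$op" -v o="$old" -v n="$new" \
        'BEGIN { printf "%-6s %6.2f%%\n", op, (n - o) / n * 100 }'
done <<'EOF'
touch 72 189
untar 1523 2595
ls 51 59
rm-rf 50 75
stat 8 14
EOF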
I have rerun the measurements without enabling quota/USS and updated the numbers in https://docs.google.com/spreadsheets/d/1T0pqXuL8mnIMMATwNGVWVTuJLSwziQvwi7LGr0LSgnk/edit#gid=0. I still see the degradation.

**Perf Data on physical machines**

*Setup/Environment details*:
Testbed: 12*(4+2), 6 servers, 6 workload-generating clients.
Benchmark: 3.1.3 with io-threads enabled. 3.2 testing was done with io-threads enabled and mdcache parameters set.

*********** OBSERVATION ***********

**FUSE**:

---------
Creates
---------
3.1.3 : 3445 files/sec
3.2   : 1841 files/sec
Regression : -46%

--------
Renames
--------
3.1.3 : 724 files/sec
3.2   : 592 files/sec
Regression : -18%

The mkdir regression on FUSE and gNFS is tracked via https://bugzilla.redhat.com/show_bug.cgi?id=1408655

Upstream patch: http://review.gluster.org/#/c/16298/

A new upstream patch, https://review.gluster.org/#/c/16821/, has been posted with a different approach.

Small-file workloads are well within the regression threshold my runs are allowed to have. Verified on 3.8.4-18.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html