Bug 1406723
| Summary: | [Perf] : significant Performance regression seen with disperse volume when compared with 3.1.3 | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Nag Pavan Chilakam <nchilaka> |
| Component: | disperse | Assignee: | Ashish Pandey <aspandey> |
| Status: | CLOSED ERRATA | QA Contact: | Ambarish <asoman> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rhgs-3.2 | CC: | amukherj, asoman, aspandey, pkarampu, rcyriac, rhinduja, rhs-bugs, storage-qa-internal |
| Target Milestone: | --- | Keywords: | Regression |
| Target Release: | RHGS 3.2.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-3.8.4-17 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1408809 (view as bug list) | Environment: | |
| Last Closed: | 2017-03-23 05:58:45 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1351528, 1408809 | | |
| Attachments: | 3.2 numbers, 3.1.3 numbers (see below) | | |
Description (Nag Pavan Chilakam, 2016-12-21 10:37:56 UTC)
Below are the numbers (in both cases, quota was enabled and USS was turned on) for a 2x(4+2) volume on RHEL 7.3:

| File Operation | 3.1.3 | 3.2 |
|---|---|---|
| touch to create 10000 new files | 1min 12sec | 3min 9sec |
| linux untar of kernel image 4.9 | 25min 23sec | 43min 15sec |
| ls -lRt of untarred directory | 51sec | 59.3sec |
| rm -rf of the 10k files | 50sec | 1min 15sec |
| stat * of the folder hosting 10000 files | 8sec | 14sec |

| Operation | 3.1.3 (sec) | 3.2 (sec) | Drop in performance (%) |
|---|---|---|---|
| touch | 72 | 189 | 61.9 |
| untar | 1523 | 2595 | 41.3 |
| ls | 51 | 59 | 13.6 |
| rm -rf | 50 | 75 | 33.3 |
| stat | 8 | 14 | 42.9 |
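For reference, the "Drop in performance" column above is the throughput regression, i.e. (1 - t_3.1.3 / t_3.2) * 100, computed from the per-operation timings in seconds. A minimal sketch of that calculation (the awk one-liner is illustrative only and not part of the original test run):

```bash
# Recompute the "Drop in performance" column from the raw timings (seconds).
# The value is the throughput regression: (1 - t_3.1.3 / t_3.2) * 100.
for pair in "touch 72 189" "untar 1523 2595" "ls 51 59" "rm-rf 50 75" "stat 8 14"; do
    set -- $pair                        # split "op old new" into $1 $2 $3
    awk -v op="$1" -v old="$2" -v new="$3" \
        'BEGIN { printf "%-6s %5.1f%%\n", op, (1 - old / new) * 100 }'
done
```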
3.1.3 numbers (run in the 3.2 time frame):
Setup info
EC volume build (3.7.9-12), 2nd async build after GA
rhel 7.3
6 VMs each of 8GB
Client:
16GB RHEL7.3
[root@dhcp35-37 ~]# gluster v info
Volume Name: disperse
Type: Distributed-Disperse
Volume ID: ccede272-2cde-4b55-be94-51581289eb56
Status: Started
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.35.37:/rhs/brick2/disperse
Brick2: 10.70.35.116:/rhs/brick2/disperse
Brick3: 10.70.35.239:/rhs/brick2/disperse
Brick4: 10.70.35.135:/rhs/brick2/disperse
Brick5: 10.70.35.8:/rhs/brick2/disperse
Brick6: 10.70.35.196:/rhs/brick2/disperse
Brick7: 10.70.35.37:/rhs/brick3/disperse
Brick8: 10.70.35.116:/rhs/brick3/disperse
Brick9: 10.70.35.239:/rhs/brick3/disperse
Brick10: 10.70.35.135:/rhs/brick3/disperse
Brick11: 10.70.35.8:/rhs/brick3/disperse
Brick12: 10.70.35.196:/rhs/brick3/disperse
Options Reconfigured:
features.uss: enable
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
performance.readdir-ahead: on
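For anyone reproducing the setup, a volume with this layout and these options could be built roughly as follows. This is only a sketch based on the layout and reconfigured options shown above; the exact commands used for the original setup are not recorded in this report:

```bash
# Sketch: build a 2 x (4+2) distributed-disperse volume from the bricks listed above.
# (gluster may prompt for confirmation about brick placement; this is not the
# exact command history from the original setup.)
gluster volume create disperse disperse-data 4 redundancy 2 \
    10.70.35.37:/rhs/brick2/disperse 10.70.35.116:/rhs/brick2/disperse \
    10.70.35.239:/rhs/brick2/disperse 10.70.35.135:/rhs/brick2/disperse \
    10.70.35.8:/rhs/brick2/disperse   10.70.35.196:/rhs/brick2/disperse \
    10.70.35.37:/rhs/brick3/disperse  10.70.35.116:/rhs/brick3/disperse \
    10.70.35.239:/rhs/brick3/disperse 10.70.35.135:/rhs/brick3/disperse \
    10.70.35.8:/rhs/brick3/disperse   10.70.35.196:/rhs/brick3/disperse
gluster volume start disperse
gluster volume quota disperse enable                       # sets features.quota / features.inode-quota
gluster volume set disperse features.quota-deem-statfs on
gluster volume set disperse features.uss enable            # enable USS
```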
[root@dhcp35-37 ~]# gluster v status
Status of volume: disperse
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.70.35.37:/rhs/brick2/disperse 49154 0 Y 22200
Brick 10.70.35.116:/rhs/brick2/disperse 49154 0 Y 21974
Brick 10.70.35.239:/rhs/brick2/disperse 49154 0 Y 21982
Brick 10.70.35.135:/rhs/brick2/disperse 49154 0 Y 21966
Brick 10.70.35.8:/rhs/brick2/disperse 49154 0 Y 21998
Brick 10.70.35.196:/rhs/brick2/disperse 49154 0 Y 21999
Brick 10.70.35.37:/rhs/brick3/disperse 49155 0 Y 22219
Brick 10.70.35.116:/rhs/brick3/disperse 49155 0 Y 21993
Brick 10.70.35.239:/rhs/brick3/disperse 49155 0 Y 22001
Brick 10.70.35.135:/rhs/brick3/disperse 49155 0 Y 21985
Brick 10.70.35.8:/rhs/brick3/disperse 49155 0 Y 22017
Brick 10.70.35.196:/rhs/brick3/disperse 49155 0 Y 22018
Snapshot Daemon on localhost 49156 0 Y 22343
NFS Server on localhost 2049 0 Y 22353
Self-heal Daemon on localhost N/A N/A Y 22244
Quota Daemon on localhost N/A N/A Y 22298
Snapshot Daemon on 10.70.35.135 49156 0 Y 22089
NFS Server on 10.70.35.135 2049 0 Y 22097
Self-heal Daemon on 10.70.35.135 N/A N/A Y 22012
Quota Daemon on 10.70.35.135 N/A N/A Y 22055
Snapshot Daemon on 10.70.35.196 49156 0 Y 22123
NFS Server on 10.70.35.196 2049 0 Y 22131
Self-heal Daemon on 10.70.35.196 N/A N/A Y 22045
Quota Daemon on 10.70.35.196 N/A N/A Y 22086
Snapshot Daemon on 10.70.35.239 49156 0 Y 22105
NFS Server on 10.70.35.239 2049 0 Y 22114
Self-heal Daemon on 10.70.35.239 N/A N/A Y 22028
Quota Daemon on 10.70.35.239 N/A N/A Y 22069
Snapshot Daemon on 10.70.35.116 49156 0 Y 22097
NFS Server on 10.70.35.116 2049 0 Y 22105
Self-heal Daemon on 10.70.35.116 N/A N/A Y 22020
Quota Daemon on 10.70.35.116 N/A N/A Y 22061
Snapshot Daemon on 10.70.35.8 49156 0 Y 22121
NFS Server on 10.70.35.8 2049 0 Y 22129
Self-heal Daemon on 10.70.35.8 N/A N/A Y 22048
Quota Daemon on 10.70.35.8 N/A N/A Y 22088
Task Status of Volume disperse
------------------------------------------------------------------------------
There are no active volume tasks
[root@dhcp35-37 ~]# cat /etc/redhat-*
cat: /etc/redhat-access-insights: Is a directory
Red Hat Enterprise Linux Server release 7.3 (Maipo)
Red Hat Gluster Storage Server 3.1 Update 3
[root@dhcp35-37 ~]# rpm -qa|grep gluster
glusterfs-libs-3.7.9-12.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-fuse-3.7.9-12.el7rhgs.x86_64
glusterfs-client-xlators-3.7.9-12.el7rhgs.x86_64
glusterfs-server-3.7.9-12.el7rhgs.x86_64
python-gluster-3.7.9-12.el7rhgs.noarch
glusterfs-api-3.7.9-12.el7rhgs.x86_64
glusterfs-geo-replication-3.7.9-12.el7rhgs.x86_64
vdsm-gluster-4.17.33-1.el7rhgs.noarch
glusterfs-3.7.9-12.el7rhgs.x86_64
glusterfs-cli-3.7.9-12.el7rhgs.x86_64
gluster-nagios-addons-0.2.7-1.el7rhgs.x86_64
[root@dhcp35-37 ~]#
####### FINDINGS ###########################
Fuse mount:
(all bricks are up)
touch of 10k files in /rootOFvol/dir1/ ===> took 1min 12sec
[root@rhs-client45 dir1]# date;touch file{1..10000};date
Wed Dec 21 15:40:31 IST 2016
Wed Dec 21 15:41:43 IST 2016
Immediately after the above step, ls -lRt on the volume root ===> took less than 1 sec to start displaying and the output completed in 1 sec
real 0m0.920s
user 0m0.084s
sys 0m0.126s
Immediately after the above step, find * on the volume root ===> took less than 1 sec to start displaying and the output completed in 1 sec
real 0m0.863s
user 0m0.014s
sys 0m0.021s
Immediately after the above step, stat * on /rootOFvol/dir1/ ===> took about 1 sec to respond and 8 sec to complete the full output
real 0m8.358s
user 0m0.482s
sys 0m0.651s
Immediately after the above step, rm -rf on the volume root ===> took about 50.720s
[root@rhs-client45 disperse]# ls
dir1
[root@rhs-client45 disperse]# time rm -rf *
real 0m50.711s
user 0m0.045s
sys 0m0.841s
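The metadata-operation measurements above can be rerun as one sequence. The sketch below assumes the volume is FUSE-mounted at /mnt/disperse (an illustrative path; the report uses /rootOFvol as a placeholder):

```bash
# Sketch: metadata-operation timings in the order used above.
# /mnt/disperse and dir1 are illustrative names, not from the original run.
cd /mnt/disperse && mkdir -p dir1 && cd dir1
time touch file{1..10000}                  # create 10k files
time ls -lRt /mnt/disperse > /dev/null     # recursive listing from the volume root
time find /mnt/disperse/* > /dev/null      # find from the volume root
time stat * > /dev/null                    # stat of the 10k files
cd /mnt/disperse && time rm -rf *          # remove everything from the volume root
```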
Linux Untar:
Downloaded the 4.9 kernel image (size 89MB) into /rootOFvol/dir2/ ===> the untarred folder size was 695MB ===> untar took about 25m 23sec as below
real 25m23.259s
user 0m14.129s
sys 0m23.510s
ls -lRt of the untarred folder took the following time to complete:
real 0m50.975s
user 0m0.701s
sys 0m1.292s
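The untar measurement can be reproduced along these lines; the kernel.org URL and target paths are assumptions, and only the kernel version (4.9) and the sizes come from the report:

```bash
# Sketch: linux untar / ls -lRt measurement on the FUSE mount.
cd /mnt/disperse && mkdir -p dir2 && cd dir2
# linux-4.9.tar.xz is ~89MB compressed, ~695MB unpacked (as noted above)
wget https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.9.tar.xz
time tar -xf linux-4.9.tar.xz              # untar timing
time ls -lRt linux-4.9 > /dev/null         # recursive listing of the untarred tree
```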
3.2 numbers:
Setup info
EC volume build (3.8.4-9)
rhel 7.3
6 VMs each of 8GB
Client:
16GB RHEL7.3
Volume Name: disperse
Type: Distributed-Disperse
Volume ID: ef4f768e-4b10-4c81-8053-adafaa1183db
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.35.37:/rhs/brick2/disperse
Brick2: 10.70.35.116:/rhs/brick2/disperse
Brick3: 10.70.35.239:/rhs/brick2/disperse
Brick4: 10.70.35.135:/rhs/brick2/disperse
Brick5: 10.70.35.8:/rhs/brick2/disperse
Brick6: 10.70.35.196:/rhs/brick2/disperse
Brick7: 10.70.35.37:/rhs/brick3/disperse
Brick8: 10.70.35.116:/rhs/brick3/disperse
Brick9: 10.70.35.239:/rhs/brick3/disperse
Brick10: 10.70.35.135:/rhs/brick3/disperse
Brick11: 10.70.35.8:/rhs/brick3/disperse
Brick12: 10.70.35.196:/rhs/brick3/disperse
Options Reconfigured:
features.uss: enable
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
####### FINDINGS ###########################
Fuse mount:
(all bricks are up)
touch of 10k files in /rootOFvol/dir1/ ===> took 3min 09sec
Immediately after the above step, ls -lRt on the volume root ===> took less than 1 sec to start displaying and the output completed in 1 sec
Immediately after the above step, find * on the volume root ===> took less than 1 sec to start displaying and the output completed in 1 sec
Immediately after the above step, stat * on /rootOFvol/dir1/ ===> took about 1 sec to respond and 14 sec to complete the full output
Immediately after the above step, rm -rf on the volume root ===> took about 1min 15.720s
Linux Untar:
Downloaded the 4.9 kernel image (size 89MB) into /rootOFvol/dir2/ ===> the untarred folder size was 695MB ===> untar took about 43m 15sec as below
real 43m15.552s
user 0m16.014s
sys 0m35.983s
ls -lRt of the untarred folder took the following time to complete:
real 0m59.373s
For the actual numbers, refer to https://docs.google.com/spreadsheets/d/1T0pqXuL8mnIMMATwNGVWVTuJLSwziQvwi7LGr0LSgnk/edit#gid=0
Also attaching the 3.2 and 3.1.3 numbers separately.
Created attachment 1234395 [details]
3.2 numbers
Created attachment 1234396 [details]
3.1.3 numbers
I have rerun the measurements without enabling quota/USS and the results have been updated in https://docs.google.com/spreadsheets/d/1T0pqXuL8mnIMMATwNGVWVTuJLSwziQvwi7LGr0LSgnk/edit#gid=0 . I still see the degradation.

**Perf Data on physical machines**

*Setup/Environment details*:
Testbed: 12*(4+2), 6 servers, 6 workload-generating clients.
Benchmark: 3.1.3 with io-threads enabled. 3.2 testing was done with io-threads enabled and mdcache parameters set.

*********** OBSERVATION ***********

**FUSE**:

Creates
3.1.3 : 3445 files/sec
3.2   : 1841 files/sec
Regression : -46%

Renames
3.1.3 : 724 files/sec
3.2   : 592 files/sec
Regression : -18%

The mkdir regression on FUSE and gNFS is tracked via https://bugzilla.redhat.com/show_bug.cgi?id=1408655

Upstream patch: http://review.gluster.org/#/c/16298/

A new upstream patch https://review.gluster.org/#/c/16821/ is posted with a different alternative.

Small file workloads are well within the regression threshold my runs are allowed to have. Verified on 3.8.4-18.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHSA-2017-0486.html