Description of problem:
-----------------------

There is a regression in large file reads/writes on Ganesha v4 mounts (v3 not tested yet) with the Ganesha rebase.

The benchmark was taken on RHGS 3.1.1 + Ganesha 2.2.0-9 and compared with the current RHGS 3.1.3 build (3.7.9-3) + Ganesha 2.3.1-4. There is a ~26% decrease in performance on sequential writes and reads.

There was no failback/failover happening that might have affected I/O performance; I checked for that in the logs.

Even the baseline numbers on RHGS 3.1.1 and Ganesha 2.2.0 (vers=4) are nowhere close to what I see on gNFS mounts under the same workload. If NFS-Ganesha is THE future, it would be good to have it as performant as its "soon-to-be-obsolete" counterpart, gNFS.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------

For the current build:

[root@gqas001 rpm2]# rpm -qa|grep ganesha
nfs-ganesha-2.3.1-4.el6rhs.x86_64
nfs-ganesha-gluster-2.3.1-4.el6rhs.x86_64
glusterfs-ganesha-3.7.9-3.el6rhs.x86_64
[root@gqas001 rpm2]#

For the baseline:

[root@gqas014 ~]# rpm -qa|grep ganesha
nfs-ganesha-2.2.0-9.el6rhs.x86_64
glusterfs-ganesha-3.7.1-16.el6rhs.x86_64
nfs-ganesha-gluster-2.2.0-9.el6rhs.x86_64
[root@gqas014 ~]#

How reproducible:
-----------------

100%

Steps to Reproduce:
-------------------

1. Create a 2x2 volume. Mount it via NFS-Ganesha with vers=4.
2. Run the Iozone sequential write, sequential read, and random R/W workloads (see the reproduction sketch after the exact workload below).
3. Check for regression against older builds.

Actual results:
---------------

Reads/writes are slow in general. There is also a >10% regression from the older Ganesha and gluster builds.

Expected results:
-----------------

The regression threshold is 10%.

Additional info:
----------------

10 GbE network

*Vol Conf*:

[root@gqas001 rpm2]# gluster v info testvol

Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 638ef9eb-c536-424d-a0cd-134c1a6271b4
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gqas016.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick0
Brick2: gqas001.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick1
Brick3: gqas014.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick2
Brick4: gqas015.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick3
Options Reconfigured:
ganesha.enable: on
features.cache-invalidation: on
nfs.disable: on
server.allow-insecure: on
performance.stat-prefetch: off
performance.readdir-ahead: on
cluster.lookup-optimize: on
server.event-threads: 4
client.event-threads: 4
nfs-ganesha: enable
cluster.enable-shared-storage: enable
[root@gqas001 rpm2]#

*Other packages*:

[root@gqas001 rpm2]# rpm -qa|grep cman
cman-3.0.12.1-73.el6.1.x86_64
[root@gqas001 rpm2]# rpm -qa|grep pcs
pcs-0.9.139-9.el6.x86_64
[root@gqas001 rpm2]# rpm -qa|grep pacemaker
pacemaker-libs-1.1.12-8.el6.x86_64
pacemaker-cluster-libs-1.1.12-8.el6.x86_64
pacemaker-cli-1.1.12-8.el6.x86_64
pacemaker-1.1.12-8.el6.x86_64
[root@gqas001 rpm2]# rpm -qa|grep ccs
ccs-0.16.2-81.el6.x86_64
*EXACT WORKLOAD*:

Each of these tests is run twice:

> iozone -+m <Iozone config file here> -+h <hostname here> -C -w -c -e -i 0 -+n -r 64k -s 8g -t 16
> iozone -+m <Iozone config file here> -+h <hostname here> -C -w -c -e -i 0 -+n -r 64k -s 8g -t 16
> iozone -+m <Iozone config file here> -+h <hostname here> -C -w -c -e -i 2 -J 3 -+n -r 64k -s 2g -t 16
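For reference, a minimal end-to-end reproduction might look like the following. This is a sketch, assuming the volume is exported as /testvol through the Ganesha virtual IP; nfs-vip.example.com and /mnt/ganesha are hypothetical placeholders, not values from the original setup:

# Mount the volume over NFSv4 via the Ganesha virtual IP
# (VIP and mount point are hypothetical):
mount -t nfs -o vers=4 nfs-vip.example.com:/testvol /mnt/ganesha

# Then drive the workloads exactly as above, e.g. the sequential-write pass:
iozone -+m <Iozone config file here> -+h <hostname here> -C -w -c -e -i 0 -+n -r 64k -s 8g -t 16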
RCA is still in progress.
Created attachment 1158731: glusterfs fixes from 3.7.4 to 3.7.5
Ugggh! I cleared the needinfo flags on Du and Pranith as well. Resetting.
Based on the numbers you provided, I think it has nothing to do with the replicate layer.

Thanks Soumya and Ambarish for the tests.

Pranith
Pranith,

Based on comment #28, SEQUENTIAL_READS shows a performance drop only with replicated volumes. Any insights on that?
(In reply to Ambarish from comment #2)
> *EXACT WORKLOAD*:
>
> Each of these tests is run twice:

I wanted to focus a bit on the "twice" part.

I see mean throughput reported in some of the comments. Can you report the numbers for each of the two runs, so we can get a sense of the variance in the numbers?

Are the files created by the first run deleted before the second run is started?
(In reply to Manoj Pillai from comment #35)
> I wanted to focus a bit on the "twice" part.
>
> I see mean throughput reported in some of the comments. Can you report the
> numbers for each of the two runs, so we can get a sense of the variance in
> the numbers?
>
> Are the files created by the first run deleted before the second run is
> started?

Manoj,

This is on 3.1.3 and Ganesha 2.3.1-4 on a 2x2 volume:

Sequential Writes:
Test 1 : 559257.13 KB/sec
Test 2 : 489149.01 KB/sec

Sequential Reads:
Test 1 : 1692750.89 KB/sec
Test 2 : 1708779.78 KB/sec

Random Reads:
Test 1 : 591827.29 KB/sec
Test 2 : 584802.99 KB/sec

Random Writes:
Test 1 : 120525.31 KB/sec
Test 2 : 127400.71 KB/sec

The mount point was cleared before running another iteration of sequential writes.
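As a quick sanity check on the run-to-run variance Manoj asked about, the spread can be computed directly from the two runs above (plain bc arithmetic on the reported numbers):

# Sequential write: run-to-run drop of ~12.5%
echo "(559257.13 - 489149.01) / 559257.13 * 100" | bc -l

# Sequential read: the two runs agree to within ~1%
echo "(1708779.78 - 1692750.89) / 1692750.89 * 100" | bc -l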
(In reply to Ambarish from comment #28)
> Soumya,
>
> These were my observations on a Dist volume:
>
> *****************************
> RHGS 3.1.1 + Ganesha 2.3.1-5
> *****************************
>
> Sequential Write : 824095.89 KB/sec

Total write calls: 779125
Total stat calls: 786606

> *****************************
> RHGS 3.1.3 + Ganesha 2.3.1-5
> *****************************
>
> Sequential Write : 548978.21 KB/sec

Total write calls: 912569
Total stat calls: 922643

As can be seen above, there is an increase from 3.1.1 to 3.1.3:

1. in the number of write calls, by 133444 (14.6%). Total increase in time = 133444 * 397.1175 us = 53 seconds
2. in the number of stat calls, by 136037 (14.7%). Total increase in time = 136037 * 94.0425 us = 12.8 seconds
3. in the number of fsync calls, by 10 (8.4%). Total increase in time = 10 * 1045113.255 us = 10.5 seconds

The performance drop for sequential writes is around 33.4%.

iozone doesn't give the time readily, but it can be calculated. Once we have the increase in total time, we can compare it with the cumulative time increase across the stat, write and fsync calls. After that we should be able to tell whether the increase in the number of these fops is the root cause of the performance drop.

> I could see a regression in sequential writes on a plain distributed volume
> as well, not much on reads though.
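The per-fop time deltas in comment #38 can be rechecked mechanically; a quick sketch, assuming the quoted averages are per-call latencies in microseconds (e.g. as reported by gluster volume profile):

echo "133444 * 397.1175 / 1000000" | bc -l    # extra write time: ~52.99 s
echo "136037 * 94.0425 / 1000000" | bc -l     # extra stat time:  ~12.79 s
echo "10 * 1045113.255 / 1000000" | bc -l     # extra fsync time: ~10.45 s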
(In reply to Raghavendra G from comment #38)
> As can be seen above, there is an increase from 3.1.1 to 3.1.3:
> 1. in the number of write calls, by 133444 (14.6%). Total increase in time = 133444 * 397.1175 us = 53 seconds
> 2. in the number of stat calls, by 136037 (14.7%). Total increase in time = 136037 * 94.0425 us = 12.8 seconds
> 3. in the number of fsync calls, by 10 (8.4%). Total increase in time = 10 * 1045113.255 us = 10.5 seconds
>
> The performance drop for sequential writes is around 33.4%.

Total time taken on RHGS 3.1.1 = (8388608 * 16)/824095.89 = 162.87 seconds.
Total time taken on RHGS 3.1.3 = (8388608 * 16)/548978.21 = 244.49 seconds.

So, the time difference = 244.49 - 162.87 = 81.62 seconds.

The increase in time due to the increase in write, stat and fsync calls = (53 + 12.8 + 10.5) = 76.3 seconds.

So, I assume the decrease in performance is due to the increase in write, stat and fsync calls.
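The wall-time figures above follow from the total data moved divided by the aggregate throughput: 16 threads each writing an 8 GiB file (-s 8g -t 16) is 8388608 KB * 16. A quick check with bc:

echo "8388608 * 16 / 824095.89" | bc -l    # 3.1.1: ~162.87 s
echo "8388608 * 16 / 548978.21" | bc -l    # 3.1.3: ~244.49 s
echo "244.49 - 162.87" | bc -l             # difference: ~81.62 s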
The above-mentioned fd leak during create is being tracked upstream as bug 1339553. A patch has been posted for this issue: http://review.gluster.org/14532
The patch merged upstream for nfs-ganesha-2.4 should fix this issue: https://review.gerrithub.io/#/c/295524/
Comparing data again for large files, on 3.1.1, 3.1.3 and 3.2:

************************************************
THROUGHPUT VALUES on RHGS 3.1.1 + Ganesha 2.2.0
************************************************

MEAN SEQ WRITE THROUGHPUT : 751385 KB/sec
MEAN SEQ READ THROUGHPUT : 2225940.95 KB/sec
MEAN RAND READ THROUGHPUT : 435995.63 KB/sec
MEAN RAND WRITE THROUGHPUT : 155435.46 KB/sec

************************************************
THROUGHPUT VALUES on RHGS 3.1.3 + Ganesha 2.3.1
************************************************

MEAN SEQ WRITE THROUGHPUT : 559257.13 KB/sec
MEAN SEQ READ THROUGHPUT : 1692750.89 KB/sec
MEAN RAND READ THROUGHPUT : 591827.29 KB/sec
MEAN RAND WRITE THROUGHPUT : 120525.31 KB/sec

************************************************
THROUGHPUT VALUES on RHGS 3.2 (3.8.4-4) + Ganesha 2.4.1
************************************************

MEAN SEQ WRITE THROUGHPUT : 1208539.89 KB/sec
MEAN SEQ READ THROUGHPUT : 1458583.52 KB/sec
MEAN RAND READ THROUGHPUT : 631356.65 KB/sec
MEAN RAND WRITE THROUGHPUT : 92227.03 KB/sec

There is still a 23% regression on random writes relative to 3.1.3 (this can be tracked via this bug) and a 35% regression on large-file sequential reads relative to 3.1.1 (tracked via https://bugzilla.redhat.com/show_bug.cgi?id=1394654). The arithmetic is shown at the end of this comment.

Sequential writes are substantially improved (almost 60% over 3.1.1 and 114% over 3.1.3). Random reads have also increased by almost 45% since 3.1.1.

But until the regressions are fixed, I cannot move this bug to Verified. Moving this bug back to Assigned.
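For reference, the regression and improvement percentages quoted above can be rederived from the throughput tables (bc arithmetic; the baselines are 3.1.3 for random writes and 3.1.1 for sequential reads and the sequential-write gain):

echo "(120525.31 - 92227.03) / 120525.31 * 100" | bc -l      # random write regression:    ~23.5%
echo "(2225940.95 - 1458583.52) / 2225940.95 * 100" | bc -l  # sequential read regression: ~34.5%
echo "(1208539.89 / 751385 - 1) * 100" | bc -l               # seq write gain over 3.1.1:  ~60.8%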
Changing summary to something more appropriate. See Perf Tracker for Large File Perf on Ganesha - https://bugzilla.redhat.com/show_bug.cgi?id=1382084