Description of problem:
-----------------------

There is a regression in large file reads/writes on Ganesha v4 mounts (v3 not tested yet) with the Ganesha rebase.

The benchmark was taken on RHGS 3.1.1 + Ganesha 2.2.0-9 and compared with the current RHGS 3.1.3 build (3.7.9-3) + Ganesha 2.3.1-4. There is a ~26% decrease in performance on sequential writes and reads.

There was no failback/failover happening that might have affected I/O performance; I checked for that in the logs.

Even the baseline numbers on RHGS 3.1.1 and Ganesha 2.2.0 (vers=4) are nowhere close to what I see on gNFS mounts under the same workload. If NFS-Ganesha is THE future, it would be good to have it as performant as its "soon-to-be-obsolete" counterpart, gNFS.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------

For the current build:

[root@gqas001 rpm2]# rpm -qa|grep ganesha
nfs-ganesha-2.3.1-4.el6rhs.x86_64
nfs-ganesha-gluster-2.3.1-4.el6rhs.x86_64
glusterfs-ganesha-3.7.9-3.el6rhs.x86_64
[root@gqas001 rpm2]#

For the baseline:

[root@gqas014 ~]# rpm -qa|grep ganesha
nfs-ganesha-2.2.0-9.el6rhs.x86_64
glusterfs-ganesha-3.7.1-16.el6rhs.x86_64
nfs-ganesha-gluster-2.2.0-9.el6rhs.x86_64
[root@gqas014 ~]#

How reproducible:
-----------------

100%

Steps to Reproduce:
-------------------

1. Create a 2x2 volume. Mount it via NFS-Ganesha with vers=4.
2. Run the Iozone sequential write, sequential read, and random R/W workloads (see the reproduction sketch after the exact workload below).
3. Check for regression against older builds.

Actual results:
---------------

Reads/writes are slow in general. There is also a >10% regression from the older Ganesha and gluster builds.

Expected results:
-----------------

The regression threshold is 10%.

Additional info:
----------------

10 GbE network

*Vol Conf*:

[root@gqas001 rpm2]# gluster v info testvol

Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 638ef9eb-c536-424d-a0cd-134c1a6271b4
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gqas016.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick0
Brick2: gqas001.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick1
Brick3: gqas014.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick2
Brick4: gqas015.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick3
Options Reconfigured:
ganesha.enable: on
features.cache-invalidation: on
nfs.disable: on
server.allow-insecure: on
performance.stat-prefetch: off
performance.readdir-ahead: on
cluster.lookup-optimize: on
server.event-threads: 4
client.event-threads: 4
nfs-ganesha: enable
cluster.enable-shared-storage: enable
[root@gqas001 rpm2]#

*Other packages*:

[root@gqas001 rpm2]# rpm -qa|grep cman
cman-3.0.12.1-73.el6.1.x86_64
[root@gqas001 rpm2]# rpm -qa|grep pcs
pcs-0.9.139-9.el6.x86_64
[root@gqas001 rpm2]# rpm -qa|grep pacemaker
pacemaker-libs-1.1.12-8.el6.x86_64
pacemaker-cluster-libs-1.1.12-8.el6.x86_64
pacemaker-cli-1.1.12-8.el6.x86_64
pacemaker-1.1.12-8.el6.x86_64
[root@gqas001 rpm2]# rpm -qa|grep ccs
ccs-0.16.2-81.el6.x86_64
*EXACT WORKLOAD*:

Each of these tests is run twice:

> iozone -+m <Iozone config file here> -+h <hostname here> -C -w -c -e -i 0 -+n -r 64k -s 8g -t 16
> iozone -+m <Iozone config file here> -+h <hostname here> -C -w -c -e -i 0 -+n -r 64k -s 8g -t 16
> iozone -+m <Iozone config file here> -+h <hostname here> -C -w -c -e -i 2 -J 3 -+n -r 64k -s 2g -t 16
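For reference, a minimal end-to-end reproduction might look like the following. This is a sketch, assuming the volume is exported as /testvol through the Ganesha virtual IP; nfs-vip.example.com and /mnt/ganesha are hypothetical placeholders, not values from the original setup:

# Mount the volume over NFSv4 via the Ganesha virtual IP
# (VIP and mount point are hypothetical):
mount -t nfs -o vers=4 nfs-vip.example.com:/testvol /mnt/ganesha

# Then drive the workloads exactly as above, e.g. the sequential-write pass:
iozone -+m <Iozone config file here> -+h <hostname here> -C -w -c -e -i 0 -+n -r 64k -s 8g -t 16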
RCA is still in progress.
Created attachment 1158731: glusterfs fixes from 3.7.4 to 3.7.5
Ugggh! I cleared the needinfo flags on Du and Pranith as well. Resetting.
Based on the numbers you provided, I think it has nothing to do with the replicate layer.

Thanks Soumya and Ambarish for the tests.

Pranith
Pranith,

Based on comment #28, SEQUENTIAL_READS shows a performance drop only with replicated volumes. Any insights on that?
(In reply to Ambarish from comment #2)
> *EXACT WORKLOAD*:
>
> Each of these tests is run twice:

I wanted to focus a bit on the "twice" part.

I see mean throughput reported in some of the comments. Can you report the numbers for each of the two runs, so we can get a sense of the variance in the numbers?

Are the files created by the first run deleted before the second run is started?
(In reply to Manoj Pillai from comment #35)
> I wanted to focus a bit on the "twice" part.
>
> I see mean throughput reported in some of the comments. Can you report the
> numbers for each of the two runs, so we can get a sense of the variance in
> the numbers?
>
> Are the files created by the first run deleted before the second run is
> started?

Manoj,

This is on 3.1.3 and Ganesha 2.3.1-4 on a 2x2 volume:

Sequential Writes:
Test 1 : 559257.13 KB/sec
Test 2 : 489149.01 KB/sec

Sequential Reads:
Test 1 : 1692750.89 KB/sec
Test 2 : 1708779.78 KB/sec

Random Reads:
Test 1 : 591827.29 KB/sec
Test 2 : 584802.99 KB/sec

Random Writes:
Test 1 : 120525.31 KB/sec
Test 2 : 127400.71 KB/sec

The mount point was cleared before running another iteration of sequential writes.
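As a quick sanity check on the run-to-run variance Manoj asked about, the spread can be computed directly from the two runs above (plain bc arithmetic on the reported numbers):

# Sequential write: run-to-run drop of ~12.5%
echo "(559257.13 - 489149.01) / 559257.13 * 100" | bc -l

# Sequential read: the two runs agree to within ~1%
echo "(1708779.78 - 1692750.89) / 1692750.89 * 100" | bc -l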
(In reply to Ambarish from comment #28)
> Soumya,
>
> These were my observations on a Dist volume:
>
> *****************************
> RHGS 3.1.1 + Ganesha 2.3.1-5
> *****************************
>
> Sequential Write : 824095.89 KB/sec

Total write calls: 779125
Total stat calls: 786606

> *****************************
> RHGS 3.1.3 + Ganesha 2.3.1-5
> *****************************
>
> Sequential Write : 548978.21 KB/sec

Total write calls: 912569
Total stat calls: 922643

As can be seen above, there is an increase from 3.1.1 to 3.1.3:

1. in the number of write calls, by 133444 (14.6%). Total increase in time = 133444 * 397.1175 us = 53 seconds
2. in the number of stat calls, by 136037 (14.7%). Total increase in time = 136037 * 94.0425 us = 12.8 seconds
3. in the number of fsync calls, by 10 (8.4%). Total increase in time = 10 * 1045113.255 us = 10.5 seconds

The performance drop for sequential writes is around 33.4%.

iozone doesn't give the time readily, but it can be calculated. Once we have the increase in total time, we can compare it with the cumulative time increase across the stat, write and fsync calls. After that we should be able to tell whether the increase in the number of these fops is the root cause of the performance drop.

> I could see a regression in sequential writes on a plain distributed volume
> as well, not much on reads though.
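The per-fop time deltas in comment #38 can be rechecked mechanically; a quick sketch, assuming the quoted averages are per-call latencies in microseconds (e.g. as reported by gluster volume profile):

echo "133444 * 397.1175 / 1000000" | bc -l    # extra write time: ~52.99 s
echo "136037 * 94.0425 / 1000000" | bc -l     # extra stat time:  ~12.79 s
echo "10 * 1045113.255 / 1000000" | bc -l     # extra fsync time: ~10.45 s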
(In reply to Raghavendra G from comment #38)
> As can be seen above, there is an increase from 3.1.1 to 3.1.3:
> 1. in the number of write calls, by 133444 (14.6%). Total increase in time = 133444 * 397.1175 us = 53 seconds
> 2. in the number of stat calls, by 136037 (14.7%). Total increase in time = 136037 * 94.0425 us = 12.8 seconds
> 3. in the number of fsync calls, by 10 (8.4%). Total increase in time = 10 * 1045113.255 us = 10.5 seconds
>
> The performance drop for sequential writes is around 33.4%.

Total time taken on RHGS 3.1.1 = (8388608 * 16)/824095.89 = 162.87 seconds.
Total time taken on RHGS 3.1.3 = (8388608 * 16)/548978.21 = 244.49 seconds.

So, the time difference = 244.49 - 162.87 = 81.62 seconds.

The increase in time due to the increase in write, stat and fsync calls = (53 + 12.8 + 10.5) = 76.3 seconds.

So, I assume the decrease in performance is due to the increase in write, stat and fsync calls.
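The wall-time figures above follow from the total data moved divided by the aggregate throughput: 16 threads each writing an 8 GiB file (-s 8g -t 16) is 8388608 KB * 16. A quick check with bc:

echo "8388608 * 16 / 824095.89" | bc -l    # 3.1.1: ~162.87 s
echo "8388608 * 16 / 548978.21" | bc -l    # 3.1.3: ~244.49 s
echo "244.49 - 162.87" | bc -l             # difference: ~81.62 s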
The above-mentioned fd leak during create is being tracked upstream as bug 1339553. A patch has been posted for this issue: http://review.gluster.org/14532
The patch merged upstream for nfs-ganesha-2.4 should fix this issue: https://review.gerrithub.io/#/c/295524/
Comparing data again for large files, on 3.1.1, 3.1.3 and 3.2:

************************************************
THROUGHPUT VALUES on RHGS 3.1.1 + Ganesha 2.2.0
************************************************

MEAN SEQ WRITE THROUGHPUT : 751385 KB/sec
MEAN SEQ READ THROUGHPUT : 2225940.95 KB/sec
MEAN RAND READ THROUGHPUT : 435995.63 KB/sec
MEAN RAND WRITE THROUGHPUT : 155435.46 KB/sec

************************************************
THROUGHPUT VALUES on RHGS 3.1.3 + Ganesha 2.3.1
************************************************

MEAN SEQ WRITE THROUGHPUT : 559257.13 KB/sec
MEAN SEQ READ THROUGHPUT : 1692750.89 KB/sec
MEAN RAND READ THROUGHPUT : 591827.29 KB/sec
MEAN RAND WRITE THROUGHPUT : 120525.31 KB/sec

************************************************
THROUGHPUT VALUES on RHGS 3.2 (3.8.4-4) + Ganesha 2.4.1
************************************************

MEAN SEQ WRITE THROUGHPUT : 1208539.89 KB/sec
MEAN SEQ READ THROUGHPUT : 1458583.52 KB/sec
MEAN RAND READ THROUGHPUT : 631356.65 KB/sec
MEAN RAND WRITE THROUGHPUT : 92227.03 KB/sec

There is still a 23% regression on random writes relative to 3.1.3 (this can be tracked via this bug) and a 35% regression on large-file sequential reads relative to 3.1.1 (tracked via https://bugzilla.redhat.com/show_bug.cgi?id=1394654). The arithmetic is shown at the end of this comment.

Sequential writes are substantially improved (almost 60% over 3.1.1 and 114% over 3.1.3). Random reads have also increased by almost 45% since 3.1.1.

But until the regressions are fixed, I cannot move this bug to Verified. Moving this bug back to Assigned.
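For reference, the regression and improvement percentages quoted above can be rederived from the throughput tables (bc arithmetic; the baselines are 3.1.3 for random writes and 3.1.1 for sequential reads and the sequential-write gain):

echo "(120525.31 - 92227.03) / 120525.31 * 100" | bc -l      # random write regression:    ~23.5%
echo "(2225940.95 - 1458583.52) / 2225940.95 * 100" | bc -l  # sequential read regression: ~34.5%
echo "(1208539.89 / 751385 - 1) * 100" | bc -l               # seq write gain over 3.1.1:  ~60.8%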
Changing summary to something more appropriate. See Perf Tracker for Large File Perf on Ganesha - https://bugzilla.redhat.com/show_bug.cgi?id=1382084