Description of problem:
-----------------------
du -sh on my Ganesha v4 mount point takes a very long time to complete.

DATA SET: 20G of data, ~5 lakh (~500,000) files spread across 5000 dirs.

On gNFS       : 5m18.147s
On Ganesha v3 : 6m38.112s
On Ganesha v4 : 81m42.983s

du takes ~6 minutes to complete on gNFS and Ganesha v3, but almost 1 hour 22 minutes on Ganesha v4 mounts, on the same data set, each run measured on a fresh set of machines. Nothing else was running on the cluster, and no other I/O ran from the mount point while du -sh was running.

Version-Release number of selected component (if applicable):
--------------------------------------------------------------
nfs-ganesha-2.4.0-2.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-2.el7rhgs.x86_64

How reproducible:
-----------------
Every which way I try.

Steps to Reproduce:
-------------------
1. Mount a 2 x 2 volume via gNFS. Create a huge data set. time du -sh over it. Clean the mount point.
2. Create the data set again on a Ganesha v4 mount. time du -sh while it runs.
(A reproduction sketch follows this comment.)

Actual results:
---------------
du -sh takes a lot of time on Ganesha v4: 81m42.983s against 5m18.147s on gNFS is ~15.4x, almost 16 times slower.

Expected results:
-----------------
du -sh should not take this much time to complete.

Additional info:
----------------
* CLIENT/SERVER OS: RHEL 7.2
* VOLUME CONFIGURATION:

Volume Name: testvol
Type: Distributed-Replicate
Volume ID: b93b99bd-d1d2-4236-98bc-08311f94e7dc
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gqas013.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick0
Brick2: gqas005.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick1
Brick3: gqas006.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick2
Brick4: gqas011.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick3
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
ganesha.enable: on
features.cache-invalidation: off
nfs.disable: on
performance.readdir-ahead: on
performance.stat-prefetch: off
server.allow-insecure: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable
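For reference, a minimal reproduction sketch of the steps above. This is illustrative only: the data-set generator, mount paths, and the VIP variable are assumptions (the original layout is only known to be ~500,000 files across 5000 dirs totaling 20G; 5000 dirs x 100 files x 40K per file gives roughly that).

#!/bin/bash
# Reproduction sketch; see lead-in for assumptions.
VIP="ganesha-vip.example.com"   # placeholder: the ganesha cluster VIP

# Generate ~500,000 files across 5000 dirs, ~20G total (illustrative generator).
populate() {
    ( cd "$1" || exit 1
      for d in $(seq 1 5000); do
          mkdir -p "dir$d"
          for f in $(seq 1 100); do
              dd if=/dev/zero of="dir$d/file$f" bs=4k count=10 2>/dev/null
          done
      done )
}

# Step 1: gNFS (NFSv3; requires nfs.disable off on the volume at this point).
mount -t nfs -o vers=3 gqas013.sbu.lab.eng.bos.redhat.com:/testvol /mnt/gnfs
populate /mnt/gnfs
( cd /mnt/gnfs && time du -sh )
rm -rf /mnt/gnfs/dir*           # clean the mount point

# Step 2: Ganesha v4 via the cluster VIP.
mount -t nfs -o vers=4 "$VIP":/testvol /mnt/ganesha-v4
populate /mnt/ganesha-v4
( cd /mnt/ganesha-v4 && time du -sh )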
As noted in https://bugzilla.redhat.com/show_bug.cgi?id=1383559#c5, please collect the relevant information and update it here when this test case is run on a v4 mount.
Per triage, we all agree that this BZ has to be fixed in rhgs-3.2.0. Providing qa_ack.
Upstream fixes:
https://review.gerrithub.io/304278
https://review.gerrithub.io/304279
This should be moved out of 3.4, since the dirent-chunk code has been removed.
Verified this BZ with:

# rpm -qa | grep ganesha
nfs-ganesha-2.7.3-7.el7rhgs.x86_64
glusterfs-ganesha-6.0-11.el7rhgs.x86_64
nfs-ganesha-gluster-2.7.3-7.el7rhgs.x86_64

Steps performed for verification:
1. Create a 4-node ganesha cluster.
2. Create a 2 x (4 + 2) Distributed-Disperse volume. Enable ganesha on the volume.
3. Mount the volume on 4 clients (v3/v4.1) via the same VIP.
4. Create a huge data set consisting of small, large, and empty directories.
5. Run du -sh on the v3 and v4.1 mounts. (A setup sketch follows this comment.)

v3 mount
--------
# time du -sh
28G     .

real    23m12.490s
user    0m4.817s
sys     1m0.564s

v4.1 mount
----------
# time du -sh
28G     .

real    11m3.559s
user    0m1.582s
sys     0m22.307s

# time du -sh
28G     .

real    31m20.569s
user    0m7.041s
sys     1m46.864s

Moving this BZ to verified state.
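For completeness, a sketch of the setup commands behind steps 2-5, assuming the 4-node ganesha cluster from step 1 is already configured. The node names, brick paths, volume name dispvol, and VIP are placeholders, not the actual test environment:

# 2 x (4 + 2) Distributed-Disperse volume: 12 bricks forming 2 disperse
# subvolumes of 4 data + 2 redundancy each. gluster warns about multiple
# bricks per node in one subvolume, hence "force".
gluster volume create dispvol disperse 6 redundancy 2 \
    node{1..4}:/bricks/dispvol_brick{1..3} force
gluster volume start dispvol
gluster volume set dispvol ganesha.enable on   # export via nfs-ganesha

# On each of the 4 clients, mount via the same VIP (v3 or v4.1 per client):
VIP="ganesha-vip.example.com"   # placeholder
mount -t nfs -o vers=3   "$VIP":/dispvol /mnt/v3
mount -t nfs -o vers=4.1 "$VIP":/dispvol /mnt/v41

# Time du -sh from the mount root on each version:
( cd /mnt/v3  && time du -sh )
( cd /mnt/v41 && time du -sh )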
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:3252