Bug 1382343

Summary: [Perf] : Extremely slow file system crawls during 'du -sh' on Ganesha v4 mounts.
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Ambarish <asoman>
Component: nfs-ganesha
Assignee: Frank Filz <ffilz>
Status: CLOSED ERRATA
QA Contact: Manisha Saini <msaini>
Severity: high
Docs Contact:
Priority: unspecified
Version: rhgs-3.2
CC: amukherj, bkunal, dang, ffilz, jthottan, kkeithle, pasik, rcyriac, rhinduja, rhs-bugs, sanandpa, sheggodu, skoduri, storage-qa-internal
Target Milestone: ---
Keywords: Performance, Triaged
Target Release: RHGS 3.5.0
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: nfs-ganesha-2.7.3-3
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-10-30 12:15:39 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1475699, 1695078
Bug Blocks: 1657798, 1696806

Description Ambarish 2016-10-06 12:11:38 UTC
Description of problem:
-----------------------

du -sh on my Ganesha v4 mount point takes a very long time to complete.

DATA SET : 20G of data, ~5 lakh (500,000) files spread across 5000 dirs.

On gNFS : 5m18.147s
On Ganesha v3 : 6m38.112s
On Ganesha v4 : 81m42.983s

du takes ~6 minutes to complete on gNFS and Ganesha v3, but about 1 hour 22 minutes on Ganesha v4 mounts on the same data set, each one measured on a fresh set of machines.

Nothing else was running on the cluster, and no I/O ran from the mount point while du -sh was running.
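
The comparison between the three timings can be checked with a few lines of Python. This is only a sketch: the three durations are copied from the report above, while the helper name parse_duration and the choice of gNFS as the baseline are assumptions made for illustration.

```python
import re


def parse_duration(text):
    """Convert a bash `time` style duration such as '81m42.983s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Wall-clock timings as reported in this bug.
timings = {
    "gNFS": "5m18.147s",
    "Ganesha v3": "6m38.112s",
    "Ganesha v4": "81m42.983s",
}

seconds = {name: parse_duration(value) for name, value in timings.items()}

# Assumption: gNFS is used as the baseline for the slowdown factor.
baseline = seconds["gNFS"]
slowdown = {name: value / baseline for name, value in seconds.items()}

for name in timings:
    print(f"{name:<11} {seconds[name]:9.3f} s  {slowdown[name]:5.1f}x gNFS")
```

With these inputs the Ganesha v4 run comes out at roughly 15 times the gNFS run and roughly 12 times the Ganesha v3 run.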

Version-Release number of selected component (if applicable):
--------------------------------------------------------------

nfs-ganesha-2.4.0-2.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-2.el7rhgs.x86_64


How reproducible:
-----------------

Every which way I try.

Steps to Reproduce:
-------------------

1. Mount a 2*2 volume via gNFS. Create a huge data set. Run time du -sh over it. Clean the mount point.

2. Create the data set again on a Ganesha v4 mount. Run time du -sh over it.
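
The steps above can be approximated with a small script. This is a sketch under stated assumptions, not the tooling used for this report: the function names, the directory and file naming scheme and the sizes are all hypothetical, and the crawl sums apparent file sizes (closer to du --apparent-size than to plain du, which counts allocated blocks). Its purpose is to reproduce the readdir-plus-stat access pattern that du generates.

```python
import os
import tempfile
import time


def make_dataset(root, n_dirs, files_per_dir, file_size):
    """Create n_dirs directories under root, each holding files_per_dir files."""
    payload = b"\0" * file_size
    for d in range(n_dirs):
        dir_path = os.path.join(root, f"dir{d:05d}")
        os.makedirs(dir_path, exist_ok=True)
        for f in range(files_per_dir):
            with open(os.path.join(dir_path, f"file{f:05d}"), "wb") as handle:
                handle.write(payload)


def crawl(root):
    """Walk root and lstat every file, similar to the pattern du produces."""
    n_files = 0
    total_bytes = 0
    for dir_path, _dir_names, file_names in os.walk(root):
        for name in file_names:
            info = os.lstat(os.path.join(dir_path, name))
            n_files += 1
            total_bytes += info.st_size
    return n_files, total_bytes


def timed_crawl(root):
    """Return (file count, total bytes, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    n_files, total_bytes = crawl(root)
    return n_files, total_bytes, time.perf_counter() - start


if __name__ == "__main__":
    # Tiny local demo in a temporary directory. To exercise an NFS mount,
    # point make_dataset and timed_crawl at the mount point instead and scale
    # the parameters up (the report used ~500,000 files across 5000 dirs).
    with tempfile.TemporaryDirectory() as demo_root:
        make_dataset(demo_root, n_dirs=5, files_per_dir=20, file_size=1024)
        n_files, total_bytes, elapsed = timed_crawl(demo_root)
        print(f"{n_files} files, {total_bytes} bytes, {elapsed:.3f} s")
```

Running the same script against a gNFS, a Ganesha v3 and a Ganesha v4 mount of the same volume would give comparable numbers, provided client caches are cold for each run.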

Actual results:
---------------

du -sh takes a lot of time on Ganesha v4: about 15 times longer than on gNFS (81m43s vs. 5m18s).

Expected results:
-----------------

du -sh should not take this much time to complete.


Additional info:
----------------

* CLIENT/SERVER OS : RHEL 7.2

* VOLUME CONFIGURATION :

Volume Name: testvol
Type: Distributed-Replicate
Volume ID: b93b99bd-d1d2-4236-98bc-08311f94e7dc
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gqas013.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick0
Brick2: gqas005.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick1
Brick3: gqas006.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick2
Brick4: gqas011.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick3
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
ganesha.enable: on
features.cache-invalidation: off
nfs.disable: on
performance.readdir-ahead: on
performance.stat-prefetch: off
server.allow-insecure: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable
[root@gqas013 tmp]#

Comment 3 Soumya Koduri 2016-10-14 07:00:20 UTC
As noted in https://bugzilla.redhat.com/show_bug.cgi?id=1383559#c5, please collect the relevant information while this test case runs on a v4 mount, and update this bug with it.

Comment 7 surabhi 2016-11-29 10:04:58 UTC
Per triage, we all agree that this BZ has to be fixed in rhgs-3.2.0. Providing qa_ack.

Comment 10 Atin Mukherjee 2016-12-06 07:15:48 UTC
Upstream fix:

https://review.gerrithub.io/304278
https://review.gerrithub.io/304279

Comment 17 Daniel Gryniewicz 2018-08-27 12:24:51 UTC
This should be moved out of 3.4, since dirent chunk is removed.

Comment 29 Manisha Saini 2019-09-03 06:29:55 UTC
Verified this BZ with

# rpm -qa | grep ganesha
nfs-ganesha-2.7.3-7.el7rhgs.x86_64
glusterfs-ganesha-6.0-11.el7rhgs.x86_64
nfs-ganesha-gluster-2.7.3-7.el7rhgs.x86_64


Steps performed for verification:

1. Create a 4-node Ganesha cluster.
2. Create a 2 x (4 + 2) Distributed-Disperse volume. Enable Ganesha on the volume.
3. Mount the volume on 4 clients over v3/v4.1 via the same VIP.
4. Create a huge data set consisting of small, large and empty directories.
5. Run du -sh on the v3 and v4.1 mounts.


v3 mount
--------

# time du -sh
28G     .

real    23m12.490s
user    0m4.817s
sys     1m0.564s


v4.1 mount
---------

# time du -sh
28G     .

real    11m3.559s
user    0m1.582s
sys     0m22.307s

# time du -sh
28G     .

real    31m20.569s
user    0m7.041s
sys     1m46.864s
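
The three time outputs above can be compared programmatically. The sketch below parses the real/user/sys lines and reports what share of the wall-clock time was spent on client CPU. The timings are copied from this comment; the function name parse_time_output and the "first run"/"second run" labels (taken from the order in which the runs are listed) are assumptions for illustration.

```python
import re

# Matches lines such as "real    23m12.490s" in bash `time` output.
DURATION = re.compile(r"^(real|user|sys)\s+(\d+)m(\d+(?:\.\d+)?)s$", re.MULTILINE)


def parse_time_output(text):
    """Return {'real': s, 'user': s, 'sys': s} in seconds from `time` output."""
    return {
        kind: int(minutes) * 60 + float(seconds)
        for kind, minutes, seconds in DURATION.findall(text)
    }


runs = {
    "v3": "real    23m12.490s\nuser    0m4.817s\nsys     1m0.564s",
    "v4.1 (first run)": "real    11m3.559s\nuser    0m1.582s\nsys     0m22.307s",
    "v4.1 (second run)": "real    31m20.569s\nuser    0m7.041s\nsys     1m46.864s",
}

parsed = {label: parse_time_output(text) for label, text in runs.items()}

for label, t in parsed.items():
    cpu_share = (t["user"] + t["sys"]) / t["real"]
    print(f"{label:<18} real {t['real']:8.1f} s  client CPU share {cpu_share:6.1%}")
```

In all three runs the client CPU share stays below 7% of the wall-clock time, which suggests that du spent most of its time waiting on the server rather than computing. The large gap between the two v4.1 runs is not explained in this comment.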


Moving this BZ to verified state

Comment 31 errata-xmlrpc 2019-10-30 12:15:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:3252