Description of problem:
-----------------------
du -sh on my Ganesha v4 mount point takes a very long time to complete.

DATA SET: 20G of data, ~5 lakh (~500,000) files spread across 5000 dirs.

On gNFS       : 5m18.147s
On Ganesha v3 : 6m38.112s
On Ganesha v4 : 81m42.983s

du takes ~6 minutes to complete on gNFS and Ganesha v3, but almost 1 hour 22 minutes on Ganesha v4 mounts, on the same data set, each run measured on a fresh set of machines. Nothing else was running on the cluster, and no other I/O ran from the mount point while du -sh was running.

Version-Release number of selected component (if applicable):
--------------------------------------------------------------
nfs-ganesha-2.4.0-2.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-2.el7rhgs.x86_64

How reproducible:
-----------------
Every which way I try.

Steps to Reproduce:
-------------------
1. Mount a 2 x 2 volume via gNFS. Create a huge data set. time du -sh over it. Clean the mount point.
2. Create the data set again on a Ganesha v4 mount. time du -sh while it runs.
(A reproduction sketch follows this comment.)

Actual results:
---------------
du -sh takes a lot of time on Ganesha v4: 81m42.983s against 5m18.147s on gNFS is ~15.4x, almost 16 times slower.

Expected results:
-----------------
du -sh should not take this much time to complete.

Additional info:
----------------
* CLIENT/SERVER OS: RHEL 7.2
* VOLUME CONFIGURATION:

Volume Name: testvol
Type: Distributed-Replicate
Volume ID: b93b99bd-d1d2-4236-98bc-08311f94e7dc
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gqas013.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick0
Brick2: gqas005.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick1
Brick3: gqas006.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick2
Brick4: gqas011.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick3
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
ganesha.enable: on
features.cache-invalidation: off
nfs.disable: on
performance.readdir-ahead: on
performance.stat-prefetch: off
server.allow-insecure: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable
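For reference, a minimal reproduction sketch of the steps above. This is illustrative only: the data-set generator, mount paths, and the VIP variable are assumptions (the original layout is only known to be ~500,000 files across 5000 dirs totaling 20G; 5000 dirs x 100 files x 40K per file gives roughly that).

#!/bin/bash
# Reproduction sketch; see lead-in for assumptions.
VIP="ganesha-vip.example.com"   # placeholder: the ganesha cluster VIP

# Generate ~500,000 files across 5000 dirs, ~20G total (illustrative generator).
populate() {
    ( cd "$1" || exit 1
      for d in $(seq 1 5000); do
          mkdir -p "dir$d"
          for f in $(seq 1 100); do
              dd if=/dev/zero of="dir$d/file$f" bs=4k count=10 2>/dev/null
          done
      done )
}

# Step 1: gNFS (NFSv3; requires nfs.disable off on the volume at this point).
mount -t nfs -o vers=3 gqas013.sbu.lab.eng.bos.redhat.com:/testvol /mnt/gnfs
populate /mnt/gnfs
( cd /mnt/gnfs && time du -sh )
rm -rf /mnt/gnfs/dir*           # clean the mount point

# Step 2: Ganesha v4 via the cluster VIP.
mount -t nfs -o vers=4 "$VIP":/testvol /mnt/ganesha-v4
populate /mnt/ganesha-v4
( cd /mnt/ganesha-v4 && time du -sh )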
As noted in https://bugzilla.redhat.com/show_bug.cgi?id=1383559#c5, please collect the relevant information and update it here when this test case is run on a v4 mount.
Per triage, we all agree that this BZ has to be fixed in rhgs-3.2.0. Providing qa_ack.
Upstream fixes:
https://review.gerrithub.io/304278
https://review.gerrithub.io/304279
This should be moved out of 3.4, since the dirent-chunk code has been removed.
Verified this BZ with:

# rpm -qa | grep ganesha
nfs-ganesha-2.7.3-7.el7rhgs.x86_64
glusterfs-ganesha-6.0-11.el7rhgs.x86_64
nfs-ganesha-gluster-2.7.3-7.el7rhgs.x86_64

Steps performed for verification:
1. Create a 4-node ganesha cluster.
2. Create a 2 x (4 + 2) Distributed-Disperse volume. Enable ganesha on the volume.
3. Mount the volume on 4 clients (v3/v4.1) via the same VIP.
4. Create a huge data set consisting of small, large, and empty directories.
5. Run du -sh on the v3 and v4.1 mounts. (A setup sketch follows this comment.)

v3 mount
--------
# time du -sh
28G     .

real    23m12.490s
user    0m4.817s
sys     1m0.564s

v4.1 mount
----------
# time du -sh
28G     .

real    11m3.559s
user    0m1.582s
sys     0m22.307s

# time du -sh
28G     .

real    31m20.569s
user    0m7.041s
sys     1m46.864s

Moving this BZ to verified state.
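For completeness, a sketch of the setup commands behind steps 2-5, assuming the 4-node ganesha cluster from step 1 is already configured. The node names, brick paths, volume name dispvol, and VIP are placeholders, not the actual test environment:

# 2 x (4 + 2) Distributed-Disperse volume: 12 bricks forming 2 disperse
# subvolumes of 4 data + 2 redundancy each. gluster warns about multiple
# bricks per node in one subvolume, hence "force".
gluster volume create dispvol disperse 6 redundancy 2 \
    node{1..4}:/bricks/dispvol_brick{1..3} force
gluster volume start dispvol
gluster volume set dispvol ganesha.enable on   # export via nfs-ganesha

# On each of the 4 clients, mount via the same VIP (v3 or v4.1 per client):
VIP="ganesha-vip.example.com"   # placeholder
mount -t nfs -o vers=3   "$VIP":/dispvol /mnt/v3
mount -t nfs -o vers=4.1 "$VIP":/dispvol /mnt/v41

# Time du -sh from the mount root on each version:
( cd /mnt/v3  && time du -sh )
( cd /mnt/v41 && time du -sh )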
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:3252