Description of problem:
Listing a directory on a gluster share takes approx. 100 times longer than on the local disk.

Version-Release number of selected component (if applicable):
On server: glusterfs 3.8.4 built on Mar 8 2017 06:17:27
On client: glusterfs 3.8.4 built on Mar 8 2017 06:17:26

How reproducible:
Customer environment

Additional info:
Implementing the md-cache feature also doesn't help much here.
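For reference, a minimal sketch of the comparison being described, with placeholder paths for the gluster FUSE mount and a local copy of the same tree:

# time ls -l /mnt/glustervol/<large-dir> | wc -l    (gluster share)
# time ls -l /data/local/<large-dir> | wc -l        (local disk)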
I see that there are around 318848 files/directories within arc.0530:

# ls -l arc.0530 | wc -l
318848

Is it possible to get how many directories are in this listing?
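For example, assuming the same working directory, the directory count alone could be obtained with something like:

# find arc.0530 -mindepth 1 -maxdepth 1 -type d | wc -l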
I also require the following information:

* time of ls with readdirplus (aka readdirp) disabled in the entire stack (we can use a temporary mount for this). [1] explains how to disable readdirplus.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-March/030148.html
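As a sketch, the temporary test mount could look like this (server, volume, and mount-point names are placeholders; note that [1] also covers the server-side options needed to disable readdirp in the whole stack, not just at the FUSE layer):

# mount -t glusterfs -o use-readdirp=no server1:/volname /mnt/readdirp-test
# time ls -l /mnt/readdirp-test/arc.0530 | wc -l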
Also, I see that network.inode-lru-limit is not set; has the customer not used "gluster volume set <vol> group metadata-cache"? The default value of network.inode-lru-limit is ~16000, which is very low. If the metadata-cache group had been used, network.inode-lru-limit would have been changed to 50000. As mentioned in the admin guide, network.inode-lru-limit should be raised to a larger value based on the workload. I would suggest setting network.inode-lru-limit to 200000, since the number of active files is 318848. Note that increasing this value increases the memory footprint of the brick processes. Changing this value also helped another customer [1]. Let us know if this helps; a command sketch follows below.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1441417#c15
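The relevant commands would look roughly like this (the volume name is a placeholder; gluster volume get can be used to verify the effective value afterwards):

# gluster volume set <volname> group metadata-cache
# gluster volume set <volname> network.inode-lru-limit 200000
# gluster volume get <volname> network.inode-lru-limit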
I think we got a very good improvement after disabling readdirp. The first iteration took around two and a half minutes, but subsequent iterations completed in around half a minute. I'm also waiting for feedback from the customer on whether they have implemented the network.inode-lru-limit change along with these results.

1) Number of directories:

[root@XYZ]# find -maxdepth 1 -type d
.
./arch

2) Mount with use-readdirp=no:

# time ls -l arc.0530 | wc -l
274020

real    2m38.234s
user    0m4.734s
sys     0m4.554s

# time ls -l arc.0530 | wc -l
274020

real    0m36.141s
user    0m4.180s
sys     0m2.736s

# time ls -l arc.0530 | wc -l
274020

real    0m35.855s
user    0m4.172s
sys     0m2.698s

# time ls -l arc.0530 | wc -l
274020

real    0m35.780s
user    0m4.175s
sys     0m2.768s
This is new data the customer has shared after setting network.inode-lru-limit to 200000.

1. Results without disabling readdirp:

# time ls -l arc.0530 | wc -l
318848

real    2m52.289s
user    0m6.553s
sys     0m6.141s

# time ls -l arc.0530 | wc -l
318848

real    2m53.819s
user    0m6.407s
sys     0m5.988s

# time ls -l arc.0530 | wc -l
318848

real    3m10.560s
user    0m6.858s
sys     0m6.006s

2. Results with use-readdirp=no:

# time ls -l arc.0530 | wc -l
318848

real    2m55.979s
user    0m6.097s
sys     0m5.192s

# time ls -l arc.0530 | wc -l
318848

real    0m31.907s
user    0m5.315s
sys     0m2.921s

# time ls -l arc.0530 | wc -l
318848

real    0m21.481s
user    0m5.063s
sys     0m2.761s

# time ls -l arc.0530 | wc -l
318848

real    0m21.537s
user    0m5.198s
sys     0m2.656s
Created attachment 1293751 [details]
Fuse dump
Abhishek,

Is there anything pending from our side? If not, can we close this bug?

Thanks,
Susant
Susant,

I see that you have cloned this for upstream. This bug should be kept open until the fix becomes part of a downstream release. Let me know if I misunderstood anything.

-Bipin
Considering the bug was fixed in 3.12.2 (upstream), the fix should be part of RHGS 3.4.0. Bipin/Abhishek, if you need any more info, please reopen the bug.