Description of problem:
Listing a directory that contains many directories takes too much time.

Version-Release number of selected component (if applicable):
Reproduced on 3.10 and 3.13

How reproducible:
100%

Steps to Reproduce:
1. Prepare env:

service glusterd start
truncate -s 1GB /srv/disk{1..4}
for i in `seq 1 4`; do mkfs.xfs -i size=512 /srv/disk$i; done
mkdir -p /export/brick{1..4}
for i in `seq 1 4`; do echo "/srv/disk$i /export/brick$i xfs loop,inode64,noatime,nodiratime 0 0" >> /etc/fstab; done
mount -a
gluster volume create test replica 2 transport tcp `hostname`:/export/brick{1..4}/data force
gluster volume start test
mkdir /mnt/test
mount -t glusterfs `hostname`:test /mnt/test

2. Create 10000 directories and 10000 files:

for i in {0000..9999}; do mkdir -p "/mnt/test/dirs/DIR_${i}"; done
mkdir /mnt/test/files
for i in {0000..9999}; do touch "/mnt/test/files/file_${i}"; done

3. Check that listing of files is much faster than listing of dirs:

# time ls /mnt/test/dirs
...
real 0m7,929s
user 0m0,071s
sys 0m0,160s

# time ls /mnt/test/files
...
real 0m0,479s
user 0m0,044s
sys 0m0,029s

Actual results:
Listing of dirs is much slower than listing of files.

Expected results:
Listing of dirs should be at least as fast as listing of files.

Additional info:
The same behaviour is observed on the 3.13 version.
From the email Marian had sent, the configuration is a single-brick DHT. The problem was first observed on a 192x2 distribute-replicate setup, but it can be seen on a DHT with a single subvolume too. I suspect the handling of dentries pointing to directories in dht_readdirp_cbk to be the cause of this bug. Note that ls is usually aliased with some options, so a listing is usually accompanied by a stat on each dentry. I think the following are the reasons for the increased latency:

* readdirplus won't provide nodeid/gfid and stat information for dentries pointing to a directory. So, if the application is interested in stat/inode information (as is most likely the case here), it has to do a lookup.
* In configurations where DHT has a large number of subvolumes, the latency of this lookup is the latency of the slowest lookup among all the subvolumes. Note that the latency is NOT the cumulative latency of all lookups but is equal to the LARGEST one.

We had discussed this problem with the performance engineering group at Red Hat. In a realistic workload this problem might not be too common, as it occurs only in datasets where directories constitute a significantly larger percentage of the data. Nevertheless, we would be interested in knowing of real-world use cases where users have a large number of directories.

Note that all of the above is just a hypothesis and we need to do some experiments to confirm it. Some of the experiments I can think of are (a shell sketch of both follows this comment):

* Making sure that ls just issues readdir without requiring stat. This can be done by unaliasing ls before issuing the command, so that any options are stripped out, and then issuing plain ls. If directories and files take the same time to list in this test, we can be sure that the extra latency seen for directories is due to lookup/stat.
* Disabling readdirplus in the entire stack [1] and running ls. Note that it is not necessary to unalias ls in this test, as there'll be a lookup for files too (since readdirplus is disabled for them as well). This test might yield slightly different results on large volumes because lookup latency will differ between directories and files, but on a single subvolume they should be the same.

Based on the above reasoning, changing the component to distribute. But it can be changed if experiments show different results.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-March/030148.html
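To make the two experiments concrete, here is a minimal shell sketch. The volume and mount point are taken from the original steps; the exact set of volume and mount options needed to disable readdirplus across the whole stack is described in [1], so treat the option names below as assumptions based on that thread rather than a definitive list:

# Experiment 1: plain readdir with no stat on the dentries.
# The backslash bypasses any shell alias, so no extra options sneak in.
time \ls /mnt/test/dirs > /dev/null
time \ls /mnt/test/files > /dev/null

# Experiment 2: disable readdirplus in the entire stack (see [1]),
# then repeat the listing. Option names are assumptions based on [1].
gluster volume set test performance.force-readdirp off
gluster volume set test performance.readdir-ahead off
gluster volume set test performance.parallel-readdir off
# Remount so the FUSE client also stops using readdirplus.
umount /mnt/test
mount -t glusterfs -o use-readdirp=no `hostname`:test /mnt/test
time ls /mnt/test/dirs > /dev/null
time ls /mnt/test/files > /dev/null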
Marian,

Is it possible to do the experiments mentioned in comment #2?

regards,
Raghavendra
Hi Raghavendra,

My 'ls' command executed a stat for each item, so I tried Python's os.listdir('/dirs') instead, and the behaviour was the same.

I tried setting the option cluster.readdir-optimize, and at first glance it looks like it fixes the problem. But I'm still testing...
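For reference, a minimal sketch of what I ran (volume name and mount point assumed to be the ones from comment #1; the timing one-liner is just an illustration, not the exact script I used):

gluster volume set test cluster.readdir-optimize on
# List without stat-ing each entry, and time it (path is the FUSE mount):
python -c "import os, time; t = time.time(); names = os.listdir('/mnt/test/dirs'); print(len(names), time.time() - t)"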
(In reply to Marian Horban from comment #3)
> My 'ls' command executed a stat for each item, so I tried Python's
> os.listdir('/dirs') instead, and the behaviour was the same.
>
> I tried setting the option cluster.readdir-optimize, and at first glance
> it looks like it fixes the problem. But I'm still testing...

Are you setting this option on a volume with a single brick, or on a larger volume (192 x 2)?
It helps both in the small local env and in the (192 x 2) env. Still testing...
I can see that the problem can still be reproduced when the cache is cleared by restarting the glusterfsd processes. I've created 100000 empty files and 63000 directories inside the 200_000_files_dirs directory. The first call listing the directory is very slow; the second (cached) call is much faster:

>>> b = datetime.datetime.now(); l = volume.listdir('/200_000_files_dirs'); a = datetime.datetime.now(); print len(l); print a-b;
163720
0:01:14.730083
>>> b = datetime.datetime.now(); l = volume.listdir('/200_000_files_dirs'); a = datetime.datetime.now(); print len(l); print a-b;
163720
0:00:17.222857
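For completeness, a sketch of one way to clear the caches between runs. The stop/start cycle and drop_caches below are an approximation of what I did (I restarted the glusterfsd brick processes), and the volume name 'test' is assumed from comment #1:

# Restart the brick (glusterfsd) processes so their caches are dropped.
# (Assumed commands; --mode=script skips the interactive y/n prompt.)
gluster --mode=script volume stop test
gluster --mode=script volume start test
# Also drop the kernel page/dentry/inode caches on the client side.
sync
echo 3 > /proc/sys/vm/drop_caches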
This bug is reported against a version of Gluster that is no longer maintained (or has been EOL'd). See https://www.gluster.org/release-schedule/ for the versions currently maintained. As a result, this bug is being closed. If the bug persists on a maintained version of Gluster or against the mainline Gluster repository, please request that it be reopened and mark the Version field appropriately.