Bug 1543072

Summary: Listing a directory that contains many directories takes too much time
Product: [Community] GlusterFS
Reporter: Marian Horban <mhorban>
Component: distribute
Assignee: Raghavendra G <rgowdapp>
Status: CLOSED EOL
Severity: high
Priority: unspecified
Version: 3.10
CC: bugs, mhorban, rgowdapp
Hardware: x86_64
OS: Linux
Type: Bug
Last Closed: 2018-06-20 18:24:29 UTC
Bug Depends On: 1546649

Description Marian Horban 2018-02-07 16:54:53 UTC
Description of problem:
Listing a directory that contains many directories takes too much time


Version-Release number of selected component (if applicable):
Reproduced on 3.10 and 3.13

How reproducible:
100%

Steps to Reproduce:
1. Prepare the env (a few sanity checks for this setup are sketched after step 3):
service glusterd start
truncate -s 1GB /srv/disk{1..4}
for i in `seq 1 4`;do mkfs.xfs -i size=512 /srv/disk$i ;done
mkdir -p /export/brick{1..4}
for i in `seq 1 4`;do echo "/srv/disk$i /export/brick$i xfs loop,inode64,noatime,nodiratime 0 0" >> /etc/fstab ;done
mount -a
gluster volume create test replica 2 transport tcp `hostname`:/export/brick{1..4}/data force
gluster volume start test
mkdir /mnt/test
mount -t glusterfs `hostname`:test /mnt/test 

2. Create 10000 directories and 10000 files:
for i in {0000..9999} ; do mkdir -p "/mnt/test/dirs/DIR_${i}" ; done
mkdir /mnt/test/files
for i in {0000..9999} ; do touch "/mnt/test/files/file_${i}" ; done 

3. Check that listing the files is much faster than listing the dirs:
#time ls /mnt/test/dirs
...
real    0m7,929s
user    0m0,071s
sys    0m0,160s

#time ls /mnt/test/files
...
real    0m0,479s
user    0m0,044s
sys    0m0,029s 
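
For completeness, the layout and entry counts from steps 1 and 2 can be sanity-checked like this (volume name and paths from step 1; the volume create above yields a 2 x 2 distributed-replicate):

gluster volume info test | grep -E 'Type|Number of Bricks'
ls /mnt/test/dirs | wc -l    # expect 10000
ls /mnt/test/files | wc -l   # expect 10000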

Actual results:
Listing of dirs is much slower than listing of files

Expected results:
Listing of dirs should be at least as fast as listing of files

Additional info:
The same behaviour is observed on version 3.13

Comment 1 Raghavendra G 2018-02-09 07:56:27 UTC
From the email Marian had sent, the configuration is a single-brick DHT. The problem was first observed on a 192x2 distribute-replicate setup, but it can be seen on a DHT with a single subvolume too.

I suspect the handling of dentries pointing to directories in dht_readdirp_cbk to be the cause of this bug. Note that ls is usually aliased with some options, so a listing is usually accompanied by a stat on each dentry.
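
For instance, whether ls carries such an alias can be checked with:

type ls
# many distributions alias ls to 'ls --color=auto', which stats each entry to colour it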

I think the following are the reasons for the increased latency:

* readdirplus won't provide nodeid/gfid and stat information for dentries pointing to a directory. So, if the application is interested in stat/inode information (as is most likely the case here), it has to do a lookup for each such dentry.
* In configurations where DHT has a large number of subvolumes, this lookup is wound to every subvolume, and its latency is the latency of the slowest response. Note that the latency is NOT the cumulative latency of all the lookups, but is equal to the LARGEST one. For example, if individual lookups to 192 subvolumes take between 1ms and 50ms, each directory dentry costs roughly 50ms, not the ~9.6s a serial walk would accumulate.

We had discussed this problem with the performance engineering group at Red Hat. In a realistic workload this problem might not be too common, as it occurs only on datasets where directories constitute a significantly larger percentage of the entries. Nevertheless, we would be interested in knowing of real-world use cases where users have a large number of directories.

Note that all of the above is just a hypothesis, and we need some experiments to confirm it. Some experiments I can think of are (both are sketched as commands after this list):
* Making sure that ls just issues a readdir without requiring stat. This can be done by unaliasing ls before issuing the command, so that any options are stripped out, and then issuing a plain ls without any options. If directories and files take the same time to list in this test, we can be sure that the extra latency seen for directories is due to lookup/stat.
* Disabling readdirplus in the entire stack [1] and running ls. Note that it is not necessary to unalias ls in this test, as there will be a lookup for files too (since readdirplus is disabled for them as well). This test might yield slightly different results on large volumes, because lookup latency differs between directories and files; on a single-subvolume setup they should be the same.
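
Roughly, the two experiments would look like this (volume name and paths from the repro above; use-readdirp is the FUSE-side mount option, and [1] lists the full set of knobs for disabling readdirplus across the stack):

# experiment 1: plain readdir, no stat on the entries
unalias ls 2>/dev/null
time \ls /mnt/test/dirs > /dev/null
time \ls /mnt/test/files > /dev/null

# experiment 2: remount with readdirplus disabled on the FUSE side
umount /mnt/test
mount -t glusterfs -o use-readdirp=no `hostname`:test /mnt/test
time \ls /mnt/test/dirs > /dev/null
time \ls /mnt/test/files > /dev/null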

Based on the above reasoning, I am changing the component to distribute. It can be changed back if the experiments show different results.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-March/030148.html

Comment 2 Raghavendra G 2018-02-09 07:58:46 UTC
Marian,

Is it possible to do the experiments mentioned in comment #1?

regards,
Raghavendra

Comment 3 Marian Horban 2018-02-12 12:50:27 UTC
Hi Raghavendra,

My 'ls' command executed a stat for each item, so I tried python's
os.listdir('/dirs') instead, and the behaviour was the same.
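
For reference, the raw listing can be timed like this (path from the repro above):

python -c "import os, time; t0 = time.time(); l = os.listdir('/mnt/test/dirs'); print len(l), time.time() - t0"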

I tried setting the cluster.readdir-optimize option, and at first glance it looks like it fixes the problem. But I'm still testing...
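
The option is set per volume, e.g. (volume name from the repro above):

gluster volume set test cluster.readdir-optimize on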

Comment 4 Raghavendra G 2018-02-13 03:24:24 UTC
(In reply to Marian Horban from comment #3)
> Hi Raghavendra,
> 
> My 'ls' command executed a stat for each item, so I tried python's
> os.listdir('/dirs') instead, and the behaviour was the same.
> 
> I tried setting the cluster.readdir-optimize option, and at first glance it
> looks like it fixes the problem. But I'm still testing...

Are you setting this option on the volume with a single brick, or on the larger (192 x 2) volume?

Comment 5 Marian Horban 2018-02-13 08:52:24 UTC
It helps both in the small local env and in the (192 x 2) env. Still testing...

Comment 6 Marian Horban 2018-02-14 11:32:36 UTC
I can see that the problem can still be reproduced when the caches are cleared by restarting the glusterfsd processes.
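
One way to clear those caches between runs (a sketch; the volume name is hypothetical for this env):

gluster volume stop test && gluster volume start test   # restarts the glusterfsd brick processes
echo 3 > /proc/sys/vm/drop_caches                       # drops the client-side kernel caches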

I've created 100000 empty files and 63000 directories inside the 200_000_files_dirs directory.
The first listing of the directory is very slow; a second, cached listing is much faster:
>>> b = datetime.datetime.now(); l = volume.listdir('/200_000_files_dirs'); a = datetime.datetime.now(); print len(l);  print a-b;
163720
0:01:14.730083
>>> b = datetime.datetime.now(); l = volume.listdir('/200_000_files_dirs'); a = datetime.datetime.now(); print len(l);  print a-b;
163720
0:00:17.222857

Comment 7 Shyamsundar 2018-06-20 18:24:29 UTC
This bug is reported against a version of Gluster that is no longer maintained (it has been EOL'd). See https://www.gluster.org/release-schedule/ for the versions currently maintained.

As a result, this bug is being closed.

If the bug persists on a maintained version of Gluster or against the mainline Gluster repository, please request that it be reopened with the Version field marked appropriately.