Red Hat Bugzilla – Bug 1290036
[RFE] Slow `ls` performance and directory listing within Samba
Last modified: 2017-03-25 12:26:30 EDT
Description of problem:
The `ls` command takes a long time to return results on a large Gluster volume. The same applies to directory listings within an SMB share.
This is due to the nature of GlusterFS.
Version-Release number of selected component (if applicable):
How reproducible:
Set up a large Gluster volume, using at least 4 nodes.
Mount the Gluster volume, put data into it and run the `ls` command on the mount point.
Steps to Reproduce:
1. Set up Red Hat Gluster Storage using at least 4 nodes.
2. Create a distributed-replicated volume and mount the volume.
3. Put data into the Gluster volume and then run the `ls` command.
Actual results:
Slow response: results take much longer to appear than on a local filesystem.
Expected results:
Normal response to the `ls` command
Additional info:
Would it be possible to make use of the Glusterfind mechanism, a local db file, or some other mechanism that queries the Glusterfind db or info, to speed up `ls` significantly? The same question applies to SMB directory listings.
Is it possible to give your volume info? I want to know how many bricks there are in the volume. Your description says it is a 4-node setup, but there can be multiple bricks on the same node, so we cannot deduce the number of bricks from the node count.
Volume information can be found using:
# gluster volume info <your-volume-name>
I need some more information too.
1. Can you confirm whether ls is unaliased? I ask because ls is commonly aliased to include options (such as ls --color) that cause a stat on each directory entry, whereas plain ls results in just readdir. Knowing this, we can figure out whether the performance hit is in readdir, stat, or some other syscall. It would be helpful if you could give us some numbers on plain ls (readdir) performance.
2. What data is present on the mount point? How many files/directories are present? Are directories nested? If yes, how are they nested - are they nested deeply in a vertical fashion (/a/b/c/d/e/f etc) or are they nested in horizontal fashion (/a/b, /a/c, /a/d, /a/e, /a/f etc).
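The two questions above can be answered with a small measurement script. The following is only a minimal sketch, assuming Python 3 is available on the client; the function names `time_listing` and `tree_shape` are made up for illustration. It times a names-only listing (roughly what unaliased `ls` does via readdir) against a listing plus one stat per entry (roughly what `ls --color` or `ls -l` adds), and counts files, directories, and nesting depth.

```python
import os
import time


def time_listing(path):
    """Return (entry_count, names_only_seconds, names_plus_stat_seconds)."""
    start = time.perf_counter()
    names = os.listdir(path)  # names only, comparable to unaliased `ls`
    names_only = time.perf_counter() - start

    start = time.perf_counter()
    for name in names:
        # One stat per entry, comparable to what `ls --color` or `ls -l` adds.
        os.lstat(os.path.join(path, name))
    stat_pass = time.perf_counter() - start
    return len(names), names_only, names_only + stat_pass


def tree_shape(root):
    """Return (file_count, dir_count, max_depth) below root; root is depth 0."""
    files = dirs = max_depth = 0
    stack = [(root, 0)]
    while stack:
        path, depth = stack.pop()
        max_depth = max(max_depth, depth)
        with os.scandir(path) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    dirs += 1
                    stack.append((entry.path, depth + 1))
                else:
                    files += 1
    return files, dirs, max_depth
```

Calling `time_listing` on the mount point (or on one large directory inside it) gives the readdir and readdir-plus-stat timings separately. Client-side caching can skew repeated runs, so timings from a freshly mounted volume are the most useful. Note that `tree_shape` crawls the whole tree and may itself take a long time on a large volume.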
I don't have a production setup available myself; this is feedback I get from customers and administrators. So I cannot give additional technical info at this moment, other than the fact that `ls` and SMB directory listings generally perform poorly.
This is known to be common behaviour for Gluster (BZ1117833).
So while this cannot be changed easily, my suggestion is to check from a technical perspective whether it would be possible to make smart use of the Glusterfind information and query that instead of the Gluster filesystem itself.
Normally, the Glusterfind info is maintained when changes occur on the GlusterFS volume. Therefore I was wondering whether a directory listing request could be answered by querying the Glusterfind DB instead of performing a filesystem crawl.
So I'm not asking to dive into the current situation, but suggesting a possible enhancement with this BZ.
I hope this explains the context somewhat more :-)
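To make the suggestion concrete, here is a purely illustrative sketch of the idea: keep a listing index up to date from change events and answer directory listings from that index rather than from a crawl. Everything in it is an assumption: the `DirIndex` class, the `CREATE`/`DELETE` event names, and the SQLite table layout are hypothetical and do not reflect how Glusterfind actually stores or exposes its data.

```python
import sqlite3


class DirIndex:
    """Hypothetical listing index keyed by parent directory (not a Glusterfind API)."""

    def __init__(self):
        # In-memory database for illustration; a real index would need to persist.
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE entries ("
            "parent TEXT NOT NULL, name TEXT NOT NULL, "
            "PRIMARY KEY (parent, name))"
        )

    def apply(self, event, parent, name):
        """Apply one change event; the event names are assumed, not Glusterfind's."""
        if event == "CREATE":
            self.db.execute(
                "INSERT OR IGNORE INTO entries VALUES (?, ?)", (parent, name)
            )
        elif event == "DELETE":
            self.db.execute(
                "DELETE FROM entries WHERE parent = ? AND name = ?", (parent, name)
            )
        else:
            raise ValueError(f"unsupported event: {event}")

    def listdir(self, parent):
        """Answer a directory listing from the index, without touching the bricks."""
        rows = self.db.execute(
            "SELECT name FROM entries WHERE parent = ? ORDER BY name", (parent,)
        )
        return [row[0] for row in rows]
```

For example, after applying `CREATE` events for `a.txt` and `b.txt` under `/share` and a `DELETE` event for `b.txt`, `listdir("/share")` returns only `a.txt`. The open questions for a real implementation would be how to keep such an index consistent with the volume and how stale a listing is allowed to be.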