Description of problem:

The current behavior of the option disperse.eager-lock is not optimal:

disperse.eager-lock on: good performance on large-file read/write, but actually degrades performance for many file create workloads. The degradation for file create workloads seems to be due to lock contention on directories.

disperse.eager-lock off: loses the performance advantages of eager-locking for large-file access, but gives better performance on file create workloads than with disperse.eager-lock on.

We should fix eager locking so that it can be kept on without incurring a performance penalty on file create workloads.

Version-Release number of selected component (if applicable):
glusterfs*-3.12.1-2.el7.x86_64

How reproducible:
Consistently
IMO, a solution that adds a separate option to control eager locking in the case of directories would be acceptable, and probably simple. So the default could be:

disperse.dir-eager-lock off: applies to directories
disperse.eager-lock on: applies to regular files

Would that work?
I guess this problem happens when multiple clients are creating files in the same directory, right? Otherwise, eager-locking shouldn't interfere with file creation (in fact it should be faster). In cases where multiple clients access the same directory, then yes, we could keep a separate configuration for this purpose. However, is it really necessary to have it disabled by default? I think that a scenario where multiple clients are writing to the same directory is less probable than one where all writes to a single directory come from the same client.
BTW, I realize that the norm would be to provide performance results as supporting evidence. I'm currently waiting on some systems to become available, so I can't do that right away. Will do so as soon as I can. But this problem has been seen multiple times in the past, in customer cases as well as our internal testing, and this bz has been long pending. So I decided to go ahead and open it to avoid more delays.

(In reply to Xavier Hernandez from comment #2)
> I guess this problem happens when multiple clients are creating files on the
> same directory, right ? otherwise, eager-locking shouldn't interfere with
> file creation (in fact it should be faster).

Yes, with multiple clients. But we have seen degradation even when each client is creating its data set in its own private directory. It seemed like there was contention on directories in the path.
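Until real numbers are available, the kind of file create workload discussed above could be approximated with something like the following. This is only a rough sketch, not the benchmark used in the customer cases or internal testing: the function name, file count and file size are made up for illustration, and the temporary directory stands in for a directory on a disperse volume mount. Running one copy per client, each in its own private directory and then all in one shared directory, would cover both cases mentioned in the comment above.

```python
import os
import tempfile
import time


def time_file_creates(directory, count, size=4096):
    """Create `count` small files in `directory` and return creates per second.

    Hypothetical helper for illustration; not part of any GlusterFS tooling.
    """
    payload = b"\0" * size
    start = time.perf_counter()
    for i in range(count):
        path = os.path.join(directory, f"f{i:06d}")
        with open(path, "wb") as fh:
            fh.write(payload)
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Assumption: replace the temporary directory with a directory on the
    # mounted disperse volume to measure the actual behavior.
    with tempfile.TemporaryDirectory() as workdir:
        rate = time_file_creates(workdir, 1000)
        print(f"{rate:.0f} creates/s")
```

Comparing the reported rate with eager locking on and off, on the same volume and with the same number of clients, would give the supporting evidence mentioned above.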
I've uploaded a patch on branch master (Bug #1502610) to create the new option. I've named the option 'other-eager-lock' since it will control eager locking for entries other than regular files (directories, symbolic links, pipes, ...). For now I've left the default value as 'on', but you can change it via 'gluster volume set' to run tests. Depending on the results, we can decide the final default value.
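For testing, toggling the option could look roughly like the sketch below. It only builds and prints the 'gluster volume set' command lines rather than running them. Two things here are assumptions: the volume name "testvol" is a placeholder, and the full key "disperse.other-eager-lock" is inferred from the existing disperse.eager-lock naming, since the comment above only gives the short name 'other-eager-lock'.

```python
import shlex


def volume_set_cmd(volume, key, value):
    """Return the argv list for `gluster volume set <volume> <key> <value>`."""
    return ["gluster", "volume", "set", volume, key, value]


# Assumption: placeholder volume name; substitute the volume under test.
VOLUME = "testvol"
# Assumption: full option key inferred from the 'disperse.' prefix of
# disperse.eager-lock; the comment above names it only 'other-eager-lock'.
OPTION = "disperse.other-eager-lock"

if __name__ == "__main__":
    for value in ("on", "off"):
        # Print the command instead of executing it, so the sketch is safe
        # to run on a machine without a gluster installation.
        print(shlex.join(volume_set_cmd(VOLUME, OPTION, value)))
```

Running the same file create workload once with each setting, while leaving disperse.eager-lock on, should show whether the directory contention goes away.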
I have built the rpms from master. Will try it out once I take care of some other work.
After some discussion, it was decided that this option won't be backported to 3.12 because it's considered a new feature. It's present on 3.13+.