Red Hat Bugzilla – Bug 889334
cluster.min-free-disk does not appear to work as advertised
Last modified: 2013-07-24 13:15:46 EDT
Description of problem:
My test rig consists of 2 nodes, each with 2T of disk running ZFS on Linux. The ZFS mountpoint is /tank. I created a distributed volume called export, mounted using GlusterFS-FUSE at /export. I have a gluster volume quota set at 4TB. I tried setting cluster.min-free-disk at 20% and also tried a value of 3TB. In either case, Gluster will allow the volume to completely fill, which in turn means each brick becomes 100% full.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Create distributed volume
2. gluster volume set <VOLNAME> cluster.min-free-disk 20%
3. Keep copying files to the gluster mount until it is 100% full
4. Observe that each brick is 100% full, disregarding cluster.min-free-disk
Actual results:
Each brick becomes 100% full

Expected results:
Each brick becomes 80% full, then Gluster stops sending new data to that brick
This seems like bug 852889. We had not backported the fix to the release-3.3 branch, hence the issue is still seen.
Hi Scott, can you please try with a byte size instead of a % value to see if that works?
Avra, please backport http://review.gluster.org/3918 to release-3.3 branch too.
I tried with a byte size, e.g. 3000GB and it did not work. The ZFS volume filled to 100%.
According to the comment in function dht_create:
/* Choose the minimum filled volume, and create the
files there */
When a brick is filled (i.e. avail_percent < min_free_disk), we still choose the least-filled volume, no matter how full it is, as long as it is the least filled one, and create the files there.
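To make the behavior described above concrete, here is a minimal Python sketch of the selection logic as this comment describes it (not the actual dht_create C code; the function and brick names are hypothetical): when every subvolume is below min-free-disk, DHT still picks the least-filled one instead of refusing the create.

```python
# Hypothetical model of DHT's described create-time subvolume selection.
def pick_subvol(avail_percent, min_free_disk):
    """avail_percent: dict of subvolume name -> % of disk free."""
    # Prefer subvolumes that still satisfy min-free-disk.
    ok = {s: p for s, p in avail_percent.items() if p >= min_free_disk}
    # If none qualify (all bricks are "full"), fall back to all of them.
    candidates = ok if ok else avail_percent
    # "Minimum filled" == maximum available space.
    return max(candidates, key=candidates.get)

# Both bricks are under a 20% threshold, yet the create still succeeds,
# landing on the brick with the most free space.
print(pick_subvol({"brick1": 5, "brick2": 12}, 20))  # -> brick2
```

This is exactly why the option behaves as a soft limit: the threshold only influences placement, it never rejects the create.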
And here is a quote from the glusterfs manual:
cluster.min-free-disk: Specified the percentage of disk space that must be kept free
So I'm totally confused.
By the way, a volume with only one brick ignores the cluster.min-free-disk option, because it has no dht xlator.
The comment you see in the code is right, i.e., the option is intended to provide a mechanism to balance new files across bricks when the limit is reached; the filesystem does not become read-only or stop functioning once the min-free-disk limit is met.
Where did you find the docs? I guess we should change the documentation to reflect this rather than fix the behavior. Distribute's option is like a soft quota: it starts warning when the disk limit is being reached. If somebody needs a 'hard' quota, where writes to the volume stop after the limit is reached, they have to use 'gluster volume quota <VOL> enable'.
Created attachment 669528 [details]
Patch to fix the description
This should be our 'help' string for the option, so there is no misunderstanding.
You are right, the 'help' string is the point.
The problem with not fixing this behavior is that in some cases (e.g. ZFS) you absolutely do not want to fill a brick to 100% usage, because file access times will slow to a crawl. Setting a volume quota would work fine IF you have homogeneous bricks. It will not work correctly in a heterogeneous environment.
I again ask that min-free-disk become a hard limit, so as to cap the amount of space Gluster uses on any one node.
(In reply to comment #7)
> The problem with not fixing this behavior is that in some cases (i.e. ZFS)
> you absolutely do not want to fill a brick to 100% usage because file access
> times will slow to a crawl. Setting a volume quota would work fine IF you
> have homogenous bricks. It will not work correctly if you have a
> heterogeneous environment.
Good point. Yes, we don't have that feature yet, but couldn't you instead set a per-brick quota from ZFS itself while setting up the filesystem? Why do you think it should be implemented by the top-level distributed filesystem? We don't believe in tuning options per backend filesystem at the glusterfs level. What if some bricks are ZFS and some are XFS?
> I again ask that min-free-disk become a hard limit, so as to limit the
> amount of space used by Gluster on any one node.
Right now, the min-free-disk option is checked only for new file creates, not for writes, so making it a hard limit is not minor work. Also, we believe in doing one thing in each translator, and this hard quota limit should be supported only by the quota translator.
Technically, it is possible today by loading the quota translator in the brick process and setting a quota limit there: each brick's quota then enforces the hard limit, while distribute's min-free-disk redirects creates to the right place. (This has to be done with a hand-edited volume file, as the CLI doesn't support it yet.)
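For reference, a hand-edited brick volume file along those lines would look roughly like the fragment below. This is an assumption-laden sketch, not a tested configuration: the volume names and paths are made up, and the option name ('limit-set') may differ between releases, so verify it against the features/quota translator's option table for your version before use.

```
volume export-posix
    type storage/posix
    option directory /tank/export
end-volume

# Quota loaded directly on the brick: enforces a hard per-brick limit,
# while DHT's min-free-disk continues to steer creates elsewhere.
volume export-quota
    type features/quota
    # 'limit-set' is an assumption for this release; check the
    # features/quota source before relying on it.
    option limit-set /:1800GB
    subvolumes export-posix
end-volume
```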
http://review.gluster.org/4393 fixed the option documentation...