Bug 889334 - cluster.min-free-disk does not appear to work as advertised
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: distribute
3.3.0
x86_64 Linux
medium Severity high
: ---
: ---
Assigned To: Avra Sengupta
:
Depends On:
Blocks:
 
Reported: 2012-12-20 15:27 EST by scotty
Modified: 2013-07-24 13:15 EDT (History)
3 users

See Also:
Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-07-24 13:15:46 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
Patch to fix the description (1.99 KB, patch)
2012-12-27 00:45 EST, Amar Tumballi
no flags

Description scotty 2012-12-20 15:27:25 EST
Description of problem:

My test rig consists of 2 nodes, each with 2T of disk running ZFS on Linux.  The ZFS mountpoint is /tank.  I created a distributed volume called export, mounted using GlusterFS-FUSE at /export.  I have a gluster volume quota set at 4TB.  I tried setting cluster.min-free-disk at 20% and also tried a value of 3TB.  In either case, Gluster will allow the volume to completely fill, which in turn means each brick becomes 100% full.

Version-Release number of selected component (if applicable):

GlusterFS 3.3.1

How reproducible:

Every time

Steps to Reproduce:
1. Create distributed volume
2. gluster volume set cluster.min-free-disk 20%
3. Keep copying files to the gluster mount until it is 100% full
4. Observe that each brick is 100% full, disregarding cluster.min-free-disk
  
Actual results:

Each brick becomes 100% full

Expected results:

Each brick becomes 80% full then stops sending data to that brick

Additional info:
Comment 1 Amar Tumballi 2012-12-21 01:12:48 EST
This seems like bug 852889. We had not backported the fix to the release-3.3 branch, hence the issue is still seen.

Hi Scott, can you please try with Byte size instead of % value to see if that works? 

Avra, please backport http://review.gluster.org/3918 to release-3.3 branch too.
Comment 2 scotty 2012-12-21 10:19:01 EST
I tried with a byte size, e.g. 3000GB and it did not work.  The ZFS volume filled to 100%.
Comment 3 Jules Wang 2012-12-26 22:28:27 EST
According to the comment in function dht_create:

/* Choose the minimum filled volume, and create the
           files there */


When a brick is filled (i.e. avail_percent < min_free_disk), we choose the minimum-filled volume, no matter how full it is, as long as it is the least-filled one, and create the files there.
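The placement logic described above can be sketched in C. This is an illustrative model only, not the real dht code: the names (pick_subvol, struct subvol) are invented for this sketch.

```c
#include <stddef.h>

/* Illustrative model (NOT the real dht code) of the create-time placement
 * described above.  Names here are invented for this sketch. */
struct subvol {
    const char *name;
    double avail_percent;   /* free space on the brick, as a percentage */
};

/* Return the index of the subvolume a new file lands on.  If the hashed
 * subvolume still has at least min_free_disk percent free, it is used.
 * Otherwise the least-filled subvolume is chosen -- even when that one is
 * also below min_free_disk, which is why no brick ever refuses creates. */
static size_t pick_subvol(const struct subvol *s, size_t n,
                          size_t hashed, double min_free_disk)
{
    if (s[hashed].avail_percent >= min_free_disk)
        return hashed;

    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (s[i].avail_percent > s[best].avail_percent)
            best = i;
    return best;
}
```

Note that pick_subvol always returns some subvolume: min-free-disk only steers creates away from fuller bricks, it never rejects them, which matches the behaviour reported in this bug.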


And there is a quote from glusterfs manual:

cluster.min-free-disk: Specified the percentage of disk space that must be kept free


So, I'm totally confused.

By the way, a volume with only one brick ignores the cluster.min-free-disk option entirely, because it does not load the dht xlator.
Comment 4 Amar Tumballi 2012-12-27 00:33:31 EST
Jules Wang,

The comment you see in the code is right, i.e., the intention of the option was to provide a mechanism for balancing data placement when the limit is reached; the filesystem will not become read-only or stop functioning once the min-free-disk limit is met.

Where did you find the docs? I think we should change the documentation to reflect this rather than fix the behavior. Distribute's option is like a soft quota: it starts warning when the disk limit is being reached. If somebody needs a 'hard' quota that stops writes to the volume once the limit is reached, they have to use 'gluster volume quota <VOL> enable'.
Comment 5 Amar Tumballi 2012-12-27 00:45:07 EST
Created attachment 669528 [details]
Patch to fix the description

This should be our 'help' string for the option, so there is no misunderstanding.
Comment 6 Jules Wang 2012-12-27 02:26:59 EST
You are right, the 'help' string is the point.
Comment 7 scotty 2012-12-27 09:14:48 EST
The problem with not fixing this behavior is that in some cases (e.g. ZFS) you absolutely do not want to fill a brick to 100% usage because file access times will slow to a crawl.  Setting a volume quota would work fine IF you have homogeneous bricks.  It will not work correctly if you have a heterogeneous environment.

I again ask that min-free-disk become a hard limit, so as to limit the amount of space used by Gluster on any one node.
Comment 8 Amar Tumballi 2012-12-28 03:29:04 EST
(In reply to comment #7)
> The problem with not fixing this behavior is that in some cases (e.g. ZFS)
> you absolutely do not want to fill a brick to 100% usage because file access
> times will slow to a crawl.  Setting a volume quota would work fine IF you
> have homogeneous bricks.  It will not work correctly if you have a
> heterogeneous environment.

Good point. Yes, we don't have that feature yet, but can't you instead set a brick quota from ZFS itself while setting up the filesystem? Why do you think it should be implemented by the top-level distributed filesystem? We don't believe in tuning options per backend filesystem at the glusterfs level. What if some bricks are ZFS and some are XFS?

> 
> I again ask that min-free-disk become a hard limit, so as to limit the
> amount of space used by Gluster on any one node.

Right now, the min-free-disk option is checked only for new file creates and not for writes, hence making it a hard limit is not a minor piece of work. Also, we believe in doing one thing in each translator, and this hard quota limit should be supported only by the quota translator.

Technically, it is possible today by loading the quota translator in the brick process and setting a quota limit there: each brick's quota limit then enforces the hard quota, while distribute's min-free-disk redirects creates to the right place. (This has to be done with a hand-edited volume file, as the CLI doesn't support it yet.)
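For illustration, a hand-edited brick volume file along these lines could stack the quota translator above the posix translator. This is a rough sketch: the volume names are invented, the limit value is arbitrary, and the exact option names may vary between releases.

```
# Hypothetical hand-edited brick volfile fragment (names and option
# spellings are illustrative, not taken from a shipped volfile)
volume export-posix
    type storage/posix
    option directory /tank/export
end-volume

volume export-quota
    type features/quota
    option limit-set /:1800GB    # per-brick hard limit
    subvolumes export-posix
end-volume
```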
Comment 9 Amar Tumballi 2013-02-06 01:03:17 EST
http://review.gluster.org/4393 fixed the option documentation...
