Bug 298931 - on-disk unlinked inode meta-data leak
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: gfs-kmod
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Abhijith Das
QA Contact: Cluster QE
Depends On: 218576
Reported: 2007-09-20 14:56 EDT by Josef Bacik
Modified: 2010-10-22 14:49 EDT

Doc Type: Bug Fix
Last Closed: 2008-09-17 13:06:17 EDT

Attachments:
Have scand reclaim metadata from X% of rgrps at a time (patch, 4.38 KB), 2008-08-25 13:15 EDT, Abhijith Das
New and improved (patch, 5.94 KB), 2008-09-10 17:52 EDT, Abhijith Das
Simple and improved - tunable is now absolute number (patch, 5.54 KB), 2008-09-12 17:19 EDT, Abhijith Das

Comment 3 RHEL Product and Program Management 2008-06-04 18:48:41 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release. Product Management has requested
further review of this request by Red Hat Engineering, for potential inclusion
in a Red Hat Enterprise Linux Update release for currently deployed products.
This request is not yet committed for inclusion in an Update release.
Comment 4 Abhijith Das 2008-08-25 13:15:11 EDT
Created attachment 314932 [details]
Have scand reclaim metadata from X% of rgrps at a time

This patch does not implement the idea Wendy suggested in an earlier comment; instead, it tries to free metadata from a subset of the total rgrps, limited by a tunable.

This might not be as efficient as keeping an extra list of the rgrps that deletions occurred in; I haven't run any performance numbers yet.

The tunable rgrp_free_perc is a percentage of the total rgrp count (sdp->sd_rgcount). With the default value of 5% (chosen arbitrarily), metadata will be freed from at most 5% of the total rgrps per cycle. Let me know which would be preferable: a) a percentage of the total rgrps (as is), or b) an absolute cap on the number of rgrps to free metadata from.

Bob/Steve, please let me know your thoughts on this. If you think this won't work or will be inefficient, I'll fall back to the list idea.
Comment 5 Abhijith Das 2008-09-10 17:52:14 EDT
Created attachment 316363 [details]
New and improved

Added a flag RD_FL_META2FREE to struct rgrpd that is set when there is freeable metadata in the rgrp and cleared when we reclaim from it.
Comment 6 Abhijith Das 2008-09-12 17:19:24 EDT
Created attachment 316624 [details]
Simple and improved - tunable is now absolute number

The tunable is now an absolute value rather than a percentage of the total rgrps: it specifies the maximum number of rgrps to free metadata from in each gfs_inoded cycle.
Comment 7 Abhijith Das 2008-09-12 17:48:06 EDT
pushed above patch to RHEL5, STABLE2 and master.
Comment 8 Abhijith Das 2008-09-16 10:23:17 EDT
The patch above locks the entire filesystem each time it needs to free metadata from an rgrp. I ran a simple test case to see what the impact is on performance.

The test is run and timed simultaneously on all cluster nodes, with each node operating in a separate directory. Each node creates 10 subdirectories, fills each subdirectory with 10000 1MB files, and removes each subdirectory as soon as its 10000th file is written.

Let me know your thoughts about this... whether I should back out the patch or not.

GFS without the reclaim patch

real    113m4.499s
user    1m22.279s
sys     19m54.774s
[root@camel ~]#

real    126m18.504s
user    1m22.707s
sys     19m52.207s
[root@merit ~]#

real    127m0.553s
user    1m22.073s
sys     19m48.148s
[root@winston ~]#

real    110m1.599s
user    0m54.464s
sys     20m33.615s
[root@salem ~]#

GFS with the reclaim patch (tunable default max rgrps to free from = 5)

real    123m20.349s
user    1m21.887s
sys     20m1.453s
[root@camel ~]#

real    133m12.838s
user    1m22.173s
sys     19m44.736s
[root@merit ~]#

real    132m35.304s
user    1m22.271s
sys     20m5.701s
[root@winston ~]#

real    111m58.369s
user    0m54.438s
sys     20m45.796s
[root@salem ~]#
Comment 9 Abhijith Das 2008-09-16 16:45:43 EDT
I've got a few more numbers and they don't look pretty for this patch, unfortunately :(. I added a timer around gfs_reclaim_metadata() to record, for each cycle, the number of rgrps from which metadata was freed and how long the call took in seconds.
The test is the same as before, except that, to save time, I created zero-length files instead of 1MB files. That cuts the test run time to about 10 minutes.

real    10m37.203s
user    0m57.045s
sys     3m4.040s

Freed 1: time: 0.908551 s
Freed 2: time: 2.255598 s
Freed 2: time: 7.011848 s
Freed 2: time: 4.146901 s
Freed 2: time: 4.406757 s
Freed 2: time: 1.308091 s
Freed 2: time: 2.713996 s
Freed 2: time: 0.632563 s
Freed 2: time: 1.445605 s
Freed 2: time: 1.444344 s

real    9m5.089s
user    0m57.568s
sys     3m5.484s

Freed 1: time: 0.420229 s
Freed 2: time: 3.806471 s
Freed 2: time: 4.327761 s
Freed 2: time: 1.462549 s
Freed 2: time: 5.276235 s
Freed 2: time: 2.105195 s
Freed 2: time: 0.948678 s
Freed 2: time: 0.816133 s
Freed 1: time: 1.596697 s
Freed 2: time: 3.547658 s

real    9m13.812s
user    0m58.030s
sys     3m8.511s

Freed 1: time: 27.362559 s
Freed 1: time: 1.007744 s
Freed 2: time: 1.057754 s
Freed 2: time: 3.942767 s
Freed 2: time: 1.332333 s
Freed 2: time: 5.283363 s
Freed 2: time: 3.798999 s
Freed 1: time: 15.359068 s
Freed 1: time: 5.588834 s
Freed 2: time: 0.879379 s
Freed 2: time: 4.959445 s
Freed 2: time: 5.476087 s

real    9m4.354s
user    0m38.753s
sys     2m29.936s

Freed 1: time: 0.606883 s
Freed 2: time: 3.385412 s
Freed 2: time: 4.650354 s
Freed 3: time: 1.552037 s
Freed 2: time: 4.872653 s
Freed 3: time: 5.114137 s
Freed 1: time: 0.392321 s
Freed 2: time: 1.153888 s
Freed 2: time: 8.842174 s
Freed 2: time: 1.612880 s
Freed 1: time: 8.140755 s
Comment 10 Abhijith Das 2008-09-17 13:06:17 EDT
Automatic reclaim of unlinked metadata presents greater cost than benefit. The
reclaim operation locks the filesystem while it attempts to free the unused
metadata, which hurts filesystem performance and certainly doesn't scale well.

The sensible way to perform reclaim is to execute "gfs_tool reclaim
<mountpoint>" as a routine gfs administration activity when the filesystem is
relatively idle.
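As an illustration, such a maintenance run could be scheduled via cron. The crontab fragment below is a sketch only: the mount point /mnt/gfs, the binary path, and the schedule are assumptions, not recommendations from this bug.

```shell
# Illustrative crontab fragment: reclaim unlinked metadata weekly at
# 03:00 on Sunday, when the filesystem is expected to be relatively
# idle.  /mnt/gfs is a placeholder for the actual GFS mount point.
0 3 * * 0  root  /usr/sbin/gfs_tool reclaim /mnt/gfs
```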
Comment 11 Jerry Uanino 2008-09-17 19:05:20 EDT
Is this fixed in GFS2?
Comment 12 Abhijith Das 2008-09-18 08:53:21 EDT
GFS2 works differently when it comes to reclaiming unlinked inode metadata... and doesn't have this problem IMO. Steve Whitehouse can probably add more.
