Red Hat Bugzilla – Bug 298931
on-disk unlinked inode meta-data leak
Last modified: 2010-10-22 14:49:25 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release. Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update release.
Created attachment 314932 [details]
Have scand reclaim metadata from X% of rgrps at a time
This patch is not the same as the idea Wendy suggested in an earlier comment. It tries to free metadata from a subset of the total rgrps, limited by a tunable.
This might not be as efficient as keeping an extra list of rgrps that were deleted from; I haven't run any performance numbers.
The tunable rgrp_free_perc is a percentage of the total rgrps (sdp->sd_rgcount). With the default value of 5% (chosen arbitrarily), unused metadata is freed from at most 5% of the total rgrps in each cycle. Let me know which would be preferable: a) a percentage of the total rgrps (as is), or b) an absolute value for the maximum number of rgrps to free metadata from.
Bob/Steve, please let me know your thoughts on this. If you think this won't work or will be inefficient, I'll fall back to the list idea.
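For illustration, a minimal sketch of how the per-cycle bound described in the comment above might be derived; sd_rgcount and rgrp_free_perc are the names mentioned there, but the helper itself is a hypothetical simplification, not the code in the attached patch.

/* Hypothetical sketch: derive how many rgrps one scand/gfs_inoded cycle
 * may free metadata from, given the rgrp_free_perc tunable (default 5%). */
static unsigned int rgrps_per_cycle(unsigned int sd_rgcount,
				    unsigned int rgrp_free_perc)
{
	unsigned int n = sd_rgcount * rgrp_free_perc / 100;

	/* e.g. a filesystem with 400 rgrps and the default 5% gives 20
	 * rgrps per cycle; visit at least one so small filesystems make
	 * progress (an assumption of this sketch). */
	return n ? n : 1;
}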
Created attachment 316363 [details]
New and improved
Added a flag RD_FL_META2FREE to struct rgrpd that is set when there is freeable metadata in the rgrp and cleared when we reclaim from it.
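A hedged sketch of the flag semantics described above. RD_FL_META2FREE and struct rgrpd are the names from this comment; the field layout and helper functions are invented for illustration and are not the actual GFS patch.

#include <stdint.h>

#define RD_FL_META2FREE 0x00000001	/* rgrp holds reclaimable unlinked metadata */

struct rgrpd {
	uint32_t rd_flags;
};

/* Set when unlinked-inode metadata blocks are returned to this rgrp,
 * so the reclaim pass knows the rgrp is worth visiting. */
static void mark_meta_freeable(struct rgrpd *rgd)
{
	rgd->rd_flags |= RD_FL_META2FREE;
}

/* The reclaim pass skips rgrps with nothing to do and clears the flag
 * once the metadata has actually been reclaimed. */
static int maybe_reclaim(struct rgrpd *rgd)
{
	if (!(rgd->rd_flags & RD_FL_META2FREE))
		return 0;		/* nothing reclaimable here */

	/* ... free the rgrp's unused metadata blocks ... */

	rgd->rd_flags &= ~RD_FL_META2FREE;
	return 1;
}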
Created attachment 316624 [details]
Simple and improved - tunable is now absolute number
The percentage-of-total-rgrps tunable is now an absolute value: it specifies the maximum number of rgrps to free metadata from in each gfs_inoded cycle.
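A minimal sketch of this final variant, assuming simplified, hypothetical types rather than the real GFS structures: the per-cycle bound is an absolute count of rgrps, and only rgrps flagged with RD_FL_META2FREE are visited.

#define RD_FL_META2FREE 0x00000001

struct rgrpd {
	unsigned int rd_flags;
	struct rgrpd *rd_next;	/* simplified list of all rgrps */
};

static void reclaim_one_rgrp(struct rgrpd *rgd)
{
	/* the expensive part: lock and free the rgrp's unused metadata (elided) */
}

/* One gfs_inoded cycle: reclaim from at most max_rgrps flagged rgrps. */
static void reclaim_cycle(struct rgrpd *rgrp_list, unsigned int max_rgrps)
{
	unsigned int done = 0;
	struct rgrpd *rgd;

	for (rgd = rgrp_list; rgd && done < max_rgrps; rgd = rgd->rd_next) {
		if (!(rgd->rd_flags & RD_FL_META2FREE))
			continue;
		reclaim_one_rgrp(rgd);
		rgd->rd_flags &= ~RD_FL_META2FREE;
		done++;
	}
}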
Pushed the above patch to RHEL5, STABLE2, and master.
The patch above locks the entire filesystem each time it needs to free metadata from an rgrp. I ran a simple test case to see what the impact is on performance.
The test is run and timed simultaneously on all cluster nodes. Each node operates in a separate directory: it creates 10 subdirectories, fills each with 10000 1MB files, and removes each subdirectory as soon as its 10000th file is written.
Let me know your thoughts about this... whether I should back out the patch or not.
GFS without the reclaim patch
GFS with the reclaim patch (tunable default max rgrps to free from = 5)
I've got a few more numbers and they don't look pretty for this patch, unfortunately :(. I added a timer at the beginning and end of gfs_reclaim_metadata(). Each line below shows the number of rgrps from which metadata was freed during that cycle and how long the cycle took in seconds.
The test is the same as before, except that, to save time, I copied zero-length files instead of 1MB files. That cuts the test run time to about 10 minutes.
Freed 1: time: 0.908551 s
Freed 2: time: 2.255598 s
Freed 2: time: 7.011848 s
Freed 2: time: 4.146901 s
Freed 2: time: 4.406757 s
Freed 2: time: 1.308091 s
Freed 2: time: 2.713996 s
Freed 2: time: 0.632563 s
Freed 2: time: 1.445605 s
Freed 2: time: 1.444344 s
Freed 1: time: 0.420229 s
Freed 2: time: 3.806471 s
Freed 2: time: 4.327761 s
Freed 2: time: 1.462549 s
Freed 2: time: 5.276235 s
Freed 2: time: 2.105195 s
Freed 2: time: 0.948678 s
Freed 2: time: 0.816133 s
Freed 1: time: 1.596697 s
Freed 2: time: 3.547658 s
Freed 1: time: 27.362559 s
Freed 1: time: 1.007744 s
Freed 2: time: 1.057754 s
Freed 2: time: 3.942767 s
Freed 2: time: 1.332333 s
Freed 2: time: 5.283363 s
Freed 2: time: 3.798999 s
Freed 1: time: 15.359068 s
Freed 1: time: 5.588834 s
Freed 2: time: 0.879379 s
Freed 2: time: 4.959445 s
Freed 2: time: 5.476087 s
Freed 1: time: 0.606883 s
Freed 2: time: 3.385412 s
Freed 2: time: 4.650354 s
Freed 3: time: 1.552037 s
Freed 2: time: 4.872653 s
Freed 3: time: 5.114137 s
Freed 1: time: 0.392321 s
Freed 2: time: 1.153888 s
Freed 2: time: 8.842174 s
Freed 2: time: 1.612880 s
Freed 1: time: 8.140755 s
Automatic reclaim of unlinked metadata presents greater cost than benefit. The
reclaim operation locks the filesystem while it attempts to free the unused
metadata, which hurts filesystem performance and certainly doesn't scale well.
The sensible way to perform reclaim is to execute "gfs_tool reclaim
<mountpoint>" as a routine GFS administration activity when the filesystem is
relatively idle.
Is this fixed in GFS2?
GFS2 works differently when it comes to reclaiming unlinked inode metadata... and doesn't have this problem IMO. Steve Whitehouse can probably add more.