Bug 298931
| Field | Value | Field | Value |
|---|---|---|---|
| Summary: | on-disk unlinked inode meta-data leak | | |
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Josef Bacik <jbacik> |
| Component: | gfs-kmod | Assignee: | Abhijith Das <adas> |
| Status: | CLOSED WONTFIX | QA Contact: | Cluster QE <mspqa-list> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 5.0 | CC: | djansa, edamato, johnson.eric, juanino, kanderso, rpeterso, tao, teigland |
| Target Milestone: | --- | Target Release: | --- |
| Hardware: | All | OS: | Linux |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2008-09-17 17:06:17 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 218576 | Bug Blocks: | |
| Attachments: | | | |
Comment 3
RHEL Program Management
2008-06-04 22:48:41 UTC
Created attachment 314932 [details]
Have scand reclaim metadata from X% of rgrps at a time
This patch does not implement the idea Wendy suggested in an earlier comment; instead, it tries to free metadata from a subset of the total rgrps, limited by a tunable.
This might not be as efficient as keeping an extra list of rgrps that were deleted from; I haven't run any performance numbers.
The tunable rgrp_free_perc is a percentage of the total number of rgrps (sdp->sd_rgcount). With the default value of 5% (chosen arbitrarily), unused metadata is freed from at most 5% of the total rgrps in each cycle. Let me know which would be preferable: (a) a percentage of total rgrps, as it is now, or (b) an absolute value for the maximum number of rgrps to free metadata from.
Bob/Steve, please let me know your thoughts on this. If you think this won't work or will be inefficient, I'll fall back to the list idea.
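To make the tunable concrete, here is a minimal, self-contained C sketch of how the per-cycle cap could be derived from it. The names sd_rgcount and rgrp_free_perc come from the comment above, but the structure layout and helper function are simplified assumptions, not the actual patch:

```c
/* Minimal sketch -- not the actual gfs-kmod structures. */
struct gfs_sbd {
	unsigned int sd_rgcount;     /* total number of rgrps in the fs */
	unsigned int rgrp_free_perc; /* tunable: % of rgrps per cycle, default 5 */
};

/* Cap the number of rgrps whose unused metadata is reclaimed per cycle. */
static unsigned int reclaim_rgrp_limit(const struct gfs_sbd *sdp)
{
	unsigned int limit = sdp->sd_rgcount * sdp->rgrp_free_perc / 100;

	return limit ? limit : 1; /* always consider at least one rgrp */
}
```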
Created attachment 316363 [details]
New and improved
Added a flag RD_FL_META2FREE to struct rgrpd that is set when there is freeable metadata in the rgrp and cleared when we reclaim from it.
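A minimal, self-contained sketch of that bookkeeping follows. The flag name RD_FL_META2FREE is from the comment above, but the bit value, struct layout, and helpers are simplified assumptions rather than the actual patch:

```c
/* Sketch of the RD_FL_META2FREE idea -- not the real gfs-kmod code. */
#define RD_FL_META2FREE 0x00000001 /* bit value is an assumption */

struct rgrpd {
	unsigned int rd_flags;
};

/* Set when an unlinked inode leaves freeable metadata behind in this rgrp. */
static void rgrp_mark_freeable(struct rgrpd *rgd)
{
	rgd->rd_flags |= RD_FL_META2FREE;
}

/* Cleared once reclaim has freed the metadata this rgrp was holding. */
static void rgrp_clear_freeable(struct rgrpd *rgd)
{
	rgd->rd_flags &= ~RD_FL_META2FREE;
}
```

The win is that a reclaim pass can cheaply skip rgrps with nothing to give back instead of examining every rgrp in the filesystem.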
Created attachment 316624 [details]
Simple and improved - tunable is now absolute number
The percentage-of-total-rgrps tunable is now an absolute value: it specifies the maximum number of rgrps to free metadata from in each gfs_inoded cycle.
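Putting the flag and the absolute tunable together, here is a self-contained sketch of what one gfs_inoded reclaim cycle could look like; only the RD_FL_META2FREE name comes from the comments above, and every other name and the list structure are simplified assumptions:

```c
/* Sketch of one reclaim cycle with the absolute tunable. */
#define RD_FL_META2FREE 0x00000001 /* bit value is an assumption */

struct rgrpd {
	unsigned int rd_flags;
	struct rgrpd *rd_next;     /* simplified linkage through all rgrps */
};

/* Hypothetical stand-in for freeing an rgrp's unused metadata; it clears
 * the flag so later cycles skip this rgrp until new metadata accumulates. */
static void reclaim_rgrp_metadata(struct rgrpd *rgd)
{
	rgd->rd_flags &= ~RD_FL_META2FREE;
}

/* One reclaim cycle: visit at most max_rgrps flagged rgrps, then stop
 * until the next gfs_inoded wakeup. max_rgrps is the absolute tunable
 * (default 5 in the test runs below). */
static void reclaim_cycle(struct rgrpd *head, unsigned int max_rgrps)
{
	unsigned int freed = 0;
	struct rgrpd *rgd;

	for (rgd = head; rgd && freed < max_rgrps; rgd = rgd->rd_next) {
		if (!(rgd->rd_flags & RD_FL_META2FREE))
			continue; /* nothing reclaimable in this rgrp */
		reclaim_rgrp_metadata(rgd);
		freed++;
	}
}
```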
Pushed the above patch to RHEL5, STABLE2 and master.

The patch above locks the entire filesystem each time it needs to free metadata from an rgrp, so I ran a simple test case to see what the impact on performance is. The test is run and timed simultaneously on all cluster nodes, with each node operating in a separate directory. It creates 10 subdirectories, fills each subdirectory with 10000 1MB files, and removes a subdirectory as soon as the 10000th file is written into it. Let me know your thoughts about this... whether I should back out the patch or not.

GFS without the reclaim patch:

| Node | real | user | sys |
|---|---|---|---|
| camel | 113m4.499s | 1m22.279s | 19m54.774s |
| merit | 126m18.504s | 1m22.707s | 19m52.207s |
| winston | 127m0.553s | 1m22.073s | 19m48.148s |
| salem | 110m1.599s | 0m54.464s | 20m33.615s |

GFS with the reclaim patch (tunable default: max rgrps to free from = 5):

| Node | real | user | sys |
|---|---|---|---|
| camel | 123m20.349s | 1m21.887s | 20m1.453s |
| merit | 133m12.838s | 1m22.173s | 19m44.736s |
| winston | 132m35.304s | 1m22.271s | 20m5.701s |
| salem | 111m58.369s | 0m54.438s | 20m45.796s |

I've got a few more numbers, and they don't look pretty for this patch, unfortunately :(. I added a timer around the beginning and end of gfs_reclaim_metadata(), which reports the number of rgrps from which metadata was freed during that cycle and how long the cycle took in seconds. The test is the same as before, except that, to save time, I copied zero-length files instead of 1MB ones; that cuts the test run time to about 10 mins.

camel (real 10m37.203s, user 0m57.045s, sys 3m4.040s):

```
Freed 1: time: 0.908551 s
Freed 2: time: 2.255598 s
Freed 2: time: 7.011848 s
Freed 2: time: 4.146901 s
Freed 2: time: 4.406757 s
Freed 2: time: 1.308091 s
Freed 2: time: 2.713996 s
Freed 2: time: 0.632563 s
Freed 2: time: 1.445605 s
Freed 2: time: 1.444344 s
```

merit (real 9m5.089s, user 0m57.568s, sys 3m5.484s):

```
Freed 1: time: 0.420229 s
Freed 2: time: 3.806471 s
Freed 2: time: 4.327761 s
Freed 2: time: 1.462549 s
Freed 2: time: 5.276235 s
Freed 2: time: 2.105195 s
Freed 2: time: 0.948678 s
Freed 2: time: 0.816133 s
Freed 1: time: 1.596697 s
Freed 2: time: 3.547658 s
```

winston (real 9m13.812s, user 0m58.030s, sys 3m8.511s):

```
Freed 1: time: 27.362559 s
Freed 1: time: 1.007744 s
Freed 2: time: 1.057754 s
Freed 2: time: 3.942767 s
Freed 2: time: 1.332333 s
Freed 2: time: 5.283363 s
Freed 2: time: 3.798999 s
Freed 1: time: 15.359068 s
Freed 1: time: 5.588834 s
Freed 2: time: 0.879379 s
Freed 2: time: 4.959445 s
Freed 2: time: 5.476087 s
```

salem (real 9m4.354s, user 0m38.753s, sys 2m29.936s):

```
Freed 1: time: 0.606883 s
Freed 2: time: 3.385412 s
Freed 2: time: 4.650354 s
Freed 3: time: 1.552037 s
Freed 2: time: 4.872653 s
Freed 3: time: 5.114137 s
Freed 1: time: 0.392321 s
Freed 2: time: 1.153888 s
Freed 2: time: 8.842174 s
Freed 2: time: 1.612880 s
Freed 1: time: 8.140755 s
```

Automatic reclaim of unlinked metadata presents greater cost than benefit: the reclaim operation locks the filesystem while it attempts to free the unused metadata, which hurts filesystem performance and certainly doesn't scale well. The sensible way to perform reclaim is to execute "gfs_tool reclaim <mountpoint>" as a routine GFS administration activity when the filesystem is relatively idle.

Is this fixed in gfs2?

GFS2 works differently when it comes to reclaiming unlinked inode metadata... and doesn't have this problem IMO.
Steve Whitehouse can probably add more.
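For reference, a rough shell reconstruction of the benchmark workload described in the timing comments above. This is a sketch under assumptions: /mnt/gfs as the mount point and per-node working directories named after the host; the actual test script is not attached to the bug.

```sh
# Each cluster node runs this simultaneously in its own directory.
cd /mnt/gfs/$(hostname -s) || exit 1

time for d in $(seq 1 10); do
    mkdir dir$d
    for f in $(seq 1 10000); do
        # 1MB files in the long runs; the shortened run used zero-length files.
        dd if=/dev/zero of=dir$d/file$f bs=1M count=1 2>/dev/null
    done
    rm -rf dir$d   # remove the subdirectory as soon as the 10000th file lands
done
```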
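Where the closing recommendation suggests manual reclaim, the invocation is the one quoted in the comment above; only the mount point here is an example:

```sh
# Run during a relatively idle window as routine GFS administration.
gfs_tool reclaim /mnt/gfs
```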