Bug 218576

Summary: on-disk unlinked inode meta-data leak
Product: [Retired] Red Hat Cluster Suite
Component: gfs
Version: 4
Status: CLOSED WONTFIX
Severity: medium
Priority: medium
Reporter: Wendy Cheng <nobody+wcheng>
Assignee: Wendy Cheng <nobody+wcheng>
QA Contact: GFS Bugs <gfs-bugs>
CC: johnson.eric, juanino
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2007-09-18 15:48:23 UTC
Bug Blocks: 298931

Description Wendy Cheng 2006-12-06 05:28:30 UTC
Description of problem:
From the linux-cluster mailing list - it looks like we never put the unlinked 
inode meta-data blocks back on the free list until the gfs_reclaim command is 
run:

Date: Tue, 5 Dec 2006 10:51:49 -0500
From: "eric johnson" <johnson.eric>
To: "linux clustering" <linux-cluster>

Suppose I'm on a GFS partition with 50 GBs free.

And then I rudely drain it of inodes with a Perl script that looks like this.

Executive summary - make a bunch of randomly named files...

--makefiles.pl

#!/usr/bin/perl
# Create $max small, randomly named files, optionally suffixed with $d.
use strict;
use warnings;

my $max = shift(@ARGV);
my $d   = shift(@ARGV);
$d = "" unless defined $d;

for (my $i = 0; $i < $max; $i++) {
    my $filename = sprintf("%s-%d%s", rand() * 100000, $i, $d);
    open(my $fh, '>', $filename) or die "cannot create $filename: $!";
    print $fh "This is fun!!\n";
    close($fh);
}
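
For reference, the script takes a file count and an optional filename suffix,
so an invocation would look something like this (the values here are just an
example):

    perl makefiles.pl 100000 .tmp

which creates 100000 tiny files with names of the form <random>-<index>.tmp.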


In fact, to be extra cruel, I make a bunch of subdirectories on my
partition, and then run an instance of this script in each
subdirectory until the box is saturated with work to do and everyone
is busily draining away the inodes.  This takes a good 6 hours.

Then, once the inodes are drained, I kill the runaway scripts and delete
all the files that were created.

While everything has been deleted, I still don't get back all of my
space.  In fact, the vast majority of it still appears to be chewed up
by GFS and it won't give back the space.  So when I copy large files
to it I get errors like...

cp: writing `./src.tar.gz': No space left on device
even though that file is clearly under a gig and should fit.

But if I run a gfs_reclaim on that partition, then it all magically
comes back and then I can place largish files back on the disk.
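
The command Eric is referring to is presumably the reclaim action of
gfs_tool, mentioned again in comment 6 (the mount point below is
illustrative):

    gfs_tool reclaim /mnt/gfs

This asks GFS to return unused on-disk metadata blocks to the free list.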

Is this a well-known characteristic of GFS that I somehow missed reading about?

-Eric

Comment 1 Wendy Cheng 2006-12-06 05:38:03 UTC
We could piggyback the reclaim logic into gfs_inoded. However, the fix will
be a little more involved, since we have to make sure no other node can
still hold this inode before assigning it to another new file.
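
In rough C terms, that check could look like the sketch below - the helpers
are hypothetical stand-ins, not the real GFS locking API:

#include <linux/errno.h>

struct gfs_inode;  /* opaque here; fields elided */

/* Hypothetical helpers standing in for the real cluster-lock calls. */
int  gfs_trylock_inode_excl(struct gfs_inode *ip);
void gfs_unlock_inode(struct gfs_inode *ip);
void gfs_free_inode_metadata(struct gfs_inode *ip);

/* Only reclaim an unlinked inode's blocks once we hold its cluster lock
 * exclusively, which proves no other node can still be using it. */
static int try_reclaim_inode(struct gfs_inode *ip)
{
        if (!gfs_trylock_inode_excl(ip))
                return -EBUSY;          /* another node may still hold it */
        gfs_free_inode_metadata(ip);
        gfs_unlock_inode(ip);
        return 0;
}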


Comment 2 Rob Kenna 2006-12-06 15:01:08 UTC
PM ack for 4.5 

Comment 3 Jerry Uanino 2006-12-08 15:53:13 UTC
Red Hat support ticket #1157176 opened to match this.

Comment 5 Wendy Cheng 2006-12-13 15:50:58 UTC
OK, thanks - just read the support ticket now. Will see whether the fix can
make the R4.5 cut-off date. 

Comment 6 Wendy Cheng 2007-01-04 06:14:31 UTC
Thinking more about this issue: we hardly ever return meta-data to the RG unless
"gfs_tool reclaim" is explicitly called. I'm amazed this problem has stayed
hidden for so long. 

Fixing this may not be trivial, though. We could piggy-back the reclaim logic 
into one of the gfs daemons, but the current reclaim logic walks through every 
RG - that is a huge performance hit for whichever daemon we piggy-back the 
logic onto. If we instead add the reclaiming to the unlink code (doing it 
whenever a file is deleted - this is how GFS2 does it), then on top of the 
performance hit we could run into a locking issue similar to bugzilla 221237 
(GFS2 rename deadlock under anaconda). 

Will try to get a test patch ready tomorrow. I'm leaning toward doing it 
at file unlink time by chaining the rgd into a list, then letting 
gfs_inoded do the actual processing - that way we only walk through the 
RGs that need clean-up and keep the performance hit off the current gfs 
unlink code. 
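
To make that direction concrete, here is a rough sketch of the chaining
scheme in kernel C - the fields and helpers below are illustrative
assumptions, not the actual GFS source:

#include <linux/list.h>
#include <linux/spinlock.h>

/* Illustrative fields only -- not the real gfs structures. */
struct gfs_rgrpd {
        struct list_head rd_reclaim;    /* hooked on when this RG has unlinked
                                           metadata to reclaim; assumed to be
                                           INIT_LIST_HEAD'ed at rgd setup */
        /* ... existing fields ... */
};

struct gfs_sbd {
        spinlock_t sd_reclaim_lock;
        struct list_head sd_reclaim_list;
        /* ... existing fields ... */
};

/* Hypothetical helper: lock one RG exclusively (so no other node can still
 * be using its unlinked inodes) and return its metadata blocks to the free
 * list. Definition elided. */
void reclaim_rg_metadata(struct gfs_rgrpd *rgd);

/* Called from the unlink path: cheap, just remember which RG needs work. */
static void gfs_queue_reclaim(struct gfs_sbd *sdp, struct gfs_rgrpd *rgd)
{
        spin_lock(&sdp->sd_reclaim_lock);
        if (list_empty(&rgd->rd_reclaim))
                list_add_tail(&rgd->rd_reclaim, &sdp->sd_reclaim_list);
        spin_unlock(&sdp->sd_reclaim_lock);
}

/* Called from gfs_inoded: walk only the chained RGs, not every RG on disk. */
static void gfs_inoded_reclaim(struct gfs_sbd *sdp)
{
        spin_lock(&sdp->sd_reclaim_lock);
        while (!list_empty(&sdp->sd_reclaim_list)) {
                struct gfs_rgrpd *rgd =
                        list_first_entry(&sdp->sd_reclaim_list,
                                         struct gfs_rgrpd, rd_reclaim);
                list_del_init(&rgd->rd_reclaim);
                spin_unlock(&sdp->sd_reclaim_lock);

                reclaim_rg_metadata(rgd);

                spin_lock(&sdp->sd_reclaim_lock);
        }
        spin_unlock(&sdp->sd_reclaim_lock);
}

The list_empty() check keeps an already-queued RG from being queued twice, so
the unlink path stays cheap, and gfs_inoded pays the reclaim cost only for
RGs that actually saw deletions.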

I would like to make this fix identical for both GFS1 and GFS2 (for bz 221237).

Comment 8 Wendy Cheng 2007-09-18 15:48:23 UTC
With current development resources, we will not be able to work on this
issue in RHEL 4. Adding it to the TO-DO list for RHEL 5.2.