Bug 490649

Summary: GFS2: gfs2_grow fails on a full file system
Product: Red Hat Enterprise Linux 5
Reporter: Christine Caulfield <ccaulfie>
Component: gfs2-utils
Assignee: Ben Marzinski <bmarzins>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: low
Priority: high
Version: 5.3
CC: adas, casmith, ccoffey, iannis, mjuricek, rpeterso, sbradley, swhiteho, tao
Target Milestone: rc
Target Release: 5.5
Hardware: All
OS: Linux
Fixed In Version: gfs2-utils-0.1.62-29.el5
Doc Type: Bug Fix
Doc Text:
In order to grow a gfs2 filesystem, gfs2 needs to add additional resource groups to manage the new space. gfs2_grow does this by writing to the rindex file. If there are no free blocks available in the filesystem at its current size, and the last block of the rindex file is too full to add another resource group entry, gfs2_grow will be unable to write out the information gfs2 needs in order to use the new space. When this happens, gfs2_grow is unable to grow the filesystem. This problem can only happen on filesystems where the last block of the rindex file is too full to add another resource group entry; whether this is the case depends on the filesystem size, the block size, and the resource group size. If this problem occurs, gfs2_grow will report "Error writing new rindex entries;aborted." In this case, the user must remove or truncate a file to free up space for gfs2_grow to complete. Once the filesystem has been grown, the file can safely be added back to the gfs2 filesystem.
Clones: 659123, 711451 (view as bug list)
Last Closed: 2011-07-21 11:02:05 UTC
Bug Depends On: 626585, 661904
Bug Blocks: 711451
Attachments:
- information on how much space is necessary to grow the filesystem
- Patch to make gfs2_grow write one resource group first, and then the rest

Description Christine Caulfield 2009-03-17 14:05:26 UTC
Description of problem:

If I have a full GFS2 filesystem, it seemed reasonable to use lvextend and gfs2_grow to add more space to it. However, this didn't work: I had to delete files before I could add more space.


Version-Release number of selected component (if applicable):
Tested using STABLE3, but Bob reckons it's also a problem with RHEL5.


How reproducible:
Easily

Steps to Reproduce:
1. Create a GFS2 filesystem
2. Mount it and fill it up to 100%
3. gfs2_grow /mnt/test1
  
Actual results:
# gfs2_grow /mnt/test2
FS: Mount Point: /mnt/test2
FS: Device:      /dev/mapper/guest-test2
FS: Size:        2621438 (0x27fffe)
FS: RG size:     65535 (0xffff)
DEV: Size:       5242880 (0x500000)
The file system grew by 10240MB.
Error writing new rindex entries;aborted.
gfs2_grow complete.
# 

The exit status is 0 and the file system has not been extended.

Expected results:

Either a non-zero exit code or, better, an extended filesystem

Additional info:

Removing some files alleviates the problem, and the file system can be grown successfully.

Comment 1 Steve Whitehouse 2009-03-18 14:00:17 UTC
Looks like we should preallocate an extra block for the rindex. We could do that at mkfs time easily, but we can't do that at grow time until we have support for fallocate(). The fallocate() mode in question needs checking to ensure that we don't fall foul of any of the unwritten rules of gfs2 by allowing files to grow beyond the height dictated by the file size.

So I think this will have to remain a "feature" for now. It will only affect filesystems where the rindex entries fill a complete block, so that it's not possible to add even a single extra entry without further allocations.

A potential workaround for production scenarios would be to use quotas to ensure that at least one spare block is kept on the filesystem at all times.

Comment 2 Robert Peterson 2009-05-05 20:38:53 UTC
See my notes in bug #498469.  If we implement my new statfs_fast
patch in gfs2, we can perhaps unlink the old statfs sync file
and reuse that block for writing to the rindex file.  Then, once
the new rg is in place, we can recreate said file for the next time
this happens, or some such.  There are concerns, however, which I
noted in that bug record.

Comment 4 Robert Peterson 2010-01-12 19:08:36 UTC
I may need to bump the priority on this one.  I had a user who did
a gfs2_grow on a full file system.  The gfs2_grow program wrote the
new RG info to the rindex file until it ran out of space.  When it
ran out of blocks, it couldn't write to the rindex any more, but
the alarming thing is that it didn't write a multiple of 96 bytes
because of block boundaries, and therefore the rindex file was
left with an invalid dinode size.  That confused both the kernel
and the rindex repair function of fsck.gfs2.  This was on
gfs2-utils-0.1.62-1.el5.

I hand-patched the dinode size with gfs2_edit to a multiple of 96
and the file system was usable again.  Some of the new rgs were
there, which meant the file system was usable and a subsequent
gfs2_grow was able to fix the problem.

I think I changed gfs2_grow not that long ago (within the last year)
so that it writes the first rindex entry first, then writes the
rest in a big chunk.  That may fix the problem for 99% of the
users just by moving to the latest code.  However, it doesn't
solve the case where the rindex file is full AND its last block
isn't big enough to hold another entry.  We still need to figure
out a way to squeeze out one more block.  (Previously I had
suggested deleting a system inode like statfs temporarily but
perhaps there's a better way).  I'd like to make gfs2_grow figure
out if there are any free blocks to work with (and/or free space
in the last block of rindex) and take extra measures if there
aren't either.

The other possibility is that the initial 96-byte write to rindex
failed to trigger more free blocks in the kernel code for the
subsequent writes to rindex, which would be another bug to slay.
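
For reference, a minimal sketch of the size check implied above, assuming the 96-byte on-disk rindex entry size mentioned in this comment. The rindex dinode size is not read from a real filesystem here; it has to be obtained separately (for example from gfs2_edit output) and passed in as a placeholder value:

#!/bin/bash
# Sketch only: check whether an rindex dinode size is a whole multiple of the
# 96-byte resource group entry size, and if not, what the nearest smaller
# valid size would be.  The size argument is supplied by the user.
ENTRY_SIZE=96
RINDEX_SIZE=${1:?usage: $0 <rindex dinode size in bytes>}

REMAINDER=$(( RINDEX_SIZE % ENTRY_SIZE ))
if [ "$REMAINDER" -eq 0 ]; then
    echo "rindex size $RINDEX_SIZE is a multiple of $ENTRY_SIZE bytes: looks sane"
else
    echo "rindex size $RINDEX_SIZE is not a multiple of $ENTRY_SIZE bytes"
    echo "nearest smaller valid size: $(( RINDEX_SIZE - REMAINDER ))"
fi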

Comment 5 chris 2010-02-24 15:20:03 UTC
Hello,

Ran into this bug yesterday.  I was able to grow the filesystem after removing some data.  However, even though df reports plenty of space, upon writing data to the filesystem, out of space errors are generated.  After taking the filesystem offline and fsck'ing, I was able to successfully use the new space.  Coincidence?  Didn't test to see if it was the remount that was the fix.

Is it possible to increase the priority of this bug?  Thank you.

Comment 6 Ben Marzinski 2010-04-26 20:41:14 UTC
What kernel and gfs2-utils version were you using?  Are you using a cluster? If so, how is it set up?  How did you fill it up?  I used kernel-2.6.18-197.el5 and gfs2-utils-0.1.62-20.el5, and I filled it up by dd'ing one massive file to take up almost all of the space, and then I created a bunch of empty files until the number of available blocks reported by df said 0.

When I tried to grow it after this, it worked fine.
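
For reference, a rough script form of that fill-up procedure; the mount point and file names are placeholders, not values from the original test:

#!/bin/bash
# Rough sketch of the fill-up procedure described above.  MNT is a placeholder
# for an already-mounted gfs2 filesystem.
MNT=${1:-/mnt/test}

# One massive file: let dd run until it hits ENOSPC.
dd if=/dev/zero of="$MNT/bigfile" bs=1M 2>/dev/null || true

# Then create empty files until that fails too; each new gfs2 inode consumes
# a block, so the available-block count reported by df should reach 0.
i=0
while touch "$MNT/empty_$i" 2>/dev/null; do
    i=$(( i + 1 ))
done

df -k "$MNT"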

Comment 7 chris 2010-04-27 16:17:21 UTC
Sorry for the lack of info:

2.6.18-128.1.14.el5
gfs2-utils-0.1.62-1.el5

A user filled it up doing a copy over nfs to the gfs2 filesystem.  This is a single node installation.

I just did another test as I was a bit foggy on this issue:

- created 3GB gfs2 filesystem
- filled it up with 10MB files
- filled it up more with smaller files
- at this point I'm out of space
- I extend the lv by 1GB
- gfs2_grow the lv ... it succeeds ... weird!

But at this point, even though df reports space available, I cannot create new files: I am told there is no space left on the device.  I unmounted the filesystem, then remounted, and I can now write files.  So this test case was a bit different, as I didn't run into the issue of gfs2_grow reporting an inability to write rindex entries.

FYI .. details ....

===============================================================================
df -kl
/dev/mapper/PDS_VG-test_bug_lv
                       3145376   3144656       720 100% /test_bug

[root@pds test_bug]# lvextend -L+1G /dev/PDS_VG/test_bug_lv
  Extending logical volume test_bug_lv to 4.00 GB
  Logical volume test_bug_lv successfully resized
[root@pds test_bug]# gfs2_grow /dev/PDS_VG/test_bug_lv
FS: Mount Point: /test_bug
FS: Device:      /dev/mapper/PDS_VG-test_bug_lv
FS: Size:        786431 (0xbffff)
FS: RG size:     65534 (0xfffe)
DEV: Size:       1048576 (0x100000)
The file system grew by 1024MB.
gfs2_grow complete.

df -kl
/dev/mapper/PDS_VG-test_bug_lv
                       3931712   3144656    787056  80% /test_bug

[root@pds test_bug]# for i in `seq 500 600`; do dd if=/dev/zero of=file_$i count=1 bs=1M;done
dd: opening `file_500': No space left on device
dd: opening `file_501': No space left on device

===============================================================================

Hope this helps, thanks.

Comment 8 Ben Marzinski 2010-05-04 22:14:10 UTC
The bug you describe in Comment #7 is bug #482756. It has been fixed in the 2.6.18-174.el5 kernel, so it looks like we're back to only the original problem.

Comment 13 Robert Peterson 2010-09-08 19:33:54 UTC
Comment #11 indicates there may have been rgrp corruption that
fsck.gfs2 was unable to fix.  I've been working on a complex
patch that is hopefully better at fixing this kind of damage.
I'm doing so in the name of bug #576640, which is a RHEL6 bug,
and so I don't have a back-port to RHEL5.5 yet.  The patch passes
my brutal "gfs2_fsck_hellfire" test, but it still needs more testing.

Once I get the fix tested, I'll back-port the patch to RHEL5.x.
If the problem is really due to damage, I'm hoping the patch will
correctly identify and fix the problem.

If the problem is actually due to the file system being full, today's
gfs2_grow has no way to extend it.  There's no simple solution.
Our team has discussed ways we can solve the problem but we haven't
tried to implement any of those ideas yet.

This work is somewhat backed up behind other fsck.gfs2 work I have
pending.

Comment 15 Ben Marzinski 2010-10-13 17:38:53 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
In order to grow a gfs2 filesystem, gfs2 needs to add additional resource groups to manage the new space. gfs2_grow does this by writing to the rindex file. If there are no free blocks available in the filesystem at its current size, and the last block of the rindex file is too full to add another resource group entry, gfs2_grow will be unable to write out the necessary information for gfs2 to be able to use the new space. When this happens, gfs2_grow is unable to grow the filesystem.

This problem can only happen on filesystems where the last block of the rindex file is too full to add another resource group entry.  Whether or not this is the case depends on the filesystem size, the block size, and the resource group size.

If this problem occurs, gfs2_grow will report "Error writing new rindex entries;aborted."  In this case, the user must remove or truncate a file to free up space for gfs2_grow to complete.  Once the filesystem has been grown, the file can safely be added back to the gfs2 filesystem.
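
A minimal sketch of that condition, assuming the 96-byte resource group entry size mentioned in comment #4. The rindex size is not available through a normal file path, so both values below are placeholders supplied by the user (the rindex size can be read with gfs2_edit):

#!/bin/bash
# Sketch: does the last block of the rindex still have room for one more
# 96-byte resource group entry, or does gfs2_grow need a free block first?
RINDEX_SIZE=${1:?usage: $0 <rindex size in bytes> [block size]}
BLOCK_SIZE=${2:-4096}
ENTRY_SIZE=96

USED_IN_LAST=$(( RINDEX_SIZE % BLOCK_SIZE ))
if [ "$USED_IN_LAST" -ne 0 ] && [ $(( BLOCK_SIZE - USED_IN_LAST )) -ge "$ENTRY_SIZE" ]; then
    echo "last rindex block has room: the failure described above should not occur"
else
    echo "last rindex block is too full: free some space before running gfs2_grow"
fi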

Comment 16 Ben Marzinski 2010-11-18 15:47:29 UTC
Created attachment 461327 [details]
information on how much space is necessary to grow the filesystem.

Comment 19 Steve Whitehouse 2011-01-24 13:11:30 UTC
Ben, the RHEL57 branch is open, so you should be able to commit this now.

Comment 20 Robert Peterson 2011-02-05 19:20:21 UTC
Ben's patch is in the RHEL57 branch of the git tree.  I also
built this into gfs2-utils-0.1.62-29.el5.  Changing status
to Modified.

Comment 22 Nate Straz 2011-05-18 18:52:33 UTC
Verified against kernel-2.6.18-261.el5 and gfs2-utils-0.1.62-30.el5.

Comment 23 Martin Juricek 2011-05-19 15:21:27 UTC
When gfs2_grow finishes with an error, it shouldn't return a zero exit code. Please fix that.


[root@a3:~]$ uname -a
Linux a3 2.6.18-261.el5 #1 SMP Thu May 12 16:47:19 EDT 2011 ia64 ia64 ia64 GNU/Linux
(10:16:41) [root@a3:~]$ rpm -q gfs2-utils
gfs2-utils-0.1.62-30.el5


[root@a3:/opt]$ gfs2_grow /mnt/test
FS: Mount Point: /mnt/test
FS: Device:      /dev/mapper/vg1-lv1
FS: Size:        524288 (0x80000)
FS: RG size:     65533 (0xfffd)
DEV: Size:       53311488 (0x32d7800)
The file system grew by 206200MB.
Error writing new rindex entries;aborted.
gfs2_grow complete.
(10:08:53) [root@a3:/opt]$ echo $?
0

Comment 24 Ben Marzinski 2011-05-19 20:33:52 UTC
My reproducer now grows the filesystem correctly.  Could you please provide some more information about your failing test?

What would be most helpful is if you could give me

1. The initial size of the logical volume, by running

# lvs --units s

before making the filesystem.

2. The command used to create the filesystem

3. The size of the logical volume after growing, by running

# lvs --units s

after you resize the lv.

Comment 27 Ben Marzinski 2011-05-23 21:01:58 UTC
I'm not sure how we want to deal with this.  This is a completely different bug than the one fixed here.  It has nothing to do with fallocate. However, it still can cause gfs2_grow to fail to grow a completely full filesystem.

Here's the issue. GFS2 is writing to the rindex file a page at a time.  The fix for this bug (and the bugs it depends on) made sure that at the end of the last block, there was enough space for another resource group. For that to make any difference at all, block size must be equal to page size. If the page size is bigger than the block size, you may still need to do an allocation to write out
the first page of data to the rindex file.

That's the general problem. What you are seeing is a corner case. When the rindex file is stuffed, there's not enough space for a page full of data, even if the page size and block size are equal. So you will still need to do an allocation to write the entire page.

The end result is that if your rindex file is stuffed, or if your page size doesn't equal your block size (and your file doesn't have enough allocated blocks to be aligned on a page boundary anyway), it's still possible that you will need to do an allocation before you can write any resource group entries to the rindex file, which will cause a hang.

The easiest solution is to have gfs2_grow do what it did in RHEL5, which is to write out the first resource group entry by itself, and then write out the rest.  The fallocate fix will guarantee that there is enough space for one more rindex entry, and once you write that one, there will be plenty of space for the rest of the rindex file.

Another possible solution is to make mkfs.gfs2 always create an unstuffed rindex file, and then to make fallocate always allocate enough space to make the file a multiple of the page size.
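
A quick way to check for this corner case on a given machine, assuming the relevant comparison is simply page size versus the block size the filesystem was created with; the block size here is a placeholder for whatever was passed to mkfs.gfs2 -b (4096 by default):

#!/bin/bash
# Sketch: flag the case where the page size exceeds the filesystem block size,
# i.e. where the first write to the rindex may still need an allocation.
BLOCK_SIZE=${1:-4096}              # placeholder: the -b value used at mkfs time
PAGE_SIZE=$(getconf PAGESIZE)

if [ "$PAGE_SIZE" -gt "$BLOCK_SIZE" ]; then
    echo "page size ($PAGE_SIZE) > block size ($BLOCK_SIZE): corner case applies"
else
    echo "page size ($PAGE_SIZE) <= block size ($BLOCK_SIZE)"
fi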

Comment 28 Ben Marzinski 2011-05-26 00:45:38 UTC
Created attachment 500962 [details]
Patch to make gfs2_grow write one resource group first, and then the rest

Comment 29 Ben Marzinski 2011-05-27 19:36:04 UTC
Like I mentioned in comment #27, the general problem here is that gfs2 must be able to write the first page of data out without allocating, and this can also happen when the page size and the block size aren't equal.

To test that, you can use the same LV sizes from Comment #25, but when you make the filesystem, make it with -b 1024.  That will give you a non-page-aligned rindex file (but with the smaller block size, it won't be stuffed).  The only other difference is that you won't be able to fill all of the blocks of the filesystem by simply touching files in the filesystem's root directory.  The file creation will fail, but df will still show blocks left.  You will need to create a couple of directories, and keep touching files in there, until the filesystem really is out of free blocks.

Without this fix, that test fails. With the patch, the filesystem will grow correctly.
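
A sketch of that test with placeholder device, mount point, and mkfs options; the LV sizes from comment #25 are not reproduced here, and the -O/-p/-j flags and the initial dd are just one plausible single-node setup, not part of the original instructions:

#!/bin/bash
# Sketch of the -b 1024 test described above.  DEV and MNT are placeholders.
DEV=${1:?usage: $0 <lv device> <mount point>}
MNT=${2:?usage: $0 <lv device> <mount point>}

mkfs.gfs2 -O -p lock_nolock -j 1 -b 1024 "$DEV"
mount "$DEV" "$MNT"

# Create the directories up front, use a big dd to consume most of the space
# quickly, then keep touching files in the subdirectories until the
# filesystem really is out of free blocks.
mkdir "$MNT/d1" "$MNT/d2"
dd if=/dev/zero of="$MNT/bigfile" bs=1M 2>/dev/null || true
for d in d1 d2; do
    i=0
    while touch "$MNT/$d/f_$i" 2>/dev/null; do
        i=$(( i + 1 ))
    done
done
df -k "$MNT"

# Then extend the LV and grow; with the patch the grow should succeed:
#   lvextend -L+1G "$DEV"
#   gfs2_grow "$MNT"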

Comment 30 Ben Marzinski 2011-05-31 15:02:46 UTC
Posted

Comment 31 Ben Marzinski 2011-06-07 14:36:16 UTC
Comment on attachment 500962 [details]
Patch to make gfs2_grow write one resource group first, and then the rest

This fix is being dealt with in Bug #711451

Comment 32 errata-xmlrpc 2011-07-21 11:02:05 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-1042.html