Description of problem:
When an online resize consumes all of the reserved GDT blocks on an ext4 filesystem, resize2fs fails as expected. However, if the resize is then completed with an offline resize2fs on the unmounted filesystem, the filesystem becomes corrupted after the next mount.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Create a logical volume 100M in size and format it with ext4
2. Fill the filesystem with data
3. Extend the filesystem using LVM utilities by 110G (roughly 1024 times its original size) so that the reserved GDT blocks are fully consumed.
4. The on-line resize will fail with 'Operation not permitted adding group #XXX'
5. Unmount the filesystem and perform a resize2fs to finish out the resize. The resize will prompt to run e2fsck -f first.
6. Run e2fsck -f on the filesystem then finish the resize2fs offline.
7. Mount the filesystem then unmount it
8. Run e2fsck -fy on the filesystem; it reports heavy corruption:
e2fsck 1.41.12 (17-May-2010)
e2fsck: Group descriptors look bad... trying backup blocks...
Pass 1: Checking inodes, blocks, and sizes
Group 1's inode table at 538 conflicts with some other fs block.
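The steps above can be sketched as a small script. This is only an illustration of the sequence, not the exact commands that were run: the volume group, logical volume, and mount point names (testvg, testlv, /mnt/test) are hypothetical, and the use of lvextend followed by a separate resize2fs for the online resize is an assumption. By default it only prints the commands; executing them needs root and a scratch volume group.

```shell
#!/bin/sh
# Hypothetical names: adjust VG, LV, and MNT for your system.
VG=${VG:-testvg}
LV=${LV:-testlv}
MNT=${MNT:-/mnt/test}
DEV=/dev/$VG/$LV
DRY_RUN=${DRY_RUN:-1}   # default: only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

repro() {
    run lvcreate -L 100M -n "$LV" "$VG"        # step 1
    run mkfs.ext4 "$DEV"
    run mount "$DEV" "$MNT"
    run dd if=/dev/zero of="$MNT/fill" bs=1M   # step 2: writes until the fs is full
    run lvextend -L +110G "$DEV"               # step 3
    run resize2fs "$DEV"                       # step 4: online resize, expected to fail
    run umount "$MNT"                          # step 5
    run resize2fs "$DEV"                       #   asks for e2fsck -f first
    run e2fsck -f "$DEV"                       # step 6
    run resize2fs "$DEV"
    run mount "$DEV" "$MNT"                    # step 7
    run umount "$MNT"
    run e2fsck -fy "$DEV"                      # step 8
}
```

Calling repro prints the command list; setting DRY_RUN=0 before calling it executes the commands instead.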
Actual results:
On-line resize fails with 'Operation not permitted adding group #XXX'. The resize2fs then abruptly stops and the filesystem is inconsistent following a remount.
Expected results:
The on-line resize should consume all reserved GDT blocks and remove the resize_inode feature to prevent new on-line resizes.
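For reference, a rough sketch of why a 110G extension runs out of reserved GDT blocks. It relies on mke2fs defaults that are not stated in this report and should be treated as assumptions: a 1 KiB block size (typical for a 100M filesystem), 32-byte group descriptors (no 64bit feature), a reservation target of 1024 times the initial size, and a cap of block_size/4 reserved GDT blocks. The function name max_resize_blocks is hypothetical.

```shell
#!/bin/sh
# Upper bound, in filesystem blocks, reachable with reserved GDT blocks only.
# Arguments: block size in bytes, current filesystem size in blocks.
max_resize_blocks() {
    bs=$1
    blocks=$2
    desc_size=32                      # assumption: 32-byte group descriptors
    bpg=$((bs * 8))                   # blocks per group (one bitmap block)
    gdpb=$((bs / desc_size))          # group descriptors per GDT block
    groups=$(( (blocks + bpg - 1) / bpg ))
    gdt_now=$(( (groups + gdpb - 1) / gdpb ))
    # assumption: mke2fs reserves room to grow to 1024x the initial size
    want_groups=$(( (blocks * 1024 + bpg - 1) / bpg ))
    rsv=$(( (want_groups + gdpb - 1) / gdpb - gdt_now ))
    cap=$((bs / 4))                   # assumption: at most block_size/4 reserved GDT blocks
    if [ "$rsv" -gt "$cap" ]; then
        rsv=$cap
    fi
    echo $(( (gdt_now + rsv) * gdpb * bpg ))
}

# 100M filesystem with 1 KiB blocks = 102400 blocks
max_resize_blocks 1024 102400   # prints 67371008
```

Under these assumptions the limit is 67371008 blocks of 1 KiB, a little over 64 GiB, which is well short of the 110G target, so the online resize stops once the reserved GDT blocks are used up.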
Additional info:
- This issue is not reproducible on RHEL7. RHEL7 most likely handles it with the introduction of the meta_bg feature and the online resize patches discussed here: http://www.spinics.net/lists/linux-ext4/msg33898.html
ext4: grow the s_flex_groups array as needed when resizing
ext4: grow the s_group_info array as needed
ext4: set bg_itable_unused when resizing
ext4: convert file system to meta_bg if needed during resizing
ext4: log a resize update to the console every 10 seconds
ext4: advertise the fact that the kernel supports meta_bg resizing
ext4: don't copy non-existent gdt blocks when resizing
ext4: avoid duplicate writes of the backup bg descriptor blocks
ext4: add online resizing support for meta_bg and 64-bit file systems
- I also want to point out that I completely understand that the disk format has a limitation, and I'm not trying to change that. However, the fact that consuming all GDT blocks during an online resize trashes the filesystem seems to be a pretty bad issue.
- I wasn't able to reproduce it on my machines, but this customer hit the following ext4 messages, which seem related to https://bugzilla.redhat.com/show_bug.cgi?id=516580
Jun 30 16:16:53 hostname kernel: EXT4-fs error (device dm-7): ext4_mb_generate_buddy: EXT4-fs: group 1: 28688 blocks in bitmap, 32383 in gd
Aug 4 10:27:57 hostname kernel: EXT4-fs error (device dm-7): ext4_mb_generate_buddy: EXT4-fs: group 9087: 0 blocks in bitmap, 32768 in gd
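To make messages like the two above easier to compare, here is a minimal sketch that pulls the device, group number, and both block counts out of ext4_mb_generate_buddy errors and prints the difference. The function name parse_buddy_errors is hypothetical, and the pattern only assumes the message layout shown in the lines above.

```shell
#!/bin/sh
# Read kernel log lines on stdin and summarize ext4_mb_generate_buddy
# mismatches between the block bitmap and the group descriptor (gd).
parse_buddy_errors() {
    sed -n 's/.*(device \([^)]*\)): ext4_mb_generate_buddy: .*group \([0-9]*\): \([0-9]*\) blocks in bitmap, \([0-9]*\) in gd.*/\1 \2 \3 \4/p' |
    while read -r dev group bitmap gd; do
        echo "$dev group=$group bitmap=$bitmap gd=$gd diff=$((gd - bitmap))"
    done
}
```

For example, piping /var/log/messages into parse_buddy_errors prints one summary line per matching error and ignores everything else.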
Created attachment 926539
fsck log of corrupted ext4 filesystem following offline resize and mount/umount
I think this is a dup of bug #1036122.
> 4. The on-line resize will fail with 'Operation not permitted adding group #XXX'
> 5. Unmount the filesystem and perform a resize2fs to finish out the resize. The resize will prompt to run e2fsck -f first.
The problem here is that offline resize has a bug. Online resize stopped due to the limitations, but the user then continued with an offline resize, which "succeeded" due to this bug, corrupting the filesystem in the process.
Please retest with e2fsprogs-1.41.12-19.el6; I think this should be fixed for RHEL6.6 and RHEL6.5.z as well.
This commit is what should have fixed it.
Author: Theodore Ts'o <email@example.com>
Date:   Sat Dec 29 00:53:16 2012 -0500

    resize2fs: reserve all metadata blocks for flex_bg file systems

    For flex_bg file systems, if we need to relocate an allocation bitmap
    or inode table, we need to make sure that all metadata blocks have
    been reserved, lest we end up overwriting a metadata block belonging
    to a different block group.

    This change fixes the following test case:

        rm -f foo.img; touch foo.img
        truncate -s 32G foo.img
        mke2fs -F -t ext4 -E resize=12582912 foo.img
        e2fsck -f foo.img
        truncate -s 64G foo.img
        e2fsck -fy foo.img

    Signed-off-by: "Theodore Ts'o" <firstname.lastname@example.org>
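Before retesting, it may help to confirm that the installed e2fsprogs is at least the 1.41.12-19.el6 build mentioned above. The following is a minimal sketch, assuming GNU sort with -V is available; version_at_least is a hypothetical helper, and sort -V is only an approximation of RPM's own version comparison.

```shell
#!/bin/sh
# Succeed if version $1 is the same as or newer than version $2,
# using GNU sort's version ordering (an approximation of rpm's rules).
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Report whether the installed e2fsprogs reaches the given version-release.
check_e2fsprogs() {
    want=$1
    have=$(rpm -q --qf '%{VERSION}-%{RELEASE}' e2fsprogs) || return 2
    if version_at_least "$have" "$want"; then
        echo "e2fsprogs $have is at least $want"
    else
        echo "e2fsprogs $have is older than $want"
        return 1
    fi
}
```

On an affected system, check_e2fsprogs 1.41.12-19.el6 reports whether the package needs updating before the offline resize is attempted again.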
(In reply to Eric Sandeen from comment #2)
> I think this is a dup of bug #1036122.
> > 4. The on-line resize will fail with 'Operation not permitted adding group #XXX'
> > 5. Unmount the filesystem and perform a resize2fs to finish out the resize. The resize will prompt to run e2fsck -f first.
> the problem here is that offline resize has a bug. Online resize stopped
> due to the limitations, but the user kept trying with offline, which
> "succeeded" due to this bug, corrupting the filesystem in the process.
> Please retest with e2fsprogs-1.41.12-19.el6 I think this should be fixed for
> RHEL6.6 and RHEL6.5.z as well.
I'll try that out, thanks Eric.
e2fsprogs-1.41.12-20.el6 fixes this issue, so I'm marking this BZ as a dup. Thank you for your help, Eric.
*** This bug has been marked as a duplicate of bug 1036122 ***
Great, thanks for checking. Always nice to knock out bugs this way. ;)