Bug 1129799 - Consumption of reserved GDT blocks during an online resize results in corruption following the offline resize to an ext4 fs
Keywords:
Status: CLOSED DUPLICATE of bug 1036122
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: e2fsprogs
Version: 6.5
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Assignee: Eric Sandeen
QA Contact: Filesystem QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-08-13 17:09 UTC by Kyle Squizzato
Modified: 2018-12-09 18:21 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-08-13 20:16:37 UTC


Attachments
fsck log of corrupted ext4 filesystem following offline resize and mount/umount (837.63 KB, text/plain)
2014-08-13 17:16 UTC, Kyle Squizzato

Description Kyle Squizzato 2014-08-13 17:09:24 UTC
Description of problem:
When an online resize consumes all of the reserved GDT blocks on an ext4 filesystem, resize2fs fails as expected. However, if the resize is then completed offline by running resize2fs on the unmounted filesystem, the filesystem is corrupted after the next mount.

Version-Release number of selected component (if applicable):
e2fsprogs-1.41.12-18.el6.x86_64
kernel-2.6.32-431.17.1.el6

How reproducible:
Always

Steps to Reproduce:
1. Create a logical volume 100M in size and format it with ext4
2. Fill the filesystem with data
3. Extend the filesystem using LVM utilities by 110G, or to 1024 times its original size, so that the reserved GDT blocks are fully consumed.
4. The on-line resize will fail with 'Operation not permitted adding group #XXX'
5. Unmount the filesystem and run resize2fs to finish the resize.  resize2fs will prompt you to run e2fsck -f first.
6. Run e2fsck -f on the filesystem, then finish the resize2fs offline.
7. Mount the filesystem, then unmount it.
8. Run e2fsck -fy on the filesystem; it reports heavy corruption:

e2fsck 1.41.12 (17-May-2010)
e2fsck: Group descriptors look bad... trying backup blocks...
Pass 1: Checking inodes, blocks, and sizes
Group 1's inode table at 538 conflicts with some other fs block.
Relocate<y>? 
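
The steps above can be sketched as a shell function. This is only a rough outline, not the exact commands that were run: the VG, LV and mount point names are hypothetical, filling the filesystem with dd and extending it with lvextend followed by resize2fs are assumptions about how steps 2 and 3 were carried out, and the commands are destructive, so by default they are only printed.

```shell
#!/bin/bash
# Sketch of the reproduction steps. VG, LV and MNT are hypothetical names.
# The commands are destructive, so DRY_RUN=1 (the default) only prints them.
VG=${VG:-vg_test}
LV=${LV:-lv_resize}
MNT=${MNT:-/mnt/resize_test}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reproduce() {
    dev=/dev/$VG/$LV
    run lvcreate -L 100M -n "$LV" "$VG"        # step 1
    run mkfs.ext4 "$dev"
    run mkdir -p "$MNT"
    run mount "$dev" "$MNT"
    run dd if=/dev/zero of="$MNT/fill" bs=1M   # step 2: writes until the fs is full
    run lvextend -L +110G "$dev"               # step 3
    run resize2fs "$dev"                       # step 4: online resize, expected to fail
    run umount "$MNT"                          # step 5
    run e2fsck -f "$dev"                       # step 6
    run resize2fs "$dev"
    run mount "$dev" "$MNT"                    # step 7
    run umount "$MNT"
    run e2fsck -fy "$dev"                      # step 8
}
```

Calling reproduce prints the command sequence; setting DRY_RUN=0 on a scratch system with a spare volume group would actually run it.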

Actual results:
On-line resize fails with 'Operation not permitted adding group #XXX'.  The resize2fs then abruptly stops and the filesystem is inconsistent following a remount.

Expected results:
The on-line resize should consume all reserved GDT blocks and remove the resize_inode feature to prevent new on-line resizes. 
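
To see how much of the reserve is left and whether resize_inode is still set, the superblock summary from dumpe2fs -h can be inspected. A minimal sketch, assuming the usual "Reserved GDT blocks:" and "Filesystem features:" lines in that output; the function names and the device path are made up for illustration:

```shell
# Read `dumpe2fs -h` output on stdin and print the reserved GDT block count.
# Prints 0 if the "Reserved GDT blocks:" line is absent.
reserved_gdt_blocks() {
    awk -F: '/^Reserved GDT blocks:/ { gsub(/[ \t]/, "", $2); n = $2 }
             END { print n + 0 }'
}

# Read `dumpe2fs -h` output on stdin; succeed if resize_inode is listed
# among the filesystem features.
has_resize_inode() {
    grep '^Filesystem features:' | grep -qw resize_inode
}

# Usage (the device path is a placeholder):
#   dumpe2fs -h /dev/vg_test/lv_resize 2>/dev/null | reserved_gdt_blocks
#   dumpe2fs -h /dev/vg_test/lv_resize 2>/dev/null | has_resize_inode && echo "resize_inode set"
```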

Additional info:
 - This issue is not reproducible on RHEL7.  RHEL7 most likely handles it with the introduction of the meta_bg feature and the online resize patches discussed here: http://www.spinics.net/lists/linux-ext4/msg33898.html

  ext4: grow the s_flex_groups array as needed when resizing
  ext4: grow the s_group_info array as needed
  ext4: set bg_itable_unused when resizing
  ext4: convert file system to meta_bg if needed during resizing
  ext4: log a resize update to the console every 10 seconds
  ext4: advertise the fact that the kernel supports meta_bg resizing
  ext4: don't copy non-existent gdt blocks when resizing
  ext4: avoid duplicate writes of the backup bg descriptor blocks
  ext4: add online resizing support for meta_bg and 64-bit file systems

 - I understand that the disk format has a limitation, and I'm not trying to change that. However, the fact that consuming all GDT blocks during an online resize trashes the filesystem seems to be a pretty bad issue.
 - I wasn't able to reproduce this on my machines, but the customer hit the following ext4 messages, which seem related to https://bugzilla.redhat.com/show_bug.cgi?id=516580

Jun 30 16:16:53 hostname kernel: EXT4-fs error (device dm-7): ext4_mb_generate_buddy: EXT4-fs: group 1: 28688 blocks in bitmap, 32383 in gd
Aug  4 10:27:57 hostname kernel: EXT4-fs error (device dm-7): ext4_mb_generate_buddy: EXT4-fs: group 9087: 0 blocks in bitmap, 32768 in gd
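
For scanning a log for more of these, a small sed filter can pull out the group number and the two counts. This is only a sketch that matches the message format quoted above (other kernel versions may word it differently), and the function name is made up:

```shell
# Print "group bitmap_count gd_count" for each ext4_mb_generate_buddy
# mismatch message found on stdin (e.g. fed from /var/log/messages).
buddy_mismatches() {
    sed -n 's/.*ext4_mb_generate_buddy: EXT4-fs: group \([0-9]*\): \([0-9]*\) blocks in bitmap, \([0-9]*\) in gd.*/\1 \2 \3/p'
}

# Usage:
#   buddy_mismatches < /var/log/messages
```

Each output line is the block group followed by the free-block count from the bitmap and the count recorded in the group descriptor.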

Comment 1 Kyle Squizzato 2014-08-13 17:16:50 UTC
Created attachment 926539
fsck log of corrupted ext4 filesystem following offline resize and mount/umount

Comment 2 Eric Sandeen 2014-08-13 17:57:19 UTC
I think this is a dup of bug #1036122.

> 4. The on-line resize will fail with 'Operation not permitted adding group #XXX'
> 5. Unmount the filesystem and perform a resize2fs to finish out the resize.  The resize will prompt to run e2fsck -f first.

The problem here is that offline resize has a bug.  Online resize stopped due to the limitations, but the user then retried with an offline resize, which "succeeded" due to this bug, corrupting the filesystem in the process.

Please retest with e2fsprogs-1.41.12-19.el6.  I think this should be fixed for RHEL6.6 and RHEL6.5.z as well.
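
Before retesting, it may help to confirm which e2fsprogs build is installed. A rough sketch, assuming GNU sort with -V is available; the helper name is made up, and sort -V only approximates RPM's own version comparison:

```shell
# Succeed if version string $1 sorts at or after $2 under `sort -V`.
# This is an approximation of RPM version ordering, not the real algorithm.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage on an RPM-based system:
#   installed=$(rpm -q --qf '%{VERSION}-%{RELEASE}' e2fsprogs)
#   version_at_least "$installed" 1.41.12-19.el6 || echo "e2fsprogs is older than 1.41.12-19.el6"
```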

Comment 3 Eric Sandeen 2014-08-13 17:58:18 UTC
This commit is what should have fixed it.

commit 4b04fb30e01c7418331caa01ecf071bd55672f1a
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Sat Dec 29 00:53:16 2012 -0500

    resize2fs: reserve all metadata blocks for flex_bg file systems
    
    For flex_bg file systems, if we need to relocate an allocation bitmap
    or inode table, we need to make sure that all metadata blocks have
    been reserved, lest we end up overwriting a metadata block belonging
    to a different block group.
    
    This change fixes the following test case:
    
    rm -f foo.img; touch foo.img
    truncate -s 32G foo.img
    mke2fs -F -t ext4 -E resize=12582912 foo.img
    e2fsck -f foo.img
    truncate -s 64G foo.img
    ./resize2fs foo.img
    e2fsck -fy foo.img
    
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
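
The test case above ends with e2fsck -fy, so a script that reruns it can check the exit status to tell a clean filesystem from one that needed repairs. A minimal sketch; the function name and the labels it prints are made up, while the bit values are the ones documented in the e2fsck(8) man page:

```shell
# Decode an e2fsck exit status, which is a bitmask per e2fsck(8).
describe_fsck_status() {
    code=$1
    if [ "$code" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    out=""
    if [ $((code & 1)) -ne 0 ]; then out="$out errors-corrected"; fi
    if [ $((code & 2)) -ne 0 ]; then out="$out reboot-suggested"; fi
    if [ $((code & 4)) -ne 0 ]; then out="$out errors-uncorrected"; fi
    if [ $((code & 8)) -ne 0 ]; then out="$out operational-error"; fi
    if [ $((code & 16)) -ne 0 ]; then out="$out usage-error"; fi
    if [ $((code & 32)) -ne 0 ]; then out="$out cancelled"; fi
    if [ $((code & 128)) -ne 0 ]; then out="$out shared-library-error"; fi
    echo "${out# }"
}

# Usage after the final step of the test case:
#   e2fsck -fy foo.img; describe_fsck_status $?
```

With a fixed resize2fs, the final check would be expected to report "no errors"; anything else suggests the resize damaged the image.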

Comment 4 Kyle Squizzato 2014-08-13 19:05:08 UTC
(In reply to Eric Sandeen from comment #2)
> I think this is a dup of bug #1036122.
> 
> > 4. The on-line resize will fail with 'Operation not permitted adding group #XXX'
> > 5. Unmount the filesystem and perform a resize2fs to finish out the resize.  The resize will prompt to run e2fsck -f first.
> 
> the problem here is that offline resize has a bug.  Online resize stopped
> due to the limitations, but the user kept trying with offline, which
> "succeeded" due to this bug, corrupting the filesystem in the process.
> 
> Please retest with e2fsprogs-1.41.12-19.el6 I think this should be fixed for
> RHEL6.6 and RHEL6.5.z as well.

I'll try that out, thanks Eric.

Comment 5 Kyle Squizzato 2014-08-13 20:16:37 UTC
e2fsprogs-1.41.12-20.el6 fixes this issue, so I'm marking this BZ as a dup.  Thank you for your help, Eric.

*** This bug has been marked as a duplicate of bug 1036122 ***

Comment 6 Eric Sandeen 2014-08-14 20:58:41 UTC
Great, thanks for checking.  Always nice to knock out bugs this way.  ;)

