Bug 857201 - Filling up an ext4 disk w/o journal triggers kernel BUG at fs/ext4/mballoc.c:3837!
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Red Hat Kernel Manager
QA Contact: Red Hat Kernel QE team
Depends On:
Blocks: 810353
Reported: 2012-09-13 15:01 EDT by Bill Kuzeja
Modified: 2017-09-13 03:35 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2012-12-14 17:45:19 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:


External Trackers
Tracker                            ID       Priority  Status  Summary  Last Updated
Red Hat Knowledge Base (Solution)  3181161  None      None    None     2017-09-13 03:35 EDT

Description Bill Kuzeja 2012-09-13 15:01:15 EDT
Description of problem:

The system crashed after a disk-full condition triggered a BUG_ON on one of our test modules:

crash> bt 25260
PID: 25260  TASK: ffff88105a87d500  CPU: 5   COMMAND: "disk_test_64"
 #0 [ffff88104a1852e0] die at ffffffff8100f22b
 #1 [ffff88104a185310] do_trap at ffffffff815015c4
 #2 [ffff88104a185370] do_invalid_op at ffffffff8100cdf5
 #3 [ffff88104a185410] invalid_op at ffffffff8100be9b
    [exception RIP: ext4_mb_free_blocks+0x80d]
    RIP: ffffffffa014295d  RSP: ffff88104a1854c8  RFLAGS: 00010283
    RAX: ffff88105344b280  RBX: 0000000000000001  RCX: ffff88084f0132c0
    RDX: ffff88105344af18  RSI: 0000000000000000  RDI: ffff88105a6dd5a0
    RBP: ffff88104a1855b8   R8: 0000000000000000   R9: ffff881039b6f400
    R10: 0000000000000000  R11: 0000000000000000  R12: ffff88105a6dd5a0
    R13: ffff88085b487980  R14: ffff88086854c390  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #4 [ffff88104a1854c0] ext4_mb_free_blocks at ffffffffa0142688 [ext4]
 #5 [ffff88104a1855c0] ext4_free_blocks at ffffffffa010e50d [ext4]
 #6 [ffff88104a1855f0] ext4_alloc_branch at ffffffffa0114c5f [ext4]
 #7 [ffff88104a1856e0] ext4_ind_get_blocks at ffffffffa011680d [ext4]
 #8 [ffff88104a1857e0] ext4_get_blocks at ffffffffa0116e20 [ext4]
 #9 [ffff88104a185860] ext4_get_block at ffffffffa011779d [ext4]
#10 [ffff88104a1858b0] __block_prepare_write at ffffffff811b145b
#11 [ffff88104a1859a0] block_write_begin_newtrunc at ffffffff811b1acc
#12 [ffff88104a1859f0] block_write_begin at ffffffff811b1ed3
#13 [ffff88104a185a40] ext4_write_begin at ffffffffa011b886 [ext4]
#14 [ffff88104a185af0] ext4_da_write_begin at ffffffffa011bab8 [ext4]
#15 [ffff88104a185b90] generic_file_buffered_write at ffffffff81113823
#16 [ffff88104a185c60] __generic_file_aio_write at ffffffff811151c0
#17 [ffff88104a185d20] generic_file_aio_write at ffffffff8111545f
#18 [ffff88104a185d70] ext4_file_write at ffffffffa0110131 [ext4]
#19 [ffff88104a185dc0] do_sync_write at ffffffff8117c3ca
#20 [ffff88104a185ef0] vfs_write at ffffffff8117c6c8
#21 [ffff88104a185f30] sys_write at ffffffff8117cf31
#22 [ffff88104a185f80] system_call_fastpath at ffffffff8100b0b2
    RIP: 000000340c0da3c0  RSP: 00007fff5386d730  RFLAGS: 00000293
    RAX: 0000000000000001  RBX: ffffffff8100b0b2  RCX: 0000000000000004
    RDX: 0000000000010000  RSI: 0000000002195a60  RDI: 0000000000000005
    RBP: 00007fff5386d810   R8: ffffffffffffffff   R9: 000022abc6f621aa
    R10: 33343a3731203131  R11: 0000000000000246  R12: 00007fff5386dbc0
    R13: 0000000000401340  R14: 0000000000000000  R15: 0000000000401340
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b

The check in question which crashes the machine is this:

static void ext4_mb_return_to_preallocation(struct inode *inode,
                                        struct ext4_buddy *e4b,
                                        sector_t block, int count)
{
        BUG_ON(!list_empty(&EXT4_I(inode)->i_prealloc_list));
}

As a sanity check, here is the disassembly around the call site:

0xffffffffa0142688 <ext4_mb_free_blocks+0x538>: mov    -0xe0(%rbp),%rdx
0xffffffffa014268f <ext4_mb_free_blocks+0x53f>: mov    -0xe8(%rbp),%rax
0xffffffffa0142696 <ext4_mb_free_blocks+0x546>: cmp    0x368(%rdx),%rax  <== i_prealloc_list.next vs. address of the list head
0xffffffffa014269d <ext4_mb_free_blocks+0x54d>: jne    0xffffffffa014295d <ext4_mb_free_blocks+0x80d>

The structure looks like this - sure enough the offset of i_prealloc_list is 0x368 off rdx:

crash> ext4_inode_info -o
struct ext4_inode_info {
    [0x0] __le32 i_data[15];
   [0x3c] __u32 i_dtime;
   [0x40] ext4_fsblk_t i_file_acl;
   [0x48] ext4_group_t i_block_group;
   [0x50] long unsigned int i_state_flags;
   [0x58] long unsigned int i_flags;
   [0x60] ext4_lblk_t i_dir_start_lookup;
   [0x68] struct rw_semaphore xattr_sem;
   [0x88] struct list_head i_orphan;
   [0x98] loff_t i_disksize;
   [0xa0] struct rw_semaphore i_data_sem;
   [0xc0] struct inode vfs_inode;
  [0x310] struct jbd2_inode jinode;
  [0x340] struct ext4_ext_cache i_cached_extent;
  [0x358] struct timespec i_crtime;
  [0x368] struct list_head i_prealloc_list;   <===== Here's the list address
  [0x378] spinlock_t i_prealloc_lock;

Looking at this structure using the contents of RDX yields:

crash> ext4_inode_info ffff88105344af18
struct ext4_inode_info {
  i_data = {0x1301, 0x1302, 0x1303, 0x1304, 0x1305, 0x1306, 0x1307, 0x1308, 0x1309, 0x130a, 0x130b, 0x130c, 0x584, 0x243, 0x0},
  i_dtime = 0x0,
  i_file_acl = 0xc0fa,
  i_block_group = 0x6,
  i_state_flags = 0x0,


  i_prealloc_list = {
    next = 0xffff8810535d6c60,  <== next != list head (RDX + 0x368 = 0xffff88105344b280), so the list is not empty
    prev = 0xffff8810535d6c60
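
The arithmetic behind this check can be sketched in userspace C. This is a minimal model, not kernel code: the struct and function names (list_head_addr, list_is_empty, prealloc_list_addr, bug_on_would_fire) are hypothetical, and the only inputs taken from the dump are the RDX value, the 0x368 offset, and the next pointer. It assumes the check has the usual list_empty() semantics (head->next == head), which is consistent with the cmp/jne pair in the disassembly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for a kernel list head, using raw addresses so
 * that values read from the crash dump can be plugged in directly. */
struct list_head_addr {
    uint64_t self;  /* address of the list head itself */
    uint64_t next;  /* value stored in head->next */
};

/* Same rule as the kernel's list_empty(): empty means next == head. */
static bool list_is_empty(const struct list_head_addr *h)
{
    return h->next == h->self;
}

/* Address of i_prealloc_list, given the ext4_inode_info base address.
 * 0x368 is the offset reported by `ext4_inode_info -o` in this dump. */
static uint64_t prealloc_list_addr(uint64_t inode_info_base)
{
    return inode_info_base + 0x368;
}

/* Assumption: the check trips when the preallocation list is NOT empty,
 * i.e. it behaves like BUG_ON(!list_empty(...)). */
static bool bug_on_would_fire(uint64_t inode_info_base, uint64_t next)
{
    struct list_head_addr h = { prealloc_list_addr(inode_info_base), next };
    return !list_is_empty(&h);
}
```

With the values from this dump, prealloc_list_addr(0xffff88105344af18) evaluates to 0xffff88105344b280, which matches RAX in the exception frame, while next is 0xffff8810535d6c60. Under the assumptions above the two differ, so the list holds an entry and the check fires.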

Looking upstream, this routine appears to have been removed starting with the 2.6.38 kernel:

In ext4_free_blocks:

                ext4_lock_group(sb, block_group);
                mb_clear_bits(bitmap_bh->b_data, bit, count);
                mb_free_blocks(inode, &e4b, bit, count);
                /* <== the call to ext4_mb_return_to_preallocation() was here */
        }

Version-Release number of selected component (if applicable):

How reproducible:
This has happened a couple of times in test. The test engineer seems to think this is very reproducible.

Steps to Reproduce:
1. Fill up an ext4 disk
2. Enjoy your crash
Actual results:

Expected results:
Complaints but no crash

Additional info:
Comment 2 Eric Sandeen 2012-09-13 16:22:33 EDT
commit a5196f8cdfbf6ccb20f093aaf48852d6d23b4e0b
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Mon Jan 10 12:47:07 2011 -0500

    ext4: remove ext4_mb_return_to_preallocation()

    This function was never implemented, except for a BUG_ON which was
    tripping when ext4 is run without a journal.  The problem is that
    although the comment asserts that "truncate (which is the only way to
    free block) discards all preallocations", ext4_free_blocks() is also
    called in various error recovery paths when blocks have been
    allocated, but for various reasons, we were not able to use those data
    blocks (for example, because we ran out of memory while trying to
    manipulate the extent tree, or some other similar situation).

    In addition to the fact that this function isn't implemented except
    for the incorrect BUG_ON, the single caller of this function,
    ext4_free_blocks(), doesn't use it all if the journal is enabled.

    So remove the (stub) function entirely for now.  If we decide it's
    better to add it back, it's only going to be useful with a relatively
    large number of code changes anyway.

    Google-Bug-Id: 3236408
    Cc: Jiaying Zhang <jiayingz@google.com>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

Do you happen to be running with the journal turned off?

# dumpe2fs -h /dev/sdXX | grep -i journal
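
If dumpe2fs is not at hand, the same question can be answered by looking at the superblock directly. The following is a rough sketch, not how dumpe2fs itself is implemented: the function name ext_has_journal and the macro names are hypothetical, and the offsets (superblock at byte 1024, s_magic at 0x38, s_feature_compat at 0x5C, has_journal bit 0x0004) are taken from the documented ext2/3/4 on-disk layout. It works on an in-memory copy of the first 2 KiB of the device, so reading the device is left to the caller.

```c
#include <stddef.h>
#include <stdint.h>

#define SB_OFFSET             1024    /* primary superblock starts 1 KiB in */
#define SB_MAGIC_OFF          0x38    /* s_magic, 16-bit little-endian */
#define SB_FEATURE_COMPAT_OFF 0x5C    /* s_feature_compat, 32-bit little-endian */
#define EXT_SUPER_MAGIC       0xEF53
#define COMPAT_HAS_JOURNAL    0x0004

/* Decode little-endian fields byte by byte so host endianness is irrelevant. */
static uint16_t le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * img/len: the first bytes of the block device or image.
 * Returns 1 if the has_journal feature is set, 0 if it is not,
 * and -1 if the buffer is too short or the ext magic is missing.
 */
static int ext_has_journal(const unsigned char *img, size_t len)
{
    const unsigned char *sb;

    if (len < SB_OFFSET + SB_FEATURE_COMPAT_OFF + 4)
        return -1;
    sb = img + SB_OFFSET;
    if (le16(sb + SB_MAGIC_OFF) != EXT_SUPER_MAGIC)
        return -1;
    return (le32(sb + SB_FEATURE_COMPAT_OFF) & COMPAT_HAS_JOURNAL) ? 1 : 0;
}
```

A return value of 0 on the affected volume would correspond to dumpe2fs listing no has_journal feature, which is the configuration where the BUG_ON was reachable.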
Comment 3 Eric Sandeen 2012-09-13 16:23:57 EDT
In any case, looks like we should simply remove it as well.
Comment 4 Bill Kuzeja 2012-09-14 10:12:03 EDT
Thanks Eric.

As it turns out, journaling was turned off on this volume. Probably still shouldn't crash the system though.
Comment 5 Eric Sandeen 2012-09-14 10:15:47 EDT
Agreed, we should probably go ahead & backport the patch in any case.


Running w/o journal isn't supported by RHEL, and we do not test it, so you may well run into other bugs like this... 

Comment 6 Ric Wheeler 2012-09-14 10:22:23 EDT
Bill, is running without a journal something Stratus supports or advises your users to do on RHEL?

We definitely don't want to encourage that at all - I think even the Google people have started to rethink it :)
Comment 7 Bill Kuzeja 2012-09-14 10:33:09 EDT
Turns out (and I just found this out) this was actually a mistake made when creating the file system. The tester ran mkfs instead of mkfs.ext4, then mounted the filesystem as ext4 (which he was allowed to do). So, obviously, the filesystem had no journal.

When he went to fill up the disk, he hit the crash. So, this was stumbled upon quite by accident. It DEFINITELY is not our policy to do this - this was user error.
Comment 8 RHEL Product and Program Management 2012-12-14 02:36:08 EST
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.
Comment 9 Ric Wheeler 2012-12-14 17:45:19 EST
I think that this will get fixed eventually in RHEL as we rebase ext4, but it is not a supported case so I am going to close this RHEL6 BZ.
