Red Hat Bugzilla – Bug 588599
Kernel BUG at fs/ext3/super.c:425
Last modified: 2011-03-03 03:29:24 EST
Description of problem:
__journal_remove_journal_head: freeing b_frozen_data
EXT3-fs error (device loop0): ext3_put_super: Couldn't clean up the journal
sb orphan head is 49156
sb_info orphan list:
inode loop0:49156 at ffff880017156a78: mode 100644, nlink 1, next 49153
inode loop0:49153 at ffff88001f59f488: mode 100644, nlink 1, next 0
Assertion failure in ext3_put_super() at fs/ext3/super.c:426:
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at fs/ext3/super.c:425
Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux 5.5
Steps to Reproduce:
1. Create a large sparse file on an ext3 volume. The file size should be larger than the free space left on the volume.
2. Set up this file as a loopback device.
3. Try to do direct I/O.
4. If step 3 ends with ENOSPC, umount the volume and the kernel bug shows up.
Actually, I have investigated the bug: it is caused by the function ext3_direct_IO. For a write, we first add the inode to the orphan list, and then, after blockdev_direct_IO, we remove it. But there is a corner case: what if we succeed in adding it but fail to remove it? That would leave the inode in ext3_sb->s_orphans, so when we umount, it triggers the kernel BUG.
Created attachment 411173 [details]
a temp fix.
Here is my temporary fix; I used what ext4_ind_direct_IO does as a reference.
But we could also call ext3_truncate there and then change ext3_truncate as in
mainline commit ef43618a47179b41e7203a624f2c7445e7da488c. That way we would need to change two places, with a little more code to change.
I am not familiar enough with ext3, so I chose the simplest way to work around it.
Your patch seems to be correct, but it's not upstream. Please post it upstream and then we can pull it into RHEL. Thanks,
Duh, of course I then go look at upstream and see what's done there and we call ext3_truncate there. I'll backport the upstream patches.
Created attachment 411368 [details]
I tried to reproduce your problem, but I couldn't. This is what I did
mkfs.ext3 -b 4096 /dev/sdb1 (500m partition)
mount /dev/sdb1 /mnt/test
dd if=/dev/zero of=/mnt/test/file bs=1M seek=1000 count=1
losetup /dev/loop0 /mnt/test/file
dd if=/dev/zero of=/dev/loop0 bs=4k oflag=direct count=262144
Here is a backport of a couple of upstream bz's that should fix the problem. It's a little odd that the thing is failing in the journal_start, but if that's what's happening then this fix is definitely needed. Can you verify this patch fixes the problem? Thanks,
(In reply to comment #4)
> Duh, of course I then go look at upstream and see what's done there and we call
> ext3_truncate there. I'll backport the upstream patches.
So you have chosen the second approach I described above: use ext3_truncate and backport commit ef43618a47179b41e7203a624f2c7445e7da488c. ;)
I have no objection to it.
(In reply to comment #5)
> Created an attachment (id=411368) [details]
> potential fix
> I tried to reproduce your problem, but I couldn't. This is what I did
Yes, it is not very easy to trigger. Sorry for not pasting the test script at the very beginning.
> mkfs.ext3 -b 4096 /dev/sdb1 (500m partition)
> mount /dev/sdb1 /mnt/test
> dd if=/dev/zero of=/mnt/test/file bs=1M seek=1000 count=1
> losetup /dev/loop0 /mnt/test/file
> dd if=/dev/zero of=/dev/loop0 bs=4k oflag=direct count=262144
Here the volume needs to be mounted and then written to:
mount /dev/loop0 /mnt/test1
dd if=/dev/zero of=testfile1 bs=1024k &
dd if=/dev/zero of=testfile2 bs=1024k oflag=direct &
dd if=/dev/zero of=testfile3 bs=24k &
dd if=/dev/zero of=testfile4 bs=1024k oflag=direct seek=10000 &
Sometimes you will trigger it.
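For convenience, the setup from comment #5 and the concurrent writers above can be collected into one script. This is only a sketch, not the reporter's actual test script. The variable names (DEV, MNT, LOOPDEV, LOOPMNT, RUN) and the helper functions are made up for illustration, and the mkfs.ext3 run on the loop device is an assumption: the thread never shows that step, but the log complains about an ext3 filesystem on loop0. By default the script only prints the commands.

```shell
#!/bin/sh
# Dry run by default: each command is printed, not executed.
# RUN=1 executes them; that needs root and DESTROYS all data on $DEV.
DEV=${DEV:-/dev/sdb1}            # 500 MB scratch partition, as in comment #5
MNT=${MNT:-/mnt/test}            # mount point of the backing ext3 volume
LOOPDEV=${LOOPDEV:-/dev/loop0}
LOOPMNT=${LOOPMNT:-/mnt/test1}   # mount point of the filesystem on the loop device

# Hypothetical helper: print the command, or execute it when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

repro_steps() {
    # Backing volume plus a sparse file larger than its free space.
    run mkfs.ext3 -b 4096 "$DEV"
    run mount "$DEV" "$MNT"
    run dd if=/dev/zero of="$MNT/file" bs=1M seek=1000 count=1
    run losetup "$LOOPDEV" "$MNT/file"

    # ASSUMPTION: the thread does not show how the loop device was formatted.
    run mkfs.ext3 "$LOOPDEV"
    run mount "$LOOPDEV" "$LOOPMNT"

    # Concurrent buffered and direct writers, as posted by the reporter.
    run dd if=/dev/zero of="$LOOPMNT/testfile1" bs=1024k &
    run dd if=/dev/zero of="$LOOPMNT/testfile2" bs=1024k oflag=direct &
    run dd if=/dev/zero of="$LOOPMNT/testfile3" bs=24k &
    run dd if=/dev/zero of="$LOOPMNT/testfile4" bs=1024k oflag=direct seek=10000 &
    wait

    # The BUG, when it triggers, fires while unmounting the loop filesystem.
    run umount "$LOOPMNT"
}

repro_steps
```

Since the failure is intermittent, several runs may be needed. Only set RUN=1 on a disposable test machine.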
> Here is a backport of a couple of upstream bz's that should fix the problem.
> It's a little odd that the thing is failing in the journal_start, but if that's
> what's happening then this fix is definitely needed. Can you verify this patch
> fixes the problem? Thanks,
OK, I will test it.
Yeah, we have tested the patch. It works. Thanks.
Great, thanks for doing all of the actual work :).
BTW this is a backport of
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release. Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update
release.
You can download this test kernel from http://people.redhat.com/jwilson/el5
Detailed testing feedback is always welcomed.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.