Description of problem:

__journal_remove_journal_head: freeing b_frozen_data
ext3_abort called.
EXT3-fs error (device loop0): ext3_put_super: Couldn't clean up the journal
sb orphan head is 49156
sb_info orphan list:
  inode loop0:49156 at ffff880017156a78: mode 100644, nlink 1, next 49153
  inode loop0:49153 at ffff88001f59f488: mode 100644, nlink 1, next 0
Assertion failure in ext3_put_super() at fs/ext3/super.c:426: "list_empty(&sbi->s_orphan)"
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at fs/ext3/super.c:425

Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux 5.5

How reproducible:
It depends.

Steps to Reproduce:
1. Create a large sparse file on an ext3 volume. The file should be larger than the free space left on the volume.
2. Set up this file as a loopback device.
3. Try to do direct I/O.
4. If step 3 ends with ENOSPC, umount the volume and the kernel BUG shows up.

Actual results:

Expected results:

Additional info:
Actually, I have investigated the bug. It is caused by the function ext3_direct_IO. For a write, we first add the inode to the orphan list, and then remove it after blockdev_direct_IO returns. But there is a corner case: what if adding succeeds but removing fails? That leaves the inode on the in-memory orphan list (sbi->s_orphan), so the umount triggers the kernel BUG.
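As an illustration only, the sequence described above can be modelled in userspace C. This is a sketch, not kernel code: every `model_*` name is hypothetical, the orphan list is reduced to a counter, and the `cleanup_on_error` flag stands for one possible way to handle the error path (dropping the inode from the in-memory list when the second journal handle cannot be started). It does not claim to reproduce either attached patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only. Every name below is a hypothetical stand-in,
 * not a kernel structure or function. */
struct model_sb {
    int  orphan_count;   /* stands in for the length of sbi->s_orphan */
    bool handle_fails;   /* when true, starting a new journal handle fails */
};

struct model_inode {
    bool on_orphan_list;
};

static void model_orphan_add(struct model_sb *sb, struct model_inode *inode)
{
    if (!inode->on_orphan_list) {
        inode->on_orphan_list = true;
        sb->orphan_count++;
    }
}

static void model_orphan_del(struct model_sb *sb, struct model_inode *inode)
{
    if (inode->on_orphan_list) {
        assert(sb->orphan_count > 0);
        inode->on_orphan_list = false;
        sb->orphan_count--;
    }
}

/* Direct write as described in the comment: add the inode to the orphan
 * list, do the I/O, then start a second handle to remove it again. */
static int model_direct_write(struct model_sb *sb, struct model_inode *inode,
                              bool cleanup_on_error)
{
    model_orphan_add(sb, inode);

    /* ... the direct I/O itself would happen here ... */

    if (sb->handle_fails) {
        /* The second handle could not be started. Without cleanup the
         * inode stays on the in-memory orphan list. */
        if (cleanup_on_error)
            model_orphan_del(sb, inode);
        return -1;
    }

    model_orphan_del(sb, inode);
    return 0;
}

/* Mirrors the condition asserted at unmount: the orphan list is empty. */
static bool model_umount_ok(const struct model_sb *sb)
{
    return sb->orphan_count == 0;
}
```

With `handle_fails` set and `cleanup_on_error` false, `model_direct_write` returns an error and `model_umount_ok` reports false, which corresponds to the `list_empty(&sbi->s_orphan)` assertion firing in ext3_put_super. With `cleanup_on_error` true the write still fails, but the list is empty again at unmount.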
Created attachment 411173 [details] a temp fix.

Here is my temporary fix; I took what ext4_ind_direct_IO does as a reference. We could also call ext3_truncate there and then change ext3_truncate as in mainline commit ef43618a47179b41e7203a624f2c7445e7da488c. That approach touches two places and needs a little more code. I am not familiar enough with ext3, so I chose the simplest way to work around it.
Your patch seems to be correct, but it's not upstream. Please post it upstream and then we can pull it into RHEL.

Thanks,
Josef
Duh, of course I then go look at upstream and see what's done there and we call ext3_truncate there. I'll backport the upstream patches.
Created attachment 411368 [details] potential fix

I tried to reproduce your problem, but I couldn't. This is what I did:

    mkfs.ext3 -b 4096 /dev/sdb1 (500m partition)
    mount /dev/sdb1 /mnt/test
    dd if=/dev/zero of=/mnt/test/file bs=1M seek=1000 count=1
    losetup /dev/loop0 /mnt/test/file
    dd if=/dev/zero of=/dev/loop0 bs=4k oflag=direct count=262144

Here is a backport of a couple of upstream bz's that should fix the problem. It's a little odd that the thing is failing in the journal_start, but if that's what's happening then this fix is definitely needed. Can you verify this patch fixes the problem?

Thanks,
Josef
(In reply to comment #4)
> Duh, of course I then go look at upstream and see what's done there and we call
> ext3_truncate there. I'll backport the upstream patches.

So you have chosen the second way I described above: use ext3_truncate and backport commit ef43618a47179b41e7203a624f2c7445e7da488c. ;) I have no objection to it.
(In reply to comment #5)
> Created an attachment (id=411368) [details]
> potential fix
> I tried to reproduce your problem, but I couldn't. This is what I did

Yes, it is not very easy to trigger. Sorry for not pasting the test script at the very beginning.

> mkfs.ext3 -b 4096 /dev/sdb1 (500m partition)
> mount /dev/sdb1 /mnt/test
> dd if=/dev/zero of=/mnt/test/file bs=1M seek=1000 count=1
> losetup /dev/loop0 /mnt/test/file
> dd if=/dev/zero of=/dev/loop0 bs=4k oflag=direct count=262144

Here the volume needs to be mounted and then written to:

    mkfs.ext3 /dev/loop0
    mount /dev/loop0 /mnt/test1
    cd /mnt/test1
    dd if=/dev/zero of=testfile1 bs=1024k &
    dd if=/dev/zero of=testfile2 bs=1024k oflag=direct &
    dd if=/dev/zero of=testfile3 bs=24k &
    dd if=/dev/zero of=testfile4 bs=1024k oflag=direct seek=10000 &

Sometimes you will trigger it.

> Here is a backport of a couple of upstream bz's that should fix the problem.
> It's a little odd that the thing is failing in the journal_start, but if that's
> what's happening then this fix is definitely needed. Can you verify this patch
> fixes the problem? Thanks,

OK, I will test it.

> Josef
Yeah, we have tested the patch. It works. Thanks.
Great, thanks for doing all of the actual work :). Josef
BTW this is a backport of

ef43618a47179b41e7203a624f2c7445e7da488c
7eb4969e04060dcf3fbd46af9c21b1059b853068
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
in kernel-2.6.18-211.el5

You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html