Bug 588599 - Kernel BUG at fs/ext3/super.c:425
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.5
Hardware: All Linux
Priority: low, Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Josef Bacik
QA Contact: Igor Zhang
Reported: 2010-05-03 23:10 EDT by Tao Ma
Modified: 2011-03-03 03:29 EST (History)

Doc Type: Bug Fix
Last Closed: 2011-01-13 16:30:44 EST


Attachments
a temp fix. (478 bytes, patch)
2010-05-03 23:30 EDT, Tao Ma
potential fix (2.18 KB, patch)
2010-05-04 14:47 EDT, Josef Bacik


External Trackers
Tracker: Red Hat Product Errata
ID: RHSA-2011:0017
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 5.6 kernel security and bug fix update
Last Updated: 2011-01-13 05:37:42 EST

Description Tao Ma 2010-05-03 23:10:09 EDT
Description of problem:
__journal_remove_journal_head: freeing b_frozen_data 
ext3_abort called. 
EXT3-fs error (device loop0): ext3_put_super: Couldn't clean up the journal 
sb orphan head is 49156 
sb_info orphan list: 
  inode loop0:49156 at ffff880017156a78: mode 100644, nlink 1, next 49153 
  inode loop0:49153 at ffff88001f59f488: mode 100644, nlink 1, next 0 
Assertion failure in ext3_put_super() at fs/ext3/super.c:426: 
"list_empty(&sbi->s_orphan)" 
----------- [cut here ] --------- [please bite here ] --------- 
Kernel BUG at fs/ext3/super.c:425 


Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux 5.5

How reproducible:
Intermittent. It does not trigger on every run.

Steps to Reproduce:
1. Create a large sparse file on an ext3 volume. The file should be larger than the free space left on the volume.
2. Set up this file as a loopback device.
3. Try to do direct I/O.
4. If step 3 ends with ENOSPC, unmount the volume and the kernel BUG shows up.
  
Comment 1 Tao Ma 2010-05-03 23:18:45 EDT
Actually, I have investigated the bug; it is caused by the function ext3_direct_IO. For a write, we first add the inode to the orphan list, and then remove it after blockdev_direct_IO. But there is a corner case: what if adding succeeds while removing fails? That would leave the inode on the superblock's in-memory orphan list (sbi->s_orphan in the assertion above), so when we umount, it triggers the kernel BUG.
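
The corner case described here can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the ext3 code: the toy_* types and functions are hypothetical stand-ins, the removal_fails flag simply simulates the failing removal path, and cleanup_on_error is an illustrative switch that is not taken from either attached patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins, NOT real ext3 types: a singly linked in-memory orphan list. */
struct toy_inode {
    unsigned long ino;
    struct toy_inode *next_orphan;
    bool on_orphan_list;
};

struct toy_sb {
    struct toy_inode *orphans; /* plays the role of sbi->s_orphan */
};

/* Put the inode at the head of the orphan list unless it is already there. */
static void toy_orphan_add(struct toy_sb *sb, struct toy_inode *inode)
{
    assert(sb != NULL && inode != NULL);
    if (inode->on_orphan_list)
        return;
    inode->next_orphan = sb->orphans;
    sb->orphans = inode;
    inode->on_orphan_list = true;
}

/* Unlink the inode from the orphan list if it is present. */
static void toy_orphan_del(struct toy_sb *sb, struct toy_inode *inode)
{
    struct toy_inode **pp;

    assert(sb != NULL && inode != NULL);
    for (pp = &sb->orphans; *pp != NULL; pp = &(*pp)->next_orphan) {
        if (*pp == inode) {
            *pp = inode->next_orphan;
            inode->next_orphan = NULL;
            inode->on_orphan_list = false;
            return;
        }
    }
}

/*
 * Models a direct write: add to the orphan list, do the I/O, then remove.
 * removal_fails simulates the error path from the report (add succeeded,
 * removal did not). cleanup_on_error is a hypothetical switch: when set,
 * the inode is dropped from the in-memory list even on the error path.
 * Returns 0 on success, -1 on failure.
 */
static int toy_direct_write(struct toy_sb *sb, struct toy_inode *inode,
                            bool removal_fails, bool cleanup_on_error)
{
    toy_orphan_add(sb, inode);
    /* ... the actual I/O would happen here ... */
    if (removal_fails) {
        if (cleanup_on_error)
            toy_orphan_del(sb, inode);
        return -1;
    }
    toy_orphan_del(sb, inode);
    return 0;
}

/* Mirrors the unmount-time check list_empty(&sbi->s_orphan). */
static bool toy_umount_ok(const struct toy_sb *sb)
{
    return sb->orphans == NULL;
}
```

In this model, a call to toy_direct_write with removal_fails set and cleanup_on_error clear leaves the inode on the list, so toy_umount_ok returns false, which corresponds to the failed assertion at unmount. With cleanup_on_error set, the list is empty again even though the write itself still reports failure.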
Comment 2 Tao Ma 2010-05-03 23:30:56 EDT
Created attachment 411173
a temp fix.

Here is my temp fix; I took what ext4_ind_direct_IO does as a reference.
Alternatively, we could call ext3_truncate there and then change ext3_truncate as in
mainline commit ef43618a47179b41e7203a624f2c7445e7da488c. That approach requires changes in two places and a little more code.
I am not familiar enough with ext3, so I chose the simplest way to work around it.
Comment 3 Josef Bacik 2010-05-04 10:45:11 EDT
Your patch seems to be correct, but it's not upstream.  Please post it upstream and then we can pull it into RHEL.  Thanks,

Josef
Comment 4 Josef Bacik 2010-05-04 10:47:07 EDT
Duh, of course I then went and looked at what's done upstream, and we call ext3_truncate there.  I'll backport the upstream patches.
Comment 5 Josef Bacik 2010-05-04 14:47:26 EDT
Created attachment 411368
potential fix

I tried to reproduce your problem, but I couldn't.  This is what I did:

mkfs.ext3 -b 4096 /dev/sdb1 (500m partition)
mount /dev/sdb1 /mnt/test
dd if=/dev/zero of=/mnt/test/file bs=1M seek=1000 count=1
losetup /dev/loop0 /mnt/test/file
dd if=/dev/zero of=/dev/loop0 bs=4k oflag=direct count=262144

Here is a backport of a couple of upstream patches that should fix the problem.  It's a little odd that the thing is failing in journal_start, but if that's what's happening then this fix is definitely needed.  Can you verify this patch fixes the problem?  Thanks,

Josef
Comment 6 Tao Ma 2010-05-04 19:56:00 EDT
(In reply to comment #4)
> Duh, of course I then go look at upstream and see what's done there and we call
> ext3_truncate there.  I'll backport the upstream patches.    

So you have chosen the second way I mentioned above: use ext3_truncate and backport commit ef43618a47179b41e7203a624f2c7445e7da488c. ;)
I have no objection to it.
Comment 7 Tao Ma 2010-05-04 20:02:08 EDT
(In reply to comment #5)
> Created an attachment (id=411368) [details]
> potential fix
> I tried to reproduce your problem, but I couldn't.  This is what I did
Yes, it is not very easy to trigger. Sorry for not pasting the test script at the very beginning.
> mkfs.ext3 -b 4096 /dev/sdb1 (500m partition)
> mount /dev/sdb1 /mnt/test
> dd if=/dev/zero of=/mnt/test/file bs=1M seek=1000 count=1
> losetup /dev/loop0 /mnt/test/file
> dd if=/dev/zero of=/dev/loop0 bs=4k oflag=direct count=262144
Here the volume needs to be mounted and then written to:
mkfs.ext3 /dev/loop0
mount /dev/loop0 /mnt/test1
cd /mnt/test1
dd if=/dev/zero of=testfile1 bs=1024k & 
dd if=/dev/zero of=testfile2 bs=1024k oflag=direct & 
dd if=/dev/zero of=testfile3 bs=24k  & 
dd if=/dev/zero of=testfile4 bs=1024k oflag=direct seek=10000 & 

Sometimes you will trigger it.
> Here is a backport of a couple of upstream bz's that should fix the problem. 
> It's a little odd that the thing is failing in the journal_start, but if thats
> what's happening then this fix is definitely needed.  Can you verify this patch
> fixes the problem?  Thanks,
OK, I will test it.
> Josef
Comment 8 Tao Ma 2010-05-05 10:25:16 EDT
Yes, we have tested the patch. It works. Thanks.
Comment 9 Josef Bacik 2010-05-05 11:24:00 EDT
Great, thanks for doing all of the actual work :).

Josef
Comment 10 Josef Bacik 2010-05-05 11:33:51 EDT
BTW, this is a backport of:

ef43618a47179b41e7203a624f2c7445e7da488c
7eb4969e04060dcf3fbd46af9c21b1059b853068
Comment 11 RHEL Product and Program Management 2010-08-06 01:50:10 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 13 Jarod Wilson 2010-08-10 20:12:26 EDT
in kernel-2.6.18-211.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.
Comment 17 errata-xmlrpc 2011-01-13 16:30:44 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html
