Bug 588599
Summary: | Kernel BUG at fs/ext3/super.c:425 | |
---|---|---|---
Product: | Red Hat Enterprise Linux 5 | Reporter: | Tao Ma <tao.ma>
Component: | kernel | Assignee: | Josef Bacik <jbacik>
Status: | CLOSED ERRATA | QA Contact: | Igor Zhang <yugzhang>
Severity: | medium | Docs Contact: |
Priority: | low | |
Version: | 5.5 | CC: | guru.anbalagane, herbert.van.den.bergh, jbacik, sunil.mushran, yugzhang
Target Milestone: | rc | |
Target Release: | --- | |
Hardware: | All | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2011-01-13 21:30:44 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |
Description
Tao Ma
2010-05-04 03:10:09 UTC
Actually, I have investigated the bug; it is caused by the function ext3_direct_IO. For a write, we first add the inode to the orphan list, and then remove it after blockdev_direct_IO. But there is a corner case: what if adding succeeds but removing fails? That would leave the inode on ext3_sb->s_orphans, so umount would trigger the kernel BUG.
Created attachment 411173 [details]
a temp fix.
Here is my temp fix; I took what ext4_ind_direct_IO does as a reference.
But we could also call ext3_truncate there and then change ext3_truncate as in mainline commit ef43618a47179b41e7203a624f2c7445e7da488c. That way we would need to change two places and a little more code.
I am not familiar enough with ext3, so I chose the simplest way to work around it.
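The failure mode described above can be illustrated with a small user-space model. This is only a sketch, not kernel code and not the attached patch: every name in it (`model_sb`, `model_inode`, `direct_write`, `second_handle_ok`, `cleanup_on_error`) is hypothetical, the orphan list is reduced to a counter, and the assumption is that the workaround amounts to dropping the inode from the in-memory orphan list when the second journal handle cannot be started.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified user-space model. NOT kernel code; all names are hypothetical. */
struct model_inode {
    bool on_orphan_list;      /* stands in for membership on the in-memory orphan list */
};

struct model_sb {
    int orphan_count;         /* stands in for the superblock's orphan list */
    bool second_handle_ok;    /* false: starting the post-I/O journal handle fails */
};

void orphan_add(struct model_sb *sb, struct model_inode *inode)
{
    if (!inode->on_orphan_list) {
        inode->on_orphan_list = true;
        sb->orphan_count++;
    }
}

void orphan_del(struct model_sb *sb, struct model_inode *inode)
{
    if (inode->on_orphan_list) {
        inode->on_orphan_list = false;
        assert(sb->orphan_count > 0);
        sb->orphan_count--;
    }
}

/*
 * Models the write path described in the report:
 *   1. add the inode to the orphan list,
 *   2. perform the direct I/O (omitted here),
 *   3. start a second journal handle and remove the inode again.
 * If step 3 fails and cleanup_on_error is false, the inode stays on the
 * list. Returns 0 on success, -1 if the second handle could not be started.
 */
int direct_write(struct model_sb *sb, struct model_inode *inode,
                 bool cleanup_on_error)
{
    orphan_add(sb, inode);

    if (!sb->second_handle_ok) {
        if (cleanup_on_error)
            orphan_del(sb, inode);   /* assumed workaround: in-memory removal only */
        return -1;
    }

    orphan_del(sb, inode);
    return 0;
}

/* Models the unmount-time check: the orphan list must be empty. */
bool umount_is_clean(const struct model_sb *sb)
{
    return sb->orphan_count == 0;
}
```

In this model, calling `direct_write` with `second_handle_ok` set to false and `cleanup_on_error` set to false leaves `umount_is_clean` returning false, which corresponds to the assertion firing at unmount. With `cleanup_on_error` set to true, the write still fails but the list is empty again.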
Your patch seems to be correct, but it's not upstream. Please post it upstream and then we can pull it into RHEL. Thanks,
Josef
Duh, of course I then go look at upstream and see what's done there, and we call ext3_truncate there. I'll backport the upstream patches.
Created attachment 411368 [details]
potential fix
I tried to reproduce your problem, but I couldn't. This is what I did
mkfs.ext3 -b 4096 /dev/sdb1 (500m partition)
mount /dev/sdb1 /mnt/test
dd if=/dev/zero of=/mnt/test/file bs=1M seek=1000 count=1
losetup /dev/loop0 /mnt/test/file
dd if=/dev/zero of=/dev/loop0 bs=4k oflag=direct count=262144
Here is a backport of a couple of upstream bz's that should fix the problem. It's a little odd that the thing is failing in journal_start, but if that's what's happening then this fix is definitely needed. Can you verify that this patch fixes the problem? Thanks,
Josef
(In reply to comment #4)
> Duh, of course I then go look at upstream and see what's done there and we call
> ext3_truncate there. I'll backport the upstream patches.
So you have chosen the second way I described above: use ext3_truncate and backport commit ef43618a47179b41e7203a624f2c7445e7da488c. ;) I have no objection to it.
(In reply to comment #5)
> Created an attachment (id=411368) [details]
> potential fix
> I tried to reproduce your problem, but I couldn't. This is what I did
Yes, it is not very easy to trigger. Sorry for not pasting the test script at the very beginning.
> mkfs.ext3 -b 4096 /dev/sdb1 (500m partition)
> mount /dev/sdb1 /mnt/test
> dd if=/dev/zero of=/mnt/test/file bs=1M seek=1000 count=1
> losetup /dev/loop0 /mnt/test/file
> dd if=/dev/zero of=/dev/loop0 bs=4k oflag=direct count=262144
Here the volume needs to be mounted and then written to:
mkfs.ext3 /dev/loop0
mount /dev/loop0 /mnt/test1
cd /mnt/test1
dd if=/dev/zero of=testfile1 bs=1024k &
dd if=/dev/zero of=testfile2 bs=1024k oflag=direct &
dd if=/dev/zero of=testfile3 bs=24k &
dd if=/dev/zero of=testfile4 bs=1024k oflag=direct seek=10000 &
Sometimes you will trigger it.
> Here is a backport of a couple of upstream bz's that should fix the problem.
> It's a little odd that the thing is failing in the journal_start, but if that's
> what's happening then this fix is definitely needed. Can you verify this patch
> fixes the problem? Thanks,
OK, I will test it.
> Josef
Yeah, we have tested the patch. It works. Thanks.
Great, thanks for doing all of the actual work :).
Josef
BTW this is a backport of
ef43618a47179b41e7203a624f2c7445e7da488c
7eb4969e04060dcf3fbd46af9c21b1059b853068
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products.
This request is not yet committed for inclusion in an Update release.
in kernel-2.6.18-211.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5
Detailed testing feedback is always welcomed.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHSA-2011-0017.html