Bug 172284 - Repeated ext3 journal errors on Megaraid RAID 1 array
Summary: Repeated ext3 journal errors on Megaraid RAID 1 array
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 5
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Dave Jones
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2005-11-02 13:13 UTC by Adam Huffman
Modified: 2015-01-04 22:22 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-10-17 23:23:00 UTC
Type: ---
Embargoed:



Description Adam Huffman 2005-11-02 13:13:58 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8) Gecko/20051025 Firefox/1.5

Description of problem:
There are two dual Opteron machines here with 8G RAM and /home on a RAID1 array connected to a Megaraid controller.

Several times now the journal has aborted, and in one case fsck could not repair the damage, so I had to delete the filesystem.

Here are the error messages:

EXT3-fs error (device sdb2): ext3_new_block: Allocating block in system zone - block = 55593720
Aborting journal on device sdb2.
EXT3-fs error (device sdb2) in ext3_prepare_write: Journal has aborted
ext3_abort called.
EXT3-fs error (device sdb2): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
__journal_remove_journal_head: freeing b_committed_data
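
For anyone comparing logs across machines, the messages above can be reduced to (device, function, message) records. The following is only a minimal sketch: the regular expression is inferred from the lines quoted in this report, not from the kernel source, and the names `EXT3_ERROR`, `parse_ext3_errors` and `SAMPLE` are made up for illustration.

```python
import re

# Pattern inferred from the log lines quoted in this report (an assumption,
# not taken from the kernel source). It covers both forms seen here:
#   EXT3-fs error (device sdb2): ext3_new_block: <message>
#   EXT3-fs error (device sdb2) in ext3_prepare_write: <message>
EXT3_ERROR = re.compile(
    r"EXT3-fs error \(device (?P<device>\w+)\)(?::| in) "
    r"(?P<function>\w+): (?P<message>.*)"
)


def parse_ext3_errors(log_text):
    """Return a list of (device, function, message) tuples found in log_text."""
    results = []
    for line in log_text.splitlines():
        match = EXT3_ERROR.search(line)
        if match:
            results.append(
                (match["device"], match["function"], match["message"])
            )
    return results


# Sample input copied from the description of this bug.
SAMPLE = """\
EXT3-fs error (device sdb2): ext3_new_block: Allocating block in system zone - block = 55593720
Aborting journal on device sdb2.
EXT3-fs error (device sdb2) in ext3_prepare_write: Journal has aborted
ext3_abort called.
EXT3-fs error (device sdb2): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
"""

if __name__ == "__main__":
    for device, function, message in parse_ext3_errors(SAMPLE):
        print(device, function, message)
```

Lines that do not start an EXT3-fs error record (such as "Aborting journal on device sdb2.") are skipped rather than treated as failures, so the same function can be run over a full dmesg dump.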

There are two other machines identical to those but with only 4G RAM.  They haven't shown any such errors.

Version-Release number of selected component (if applicable):
2.6.13-1.1532_FC4smp

How reproducible:
Always

Steps to Reproduce:
1. Use system as normal
2. Journal aborts, /home is mounted read-only

Actual Results:  Massive filesystem corruption in first case

Additional info:

Comment 1 Adam Huffman 2005-11-02 13:17:09 UTC
The motherboard is a Tyan Thunder K8W

Comment 2 Dave Jones 2005-11-10 19:32:24 UTC
2.6.14-1.1637_FC4 has been released as an update for FC4.
Please retest with this update, as a large amount of code has been changed in
this release, which may have fixed your problem.

Thank you.


Comment 3 Adam Huffman 2005-11-22 14:51:15 UTC
Rebooted to the latest kernel yesterday, which seemed to resolve some
performance problems with the MegaRAID card.  However, a user logged in to the
machine today and immediately found that /home had been remounted read-only
because the journal had aborted.

So, the latest kernel does not appear to have solved this problem.
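
Since the first sign of trouble here was a user finding /home read-only, a small check can spot the remount earlier. This is a rough sketch under assumptions: it relies on the usual `/proc/mounts` layout (device, mount point, filesystem type, options), and the function name `readonly_mounts` and the sample text are hypothetical.

```python
import os


def readonly_mounts(mounts_text, fstypes=("ext3",)):
    """Return (device, mount_point) pairs mounted read-only.

    mounts_text is expected in /proc/mounts format: whitespace-separated
    fields of device, mount point, filesystem type and mount options.
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, fstype, options = fields[:4]
        # Options are comma-separated; "ro" marks a read-only mount.
        if fstype in fstypes and "ro" in options.split(","):
            found.append((device, mount_point))
    return found


# Hypothetical sample resembling the state described in this bug.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext3 rw 0 0
/dev/sdb2 /home ext3 ro 0 0
proc /proc proc rw 0 0
"""

if __name__ == "__main__":
    path = "/proc/mounts"
    if os.path.exists(path):
        with open(path) as handle:
            text = handle.read()
    else:
        text = SAMPLE_MOUNTS  # fall back to the sample on non-Linux systems
    for device, mount_point in readonly_mounts(text):
        print(f"{mount_point} ({device}) is mounted read-only")
```

Run periodically (for example from cron), this would flag the remount without waiting for a user to notice; it does not say why the filesystem went read-only, so the kernel log still needs checking.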

I think this may be related to the following bug in Debian:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=305977

Comment 4 Aaron Straus 2005-12-30 15:55:58 UTC
Hello.  We just saw this as well running 2.6.14-1.1653_FC4 on an Athlon XP.  Just
a standard ATA disk, no RAID involved.  Happy to provide more info.

Here is the tail of dmesg:

EXT3-fs error (device hdb1): ext3_new_block: Allocating block in system zone - block = 91127808
Aborting journal on device hdb1.
EXT3-fs error (device hdb1) in ext3_prepare_write: Journal has aborted
ext3_abort called.
EXT3-fs error (device hdb1): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
__journal_remove_journal_head: freeing b_committed_data
__journal_remove_journal_head: freeing b_committed_data

Comment 5 Aaron Straus 2005-12-30 17:41:33 UTC
PS During fsck, there is a __ton__ of filesystem damage, more than I've ever
seen before.

Comment 6 Dave Jones 2006-02-03 06:48:20 UTC
This is a mass-update to all currently open kernel bugs.

A new kernel update has been released (Version: 2.6.15-1.1830_FC4)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO_REPORTER state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks' time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

Thank you.


Comment 7 Adam Huffman 2006-02-03 22:36:38 UTC
I shall try to test on the one machine remaining in this configuration.  The
first one has been altered to use software RAID as I had to get it working. 
There has been no repeat of the filesystem corruption using the MD layer.  So it
seems there is a bug specific to the Megaraid driver and >4G RAM on x86_64.

Comment 8 Adam Huffman 2006-02-08 16:47:03 UTC
I noticed that there's a new revision of the megaraid driver in 2.6.16-rc2, so
I've just booted the machine to your 2.6.15-1.2005_FC4smp testing kernel, which
contains the new driver, to see if it makes any difference.

Comment 9 Adam Huffman 2006-02-13 20:10:36 UTC
There is now an entry for this problem in the kernel bugzilla:

http://bugzilla.kernel.org/show_bug.cgi?id=6052

Comment 10 Dave Jones 2006-09-17 02:07:02 UTC
[This comment added as part of a mass-update to all open FC4 kernel bugs]

FC4 has now transitioned to the Fedora legacy project, which will continue to
release security related updates for the kernel.  As this bug is not security
related, it is unlikely to be fixed in an update for FC4, and has been migrated
to FC5.

Please retest with Fedora Core 5.

Thank you.

Comment 11 Dave Jones 2006-10-16 21:07:09 UTC
A new kernel update has been released (Version: 2.6.18-1.2200.fc5)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks' time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

In the last few updates, some users upgrading from FC4->FC5
have reported that installing a kernel update has left their
systems unbootable. If you have been affected by this problem
please check you only have one version of device-mapper & lvm2
installed.  See bug 207474 for further details.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

If this bug has been fixed, but you are now experiencing a different
problem, please file a separate bug for the new problem.

Thank you.

Comment 12 Adam Huffman 2006-10-17 23:23:00 UTC
Altering the setup of the megaraid card acted as a workaround for this bug,
which hasn't been seen now for some time.

