From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8) Gecko/20051025 Firefox/1.5

Description of problem:
There are two dual Opteron machines here with 8G RAM and /home on a RAID1 array connected to a Megaraid controller. Several times now the journal has been aborted and in one case fsck could not resolve the situation, meaning I had to delete the filesystem. Here are the error messages:

EXT3-fs error (device sdb2): ext3_new_block: Allocating block in system zone - block = 55593720
Aborting journal on device sdb2.
EXT3-fs error (device sdb2) in ext3_prepare_write: Journal has aborted
ext3_abort called.
EXT3-fs error (device sdb2): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
__journal_remove_journal_head: freeing b_committed_data

There are two other machines identical to those but with only 4G RAM. They haven't shown any such errors.

Version-Release number of selected component (if applicable):
2.6.13-1.1532_FC4smp

How reproducible:
Always

Steps to Reproduce:
1. Use system as normal
2. Journal aborts, /home is mounted read-only
3.

Actual Results:
Massive filesystem corruption in first case

Additional info:
The motherboard is a Tyan Thunder K8W
2.6.14-1.1637_FC4 has been released as an update for FC4. Please retest with this update, as a large amount of code has been changed in this release, which may have fixed your problem. Thank you.
Rebooted to the latest kernel yesterday, which seemed to resolve some performance problems with the MegaRAID card. However, a user logged in to the machine today and immediately found that /home had been remounted read-only because the journal had aborted. So the latest kernel does not appear to have solved this problem. I think this may be related to the following bug in Debian: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=305977
Hello. We just saw this as well, running 2.6.14-1.1653_FC4 on an Athlon XP. Just a standard ATA disk, no RAID involved. Happy to provide more info. Here is the tail of dmesg:

EXT3-fs error (device hdb1): ext3_new_block: Allocating block in system zone - block = 91127808
Aborting journal on device hdb1.
EXT3-fs error (device hdb1) in ext3_prepare_write: Journal has aborted
ext3_abort called.
EXT3-fs error (device hdb1): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
__journal_remove_journal_head: freeing b_committed_data
__journal_remove_journal_head: freeing b_committed_data
PS During fsck, there is a __ton__ of filesystem damage, more than I've ever seen before.
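For anyone wanting to check other machines for the same symptom, the kernel messages quoted in the reports above follow a regular shape, so saved dmesg output can be scanned for them. What follows is only a minimal sketch, not a tested tool: it assumes each kernel message sits on its own line, and the function names `find_ext3_errors` and `aborted_devices` are made up for illustration. The two patterns are derived solely from the messages quoted in this bug and may not match other kernel versions.

```python
import re

# Matches both forms seen in this report:
#   EXT3-fs error (device sdb2): ext3_new_block: Allocating block in system zone - block = ...
#   EXT3-fs error (device sdb2) in ext3_prepare_write: Journal has aborted
EXT3_ERROR = re.compile(
    r"EXT3-fs error \(device (?P<device>[^)]+)\)"
    r"(?::| in) (?P<function>\w+): (?P<message>.*)"
)

# Matches: "Aborting journal on device sdb2."
JOURNAL_ABORT = re.compile(r"Aborting journal on device (?P<device>[\w-]+)")


def find_ext3_errors(log_text):
    """Return a list of (device, function, message) for each EXT3-fs error line."""
    hits = []
    for line in log_text.splitlines():
        match = EXT3_ERROR.search(line)
        if match:
            hits.append((
                match.group("device"),
                match.group("function"),
                match.group("message").strip(),
            ))
    return hits


def aborted_devices(log_text):
    """Return the set of devices whose journal was reported as aborted."""
    return {
        match.group("device")
        for match in JOURNAL_ABORT.finditer(log_text)
    }


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): dmesg > dmesg.txt; python scan.py < dmesg.txt
    text = sys.stdin.read()
    for device, function, message in find_ext3_errors(text):
        print(f"{device}: {function}: {message}")
    for device in sorted(aborted_devices(text)):
        print(f"journal aborted on {device}")
```

This only reports what the kernel has already logged; it does not detect corruption on its own, and fsck is still needed to assess any damage.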
This is a mass-update to all currently open kernel bugs. A new kernel update has been released (Version: 2.6.15-1.1830_FC4) based upon a new upstream kernel release. Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem. This bug has been placed in NEEDINFO_REPORTER state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks' time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug. If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613. Thank you.
I shall try to test on the one machine remaining in this configuration. The first one has been altered to use software RAID as I had to get it working. There has been no repeat of the filesystem corruption using the MD layer. So it seems there is a bug specific to the Megaraid driver and >4G RAM on x86_64.
I noticed that there's a new revision of the megaraid driver in 2.6.16-rc2, so I've just booted the machine to your 2.6.15-1.2005_FC4smp testing kernel, which contains the new driver, to see if it makes any difference.
There is now an entry for this problem in the kernel bugzilla: http://bugzilla.kernel.org/show_bug.cgi?id=6052
[This comment added as part of a mass-update to all open FC4 kernel bugs] FC4 has now transitioned to the Fedora legacy project, which will continue to release security related updates for the kernel. As this bug is not security related, it is unlikely to be fixed in an update for FC4, and has been migrated to FC5. Please retest with Fedora Core 5. Thank you.
A new kernel update has been released (Version: 2.6.18-1.2200.fc5) based upon a new upstream kernel release. Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem. This bug has been placed in NEEDINFO state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks' time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug. In the last few updates, some users upgrading from FC4->FC5 have reported that installing a kernel update has left their systems unbootable. If you have been affected by this problem, please check that you have only one version of device-mapper & lvm2 installed. See bug 207474 for further details. If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613. If this bug has been fixed, but you are now experiencing a different problem, please file a separate bug for the new problem. Thank you.
Altering the setup of the megaraid card acted as a workaround for this bug, which hasn't been seen now for some time.