Bug 155162
Summary: | ext3 journal aborts | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | David Juran <djuran> |
Component: | kernel | Assignee: | Dave Jones <davej> |
Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Brian Brock <bbrock> |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | 4 | CC: | davej, pfrields, sct |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i686 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2006-05-04 13:41:37 UTC | Type: | --- |
Description
David Juran
2005-04-17 11:22:48 UTC
The journal abort message simply means that something happened that the filesystem considered too serious to continue to write to the disk. In this case,

    ext3_free_blocks_sb: bit already cleared for block 6644165

you've got corruption in either a bitmap or indirect block: you'll need to force a full fsck on the filesystem (the journal abort should record an error status in the journal that will force a full fsck automatically on the next boot). But that doesn't tell us where the original error came from; the

    Apr 16 11:30:23 c83-248-2-72 kernel: hdb: dma_timer_expiry: dma status == 0x60
    Apr 16 11:30:23 c83-248-2-72 kernel: hdb: DMA timeout retry
    Apr 16 11:30:23 c83-248-2-72 kernel: hdb: timeout waiting for DMA
    Apr 16 11:30:28 c83-248-2-72 kernel: hdb: status timeout: status=0xd0 { Busy }

errors indicate that the root cause of this problem is probably in the IDE layer, not in the filesystem at all.

Well, I realize this report is a bit thin on detail )-: And yes, I managed to do a full recovery running fsck manually. I have the log from the fsck run, but I doubt that it would give you substantially more to go on. A couple of illegal blocks cleared, an 'Extended attribute block with reference count 15 instead of 16', a 'free blocks count wrong for group #202' ... I've now upgraded to kernel-smp-2.6.11-1.1240_FC4, but if the problem recurs and I notice that the filesystem is acting up again, is there anything I could log from a running kernel that would help you pinpoint the error?

The incorrect xattr refcounts are most likely the result of a reference counting bug in the initial FC3 release which we've since fixed. It is conceivable that the "bit already cleared" also resulted from that same problem, as an incorrect xattr refcount could in theory lead to such a block being released early. Can I assume you have SELinux enabled on this filesystem?

Yes, I have SELinux enabled (though I did briefly turn it off at boot time, just before installing this kernel).
This was also the first kernel I've been running that deviated from the ones released for FC3.

OK, then it's possible that the filesystem/fsck complaints were just due to the old xattr bug. That still leaves the ATA DMA complaints, though. Can you please report back if you see any further filesystem problems even with recent kernels?

A very similar thing happened again, this time with kernel-smp-2.6.12-1.1398_FC4. One note that might be of value is that this happened under heavy filesystem stress while running yum, copying a DVD and a couple of other things. Below is an excerpt from dmesg:

    hdb: dma_timer_expiry: dma status == 0x60
    hdb: DMA timeout retry
    hdb: timeout waiting for DMA
    hdb: status timeout: status=0xd0 { Busy }
    ide: failed opcode was: unknown
    hda: DMA disabled
    hdb: drive not ready for command
    ide0: reset: success
    UDF-fs INFO UDF 0.9.8.1 (2004/29/09) Mounting volume 'DVDVolume', timestamp 2036/02/07 10:58 (1000)
    SELinux: initialized (dev hdd, type udf), uses genfs_contexts
    EXT3-fs error (device hdb2): ext3_add_entry: bad entry in directory #230378: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
    Aborting journal on device hdb2.
    EXT3-fs error (device hdb2) in ext3_reserve_inode_write: Journal has aborted
    EXT3-fs error (device hdb2) in ext3_dirty_inode: Journal has aborted
    ext3_abort called.
    EXT3-fs error (device hdb2): ext3_journal_start_sb: Detected aborted journal
    Remounting filesystem read-only
    EXT3-fs error (device hdb2) in start_transaction: Journal has aborted
    EXT3-fs error (device hdb2) in ext3_create: IO failure
    __journal_remove_journal_head: freeing b_committed_data
    . . .
    __journal_remove_journal_head: freeing b_committed_data
    __journal_remove_journal_head: freeing b_frozen_data
    __journal_remove_journal_head: freeing b_frozen_data
    . . .
    __journal_remove_journal_head: freeing b_frozen_data
    __journal_remove_journal_head: freeing b_frozen_data
    __journal_remove_journal_head: freeing b_committed_data
    __journal_remove_journal_head: freeing b_frozen_data
    __journal_remove_journal_head: freeing b_frozen_data
    __journal_remove_journal_head: freeing b_frozen_data
    __journal_remove_journal_head: freeing b_frozen_data
    __journal_remove_journal_head: freeing b_frozen_data
    __journal_remove_journal_head: freeing b_committed_data
    journal commit I/O error
    journal commit I/O error

Mass update to all FC4 bugs: An update has been released (2.6.13-1.1526_FC4) which rebases to a new upstream kernel (2.6.13.2). As there were ~3500 changes upstream between this and the previous kernel, it's possible your bug has been fixed already. Please retest with this update, and update this bug if necessary. Thanks.

2.6.14-1.1637_FC4 has been released as an update for FC4. Please retest with this update, as a large amount of code has been changed in this release, which may have fixed your problem. Thank you.

This is a mass-update to all currently open kernel bugs. A new kernel update has been released (Version: 2.6.15-1.1830_FC4) based upon a new upstream kernel release. Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem. This bug has been placed in NEEDINFO_REPORTER state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug. If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613. Thank you.

Closing per previous comment.
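
For anyone triaging similar logs: both excerpts in this bug show IDE DMA timeouts on hdb shortly before the ext3 errors on hdb2, which is what pointed at the IDE layer instead of the filesystem. A minimal Python sketch of checking for that ordering in a saved kernel log might look like the following. The function names are hypothetical, the regular expressions cover only the messages quoted in this bug, and mapping a partition to its disk by stripping trailing digits is an assumption that fits hdX-style names only.

```python
import re

# Patterns cover only the messages quoted in this bug report (assumption:
# other kernels or drivers may word these differently).
IDE_ERROR = re.compile(
    r"\b(hd[a-z]): (dma_timer_expiry|DMA timeout retry|"
    r"timeout waiting for DMA|status timeout)"
)
EXT3_ERROR = re.compile(r"EXT3-fs error \(device (\w+)\)")
JOURNAL_ABORT = re.compile(r"Aborting journal on device (\w+)")


def classify(line):
    """Return (kind, device) for a kernel log line, or None if not of interest."""
    m = IDE_ERROR.search(line)
    if m:
        return ("ide", m.group(1))
    m = JOURNAL_ABORT.search(line)
    if m:
        return ("abort", m.group(1))
    m = EXT3_ERROR.search(line)
    if m:
        return ("ext3", m.group(1))
    return None


def ide_errors_before_abort(lines):
    """Map each aborted partition to the IDE error lines seen earlier on its disk."""
    ide_seen = {}  # disk name -> IDE error lines seen so far
    result = {}
    for line in lines:
        hit = classify(line)
        if hit is None:
            continue
        kind, dev = hit
        if kind == "ide":
            ide_seen.setdefault(dev, []).append(line.strip())
        elif kind == "abort":
            disk = dev.rstrip("0123456789")  # hdb2 -> hdb (hdX naming assumed)
            result[dev] = list(ide_seen.get(disk, []))
    return result
```

Passing the lines of a saved dmesg or /var/log/messages dump to `ide_errors_before_abort` returns, for each partition whose journal aborted, the DMA and status timeout lines logged earlier for the same disk. An empty list for a partition would suggest looking elsewhere for the cause.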