Bug 166151
Summary: | Short read causes journal corruption in ext3 fs | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Ken Presser <capnlinux> |
Component: | kernel | Assignee: | Stephen Tweedie <sct> |
Status: | CLOSED NOTABUG | QA Contact: | Brian Brock <bbrock> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 4 | CC: | davej, hafflys, wtogami |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i386 | ||
OS: | Linux | ||
URL: | http://forums.fedoraforum.org/forum/showthread.php?t=72860&highlight=short+read | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2005-08-30 11:36:09 UTC | Type: | --- |
Description
Ken Presser
2005-08-17 15:23:35 UTC
Please see the URL listed for a more complete discussion of the problem on FedoraForum.org.

Buffer I/O error on device hda1, logical block 526
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=1115, sector=1115

is a sign of a hardware error. I don't think there's a bug here --- just a bad sector which, due to seriously bad luck, landed bang in the middle of the journal. So yes, recreating the journal is the workaround; but the underlying problem is hardware, not software.

Does this then raise the question of whether, when the journal was first created during the formatting of the partition, there were inadequate checks to make sure that there are no bad sectors, or that bad sectors are mapped out before the journal and other structures are written? Hardware problems may have caused this issue in one case, but there are more of us who have experienced this problem than just Mr. Tweedie. I believe there might be something else going on here, but I don't have the skill to say what it is. I just know that all of a sudden, my partition became read-only, and since I didn't know how to recreate the journal, I wound up wiping the whole installation and reinstalling. Fortunately, I had a backup of my home partition so it didn't take too long to get back to where I was.

Stephen

Bad sector checking is by-and-large just not useful on modern disk drives. If there has been an error on a sector, it gets remapped transparently on the next write; the O/S never sees it. Certainly, having code in e2fsprogs to re-write the journal automatically if it detects this sort of thing could be useful. But the

dma_intr: error=0x40 { UncorrectableError }

error is just the kernel reporting what the disk drive told us about a bad sector --- it's not something that the kernel can handle on its own.

The problem is, I didn't get this error message when my partition became read-only. I didn't get any error message at all. I rebooted and tried to run fsck, but didn't know how to answer the questions. That is when I wiped and reinstalled. Does the problem I had differ enough that it should be listed as another bug, leaving this one closed? In the thread at the URL listed in the Additional Bug Information, at least two of us had this problem, and it does not appear to be connected with the bad sector error message that capnlinux had. If it was a kernel error, it might be a moot point, since kernel 2.6.12-1.1447_FC4 just came through on Fedora Updates. I am just a bit paranoid about this since it took the better part of a week to get all the programs set back up the way they were before. I don't want to have to go through this again.

Respectfully,
Stephen

The kernel *always* emits an error when turning the partition read-only. The only code in the whole of the kernel capable of making a partition read-only in response to an error (as opposed to in response to an explicit user request) unconditionally emits an error to say that it is doing so.

Now, you may have missed it --- if you were running under X, and the kernel error logs were on the root filesystem, and the root fs itself became readonly, then obviously you'd miss the console message and the syslog copy would not be writable. It's quite common not to *see* the error due to this combination. But it will still be produced. The question is how to capture it; serial or network console is the recommended mechanism.

I guess I can accept that this was just a hardware problem and not a bug, it being just a coincidence that 3 people had the exact same problem running the exact same code. At least it has been pointed out, so that future occurrences might be treated as a sign of a real problem. My system has been running fine for several weeks now since recovering the journal.
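For readers who want to check the dma_intr lines quoted earlier in this report, here is a minimal Python sketch that decodes the status and error register values into bit names. The bit positions follow the ATA task-file register layout; only DriveReady, SeekComplete, Error and UncorrectableError appear in the report itself, and the other labels are descriptive names chosen for illustration. The block-to-sector helper rests on two assumptions that are not stated anywhere in the report: a 1 KiB filesystem block size and a partition starting at sector 63.

```python
import re

# ATA status register bits, highest bit first.
# Labels other than DriveReady, SeekComplete and Error are illustrative.
STATUS_BITS = {
    0x80: "Busy",
    0x40: "DriveReady",
    0x20: "DeviceFault",
    0x10: "SeekComplete",
    0x08: "DataRequest",
    0x04: "CorrectedError",
    0x02: "Index",
    0x01: "Error",
}

# ATA error register bits, highest bit first.
# Only UncorrectableError appears in the report; the rest are illustrative.
ERROR_BITS = {
    0x80: "BadSector",
    0x40: "UncorrectableError",
    0x20: "MediaChanged",
    0x10: "SectorIdNotFound",
    0x08: "MediaChangeRequested",
    0x04: "DriveStatusError",
    0x02: "TrackZeroNotFound",
    0x01: "AddrMarkNotFound",
}

FIELD_RE = re.compile(r"\b(status|error)=0x([0-9a-fA-F]+)")


def decode(value, names):
    """Return the names of the bits set in value, highest bit first."""
    return [name for bit, name in names.items() if value & bit]


def parse_line(line):
    """Return (field, bit_names) for a dma_intr log line, or None."""
    match = FIELD_RE.search(line)
    if match is None:
        return None
    field, value = match.group(1), int(match.group(2), 16)
    table = STATUS_BITS if field == "status" else ERROR_BITS
    return field, decode(value, table)


def block_to_sector(block, block_size=1024, partition_start=63):
    """Map a filesystem block to a 512-byte disk sector.

    block_size and partition_start are ASSUMED values, not taken
    from the report.
    """
    return partition_start + block * (block_size // 512)
```

Under those assumed values, block_to_sector(526) gives 1115, the same sector number the drive reported. That is consistent with the buffer I/O error and the drive error describing the same location, but it is not proof of the partition layout.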
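On capturing the read-only message over a network console, as recommended earlier in this report: the receiving side can be as simple as a UDP listener on a second machine, so the message survives even when the affected machine cannot write its own syslog. The following is a rough Python sketch, not a recommended tool. It assumes the affected kernel is already configured to send its console output as UDP datagrams to this host, and that port 6666 is the target port in use; both are assumptions, and the function names and log path are hypothetical.

```python
import socket


def open_listener(host="0.0.0.0", port=6666):
    """Bind a UDP socket for incoming console messages.

    Port 6666 is an assumed default; use whatever the sender targets.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def collect(sock, count, timeout=5.0):
    """Receive up to count datagrams and return them as text.

    Stops early if no datagram arrives within timeout seconds.
    """
    sock.settimeout(timeout)
    messages = []
    try:
        while len(messages) < count:
            data, _addr = sock.recvfrom(4096)
            text = data.decode("utf-8", errors="replace")
            messages.append(text.rstrip("\n"))
    except socket.timeout:
        pass
    return messages


def log_forever(sock, path):
    """Append every received message to path on the listening machine."""
    sock.settimeout(None)  # block until the next datagram arrives
    with open(path, "a", encoding="utf-8") as log:
        while True:
            data, _addr = sock.recvfrom(4096)
            log.write(data.decode("utf-8", errors="replace"))
            log.flush()  # keep the file current if the listener is killed
```

To use it, call log_forever(open_listener(), "console.log") on the second machine and leave it running. The point of the design is that the log file lives on a different disk from the filesystem that may go read-only.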