Bug 57592
Summary: | disk corruption on 2.4.9-13smp w/adaptec SCSI | | |
---|---|---|---|
Product: | [Retired] Red Hat Linux | Reporter: | Preston Brown <pbrown> |
Component: | kernel | Assignee: | Arjan van de Ven <arjanv> |
Status: | CLOSED WORKSFORME | QA Contact: | Brock Organ <borgan> |
Severity: | medium | Priority: | high |
Version: | 7.2 | CC: | sct, wendell |
Hardware: | i386 | OS: | Linux |
Doc Type: | Bug Fix | Last Closed: | 2003-06-07 23:48:48 UTC |
Description
Preston Brown
2001-12-17 03:55:50 UTC
Created attachment 40787: ksymoops output
The BUG() is coming from J_ASSERT_JH(jh, buffer_uptodate(jh2bh(jh))); which is somewhere deep in the journaling layer getting upset about the fact that there have been IO failures elsewhere. That shouldn't happen, but the failure is in an inode table, which is a bit hard to recover from if the block goes bad after you've already started using it.

Part of the underlying problem here is the stupid block device layer, which only has one bit of error state and which heavy-handedly marks blocks as being non-uptodate if a write error occurs. That should be fixed for 2.5, but all filesystems will have the problem in 2.4 that they cannot reliably tell what blocks are actually uptodate in the presence of write errors. So the ext3 assert fail should probably be relaxed: I'll reproduce this and fix.

Just so that I can decode the trace a little more accurately, can you tell me which kernel version this is? We have 3 different 2.4.9-13smp kernels: one each for i586, i686 and athlon.

I don't know if this answers your problem: it's not clear from the report whether you want the SCSI IO errors or the filesystem fixed.

This is with the i686 -13smp kernel. I hadn't had any I/O errors for several days leading up to this crash, and the disk had been fsck'd, so I was assuming this might not be due to those previous SCSI I/O problems. I will update the report if anything else of consequence happens.

I'm seeing basically the exact same problem. IBM xSeries 350 with integrated Adaptec U160 SCSI, with a secondary /data filesystem on an external RAID using the Adaptec SCSI. The primary boot filesystems are on an IBM RaidServ 4LX RAID controller and are fine. The filesystem on the Adaptec is getting I/O errors, hangs, load average goes to 9+, etc. Running the 2.4.9-13smp Red Hat kernel, 2 x 700 MHz P3 Xeon procs, 1.5 GB RAM. I am planning on booting back to the stock 2.4.7 kernel to see if it's more stable.
The server I described has an _exact_ twin sitting right beside it as a load-balanced, fault-tolerant redundant server, and it's experiencing the exact same problem. I'm fairly certain it's not faulty hardware.

Wendell, what sort of errors are being reported in your logs? If there are no SCSI errors being reported then this may be a new fs bug; otherwise it's likely to be the fs getting confused by errors coming back from the SCSI layer, and that may indicate a driver or controller fault.

No reply in over a year - closing
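
The point made earlier in this thread, that the 2.4 block layer keeps only one bit of state and clears it on a write error, can be illustrated with a small user-space model. This is only a sketch of the idea and not kernel code: `struct model_buffer` and every function name below are hypothetical and are not taken from the 2.4 sources. It assumes, as the comment describes, that a failed write marks the buffer non-uptodate.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a buffer with a single state bit. */
struct model_buffer {
    bool uptodate; /* the only state available to the filesystem */
};

/* A read that completed successfully marks the buffer valid. */
static void model_read_ok(struct model_buffer *b)
{
    b->uptodate = true;
}

/* Write completion: on failure the single bit is cleared (assumed
 * behaviour, following the description in the thread). */
static void model_end_write(struct model_buffer *b, bool io_ok)
{
    if (!io_ok)
        b->uptodate = false;
}

/* A strict check in the spirit of the assertion that fired. */
static bool strict_uptodate_check(const struct model_buffer *b)
{
    return b->uptodate;
}

/* Returns true when a buffer that held valid data but suffered a
 * write error looks identical to one that was never read at all. */
static bool demo_states_indistinguishable(void)
{
    struct model_buffer never_read = { .uptodate = false };
    struct model_buffer inode_table = { .uptodate = false };

    model_read_ok(&inode_table);          /* contents are valid in memory */
    model_end_write(&inode_table, false); /* later write-back fails */

    return strict_uptodate_check(&inode_table)
        == strict_uptodate_check(&never_read);
}
```

In this model the in-memory contents of `inode_table` are still the valid ones after the failed write, yet the strict check rejects the buffer exactly as it would reject one that was never filled. That is the ambiguity the thread gives as the reason for relaxing the ext3 assertion.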