Bug 38429
| Summary: | Ext2 file corruption with RH71 2.4.2-2 kernel and ServerWorks chipset | | |
|---|---|---|---|
| Product: | [Retired] Red Hat Linux | Reporter: | Rob Todd <rtodd> |
| Component: | kernel | Assignee: | Arjan van de Ven <arjanv> |
| Status: | CLOSED NOTABUG | QA Contact: | Brock Organ <borgan> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | 7.1 | CC: | dragon, gary.collins, pekkas, rtodd, vat.bilavarn |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | i386 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2001-05-21 15:17:55 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |

Description
Rob Todd
2001-04-30 18:06:41 UTC
Created attachment 16866 [details]
dmesg typical of crashed node
Created attachment 16867 [details]
lspci -v typical of crashed node
The IDE part of the ServerWorks chipset is not used very often, as most machines with this chipset have SCSI disks. It seems that you are not the only one seeing these problems; as a workaround you can disable DMA altogether with the kernel command-line option "ide=nodma". This is a hard problem to solve, as ServerWorks is not very eager to give out documentation about its chipset. Does ide=nodma work?

Yes, ide=nodma appears to work. In addition, SuperMicro has released a new version of their BIOS (1.3B) for the 370DLE motherboard which appears to fix at least some of the problems. I've only been running the new BIOS with DMA enabled for a few days, but I haven't encountered any corruption yet, so I'm keeping my fingers crossed.
Rob

That sounds like a BIOS bug then... I'm closing this as "NOTABUG", as it's the closest to "NOTOURBUG". If corruption comes back, please reopen this bug.

I hope this doesn't mean a workaround won't be sought. People whose data just got corrupted, while the 2.2 kernel worked fine, don't want to hear "sorry, BIOS bug". :-/

Well, some BIOS bugs ARE BIOS bugs; there is no way around it. If we find a way to work around broken BIOSes we will; however, this requires information from either the BIOS vendor or the chipset vendor...

I have a related problem here on a Tyan Thunder S2510 board, which has a ServerWorks LE chipset. I tested it with Red Hat 6.2 (2.2.14 and 2.2.19 Red Hat kernels) and it happened there as well. The problem here happens whenever drives on both IDE interfaces are used at the same time (i.e., hda+hdc running at the same time causes it, but hda+hdb could be run all day long and there wouldn't be a problem). The result is fast and major filesystem corruption. After rebooting, the filesystems are usually not even fsck'able. I'm using the latest Tyan BIOS already, so there's no fix there.
Also, disabling DMA reduces the hard drive transfer rate by somewhere around 80-90%, so I can't see how that could be considered a serious workaround for anything other than a lightly used desktop.

ServerWorks is now working with the IDE people to get a better driver. Until now they considered IDE to be a "CD-ROM attachment protocol"; now people are working on a better IDE driver that handles hard disks as well... To be continued.

What is the consensus on whether this is really a bug in the 2.2.19 kernel as well? There is a report above that corruption has been seen on the Tyan board with this kernel and that the Supermicro board is OK. Here we have 136 machines: 2x PIII 1 GHz, Supermicro 370DLE board, ServerWorks LE chipset. We are running the 2.2.19-6.2.1 kernel as compiled by Red Hat. To date we have tried three different types of IDE system disks on these systems. With the Seagate 30 GB drives we saw massive filesystem corruption as described above: nothing in syslog, just files disappearing from directories for no good reason. The vendor replaced all of them with 20 GB Western Digital drives, which were OK for about a month, but now a few of them show a large number of I/O errors that hang the machine; the machine recovers fine after a reboot and fsck. The vendor says they have now seen similar problems on a Tyan board with a ServerWorks chipset, and is suggesting we try IBM system drives. So, can people working on this bug please let us know what drive they are using, and also whether a CD-ROM is connected to the same bus? It is with ours. Also, are they using SMP or not?

Older ServerWorks chipsets have serious IDE bugs; only very recently did the IDE maintainers get enough information to MOSTLY work around the problems. In any case, do NOT use the 2nd channel (cable) for anything that uses DMA.
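
For anyone applying the ide=nodma workaround mentioned in this report, it may be worth confirming that the option actually reached the running kernel. The following is a minimal Python sketch and not part of the original thread: the function names are hypothetical, and it assumes the kernel exposes its boot command line at /proc/cmdline.

```python
from pathlib import Path


def has_boot_option(cmdline: str, option: str) -> bool:
    """Return True if `option` appears as a whole word on the kernel command line."""
    # Split on whitespace so that "ide=nodma" does not match "ide=nodmax".
    return option in cmdline.split()


def running_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with.

    Assumption: the file at `path` exists, as it does on a typical Linux system.
    """
    return Path(path).read_text().strip()
```

On an affected node, `has_boot_option(running_cmdline(), "ide=nodma")` would be expected to return True once the option has been added to the boot loader configuration and the machine has been rebooted.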
Further information on this bug: the 2.4.7-2.9 rawhide kernel as compiled by Red Hat doesn't fix it either. We still see the same errors as under the 2.2.19 kernel, namely a large number of I/O errors on the system disk and system hangs, with the machine fine again after a reboot and fsck. So far we have seen this on one out of 63 nodes in 5 days of running.
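
Several reports above describe corruption that leaves nothing in syslog, with files simply disappearing or changing. One way to notice that kind of damage on a test node is to write probe files with known checksums and verify them later. The sketch below is an illustration only, not something used in this thread: the function and file names are hypothetical, and it assumes the filesystem is remounted or the machine rebooted between writing and verifying, since otherwise the reads may be served from the page cache rather than from the disk. It does not reproduce the bug itself.

```python
import hashlib
import os
from pathlib import Path


def write_test_files(directory: Path, count: int = 16, size: int = 1 << 20) -> dict:
    """Write `count` files of random data and return {filename: sha256 hex digest}."""
    directory.mkdir(parents=True, exist_ok=True)
    digests = {}
    for i in range(count):
        data = os.urandom(size)
        name = f"probe_{i:04d}.bin"
        (directory / name).write_bytes(data)
        digests[name] = hashlib.sha256(data).hexdigest()
    return digests


def verify_test_files(directory: Path, digests: dict) -> list:
    """Return the names of probe files that are missing or whose contents changed."""
    bad = []
    for name, expected in digests.items():
        path = directory / name
        if not path.is_file():
            bad.append(name)  # the file vanished from the directory
            continue
        if hashlib.sha256(path.read_bytes()).hexdigest() != expected:
            bad.append(name)  # the file is present but its data differs
    return bad
```

An empty list from `verify_test_files` only means that no damage was detected in the probe files; it says nothing about the rest of the disk.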