From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0b; Windows 98; Win 9x 4.90)

I've experienced serious file corruption using a PIII with a SuperMicro 370DLE motherboard (ServerWorks LE chipset) and the stock 2.4.2-2 RH71 kernel. See the attached dmesg and lspci -v output. I noticed in the kernel source RPM that OSB4 support is disabled because it is known to cause data corruption. I have tried turning that option on and off, with the same results. I've since compiled a 2.4.4 kernel with the same config file used to build the 2.4.2-2 RH71 kernel, and the problem still exists (plus the added bonus that DMA no longer works reliably).

We have 32 of these exact machines, and around 12-15 of them experienced at least some file corruption. I am currently running RH71 with kernel 2.2.19 on them with no problems (DMA is working, with no file corruption).

Rob Todd

Reproducible: Sometimes

Steps to Reproduce:
1. Installed RH71 (the installation always completed fine).
2. Rebooted after the installation completed.

Actual Results: Corruption sometimes occurred during installation (leaving a system with general filesystem weirdness, such as I/O errors when accessing a file) and sometimes during normal operation. After a reboot, fsck would complain, sometimes reporting MANY errors on the filesystem.

Expected Results: No file corruption.
Created attachment 16866: dmesg typical of a crashed node
Created attachment 16867: lspci -v typical of a crashed node
The IDE part of the ServerWorks chipset is not used very often, as most machines with this chipset have SCSI disks. It seems you are not the only one seeing these problems. As a workaround, you can disable DMA altogether with the kernel command-line option "ide=nodma". This is a hard problem to solve, as ServerWorks is not very eager to provide documentation for their chipset.
Does ide=nodma work?
Yes, ide=nodma appears to work. In addition, SuperMicro has released a new version of their BIOS (1.3B) for the 370DLE motherboard, which appears to fix at least some of the problems. I've only been running the new BIOS with DMA enabled for a few days, but I haven't encountered any corruption yet, so I'm keeping my fingers crossed. Rob
That sounds like a BIOS bug, then. I'm closing this as "NOTABUG", since that is the closest resolution to "NOTOURBUG". If the corruption comes back, please reopen this bug.
I hope this doesn't mean a workaround won't be sought. People whose data just got corrupted while the 2.2 kernel worked fine don't want to hear "sorry, BIOS bug". :-/
Well, some BIOS bugs ARE BIOS bugs, and there is no way around them. If we find a way to work around broken BIOSes, we will. However, this requires information from either the BIOS vendor or the chipset vendor.
I have a related problem here on a Tyan Thunder S2510 board, which uses the ServerWorks LE chipset. I tested it with RedHat 6.2 (the 2.2.14 and 2.2.19 RedHat kernels), and it happened there as well. The problem occurs whenever drives on both IDE interfaces are used at the same time (i.e., running hda and hdc simultaneously triggers it, but hda and hdb can run together all day long without a problem). The result is fast and major filesystem corruption. After rebooting, the filesystems usually cannot even be fsck'ed. I'm already using the latest Tyan BIOS, so there's no fix there. Also, disabling DMA cuts the hard drive transfer rate by roughly 80-90%, so I can't see it as a serious workaround for anything other than a lightly used desktop.
ServerWorks is now working with the IDE developers on a better driver. Until now they considered IDE to be a "CDROM attachment protocol"; now people are working on an IDE driver that also works well for hard disks. To be continued.
What is the consensus on whether this is really a bug in the 2.2.19 kernel as well? One report above says corruption has been seen on the Tyan board with this kernel, while another says the Supermicro board is OK with it. We have 136 machines here: dual PIII 1 GHz, Supermicro 370DLE board, ServerWorks LE chipset. We are running the 2.2.19-6.2.1 kernel as compiled by RedHat. To date we have tried three different types of IDE system disks on these systems. With the Seagate 30 GB drives we saw massive filesystem corruption as described above: nothing in syslog, just files disappearing from directories for no good reason. The vendor replaced all of them with 20 GB Western Digital drives, which were OK for about a month. Now, however, a few of them show a large number of I/O errors that hang the machine, although it recovers fine after a reboot and fsck. The vendor says they have now seen similar problems on a Tyan board with a ServerWorks chipset and suggests trying IBM system drives. So, could people working on this bug please let us know which drives they are using, and whether a CD-ROM is connected to the same bus? On our machines it is. Also, are you using SMP or not?
Older ServerWorks chipsets have serious IDE bugs; only very recently did the IDE maintainers get enough information to MOSTLY work around the problems. In any case, do NOT use the second channel (cable) for anything that uses DMA.
Further information on this bug: the 2.4.7-2.9 Rawhide kernel as compiled by RedHat doesn't fix it either. We still see the same errors as under the 2.2.19 kernel, namely a large number of I/O errors on the system disk and a system hang; the machine comes back fine after a reboot and fsck. So far we have seen this on one out of 63 nodes in 5 days of running.