Red Hat Bugzilla – Bug 38429
Ext2 file corruption with RH71 2.4.2-2 kernel and ServerWorks chipset
Last modified: 2007-04-18 12:32:55 EDT
I've experienced serious file corruption using a PIII with a SuperMicro
370DLE MB (ServerWorks LE chipset) and the stock 2.4.2-2 RH71 kernel. See
attached dmesg and lspci -v.
I noticed in the kernel source RPM that the OSB4 support is disabled
because it is known to cause data corruption and I have tried turning that
option on and off with the same results. I've since compiled a 2.4.4
kernel with the same config file used to compile the 2.4.2-2 RH71 kernel
and the problem still exists (plus the added bonus that DMA no longer
works reliably). We have 32 of these exact machines and around 12-15 of
them experienced at least some file corruption. I am currently running
RH71 with kernel 2.2.19 on them with no problem (DMA is working with no
Steps to Reproduce:
1. Installed RH71 (installation would always complete fine)
2. Rebooted after installation complete
Actual Results: Corruption would sometimes occur during installation
(resulting in a system that gave general FS weirdness like I/O errors when
you try to access a file) or would sometimes occur during normal
operation. A reboot would result in fsck complaining and sometimes reporting
MANY errors on the filesystem.
Expected Results: Umm... guess.
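For anyone trying to confirm this kind of silent corruption on a suspect disk, a write-and-verify pass is one way to check. The following is only a minimal sketch, not something used in this report: the function name `write_and_verify` and its parameters are made up for illustration. It writes random data to a file on the filesystem under test, reads it back, and compares SHA-256 digests.

```python
import hashlib
import os
import tempfile


def write_and_verify(directory, size_bytes=1 << 20, chunk=64 * 1024):
    """Write random data into `directory`, read it back, compare digests.

    Hypothetical helper for illustration. Returns True when the data read
    back matches what was written, False on a mismatch.
    """
    written = hashlib.sha256()
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            remaining = size_bytes
            while remaining > 0:
                block = os.urandom(min(chunk, remaining))
                written.update(block)
                f.write(block)
                remaining -= len(block)
            # Push the data out of userspace buffers and ask the kernel
            # to commit it to the device.
            f.flush()
            os.fsync(f.fileno())

        read_back = hashlib.sha256()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(chunk), b""):
                read_back.update(block)
        return written.hexdigest() == read_back.hexdigest()
    finally:
        os.remove(path)


if __name__ == "__main__":
    # "/tmp" is an assumed location; point this at the filesystem under test.
    print("match" if write_and_verify("/tmp") else "MISMATCH")
```

Note that a read-back shortly after the write may be served from the page cache rather than the disk, so a realistic test would use files larger than RAM or remount the filesystem between the write and the read.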
Created attachment 16866 [details]
dmesg typical of crashed node
Created attachment 16867 [details]
lspci -v typical of crashed node
The IDE part of the ServerWorks chipset is not used very often, as most machines
with this chipset have SCSI disks. It seems that you are not the only one seeing
these problems; as a workaround you can disable DMA altogether with the
kernel command-line option "ide=nodma". This is a hard problem to solve as
ServerWorks is not very forthcoming with documentation about their chipset.
Does ide=nodma work?
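When rolling the workaround out to many nodes, it can help to confirm that each machine really booted with the option. A small sketch, assuming the running kernel exposes its boot parameters in /proc/cmdline; the helper names `has_kernel_option` and `read_cmdline` are hypothetical:

```python
def has_kernel_option(cmdline, option):
    """Return True if `option` appears as a whole word in the command line."""
    return option in cmdline.split()


def read_cmdline(path="/proc/cmdline"):
    """Read the boot command line of the running kernel (path is an assumption)."""
    with open(path) as f:
        return f.read().strip()


if __name__ == "__main__":
    if has_kernel_option(read_cmdline(), "ide=nodma"):
        print("booted with ide=nodma")
    else:
        print("ide=nodma not present on the kernel command line")
```

This only shows that the option was passed at boot. It does not show whether a given drive is actually running without DMA.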
Yes, ide=nodma appears to work. In addition, SuperMicro has released a new
version of their BIOS (1.3B) for the 370DLE Motherboard which appears to fix at
least some of the problems. I've only been running the new BIOS with DMA
enabled for a few days but I haven't encountered any corruption yet so I'm
keeping my fingers crossed.
That sounds like a BIOS bug then...
I'm closing as "NOTABUG" as it's closest to "NOTOURBUG".
If corruption comes back, please reopen this bug.
I hope this doesn't mean a workaround won't be sought. People whose data just
got corrupted while the 2.2 kernel worked fine don't want to hear "sorry, BIOS bug".
Well, some BIOS bugs ARE BIOS bugs; there is no way around it. If we find a way to
work around broken BIOSes we will; however, this requires information from either
the BIOS vendor or the chipset vendor.
I have a related problem here on a Tyan Thunder S2510 board, which uses a
ServerWorks LE chipset. I tested it with Red Hat 6.2 (2.2.14 and 2.2.19 Red Hat
kernels) and it happened there as well. The problem here happens whenever
drives on both IDE interfaces are used at the same time (i.e., hda+hdc running at
the same time causes it, but hda+hdb could be run all day long and there
wouldn't be a problem). The result is fast and major filesystem corruption.
After rebooting, the filesystems are usually not even fsck'able.
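The two-channel trigger described above could be exercised with something like the following. This is a rough sketch under stated assumptions, not the procedure used here: `checked_write` and `stress_in_parallel` are hypothetical names, and the directories passed in are assumed to be mount points of filesystems on different IDE channels (for example one on hda and one on hdc).

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def checked_write(directory, size_bytes):
    """Write random bytes to a temp file in `directory` and compare on read-back."""
    data = os.urandom(size_bytes)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        with open(path, "rb") as f:
            return f.read() == data
    finally:
        os.remove(path)


def stress_in_parallel(directories, rounds=4, size_bytes=1 << 20):
    """Hit every directory at the same time; return {directory: mismatch count}."""
    if not directories:
        return {}

    def worker(directory):
        # Count the rounds in which the read-back differed from what was written.
        return sum(0 if checked_write(directory, size_bytes) else 1
                   for _ in range(rounds))

    # One thread per directory so the writes overlap in time.
    with ThreadPoolExecutor(max_workers=len(directories)) as pool:
        counts = list(pool.map(worker, directories))
    return dict(zip(directories, counts))
```

Running it once with both directories on the same channel and once with them on different channels would give a comparison along the lines of the hda+hdb versus hda+hdc observation. Do not run this on disks holding data you care about, since the failure being tested for damages the filesystem.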
I'm using the latest Tyan BIOS already, so there's no fix there. Also,
disabling DMA reduces the hard drive transfer rate by roughly 80-90%, so I
can't see how that could be considered a serious workaround for anything
other than a lightly-used desktop.
ServerWorks is now working with the IDE people to get a better driver; until now
they considered IDE to be a "CDROM attachment protocol", now people are working
to get a better IDE driver even for hard disks... To be continued.
What is the consensus on whether this is really a bug in the
2.2.19 kernel as well? There is a report above that corruption has
been seen on the Tyan board with this kernel and that the Supermicro
board is OK.
Here we have 136 machines, 2xPIII 1 GHz, Supermicro 370DLE board,
ServerWorks LE chipset. We are running the 2.2.19-6.2.1 kernel as compiled
by Red Hat. To date we have tried three different types of IDE system
disks on these systems. With the Seagate 30 GB drives we saw massive
filesystem corruption as described above: nothing in syslog,
just files disappearing from directories for no good reason.
The vendor replaced all of them with 20 GB Western Digital drives, which were
OK for about a month, but now a few of them show a large number of
I/O errors that hang the machine; the machine recovers fine after a
reboot and fsck. The vendor says they have now seen similar problems
on a Tyan board with a ServerWorks chipset, and is suggesting we try
IBM system drives. So, can people working on this bug please let us
know what drives they are using, and also whether a CD-ROM is connected
to the same bus? It is with ours. Also, are they using SMP or not?
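To make answers to these questions easier to collect across many nodes, the details could be gathered with a short script. This is only a sketch and rests on assumptions: it expects the legacy IDE driver's /proc/ide/hdX/model and /proc/ide/hdX/media files and the x86 /proc/cpuinfo layout with one "processor" line per CPU. All function names are hypothetical.

```python
import os


def describe_ide_devices(proc_ide="/proc/ide"):
    """Return {device: {"model": ..., "media": ...}} for each hdX entry."""
    info = {}
    if not os.path.isdir(proc_ide):
        return info
    for name in sorted(os.listdir(proc_ide)):
        if not name.startswith("hd"):
            continue
        entry = {}
        for field in ("model", "media"):
            try:
                with open(os.path.join(proc_ide, name, field)) as f:
                    entry[field] = f.read().strip()
            except OSError:
                entry[field] = None  # field not exposed for this device
        info[name] = entry
    return info


def shares_channel(dev_a, dev_b):
    """hda/hdb sit on the first channel, hdc/hdd on the second, and so on."""
    def channel(dev):
        return (ord(dev[2]) - ord("a")) // 2
    return channel(dev_a) == channel(dev_b)


def count_processors(cpuinfo_text):
    """Count 'processor' entries; more than one suggests an SMP kernel is in use."""
    return sum(1 for line in cpuinfo_text.splitlines()
               if line.startswith("processor"))
```

For example, `shares_channel("hda", "hdb")` is true while `shares_channel("hda", "hdc")` is false, which matches the cable layout asked about above.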
Older ServerWorks chipsets have serious IDE bugs; only very recently did the IDE
maintainers get enough information to MOSTLY work around the problems.
In any case, do NOT use the 2nd channel (cable) for anything that uses DMA.
Further information on this bug: the 2.4.7-2.9 rawhide kernel
as compiled by Red Hat doesn't fix it either. We still see
the same errors as we saw under the 2.2.19 kernel, namely a large
number of I/O errors on the system disk and system hangs, with the
machine back to normal after a reboot and fsck.
So far we have seen this on one out of 63 nodes in 5 days of running.