Bug 38429

Summary: Ext2 file corruption with RH71 2.4.2-2 kernel and ServerWorks chipset
Product: [Retired] Red Hat Linux
Reporter: Rob Todd <rtodd>
Component: kernel
Assignee: Arjan van de Ven <arjanv>
Status: CLOSED NOTABUG
QA Contact: Brock Organ <borgan>
Severity: high
Priority: medium
Version: 7.1
CC: dragon, gary.collins, pekkas, rtodd, vat.bilavarn
Hardware: i386
OS: Linux
Doc Type: Bug Fix
Last Closed: 2001-05-21 15:17:55 UTC
Attachments:
  dmesg typical of crashed node (flags: none)
  lspci -v typical of crashed node (flags: none)

Description Rob Todd 2001-04-30 18:06:41 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0b; Windows 98; Win 9x 4.90)


I've experienced serious file corruption using a PIII with a SuperMicro 
370DLE MB (ServerWorks LE chipset) and the stock 2.4.2-2 RH71 kernel.  See 
attached dmesg and lspci -v.

I noticed in the kernel source RPM that the OSB4 support is disabled 
because it is known to cause data corruption and I have tried turning that 
option on and off with the same results.  I've since compiled a 2.4.4 
kernel with the same config file used to compile the 2.4.2-2 RH71 kernel 
and the problem still exists (plus the added bonus that DMA no longer 
works reliably).  We have 32 of these exact machines and around 12-15 of 
them experienced at least some file corruption.  I am currently running 
RH71 with kernel 2.2.19 on them with no problem (DMA is working with no 
file corruption).

Rob Todd

Reproducible: Sometimes
Steps to Reproduce:
1. Installed RH71 (installation would always complete fine)
2. Rebooted after installation complete

Actual Results:  Corruption would sometimes occur during installation 
(resulting in a system that gave general FS weirdness like I/O errors when 
you try to access a file) or would sometimes occur during normal 
operation.  A reboot would result in fsck complaining and sometimes MANY 
errors on the filesystem.

Expected Results:  Umm... guess.

Comment 1 Rob Todd 2001-04-30 18:07:28 UTC
Created attachment 16866
dmesg typical of crashed node

Comment 2 Rob Todd 2001-04-30 18:08:29 UTC
Created attachment 16867
lspci -v typical of crashed node

Comment 3 Arjan van de Ven 2001-05-01 08:54:31 UTC
The IDE part of the ServerWorks chipset is not used very often, as most machines
with this chipset have SCSI disks. It seems that you are not the only one seeing
these problems; as a workaround you can disable DMA altogether with the
kernel command-line option "ide=nodma". This is a hard problem to solve, as
ServerWorks is not very forthcoming with documentation about its chipset.
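
For anyone applying this workaround, it can help to confirm that the option actually reached the running kernel, which exposes its boot command line in /proc/cmdline. The following is only a minimal Python sketch of that check; the function names and the sample command line in the usage note are illustrative assumptions, not something taken from this report.

```python
def has_kernel_option(cmdline: str, option: str) -> bool:
    """Return True if `option` appears as a whole token on the command line.

    Kernel command-line options are whitespace separated, so we compare
    whole tokens rather than doing a substring search.
    """
    return option in cmdline.split()


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read().strip()
```

On a live system one might call `has_kernel_option(read_cmdline(), "ide=nodma")`. With a made-up command line such as `"ro root=/dev/hda1 ide=nodma"` the check is true; without the option it is false.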

Comment 4 Arjan van de Ven 2001-05-21 10:14:02 UTC
Does ide=nodma work?

Comment 5 Rob Todd 2001-05-21 15:17:51 UTC
Yes, ide=nodma appears to work.  In addition, SuperMicro has released a new 
version of their BIOS (1.3B) for the 370DLE Motherboard which appears to fix at 
least some of the problems.  I've only been running the new BIOS with DMA 
enabled for a few days but I haven't encountered any corruption yet so I'm 
keeping my fingers crossed.

Rob

Comment 6 Arjan van de Ven 2001-05-21 15:23:28 UTC
That sounds like a BIOS bug then...
I'm closing this as "NOTABUG", as it's the closest to "NOTOURBUG".
If corruption comes back, please reopen this bug.

Comment 7 Pekka Savola 2001-05-21 15:51:54 UTC
I hope this doesn't mean a workaround won't be sought. People whose data just
got corrupted while the 2.2 kernel worked fine don't want to hear "sorry, BIOS bug".
:-/


Comment 8 Arjan van de Ven 2001-05-21 15:54:12 UTC
Well, some BIOS bugs ARE BIOS bugs; there is no way around it. If we find a way to
work around broken BIOSes we will; however, this requires information from either
the BIOS vendor or the chipset vendor...

Comment 9 Bob Farmer 2001-05-28 07:07:36 UTC
I have a related problem here on a Tyan Thunder S2510 board, which uses the
ServerWorks LE chipset.  I tested it with Red Hat 6.2 (2.2.14 and 2.2.19 Red Hat
kernels) and it happened there as well.  The problem here happens whenever
drives on both IDE interfaces are used at the same time (i.e., hda+hdc running at
the same time causes it, but hda+hdb could be run all day long and there
wouldn't be a problem).  The result is fast and major filesystem corruption.
After rebooting, the filesystems are usually not even fsck'able.

I'm using the latest Tyan BIOS already, so there's no fix there.  Also,
disabling DMA reduces the hard drive speed by somewhere in the 80-90% league (in
terms of transfer rate), so I can't see how that'd be seen as a serious
workaround for anything other than a lightly-used desktop.
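
The hda+hdc versus hda+hdb distinction above follows from the legacy Linux IDE naming scheme, where each channel (cable) carries a master and a slave: hda and hdb share the first channel, hdc and hdd the second. As a rough illustration only, a Python sketch of that mapping might look like this; the helper names are hypothetical and not part of any tool mentioned in this report.

```python
def ide_channel(device: str) -> int:
    """Map a legacy IDE disk name (hda, hdb, ...) to its channel number.

    Two devices (master and slave) share each channel, so hda/hdb map
    to channel 0, hdc/hdd to channel 1, and so on.
    """
    name = device.rsplit("/", 1)[-1]  # accept "/dev/hdc" as well as "hdc"
    if len(name) != 3 or not name.startswith("hd") or not ("a" <= name[2] <= "z"):
        raise ValueError(f"not a whole-disk IDE device name: {device!r}")
    return (ord(name[2]) - ord("a")) // 2


def spans_multiple_channels(devices) -> bool:
    """True if the given devices sit on more than one IDE channel."""
    return len({ide_channel(d) for d in devices}) > 1
```

Under this mapping, `spans_multiple_channels(["hda", "hdc"])` is true (the combination reported to corrupt data), while `spans_multiple_channels(["hda", "hdb"])` is false.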



Comment 10 Arjan van de Ven 2001-05-28 09:15:16 UTC
ServerWorks is now working with the IDE people to get a better driver; until now
they considered IDE to be a "CD-ROM attachment protocol". Now people are working
to get a better IDE driver even for hard disks... To be continued.

Comment 11 Steve Timm 2001-09-10 17:04:57 UTC
What is the consensus on whether this is really a bug in the 
2.2.19 kernel as well?  There is a report above that corruption has 
been seen on the Tyan board with this kernel and that the Supermicro
board is OK.

Here we have 136 machines, 2xPIII 1 GHz, Supermicro 370DLE board,
ServerWorks LE chipset.  We are running the 2.2.19-6.2.1 kernel as compiled
at Red Hat.  To date we have tried three different types of IDE system
disks on these systems.  With the Seagate 30 GB drive we saw massive
filesystem corruption as described above: nothing in syslog,
just files disappearing from directories for no good reason.
The vendor replaced all of them with 20 GB Western Digital drives, which were
OK for about a month, but now we are seeing a few of them that
show a large number of I/O errors, hanging the machine; the
machine recovers fine after reboot and fsck.  The vendor says they
have now seen similar problems on a Tyan board with the ServerWorks chipset,
and are suggesting trying IBM system drives.  So, can people working
on this bug please let us know what drive they are using, and also
whether a CD-ROM is connected to the same bus?  It is with ours.  Also, are they
using SMP or not?

Comment 12 Arjan van de Ven 2001-09-10 17:07:49 UTC
Older ServerWorks chipsets have serious IDE bugs; only very recently did the IDE
maintainers get enough information to MOSTLY work around the problems.
In any case, do NOT use the 2nd channel (cable) for anything that uses DMA.

Comment 13 Steve Timm 2001-09-25 17:45:35 UTC
Further information on this bug... the 2.4.7-2.9 rawhide kernel
as compiled by Red Hat doesn't fix this bug either... we still see
the same errors as we saw under the 2.2.19 kernel, namely a large
number of I/O errors on the system disk and system hangs; the machine is
fine again after reboot and fsck.

So far we have seen this on one out of 63 nodes in 5 days of running.