Bug 170699

Summary: sata_sil: SiI 3112 data corruption / hang
Product: Red Hat Enterprise Linux 4
Component: kernel
Version: 4.0
Hardware: All
OS: Linux
Status: CLOSED NOTABUG
Severity: medium
Priority: medium
Reporter: Joe Orton <jorton>
Assignee: Kernel Maintainer List <kernel-maint>
QA Contact: Brian Brock <bbrock>
CC: jgarzik
Doc Type: Bug Fix
Last Closed: 2005-10-26 15:32:00 UTC
Attachments (description, flags):
/var/log/dmesg, none
"lspci -v" output, none
current dmesg output, none

Description Joe Orton 2005-10-13 20:35:07 UTC
Description of problem:
Using a SiI 3112A card, I'm getting data corruption and eventually the devices
hang.  Reproduced with a pair of Hitachi 250 GB disks (7K250).

Version-Release number of selected component (if applicable):
kernel-2.6.9-22.EL

How reproducible:
always

Steps to Reproduce:
1. mdadm -Cv /dev/md0 -l0 -n2 -c128 /dev/sd{a,b}1
2. mke2fs -m0 -R stride=64 /dev/md0
3. mount /dev/md0 /foo
4. copy lots of data onto /foo
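
Step 4 can be made repeatable by checksumming the source tree before the copy and checking the copies afterwards. The following is only a sketch under stated assumptions: verify_copy is a hypothetical helper (not something used in this report), the source path in the usage comment is a placeholder, and sha1sum, find, and cp -a from GNU coreutils/findutils are assumed to be available.

```shell
#!/bin/sh
# Sketch: copy a directory tree and verify the copy by checksum.
# verify_copy is a hypothetical helper, not part of any package.
# Usage (placeholder source path): verify_copy /path/to/source /foo
verify_copy() (
    src=$1
    dest=$2
    sums=$(mktemp) || exit 2
    # Record checksums with paths relative to the source root, copy the
    # tree, flush dirty pages, then re-read the copies and compare.
    (cd "$src" && find . -type f -exec sha1sum {} +) > "$sums" &&
        cp -a "$src/." "$dest/" &&
        sync &&
        (cd "$dest" && sha1sum --quiet -c "$sums")
    rc=$?
    rm -f "$sums"
    exit "$rc"
)
```

A non-zero exit status means the copy failed or at least one file did not match. Note that the verification pass may be served from the page cache rather than the disks; unmounting and remounting /foo before checking is one way to force the data to be read back from the devices.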
  
Actual results:
The drives initially produced lots of errors like:

ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x04 { DriveStatusError }
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x04 { DriveStatusError }
ata2: status=0x51 { DriveReady SeekComplete Error }
ata2: error=0x04 { DriveStatusError }
ata1: status=0x51 { DriveReady SeekComplete Error }

and the data written is corrupted.  Eventually, the devices hang:

ata1: command 0x35 timeout, stat 0xd8 host_stat 0x1
ata1: status=0xd8 { Busy }
SCSI error : <0 0 0 0> return code = 0x8000002
Current sda: sense key Aborted Command
Additional sense: Scsi parity error
end_request: I/O error, dev sda, sector 289145151
ATA: abnormal status 0xD8 on port 0xD082A087
ATA: abnormal status 0xD8 on port 0xD082A087
ATA: abnormal status 0xD8 on port 0xD082A087
ata1: command 0x35 timeout, stat 0xd8 host_stat 0x1
ata1: status=0xd8 { Busy }
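
For reference, the status values in these logs are bit fields of the ATA status register, and they can be decoded by hand. The sketch below is illustrative only: decode_status is a hypothetical helper, the names Busy, DriveReady, SeekComplete, and Error follow the labels in the log lines above, and DeviceFault and DataRequest are names assumed here for bits 5 and 3. It prints every set bit, whereas the kernel line for 0xd8 shows only Busy, which fits the ATA rule that the other status bits are not valid while BSY is set.

```shell
#!/bin/sh
# Sketch: decode an ATA status register value into bit names.
# decode_status is a hypothetical helper; names for bits 5 and 3 are
# assumptions, the others follow the labels in the log above.
decode_status() (
    val=$(( $1 ))
    out=""
    for pair in 128:Busy 64:DriveReady 32:DeviceFault \
                16:SeekComplete 8:DataRequest 1:Error; do
        mask=${pair%%:*}
        name=${pair#*:}
        # Append the name when the corresponding bit is set.
        if [ $(( val & mask )) -ne 0 ]; then
            out="$out $name"
        fi
    done
    echo "${out# }"
)

decode_status 0x51   # prints: DriveReady SeekComplete Error
decode_status 0xd8   # prints: Busy DriveReady SeekComplete DataRequest
```

In the same way, error=0x04 has only bit 2 set, which the ATA specification names ABRT (command aborted), and command 0x35 is WRITE DMA EXT, so the timeouts above occurred on writes.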

Comment 1 Joe Orton 2005-10-13 20:35:07 UTC
Created attachment 119945
/var/log/dmesg

Comment 2 Joe Orton 2005-10-13 20:36:21 UTC
Created attachment 119946
"lspci -v" output

Comment 3 Joe Orton 2005-10-13 20:38:18 UTC
Created attachment 119947
current dmesg output

The description should have read... "eventually the devices hang".

Comment 4 Dan Carpenter 2005-10-13 22:16:02 UTC
What firmware are you using?  I had some problems with software RAID on a sata_sil
card.  It worked fine under a lot of stress, but I couldn't install software RAID
on it until I upgraded to the 5.0.48 firmware.



Comment 5 Joe Orton 2005-10-14 08:00:32 UTC
No idea, how do I tell?

Comment 6 Dan Carpenter 2005-10-14 21:03:59 UTC
It should say in the BIOS POST messages while you boot.


Comment 7 Joe Orton 2005-10-17 10:18:42 UTC
Ah, it says version 4.1.34 on boot; I can find 4.2.50 available from:

http://www.siliconimage.com/support/supportsearchresults.aspx?pid=63&cid=15&ctid=2&

but no 5.0.48.  Am I missing anything?

Comment 8 Joe Orton 2005-10-17 16:04:22 UTC
I tried flashing one of the cards to the 4.2.50 firmware and now the machine
refuses to boot with any drives attached to that card.

Comment 9 Dan Carpenter 2005-10-18 23:58:14 UTC
Gah...  That really bites.  :/

I'm going to hide in the corner.  Try pinging support to get
the old firmware back.



Comment 10 Joe Orton 2005-10-26 15:32:00 UTC
Well, moving the card to a new machine seems to have fixed it, so I think this
was some issue with the creaky old motherboard in the original test box.