Bug 216638

Summary: frequent Adaptec 29160 Ultra160 SCSI Controller crash
Product: [Fedora] Fedora
Reporter: Fabrice Bellet <fabrice>
Component: kernel
Assignee: Tom Coughlan <coughlan>
Status: CLOSED NOTABUG
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 5
CC: davej, fabrice, wtogami
Hardware: i386   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2007-01-04 11:29:49 UTC
Attachments:
/var/log/messages (flags: none)

Description Fabrice Bellet 2006-11-21 09:49:28 UTC
Hi,

My Ultra160 SCSI controller frequently crashes under high load, during nightly
tape backups. I run 2.6.18-1.2200.fc5smp on a dual Xeon 2.40GHz box with two
Ultra160 SCSI controllers: one for an external disk array in hardware RAID
configuration, and one for an external LTO tape unit:

[root@disk aic7xxx]# more /proc/scsi/scsi 
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: transtec Model:                  Rev: 0001
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi1 Channel: 00 Id: 03 Lun: 00
  Vendor: CERTANCE Model: ULTRIUM 3        Rev: 1722
  Type:   Sequential-Access                ANSI SCSI revision: 04

==> lspci
03:01.0 SCSI storage controller: Adaptec AIC-7892A U160/m (rev 02)
        Subsystem: Adaptec 29160 Ultra160 SCSI Controller
        Flags: bus master, 66MHz, medium devsel, latency 72, IRQ 177
        BIST result: 00
        I/O ports at 3000 [disabled] [size=256]
        Memory at e2120000 (64-bit, non-prefetchable) [size=4K]
        [virtual] Expansion ROM at c4000000 [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2
[...]
04:01.0 SCSI storage controller: Adaptec AIC-7892A U160/m (rev 02)
        Subsystem: Adaptec 29160 Ultra160 SCSI Controller
        Flags: bus master, 66MHz, medium devsel, latency 72, IRQ 185
        BIST result: 00
        I/O ports at 4000 [disabled] [size=256]
        Memory at e2200000 (64-bit, non-prefetchable) [size=4K]
        [virtual] Expansion ROM at c4100000 [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2
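To confirm which host adapter carries the disk array and which carries the tape
drive, the /proc/scsi/scsi listing above can be parsed. A minimal sketch in
Python, assuming the layout shown in this report (one "Host:" line followed by
"Vendor:" and "Type:" lines per device); the function name parse_proc_scsi and
the SAMPLE text are illustrative, not part of any existing tool:

```python
import re

# Example input copied from the /proc/scsi/scsi listing in this report.
SAMPLE = """Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: transtec Model:                  Rev: 0001
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi1 Channel: 00 Id: 03 Lun: 00
  Vendor: CERTANCE Model: ULTRIUM 3        Rev: 1722
  Type:   Sequential-Access                ANSI SCSI revision: 04
"""

# "Host: scsiN Channel: CC Id: II Lun: LL" starts a new device entry.
HOST_RE = re.compile(r"Host: (\S+) Channel: (\d+) Id: (\d+) Lun: (\d+)")
# The vendor string is whatever precedes "Model:".
VENDOR_RE = re.compile(r"Vendor: (.*?)\s+Model:")
# The device type is whatever precedes "ANSI SCSI revision".
TYPE_RE = re.compile(r"Type:\s+(.*?)\s+ANSI")


def parse_proc_scsi(text):
    """Return one dict per device listed in /proc/scsi/scsi-style text."""
    devices = []
    for line in text.splitlines():
        host = HOST_RE.search(line)
        if host:
            devices.append({
                "host": host.group(1),
                "channel": int(host.group(2)),
                "id": int(host.group(3)),
                "lun": int(host.group(4)),
            })
            continue
        if not devices:
            continue  # skip the "Attached devices:" header
        vendor = VENDOR_RE.search(line)
        if vendor:
            devices[-1]["vendor"] = vendor.group(1)
        kind = TYPE_RE.search(line)
        if kind:
            devices[-1]["type"] = kind.group(1)
    return devices


for dev in parse_proc_scsi(SAMPLE):
    print(dev["host"], dev["id"], dev["vendor"], dev["type"])
```

On a live system the same function can be fed the contents of /proc/scsi/scsi
instead of SAMPLE; for this box it maps scsi0 to the transtec array and scsi1
to the CERTANCE tape drive.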

When it crashes, the card attached to the disk array dumps its state, and the
partition on the array is remounted read-only (see the attached log). A simple
reboot is usually enough to recover: the disk array appears to be healthy, the
disks have no bad sectors AFAIK, and the LED panel of the disk array shows no
RAID error.
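Because the visible symptom is the array's partition being remounted
read-only, a quick scan for read-only mounts can flag the failure without
waiting for the backup to error out. A minimal sketch, assuming the standard
/proc/mounts layout (device, mount point, filesystem type, comma-separated
options, dump, pass); the function name read_only_mounts and the devices,
mount points and filesystem types in SAMPLE_MOUNTS are hypothetical:

```python
# Hypothetical /proc/mounts excerpt: the array partition has been remounted ro.
SAMPLE_MOUNTS = """/dev/sda1 /data ext3 ro,data=ordered 0 0
/dev/hda2 / ext3 rw,errors=remount-ro,data=ordered 0 0
proc /proc proc rw 0 0
"""


def read_only_mounts(mounts_text):
    """Return (device, mount point) pairs whose options contain the 'ro' flag."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, _fstype, options = fields[:4]
        # Compare whole option tokens so "errors=remount-ro" is not a match.
        if "ro" in options.split(","):
            result.append((device, mountpoint))
    return result


print(read_only_mounts(SAMPLE_MOUNTS))
```

Matching the exact "ro" token, rather than searching for a substring, avoids
false positives on options such as errors=remount-ro. Mount points containing
spaces are escaped as \040 in /proc/mounts, so splitting on whitespace is safe.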

A few months ago, after a similar crash, I replaced the Adaptec card with
another one of the same brand and model, with no luck. So I'd say the SCSI card
itself is probably fine.

Could it be a driver error? A dying RAID controller? Bad SCSI connectivity? How
can I investigate the problem further?

Comment 1 Fabrice Bellet 2006-11-21 09:49:28 UTC
Created attachment 141747
/var/log/messages

Comment 2 Fabrice Bellet 2007-01-04 11:29:49 UTC
The problem was caused by a nearly dead disk in the RAID array. After a cold
reboot, the disk failed to spin up at all. Since it was replaced one month ago,
the error has not occurred again, so this bug can probably be closed.