Bug 216638 - frequent Adaptec 29160 Ultra160 SCSI Controller crash
Summary: frequent Adaptec 29160 Ultra160 SCSI Controller crash
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 5
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Tom Coughlan
QA Contact: Brian Brock
URL:
Whiteboard:
Keywords:
Depends On:
Blocks:
Reported: 2006-11-21 09:49 UTC by Fabrice Bellet
Modified: 2007-11-30 22:11 UTC (History)
Clone Of:
Last Closed: 2007-01-04 11:29:49 UTC


Attachments
/var/log/messages (58.23 KB, text/plain)
2006-11-21 09:49 UTC, Fabrice Bellet

Description Fabrice Bellet 2006-11-21 09:49:28 UTC
Hi,

I frequently observe a crash of my Ultra160 SCSI controller under high load,
during nightly tape backups. I run 2.6.18-1.2200.fc5smp on a dual Xeon 2.40GHz
box with two Ultra160 SCSI controllers: one for an external disk array in
hardware RAID configuration, and one for an external LTO tape unit:

[root@disk aic7xxx]# more /proc/scsi/scsi 
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: transtec Model:                  Rev: 0001
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi1 Channel: 00 Id: 03 Lun: 00
  Vendor: CERTANCE Model: ULTRIUM 3        Rev: 1722
  Type:   Sequential-Access                ANSI SCSI revision: 04

==> lspci
03:01.0 SCSI storage controller: Adaptec AIC-7892A U160/m (rev 02)
        Subsystem: Adaptec 29160 Ultra160 SCSI Controller
        Flags: bus master, 66MHz, medium devsel, latency 72, IRQ 177
        BIST result: 00
        I/O ports at 3000 [disabled] [size=256]
        Memory at e2120000 (64-bit, non-prefetchable) [size=4K]
        [virtual] Expansion ROM at c4000000 [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2
[...]
04:01.0 SCSI storage controller: Adaptec AIC-7892A U160/m (rev 02)
        Subsystem: Adaptec 29160 Ultra160 SCSI Controller
        Flags: bus master, 66MHz, medium devsel, latency 72, IRQ 185
        BIST result: 00
        I/O ports at 4000 [disabled] [size=256]
        Memory at e2200000 (64-bit, non-prefetchable) [size=4K]
        [virtual] Expansion ROM at c4100000 [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2

When it crashes, the card attached to the disk array dumps its state, and the
attached disk partition is remounted read-only (see the attached log). A simple
reboot is usually sufficient to recover. The disk array appears to be healthy:
the disks don't have bad sectors AFAIK, and the LED panel of the disk array
shows no RAID error.

A few months ago, when facing a similar crash, I replaced the Adaptec card with
another one of the same brand and model, with no luck. So I'd say that the
internal SCSI card is probably fine.

Could it be a driver error? A dying RAID controller? Bad SCSI connectivity? How
could I investigate the problem further?
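
One possible first step, as a minimal sketch: count the log lines that mention
the disk-array controller per hour, to see whether the errors line up with the
nightly backup window. The keyword list ("scsi0", "aic7xxx") and the helper
name count_by_hour are assumptions made for illustration, and the sketch
assumes the traditional syslog timestamp prefix ("Nov 21 09:49:28 host ...").

```python
import re
from collections import Counter

# Assumed keywords; adjust them to whatever the driver actually logs.
KEYWORDS = ("scsi0", "aic7xxx")

# Traditional syslog prefix: "Nov 21 09:49:28 host message..."
PREFIX = re.compile(r"^(\w{3})\s+(\d{1,2})\s+(\d{2}):\d{2}:\d{2}\s")


def count_by_hour(lines, keywords=KEYWORDS):
    """Count lines mentioning any keyword, grouped by (month, day, hour).

    Lines without a recognizable syslog timestamp are skipped.
    Buckets appear in the order they are first seen in the input.
    """
    counts = Counter()
    for line in lines:
        match = PREFIX.match(line)
        if match is None:
            continue
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            month, day, hour = match.groups()
            counts[(month, int(day), int(hour))] += 1
    return counts
```

It could be fed the log directly, for example
count_by_hour(open("/var/log/messages", errors="replace")). A cluster of
matches during the backup hours would only show correlation with load. It would
not by itself distinguish a driver problem from failing hardware.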

Comment 1 Fabrice Bellet 2006-11-21 09:49:28 UTC
Created attachment 141747
/var/log/messages

Comment 2 Fabrice Bellet 2007-01-04 11:29:49 UTC
The problem was caused by an almost-dying disk in the RAID array. After a cold
reboot, the disk failed to spin up again for good. Since it was replaced one
month ago, the error has never occurred again. So we can probably close
this bug.

