Bug 216638 - frequent Adaptec 29160 Ultra160 SCSI Controller crash
Status: CLOSED NOTABUG
Product: Fedora
Classification: Fedora
Component: kernel
Version: 5
Hardware: i386 Linux
Priority: medium
Severity: medium
Assigned To: Tom Coughlan
QA Contact: Brian Brock
Reported: 2006-11-21 04:49 EST by Fabrice Bellet
Modified: 2007-11-30 17:11 EST (History)
Last Closed: 2007-01-04 06:29:49 EST

Attachments
/var/log/messages (58.23 KB, text/plain)
2006-11-21 04:49 EST, Fabrice Bellet

Description Fabrice Bellet 2006-11-21 04:49:28 EST
Hi,

I frequently observe a crash of my Ultra160 SCSI controller under high load,
during nightly tape backups. I run 2.6.18-1.2200.fc5smp on a dual Xeon 2.40GHz
box with two Ultra160 SCSI controllers: one for an external disk array in
hardware RAID configuration, and one for an external LTO tape unit:

[root@disk aic7xxx]# more /proc/scsi/scsi 
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: transtec Model:                  Rev: 0001
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi1 Channel: 00 Id: 03 Lun: 00
  Vendor: CERTANCE Model: ULTRIUM 3        Rev: 1722
  Type:   Sequential-Access                ANSI SCSI revision: 04

==> lspci
03:01.0 SCSI storage controller: Adaptec AIC-7892A U160/m (rev 02)
        Subsystem: Adaptec 29160 Ultra160 SCSI Controller
        Flags: bus master, 66MHz, medium devsel, latency 72, IRQ 177
        BIST result: 00
        I/O ports at 3000 [disabled] [size=256]
        Memory at e2120000 (64-bit, non-prefetchable) [size=4K]
        [virtual] Expansion ROM at c4000000 [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2
[...]
04:01.0 SCSI storage controller: Adaptec AIC-7892A U160/m (rev 02)
        Subsystem: Adaptec 29160 Ultra160 SCSI Controller
        Flags: bus master, 66MHz, medium devsel, latency 72, IRQ 185
        BIST result: 00
        I/O ports at 4000 [disabled] [size=256]
        Memory at e2200000 (64-bit, non-prefetchable) [size=4K]
        [virtual] Expansion ROM at c4100000 [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2

When it crashes, the card attached to the disk array dumps its state, and the
attached disk partition is remounted read-only (see the attached log). A simple
reboot is usually sufficient to recover. The disk array appears to be healthy:
the disks have no bad sectors AFAIK, and the LED panel of the disk array shows
no RAID error.

A few months ago, when facing a similar crash, I replaced the Adaptec card with
another one of the same brand and model, with no luck. So I'd say that the
internal SCSI card is probably fine.

Could it be a driver error? A dying RAID controller? Bad SCSI connectivity? How
could I investigate the problem further?
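
One possible starting point for investigating, as a minimal sketch only: parse
the /proc/scsi/scsi listing shown above into a host-to-device map, so that a
scsi0 or scsi1 reference in /var/log/messages can be tied to the device behind
it (here, the disk array or the tape unit). The function name
parse_scsi_listing and the dictionary keys are hypothetical, and the regular
expressions assume the exact layout pasted in this report; other kernels may
format the file differently.

```python
import re

# Sample text copied from the /proc/scsi/scsi output in the description.
SAMPLE = """\
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: transtec Model:                  Rev: 0001
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi1 Channel: 00 Id: 03 Lun: 00
  Vendor: CERTANCE Model: ULTRIUM 3        Rev: 1722
  Type:   Sequential-Access                ANSI SCSI revision: 04
"""

# Patterns assume the three-line-per-device layout seen above.
HOST_RE = re.compile(
    r"^Host:\s+(\S+)\s+Channel:\s+(\d+)\s+Id:\s+(\d+)\s+Lun:\s+(\d+)"
)
VENDOR_RE = re.compile(r"^\s*Vendor:\s*(.*?)\s*Model:\s*(.*?)\s*Rev:\s*(\S*)")
TYPE_RE = re.compile(r"^\s*Type:\s*(\S+)")


def parse_scsi_listing(text):
    """Return one dict per attached device (hypothetical helper)."""
    devices = []
    current = None
    for line in text.splitlines():
        m = HOST_RE.match(line)
        if m:
            # A "Host:" line starts a new device record.
            current = {
                "host": m.group(1),
                "channel": int(m.group(2)),
                "id": int(m.group(3)),
                "lun": int(m.group(4)),
            }
            devices.append(current)
            continue
        if current is None:
            continue  # skip the "Attached devices:" header
        m = VENDOR_RE.match(line)
        if m:
            current["vendor"] = m.group(1)
            current["model"] = m.group(2)  # may be empty, as for scsi0
            current["rev"] = m.group(3)
            continue
        m = TYPE_RE.match(line)
        if m:
            current["type"] = m.group(1)
    return devices


if __name__ == "__main__":
    for dev in parse_scsi_listing(SAMPLE):
        print(dev["host"], dev.get("vendor"), dev.get("type"))
```

On a live system the same function could be fed the contents of
/proc/scsi/scsi instead of the SAMPLE string.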
Comment 1 Fabrice Bellet 2006-11-21 04:49:28 EST
Created attachment 141747 [details]
/var/log/messages
Comment 2 Fabrice Bellet 2007-01-04 06:29:49 EST
The problem was caused by a nearly dead disk in the RAID array. After a cold
reboot, the disk failed for good and would not spin up again. Since it was
replaced one month ago, the error has not occurred again. So we can probably
close this bug.
