Bug 216638

Summary:

frequent Adaptec 29160 Ultra160 SCSI Controller crash

Product:

[Fedora] Fedora

Reporter:

Fabrice Bellet <fabrice>

Component:

kernel

Assignee:

Tom Coughlan <coughlan>

Status:

CLOSED NOTABUG

QA Contact:

Brian Brock <bbrock>

Severity:

medium

Priority:

medium

CC:

davej, fabrice, wtogami

Hardware:

i386

OS:

Linux

Doc Type:

Bug Fix

Last Closed:

2007-01-04 11:29:49 UTC

Attachments:

Description	Flags
/var/log/messages	none

Description Fabrice Bellet 2006-11-21 09:49:28 UTC

Hi,

I frequently observe a crash of my Ultra160 SCSI controller under high load,
during nightly tape backups. I run 2.6.18-1.2200.fc5smp on a dual Xeon 2.40GHz
box with two Ultra160 SCSI controllers: one for an external disk array in
hardware RAID configuration, and one for an external LTO tape unit:

[root@disk aic7xxx]# more /proc/scsi/scsi 
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: transtec Model:                  Rev: 0001
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi1 Channel: 00 Id: 03 Lun: 00
  Vendor: CERTANCE Model: ULTRIUM 3        Rev: 1722
  Type:   Sequential-Access                ANSI SCSI revision: 04

==> lspci
03:01.0 SCSI storage controller: Adaptec AIC-7892A U160/m (rev 02)
        Subsystem: Adaptec 29160 Ultra160 SCSI Controller
        Flags: bus master, 66MHz, medium devsel, latency 72, IRQ 177
        BIST result: 00
        I/O ports at 3000 [disabled] [size=256]
        Memory at e2120000 (64-bit, non-prefetchable) [size=4K]
        [virtual] Expansion ROM at c4000000 [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2
[...]
04:01.0 SCSI storage controller: Adaptec AIC-7892A U160/m (rev 02)
        Subsystem: Adaptec 29160 Ultra160 SCSI Controller
        Flags: bus master, 66MHz, medium devsel, latency 72, IRQ 185
        BIST result: 00
        I/O ports at 4000 [disabled] [size=256]
        Memory at e2200000 (64-bit, non-prefetchable) [size=4K]
        [virtual] Expansion ROM at c4100000 [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2

When it crashes, the card attached to the disk array dumps its state, and the
attached disk partition is remounted read-only (see attached log). A simple
reboot is usually sufficient to recover from the situation. The disk array
appears to be healthy: the disks don't have bad sectors AFAIK, and the LED panel
of the disk array shows no RAID error.

A few months ago, when facing a similar crash, I replaced the Adaptec card with
another one of the same brand and model, with no luck. So I'd say that the
internal SCSI card is probably not at fault.

Could it be a driver error? A dying RAID controller? Bad SCSI connectivity? How
could I investigate the problem further?
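
As a possible first step for that investigation, the /proc/scsi/scsi listing
above can be parsed to map each device to its host adapter, so that log entries
naming scsi0 or scsi1 can be tied to the disk array or the tape unit. This is
only a minimal sketch: it assumes the listing format shown above, and the name
parse_proc_scsi and the dictionary keys are illustrative, not part of any
kernel tool.

```python
import re

# Sample taken from the /proc/scsi/scsi output quoted in this report.
SAMPLE = """\
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: transtec Model:                  Rev: 0001
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi1 Channel: 00 Id: 03 Lun: 00
  Vendor: CERTANCE Model: ULTRIUM 3        Rev: 1722
  Type:   Sequential-Access                ANSI SCSI revision: 04
"""

HOST_RE = re.compile(
    r"Host:\s*(\S+)\s+Channel:\s*(\d+)\s+Id:\s*(\d+)\s+Lun:\s*(\d+)"
)
VENDOR_RE = re.compile(r"Vendor:\s*(.*?)\s*Model:\s*(.*?)\s*Rev:\s*(\S*)")
TYPE_RE = re.compile(r"Type:\s*(.*?)\s+ANSI")


def parse_proc_scsi(text):
    """Return one dict per device found in a /proc/scsi/scsi style listing.

    Hypothetical helper: assumes each device starts with a 'Host:' line
    followed by 'Vendor:' and 'Type:' lines, as in the sample above.
    """
    devices = []
    for line in text.splitlines():
        m = HOST_RE.search(line)
        if m:
            devices.append({
                "host": m.group(1),
                "channel": int(m.group(2)),
                "id": int(m.group(3)),
                "lun": int(m.group(4)),
            })
            continue
        if not devices:
            continue  # skip the 'Attached devices:' header
        m = VENDOR_RE.search(line)
        if m:
            devices[-1].update(
                vendor=m.group(1), model=m.group(2), rev=m.group(3)
            )
            continue
        m = TYPE_RE.search(line)
        if m:
            devices[-1]["type"] = m.group(1)
    return devices


if __name__ == "__main__":
    for dev in parse_proc_scsi(SAMPLE):
        print(dev["host"], dev["id"], dev.get("vendor"), dev.get("type"))
```

On a live system the same function could be fed the contents of
/proc/scsi/scsi instead of SAMPLE, assuming the file is present and uses this
layout.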

Comment 1 Fabrice Bellet 2006-11-21 09:49:28 UTC

Created attachment 141747
/var/log/messages

Comment 2 Fabrice Bellet 2007-01-04 11:29:49 UTC

The problem was caused by a nearly dead disk in the RAID array. After a cold
reboot, the disk permanently failed to spin up. Since it was replaced one month
ago, the error has not occurred again. So we can probably close this bug.