Bug 465771 - sata_promise locks up machine after timeout
Summary: sata_promise locks up machine after timeout
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 9
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-10-06 11:23 UTC by Kieran Clancy
Modified: 2008-11-19 14:55 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-11-14 12:13:17 UTC
Type: ---
Embargoed:


Description Kieran Clancy 2008-10-06 11:23:35 UTC
Description of problem:

I have a Promise PCI SATA card with two SATA disks attached (see lspci output below). During heavy I/O, and sometimes at seemingly random times, the machine hangs. I have had this problem since installing the PCI card and disks, on both FC7 and FC9 kernels (fresh install). The hardware is brand new, and from googling it seems other people have had the same issue, so I suspect it's not a hardware problem. When the machine hangs it often stops responding even to sysrq, requiring a manual reset. I have just hooked up a serial cable to capture any errors from the machine.

Here is the error message from the last time it happened:
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x180000 action 0x6 frozen
ata2: SError: { 10B8B Dispar }
ata2.00: cmd 35/00:00:d9:05:6e/00:04:01:00:00/e0 tag 0 dma 524288 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata2.00: status: { DRDY }
ata2: hard resetting link
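
For readers decoding the SError value in the log above, here is a minimal sketch. It assumes the bit positions of the SATA SError register and the short names that libata prints, both written from memory rather than taken from this report. The function name `decode_serror` is hypothetical.

```python
# Bit positions follow the SATA SError register layout; the names are the
# abbreviations libata prints in its "SError: { ... }" line (assumed, not
# taken from this bug report).
SERROR_BITS = {
    0: "RecovData",
    1: "RecovComm",
    8: "UnrecovData",
    9: "Persist",
    10: "Proto",
    11: "HostInt",
    16: "PHYRdyChg",
    17: "PHYInt",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
    23: "LinkSeq",
    24: "TrStaTrns",
    25: "UnrecFIS",
    26: "DevExch",
}


def decode_serror(value):
    """Return the names of the bits set in an SError value, lowest bit first."""
    return [name for bit, name in sorted(SERROR_BITS.items()) if value & (1 << bit)]


print(decode_serror(0x180000))  # ['10B8B', 'Dispar']
```

Bits 19 and 20 are link-level decode and disparity errors, which is consistent with the `{ 10B8B Dispar }` line in the log.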

Have also seen it with:
ata2.00: cmd 35/00:00:d9:cf:c5/00:04:07:00:00/e0 tag 0 dma 524288 out
ata2.00: cmd 35/00:80:59:1f:ad/00:03:00:00:00/e0 tag 0 dma 458752 out
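
The `cmd` fields in these lines can be unpacked to see what the controller was asked to do. The sketch below assumes the libata field order `command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device` and 512-byte sectors. The helper name `decode_taskfile` and the small opcode table are hypothetical.

```python
# Small subset of ATA opcodes, enough for the lines in this report.
COMMANDS = {0x25: "READ DMA EXT", 0x35: "WRITE DMA EXT"}

SECTOR_SIZE = 512  # bytes; matches "512-byte hardware sectors" in dmesg


def decode_taskfile(cmd):
    """Decode the 'cmd' string from a libata error line (LBA48 commands)."""
    command, low, high, device = cmd.split("/")
    feature, nsect, lbal, lbam, lbah = (int(b, 16) for b in low.split(":"))
    hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(b, 16) for b in high.split(":")
    )
    opcode = int(command, 16)
    # The sector count is 16 bits wide; a count of 0 means 65536 sectors.
    sectors = ((hob_nsect << 8) | nsect) or 65536
    # The 48-bit LBA is split across the current and "hob" (previous) bytes.
    lba = (
        (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )
    return {
        "command": COMMANDS.get(opcode, hex(opcode)),
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * SECTOR_SIZE,
        "device": int(device, 16),
    }


for line in (
    "35/00:00:d9:05:6e/00:04:01:00:00/e0",
    "35/00:00:d9:cf:c5/00:04:07:00:00/e0",
    "35/00:80:59:1f:ad/00:03:00:00:00/e0",
):
    print(decode_taskfile(line))
```

Under these assumptions all three commands are WRITE DMA EXT, and the decoded byte counts (524288, 524288, 458752) agree with the `dma ... out` sizes printed by the kernel.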

The above error seems like it should be recoverable (a write timeout; retry, or at least fail the command and let the other disk in my RAID take over), but the machine never comes back from the hard reset attempt.

I have just added the nmi_watchdog and acpi=off kernel boot options to see whether they make a difference.

Version-Release number of selected component (if applicable):
kernel-2.6.26.5-45.fc9.i686

How reproducible:
Hard to reproduce -- happens only when you don't want it to. At the moment I have a nearly 500GB md raid1 partition with an unsynced disk, which usually only gets a few percent through recovering before my system locks up (so it starts again on the next reboot).

It does seem related to multiple concurrent disk access - when I try now to run two `hdparm -t` commands at once the problem can be triggered quite quickly.
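
A minimal sketch of that kind of concurrent load, for anyone trying to reproduce this. The helper name `run_concurrently` is hypothetical, and the device names in the usage note are an assumption based on the `sda`/`sdb` entries in the dmesg output below; the original test may have targeted other devices.

```python
import subprocess


def run_concurrently(commands):
    """Start every command at once, then wait for all of them.

    Returns the exit codes in the same order as the commands.
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [proc.wait() for proc in procs]
```

Run as root, something like `run_concurrently([["hdparm", "-t", "/dev/sda"], ["hdparm", "-t", "/dev/sdb"]])` would start both read benchmarks at the same time. On an affected machine this may hang the system, so it is only suitable for a box that can be reset.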

Additional info:
# lspci -v
...
00:0a.0 RAID bus controller: Promise Technology, Inc. PDC20376 (FastTrak 376) (rev 02)
        Subsystem: Promise Technology, Inc. PDC20376 (FastTrak 376)
        Flags: bus master, 66MHz, medium devsel, latency 96, IRQ 11
        I/O ports at c400 [size=64]
        I/O ports at c800 [size=16]
        I/O ports at cc00 [size=128]
        Memory at e3020000 (32-bit, non-prefetchable) [size=4K]
        Memory at e3000000 (32-bit, non-prefetchable) [size=128K]
        [virtual] Expansion ROM at 50010000 [disabled] [size=64K]
        Capabilities: [60] Power Management version 2
        Kernel driver in use: sata_promise
        Kernel modules: sata_promise
...

# lsmod | grep ata
ata_generic             8452  0 
pata_acpi               7680  0 
pata_via               11140  0 
sata_promise           13700  4 
libata                132456  4 ata_generic,pata_acpi,pata_via,sata_promise
scsi_mod              122876  4 sr_mod,sg,libata,sd_mod

# dmesg
...
libata version 3.00 loaded.
sata_promise 0000:00:0a.0: version 2.12
ACPI: PCI Interrupt 0000:00:0a.0[A] -> Link [LNKC] -> GSI 11 (level, low) -> IRQ 11
scsi0 : sata_promise
scsi1 : sata_promise
scsi2 : sata_promise
ata1: SATA max UDMA/133 mmio m4096@0xe3020000 ata 0xe3020200 irq 11
ata2: SATA max UDMA/133 mmio m4096@0xe3020000 ata 0xe3020280 irq 11
ata3: PATA max UDMA/133 mmio m4096@0xe3020000 ata 0xe3020300 irq 11
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: ATA-8: WDC WD5000AACS-00G8B0, 05.04C05, max UDMA/133
ata1.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 0/32)
ata1.00: configured for UDMA/133
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata2.00: ATA-8: WDC WD5000AACS-00G8B0, 05.04C05, max UDMA/133
ata2.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 0/32)
ata2.00: configured for UDMA/133
scsi 0:0:0:0: Direct-Access     ATA      WDC WD5000AACS-0 05.0 PQ: 0 ANSI: 5
sd 0:0:0:0: [sda] 976773168 512-byte hardware sectors (500108 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] 976773168 512-byte hardware sectors (500108 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sda: sda1 sda2
sd 0:0:0:0: [sda] Attached SCSI disk
scsi 1:0:0:0: Direct-Access     ATA      WDC WD5000AACS-0 05.0 PQ: 0 ANSI: 5
sd 1:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 1:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sdb: sdb1 sdb2
sd 1:0:0:0: [sdb] Attached SCSI disk
...
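
The `SStatus 113 SControl 300` values in the link-up lines are hexadecimal register dumps. A minimal sketch of splitting them into fields, assuming the standard SATA layout (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8); the function name `split_scr` is hypothetical.

```python
def split_scr(value):
    """Split a SATA SStatus or SControl value into its DET/SPD/IPM fields."""
    return {
        "DET": value & 0xF,         # device detection
        "SPD": (value >> 4) & 0xF,  # negotiated speed (SStatus) or limit (SControl)
        "IPM": (value >> 8) & 0xF,  # interface power management
    }


print(split_scr(0x113))  # {'DET': 3, 'SPD': 1, 'IPM': 1}
print(split_scr(0x300))  # {'DET': 0, 'SPD': 0, 'IPM': 3}
```

In SStatus, DET=3 means a device is present with communication established, SPD=1 is the first-generation rate (matching the reported 1.5 Gbps), and IPM=1 is the active state. In SControl, SPD=0 places no speed limit and IPM=3 disables transitions to the partial and slumber power states.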

# grep 11 /proc/interrupts
 11:      22987    XT-PIC-XT        ehci_hcd:usb1, uhci_hcd:usb4, sata_promise, VIA8233

Comment 1 Chuck Ebbert 2008-11-05 12:46:22 UTC
Should be fixed in 2.6.27.4-24 and later:

https://admin.fedoraproject.org/updates/kernel-2.6.27.4-24.fc9

Comment 2 Kieran Clancy 2008-11-05 14:02:09 UTC
Hi,

Unfortunately I no longer have the machine in question to test the new kernel (could only put up with a flaky server for so long). It is likely that I will use the card in a different machine at some time though, so it's good to know that it should be fixed.

Out of curiosity, do you know which kernel patch it was fixed in?

Thanks.

Comment 3 Fedora Update System 2008-11-06 01:51:20 UTC
kernel-2.6.27.4-26.fc9 has been submitted as an update for Fedora 9.
http://admin.fedoraproject.org/updates/kernel-2.6.27.4-26.fc9

Comment 4 Fedora Update System 2008-11-07 02:56:31 UTC
kernel-2.6.27.4-26.fc9 has been pushed to the Fedora 9 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with:
su -c 'yum --enablerepo=updates-testing update kernel'
You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F9/FEDORA-2008-9467

Comment 5 Fedora Update System 2008-11-10 13:15:58 UTC
kernel-2.6.27.5-32.fc9 has been submitted as an update for Fedora 9.
http://admin.fedoraproject.org/updates/kernel-2.6.27.5-32.fc9

Comment 6 Fedora Update System 2008-11-12 02:57:59 UTC
kernel-2.6.27.5-32.fc9 has been pushed to the Fedora 9 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with:
su -c 'yum --enablerepo=updates-testing update kernel'
You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F9/FEDORA-2008-9583

Comment 7 Fedora Update System 2008-11-13 07:43:07 UTC
kernel-2.6.27.5-37.fc9 has been submitted as an update for Fedora 9.
http://admin.fedoraproject.org/updates/kernel-2.6.27.5-37.fc9

Comment 8 Fedora Update System 2008-11-14 11:54:24 UTC
kernel-2.6.27.5-41.fc9 has been submitted as an update for Fedora 9.
http://admin.fedoraproject.org/updates/kernel-2.6.27.5-41.fc9

Comment 9 Kieran Clancy 2008-11-14 12:13:17 UTC
Marking as CANTFIX, because I no longer have the hardware combination to test the updated kernels.

Comment 10 Fedora Update System 2008-11-19 14:54:49 UTC
kernel-2.6.27.5-41.fc9 has been pushed to the Fedora 9 stable repository.  If problems still persist, please make note of it in this bug report.

