Bug 465771 - sata_promise locks up machine after timeout
Status: CLOSED NEXTRELEASE
Product: Fedora
Classification: Fedora
Component: kernel
Version: 9
Hardware: All Linux
Priority: medium
Severity: medium
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2008-10-06 07:23 EDT by Kieran Clancy
Modified: 2008-11-19 09:55 EST
Doc Type: Bug Fix
Last Closed: 2008-11-14 07:13:17 EST
Description Kieran Clancy 2008-10-06 07:23:35 EDT
Description of problem:

I have a Promise PCI SATA card with two SATA disks attached (see the lspci output below). During heavy I/O, and sometimes at seemingly random times, the machine locks up. I have had this problem ever since installing the card and disks, on both FC7 and FC9 kernels (the latter a fresh install). The hardware is brand new, and from googling it seems other people have hit the same issue, so I suspect it is not a hardware fault. When the machine hangs it often stops responding even to SysRq, so it needs a manual reset. I have just hooked up a serial cable to capture any errors from the machine.

Here is the error message from the last time it happened:
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x180000 action 0x6 frozen
ata2: SError: { 10B8B Dispar }
ata2.00: cmd 35/00:00:d9:05:6e/00:04:01:00:00/e0 tag 0 dma 524288 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata2.00: status: { DRDY }
ata2: hard resetting link
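
For readers who want to check the SErr value against the flag names libata printed, here is a minimal Python sketch. The helper name `decode_serr` is hypothetical, and the table covers only the diagnostic (upper) half of the SError register, using the short names libata prints as best I know them. Treat it as an aid for reading the log, not as the kernel's own code.

```python
# Bit positions in the diagnostic half of the SATA SError register,
# labelled with the short names libata prints (assumed, not copied from the kernel).
SERR_DIAG_BITS = {
    16: "PHYRdyChg",
    17: "PHYInt",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
    23: "LinkSeq",
    24: "TrStaTrns",
    25: "UnrecFIS",
    26: "DevExch",
}


def decode_serr(value):
    """Return the names of the diagnostic bits set in an SError value, lowest bit first."""
    return [name for bit, name in sorted(SERR_DIAG_BITS.items()) if value & (1 << bit)]


print(decode_serr(0x180000))  # ['10B8B', 'Dispar']
```

0x180000 has bits 19 and 20 set, which matches the `{ 10B8B Dispar }` line above: a 10b-to-8b decode error and a disparity error, both reported at the link level.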

Have also seen it with:
ata2.00: cmd 35/00:00:d9:cf:c5/00:04:07:00:00/e0 tag 0 dma 524288 out
ata2.00: cmd 35/00:80:59:1f:ad/00:03:00:00:00/e0 tag 0 dma 458752 out
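
The `cmd` lines can be decoded by hand, or with a short Python sketch like the one below. It assumes libata's usual field order (command/feature:nsect:lbal:lbam:lbah/hob fields/device), an LBA48 command, and 512-byte sectors as reported in dmesg. The function name `decode_cmd` and the one-entry opcode table are illustrative only.

```python
import re

# Field order assumed for the taskfile on a libata "cmd" line.
TF_FIELDS = (
    "command", "feature", "nsect", "lbal", "lbam", "lbah",
    "hob_feature", "hob_nsect", "hob_lbal", "hob_lbam", "hob_lbah",
    "device",
)

# Only the opcode seen in this report; not a full ATA command table.
OPCODES = {0x35: "WRITE DMA EXT"}


def decode_cmd(line):
    """Decode the taskfile in a libata 'cmd' log line (LBA48 layout assumed)."""
    match = re.search(r"cmd ([0-9a-f]{2}(?:[/:][0-9a-f]{2}){11})", line)
    if match is None:
        raise ValueError("no taskfile found in line")
    values = [int(h, 16) for h in re.split(r"[/:]", match.group(1))]
    tf = dict(zip(TF_FIELDS, values))
    # 48-bit LBA: the "hob" (high order byte) fields carry bits 24..47.
    lba = (
        tf["hob_lbah"] << 40 | tf["hob_lbam"] << 32 | tf["hob_lbal"] << 24
        | tf["lbah"] << 16 | tf["lbam"] << 8 | tf["lbal"]
    )
    # 16-bit sector count; a count of 0 means 65536 sectors in LBA48 commands.
    sectors = (tf["hob_nsect"] << 8 | tf["nsect"]) or 65536
    return {
        "opcode": OPCODES.get(tf["command"], hex(tf["command"])),
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * 512,  # assumes 512-byte sectors
    }


info = decode_cmd("ata2.00: cmd 35/00:00:d9:05:6e/00:04:01:00:00/e0 tag 0 dma 524288 out")
print(info["opcode"], info["sectors"], info["bytes"])  # WRITE DMA EXT 1024 524288
```

The decoded byte counts (1024 sectors for the first two commands, 896 for the third) agree with the `dma 524288` and `dma 458752` figures in the log, and opcode 0x35 together with the `out` direction shows that all three timed-out commands were writes.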

The above error seems like it should be recoverable (a timed-out DMA write: retry it, or at least fail it and let the other disk in my RAID take over), but the machine never comes back from the hard reset attempt.

I have just added the nmi_watchdog and acpi=off kernel boot options to see whether they make any difference.

Version-Release number of selected component (if applicable):
kernel-2.6.26.5-45.fc9.i686

How reproducible:
Hard to reproduce -- happens only when you don't want it to. At the moment I have a nearly 500GB md raid1 partition with an unsynced disk, which usually only gets a few percent through recovering before my system locks up (so it starts again on the next reboot).

It does seem related to concurrent disk access: running two `hdparm -t` commands at once now triggers the problem quite quickly.
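
A rough sketch of that trigger, for anyone trying to reproduce it. The report does not say which devices the two `hdparm -t` runs targeted, so `/dev/sda` and `/dev/sdb` (the two disks in the dmesg output below) are an assumption, as are the helper names. Running it needs root and, on affected hardware, may hang the machine.

```python
import subprocess


def build_commands(devices):
    """Return one `hdparm -t` command line per device."""
    return [["hdparm", "-t", dev] for dev in devices]


def run_concurrently(commands):
    """Start every command at once, then wait for all of them; return the exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [proc.wait() for proc in procs]


# Example (as root; device names are assumed, not taken from the report):
# exit_codes = run_concurrently(build_commands(["/dev/sda", "/dev/sdb"]))
```

Starting both processes before waiting on either is what makes the reads overlap; running them one after the other would not exercise concurrent access.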

Additional info:
# lspci -v
...
00:0a.0 RAID bus controller: Promise Technology, Inc. PDC20376 (FastTrak 376) (rev 02)
        Subsystem: Promise Technology, Inc. PDC20376 (FastTrak 376)
        Flags: bus master, 66MHz, medium devsel, latency 96, IRQ 11
        I/O ports at c400 [size=64]
        I/O ports at c800 [size=16]
        I/O ports at cc00 [size=128]
        Memory at e3020000 (32-bit, non-prefetchable) [size=4K]
        Memory at e3000000 (32-bit, non-prefetchable) [size=128K]
        [virtual] Expansion ROM at 50010000 [disabled] [size=64K]
        Capabilities: [60] Power Management version 2
        Kernel driver in use: sata_promise
        Kernel modules: sata_promise
...

# lsmod | grep ata
ata_generic             8452  0 
pata_acpi               7680  0 
pata_via               11140  0 
sata_promise           13700  4 
libata                132456  4 ata_generic,pata_acpi,pata_via,sata_promise
scsi_mod              122876  4 sr_mod,sg,libata,sd_mod

# dmesg
...
libata version 3.00 loaded.
sata_promise 0000:00:0a.0: version 2.12
ACPI: PCI Interrupt 0000:00:0a.0[A] -> Link [LNKC] -> GSI 11 (level, low) -> IRQ 11
scsi0 : sata_promise
scsi1 : sata_promise
scsi2 : sata_promise
ata1: SATA max UDMA/133 mmio m4096@0xe3020000 ata 0xe3020200 irq 11
ata2: SATA max UDMA/133 mmio m4096@0xe3020000 ata 0xe3020280 irq 11
ata3: PATA max UDMA/133 mmio m4096@0xe3020000 ata 0xe3020300 irq 11
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: ATA-8: WDC WD5000AACS-00G8B0, 05.04C05, max UDMA/133
ata1.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 0/32)
ata1.00: configured for UDMA/133
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata2.00: ATA-8: WDC WD5000AACS-00G8B0, 05.04C05, max UDMA/133
ata2.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 0/32)
ata2.00: configured for UDMA/133
scsi 0:0:0:0: Direct-Access     ATA      WDC WD5000AACS-0 05.0 PQ: 0 ANSI: 5
sd 0:0:0:0: [sda] 976773168 512-byte hardware sectors (500108 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] 976773168 512-byte hardware sectors (500108 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sda: sda1 sda2
sd 0:0:0:0: [sda] Attached SCSI disk
scsi 1:0:0:0: Direct-Access     ATA      WDC WD5000AACS-0 05.0 PQ: 0 ANSI: 5
sd 1:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 1:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sdb: sdb1 sdb2
sd 1:0:0:0: [sdb] Attached SCSI disk
...

# grep 11 /proc/interrupts
 11:      22987    XT-PIC-XT        ehci_hcd:usb1, uhci_hcd:usb4, sata_promise, VIA8233
Comment 1 Chuck Ebbert 2008-11-05 07:46:22 EST
Should be fixed in 2.6.27.4-24 and later:

https://admin.fedoraproject.org/updates/kernel-2.6.27.4-24.fc9
Comment 2 Kieran Clancy 2008-11-05 09:02:09 EST
Hi,

Unfortunately I no longer have the machine in question to test the new kernel (could only put up with a flaky server for so long). It is likely that I will use the card in a different machine at some time though, so it's good to know that it should be fixed.

Out of curiosity, do you know which kernel patch fixed it?

Thanks.
Comment 3 Fedora Update System 2008-11-05 20:51:20 EST
kernel-2.6.27.4-26.fc9 has been submitted as an update for Fedora 9.
http://admin.fedoraproject.org/updates/kernel-2.6.27.4-26.fc9
Comment 4 Fedora Update System 2008-11-06 21:56:31 EST
kernel-2.6.27.4-26.fc9 has been pushed to the Fedora 9 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with:
su -c 'yum --enablerepo=updates-testing update kernel'
You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F9/FEDORA-2008-9467
Comment 5 Fedora Update System 2008-11-10 08:15:58 EST
kernel-2.6.27.5-32.fc9 has been submitted as an update for Fedora 9.
http://admin.fedoraproject.org/updates/kernel-2.6.27.5-32.fc9
Comment 6 Fedora Update System 2008-11-11 21:57:59 EST
kernel-2.6.27.5-32.fc9 has been pushed to the Fedora 9 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with:
su -c 'yum --enablerepo=updates-testing update kernel'
You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F9/FEDORA-2008-9583
Comment 7 Fedora Update System 2008-11-13 02:43:07 EST
kernel-2.6.27.5-37.fc9 has been submitted as an update for Fedora 9.
http://admin.fedoraproject.org/updates/kernel-2.6.27.5-37.fc9
Comment 8 Fedora Update System 2008-11-14 06:54:24 EST
kernel-2.6.27.5-41.fc9 has been submitted as an update for Fedora 9.
http://admin.fedoraproject.org/updates/kernel-2.6.27.5-41.fc9
Comment 9 Kieran Clancy 2008-11-14 07:13:17 EST
Marking as CANTFIX, because I no longer have the hardware combination to test the updated kernels.
Comment 10 Fedora Update System 2008-11-19 09:54:49 EST
kernel-2.6.27.5-41.fc9 has been pushed to the Fedora 9 stable repository.  If problems still persist, please make note of it in this bug report.
