| Summary: | ATA errors and loss of DMA after Suspend / Resume on a Macbook | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Steven Ellis <steven.ellis> | ||||||||
| Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> | ||||||||
| Status: | CLOSED WONTFIX | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||||||
| Severity: | medium | Docs Contact: | |||||||||
| Priority: | unspecified | ||||||||||
| Version: | 14 | CC: | gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda | ||||||||
| Target Milestone: | --- | ||||||||||
| Target Release: | --- | ||||||||||
| Hardware: | x86_64 | ||||||||||
| OS: | Linux | ||||||||||
| Whiteboard: | |||||||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||||||
| Doc Text: | Story Points: | --- | |||||||||
| Clone Of: | Environment: | ||||||||||
| Last Closed: | 2012-08-16 13:44:22 UTC | Type: | --- | ||||||||
| Regression: | --- | Mount Type: | --- | ||||||||
| Documentation: | --- | CRM: | |||||||||
| Verified Versions: | Category: | --- | |||||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||
| Attachments: |
Output of /proc/interrupts
CPU0 CPU1
0: 55314 60166 IO-APIC-edge timer
8: 1 0 IO-APIC-edge rtc0
9: 7062 1161 IO-APIC-fasteoi acpi
16: 151300 11718 IO-APIC-fasteoi uhci_hcd:usb4, uhci_hcd:usb5, eth1
18: 19990 8744 IO-APIC-fasteoi ata_piix, uhci_hcd:usb6
19: 1 0 IO-APIC-fasteoi firewire_ohci
20: 241 243 IO-APIC-fasteoi ehci_hcd:usb2, uhci_hcd:usb3
21: 28931 28675 IO-APIC-fasteoi ata_piix, ehci_hcd:usb1, uhci_hcd:usb7
40: 0 0 PCI-MSI-edge pciehp
41: 0 0 PCI-MSI-edge pciehp
42: 0 0 PCI-MSI-edge pciehp
43: 1801 838 PCI-MSI-edge i915
44: 1 0 PCI-MSI-edge sky2@pci:0000:03:00.0
45: 1631 117 PCI-MSI-edge hda_intel
NMI: 0 0 Non-maskable interrupts
LOC: 124710 119642 Local timer interrupts
SPU: 0 0 Spurious interrupts
PMI: 0 0 Performance monitoring interrupts
PND: 0 0 Performance pending work
RES: 6926 8436 Rescheduling interrupts
CAL: 2087 1766 Function call interrupts
TLB: 1006 1123 TLB shootdowns
TRM: 0 0 Thermal event interrupts
THR: 0 0 Threshold APIC interrupts
MCE: 0 0 Machine check exceptions
MCP: 4 4 Machine check polls
ERR: 1
MIS: 0
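In this listing, IRQ 18 is shared by `ata_piix` and `uhci_hcd:usb6`, and IRQ 21 is shared by `ata_piix`, `ehci_hcd:usb1` and `uhci_hcd:usb7`. Below is a minimal Python sketch that pulls the shared lines out of a `/proc/interrupts` dump. It assumes the layout shown above: a CPU header row, one count per CPU, an interrupt-type column, then a comma-separated device list. It also assumes device names contain no spaces. The function name is illustrative only.

```python
def shared_irqs(text):
    """Return {irq: [device, ...]} for numbered /proc/interrupts lines that
    list more than one device."""
    lines = text.strip().splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    shared = {}
    for line in lines[1:]:
        fields = line.split()
        if not fields or not fields[0].rstrip(":").isdigit():
            continue  # skip NMI, LOC, ERR and other named rows
        irq = int(fields[0].rstrip(":"))
        # assumed layout: "N:", one count per CPU, interrupt type, device list
        device_text = " ".join(fields[2 + ncpus:])
        devices = [d.strip() for d in device_text.split(",") if d.strip()]
        if len(devices) > 1:
            shared[irq] = devices
    return shared
```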
Created attachment 475315
Dmidecode output
hdparm output
hdparm -i /dev/sda
/dev/sda:
Model=WDC WD5000BEKT-00KA9T0, FwRev=01.01A01, SerialNo=WD-WXM1E60CC325
Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }
RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=50
BuffType=unknown, BuffSize=16384kB, MaxMultSect=16, MultSect=16
CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=976773168
IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes: pio0 pio3 pio4
DMA modes: mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6
AdvancedPM=yes: unknown setting WriteCache=enabled
Drive conforms to: Unspecified: ATA/ATAPI-1,2,3,4,5,6,7
* signifies the current active mode
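The `*` marks the mode in use when this output was captured. Here that is `udma6`, i.e. Ultra DMA mode 6 (UDMA/133), the same mode the kernel reports in its `configured for UDMA/133` lines after each link reset. A small Python sketch that extracts the starred mode(s) is shown below. It assumes the `PIO modes:` / `DMA modes:` / `UDMA modes:` line format above. It could be used to compare captures taken before and after a failing resume; that comparison is a suggestion, not a step from this report. The hdparm man page describes `-i` as showing identification data the kernel stored at boot/configuration time, so the star may not reflect a later change.

```python
def active_modes(hdparm_text):
    """Return the transfer modes that `hdparm -i` marks with '*'."""
    active = []
    for line in hdparm_text.splitlines():
        # only the "PIO modes:", "DMA modes:" and "UDMA modes:" lines list modes
        _head, sep, modes = line.partition("modes:")
        if not sep:
            continue
        active.extend(tok.lstrip("*") for tok in modes.split() if tok.startswith("*"))
    return active
```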
Created attachment 475316
Output of sdparm -a /dev/sda before the error occurs.
Created attachment 475317
Output of smartctl -a /dev/sda before the issue occurs
APM status of the disk before suspending:
[root@macdora steve]# hdparm -B /dev/sda
/dev/sda:
 APM_level = 128

Tried booting kernel with various combinations of irqpoll and noacpi, neither of which resolved the issue.

Had the same issue with a Seagate ST9500420ASG drive.

Using https://bugzilla.redhat.com/show_bug.cgi?id=549981 to try and troubleshoot this. Looks like a different problem.

Checking for NCQ, which isn't enabled:
cat /sys/block/sd[abc]/device/queue_depth
1

Output from lspci:
00:00.0 Host bridge: Intel Corporation Mobile PM965/GM965/GL960 Memory Controller Hub (rev 03)
00:02.0 VGA compatible controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (rev 03)
00:02.1 Display controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (rev 03)
00:1a.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #4 (rev 03)
00:1a.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #5 (rev 03)
00:1a.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #2 (rev 03)
00:1b.0 Audio device: Intel Corporation 82801H (ICH8 Family) HD Audio Controller (rev 03)
00:1c.0 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 1 (rev 03)
00:1c.4 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 5 (rev 03)
00:1c.5 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 6 (rev 03)
00:1d.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #1 (rev 03)
00:1d.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #2 (rev 03)
00:1d.2 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #3 (rev 03)
00:1d.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #1 (rev 03)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev f3)
00:1f.0 ISA bridge: Intel Corporation 82801HEM (ICH8M) LPC Interface Controller (rev 03)
00:1f.1 IDE interface: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E) IDE Controller (rev 03)
00:1f.2 IDE interface: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E) SATA IDE Controller (rev 03)
00:1f.3 SMBus: Intel Corporation 82801H (ICH8 Family) SMBus Controller (rev 03)
02:00.0 Network controller: Broadcom Corporation BCM4321 802.11a/b/g/n (rev 03)
03:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8058 PCI-E Gigabit Ethernet Controller (rev 13)
04:03.0 FireWire (IEEE 1394): Agere Systems FW322/323 (rev 61)

(In reply to comment #7)
> Tried booting kernel with various combinations of irqpoll and noacpi neither of
> which resolved the issue.

Based on https://bugzilla.redhat.com/show_bug.cgi?id=462425#c80, I actually tried noapic. I didn't change ACPI.

Got a fresh trace after two suspend/resume events and plugging the laptop into mains:
Jan 26 13:35:39 macdora kernel: [ 3754.946362] irq 18: nobody cared (try booting with the "irqpoll" option)
Jan 26 13:35:39 macdora kernel: [ 3754.946367] Pid: 0, comm: swapper Tainted: P 2.6.35.10-74.fc14.x86_64 #1
Jan 26 13:35:39 macdora kernel: [ 3754.946369] Call Trace:
Jan 26 13:35:39 macdora kernel: [ 3754.946371] <IRQ> [<ffffffff810a6fdb>] __report_bad_irq.clone.1+0x3d/0x8b
Jan 26 13:35:39 macdora kernel: [ 3754.946381] [<ffffffff810a7143>] note_interrupt+0x11a/0x17f
Jan 26 13:35:39 macdora kernel: [ 3754.946384] [<ffffffff810a7c23>] handle_fasteoi_irq+0xa8/0xce
Jan 26 13:35:39 macdora kernel: [ 3754.946388] [<ffffffff8100c2ea>] handle_irq+0x88/0x90
Jan 26 13:35:39 macdora kernel: [ 3754.946392] [<ffffffff8146fb44>] do_IRQ+0x5c/0xb4
Jan 26 13:35:39 macdora kernel: [ 3754.946396] [<ffffffff8146a093>] ret_from_intr+0x0/0x11
Jan 26 13:35:39 macdora kernel: [ 3754.946397] <EOI> [<ffffffff8128f900>] ? raw_local_irq_enable+0x10/0x12
Jan 26 13:35:39 macdora kernel: [ 3754.946404] [<ffffffff81290526>] acpi_idle_enter_c1+0x98/0xb6
Jan 26 13:35:39 macdora kernel: [ 3754.946408] [<ffffffff81394201>] cpuidle_idle_call+0x8b/0xe9
Jan 26 13:35:39 macdora kernel: [ 3754.946412] [<ffffffff81008325>] cpu_idle+0xaa/0xcc
Jan 26 13:35:39 macdora kernel: [ 3754.946416] [<ffffffff81451906>] rest_init+0x8a/0x8c
Jan 26 13:35:39 macdora kernel: [ 3754.946420] [<ffffffff81ba1c49>] start_kernel+0x40b/0x416
Jan 26 13:35:39 macdora kernel: [ 3754.946423] [<ffffffff81ba12c6>] x86_64_start_reservations+0xb1/0xb5
Jan 26 13:35:39 macdora kernel: [ 3754.946426] [<ffffffff81ba13c2>] x86_64_start_kernel+0xf8/0x107
Jan 26 13:35:39 macdora kernel: [ 3754.946428] handlers:
Jan 26 13:35:39 macdora kernel: [ 3754.946430] [<ffffffff81314106>] (ata_bmdma_interrupt+0x0/0x1a)
Jan 26 13:35:39 macdora kernel: [ 3754.946434] [<ffffffff813335a4>] (usb_hcd_irq+0x0/0x7c)
Jan 26 13:35:39 macdora kernel: [ 3754.946438] Disabling IRQ #18
Jan 26 13:36:08 macdora kernel: [ 3783.776065] ata3: lost interrupt (Status 0x51)
Jan 26 13:36:08 macdora kernel: [ 3783.776091] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jan 26 13:36:08 macdora kernel: [ 3783.776095] ata3.00: BMDMA stat 0x6, BMDMA stat 0x0, BMDMA stat 0x0, BMDMA stat 0x0, BMDMA stat 0x0,
Jan 26 13:36:08 macdora kernel: [ 3783.776102] ata3.00: failed command: READ DMA EXT
Jan 26 13:36:08 macdora kernel: [ 3783.776112] ata3.00: cmd 25/00:00:b2:a8:9f/00:01:2e:00:00/e0 tag 0 dma 131072 in
Jan 26 13:36:08 macdora kernel: [ 3783.776119] res 40/00:00:09:4f:c2/00:00:00:00:00/00 Emask 0x24 (host bus error)
Jan 26 13:36:08 macdora kernel: [ 3783.776122] ata3.00: status: { DRDY }
Jan 26 13:36:08 macdora kernel: [ 3783.776133] ata3: soft resetting link
Jan 26 13:36:08 macdora kernel: [ 3784.046178] ata3.00: configured for UDMA/133
Jan 26 13:36:08 macdora kernel: [ 3784.046193] ata3: EH complete
Jan 26 13:37:10 macdora kernel: [ 3846.708082] ata3: lost interrupt (Status 0x51)
Jan 26 13:37:10 macdora kernel: [ 3846.708110] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jan 26 13:37:10 macdora kernel: [ 3846.708115] ata3.00: BMDMA stat 0x6, BMDMA stat 0x0, BMDMA stat 0x0, BMDMA stat 0x0, BMDMA stat 0x0,
Jan 26 13:37:10 macdora kernel: [ 3846.708122] ata3.00: failed command: READ DMA EXT
Jan 26 13:37:10 macdora kernel: [ 3846.708132] ata3.00: cmd 25/00:00:32:3a:a6/00:02:2e:00:00/e0 tag 0 dma 262144 in
Jan 26 13:37:10 macdora kernel: [ 3846.708134] res 40/00:00:09:4f:c2/00:00:00:00:00/00 Emask 0x24 (host bus error)
Jan 26 13:37:10 macdora kernel: [ 3846.708139] ata3.00: status: { DRDY }
Jan 26 13:37:10 macdora kernel: [ 3846.708157] ata3: soft resetting link
Jan 26 13:37:11 macdora kernel: [ 3846.963154] ata3.00: configured for UDMA/133
Jan 26 13:37:11 macdora kernel: [ 3846.963172] ata3: EH complete
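The `irq 18: nobody cared` report and the `Disabling IRQ #18` line that follows come from the kernel's spurious-interrupt detector (`note_interrupt` in the trace). Both registered handlers (`ata_bmdma_interrupt` and `usb_hcd_irq`) failed to claim the interrupts, so the line was switched off. With IRQ 18 disabled, an ATA port served by it would stop receiving completion interrupts. That would be consistent with the `ata3: lost interrupt` timeouts about 30 seconds later.

The Python class below is a simplified model of that detector, not kernel code. Its thresholds are assumptions based on `kernel/irq/spurious.c` from kernels of this era and should be checked against the 2.6.35 source:

- the counters are reviewed every 100,000 interrupts;
- the line is disabled if more than 99,900 of those went unhandled;
- the unhandled count restarts at 1 when consecutive unhandled interrupts are more than 0.1 s apart.

```python
class SpuriousIrqModel:
    """Simplified model of the kernel's "irq N: nobody cared" detector.

    Assumed behaviour: every unhandled interrupt bumps a counter, which is
    restarted at 1 if the previous unhandled one was more than 0.1 s earlier;
    every 100,000 interrupts the counter is checked, and if more than 99,900
    went unhandled the line is disabled ("Disabling IRQ #N").
    """

    WINDOW = 100_000
    LIMIT = 99_900
    RESET_GAP_S = 0.1

    def __init__(self):
        self.count = 0            # interrupts seen in the current window
        self.unhandled = 0        # unhandled interrupts counted so far
        self.last_unhandled = None
        self.disabled = False

    def interrupt(self, handled, now):
        """Feed one interrupt at time `now` (seconds); return True if it disables the line."""
        if self.disabled:
            return False
        if not handled:
            if self.last_unhandled is not None and now - self.last_unhandled > self.RESET_GAP_S:
                self.unhandled = 1  # isolated spurious IRQ: start counting afresh
            else:
                self.unhandled += 1
            self.last_unhandled = now
        self.count += 1
        if self.count < self.WINDOW:
            return False
        self.count = 0
        if self.unhandled > self.LIMIT:
            self.disabled = True  # corresponds to "Disabling IRQ #N"
        self.unhandled = 0
        return self.disabled
```

Under this model, 100,000 unclaimed interrupts arriving back to back disable the line. The same number spaced a second apart never do, because each one restarts the unhandled count.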
Similar issue under Ubuntu:
* https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/664400

This message is a notice that Fedora 14 is now at end of life. Fedora has stopped maintaining and issuing updates for Fedora 14. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At this time, all open bugs with a Fedora 'version' of '14' have been closed as WONTFIX.

(Please note: Our normal process is to give advance warning of this occurring, but we forgot to do that. A thousand apologies.)

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, feel free to reopen this bug and simply change the 'version' to a later Fedora version.

Bug Reporter: Thank you for reporting this issue, and we are sorry that we were unable to fix it before Fedora 14 reached end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to click on "Clone This Bug" (top right of this page) and open it against that version of Fedora.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Description of problem:
Generation 3 white Macbook running Fedora 14 64-bit.
CPU: Intel(R) Core(TM)2 Duo CPU T8300 @ 2.40GHz

Version-Release number of selected component (if applicable):
Kernel: 2.6.35.10-74.fc14.x86_64
OS: Fedora 14 64-bit

How reproducible:
Consistently

Steps to Reproduce:
1. Boot Fedora 14 on the Macbook and log in.
2. Suspend via the menu or by closing the lid.
3. Resume the system and perform normal operations.
4. Repeat steps 2 and 3 until the following appears in the system logs.
5. Once the error occurs, I/O performance is seriously degraded because DMA is no longer in use.

Actual results:
[ 5172.307016] irq 18: nobody cared (try booting with the "irqpoll" option)
[ 5172.307022] Pid: 0, comm: swapper Tainted: P 2.6.35.10-74.fc14.x86_64 #1
[ 5172.307024] Call Trace:
[ 5172.307026] <IRQ> [<ffffffff810a6fdb>] __report_bad_irq.clone.1+0x3d/0x8b
[ 5172.307035] [<ffffffff810a7143>] note_interrupt+0x11a/0x17f
[ 5172.307039] [<ffffffff810a7c23>] handle_fasteoi_irq+0xa8/0xce
[ 5172.307043] [<ffffffff8100c2ea>] handle_irq+0x88/0x90
[ 5172.307046] [<ffffffff8146fb44>] do_IRQ+0x5c/0xb4
[ 5172.307050] [<ffffffff8146a093>] ret_from_intr+0x0/0x11
[ 5172.307051] <EOI> [<ffffffff8128f900>] ? raw_local_irq_enable+0x10/0x12
[ 5172.307058] [<ffffffff81290526>] acpi_idle_enter_c1+0x98/0xb6
[ 5172.307062] [<ffffffff81394201>] cpuidle_idle_call+0x8b/0xe9
[ 5172.307066] [<ffffffff81008325>] cpu_idle+0xaa/0xcc
[ 5172.307069] [<ffffffff81451906>] rest_init+0x8a/0x8c
[ 5172.307074] [<ffffffff81ba1c49>] start_kernel+0x40b/0x416
[ 5172.307077] [<ffffffff81ba12c6>] x86_64_start_reservations+0xb1/0xb5
[ 5172.307080] [<ffffffff81ba13c2>] x86_64_start_kernel+0xf8/0x107
[ 5172.307082] handlers:
[ 5172.307083] [<ffffffff81314106>] (ata_bmdma_interrupt+0x0/0x1a)
[ 5172.307088] [<ffffffff813335a4>] (usb_hcd_irq+0x0/0x7c)
[ 5172.307092] Disabling IRQ #18
[ 5200.736090] ata3: lost interrupt (Status 0x51)
[ 5200.736123] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 5200.736131] ata3.00: BMDMA stat 0x6, BMDMA stat 0x0, BMDMA stat 0x0, BMDMA stat 0x0, BMDMA stat 0x0,
[ 5200.736140] ata3.00: failed command: READ DMA EXT
[ 5200.736155] ata3.00: cmd 25/00:00:7a:9d:29/00:01:2d:00:00/e0 tag 0 dma 131072 in
[ 5200.736158] res 40/00:00:00:00:00/00:00:00:00:00/40 Emask 0x24 (host bus error)
[ 5200.736166] ata3.00: status: { DRDY }
[ 5200.736189] ata3: soft resetting link
[ 5201.008176] ata3.00: configured for UDMA/133
[ 5201.008190] ata3.00: device reported invalid CHS sector 0
[ 5201.008217] ata3: EH complete
[ 5259.744199] ata3: lost interrupt (Status 0x51)
[ 5259.744235] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 5259.744244] ata3.00: BMDMA stat 0x6, BMDMA stat 0x0, BMDMA stat 0x0, BMDMA stat 0x0, BMDMA stat 0x0,
[ 5259.744282] ata3.00: failed command: READ DMA EXT
[ 5259.744298] ata3.00: cmd 25/00:00:ba:15:62/00:02:2d:00:00/e0 tag 0 dma 262144 in
[ 5259.744301] res 40/00:00:00:00:00/00:00:00:00:00/40 Emask 0x24 (host bus error)
[ 5259.744310] ata3.00: status: { DRDY }
[ 5259.744335] ata3: soft resetting link
[ 5260.008298] ata3.00: configured for UDMA/133
[ 5260.008311] ata3.00: device reported invalid CHS sector 0
[ 5260.008337] ata3: EH complete

Expected results:
Suspend/resume should not cause DMA errors.

Additional info:
Once the issue has occurred, a full power cycle won't fix it unless the Macbook is booted into OS X before re-running Fedora. Although DMA comes back after that reboot, the above error messages reappear after a short period and DMA is lost again. After the error has occurred, the DMA issue persists across suspend/resume, and DMA cannot be restored without a power cycle.
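The `cmd`/`res` lines in these logs are libata's dump of the ATA taskfile registers. The field order is assumed from libata's error-reporting code:

`command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device`

Decoding them shows that each failed READ DMA EXT (opcode 0x25) was an ordinary 256- or 512-sector read well inside the drive's 976773168 LBA sectors. That suggests a transfer/interrupt problem rather than a bad address.

The Python sketch below also decodes `Emask 0x24` and `BMDMA stat 0x6` using assumed bit tables: libata's `AC_ERR_*` flags and the bus-master IDE status bits. Verify both tables against the kernel headers for the version in use. Function and table names are illustrative.

```python
import re

# Assumed bit meanings; check include/linux/libata.h (AC_ERR_*) and the
# bus-master IDE status register definitions for the kernel in question.
AC_ERR_BITS = {
    0x01: "device error", 0x02: "HSM violation", 0x04: "timeout",
    0x08: "media error", 0x10: "ATA bus error", 0x20: "host bus error",
    0x40: "internal error", 0x80: "invalid argument",
}
BMDMA_STAT_BITS = {0x01: "active", 0x02: "error", 0x04: "interrupt"}

_HEX = r"[0-9a-fA-F]{2}"
_TASKFILE = re.compile(
    rf"({_HEX})/((?:{_HEX}:){{4}}{_HEX})/((?:{_HEX}:){{4}}{_HEX})/({_HEX})"
)


def decode_taskfile(tf):
    """Decode a libata taskfile dump such as '25/00:00:b2:a8:9f/00:01:2e:00:00/e0'.

    Assumes a 48-bit (EXT) command, so the high-order ("hob") bytes extend
    the LBA and the sector count. Returns (command, lba, sector_count).
    """
    m = _TASKFILE.fullmatch(tf)
    if m is None:
        raise ValueError(f"unrecognised taskfile: {tf!r}")
    command = int(m.group(1), 16)
    _feat, nsect, lbal, lbam, lbah = (int(b, 16) for b in m.group(2).split(":"))
    _hfeat, hnsect, hlbal, hlbam, hlbah = (int(b, 16) for b in m.group(3).split(":"))
    lba = (hlbah << 40) | (hlbam << 32) | (hlbal << 24) | (lbah << 16) | (lbam << 8) | lbal
    count = (hnsect << 8) | nsect
    if count == 0:
        count = 65536  # for 48-bit commands a zero count means 65536 sectors
    return command, lba, count


def decode_bits(value, table):
    """List the names of the bits set in `value`, lowest bit first."""
    return [name for bit, name in sorted(table.items()) if value & bit]
```

For example, `decode_taskfile("25/00:00:b2:a8:9f/00:01:2e:00:00/e0")` gives a count of 256 sectors. At 512 bytes per sector that is 131072 bytes, matching the `dma 131072` in the same log line. Under the assumed tables, `Emask 0x24` decodes as timeout plus host bus error, and `BMDMA stat 0x6` as error plus interrupt.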