Bug 667964

Summary: failed command: WRITE DMA
Product: [Fedora] Fedora Reporter: Edouard Bourguignon <madko>
Component: kernelAssignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED CANTFIX QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: low    
Version: 14CC: gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda
Target Milestone: ---Keywords: Reopened
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-12-04 23:23:25 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Edouard Bourguignon 2011-01-07 12:57:25 UTC
Description of problem:
During boot process, I have this errors:
[  904.277012] ata7.00: failed command: WRITE DMA
[  904.277128] ata7.00: cmd ca/00:80:5d:43:47/00:00:00:00:00/e2 tag 0 dma 65536 out
[  904.277133]          res 51/84:00:dc:43:47/00:00:00:00:00/e2 Emask 0x10 (ATA bus error)
[  904.277511] ata7.00: status: { DRDY ERR }
[  904.277618] ata7.00: error: { ICRC ABRT }
[  904.277755] ata7: soft resetting link
[  904.439220] ata7.00: configured for UDMA/100
[  904.439247] ata7: EH complete
[ 3180.822861] ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
[ 3180.822985] ata7.00: BMDMA stat 0x64
[ 3180.823079] ata7.00: failed command: WRITE DMA EXT
[ 3180.823200] ata7.00: cmd 35/00:00:8d:84:47/00:04:02:00:00/e0 tag 0 dma 524288 out
[ 3180.823205]          res 51/84:00:8c:88:47/84:00:02:00:00/e0 Emask 0x10 (ATA bus error)
[ 3180.823477] ata7.00: status: { DRDY ERR }
[ 3180.823549] ata7.00: error: { ICRC ABRT }
[ 3180.823645] ata7: soft resetting link
[ 3180.983841] ata7.00: configured for UDMA/100
[ 3180.983861] ata7: EH complete
[ 3180.991741] ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
[ 3180.991861] ata7.00: BMDMA stat 0x64
[ 3180.991928] ata7.00: failed command: WRITE DMA
[ 3180.992033] ata7.00: cmd ca/00:c8:8d:88:47/00:00:00:00:00/e2 tag 0 dma 102400 out
[ 3180.992037]          res 51/84:00:54:89:47/84:00:02:00:00/e2 Emask 0x10 (ATA bus error)
[ 3180.992296] ata7.00: status: { DRDY ERR }
[ 3180.992368] ata7.00: error: { ICRC ABRT }
[ 3180.992461] ata7: soft resetting link
[ 3181.152850] ata7.00: configured for UDMA/100
[ 3181.152869] ata7: EH complete
[ 3186.177816] ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
[ 3186.177939] ata7.00: BMDMA stat 0x64
[ 3186.178009] ata7.00: failed command: WRITE DMA EXT
[ 3186.178111] ata7.00: cmd 35/00:00:5d:89:47/00:04:02:00:00/e0 tag 0 dma 524288 out
[ 3186.178115]          res 51/84:00:5c:8d:47/84:00:02:00:00/e0 Emask 0x10 (ATA bus error)
[ 3186.178374] ata7.00: status: { DRDY ERR }
[ 3186.178467] ata7.00: error: { ICRC ABRT }
[ 3186.178590] ata7: soft resetting link
[ 3186.339885] ata7.00: configured for UDMA/100
[ 3186.339910] ata7: EH complete
[ 3186.345678] ata7.00: limiting speed to UDMA/66:PIO4
[ 3186.345688] ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
[ 3186.345806] ata7.00: BMDMA stat 0x64
[ 3186.345874] ata7.00: failed command: WRITE DMA EXT
[ 3186.345969] ata7.00: cmd 35/00:00:5d:89:47/00:04:02:00:00/e0 tag 0 dma 524288 out
[ 3186.345972]          res 51/84:00:5c:8d:47/84:00:02:00:00/e0 Emask 0x10 (ATA bus error)
[ 3186.346255] ata7.00: status: { DRDY ERR }
[ 3186.346350] ata7.00: error: { ICRC ABRT }
[ 3186.346470] ata7: soft resetting link
[ 3186.508056] ata7.00: configured for UDMA/66
[ 3186.508081] ata7: EH complete

It then works for a while, but times to times the machine completely hangs.

The motherboard is a Gigabyte D510UD, with 4 HDD on sata and 1 HDD on the IDE. Only the IDE HDD is impacted by the problem.

Version-Release number of selected component (if applicable):
2.6.35.10-74.fc14.x86_64

How reproducible:
Each boot, seems very specific to this hardware

Steps to Reproduce:
1. Plug one HDD on IDE
2. Install F14
3. Boot
  
Actual results:
Some errors during boot (failed command: WRITE DMA EXT, etc). Seems the disk is forced to UDMA/66.

Expected results:
No ata/DMA errors.

Additional info:
00:00.0 Host bridge: Intel Corporation N10 Family DMI Bridge (rev 02)
00:02.0 VGA compatible controller: Intel Corporation N10 Family Integrated Graphics Controller (rev 02)
00:1b.0 Audio device: Intel Corporation N10/ICH 7 Family High Definition Audio Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation N10/ICH 7 Family PCI Express Port 1 (rev 02)
00:1c.1 PCI bridge: Intel Corporation N10/ICH 7 Family PCI Express Port 2 (rev 02)
00:1d.0 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #3 (rev 02)
00:1d.3 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #4 (rev 02)
00:1d.7 USB Controller: Intel Corporation N10/ICH 7 Family USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e2)
00:1f.0 ISA bridge: Intel Corporation NM10 Family LPC Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation N10/ICH7 Family SATA AHCI Controller (rev 02)
00:1f.3 SMBus: Intel Corporation N10/ICH 7 Family SMBus Controller (rev 02)
01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 03)
02:00.0 SATA controller: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 02)
02:00.1 IDE interface: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 02)
03:00.0 Network controller: RaLink RT2561/RT61 802.11g PCI

SDE have just been changed by a brand new disk.

# hdparm /dev/sde

/dev/sde:
 multcount     =  0 (off)
 IO_support    =  0 (default) 
 readonly      =  0 (off)
 readahead     = 256 (on)
 geometry      = 38913/255/63, sectors = 625142448, start = 0

On Redhat 5.x, I had bug #439391 until I upgrade the BIOS to the latest available version. But keep having errors on ATA so I try Fedora.

Comment 1 Chuck Ebbert 2011-01-10 21:08:09 UTC
ICRC is an interface CRC error, which could be caused by bad cabling or not enough power available for the drive. Can your power supply really support 5 drives?

Comment 2 Edouard Bourguignon 2011-01-11 10:08:19 UTC
Good question, How can I check that for sure? Any sensors?

Comment 3 Edouard Bourguignon 2011-01-15 17:14:49 UTC
I change the power supply by a new 500W 80+, same problem. If I unplug all the sata drives same problem too.

Comment 4 Chuck Ebbert 2011-01-19 13:14:08 UTC
You probably should have tried a new cable first.

Comment 5 Edouard Bourguignon 2011-01-22 12:32:37 UTC
That was the cable, thanks!

Comment 6 Edouard Bourguignon 2011-02-02 21:48:50 UTC
[   45.101794] ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
[   45.101922] ata7.00: BMDMA stat 0x64
[   45.101994] ata7.00: failed command: WRITE DMA EXT
[   45.102092] ata7.00: cmd 35/00:c0:ad:3d:47/00:03:02:00:00/e0 tag 0 dma 491520                                                                                                                                                              out
[   45.102096]          res 51/84:00:6c:41:47/84:00:02:00:00/e0 Emask 0x10 (ATA                                                                                                                                                              bus error)
[   45.102370] ata7.00: status: { DRDY ERR }
[   45.102471] ata7.00: error: { ICRC ABRT }
[   45.102600] ata7: soft resetting link
[   45.263210] ata7.00: configured for UDMA/100
[   45.263241] ata7: EH complete
[   45.268702] ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
[   45.268829] ata7.00: BMDMA stat 0x64
[   45.268901] ata7.00: failed command: WRITE DMA EXT
[   45.268999] ata7.00: cmd 35/00:c0:ad:3d:47/00:03:02:00:00/e0 tag 0 dma 491520                                                                                                                                                              out
[   45.269003]          res 51/84:00:6c:41:47/84:00:02:00:00/e0 Emask 0x10 (ATA                                                                                                                                                              bus error)
[   45.269293] ata7.00: status: { DRDY ERR }
[   45.269393] ata7.00: error: { ICRC ABRT }
[   45.269520] ata7: soft resetting link
[   45.430415] ata7.00: configured for UDMA/100
[   45.430446] ata7: EH complete
[   45.435890] ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
[   45.436039] ata7.00: BMDMA stat 0x64
[   45.436134] ata7.00: failed command: WRITE DMA EXT
[   45.436261] ata7.00: cmd 35/00:c0:ad:3d:47/00:03:02:00:00/e0 tag 0 dma 491520                                                                                                                                                              out
[   45.436266]          res 51/84:00:6c:41:47/84:00:02:00:00/e0 Emask 0x10 (ATA                                                                                                                                                              bus error)
[   45.436557] ata7.00: status: { DRDY ERR }
[   45.436633] ata7.00: error: { ICRC ABRT }
[   45.436734] ata7: soft resetting link
[   45.598594] ata7.00: configured for UDMA/100
[   45.598636] ata7: EH complete
[   45.604113] ata7.00: limiting speed to UDMA/66:PIO4
[   45.604124] ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
[   45.604250] ata7.00: BMDMA stat 0x64
[   45.604322] ata7.00: failed command: WRITE DMA EXT
[   45.604453] ata7.00: cmd 35/00:c0:ad:3d:47/00:03:02:00:00/e0 tag 0 dma 491520                                                                                                                                                              out
[   45.604458]          res 51/84:00:6c:41:47/84:00:02:00:00/e0 Emask 0x10 (ATA                                                                                                                                                              bus error)
[   45.604803] ata7.00: status: { DRDY ERR }
[   45.604904] ata7.00: error: { ICRC ABRT }
[   45.605011] ata7: soft resetting link
[   45.765856] ata7.00: configured for UDMA/66
[   45.765886] ata7: EH complete
[   55.863695] readahead-collector: starting delayed service auditd
[   56.227916] readahead-collector: sorting
[   56.350634] readahead-collector: finished


Oops problem's still here

Comment 7 Edouard Bourguignon 2011-02-07 21:32:16 UTC
After a while, it get worse:

[   35.923208] ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
[   35.923368] ata7.00: BMDMA stat 0x64
[   35.923461] ata7.00: failed command: WRITE DMA EXT
[   35.923591] ata7.00: cmd 35/00:b8:4d:36:47/00:01:02:00:00/e0 tag 0 dma 225280 out
[   35.923596]          res 51/84:00:04:38:47/84:00:02:00:00/e0 Emask 0x10 (ATA bus error)
[   35.923940] ata7.00: status: { DRDY ERR }
[   35.924080] ata7.00: error: { ICRC ABRT }
[   35.924218] ata7: soft resetting link
[   36.086487] ata7.00: configured for UDMA/100
[   36.086513] ata7: EH complete
[   55.345630] ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
[   55.345759] ata7.00: BMDMA stat 0x64
[   55.345831] ata7.00: failed command: WRITE DMA
[   55.345923] ata7.00: cmd ca/00:10:85:31:43/00:00:00:00:00/e4 tag 0 dma 8192 out
[   55.345927]          res 51/84:00:94:31:43/84:00:02:00:00/e4 Emask 0x10 (ATA bus error)
[   55.346214] ata7.00: status: { DRDY ERR }
[   55.346295] ata7.00: error: { ICRC ABRT }
[   55.346399] ata7: soft resetting link
[   55.558491] ata7.00: configured for UDMA/100
[   55.558522] ata7: EH complete
[   55.558854] ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
[   55.558981] ata7.00: BMDMA stat 0x64
[   55.559068] ata7.00: failed command: WRITE DMA
[   55.559186] ata7.00: cmd ca/00:10:85:31:43/00:00:00:00:00/e4 tag 0 dma 8192 out
[   55.559192]          res 51/84:00:94:31:43/84:00:02:00:00/e4 Emask 0x10 (ATA bus error)
[   55.559553] ata7.00: status: { DRDY ERR }
[   55.559657] ata7.00: error: { ICRC ABRT }
[   55.559788] ata7: soft resetting link
[   55.720869] ata7.00: configured for UDMA/100
[   55.720899] ata7: EH complete
[   57.190714] readahead-collector: starting delayed service auditd
[   57.544900] readahead-collector: sorting
[   57.675320] readahead-collector: finished
[ 4920.016352] ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
[ 4920.016509] ata7.00: BMDMA stat 0x64
[ 4920.016597] ata7.00: failed command: WRITE DMA EXT
[ 4920.016721] ata7.00: cmd 35/00:d8:a5:42:97/00:01:04:00:00/e0 tag 0 dma 241664 out
[ 4920.016726]          res 51/84:00:7c:44:97/84:00:04:00:00/e0 Emask 0x10 (ATA bus error)
[ 4920.017084] ata7.00: status: { DRDY ERR }
[ 4920.017178] ata7.00: error: { ICRC ABRT }
[ 4920.017300] ata7: soft resetting link
[ 4920.178854] ata7.00: configured for UDMA/100
[ 4920.178885] ata7: EH complete
[11121.182764] ata6.00: exception Emask 0x10 SAct 0x1 SErr 0x400100 action 0x6
[11121.182901] ata6.00: irq_stat 0x08000000
[11121.182982] ata6: SError: { UnrecovData Handshk }
[11121.183094] ata6.00: failed command: READ FPDMA QUEUED
[11121.183224] ata6.00: cmd 60/88:00:17:24:d5/01:00:89:00:00/40 tag 0 ncq 200704 in
[11121.183229]          res 40/00:00:17:24:d5/00:00:89:00:00/40 Emask 0x10 (ATA bus error)
[11121.183533] ata6.00: status: { DRDY }
[11121.183625] ata6: hard resetting link
[11121.712457] ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[11121.712578] ata7.00: BMDMA stat 0x64
[11121.712647] ata7.00: failed command: WRITE DMA
[11121.712736] ata7.00: cmd ca/00:08:9d:ca:4f/00:00:00:00:00/e0 tag 0 dma 4096 out
[11121.712740]          res 51/04:00:08:00:00/84:00:04:00:00/00 Emask 0x1 (device error)
[11121.712992] ata7.00: status: { DRDY ERR }
[11121.713064] ata7.00: error: { ABRT }
[11122.190032] ata6: failed to resume link (SControl 0)
[11126.736197] ata7.00: qc timeout (cmd 0xec)
[11126.736208] ata7.00: failed to IDENTIFY (I/O error, err_mask=0x5)
[11126.736215] ata7.00: revalidation failed (errno=-5)
[11126.787036] ata7: soft resetting link
[11127.739031] ata6: failed to reset engine (errno=-5)
[11131.683033] ata6: softreset failed (1st FIS failed)
[11131.683129] ata6: hard resetting link
[11131.987036] ata7: link is slow to respond, please be patient (ready=0)
[11132.690038] ata6: failed to resume link (SControl D3A)
[11136.832030] ata7: SRST failed (errno=-16)
[11136.832118] ata7: soft resetting link
[11137.005982] ata7.01: NODEV after polling detection
[11137.012109] ata7.00: both IDENTIFYs aborted, assuming NODEV
[11137.012117] ata7.00: revalidation failed (errno=-2)
[11138.239032] ata6: failed to reset engine (errno=-5)
[11141.994043] ata7: soft resetting link
[11142.191023] ata6: softreset failed (1st FIS failed)
[11142.191123] ata6: hard resetting link
[11142.197123] ata7.00: both IDENTIFYs aborted, assuming NODEV
[11142.197131] ata7.00: revalidation failed (errno=-2)
[11142.197225] ata7.00: disabled
[11142.197259] ata7: EH complete
[11142.197302] sd 7:0:0:0: [sde] Unhandled error code
[11142.197308] sd 7:0:0:0: [sde] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[11142.197317] sd 7:0:0:0: [sde] CDB: Write(10): 2a 00 00 4f ca 9d 00 00 08 00
[11142.197340] end_request: I/O error, dev sde, sector 5229213
[11142.197444] Buffer I/O error on device sde2, logical block 627546
[11142.197549] lost page write due to I/O error on sde2
[11142.197616] sd 7:0:0:0: [sde] Unhandled error code
[11142.197623] sd 7:0:0:0: [sde] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[11142.197631] sd 7:0:0:0: [sde] CDB: Write(10): 2a 00 00 53 5a cd 00 00 08 00
[11142.197652] end_request: I/O error, dev sde, sector 5462733
[11142.197753] Buffer I/O error on device sde2, logical block 656736
[11142.197859] lost page write due to I/O error on sde2
[11142.197888] sd 7:0:0:0: [sde] Unhandled error code
[11142.197894] sd 7:0:0:0: [sde] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[11142.197901] sd 7:0:0:0: [sde] CDB: Write(10): 2a 00 00 55 38 cd 00 00 08 00
[11142.197922] end_request: I/O error, dev sde, sector 5585101
[11142.198045] Buffer I/O error on device sde2, logical block 672032
[11142.198182] lost page write due to I/O error on sde2
[11142.198242] sd 7:0:0:0: [sde] Unhandled error code
[11142.198250] sd 7:0:0:0: [sde] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[11142.198260] sd 7:0:0:0: [sde] CDB: Write(10): 2a 00 02 48 2e 35 00 00 20 00
[11142.198289] end_request: I/O error, dev sde, sector 38284853
[11142.198427] JBD2: Detected IO errors while flushing file data on sde2-8
[11142.198465] Aborting journal on device sde2-8.
[11142.198657] sd 7:0:0:0: [sde] Unhandled error code
[11142.198663] sd 7:0:0:0: [sde] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[11142.198673] sd 7:0:0:0: [sde] CDB: Write(10): 2a 00 02 47 2f cd 00 00 08 00
[11142.198696] end_request: I/O error, dev sde, sector 38219725
[11142.198804] Buffer I/O error on device sde2, logical block 4751360
[11142.198876] EXT4-fs error (device sde2): ext4_journal_start_sb: Detected aborted journal
[11142.198887] EXT4-fs (sde2): Remounting filesystem read-only
[11142.199057] EXT4-fs error (device sde2) in ext4_reserve_inode_write: Journal has aborted
[11142.199302] lost page write due to I/O error on sde2
[11142.199326] JBD2: I/O error detected when updating journal superblock for sde2-8.
[11143.198050] ata6: failed to resume link (SControl 71)
[11143.198189] ata6: SATA link down (SStatus 0 SControl 71)
[11143.198244] ata6: hard resetting link
[11144.214055] ata6: failed to resume link (SControl 0)
[11153.198079] ata6: softreset failed (1st FIS failed)
[11153.198194] ata6: hard resetting link
[11154.211063] ata6: failed to resume link (SControl 0)
[11163.199080] ata6: softreset failed (1st FIS failed)
[11163.199194] ata6: hard resetting link
[11164.215062] ata6: failed to resume link (SControl 0)
[11198.200081] ata6: softreset failed (1st FIS failed)
[11198.200207] ata6: limiting SATA link speed to 1.5 Gbps
[11198.200223] ata6: hard resetting link
[11199.207055] ata6: failed to resume link (SControl 41)
[11199.207168] ata6: SATA link down (SStatus 0 SControl 41)
[11204.207052] ata6: hard resetting link
[11205.214059] ata6: failed to resume link (SControl 341)
[11205.214177] ata6: SATA link down (SStatus 0 SControl 341)
[11205.214193] ata6.00: disabled
[11205.214232] sd 5:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[11205.214241] sd 5:0:0:0: [sdd] Sense Key : Aborted Command [current] [descriptor]
[11205.214252] Descriptor sense data with sense descriptors (in hex):
[11205.214257]         72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
[11205.214282]         89 d5 24 17 
[11205.214292] sd 5:0:0:0: [sdd] Add. Sense: No additional sense information
[11205.214302] sd 5:0:0:0: [sdd] CDB: Read(10): 28 00 89 d5 24 17 00 01 88 00
[11205.214324] end_request: I/O error, dev sdd, sector 2312446999
[11205.214432] md/raid1:md1: sdd1: rescheduling sector 2312446936
[11205.214545] md/raid1:md1: sdd1: rescheduling sector 2312446976
[11205.214666] md/raid1:md1: sdd1: rescheduling sector 2312447008
[11205.214783] md/raid1:md1: sdd1: rescheduling sector 2312447040
[11205.220122] md/raid1:md1: sdd1: rescheduling sector 2312447072
[11205.225860] md/raid1:md1: sdd1: rescheduling sector 2312447104
[11205.231102] md/raid1:md1: sdd1: rescheduling sector 2312447136
[11205.236348] md/raid1:md1: sdd1: rescheduling sector 2312447168
[11205.241441] md/raid1:md1: sdd1: rescheduling sector 2312447200
[11205.246496] md/raid1:md1: sdd1: rescheduling sector 2312447232
[11205.251511] ata6: EH complete
[11205.251532] ata6.00: detaching (SCSI 5:0:0:0)
[11205.251539] sd 5:0:0:0: rejecting I/O to offline device
[11205.270380] md/raid1:md1: Disk failure on sdd1, disabling device.
[11205.270384] <1>md/raid1:md1: Operation continuing on 1 devices.
[11205.280429] sd 5:0:0:0: [sdd] Synchronizing SCSI cache
[11205.280613] sd 5:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[11205.280624] sd 5:0:0:0: [sdd] Stopping disk
[11205.280668] sd 5:0:0:0: [sdd] START_STOP FAILED
[11205.280678] sd 5:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[11205.329594] RAID1 conf printout:
[11205.329604]  --- wd:1 rd:2
[11205.329614]  disk 0, wo:1, o:0, dev:sdd1
[11205.329622]  disk 1, wo:0, o:1, dev:sda1
[11205.339037] RAID1 conf printout:
[11205.339046]  --- wd:1 rd:2
[11205.339054]  disk 1, wo:0, o:1, dev:sda1
[11260.230570] sd 7:0:0:0: [sde] Unhandled error code
[11260.230579] sd 7:0:0:0: [sde] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[11260.230588] sd 7:0:0:0: [sde] CDB: Read(10): 28 00 02 cc cb 45 00 00 88 00
[11260.230611] end_request: I/O error, dev sde, sector 46975813
[11260.235743] sd 7:0:0:0: [sde] Unhandled error code
[11260.235753] sd 7:0:0:0: [sde] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[11260.235765] sd 7:0:0:0: [sde] CDB: Read(10): 28 00 02 cc cb 4d 00 00 08 00
[11260.235794] end_request: I/O error, dev sde, sector 46975821
[11260.242640] sd 7:0:0:0: [sde] Unhandled error code
[11260.242649] sd 7:0:0:0: [sde] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[11260.242658] sd 7:0:0:0: [sde] CDB: Read(10): 28 00 02 cc cb 4d 00 00 08 00
[11260.242680] end_request: I/O error, dev sde, sector 46975821
[11260.250344] audit_printk_skb: 14 callbacks suppressed
[11260.250354] type=1305 audit(1297113914.027:23385): audit_pid=0 old=1692 auid=4294967295 ses=4294967295 subj=system_u:system_r:auditd_t:s0 res=1
[11260.253904] sd 7:0:0:0: [sde] Unhandled error code
[11260.253912] sd 7:0:0:0: [sde] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[11260.253922] sd 7:0:0:0: [sde] CDB: Read(10): 28 00 02 cc cb 4d 00 00 08 00
[11260.253944] end_request: I/O error, dev sde, sector 46975821
[11260.259921] sd 7:0:0:0: [sde] Unhandled error code
[11260.259928] sd 7:0:0:0: [sde] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[11260.259937] sd 7:0:0:0: [sde] CDB: Read(10): 28 00 02 cc cb 4d 00 00 08 00
[11260.259960] end_request: I/O error, dev sde, sector 46975821
[11260.264804] type=1701 audit(1297113914.041:23386): auid=4294967295 uid=0 gid=0 ses=4294967295 subj=system_u:system_r:smbd_t:s0 pid=2668 comm="smbd" sig=7
[11485.367071] sd 7:0:0:0: [sde] Unhandled error code
[11485.367079] sd 7:0:0:0: [sde] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[11485.367089] sd 7:0:0:0: [sde] CDB: Read(10): 28 00 04 8f 8b 85 00 00 20 00
[11485.367111] end_request: I/O error, dev sde, sector 76516229
[11485.371928] sd 7:0:0:0: [sde] Unhandled error code
[11485.371935] sd 7:0:0:0: [sde] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[11485.371944] sd 7:0:0:0: [sde] CDB: Read(10): 28 00 04 8f 8b 85 00 00 08 00
[11485.371966] end_request: I/O error, dev sde, sector 76516229
[11485.376828] sd 7:0:0:0: [sde] Unhandled error code
[11485.376834] sd 7:0:0:0: [sde] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[11485.376842] sd 7:0:0:0: [sde] CDB: Read(10): 28 00 04 8f 8b 85 00 00 08 00
[11485.376864] end_request: I/O error, dev sde, sector 76516229


Any idea?

Comment 8 Edouard Bourguignon 2011-02-09 15:00:05 UTC
Motherboard is a Gigabyte GA-D510UD, sde (WDC WD3200AAJB-00J3A0, rev01.03E01) is the system disk on PATA. The trace above shows that I also lost contact with one of the SATA drive (sdd) in my RAID Array. It's now back online, just after a reboot.

I've been using this configuration for many years, this problem only occurs since I changed the motherboard. I tried with a RH5.4 and problem seems worse but the errors are the same in the log (DMA write failed etc).

I've upgraded the BIOS to version F4 from Gigabyte, seems the system was a bit more stable but still not usable. I can't read/write data on my disks more than a few hours... Then I/O errors occurs, and system hangs, have to magic sys keys to reboot it. 
Before BIOS upgrade I had to boot with "pci=nomsi,nommconf hda=noprobe hdc=noprobe" parameters to have normal performance. So may be the problem is in the BIOS or something wrong about the Gigabyte IDE controller, but how can I be sure? I will try to contact Gigabyte.

The sde disk is new, the power supply is new, motherboard is quite new too. Only SATA drives are a bit old. None of the disk fails during SMART tests. But I can try to switch back to the old motherboard, just to be sure the problem is not on the disks side.
I also tried with only the PATA disk drive connected, same problem.

What information do you need to troubelshoot this problem?

Comment 9 Edouard Bourguignon 2011-02-17 11:05:38 UTC
With boot parameter noapic and "options libata noacpi=1" in modprobe.conf I get a better uptime, errors are still there but the kernel has not yet hang. I have a huge/awesome uptime of 5 days! A new record on that machine.
I also upgraded the kernel to 2.6.35.11-83.fc14.x86_64, but don't see anything about ata in the changelog.

[260146.430951] ata5.00: configured for UDMA/100
[260146.435931] ata5: EH complete
[260153.636885] ata5.00: limiting speed to UDMA/66:PIO4
[260153.641998] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
[260153.647220] ata5.00: BMDMA stat 0x24
[260153.652278] ata5.00: failed command: WRITE DMA EXT
[260153.657353] ata5.00: cmd 35/00:60:fd:74:47/00:01:02:00:00/e0 tag 0 dma 180224 out
[260153.657358]          res 51/84:00:5c:76:47/84:00:02:00:00/e0 Emask 0x10 (ATA bus error)
[260153.667928] ata5.00: status: { DRDY ERR }
[260153.673151] ata5.00: error: { ICRC ABRT }
[260153.678250] ata5: soft resetting link
[260153.844887] ata5.00: configured for UDMA/66
[260153.849910] ata5: EH complete

Any news? I just remember, but not sure, that I add the problem before I changed the motherboard. That's why I changed the PATA disk first, thinking at that moment that the disk was in fault. 
I think I will remove the PATA disk, and only have sata. I hope the controller/kernel will work better. May be the problem comes from the SATA/PATA mix on the same controller?

-[0000:00]-+-00.0  Intel Corporation N10 Family DMI Bridge
           +-02.0  Intel Corporation N10 Family Integrated Graphics Controller
           +-1c.0-[01]----00.0  Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller
           +-1c.1-[02]--+-00.0  JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller
           |            \-00.1  JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller
           +-1d.0  Intel Corporation N10/ICH 7 Family USB UHCI Controller #1
           +-1d.1  Intel Corporation N10/ICH 7 Family USB UHCI Controller #2
           +-1d.2  Intel Corporation N10/ICH 7 Family USB UHCI Controller #3
           +-1d.3  Intel Corporation N10/ICH 7 Family USB UHCI Controller #4
           +-1d.7  Intel Corporation N10/ICH 7 Family USB2 EHCI Controller
           +-1e.0-[03]--
           +-1f.0  Intel Corporation NM10 Family LPC Controller
           +-1f.2  Intel Corporation N10/ICH7 Family SATA IDE Controller
           \-1f.3  Intel Corporation N10/ICH 7 Family SMBus Controller

2 SATA disks are on one SATA only controller (don't know which one, any help to find out?).
2 SATA + 1 PATA disks are on the SATA+IDE Controller.

Any idea or tests I can do?

Comment 10 Edouard Bourguignon 2011-05-03 06:55:51 UTC
Problem still here on kernel-2.6.35.12-90.fc14.x86_64

[   50.510502] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
[   50.510618] ata5.00: BMDMA stat 0x24
[   50.510688] ata5.00: failed command: WRITE DMA
[   50.510780] ata5.00: cmd ca/00:08:8d:31:43/00:00:00:00:00/e4 tag 0 dma 4096 out
[   50.510784]          res 51/84:00:94:31:43/84:00:02:00:00/e4 Emask 0x10 (ATA bus error)
[   50.511070] ata5.00: status: { DRDY ERR }
[   50.511149] ata5.00: error: { ICRC ABRT }
[   50.511251] ata5: soft resetting link
[   50.673207] ata5.00: configured for UDMA/66
[   50.673320] ata5: EH complete

Should I try on Fedora 15 beta?

Comment 11 Edouard Bourguignon 2011-05-03 15:52:10 UTC
Answer from Gigabyte:

"Hello,

We don't have any solution at the moment for your problem.

Regards,
Technical Support GIGABYTE."

Comment 12 Chuck Ebbert 2011-05-06 13:54:39 UTC
(In reply to comment #10)
> [   50.511070] ata5.00: status: { DRDY ERR }
> [   50.511149] ata5.00: error: { ICRC ABRT }

You're getting CRC errors, which is usually a sign of some kind of hardware problem (bad/loose cables, wrong cable configuration with PATA drives, insufficient power supply etc.)

Comment 13 Edouard Bourguignon 2011-05-07 10:47:29 UTC
Thank you Chuck. I agree that CRC errors mean some kind of hardware failure, but I've checked everything, changed everything, problem is still occuring. Power supply is ok, I have the problem with mobo and only 1 PATA disk connected, which should need about 40W only... However power supply is brand new now. I've changed all the cables by new ones, same problem. Even the PATA disk is new.

But well, I'm going to check agin and see if there is any loose cable. I'm a bit distrubed by the fact that it's really better since BIOS upgrade (no more hang), what make me thinks it could be software related?

Comment 14 Edouard Bourguignon 2011-05-16 18:55:28 UTC
I've just upgraded to fc15, and for the moment (1day of uptime), no error. So far so good, I hope this is fixed now.

Comment 15 Edouard Bourguignon 2011-05-23 08:17:24 UTC
After a few days of uptime, still nothing. But when I remove the nolapic option from cmdline, same errors are back:


[   26.815639] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
[   26.820247] ata5.00: BMDMA stat 0x24
[   26.824781] ata5.00: failed command: WRITE DMA EXT
[   26.829294] ata5.00: cmd 35/00:00:fd:37:47/00:04:02:00:00/e0 tag 0 dma 524288 out
[   26.838226] ata5.00: status: { DRDY ERR }
[   26.842574] ata5.00: error: { ICRC ABRT }
[   26.846866] ata5: soft resetting link
[   27.012824] ata5.00: configured for UDMA/100
[   27.017096] ata5: EH complete

kernel-2.6.38.6-27.fc15.x86_64. At least the machine doesn't hang anymore.

Comment 16 Edouard Bourguignon 2011-07-29 12:42:55 UTC
I'm trying a new approach.

This motherboard has 2 SATA controllers. The PATA is controlled by the 2nd SATA controller (named GSATA for Gigabyte Sata?).

With the following settings, there come my problems:

SATA1 => 750GB HDD SATA 1.5Gbps
SATA2 => 2TB HDD SATA 3Gbps
GSATA1 => 750GB HDD SATA 1.5Gbps
GSATA2 => 2TB HDD SATA 3Gbps
GPATA1 => HDD UDMA133

With the following settings it seems to work fine:
SATA1 => 750GB HDD SATA 1.5Gbps
SATA2 => 2TB HDD SATA 1.5Gbps
GSATA1 => 750GB HDD SATA 3Gbps
GSATA2 => 2TB HDD SATA 3Gbps
GPATA1 => HDD UDMA133

So it seems impossible to mix RAID1 array disks on different controllers on this mobo...

Comment 17 Edouard Bourguignon 2011-07-29 12:45:03 UTC
I meant

With the following settings, there come my problems:

SATA1 => 750GB HDD SATA 1.5Gbps
SATA2 => 2TB HDD SATA 3Gbps
GSATA1 => 750GB HDD SATA 1.5Gbps
GSATA2 => 2TB HDD SATA 3Gbps
GPATA1 => HDD UDMA133

With the following settings it seems to work fine:
SATA1 => 750GB HDD SATA 1.5Gbps
SATA2 => 750GB HDD SATA 1.5Gbps
GSATA1 => 2TB HDD SATA 3Gbps
GSATA2 => 2TB HDD SATA 3Gbps
GPATA1 => HDD UDMA133

I just swap HDDs on port SATA1 and GSATA2, same SATA speed on same controller and now it's fine.

Comment 18 Edouard Bourguignon 2011-12-04 23:23:25 UTC
Problem disappears on F16.