| Summary: | eSATA hot-swap broken with nvidia sata controller [10de:0ad4], port is unusable until either reboot or execute a manual port rescan without drives | ||
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Reartes Guillermo <rtguille> |
| Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
| Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 15 | CC: | gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2012-06-04 17:47:58 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Update:
I can reproduce it on F13, so it is not a new issue.
I got a new drive yesterday & put it on a spare enclosure (same model)
The i discovered it because i started to use both enclosures.
F13 kernel 2.6.34.8-68.fc13.x86_64
So the hot-swap functionality is not working properly on:
00:09.0 SATA controller [0106]: nVidia Corporation MCP78S [GeForce 8200] AHCI Controller [10de:0ad4] (rev a2)
This is on F13:
Power on enclosure#1
Mar 12 15:28:26 ulquiorra kernel: ata6: exception Emask 0x10 SAct 0x0 SErr 0x4040000 action 0xe frozen
Mar 12 15:28:26 ulquiorra kernel: ata6: irq_stat 0x00000040, connection status changed
Mar 12 15:28:26 ulquiorra kernel: ata6: SError: { CommWake DevExch }
Mar 12 15:28:26 ulquiorra kernel: ata6: hard resetting link
Mar 12 15:28:27 ulquiorra kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 12 15:28:27 ulquiorra kernel: ata6.00: ATA-8: WDC WD20EARS-00MVWB0, 50.0AB50, max UDMA/133
Mar 12 15:28:27 ulquiorra kernel: ata6.00: 3907029168 sectors, multi 0: LBA48
Mar 12 15:28:27 ulquiorra kernel: ata6.00: configured for UDMA/133
Mar 12 15:28:27 ulquiorra kernel: ata6: EH complete
Mar 12 15:28:27 ulquiorra kernel: scsi 5:0:0:0: Direct-Access ATA WDC WD20EARS-00M 50.0 PQ: 0 ANSI: 5
Mar 12 15:28:27 ulquiorra kernel: sd 5:0:0:0: Attached scsi generic sg3 type 0
Mar 12 15:28:27 ulquiorra kernel: sd 5:0:0:0: [sdc] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
Mar 12 15:28:27 ulquiorra kernel: sd 5:0:0:0: [sdc] Write Protect is off
Mar 12 15:28:27 ulquiorra kernel: sd 5:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Mar 12 15:28:27 ulquiorra kernel: sdc: sdc1
Mar 12 15:28:27 ulquiorra kernel: sd 5:0:0:0: [sdc] Attached SCSI disk
Mount & umount an fs
Mar 12 15:28:35 ulquiorra kernel: kjournald starting. Commit interval 5 seconds
Mar 12 15:28:35 ulquiorra kernel: EXT3-fs (sdc1): using internal journal
Mar 12 15:28:35 ulquiorra kernel: EXT3-fs (sdc1): mounted filesystem with ordered data mode
Mar 12 15:28:35 ulquiorra hald: mounted /dev/sdc1 on behalf of uid 500
Mar 12 15:28:37 ulquiorra hald: unmounted /dev/sdc1 from '/media/videos-2tb' on behalf of uid 500
Poweroff enclosure #1
Mar 12 15:28:46 ulquiorra kernel: ata6: exception Emask 0x10 SAct 0x0 SErr 0x1990000 action 0xe frozen
Mar 12 15:28:46 ulquiorra kernel: ata6: irq_stat 0x00400000, PHY RDY changed
Mar 12 15:28:46 ulquiorra kernel: ata6: SError: { PHYRdyChg 10B8B Dispar LinkSeq TrStaTrns }
Mar 12 15:28:46 ulquiorra kernel: ata6: hard resetting link
Mar 12 15:28:46 ulquiorra kernel: ata6: SATA link down (SStatus 0 SControl 300)
Mar 12 15:28:51 ulquiorra kernel: ata6: hard resetting link
Mar 12 15:28:52 ulquiorra kernel: ata6: SATA link down (SStatus 0 SControl 300)
Mar 12 15:28:52 ulquiorra kernel: ata6: limiting SATA link speed to 1.5 Gbps
Mar 12 15:28:57 ulquiorra kernel: ata6: hard resetting link
Mar 12 15:28:57 ulquiorra kernel: ata6: SATA link down (SStatus 0 SControl 310)
Mar 12 15:28:57 ulquiorra kernel: ata6.00: disabled
Mar 12 15:28:57 ulquiorra kernel: ata6: EH complete
Mar 12 15:28:57 ulquiorra kernel: ata6.00: detaching (SCSI 5:0:0:0)
Mar 12 15:28:57 ulquiorra kernel: sd 5:0:0:0: [sdc] Synchronizing SCSI cache
Mar 12 15:28:57 ulquiorra kernel: sd 5:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 12 15:28:57 ulquiorra kernel: sd 5:0:0:0: [sdc] Stopping disk
Mar 12 15:28:57 ulquiorra kernel: sd 5:0:0:0: [sdc] START_STOP FAILED
Mar 12 15:28:57 ulquiorra kernel: sd 5:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Poweron enclosure #2
Mar 12 15:29:39 ulquiorra kernel: ata6: exception Emask 0x10 SAct 0x0 SErr 0x4140000 action 0xe frozen
Mar 12 15:29:39 ulquiorra kernel: ata6: irq_stat 0x00000040, connection status changed
Mar 12 15:29:39 ulquiorra kernel: ata6: SError: { CommWake Dispar DevExch }
Mar 12 15:29:39 ulquiorra kernel: ata6: hard resetting link
Mar 12 15:29:49 ulquiorra kernel: ata6: softreset failed (device not ready)
Mar 12 15:29:49 ulquiorra kernel: ata6: hard resetting link
Mar 12 15:29:59 ulquiorra kernel: ata6: softreset failed (device not ready)
Mar 12 15:29:59 ulquiorra kernel: ata6: hard resetting link
Mar 12 15:30:10 ulquiorra kernel: ata6: link is slow to respond, please be patient (ready=0)
Mar 12 15:30:34 ulquiorra kernel: ata6: softreset failed (device not ready)
Mar 12 15:30:34 ulquiorra kernel: ata6: limiting SATA link speed to 1.5 Gbps
Mar 12 15:30:34 ulquiorra kernel: ata6: hard resetting link
Mar 12 15:30:39 ulquiorra kernel: ata6: softreset failed (device not ready)
Mar 12 15:30:39 ulquiorra kernel: ata6: reset failed, giving up
Mar 12 15:30:39 ulquiorra kernel: ata6: EH complete
workaround: running this command once, after being hit with the problem seems to fix the issue until reboot. pwr on enclosure#1 pwr off enclosure#1 pwr on enclosure#2 ---> ata port unusable, as stated above. to fix execute: # echo "- - - " > /sys/class/scsi_host/host5/scan pwr off enclosure#2 pwr on enclosure#1, now it is detected! Hot-Swap support is broken with the SATA controller specified below.
A manual workaround can be used, specified below.
HW:
* 2 x Nexstar CX (eSATA + USB2)
* 2 x WD 2TB Green (Adv. Fmt) (SAME MODEL)
Affected controller:
* nVidia Corporation MCP78S [GeForce 8200] AHCI Controller [10de:0ad4] (rev a2)
Workaround:
After powering down the disk enclosure to be removed, execute:
# echo "- - - " > /sys/class/scsi_host/host4/scan
As there is nothing in ATA5, it seems to purge something.
After executing this, change the enclosure and power it on again, it will be detected.
--------------------------
Tried again with kernel 2.6.38.2-9.fc15.x86_64
Disk#0 >>> POWERON
Disk#0 >>> Detected by the kernel
Disk#0 >>> POWEROFF
Apr 10 14:21:57 xxxxxx kernel: [14222.365736] ata5: exception Emask 0x10 SAct 0x0 SErr 0x1990000 action 0xe frozen
Apr 10 14:21:57 xxxxxx kernel: [14222.365747] ata5: irq_stat 0x00400000, PHY RDY changed
Apr 10 14:21:57 xxxxxx kernel: [14222.365757] ata5: SError: { PHYRdyChg 10B8B Dispar LinkSeq TrStaTrns }
Apr 10 14:21:57 xxxxxx kernel: [14222.365772] ata5: hard resetting link
Apr 10 14:21:57 xxxxxx kernel: [14223.088388] ata5: SATA link down (SStatus 0 SControl 300)
Apr 10 14:22:02 xxxxxx kernel: [14228.088402] ata5: hard resetting link
Apr 10 14:22:03 xxxxxx kernel: [14228.393359] ata5: SATA link down (SStatus 0 SControl 300)
Apr 10 14:22:03 xxxxxx kernel: [14228.393374] ata5: limiting SATA link speed to 1.5 Gbps
Apr 10 14:22:08 xxxxxx kernel: [14233.393400] ata5: hard resetting link
Apr 10 14:22:08 xxxxxx kernel: [14233.698389] ata5: SATA link down (SStatus 0 SControl 310)
Apr 10 14:22:08 xxxxxx kernel: [14233.698403] ata5.00: disabled
Apr 10 14:22:08 xxxxxx kernel: [14233.698425] ata5: EH complete
Apr 10 14:22:08 xxxxxx kernel: [14233.698442] ata5.00: detaching (SCSI 4:0:0:0)
Apr 10 14:22:08 xxxxxx kernel: [14233.712537] sd 4:0:0:0: [sdc] Synchronizing SCSI cache
Apr 10 14:22:08 xxxxxx kernel: [14233.712580] sd 4:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Apr 10 14:22:08 xxxxxx kernel: [14233.712584] sd 4:0:0:0: [sdc] Stopping disk
Apr 10 14:22:08 xxxxxx kernel: [14233.712591] sd 4:0:0:0: [sdc] START_STOP FAILED
Apr 10 14:22:08 xxxxxx kernel: [14233.712593] sd 4:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Waited for 20 seconds...
Disk#0 >>> UNPLUGGED
Disk#1 >>> PLUGGED
Disk#1 >>> POWERON
Disk#1 >>> Drive NOT detected
Apr 10 14:23:22 xxxxxx kernel: [14307.342869] ata5: exception Emask 0x10 SAct 0x0 SErr 0x5940000 action 0xe frozen
Apr 10 14:23:22 xxxxxx kernel: [14307.342880] ata5: irq_stat 0x00000040, connection status changed
Apr 10 14:23:22 xxxxxx kernel: [14307.342890] ata5: SError: { CommWake Dispar LinkSeq TrStaTrns DevExch }
Apr 10 14:23:22 xxxxxx kernel: [14307.342907] ata5: hard resetting link
Apr 10 14:23:27 xxxxxx kernel: [14313.114376] ata5: link is slow to respond, please be patient (ready=0)
Apr 10 14:23:32 xxxxxx kernel: [14317.347106] ata5: COMRESET failed (errno=-16)
Apr 10 14:23:32 xxxxxx kernel: [14317.347120] ata5: hard resetting link
Apr 10 14:23:37 xxxxxx kernel: [14323.119409] ata5: link is slow to respond, please be patient (ready=0)
Apr 10 14:23:42 xxxxxx kernel: [14327.354374] ata5: COMRESET failed (errno=-16)
Apr 10 14:23:42 xxxxxx kernel: [14327.354387] ata5: hard resetting link
Apr 10 14:23:47 xxxxxx kernel: [14333.126374] ata5: link is slow to respond, please be patient (ready=0)
Apr 10 14:24:17 xxxxxx kernel: [14362.355374] ata5: COMRESET failed (errno=-16)
Apr 10 14:24:17 xxxxxx kernel: [14362.355388] ata5: limiting SATA link speed to 1.5 Gbps
Apr 10 14:24:17 xxxxxx kernel: [14362.355396] ata5: hard resetting link
Apr 10 14:24:22 xxxxxx kernel: [14367.362464] ata5: COMRESET failed (errno=-16)
Apr 10 14:24:22 xxxxxx kernel: [14367.362476] ata5: reset failed, giving up
Apr 10 14:24:22 xxxxxx kernel: [14367.362487] ata5: EH complete
# echo "- - - " > /sys/class/scsi_host/host4/scan
Apr 10 14:25:41 xxxxxx kernel: [14447.022753] ata5: hard resetting link
Apr 10 14:25:47 xxxxxx kernel: [14452.376367] ata5: link is slow to respond, please be patient (ready=0)
Apr 10 14:25:51 xxxxxx kernel: [14457.068379] ata5: COMRESET failed (errno=-16)
Apr 10 14:25:51 xxxxxx kernel: [14457.068393] ata5: hard resetting link
Apr 10 14:25:57 xxxxxx kernel: [14462.423410] ata5: link is slow to respond, please be patient (ready=0)
Apr 10 14:26:01 xxxxxx kernel: [14467.115054] ata5: COMRESET failed (errno=-16)
Apr 10 14:26:01 xxxxxx kernel: [14467.115068] ata5: hard resetting link
Disk#1 >>> Drive STILL not detected
Disk#1 >>> POWEROFF
# echo "- - - " > /sys/class/scsi_host/host4/scan
Apr 10 14:26:46 xxxxxx kernel: [14511.614712] ata5: hard resetting link
Apr 10 14:26:46 xxxxxx kernel: [14511.919362] ata5: SATA link down (SStatus 10 SControl 300)
Apr 10 14:26:46 xxxxxx kernel: [14511.919381] ata5: EH complete
Disk#1 >>> POWERON
Disk#1 >>> Detected by the kernel
Apr 10 14:27:09 xxxxxx kernel: [14534.355145] ata5: exception Emask 0x10 SAct 0x0 SErr 0x5840000 action 0xe frozen
Apr 10 14:27:09 xxxxxx kernel: [14534.355155] ata5: irq_stat 0x00000040, connection status changed
Apr 10 14:27:09 xxxxxx kernel: [14534.355166] ata5: SError: { CommWake LinkSeq TrStaTrns DevExch }
Apr 10 14:27:09 xxxxxx kernel: [14534.355183] ata5: hard resetting link
Apr 10 14:27:09 xxxxxx kernel: [14535.078374] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 10 14:27:09 xxxxxx kernel: [14535.097086] ata5.00: ATA-8: WDC WD20EARS-00MVWB0, 50.0AB50, max UDMA/133
Apr 10 14:27:09 xxxxxx kernel: [14535.097096] ata5.00: 3907029168 sectors, multi 0: LBA48
Apr 10 14:27:09 xxxxxx kernel: [14535.100818] ata5.00: configured for UDMA/133
Apr 10 14:27:09 xxxxxx kernel: [14535.100835] ata5: EH complete
Apr 10 14:27:09 xxxxxx kernel: [14535.101057] scsi 4:0:0:0: Direct-Access ATA WDC WD20EARS-00M 50.0 PQ: 0 ANSI: 5
Apr 10 14:27:09 xxxxxx kernel: [14535.103725] sd 4:0:0:0: [sdc] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
Apr 10 14:27:09 xxxxxx kernel: [14535.103852] sd 4:0:0:0: [sdc] Write Protect is off
Apr 10 14:27:09 xxxxxx kernel: [14535.103914] sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Apr 10 14:27:09 xxxxxx kernel: [14535.108858] sd 4:0:0:0: Attached scsi generic sg3 type 0
Apr 10 14:27:09 xxxxxx kernel: [14535.127100] sdc: sdc1
Apr 10 14:27:09 xxxxxx kernel: [14535.127727] sd 4:0:0:0: [sdc] Attached SCSI disk
There were a number of eSATA fixes that went into the 2.6.43 kernel updates. You might want to try those if you haven't already. If that doesn't solve the problem, please open a new bug with the relevant information. |
Description of problem: I had two eSATA enclosures. Each one has a 2TB hd. I cannot use the other enclosure after using any of them first. After using any enclosure and umounting, then powering them off, the other enclosure is not propery detected by the kernel. Also puting back the previously detected unit back is also not detected anymore. The ata port is unusable until reboot (tried with 2 sata ports and it is not port related) The only workarount is to use one esata port for each enclosure. Not a desirable workaround. Version-Release number of selected component (if applicable): 2.6.38-0.rc8.git0.1.fc15.x86_64 How reproducible: allways Steps to Reproduce: I finished transfering data to the drive. I umounted the drive's partition. # tail -f /var/log/messages Mar 12 13:51:52 ulquiorra kernel: [ 7996.162732] md127: unknown partition table Mar 12 13:52:00 ulquiorra kernel: [ 8004.064764] EXT3-fs: barriers not enabled Mar 12 13:52:00 ulquiorra kernel: [ 8004.070992] kjournald starting. Commit interval 5 seconds Mar 12 13:52:00 ulquiorra kernel: [ 8004.099845] EXT3-fs (md127): using internal journal Mar 12 13:52:00 ulquiorra kernel: [ 8004.099857] EXT3-fs (md127): mounted filesystem with ordered data mode Mar 12 13:53:05 ulquiorra kernel: [ 8069.044644] EXT3-fs: barriers not enabled Mar 12 13:53:05 ulquiorra kernel: [ 8069.051054] kjournald starting. Commit interval 5 seconds Mar 12 13:53:05 ulquiorra kernel: [ 8069.079907] EXT3-fs (md127): using internal journal Mar 12 13:53:05 ulquiorra kernel: [ 8069.079921] EXT3-fs (md127): mounted filesystem with ordered data mode Mar 12 13:53:13 ulquiorra kernel: [ 8076.598705] EXT4-fs (sdc1): mounted filesystem with ordered data mode. Opts: (null) I am going to poweroff the device. Mar 12 14:03:31 ulquiorra kernel: [ 8695.290685] ata6: exception Emask 0x10 SAct 0x0 SErr 0x1990000 action 0xe frozen Mar 12 14:03:31 ulquiorra kernel: [ 8695.290696] ata6: irq_stat 0x00400000, PHY RDY changed Mar 12 14:03:31 ulquiorra kernel: [ 8695.290706] ata6: SError: { PHYRdyChg 10B8B Dispar LinkSeq TrStaTrns } Mar 12 14:03:31 ulquiorra kernel: [ 8695.290722] ata6: hard resetting link Mar 12 14:03:32 ulquiorra kernel: [ 8696.013381] ata6: SATA link down (SStatus 0 SControl 300) Mar 12 14:03:37 ulquiorra kernel: [ 8701.013413] ata6: hard resetting link Mar 12 14:03:38 ulquiorra kernel: [ 8701.318080] ata6: SATA link down (SStatus 0 SControl 300) Mar 12 14:03:38 ulquiorra kernel: [ 8701.318095] ata6: limiting SATA link speed to 1.5 Gbps Mar 12 14:03:43 ulquiorra kernel: [ 8706.318366] ata6: hard resetting link Mar 12 14:03:43 ulquiorra kernel: [ 8706.623382] ata6: SATA link down (SStatus 0 SControl 310) Mar 12 14:03:43 ulquiorra kernel: [ 8706.623397] ata6.00: disabled Mar 12 14:03:43 ulquiorra kernel: [ 8706.623419] ata6: EH complete Mar 12 14:03:43 ulquiorra kernel: [ 8706.623436] ata6.00: detaching (SCSI 5:0:0:0) Mar 12 14:03:43 ulquiorra kernel: [ 8706.634640] sd 5:0:0:0: [sdc] Synchronizing SCSI cache Mar 12 14:03:43 ulquiorra kernel: [ 8706.634723] sd 5:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK Mar 12 14:03:43 ulquiorra kernel: [ 8706.634736] sd 5:0:0:0: [sdc] Stopping disk Mar 12 14:03:43 ulquiorra kernel: [ 8706.634757] sd 5:0:0:0: [sdc] START_STOP FAILED Mar 12 14:03:43 ulquiorra kernel: [ 8706.634763] sd 5:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK Ok, now i replace the esata enclosure for the second one. Now i power on the enclosure. Mar 12 14:04:35 ulquiorra kernel: [ 8758.845511] ata6: exception Emask 0x10 SAct 0x0 SErr 0x5840000 action 0xe frozen Mar 12 14:04:35 ulquiorra kernel: [ 8758.845522] ata6: irq_stat 0x00000040, connection status changed Mar 12 14:04:35 ulquiorra kernel: [ 8758.845532] ata6: SError: { CommWake LinkSeq TrStaTrns DevExch } Mar 12 14:04:35 ulquiorra kernel: [ 8758.845549] ata6: hard resetting link Mar 12 14:04:41 ulquiorra kernel: [ 8764.569389] ata6: link is slow to respond, please be patient (ready=0) Mar 12 14:04:45 ulquiorra kernel: [ 8768.855376] ata6: COMRESET failed (errno=-16) Mar 12 14:04:45 ulquiorra kernel: [ 8768.855389] ata6: hard resetting link Mar 12 14:04:51 ulquiorra kernel: [ 8774.627384] ata6: link is slow to respond, please be patient (ready=0) Mar 12 14:04:55 ulquiorra kernel: [ 8778.860377] ata6: COMRESET failed (errno=-16) Mar 12 14:04:55 ulquiorra kernel: [ 8778.860389] ata6: hard resetting link Mar 12 14:05:01 ulquiorra kernel: [ 8784.632359] ata6: link is slow to respond, please be patient (ready=0) Mar 12 14:05:30 ulquiorra kernel: [ 8813.864401] ata6: COMRESET failed (errno=-16) Mar 12 14:05:30 ulquiorra kernel: [ 8813.864415] ata6: limiting SATA link speed to 1.5 Gbps Mar 12 14:05:30 ulquiorra kernel: [ 8813.864422] ata6: hard resetting link Mar 12 14:05:35 ulquiorra kernel: [ 8818.872389] ata6: COMRESET failed (errno=-16) Mar 12 14:05:35 ulquiorra kernel: [ 8818.872401] ata6: reset failed, giving up Mar 12 14:05:35 ulquiorra kernel: [ 8818.872412] ata6: EH complete The enclosure light is blinking fast. This keeps going, so i power of the enclosure. Now i change the enclosure. Using the first enclosure, i power on the unit. No messages, the enclosure ligth are blinking fast (clearly an error) So i must reboot to use any enclosure again. ---------- Actual results: I can only use one enclosure per boot. must reboot to use the other(s) Expected results: Use one enclosure, power off it. conect the other, use it, etc. Additional info: The disks inside are both WD 2T green 4K sectors Ç& the same model.