Bug 684450

Summary: eSATA hot-swap broken with nvidia sata controller [10de:0ad4], port is unusable until either reboot or execute a manual port rescan without drives
Product: [Fedora] Fedora Reporter: Reartes Guillermo <rtguille>
Component: kernelAssignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: unspecified    
Version: 15CC: gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-06-04 17:47:58 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Reartes Guillermo 2011-03-12 17:30:20 UTC
Description of problem:

I had two eSATA enclosures. Each one has a 2TB hd.

I cannot use the other enclosure after using any of them first.
After using any enclosure and umounting, then powering them off,
the other enclosure is not propery detected by the kernel. Also
puting back the previously detected unit back is also not 
detected anymore. The ata port is unusable until reboot (tried
with 2 sata ports and it is not port related)

The only workarount is to use one esata port for each enclosure. 
Not a desirable workaround.



Version-Release number of selected component (if applicable):

2.6.38-0.rc8.git0.1.fc15.x86_64

How reproducible:
allways

Steps to Reproduce:
I finished transfering data to the drive. I umounted the drive's partition.

# tail -f /var/log/messages 
Mar 12 13:51:52 ulquiorra kernel: [ 7996.162732]  md127: unknown partition table
Mar 12 13:52:00 ulquiorra kernel: [ 8004.064764] EXT3-fs: barriers not enabled
Mar 12 13:52:00 ulquiorra kernel: [ 8004.070992] kjournald starting.  Commit interval 5 seconds
Mar 12 13:52:00 ulquiorra kernel: [ 8004.099845] EXT3-fs (md127): using internal journal
Mar 12 13:52:00 ulquiorra kernel: [ 8004.099857] EXT3-fs (md127): mounted filesystem with ordered data mode
Mar 12 13:53:05 ulquiorra kernel: [ 8069.044644] EXT3-fs: barriers not enabled
Mar 12 13:53:05 ulquiorra kernel: [ 8069.051054] kjournald starting.  Commit interval 5 seconds
Mar 12 13:53:05 ulquiorra kernel: [ 8069.079907] EXT3-fs (md127): using internal journal
Mar 12 13:53:05 ulquiorra kernel: [ 8069.079921] EXT3-fs (md127): mounted filesystem with ordered data mode
Mar 12 13:53:13 ulquiorra kernel: [ 8076.598705] EXT4-fs (sdc1): mounted filesystem with ordered data mode. Opts: (null)



I am going to poweroff the device.

Mar 12 14:03:31 ulquiorra kernel: [ 8695.290685] ata6: exception Emask 0x10 SAct 0x0 SErr 0x1990000 action 0xe frozen
Mar 12 14:03:31 ulquiorra kernel: [ 8695.290696] ata6: irq_stat 0x00400000, PHY RDY changed
Mar 12 14:03:31 ulquiorra kernel: [ 8695.290706] ata6: SError: { PHYRdyChg 10B8B Dispar LinkSeq TrStaTrns }
Mar 12 14:03:31 ulquiorra kernel: [ 8695.290722] ata6: hard resetting link
Mar 12 14:03:32 ulquiorra kernel: [ 8696.013381] ata6: SATA link down (SStatus 0 SControl 300)
Mar 12 14:03:37 ulquiorra kernel: [ 8701.013413] ata6: hard resetting link
Mar 12 14:03:38 ulquiorra kernel: [ 8701.318080] ata6: SATA link down (SStatus 0 SControl 300)
Mar 12 14:03:38 ulquiorra kernel: [ 8701.318095] ata6: limiting SATA link speed to 1.5 Gbps
Mar 12 14:03:43 ulquiorra kernel: [ 8706.318366] ata6: hard resetting link
Mar 12 14:03:43 ulquiorra kernel: [ 8706.623382] ata6: SATA link down (SStatus 0 SControl 310)
Mar 12 14:03:43 ulquiorra kernel: [ 8706.623397] ata6.00: disabled
Mar 12 14:03:43 ulquiorra kernel: [ 8706.623419] ata6: EH complete
Mar 12 14:03:43 ulquiorra kernel: [ 8706.623436] ata6.00: detaching (SCSI 5:0:0:0)
Mar 12 14:03:43 ulquiorra kernel: [ 8706.634640] sd 5:0:0:0: [sdc] Synchronizing SCSI cache
Mar 12 14:03:43 ulquiorra kernel: [ 8706.634723] sd 5:0:0:0: [sdc]  Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 12 14:03:43 ulquiorra kernel: [ 8706.634736] sd 5:0:0:0: [sdc] Stopping disk
Mar 12 14:03:43 ulquiorra kernel: [ 8706.634757] sd 5:0:0:0: [sdc] START_STOP FAILED
Mar 12 14:03:43 ulquiorra kernel: [ 8706.634763] sd 5:0:0:0: [sdc]  Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK

Ok, now i replace the esata enclosure for the second one.
Now i power on the enclosure.

Mar 12 14:04:35 ulquiorra kernel: [ 8758.845511] ata6: exception Emask 0x10 SAct 0x0 SErr 0x5840000 action 0xe frozen
Mar 12 14:04:35 ulquiorra kernel: [ 8758.845522] ata6: irq_stat 0x00000040, connection status changed
Mar 12 14:04:35 ulquiorra kernel: [ 8758.845532] ata6: SError: { CommWake LinkSeq TrStaTrns DevExch }
Mar 12 14:04:35 ulquiorra kernel: [ 8758.845549] ata6: hard resetting link
Mar 12 14:04:41 ulquiorra kernel: [ 8764.569389] ata6: link is slow to respond, please be patient (ready=0)
Mar 12 14:04:45 ulquiorra kernel: [ 8768.855376] ata6: COMRESET failed (errno=-16)
Mar 12 14:04:45 ulquiorra kernel: [ 8768.855389] ata6: hard resetting link
Mar 12 14:04:51 ulquiorra kernel: [ 8774.627384] ata6: link is slow to respond, please be patient (ready=0)
Mar 12 14:04:55 ulquiorra kernel: [ 8778.860377] ata6: COMRESET failed (errno=-16)
Mar 12 14:04:55 ulquiorra kernel: [ 8778.860389] ata6: hard resetting link
Mar 12 14:05:01 ulquiorra kernel: [ 8784.632359] ata6: link is slow to respond, please be patient (ready=0)
Mar 12 14:05:30 ulquiorra kernel: [ 8813.864401] ata6: COMRESET failed (errno=-16)
Mar 12 14:05:30 ulquiorra kernel: [ 8813.864415] ata6: limiting SATA link speed to 1.5 Gbps
Mar 12 14:05:30 ulquiorra kernel: [ 8813.864422] ata6: hard resetting link
Mar 12 14:05:35 ulquiorra kernel: [ 8818.872389] ata6: COMRESET failed (errno=-16)
Mar 12 14:05:35 ulquiorra kernel: [ 8818.872401] ata6: reset failed, giving up
Mar 12 14:05:35 ulquiorra kernel: [ 8818.872412] ata6: EH complete


The enclosure light is blinking fast.
This keeps going, so i power of the enclosure. 
Now i change the enclosure.
Using the first enclosure, i power on the unit.


No messages, the enclosure ligth are blinking fast (clearly an error)
So i must reboot to use any enclosure again.
----------
  
Actual results:
I can only use one enclosure per boot. must reboot to use the other(s)

Expected results:
Use one enclosure, power off it.
conect the other, use it, etc.

Additional info:

The disks inside are both WD 2T green 
4K sectors Ç& the same model.

Comment 1 Reartes Guillermo 2011-03-12 18:45:49 UTC
Update:

I can reproduce it on F13, so it is not a new issue.

I got a new drive yesterday & put it on a spare enclosure (same model)
The i discovered it because i started to use both enclosures.

F13 kernel  2.6.34.8-68.fc13.x86_64

So the hot-swap functionality is not working properly on:

00:09.0 SATA controller [0106]: nVidia Corporation MCP78S [GeForce 8200] AHCI Controller [10de:0ad4] (rev a2)

This is on F13:
Power on enclosure#1

Mar 12 15:28:26 ulquiorra kernel: ata6: exception Emask 0x10 SAct 0x0 SErr 0x4040000 action 0xe frozen
Mar 12 15:28:26 ulquiorra kernel: ata6: irq_stat 0x00000040, connection status changed
Mar 12 15:28:26 ulquiorra kernel: ata6: SError: { CommWake DevExch }
Mar 12 15:28:26 ulquiorra kernel: ata6: hard resetting link
Mar 12 15:28:27 ulquiorra kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 12 15:28:27 ulquiorra kernel: ata6.00: ATA-8: WDC WD20EARS-00MVWB0, 50.0AB50, max UDMA/133
Mar 12 15:28:27 ulquiorra kernel: ata6.00: 3907029168 sectors, multi 0: LBA48 
Mar 12 15:28:27 ulquiorra kernel: ata6.00: configured for UDMA/133
Mar 12 15:28:27 ulquiorra kernel: ata6: EH complete
Mar 12 15:28:27 ulquiorra kernel: scsi 5:0:0:0: Direct-Access     ATA      WDC WD20EARS-00M 50.0 PQ: 0 ANSI: 5
Mar 12 15:28:27 ulquiorra kernel: sd 5:0:0:0: Attached scsi generic sg3 type 0
Mar 12 15:28:27 ulquiorra kernel: sd 5:0:0:0: [sdc] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
Mar 12 15:28:27 ulquiorra kernel: sd 5:0:0:0: [sdc] Write Protect is off
Mar 12 15:28:27 ulquiorra kernel: sd 5:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Mar 12 15:28:27 ulquiorra kernel: sdc: sdc1
Mar 12 15:28:27 ulquiorra kernel: sd 5:0:0:0: [sdc] Attached SCSI disk


Mount & umount an fs

Mar 12 15:28:35 ulquiorra kernel: kjournald starting.  Commit interval 5 seconds
Mar 12 15:28:35 ulquiorra kernel: EXT3-fs (sdc1): using internal journal
Mar 12 15:28:35 ulquiorra kernel: EXT3-fs (sdc1): mounted filesystem with ordered data mode
Mar 12 15:28:35 ulquiorra hald: mounted /dev/sdc1 on behalf of uid 500
Mar 12 15:28:37 ulquiorra hald: unmounted /dev/sdc1 from '/media/videos-2tb' on behalf of uid 500


Poweroff enclosure #1

Mar 12 15:28:46 ulquiorra kernel: ata6: exception Emask 0x10 SAct 0x0 SErr 0x1990000 action 0xe frozen
Mar 12 15:28:46 ulquiorra kernel: ata6: irq_stat 0x00400000, PHY RDY changed
Mar 12 15:28:46 ulquiorra kernel: ata6: SError: { PHYRdyChg 10B8B Dispar LinkSeq TrStaTrns }
Mar 12 15:28:46 ulquiorra kernel: ata6: hard resetting link
Mar 12 15:28:46 ulquiorra kernel: ata6: SATA link down (SStatus 0 SControl 300)
Mar 12 15:28:51 ulquiorra kernel: ata6: hard resetting link
Mar 12 15:28:52 ulquiorra kernel: ata6: SATA link down (SStatus 0 SControl 300)
Mar 12 15:28:52 ulquiorra kernel: ata6: limiting SATA link speed to 1.5 Gbps
Mar 12 15:28:57 ulquiorra kernel: ata6: hard resetting link
Mar 12 15:28:57 ulquiorra kernel: ata6: SATA link down (SStatus 0 SControl 310)
Mar 12 15:28:57 ulquiorra kernel: ata6.00: disabled
Mar 12 15:28:57 ulquiorra kernel: ata6: EH complete
Mar 12 15:28:57 ulquiorra kernel: ata6.00: detaching (SCSI 5:0:0:0)
Mar 12 15:28:57 ulquiorra kernel: sd 5:0:0:0: [sdc] Synchronizing SCSI cache
Mar 12 15:28:57 ulquiorra kernel: sd 5:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 12 15:28:57 ulquiorra kernel: sd 5:0:0:0: [sdc] Stopping disk
Mar 12 15:28:57 ulquiorra kernel: sd 5:0:0:0: [sdc] START_STOP FAILED
Mar 12 15:28:57 ulquiorra kernel: sd 5:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK


Poweron enclosure #2


Mar 12 15:29:39 ulquiorra kernel: ata6: exception Emask 0x10 SAct 0x0 SErr 0x4140000 action 0xe frozen
Mar 12 15:29:39 ulquiorra kernel: ata6: irq_stat 0x00000040, connection status changed
Mar 12 15:29:39 ulquiorra kernel: ata6: SError: { CommWake Dispar DevExch }
Mar 12 15:29:39 ulquiorra kernel: ata6: hard resetting link
Mar 12 15:29:49 ulquiorra kernel: ata6: softreset failed (device not ready)
Mar 12 15:29:49 ulquiorra kernel: ata6: hard resetting link
Mar 12 15:29:59 ulquiorra kernel: ata6: softreset failed (device not ready)
Mar 12 15:29:59 ulquiorra kernel: ata6: hard resetting link
Mar 12 15:30:10 ulquiorra kernel: ata6: link is slow to respond, please be patient (ready=0)
Mar 12 15:30:34 ulquiorra kernel: ata6: softreset failed (device not ready)
Mar 12 15:30:34 ulquiorra kernel: ata6: limiting SATA link speed to 1.5 Gbps
Mar 12 15:30:34 ulquiorra kernel: ata6: hard resetting link
Mar 12 15:30:39 ulquiorra kernel: ata6: softreset failed (device not ready)
Mar 12 15:30:39 ulquiorra kernel: ata6: reset failed, giving up
Mar 12 15:30:39 ulquiorra kernel: ata6: EH complete

Comment 2 Reartes Guillermo 2011-03-12 23:59:52 UTC
workaround:

running this command once, after being hit with the problem
seems to fix the issue until reboot. 

pwr on enclosure#1
pwr off enclosure#1

pwr on enclosure#2
---> ata port unusable, as stated above. to fix execute:
# echo "- - - " > /sys/class/scsi_host/host5/scan

pwr off enclosure#2

pwr on enclosure#1, now it is detected!

Comment 3 Reartes Guillermo 2011-04-10 17:55:43 UTC
Hot-Swap support is broken with the SATA controller specified below.
A manual workaround can be used, specified below.

HW:
* 2 x Nexstar CX (eSATA + USB2)
* 2 x WD 2TB Green (Adv. Fmt) (SAME MODEL)


Affected controller:
* nVidia Corporation MCP78S [GeForce 8200] AHCI Controller [10de:0ad4] (rev a2) 

Workaround:
After powering down the disk enclosure to be removed, execute:
# echo "- - - " > /sys/class/scsi_host/host4/scan
As there is nothing in ATA5, it seems to purge something.

After executing this, change the enclosure and power it on again, it will be detected.

--------------------------

Tried again with kernel 2.6.38.2-9.fc15.x86_64


Disk#0 >>> POWERON
Disk#0 >>> Detected by the kernel
Disk#0 >>> POWEROFF

Apr 10 14:21:57 xxxxxx kernel: [14222.365736] ata5: exception Emask 0x10 SAct 0x0 SErr 0x1990000 action 0xe frozen
Apr 10 14:21:57 xxxxxx kernel: [14222.365747] ata5: irq_stat 0x00400000, PHY RDY changed
Apr 10 14:21:57 xxxxxx kernel: [14222.365757] ata5: SError: { PHYRdyChg 10B8B Dispar LinkSeq TrStaTrns }
Apr 10 14:21:57 xxxxxx kernel: [14222.365772] ata5: hard resetting link
Apr 10 14:21:57 xxxxxx kernel: [14223.088388] ata5: SATA link down (SStatus 0 SControl 300)
Apr 10 14:22:02 xxxxxx kernel: [14228.088402] ata5: hard resetting link
Apr 10 14:22:03 xxxxxx kernel: [14228.393359] ata5: SATA link down (SStatus 0 SControl 300)
Apr 10 14:22:03 xxxxxx kernel: [14228.393374] ata5: limiting SATA link speed to 1.5 Gbps
Apr 10 14:22:08 xxxxxx kernel: [14233.393400] ata5: hard resetting link
Apr 10 14:22:08 xxxxxx kernel: [14233.698389] ata5: SATA link down (SStatus 0 SControl 310)
Apr 10 14:22:08 xxxxxx kernel: [14233.698403] ata5.00: disabled
Apr 10 14:22:08 xxxxxx kernel: [14233.698425] ata5: EH complete
Apr 10 14:22:08 xxxxxx kernel: [14233.698442] ata5.00: detaching (SCSI 4:0:0:0)
Apr 10 14:22:08 xxxxxx kernel: [14233.712537] sd 4:0:0:0: [sdc] Synchronizing SCSI cache
Apr 10 14:22:08 xxxxxx kernel: [14233.712580] sd 4:0:0:0: [sdc]  Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Apr 10 14:22:08 xxxxxx kernel: [14233.712584] sd 4:0:0:0: [sdc] Stopping disk
Apr 10 14:22:08 xxxxxx kernel: [14233.712591] sd 4:0:0:0: [sdc] START_STOP FAILED
Apr 10 14:22:08 xxxxxx kernel: [14233.712593] sd 4:0:0:0: [sdc]  Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK


Waited for 20 seconds...
Disk#0 >>> UNPLUGGED

Disk#1 >>> PLUGGED
Disk#1 >>> POWERON
Disk#1 >>> Drive NOT detected

Apr 10 14:23:22 xxxxxx kernel: [14307.342869] ata5: exception Emask 0x10 SAct 0x0 SErr 0x5940000 action 0xe frozen
Apr 10 14:23:22 xxxxxx kernel: [14307.342880] ata5: irq_stat 0x00000040, connection status changed
Apr 10 14:23:22 xxxxxx kernel: [14307.342890] ata5: SError: { CommWake Dispar LinkSeq TrStaTrns DevExch }
Apr 10 14:23:22 xxxxxx kernel: [14307.342907] ata5: hard resetting link
Apr 10 14:23:27 xxxxxx kernel: [14313.114376] ata5: link is slow to respond, please be patient (ready=0)
Apr 10 14:23:32 xxxxxx kernel: [14317.347106] ata5: COMRESET failed (errno=-16)
Apr 10 14:23:32 xxxxxx kernel: [14317.347120] ata5: hard resetting link
Apr 10 14:23:37 xxxxxx kernel: [14323.119409] ata5: link is slow to respond, please be patient (ready=0)
Apr 10 14:23:42 xxxxxx kernel: [14327.354374] ata5: COMRESET failed (errno=-16)
Apr 10 14:23:42 xxxxxx kernel: [14327.354387] ata5: hard resetting link
Apr 10 14:23:47 xxxxxx kernel: [14333.126374] ata5: link is slow to respond, please be patient (ready=0)
Apr 10 14:24:17 xxxxxx kernel: [14362.355374] ata5: COMRESET failed (errno=-16)
Apr 10 14:24:17 xxxxxx kernel: [14362.355388] ata5: limiting SATA link speed to 1.5 Gbps
Apr 10 14:24:17 xxxxxx kernel: [14362.355396] ata5: hard resetting link
Apr 10 14:24:22 xxxxxx kernel: [14367.362464] ata5: COMRESET failed (errno=-16)
Apr 10 14:24:22 xxxxxx kernel: [14367.362476] ata5: reset failed, giving up
Apr 10 14:24:22 xxxxxx kernel: [14367.362487] ata5: EH complete

# echo "- - - " > /sys/class/scsi_host/host4/scan

Apr 10 14:25:41 xxxxxx kernel: [14447.022753] ata5: hard resetting link
Apr 10 14:25:47 xxxxxx kernel: [14452.376367] ata5: link is slow to respond, please be patient (ready=0)
Apr 10 14:25:51 xxxxxx kernel: [14457.068379] ata5: COMRESET failed (errno=-16)
Apr 10 14:25:51 xxxxxx kernel: [14457.068393] ata5: hard resetting link
Apr 10 14:25:57 xxxxxx kernel: [14462.423410] ata5: link is slow to respond, please be patient (ready=0)
Apr 10 14:26:01 xxxxxx kernel: [14467.115054] ata5: COMRESET failed (errno=-16)
Apr 10 14:26:01 xxxxxx kernel: [14467.115068] ata5: hard resetting link

Disk#1 >>> Drive STILL not detected
Disk#1 >>> POWEROFF

# echo "- - - " > /sys/class/scsi_host/host4/scan

Apr 10 14:26:46 xxxxxx kernel: [14511.614712] ata5: hard resetting link
Apr 10 14:26:46 xxxxxx kernel: [14511.919362] ata5: SATA link down (SStatus 10 SControl 300)
Apr 10 14:26:46 xxxxxx kernel: [14511.919381] ata5: EH complete

Disk#1 >>> POWERON
Disk#1 >>> Detected by the kernel

Apr 10 14:27:09 xxxxxx kernel: [14534.355145] ata5: exception Emask 0x10 SAct 0x0 SErr 0x5840000 action 0xe frozen
Apr 10 14:27:09 xxxxxx kernel: [14534.355155] ata5: irq_stat 0x00000040, connection status changed
Apr 10 14:27:09 xxxxxx kernel: [14534.355166] ata5: SError: { CommWake LinkSeq TrStaTrns DevExch }
Apr 10 14:27:09 xxxxxx kernel: [14534.355183] ata5: hard resetting link
Apr 10 14:27:09 xxxxxx kernel: [14535.078374] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 10 14:27:09 xxxxxx kernel: [14535.097086] ata5.00: ATA-8: WDC WD20EARS-00MVWB0, 50.0AB50, max UDMA/133
Apr 10 14:27:09 xxxxxx kernel: [14535.097096] ata5.00: 3907029168 sectors, multi 0: LBA48 
Apr 10 14:27:09 xxxxxx kernel: [14535.100818] ata5.00: configured for UDMA/133
Apr 10 14:27:09 xxxxxx kernel: [14535.100835] ata5: EH complete
Apr 10 14:27:09 xxxxxx kernel: [14535.101057] scsi 4:0:0:0: Direct-Access     ATA      WDC WD20EARS-00M 50.0 PQ: 0 ANSI: 5
Apr 10 14:27:09 xxxxxx kernel: [14535.103725] sd 4:0:0:0: [sdc] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
Apr 10 14:27:09 xxxxxx kernel: [14535.103852] sd 4:0:0:0: [sdc] Write Protect is off
Apr 10 14:27:09 xxxxxx kernel: [14535.103914] sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Apr 10 14:27:09 xxxxxx kernel: [14535.108858] sd 4:0:0:0: Attached scsi generic sg3 type 0
Apr 10 14:27:09 xxxxxx kernel: [14535.127100]  sdc: sdc1
Apr 10 14:27:09 xxxxxx kernel: [14535.127727] sd 4:0:0:0: [sdc] Attached SCSI disk

Comment 4 Josh Boyer 2012-06-04 17:47:58 UTC
There were a number of eSATA fixes that went into the 2.6.43 kernel updates.  You might want to try those if you haven't already.  If that doesn't solve the problem, please open a new bug with the relevant information.