Bug 671923 - External USB 2.0 connected dual SATA drive enclosure periodically disconnects & fails
Status: CLOSED DUPLICATE of bug 663186
Product: Fedora
Classification: Fedora
Component: kernel
Version: 14
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2011-01-22 22:56 UTC by George R. Goffe
Modified: 2011-09-06 18:10 UTC (History)
Last Closed: 2011-09-06 18:10:24 UTC

Attachments
contents of /var/log/messages (40.13 KB, application/x-gzip)
2011-01-22 22:56 UTC, George R. Goffe

Description George R. Goffe 2011-01-22 22:56:40 UTC
Created attachment 474766
contents of /var/log/messages

Description of problem:

I have two NEW external SATA drives (2 TB each) in a KingWin ez-dock2 device, model ezd-2536 (JBOD mode), connected to my system via USB. I use only one drive at a time due to cooling issues. For some odd reason, each of the drives eventually becomes unreachable and then almost immediately becomes reachable again under the next available device name. For example, a drive that started as sdb1 "disappears" from view (with associated I/O error messages) and reappears as sdc1. When this happens, I ALWAYS reboot the system AND run "e2fsck -f /dev/sdb1". ALL filesystems on the system are ext4, including those on these two drives. THANKFULLY, ext4 fsck runs VERY QUICKLY.
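
Because the kernel name changes (sdb1 to sdc1) after each re-enumeration, it may help to track the drive by filesystem UUID instead. This is only a minimal sketch, assuming udev maintains the usual /dev/disk/by-uuid symlinks on this system; the function name is made up for illustration.

```python
import os

def map_uuids_to_devices(by_uuid_dir="/dev/disk/by-uuid"):
    """Return {filesystem UUID: kernel device path it currently points to}."""
    mapping = {}
    for name in sorted(os.listdir(by_uuid_dir)):
        link = os.path.join(by_uuid_dir, name)
        if os.path.islink(link):
            # The symlink target follows the device when sdb1 becomes sdc1.
            mapping[name] = os.path.realpath(link)
    return mapping

if __name__ == "__main__":
    if os.path.isdir("/dev/disk/by-uuid"):
        for uuid, device in map_uuids_to_devices().items():
            print(uuid, "->", device)
```

Running it before and after a disconnect should show the same UUID pointing at a different /dev/sdX1 node.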

Version-Release number of selected component (if applicable):

Kernel version: 2.6.35.10-74.fc14.i686

How reproducible:

Readily. Possibly by making the drive REALLY busy. I don't TRY to make this happen but suspect it's possible.


Steps to Reproduce:
1. Possibly make the drive REALLY busy
2. Eventually, the problem appears.
  
Actual results:

The drive becomes unreachable under its initial device assignment. It then reappears under the next device name, i.e., /dev/sdb1 becomes /dev/sdc1.


Expected results:

Normal system and device operations.

Additional info:

I'm willing to test fixes and/or gather more information if that's necessary to resolve this problem.

Please NOTE: I am NOT certain it's a kernel/driver problem.

Comment 1 Reartes Guillermo 2011-02-12 21:06:08 UTC
Jan 21 22:12:41 shooter kernel: [    2.207738] usb 1-3: New USB device found, idVendor=152d, idProduct=2336
Jan 21 22:12:41 shooter kernel: [    2.207741] usb 1-3: New USB device strings: Mfr=1, Product=2, SerialNumber=5
Jan 21 22:12:41 shooter kernel: [    2.207743] usb 1-3: Product: JM20336 SATA, USB Combo
Jan 21 22:12:41 shooter kernel: [    2.207745] usb 1-3: Manufacturer: JMicron
Jan 21 22:12:41 shooter kernel: [    2.207746] usb 1-3: SerialNumber: 2171B9888888

[...]

Jan 22 03:56:09 shooter kernel: [20631.110094] usb 1-3: reset high speed USB device using ehci_hcd and address 4
Jan 22 03:56:18 shooter kernel: [20639.670712] usb 1-3: USB disconnect, address 4
Jan 22 03:56:18 shooter kernel: [20639.674578] sd 5:0:0:0: Device offlined - not ready after error recovery
Jan 22 03:56:18 shooter kernel: [20639.674595] sd 5:0:0:0: [sdb] Unhandled error code
Jan 22 03:56:18 shooter kernel: [20639.674597] sd 5:0:0:0: [sdb] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jan 22 03:56:18 shooter kernel: [20639.674601] sd 5:0:0:0: [sdb] CDB: Read(10): 28 00 ae b1 b5 0f 00 00 40 00
Jan 22 03:56:18 shooter kernel: [20639.674611] end_request: I/O error, dev sdb, sector 2930881807
Jan 22 03:56:18 shooter kernel: [20639.674650] sd 5:0:0:0: [sdb] Unhandled error code
Jan 22 03:56:18 shooter kernel: [20639.674652] sd 5:0:0:0: [sdb] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jan 22 03:56:18 shooter kernel: [20639.674655] sd 5:0:0:0: [sdb] CDB: Read(10): 28 00 20 eb 7c bf 00 00 c0 00
Jan 22 03:56:18 shooter kernel: [20639.674663] end_request: I/O error, dev sdb, sector 552303807
Jan 22 03:56:18 shooter kernel: [20639.690426] JBD2: Detected IO errors while flushing file data on sdb1-8
Jan 22 03:56:18 shooter kernel: [20639.690454] Aborting journal on device sdb1-8.

[...]

Jan 22 03:56:18 shooter kernel: [20639.690553] EXT4-fs error (device sdb1) in ext4_delete_inode: Journal has aborted
Jan 22 03:56:23 shooter kernel: [20644.698247] EXT4-fs error (device sdb1): ext4_find_entry: inode #83492865: (comm wget) reading directory lblock 0
Jan 22 03:56:23 shooter kernel: [20645.134038] usb 1-3: new high speed USB device using ehci_hcd and address 6
Jan 22 03:56:23 shooter kernel: [20645.249824] usb 1-3: New USB device found, idVendor=152d, idProduct=2336
Jan 22 03:56:23 shooter kernel: [20645.249828] usb 1-3: New USB device strings: Mfr=1, Product=2, SerialNumber=5
Jan 22 03:56:23 shooter kernel: [20645.249831] usb 1-3: Product: JM20336 SATA, USB Combo
Jan 22 03:56:23 shooter kernel: [20645.249834] usb 1-3: Manufacturer: JMicron
Jan 22 03:56:23 shooter kernel: [20645.249835] usb 1-3: SerialNumber: 2171B9888888
Jan 22 03:56:23 shooter kernel: [20645.252058] scsi6 : usb-storage 1-3:1.0
Jan 22 03:56:24 shooter kernel: [20646.253350] scsi 6:0:0:0: Direct-Access     Hitachi  HDS722020ALA330       PQ: 0 ANSI: 2 CCS
Jan 22 03:56:24 shooter kernel: [20646.255257] sd 6:0:0:0: Attached scsi generic sg2 type 0
Jan 22 03:56:24 shooter kernel: [20646.255999] sd 6:0:0:0: [sdc] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
Jan 22 03:56:24 shooter kernel: [20646.257770] sd 6:0:0:0: [sdc] Write Protect is off
Jan 22 03:56:24 shooter kernel: [20646.257786] sd 6:0:0:0: [sdc] Assuming drive cache: write through
Jan 22 03:56:24 shooter kernel: [20646.265766] sd 6:0:0:0: [sdc] Assuming drive cache: write through
Jan 22 03:56:24 shooter kernel: [20646.265781]  sdc: sdc1
Jan 22 03:56:24 shooter kernel: [20646.285798] sd 6:0:0:0: [sdc] Assuming drive cache: write through
Jan 22 03:56:24 shooter kernel: [20646.285802] sd 6:0:0:0: [sdc] Attached SCSI disk
Jan 22 09:14:59 shooter kernel: [39761.148444] EXT4-fs error (device sdb1): ext4_discard_preallocations: Error loading buddy information for 2696
Jan 22 09:14:59 shooter kernel: [39761.148469] EXT4-fs error (device sdb1): ext4_read_block_bitmap: Cannot read block bitmap - block_group = 2419, block_bitmap = 79167491
Jan 22 09:14:59 shooter kernel: [39761.148473] EXT4-fs error (device sdb1): ext4_discard_preallocations: Error reading block bitmap for 2419
Jan 22 09:14:59 shooter kernel: [39761.148499] EXT4-fs error (device sdb1): ext4_read_block_bitmap: Cannot read block bitmap - block_group = 9494, block_bitmap = 310902790
Jan 22 09:14:59 shooter kernel: [39761.148503] EXT4-fs error (device sdb1): ext4_discard_preallocations: Error reading block bitmap for 9494
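
The Read(10) command bytes in the log above can be cross-checked against the sectors reported by end_request. The sketch below assumes the standard READ(10) layout (opcode 0x28, 4-byte big-endian LBA in bytes 2-5, 2-byte transfer length in bytes 7-8); the function name is only illustrative.

```python
def decode_read10(cdb_hex):
    """Decode a SCSI READ(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) command")
    lba = int.from_bytes(cdb[2:6], "big")      # starting logical block
    blocks = int.from_bytes(cdb[7:9], "big")   # number of blocks requested
    return lba, blocks

# The two CDBs logged at 03:56:18
for cdb in ("28 00 ae b1 b5 0f 00 00 40 00",
            "28 00 20 eb 7c bf 00 00 c0 00"):
    lba, blocks = decode_read10(cdb)
    print(f"LBA {lba}, {blocks} blocks")
# LBA 2930881807, 64 blocks
# LBA 552303807, 192 blocks
```

Both LBAs equal the sectors in the I/O error lines and lie below the 3907029168 blocks the disk reports, so the requests themselves look valid. Together with hostbyte=DID_NO_CONNECT, that suggests the reads failed because the device dropped off the bus, not because of bad sectors.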
Two 2 TB HDDs in JBOD mode in a KingWin ez-dock2 device (model ezd-2536), connected via USB.

* Try another usb port
* Try another usb-cable
* Try a different usb controller (probably not easy)

For example, on my M4N72-E in a Thermaltake Soprano case, the lateral/external
USB ports suffer from occasional USB disconnects when transferring large amounts of data (20+ GB). I am able to mitigate it by using a double cable (one mini-USB plug into the device and two normal USB plugs, for certain devices only). But this also applies to devices that have their own power supply!
In my case, USB pen drives never disconnect, but USB enclosures like
Passports & NexStar 2.5 & 3.5 with a power supply do.
 
For the size of your disks, an eSATA-type (or dual USB/eSATA like mine) enclosure is highly recommended.

You should treat this as a USB-related issue. (I assume that the disks work OK if used on a normal SATA port/controller.)

Are you using an external (chassis) USB port or an internal one?
If you access both drives, what happens?

Be advised that if you continue to experience the issue, sooner or later you will experience data loss, if you are unlucky!

Try to test the drives without mounting the filesystems, for example read-only tests.

Comment 2 George R. Goffe 2011-02-14 12:57:08 UTC
Reartes,

Thank you for your suggestions. I have tried some of them. No luck yet unless you count bad luck.

George...


* Try another usb port

grg> I did, still having failures

* Try another usb-cable

grg> I'm not sure KingWin sells these separately... This is a recent purchase... within the past 2 months.

* Try a different usb controller (probably not easy)

grg> Lenovo T61p: when viewed from the front, has a cluster of 4 usb ports on the back, two on the right side, one on the left. Tried a different port, same problem.

You should treat this as a USB-related issue. (I assume that the disks work OK if used on a normal SATA port/controller.)

grg> I only have laptops, no internal sata adapter except for the system drive.

Are you using an external (chassis) USB port or an internal one?

grg> external usb port, left hand side of the system when viewed from the front.

If you access both drives, what happens?

grg> Both drives heat up, even with a dual-fan Vantec behind one of the drives. KingWin is a bad design because they didn't allow for attaching a cooling fan. With one 2TB and one 100GB SATA drive, things seem to work rather well, but I just experienced another failure (probably related to a high I/O rate) today with the latest Fedora 14 kernel, 2.6.35.11-83.fc14.i686.

Be advised that if you continue to experience the issue, sooner or later you will experience data loss, if you are unlucky!

grg> Oh man! This doesn't happen as much as it did earlier... Newer kernels? I have run e2fsck -f on both disks after EVERY failure.

Try to test the drives without mounting the filesystems, for example read-only tests.

grg> This is a great idea... I could use "dd if=/dev/sdb of=/dev/null bs=4G" and run smartd to watch the drives. Sound good?

Comment 3 Reartes Guillermo 2011-02-14 23:01:02 UTC
USB 2.0 bandwidth is 480 Mbit/s... not so great (now).
I normally get approx. 22 MB/s in transfers.
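
To put those numbers in perspective, here is a rough back-of-the-envelope calculation. It assumes the 22 MB figure means 22 MB/s sustained and takes the block count from the kernel log in comment 1; the variable names are only illustrative.

```python
USB2_SIGNALING_BITS_PER_S = 480e6      # USB 2.0 high-speed signaling rate
OBSERVED_BYTES_PER_S = 22e6            # assumption: "22 MB" means 22 MB/s
DISK_BYTES = 3907029168 * 512          # 512-byte logical blocks from the log

# Upper bound before any protocol overhead: 480 Mbit/s / 8 = 60 MB/s
theoretical_bytes_per_s = USB2_SIGNALING_BITS_PER_S / 8

# Time for one full sequential read of the 2 TB disk at the observed rate
full_read_hours = DISK_BYTES / OBSERVED_BYTES_PER_S / 3600

print(f"theoretical ceiling: {theoretical_bytes_per_s / 1e6:.0f} MB/s")
print(f"full read at observed rate: {full_read_hours:.1f} hours")
```

So a single read pass over one of these disks would keep the USB link busy for roughly a day, which is plenty of time for a disconnect to show up.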

Maybe:

# dd if=/dev/sdb of=/dev/null bs=25M &>/dev/null &

This way you can easily launch more than one dd and
manage them with jobs and kill %N

# iostat -kx 5

Then look at %util and check whether it disconnects at 100% or
if it is not under load.
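
If dd is not convenient, the same read-only pass can be sketched in a few lines. This is a minimal illustration, not a replacement for dd and iostat; the function name and the 25 MiB default block size are assumptions taken from the dd command above.

```python
import time

def read_only_pass(path, block_size=25 * 1024 * 1024, max_bytes=None):
    """Sequentially read `path` without writing anything.

    Returns (bytes_read, seconds, error); error is None on a clean pass.
    max_bytes is approximate: reading stops after the block that crosses it.
    """
    total = 0
    error = None
    start = time.monotonic()
    try:
        # buffering=0 gives raw, unbuffered reads from the device node
        with open(path, "rb", buffering=0) as dev:
            while max_bytes is None or total < max_bytes:
                chunk = dev.read(block_size)
                if not chunk:
                    break              # end of device
                total += len(chunk)
    except OSError as exc:
        # An I/O error here is what a mid-transfer disconnect would look like
        error = exc
    return total, time.monotonic() - start, error
```

Calling read_only_pass("/dev/sdb") as root and noting the byte offset at which an error appears would show whether the failures cluster at a particular position or happen at random.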

Regarding the usb problem:

* You should definitely change the description of the bug.
If the drive is not plugged into a SATA port, then it is not
a SATA device to the kernel but a USB Mass Storage device...

As a candidate:
"External USB 2.0 connected dual SATA drive enclosure periodically 
disconnects & fails"

* You may try to find a powered USB 2.0 hub. I don't have one and have
  never used one, but it may work.

* Try an ordinary USB 2.0 passive hub, just for testing purposes.

* If you are using only laptops, I would check with and without AC:
is the result the same on the laptop battery as on
AC power (and also on AC without any battery, if that is possible)?

* Don't forget to check if it is there any new firmware for the laptop(s) you are using.

I can't think of anything else to try. It seems chipset-related or electrical, but
who knows, it may be a bug...

I also used to have USB serial port disconnects with an MSI Wind: /dev/ttyUSB0 became /dev/ttyUSB1 and /dev/ttyUSB0 was left broken (special state). It also happened, very rarely, with the 3G modem on my Eee 1005HA... I never solved the mystery or found the reason for them.

Upload the output of:

* After booting:
# lsusb -vvv > lsusb-vvv.out

* Then plug the thing in:
# lsusb -vvv > lsusb-vvv_with_ezd.out


And maybe the pci info:

# lspci -vvv > lspci-vvv.out

Comment 4 Reartes Guillermo 2011-02-14 23:10:40 UTC
Check the value of:
# cat /sys/module/usbcore/parameters/autosuspend

Mine is 2 (Desktop)

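* You may also try to disable it & test. A negative value disables autosuspend (0 means suspend as soon as the device is idle), and the setting applies to newly detected devices, so re-plug the enclosure afterwards.

# echo "-1" > /sys/module/usbcore/parameters/autosuspend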
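
For logging the setting alongside each test run, the value can be read and interpreted as below. This is a hedged sketch: the interpretation (delay in seconds, 0 meaning as soon as idle, negative meaning never) follows my reading of the kernel's USB power-management documentation, and both function names are made up.

```python
def read_autosuspend(path="/sys/module/usbcore/parameters/autosuspend"):
    """Return the usbcore autosuspend delay in seconds, or None if unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None

def describe_autosuspend(value):
    """Turn the raw parameter value into a human-readable description."""
    if value is None:
        return "autosuspend setting unavailable"
    if value < 0:
        return "autosuspend disabled"
    if value == 0:
        return "autosuspend as soon as the device is idle"
    return f"autosuspend after {value} s idle"

print(describe_autosuspend(read_autosuspend()))
```

On the desktop mentioned above (value 2), this would report a 2 second idle delay.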

Comment 5 Reartes Guillermo 2011-02-14 23:43:51 UTC
This entry can be found in 2.6.37 downloaded from kernel.org in the file:

/media/other-root/usr/src/linux-2.6.37/drivers/usb/storage/unusual_devs.h

There is an entry that may be of interest:

/* Reported by Alexandre Oliva <oliva.unicamp.br>
 * JMicron responds to USN and several other SCSI ioctls with a
 * residue that causes subsequent I/O requests to fail.  */
UNUSUAL_DEV(  0x152d, 0x2329, 0x0100, 0x0100,
                "JMicron",
                "USB to ATA/ATAPI Bridge",
                USB_SC_DEVICE, USB_PR_DEVICE, NULL,
                US_FL_IGNORE_RESIDUE | US_FL_SANE_SENSE ),

This also applies to 2.6.33.4 from kernel.org.

Does this device match yours (or at least, is it similar)?
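
A quick way to check is to compare the IDs from the kernel log in comment 1 with the entry above. In this sketch the dictionary simply transcribes the UNUSUAL_DEV() arguments (vendor, product, bcdDevice range); the helper names are hypothetical.

```python
import re

# Transcribed from the UNUSUAL_DEV() entry quoted above
QUIRK_ENTRY = {"vendor": 0x152d, "product": 0x2329,
               "bcd_min": 0x0100, "bcd_max": 0x0100}

def parse_ids(log_line):
    """Extract (vendor, product) from a 'New USB device found' log line."""
    m = re.search(r"idVendor=([0-9a-fA-F]{4}), idProduct=([0-9a-fA-F]{4})",
                  log_line)
    if m is None:
        return None
    return int(m.group(1), 16), int(m.group(2), 16)

def entry_matches(entry, vendor, product, bcd_device=None):
    """True if the IDs (and the revision, when known) fall inside the entry."""
    if (vendor, product) != (entry["vendor"], entry["product"]):
        return False
    if bcd_device is None:
        return True   # revision unknown: only the IDs were compared
    return entry["bcd_min"] <= bcd_device <= entry["bcd_max"]

line = "usb 1-3: New USB device found, idVendor=152d, idProduct=2336"
vendor, product = parse_ids(line)
print(f"{vendor:04x}:{product:04x} matches:",
      entry_matches(QUIRK_ENTRY, vendor, product))
```

The enclosure reports 152d:2336, so it shares the JMicron vendor ID with the 152d:2329 entry but not the product ID, which means that entry would not cover it as written.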


Check the value of:

# cat /sys/module/usb_storage/parameters/quirks

before and after plugging the thing. I don't really know if the quirks are
applied automatically or if they need to be manually enabled.

Comment 6 Yun-Fong Loh 2011-07-21 16:17:24 UTC
Might this be a duplicate of:

Bug 663186?

Comment 7 George R. Goffe 2011-07-22 01:38:55 UTC
Yun-Fong,

It does look like a dup.

How did you install the patch you reference in this other bug? I'm using FC 14 by the way, with their latest Kernel.

Regards and THANKS,

George...

Comment 8 Josh Boyer 2011-09-06 18:10:24 UTC

*** This bug has been marked as a duplicate of bug 663186 ***

