Bug 515881

Summary: SMART probing causes USB resets on some controllers
Product: [Fedora] Fedora
Reporter: Martin Pitt <mpitt>
Component: libatasmart
Assignee: Lennart Poettering <lpoetter>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low
Version: rawhide
CC: lpoetter, mclasen, rhbz, rhel, webmaster
Keywords: Reopened
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-09-29 03:46:30 UTC
Attachments:
Description Flags
dmesg.log with the reset errors none

Description Martin Pitt 2009-08-06 08:05:27 UTC
Description of problem:

Calling devkit-disks-probe-ata-smart in the udev rules causes tons of USB resets with certain controllers:

[ 391.272447] sd 9:0:0:0: [sdf] Attached SCSI disk
[ 399.100023] usb 1-4: reset high speed USB device using ehci_hcd and address 8
[ 399.224032] usb 1-4: device descriptor read/64, error -71
[ 399.452025] usb 1-4: device descriptor read/64, error -71
[ 399.668022] usb 1-4: reset high speed USB device using ehci_hcd and address 8
[ 399.792023] usb 1-4: device descriptor read/64, error -71
[ 400.020026] usb 1-4: device descriptor read/64, error -71
[ 400.236028] usb 1-4: reset high speed USB device using ehci_hcd and address 8
[ 400.652029] usb 1-4: device not accepting address 8, error -71
[ 400.764022] usb 1-4: reset high speed USB device using ehci_hcd and address 8
[ 401.176019] usb 1-4: device not accepting address 8, error -71
[ 401.176058] usb 1-4: USB disconnect, address 8
[ 401.176995] scsi 9:0:0:0: Device offlined - not ready after error recovery
[ 401.288015] usb 1-4: new high speed USB device using ehci_hcd and address 9
[ 401.417078] usb 1-4: device descriptor read/64, error -71

This happens when sk_disk_open() is called on the device. Background:

<mezcalero> if we encounter an USB disk we try to find out if it speaks SAT
<mezcalero> we do that the hard way:
<mezcalero> by sending it some commands
<mezcalero> some shitty bridges tend to hang themselves up due to that
<mezcalero> or reset themselves
<mezcalero> there is not much we can do about that
<mezcalero> we had similar reports regarding usb sticks
<pitti> reportedly the drives pretty much stop working entirely, though
<mezcalero> because an usb stick does not look any different from a HDD drive to libatasmart
<mezcalero> so if you look into the udev rules of dk-disks
<mezcalero> there is now a check based on the disk *size* which disables ata smart
<mezcalero> this catches most of the broken usb sticks
<mezcalero> however, of course bridges for HDDs are not covered
<mezcalero> and we need to blacklist them all separately
<pitti> mezcalero: interestingly, though, the calls eventually succeed and report that the drive is smart capable
<mezcalero> pitti: hmm, so yes, i am suggesting you add blacklist rules for this

In the further discussion Lennart said that the blacklisting should happen in libatasmart, not in the devkit rules.

The sysfs attributes of one affected device are at http://launchpadlibrarian.net/29951175/udevadm_info (an --attribute-walk on the drive).

# skdump /dev/sdb
Device: /dev/sdb
Type: JMicron SCSI ATA Passthru
Size: 152627 MiB
Model: [FUJITSU MHZ2160BH G2]
Serial: [K62TT8A27KMF]
Firmware: [00000009]
SMART Available: yes
Quirks:
Awake: Invalid argument
SMART Disk Health Good: Invalid argument
Failed to dump disk data: Invalid argument

The above command also causes the resets.

# skdump jmicron:/dev/sdb
Failed to open disk jmicron:/dev/sdb: No such file or directory

What kind of information do you need for blacklisting those in libatasmart? 

The original bug had some more people who are also affected by this. I'll ask them for skdumps as well.

Version-Release number of selected component (if applicable):
0.13

Comment 1 Martin Pitt 2009-08-07 10:07:38 UTC
I updated to 0.14 (thanks for the release), and it fixed the issue for one reporter. I asked the other reporters to test this, and give me the skdump output if it still happens.

I'll be on holiday the next two weeks, I'll report back then. Alternatively, you can just take a look at the linked LP bug for replies.

Thanks!

Comment 2 Lennart Poettering 2009-08-07 16:19:55 UTC
Hmm, libatasmart 0.14 breaks ABI/API, how did you manage to get dk-disks working on it so quickly?

Comment 3 Martin Pitt 2009-08-07 17:03:55 UTC
The API changes were pretty small (essentially "good" -> "good_now"), I updated our devicekit-disks package to build against it and sent the patch to https://bugs.freedesktop.org/show_bug.cgi?id=23191 (I'm still waiting to get into the devicekit group to be able to commit such trivialities myself).

Comment 4 Lennart Poettering 2009-08-07 17:08:22 UTC
Hmm, no. This is not a triviality. Updating just by "compile testing" is not enough here. The overall status enum got two new values. This needs to be handled.

A patch that does no more than make dk-disks compile again is *not* enough.

Comment 5 Martin Pitt 2009-08-07 20:43:11 UTC
Well, I checked the 0.13 -> 0.14 diff of atasmart.h, and noticed the new SkSmartOverall attributes. But dk-disks wasn't even checking the old status codes, and I get proper smart data in devkit-disks --dump, so I figured for the purpose of testing this bug and collecting more data it would be better to get the new version in ASAP.

If devkit-disks is supposed to handle more smart states, we should totally do that of course.

Comment 6 Matthias Clasen 2009-08-29 17:55:34 UTC
Status update on this ? I believe dkd-006 has been ported to the latest atasmart, right ? All fixed ?

Comment 7 Martin Pitt 2009-08-30 19:44:49 UTC
(In reply to comment #6)
> Status update on this ? I believe dkd-006 has been ported to the latest
> atasmart, right?

Right, that patch was (more or less) applied upstream.

> All fixed?

Unfortunately not. Even with libatasmart 0.14 and dk-disks 006, many people still report "error -71" drive disconnects with the ata probing udev rules (all of them report that moving it away helps).

Ulrich Beyer provided a dk-disks dump and an skdump strace in both standard and jmicron mode: http://launchpadlibrarian.net/30815202/skdump.trace . Standard mode fails with "invalid argument", jmicron mode prints "Failed to open disk jmicron:/dev/sdb: Success".

Scott Zawalski did not provide an strace unfortunately, but reports a similar result:
----------
sudo skdump /dev/sdf

Device: jmicron:/dev/sdf
Type: JMicron SCSI ATA Passthru
Size: 715404 MiB
Model: [ST3750640NS]
Serial: [3QD136WL]
Firmware: [3.CNQ]
SMART Available: yes
Quirks:
Awake: Invalid argument
SMART Disk Health Good: Invalid argument
----------

Comment 8 Martin Pitt 2009-08-31 06:42:02 UTC
Another response from a different user:

Device: jmicron:/dev/sdb
Size: 953869 MiB
Model: [WDC WD10EAVS-00D7B0]
Serial: [WD-WCAU42028877]
Firmware: [01.01A01]
SMART Available: yes
Quirks:
Awake: No such device
SMART Disk Health Good: No such device
Failed to dump disk data: No such device

Device: jmicron:/dev/sdb
Type: JMicron SCSI ATA Passthru
Size: 953869 MiB
Model: [WDC WD10EAVS-00D7B0]
Serial: [WD-WCAU42028877]
Firmware: [01.01A01]
SMART Available: yes
Quirks:
Awake: No such device
SMART Disk Health Good: No such device
Failed to dump disk data: No such device

Type: JMicron SCSI ATA Passthru
Size: 953869 MiB
Model: [WDC WD10EAVS-00D7B0]
Serial: [WD-WCAU42028877]
Firmware: [01.01A01]
SMART Available: yes
Quirks:
Awake: Invalid argument
SMART Disk Health Good: Invalid argument

Comment 9 Martin Pitt 2009-08-31 06:48:39 UTC
Another set of straces from Stefan Ebner, to get some variance:

standard mode: http://launchpadlibrarian.net/30814259/trace
jmicron:/dev/sdb: http://launchpadlibrarian.net/30814523/jmicron

Comment 10 Lennart Poettering 2009-08-31 17:53:45 UTC
So far this bug is specific to Ubuntu, so as long as nobody manages to reproduce it on Fedora I am not too concerned about fixing it right now.

Comment 11 Wesley Schwengle 2009-08-31 21:36:04 UTC
In which Fedora release can I find the exact version of this package?

Comment 12 Lennart Poettering 2009-09-01 15:21:50 UTC
(In reply to comment #11)
> In which Fedora release can I find the exact version of this package?  

libatasmart 0.14 is available in fedora rawhide.

Comment 13 Wesley Schwengle 2009-09-02 06:31:27 UTC
I can confirm this bug on Rawhide, attached is my dmesg.log

FE12 will not automount the disk due to SELinux/PolicyKit policy violations, but it can be mounted manually.

The behaviour is exactly the same as on Ubuntu: after running skdump /dev/sdb the resets are back, and I need to remove the disk and re-attach it before it works again.

Comment 14 Wesley Schwengle 2009-09-02 06:32:08 UTC
Created attachment 359472 [details]
dmesg.log with the reset errors

Comment 15 Wesley Schwengle 2009-09-02 06:34:47 UTC
BTW, the policy violation will only occur when you do this:

sudo mv /lib/udev/rules.d/95-devkit-disks.rules{,.disabled}
devkit-disks --dump # this will make sure that the daemon is running

Comment 16 Matthias Clasen 2009-09-14 03:50:27 UTC
Lennart, any update on this ?

Comment 17 Lennart Poettering 2009-09-15 01:10:43 UTC
No, not really.

Unless someone has the hardware in question and wants to fix this for good I think the only option is to blacklist this device for SMART. DK-disks won't be able to use the SMART data of these USB bridges then.

Could someone please do the following for me: 

Check the results of the following commands on the hdds in question. I'd assume that they either trigger the reset problem or won't produce any SMART data.

skdump sat16:/dev/sdXXX
skdump sat12:/dev/sdXXX
skdump sunplus:/dev/sdXXX
skdump jmicron:/dev/sdXXX

Then, please include the lsusb -v data for this drive, so that I know what to check against in the blacklist.

When I implemented the jmicron support on a borrowed bridge things worked quite well. If it doesn't work for some folks then I guess the only option we have is to disable it for all. Which is a pity...
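
For reporters who want to run the four probes above in one go, the following is a minimal sketch, not part of libatasmart. It assumes skdump is installed and on PATH; the function names skdump_commands and run_all are made up for illustration. Note that on affected bridges the jmicron run is expected to trigger the reset.

```python
import subprocess

# Access-mode prefixes taken from the skdump commands requested above.
ACCESS_MODES = ("sat16", "sat12", "sunplus", "jmicron")


def skdump_commands(device):
    """Build one skdump command line per access mode for a device node."""
    return [["skdump", f"{mode}:{device}"] for mode in ACCESS_MODES]


def run_all(device):
    """Run each probe and collect (command, exit code, combined output).

    Hypothetical helper: it needs root and may reset the drive on
    affected bridges, so only call it on a disk you can re-plug.
    """
    results = []
    for cmd in skdump_commands(device):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results.append((" ".join(cmd), proc.returncode, proc.stdout + proc.stderr))
    return results
```

Calling run_all("/dev/sdb") as root and pasting the collected output together with lsusb -v should cover what is asked for here.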

Comment 18 Wesley Schwengle 2009-09-15 11:47:09 UTC
I can run those skdump commands for you, but do you want me to do this with Ubuntu or Fedora Rawhide? 

Installing Fedora is not the biggest of problems, but I need to wipe everything off my disk since the Fedora 11 installer cannot deal with my extended partitions. And restoring takes a bit longer for me, since clonezilla has some problems with the extended partitions as well (able to work around it). So I'd prefer running these tests on Ubuntu. I can finish those today, otherwise expect results EOW.

Comment 19 Lennart Poettering 2009-09-15 22:56:22 UTC
libatasmart 0.14 on the rawhide kernel would be best.

Comment 20 Robert Willert 2009-09-15 23:23:36 UTC
skdump sat16:/dev/sdXXX - "ATA SMART not supported"
skdump sat12:/dev/sdXXX - "ATA SMART not supported"
skdump sunplus:/dev/sdXXX - "ATA SMART not supported"
skdump jmicron:/dev/sdXXX - HDD reset

lsusb

Bus 001 Device 013: ID 152d:2329 JMicron Technology Corp. / JMicron USA Technology Corp. 
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
  bDeviceClass            0 (Defined at Interface level)
  bDeviceSubClass         0 
  bDeviceProtocol         0 
  bMaxPacketSize0        64
  idVendor           0x152d JMicron Technology Corp. / JMicron USA Technology Corp.
  idProduct          0x2329 
  bcdDevice            0.00
  iManufacturer           1 JMicron
  iProduct               11 StoreJet Transcend
  iSerial                 5 8F25FFFFFFFF
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength           32
    bNumInterfaces          1
    bConfigurationValue     1
    iConfiguration          4 USB Mass Storage
    bmAttributes         0xc0
      Self Powered
    MaxPower                2mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           2
      bInterfaceClass         8 Mass Storage
      bInterfaceSubClass      6 SCSI
      bInterfaceProtocol     80 Bulk (Zip)
      iInterface              6 MSC Bulk-Only Transfer
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x02  EP 2 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
Device Qualifier (for other device speed):
  bLength                10
  bDescriptorType         6
  bcdUSB               2.00
  bDeviceClass            0 (Defined at Interface level)
  bDeviceSubClass         0 
  bDeviceProtocol         0 
  bMaxPacketSize0        64
  bNumConfigurations      1
Device Status:     0x5301
  Self Powered

Comment 21 Lennart Poettering 2009-09-17 02:13:36 UTC
Thanks, that should be all I need!

Comment 22 Lennart Poettering 2009-09-17 02:19:37 UTC
Hmm, libatasmart currently uses jmicron access mode for usb devices 152d:2329, :2336, :2338, :2339. #20 suggests that :2329 should be blacklisted. 

I wonder about the other three USB bridges: is there any report of things not working for those? Martin, do you know if the original Ubuntu report mentioned any product IDs other than 2329?

Comment 23 Martin Pitt 2009-09-17 08:49:56 UTC
I checked all the comments and duplicate bugs, and I saw several other occurrences of 152d:2329. I did not see any of the other JMicron product IDs.

However, there are a fair number of reporters who didn't submit lsusb -v, just the standard "lsusb" output which comes by default through apport-reported bugs. The original reporter of that bug had

Bus 001 Device 017: ID 413c:2002 Dell Computer Corp. SK-8125 Keyboard
Bus 001 Device 016: ID 0409:005a NEC Corp. HighSpeed Hub
Bus 001 Device 015: ID 413c:1002 Dell Computer Corp. Keyboard Hub
Bus 001 Device 014: ID 046d:c051 Logitech, Inc. G3 (MX518) Optical Mouse
Bus 001 Device 013: ID 0424:2504 Standard Microsystems Corp. USB 2.0 Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 005 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub

which doesn't mention JMicron anywhere.

However, I'll ask the folks to report a new bug if they still have the problem after blacklisting the one from above, and include lsusb -v output.
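
Since many reports only carry the plain lsusb listing, scanning it for the JMicron vendor ID can be sketched like this. It is only an illustration: the regular expression assumes the usual "Bus ... Device ...: ID vid:pid name" line shape, and find_vendor_devices is a made-up helper name.

```python
import re

# 152d is the JMicron vendor ID seen in the lsusb output in this report.
JMICRON_VENDOR = "152d"

# Assumed shape of a plain lsusb line: "Bus 001 Device 013: ID 152d:2329 Name..."
LSUSB_LINE = re.compile(
    r"^Bus (?P<bus>\d+) Device (?P<dev>\d+): "
    r"ID (?P<vid>[0-9a-f]{4}):(?P<pid>[0-9a-f]{4})\s*(?P<name>.*)$"
)


def find_vendor_devices(lsusb_text, vendor=JMICRON_VENDOR):
    """Return (vid, pid, name) for every lsusb line with the given vendor ID."""
    hits = []
    for line in lsusb_text.splitlines():
        match = LSUSB_LINE.match(line.strip())
        if match and match.group("vid") == vendor:
            hits.append((match.group("vid"), match.group("pid"),
                         match.group("name").strip()))
    return hits
```

An empty result, as for the listing above, only means no bridge was plugged in when the report was taken; it does not rule out a JMicron bridge.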

Some people use this as a workaround:

  ATTRS{idVendor}=="152d", ATTRS{idProduct}=="2329", GOTO="check_sat_end"

Is the blacklisting in libatasmart itself, or should it actually go into the rules?

Thanks! Martin

Comment 24 Lennart Poettering 2009-09-17 20:52:08 UTC
(In reply to comment #23)
> I checked all the comments and duplicate bugs, and I saw several other
> occurrences of 152d:2329. I did not see any of the other JMicron product IDs.

Ok, I'll then blacklist only this one for now. We can blacklist more later on.
 
>   ATTRS{idVendor}=="152d", ATTRS{idProduct}=="2329", GOTO="check_sat_end"
> 
> Is the blacklisting in libatasmart itself, or should it actually go into the
> rules?

I think the blacklist should be maintained in libatasmart proper. That said I added an easy way to override the access method detection/blacklist from udev rules:

ATTRS{idVendor}=="152d", ATTRS{idProduct}=="2329", ENV{ID_ATA_SMART_ACCESS}="none"

This will tell libatasmart not to try SMART at all. The value can be any of the access method names: sat12, sat16, linux-ide, sunplus, jmicron, none, auto.

The idea is that quick blacklisting (or registration of a new usb vid/pid for a specific access mode) can happen via ID_ATA_SMART_ACCESS, without having to wait for a new upstream release. However, sooner or later the blacklist should be merged into libatasmart proper and dropped from the udev file.

For a packager this means if you find a bridge that needs a special access method or blacklisting: 1) file a bug upstream to inform me, 2) for quickly updating your .debs/.rpms simply add an udev rule using ID_ATA_SMART_ACCESS, 3) as soon as this was merged into the C code upstream and you update your packages with upstream, drop it from the udev rules again.
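
For packagers carrying several such overrides, generating the rule lines can be sketched as below. This is an illustration only: smart_access_rule is a made-up helper, the list of method names is copied from this comment, and the value is written quoted as udev rule syntax expects.

```python
# Access method names as listed in this comment.
ACCESS_METHODS = ("sat12", "sat16", "linux-ide", "sunplus", "jmicron", "none", "auto")

HEX_DIGITS = "0123456789abcdef"


def smart_access_rule(vendor_id, product_id, method):
    """Format a udev rule setting ID_ATA_SMART_ACCESS for one USB vid/pid."""
    if method not in ACCESS_METHODS:
        raise ValueError(f"unknown access method: {method!r}")
    for value in (vendor_id, product_id):
        # sysfs reports idVendor/idProduct as four lowercase hex digits.
        if len(value) != 4 or any(c not in HEX_DIGITS for c in value):
            raise ValueError(f"expected 4 lowercase hex digits, got {value!r}")
    return (
        f'ATTRS{{idVendor}}=="{vendor_id}", ATTRS{{idProduct}}=="{product_id}", '
        f'ENV{{ID_ATA_SMART_ACCESS}}="{method}"'
    )
```

smart_access_rule("152d", "2329", "none") yields the blacklist rule for the bridge from comment 20; the line still has to be placed so that it is evaluated before the SMART probing rule runs.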

Comment 25 Lennart Poettering 2009-09-17 20:57:49 UTC
Patch is here: http://git.0pointer.de/?p=libatasmart.git;a=patch;h=6f71fbec9e4488aab2fb92cd3e0a0b2445f83237

Hmm, since this is the first blacklisting and I don't have the hw I'd like to see this tested before I roll a new release. Would be awesome if someone who has the hw could test this!

Comment 26 Lennart Poettering 2009-09-17 21:00:54 UTC
(In reply to comment #24)

> I think the blacklist should be maintained in libatasmart proper. That said I
> added an easy way to override the access method detection/blacklist from udev
> rules:
> 
> ATTRS{idVendor}=="152d", ATTRS{idProduct}=="2329",
> ENV{ID_ATA_SMART_ACCESS}=none

This is not a new feature btw, it's already supported in 0.14.

Comment 27 Lennart Poettering 2009-09-18 03:40:22 UTC
The Ubuntu folks have tested this and it seems to work, so I have now released 0.15 and uploaded it to rawhide.

Comment 28 Martin Pitt 2009-09-19 14:39:00 UTC
Thanks for the new release!

I just got a follow-up that the 152d:2339 model causes USB resets as well and needs to be blacklisted, too.

Comment 29 Lennart Poettering 2009-09-19 15:13:00 UTC
(In reply to comment #28)
> Thanks for the new release!
> 
> I just got a follow-up that the 152d:2339 model causes USB resets as well and
> needs to be blacklisted, too.  

Grmbl.

Btw, I actually would like to know if smartmontools' jmicron access implementation actually works on those drives. Does anyone want to test this for me?

Comment 30 Robert Willert 2009-09-19 16:41:57 UTC
I tested with the latest svn version.

rob@rob-laptop:~/smartmontools$ sudo smartctl --all -d usbjmicron /dev/sdb
smartctl 5.39 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net

Smartctl open device: /dev/sdb [USB JMicron] failed: scsi error unsupported scsi opcode

Comment 31 Martin Pitt 2009-09-24 09:13:25 UTC
Sigh, they keep coming:

  "Here is another JMicron-chip that gives the symtoms listed above and which are fixed by disabling devicekit-disks:

Bus 001 Device 005: ID 152d:2338 JMicron Technology Corp. / JMicron USA Technology Corp. JM20337 Hi-Speed USB to SATA & PATA Combo Bridge

"

Comment 32 Martin Pitt 2009-09-24 09:15:19 UTC
So given that we now have confirmation that we need to blacklist three of the four JMicron devices, we should start taking bets on when someone comes along and reports the fourth one failing as well.. :-(

Comment 33 Lennart Poettering 2009-09-25 21:29:11 UTC
I am currently playing around with Kay's 152d:2336 HDD which works just fine.

So, for :2338, :2339, :2329  we have reports that things don't work.

For :2336 I myself can report that things do work. Let's hope we don't get a report about broken :2336 too. Because that would then mean that there are both good and bad bridges available under the same vid/pid. That would be a pity.

Comment 34 Lennart Poettering 2009-09-29 03:46:30 UTC
I now rolled a new tarball with the other two bridges blacklisted, only leaving 152d:2336 as good for the jmicron access mode. Let's hope we don't get negative feedback on that one too...

0.16-1 is now building in koji for rawhide.
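
The resulting state for the JMicron bridges can be summarised in a small lookup sketch. This is an illustration based only on the comments in this bug, not a copy of the table in the libatasmart C code; the "auto" fallback and the function name access_method are assumptions.

```python
# Bridge state as described in this thread (vendor 152d, keyed by product ID).
# Illustrative only; not copied from the libatasmart source.
JMICRON_BRIDGES = {
    "2329": "none",     # blacklisted in 0.15
    "2338": "none",     # blacklisted in 0.16
    "2339": "none",     # blacklisted in 0.16
    "2336": "jmicron",  # reported working in comment 33
}


def access_method(vendor_id, product_id, override=None):
    """Pick an access method; an explicit override such as a value set
    through ID_ATA_SMART_ACCESS takes precedence over the table."""
    if override is not None:
        return override
    if vendor_id == "152d":
        # Assumption: unknown JMicron products fall back to autodetection.
        return JMICRON_BRIDGES.get(product_id, "auto")
    return "auto"
```

The override argument mirrors the idea from comment 24 that a udev rule can change the behaviour for a given bridge without waiting for a new upstream release.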

Comment 35 rhbz 2010-01-09 19:16:02 UTC
(In reply to comment #30)
> (In reply to comment #29)
>> Btw, I actually would like to know if smartmontools' jmicron access
>> implementation actually works on those drives. Anyone wants to test this
>> for me?
>
> I tested with the latest svn version.
>
> rob@rob-laptop:~/smartmontools$ sudo smartctl --all -d usbjmicron /dev/sdb
> smartctl 5.39 [i686-pc-linux-gnu] (local build)
> Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net
>
> Smartctl open device: /dev/sdb [USB JMicron] failed: scsi error unsupported
> scsi opcode
  
smartmontools' own wiki <http://sourceforge.net/apps/trac/smartmontools/wiki/Supported_USB-Devices>
shows users are having some success with these devices. I can confirm for
device ID 2329 (only have Windows available at the moment, will try again on a
Linux machine if I get the chance):


C:\> smartctl -d usbjmicron -c -l xerror /dev/sdb
smartctl 5.39 2009-12-09 r2995 [i686-pc-mingw32-win7(64)] (sf-win32-5.39-1)
Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		 ( 426) seconds.
Offline data collection
capabilities: 			 (0x53) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					No Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					No General Purpose Logging support.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 111) minutes.
SCT capabilities: 	       (0x0035)	SCT Status supported.
					SCT Feature Control supported.
					SCT Data Table supported.

ATA_READ_LOG_EXT (addr=0x00:0x00, page=0, n=1) failed: 48-bit ATA commands not supported
Read GP Log Directory failed.

SMART Extended Comprehensive Error Log (GP Log 0x03) not supported
Try '-l [xerror,]error' to read traditional SMART Error Log