Bug 515881
Summary: | SMART probing causes USB resets on some controllers | | |
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Martin Pitt <mpitt> |
Component: | libatasmart | Assignee: | Lennart Poettering <lpoetter> |
Status: | CLOSED RAWHIDE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | medium | Priority: | low |
Version: | rawhide | CC: | lpoetter, mclasen, rhbz, rhel, webmaster |
Target Milestone: | --- | Keywords: | Reopened |
Target Release: | --- | Hardware: | All |
OS: | Linux | Doc Type: | Bug Fix |
Last Closed: | 2009-09-29 03:46:30 UTC | Type: | --- |
Description
Martin Pitt
2009-08-06 08:05:27 UTC
I updated to 0.14 (thanks for the release), and it fixed the issue for one reporter. I asked the other reporters to test this and to give me the skdump output if it still happens. I'll be on holiday for the next two weeks and will report back then. Alternatively, you can just take a look at the linked LP bug for replies. Thanks!

Hmm, libatasmart 0.14 breaks ABI/API. How did you manage to get dk-disks working on it so quickly?

The API changes were pretty small (essentially "good" -> "good_now"). I updated our devicekit-disks package to build against it and sent the patch to https://bugs.freedesktop.org/show_bug.cgi?id=23191 (I'm still waiting to get into the devicekit group to be able to commit such trivialities myself).

Hmm, no. This is not a triviality. Updating just by "compile testing" is not enough here. The overall status enum got two new values. This needs to be handled. A patch that does no more than make dk-disks compile again is *not* enough.

Well, I checked the 0.13 -> 0.14 diff of atasmart.h and noticed the new SkSmartOverall attributes. But dk-disks wasn't even checking the old status codes, and I get proper SMART data in devkit-disks --dump, so I figured that for the purpose of testing this bug and collecting more data it would be better to get the new version in ASAP. If devkit-disks is supposed to handle more SMART states, we should totally do that, of course.

Status update on this? I believe dkd-006 has been ported to the latest atasmart, right? All fixed?

(In reply to comment #6)
> Status update on this? I believe dkd-006 has been ported to the latest
> atasmart, right?

Right, that patch was (more or less) applied upstream.

> All fixed?

Unfortunately not. Even with libatasmart 0.14 and dk-disks 006, many people still report "error -71" drive disconnects with the ATA probing udev rules (all of them report that moving the rules away helps).
Ulrich Beyer provided a dk-disks dump and an skdump strace in both standard and jmicron mode: http://launchpadlibrarian.net/30815202/skdump.trace . Standard mode fails with "invalid argument"; jmicron mode prints "Failed to open disk jmicron:/dev/sdb: Success".

Scott Zawalski unfortunately did not provide an strace, but reports a similar result:

----------
sudo skdump /dev/sdf
Device: jmicron:/dev/sdf
Type: JMicron SCSI ATA Passthru
Size: 715404 MiB
Model: [ST3750640NS]
Serial: [3QD136WL]
Firmware: [3.CNQ]
SMART Available: yes
Quirks:
Awake: Invalid argument
SMART Disk Health Good: Invalid argument
----------

Another response from a different user:

Device: jmicron:/dev/sdb
Size: 953869 MiB
Model: [WDC WD10EAVS-00D7B0]
Serial: [WD-WCAU42028877]
Firmware: [01.01A01]
SMART Available: yes
Quirks:
Awake: No such device
SMART Disk Health Good: No such device
Failed to dump disk data: No such device

Device: jmicron:/dev/sdb
Type: JMicron SCSI ATA Passthru
Size: 953869 MiB
Model: [WDC WD10EAVS-00D7B0]
Serial: [WD-WCAU42028877]
Firmware: [01.01A01]
SMART Available: yes
Quirks:
Awake: No such device
SMART Disk Health Good: No such device
Failed to dump disk data: No such device

Type: JMicron SCSI ATA Passthru
Size: 953869 MiB
Model: [WDC WD10EAVS-00D7B0]
Serial: [WD-WCAU42028877]
Firmware: [01.01A01]
SMART Available: yes
Quirks:
Awake: Invalid argument
SMART Disk Health Good: Invalid argument

Another set of straces from Stefan Ebner, to get some variance:

standard mode: http://launchpadlibrarian.net/30814259/trace
jmicron:/dev/sdb: http://launchpadlibrarian.net/30814523/jmicron

So far this bug is specific to Ubuntu, so as long as nobody manages to reproduce it on Fedora I am not too concerned about fixing it right now.

In which Fedora release can I find the exact version of this package?

(In reply to comment #11)
> In which Fedora release can I find the exact version of this package?

libatasmart 0.14 is available in Fedora rawhide.
I can confirm this bug on Rawhide; attached is my dmesg.log. FE12 will not automount the disk due to SELinux/PolicyKit policy violations, but it can be mounted manually. The behaviour is exactly the same as on Ubuntu: after running skdump /dev/sdb the resets are back, and I need to remove the disk and re-attach it before it works again.

Created attachment 359472
dmesg.log with the reset errors
BTW, the policy violation will only occur when you do this:

sudo mv /lib/udev/rules.d/95-devkit-disks.rules{,.disabled}
devkit-disks --dump # this will make sure that the daemon is running

Lennart, any update on this?

No, not really. Unless someone has the hardware in question and wants to fix this for good, I think the only option is to blacklist this device for SMART. DK-disks won't be able to use the SMART data of these USB bridges then.

Could someone please do the following for me: check the results of the following commands on the HDDs in question. I'd assume that they either trigger the reset problem or won't produce any SMART data.

skdump sat16:/dev/sdXXX
skdump sat12:/dev/sdXXX
skdump sunplus:/dev/sdXXX
skdump jmicron:/dev/sdXXX

Then, please include the lsusb -v data for this drive, so that I know what to check against in the blacklist.

When I implemented the jmicron support on a borrowed bridge, things worked quite well. If it doesn't for some folks, then I guess the only option we have is to disable it for all. Which is a pity...

I can run those skdump commands for you, but do you want me to do this with Ubuntu or Fedora Rawhide? Installing Fedora is not the biggest of problems, but I need to wipe everything off my disk since the Fedora 11 installer cannot deal with my extended partitions. And restoring takes a bit longer for me, since clonezilla has some problems with the extended partitions as well (I am able to work around it). So I'd prefer running these tests on Ubuntu. I can finish those today; otherwise expect results EOW.

libatasmart 0.14 on the rawhide kernel would be best.

skdump sat16:/dev/sdXXX - "ATA SMART not supported"
skdump sat12:/dev/sdXXX - "ATA SMART not supported"
skdump sunplus:/dev/sdXXX - "ATA SMART not supported"
skdump jmicron:/dev/sdXXX - HDD reset

lsusb

Bus 001 Device 013: ID 152d:2329 JMicron Technology Corp. / JMicron USA Technology Corp.
Device Descriptor:
  bLength 18
  bDescriptorType 1
  bcdUSB 2.00
  bDeviceClass 0 (Defined at Interface level)
  bDeviceSubClass 0
  bDeviceProtocol 0
  bMaxPacketSize0 64
  idVendor 0x152d JMicron Technology Corp. / JMicron USA Technology Corp.
  idProduct 0x2329
  bcdDevice 0.00
  iManufacturer 1 JMicron
  iProduct 11 StoreJet Transcend
  iSerial 5 8F25FFFFFFFF
  bNumConfigurations 1
  Configuration Descriptor:
    bLength 9
    bDescriptorType 2
    wTotalLength 32
    bNumInterfaces 1
    bConfigurationValue 1
    iConfiguration 4 USB Mass Storage
    bmAttributes 0xc0
      Self Powered
    MaxPower 2mA
    Interface Descriptor:
      bLength 9
      bDescriptorType 4
      bInterfaceNumber 0
      bAlternateSetting 0
      bNumEndpoints 2
      bInterfaceClass 8 Mass Storage
      bInterfaceSubClass 6 SCSI
      bInterfaceProtocol 80 Bulk (Zip)
      iInterface 6 MSC Bulk-Only Transfer
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x81 EP 1 IN
        bmAttributes 2
          Transfer Type Bulk
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0200 1x 512 bytes
        bInterval 0
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x02 EP 2 OUT
        bmAttributes 2
          Transfer Type Bulk
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0200 1x 512 bytes
        bInterval 0
Device Qualifier (for other device speed):
  bLength 10
  bDescriptorType 6
  bcdUSB 2.00
  bDeviceClass 0 (Defined at Interface level)
  bDeviceSubClass 0
  bDeviceProtocol 0
  bMaxPacketSize0 64
  bNumConfigurations 1
Device Status: 0x5301
  Self Powered

Thanks, that should be all I need!

Hmm, libatasmart currently uses jmicron access mode for USB devices 152d:2329, :2336, :2338, :2339. #20 suggests that :2329 should be blacklisted. I wonder about the other three USB bridges: is there any report of things not working for those? Martin, do you know if the original Ubuntu report mentioned any product IDs other than 2329?

I checked all the comments and duplicate bugs, and I saw several other occurrences of 152d:2329. I did not see any of the other JMicron product IDs.
However, there are a fair number of reporters who didn't submit lsusb -v, just the standard "lsusb" output which comes by default through apport-reported bugs. The original reporter of that bug had

Bus 001 Device 017: ID 413c:2002 Dell Computer Corp. SK-8125 Keyboard
Bus 001 Device 016: ID 0409:005a NEC Corp. HighSpeed Hub
Bus 001 Device 015: ID 413c:1002 Dell Computer Corp. Keyboard Hub
Bus 001 Device 014: ID 046d:c051 Logitech, Inc. G3 (MX518) Optical Mouse
Bus 001 Device 013: ID 0424:2504 Standard Microsystems Corp. USB 2.0 Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 005 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub

which doesn't mention JMicron anywhere. However, I'll ask the folks to report a new bug if they still have the problem after blacklisting the one from above, and to include lsusb -v output.

Some people use this as a workaround:

ATTRS{idVendor}=="152d", ATTRS{idProduct}=="2329", GOTO="check_sat_end"

Is the blacklisting in libatasmart itself, or should it actually go into the rules?

Thanks!

Martin

(In reply to comment #23)
> I checked all the comments and duplicate bugs, and I saw several other
> occurrences of 152d:2329. I did not see any of the other JMicron product IDs.

Ok, I'll then blacklist only this one for now. We can blacklist more later on.

> ATTRS{idVendor}=="152d", ATTRS{idProduct}=="2329", GOTO="check_sat_end"
>
> Is the blacklisting in libatasmart itself, or should it actually go into the
> rules?

I think the blacklist should be maintained in libatasmart proper. That said, I added an easy way to override the access method detection/blacklist from udev rules:

ATTRS{idVendor}=="152d", ATTRS{idProduct}=="2329", ENV{ID_ATA_SMART_ACCESS}=none

This will tell libatasmart not to try SMART at all.
The value can be any of the access method names: sat12, sat16, linux-ide, sunplus, jmicron, none, auto.

The idea is that quick blacklisting (or registration of a new USB vid/pid for a specific access mode) can happen via ID_ATA_SMART_ACCESS, without having to wait for a new upstream release. However, sooner or later the blacklist should be merged into libatasmart proper and dropped from the udev file.

For a packager this means that if you find a bridge that needs a special access method or blacklisting:

1) file a bug upstream to inform me,
2) to quickly update your .debs/.rpms, simply add a udev rule using ID_ATA_SMART_ACCESS,
3) as soon as this has been merged into the C code upstream and you update your packages from upstream, drop it from the udev rules again.

Patch is here: http://git.0pointer.de/?p=libatasmart.git;a=patch;h=6f71fbec9e4488aab2fb92cd3e0a0b2445f83237

Hmm, since this is the first blacklisting and I don't have the hw, I'd like to see this tested before I roll a new release. Would be awesome if someone who has the hw could test this!

(In reply to comment #24)
> I think the blacklist should be maintained in libatasmart proper. That said I
> added an easy way to override the access method detection/blacklist from udev
> rules:
>
> ATTRS{idVendor}=="152d", ATTRS{idProduct}=="2329",
> ENV{ID_ATA_SMART_ACCESS}=none

This is not a new feature, btw. It's supported in 0.14 already.

The Ubuntu folks have tested this and it seems to work, so I have now released 0.15 and uploaded it to rawhide.

Thanks for the new release!

I just got a follow-up that the 152d:2339 model causes USB resets as well and needs to be blacklisted, too.

(In reply to comment #28)
> Thanks for the new release!
>
> I just got a follow-up that the 152d:2339 model causes USB resets as well and
> needs to be blacklisted, too.

Grmbl.

Btw, I actually would like to know if smartmontools' jmicron access implementation actually works on those drives. Anyone want to test this for me?
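For packagers following the ID_ATA_SMART_ACCESS workflow described above, a small helper can generate the override rule and reject misspelled access method names. This is only a sketch: the function name `smart_access_rule` is hypothetical, the list of method names is taken from this thread, and the value is emitted in double quotes because udev rule syntax normally expects quoted values (the rule as typed in the comment above omits them).

```python
import re

# Access method names listed in this thread for ID_ATA_SMART_ACCESS.
ACCESS_METHODS = ("sat12", "sat16", "linux-ide", "sunplus", "jmicron", "none", "auto")

# A USB vendor or product ID is written as four hex digits.
_HEX4 = re.compile(r"^[0-9a-f]{4}$")


def smart_access_rule(vendor_id: str, product_id: str, method: str) -> str:
    """Return one udev rule line that pins the SMART access method for a USB bridge.

    Hypothetical helper for illustration; it only builds the text of the rule.
    """
    vendor_id = vendor_id.lower()
    product_id = product_id.lower()
    if not _HEX4.match(vendor_id) or not _HEX4.match(product_id):
        raise ValueError("USB IDs must be four hex digits, e.g. 152d and 2329")
    if method not in ACCESS_METHODS:
        raise ValueError(f"unknown access method: {method!r}")
    # Doubled braces produce literal { } in the f-string output.
    return (
        f'ATTRS{{idVendor}}=="{vendor_id}", '
        f'ATTRS{{idProduct}}=="{product_id}", '
        f'ENV{{ID_ATA_SMART_ACCESS}}="{method}"'
    )


if __name__ == "__main__":
    # Rule that tells libatasmart not to try SMART on the 152d:2329 bridge.
    print(smart_access_rule("152d", "2329", "none"))
```

Such a rule is meant as a stopgap in a distribution package and should be dropped again once the same entry has been merged into libatasmart upstream.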
I tested with the latest svn version.

rob@rob-laptop:~/smartmontools$ sudo smartctl --all -d usbjmicron /dev/sdb
smartctl 5.39 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net

Smartctl open device: /dev/sdb [USB JMicron] failed: scsi error unsupported scsi opcode

Sigh, they keep coming:

"Here is another JMicron-chip that gives the symptoms listed above and which are fixed by disabling devicekit-disks:

Bus 001 Device 005: ID 152d:2338 JMicron Technology Corp. / JMicron USA Technology Corp. JM20337 Hi-Speed USB to SATA & PATA Combo Bridge"

So given that we now have confirmation that we need to blacklist three of the four JMicron devices, we should start taking bets on when someone comes along and sees the fourth one failing as well.. :-(

I am currently playing around with Kay's 152d:2336 HDD, which works just fine. So, for :2338, :2339, :2329 we have reports that things don't work. For :2336 I myself can report that things do work. Let's hope we don't get a report about a broken :2336 too, because that would mean that there are both good and bad bridges available under the same vid/pid. That would be a pity.

I now rolled a new tarball with the other two bridges blacklisted, only leaving 152d:2336 as good for the jmicron access mode. Let's hope we don't get negative feedback on that one too...

0.16-1 is now building in koji for rawhide.

(In reply to comment #30)
> (In reply to comment #29)
>> Btw, I actually would like to know if smartmontools' jmicron access
>> implementation actually works on those drives. Anyone want to test this
>> for me?
>
> I tested with the latest svn version.
>
> rob@rob-laptop:~/smartmontools$ sudo smartctl --all -d usbjmicron /dev/sdb
> smartctl 5.39 [i686-pc-linux-gnu] (local build)
> Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net
>
> Smartctl open device: /dev/sdb [USB JMicron] failed: scsi error unsupported
> scsi opcode

smartmontools' own wiki <http://sourceforge.net/apps/trac/smartmontools/wiki/Supported_USB-Devices> shows users are having some success with these devices. I can confirm for device ID 2329 (only have Windows available at the moment, will try again on a Linux machine if I get the chance):

C:\> smartctl -d usbjmicron -c -l xerror /dev/sdb
smartctl 5.39 2009-12-09 r2995 [i686-pc-mingw32-win7(64)] (sf-win32-5.39-1)
Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity was never started.
  Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: ( 426) seconds.
Offline data collection capabilities: (0x53) SMART execute Offline immediate.
  Auto Offline data collection on/off support.
  Suspend Offline collection upon new command.
  No Offline surface scan supported.
  Self-test supported.
  No Conveyance Self-test supported.
  Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode.
  Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
  No General Purpose Logging support.
Short self-test routine recommended polling time: ( 2) minutes.
Extended self-test routine recommended polling time: ( 111) minutes.
SCT capabilities: (0x0035) SCT Status supported.
  SCT Feature Control supported.
  SCT Data Table supported.

ATA_READ_LOG_EXT (addr=0x00:0x00, page=0, n=1) failed: 48-bit ATA commands not supported
Read GP Log Directory failed.

SMART Extended Comprehensive Error Log (GP Log 0x03) not supported
Try '-l [xerror,]error' to read traditional SMART Error Log
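
The per-bridge state reached in this thread as of 0.16 (152d:2329, :2338 and :2339 blacklisted, 152d:2336 left on the jmicron access mode) can be written down as a small lookup keyed on the vid:pid that plain `lsusb` prints. This is a sketch only and not the libatasmart implementation, which keeps its table in C. The names `JMICRON_ACCESS`, `usb_id_from_lsusb` and `access_mode_for` are hypothetical, and falling back to "auto" for unlisted devices is an assumption made for illustration.

```python
import re
from typing import Optional, Tuple

# Bridge state as reported in this thread after 0.16: three IDs had
# USB reset reports, one was reported working in jmicron mode.
JMICRON_ACCESS = {
    ("152d", "2329"): "none",
    ("152d", "2338"): "none",
    ("152d", "2339"): "none",
    ("152d", "2336"): "jmicron",
}

# Matches the "ID vvvv:pppp" field of a plain lsusb line.
_LSUSB_ID = re.compile(r"\bID ([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\b")


def usb_id_from_lsusb(line: str) -> Optional[Tuple[str, str]]:
    """Extract (vendor, product) as lowercase hex strings, or None if absent."""
    match = _LSUSB_ID.search(line)
    if match is None:
        return None
    return (match.group(1).lower(), match.group(2).lower())


def access_mode_for(line: str) -> str:
    """Look up the access mode for one lsusb line.

    Unlisted devices fall back to "auto" (an assumption for this sketch).
    """
    return JMICRON_ACCESS.get(usb_id_from_lsusb(line), "auto")


if __name__ == "__main__":
    sample = ("Bus 001 Device 013: ID 152d:2329 JMicron Technology Corp. "
              "/ JMicron USA Technology Corp.")
    print(usb_id_from_lsusb(sample), access_mode_for(sample))
```

A lookup like this also makes it easy to check the short `lsusb` output that apport-reported bugs carry by default, where no `lsusb -v` dump is available.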