Bug 157444

Summary: USBDEVFS_CONTROL failed messages for usb device
Product: [Fedora] Fedora Reporter: Lars Damerow <lars>
Component: kernelAssignee: Pete Zaitcev <zaitcev>
Status: CLOSED CANTFIX QA Contact: Brian Brock <bbrock>
Severity: high Docs Contact:
Priority: medium    
Version: 3CC: davej, wtogami
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2005-10-03 00:52:57 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
The contents of /proc/bus/usb/devices on a fresh 2.6.8 boot.
none
The contents of /proc/bus/usb/devices on a fresh 2.6.11 boot.
none
dmesg from 2.6.8 in single-user mode
none
dmesg from 2.6.11 in single-user mode
none
Annotated usbmon trace
none
Ian's uhci dumps with a failing TD
none
The workaround (removal of extra descriptor fetches) none

Description Lars Damerow 2005-05-11 18:14:08 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.6) Gecko/20050329 Firefox/1.0.2 Fedora/1.0.2-1.3.1

Description of problem:
When using a particular low-speed USB device (a color calibrator) a couple of our machines start dumping this to dmesg:

usb 2-2: usbfs: USBDEVFS_CONTROL failed cmd pcolor rqt 194 rq 1 len 8 ret -71
usb 2-2: usbfs: USBDEVFS_CONTROL failed cmd pcolor rqt 194 rq 1 len 8 ret -71
usb 2-2: usbfs: USBDEVFS_CONTROL failed cmd pcolor rqt 194 rq 1 len 8 ret -71

The calibration software then fails to see the device. The same device and software works on other machines running the same kernel.

I can see from rqt that it's failing when trying to read back from the device, but that's about all I can tell with my limited USB knowledge. Is there anything I can do to get more debugging information about this problem?

Unfortunately, this is the only problem stopping us from using 2.6.11 on all of our machines, and we need 2.6.11 to fix a host of other problems.

Thanks very much!

Version-Release number of selected component (if applicable):
kernel-2.6.11-1.14_FC3smp

How reproducible:
Always

Steps to Reproduce:
1. plug in the usb device
2. try to access it with the calibration software.
  

Actual Results:  The calibration software fails to communicate with the device and USBDEVFS_CONTROL errors get dumped to dmesg.

Expected Results:  The calibration software should have connected with the device and accessed it without error.

Additional info:

Comment 1 Lars Damerow 2005-05-16 23:35:20 UTC
I just tested this with kernel 2.6.11-1.20_FC3smp and the problem persists. Any
help would be much appreciated.

Thanks,
lars

Comment 2 Pete Zaitcev 2005-05-17 07:02:06 UTC
The problem is not in the kernel. Some process is hitting the device.
Analyzing the process list (and things like cron tables) on different
machines ought to shed some light.

Kernel might be able to help, depending of the nature of the offending
process. If it's just some process reading from /proc/bus/usb/devices,
then I suggest trying a rawhide kernel. It implements caching of descriptors.

On the off chance that it's HCD, please attach a few examples of
/proc/bus/usb/devices from working and non-working boxes (please do
not drop them in the comments box).


Comment 3 Lars Damerow 2005-05-19 22:21:39 UTC
Thanks for the response! The reason that I suspect something about the kernel is
that I can boot one of these machines back and forth between kernel 2.6.8 and
kernel 2.6.11 with GRUB, and though the USB device works under 2.6.8, it doesn't
under 2.6.11.

I'll attach the contents of /proc/bus/usb/devices for 2.6.8 and 2.6.11. No
changes have been made to the system in between copies of the file except for
rebooting from 2.6.8 to 2.6.11.

The only interesting difference between the files is that, in 2.6.8, the
Manufacturer and Product strings are visible for the calibrator, while in 2.6.11
they aren't.

If there's any other information I can gather, please let me know.

Thanks!
-lars

Comment 4 Lars Damerow 2005-05-19 22:22:55 UTC
Created attachment 114597 [details]
The contents of /proc/bus/usb/devices on a fresh 2.6.8 boot.

Comment 5 Lars Damerow 2005-05-19 22:23:59 UTC
Created attachment 114598 [details]
The contents of /proc/bus/usb/devices on a fresh 2.6.11 boot.

Comment 6 Pete Zaitcev 2005-05-20 08:12:54 UTC
What is the precise version of the "2.6.8" which works?


Comment 7 Lars Damerow 2005-05-20 15:20:45 UTC
It's 2.6.8-1.521smp.

Comment 8 Pete Zaitcev 2005-05-31 22:27:21 UTC
Lars, I have a stupid question here... The 2.6.8-1.521 is an FC2 kernel.
So, how did it end in the picture? Is it a leftover from an upgrade to FC3?
Just verify for me that you did a full upgrade to FC3, please, because
those messages are caused by a userland process accessing USB devices.
I am wondering if you ended with a mixed package set somehow...
The easiest is to run "yum update", see what it offers to update.
You'll need current udev, hal, hotplug to go with current kernels
(and dependencies - kudzu, pcmcia-cs, etc., whatever yum suggests).

Did you try to identify the offending process? Yes, I understand that the
old kernel worked. The point is to find what kind of access is happening.


Comment 9 Pete Zaitcev 2005-05-31 22:44:03 UTC
To add to this, did an originally shipping FC3 kernel work (2.6.9-1.637)?
The distance between 2.6.8 and 2.6.11 is too great, it's hard for me to
pin the problem onto a particular change.


Comment 10 Lars Damerow 2005-05-31 23:29:20 UTC
Hi Pete,

The machines themselves are running FC2 with just enough updates from FC3 to run
FC3's update kernels. I'll bet you probably want to close the bug report on that
information alone, no doubt. :)

I tried booting the machine in single-user mode and looking at
/proc/bus/usb/devices for both kernels, going under the assumption that if the
calibration probe's information is correct there, we won't get the
USBDEVFS_CONTROL message when I run the calibration software.

Under 2.6.9-1.667, booted from the FC3 install disk in rescue mode, the probe's
information is correct. (I can't actually boot the machine with the shipping FC3
kernel because it doesn't work with our SATA controller, so it doesn't see our
hard drives.)

I tried booting the machine to single-user mode on both 2.6.8 and 2.6.11, and
looked at all the running processes. The only differences are that on 2.6.11,
khpsbpkt, knodemgrd_0, kthread, and minilogd are running. Then, when I run our
calibration software and ask it to list the probes available, it sees the probe
on 2.6.8 and fails on 2.6.11, printing the USBDEVFS_CONTROL message.

Also, as before, in /proc/bus/usb/devices, the probe's information is correct in
2.6.8 and not in 2.6.11. I think that the USBDEVFS_CONTROL message is just a
symptom of the kernel's incorrect initialization of this device, which worked in
2.6.9-1.667 and not in 2.6.11-1.14_FC3smp.

I hope some of that was useful!

cheers,
lars

Comment 11 Pete Zaitcev 2005-06-01 01:12:09 UTC
The 2.6.9-1.667 was a good call, but the diff is still too big for me to
crack the problem based on that. Let's approach it differently. Missing
string descriptors in those /proc/bus/usb/messages were a good clue.

Please attach two dmesg(8) output instances taken on normally running
systems with 2.6.8-1.521 and 2.6.11-14_FC3. I wish to see if probing fails
(it probably does), and how it fails.


Comment 12 Lars Damerow 2005-06-01 18:05:34 UTC
Here are the dmesg outputs.

Comment 13 Lars Damerow 2005-06-01 18:06:18 UTC
Created attachment 115051 [details]
dmesg from 2.6.8 in single-user mode

Comment 14 Lars Damerow 2005-06-01 18:06:53 UTC
Created attachment 115052 [details]
dmesg from 2.6.11 in single-user mode

Comment 15 Pete Zaitcev 2005-06-02 00:14:24 UTC
I need same dmesgs only with the calibrator connected, please.
They will cut off at the top a little, but that's ok.


Comment 16 Lars Damerow 2005-06-02 00:18:52 UTC
Hi Pete,

The calibrator's connected in both cases--it's device 5-2.

Comment 17 Pete Zaitcev 2005-06-07 10:09:28 UTC
Created attachment 115179 [details]
Annotated usbmon trace

This trace has both success and failure, see two last plugs,
near the end of file.

Comment 18 Pete Zaitcev 2005-06-07 10:13:28 UTC
Created attachment 115180 [details]
Ian's uhci dumps with a failing TD

Fails at same distance, after 56 bytes. TD link switches a page (coincidence?)

Comment 19 Pete Zaitcev 2005-06-07 23:29:18 UTC
I have worked with the customer to debug the issue and I'm going to chekpoint
it for now. Here's the status.

Scenario appears as following. After the khubd enumerates the calibrator,
hotplug is run. Something in it causes /sys files to be read, although
I am not sure what. Perhaps it's enough to scan a directory...
When they are read, usb_string is called to fetch string descriptors
for serial number (if present), manufacturer, product, interface,
current configuration. The last request fails.

Two hotplug processes are run simultaneously, but this is a red herring.
The failure happens reliably with just one.

It appears that everything is set up correctly, including the CPU side
of the DMA, but the device simply ceases functioning after transferring
56 out of 60 bytes. The HC (UHCI) reports CRC/Timeout and not any host
DMA related problems. The problem reproduces with swiotlb off. The Ian's
error dump shows TDs well formed.

The difference with 2.6.8 is that it does not try to fetch string
descriptors for configuration and interface.

Lars is going to use a workaround which backs out the fetching of
additional descriptors.

At this point, the problem needs heavy artillery to make headway.
A USB bus analyzer would help to see what actually happens.
Another option would be to instrument uhci-hcd heavily.


Comment 20 Pete Zaitcev 2005-06-07 23:30:22 UTC
Created attachment 115201 [details]
The workaround (removal of extra descriptor fetches)

Comment 21 Dave Jones 2005-07-15 20:13:00 UTC
An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which
may contain a fix for your problem.   Please update to this new kernel, and
report whether or not it fixes your problem.

If you have updated to Fedora Core 4 since this bug was opened, and the problem
still occurs with the latest updates for that release, please change the version
field of this bug to 'fc4'.

Thank you.

Comment 22 Dave Jones 2005-10-03 00:52:57 UTC
This bug has been automatically closed as part of a mass update.
It had been in NEEDINFO state since July 2005.
If this bug still exists in current errata kernels, please reopen this bug.

There are a large number of inactive bugs in the database, and this is the only
way to purge them.

Thank you.