Bug 157444
Description
Lars Damerow
2005-05-11 18:14:08 UTC
I just tested this with kernel 2.6.11-1.20_FC3smp and the problem persists. Any help would be much appreciated. Thanks, lars The problem is not in the kernel. Some process is hitting the device. Analyzing the process list (and things like cron tables) on different machines ought to shed some light. Kernel might be able to help, depending of the nature of the offending process. If it's just some process reading from /proc/bus/usb/devices, then I suggest trying a rawhide kernel. It implements caching of descriptors. On the off chance that it's HCD, please attach a few examples of /proc/bus/usb/devices from working and non-working boxes (please do not drop them in the comments box). Thanks for the response! The reason that I suspect something about the kernel is that I can boot one of these machines back and forth between kernel 2.6.8 and kernel 2.6.11 with GRUB, and though the USB device works under 2.6.8, it doesn't under 2.6.11. I'll attach the contents of /proc/bus/usb/devices for 2.6.8 and 2.6.11. No changes have been made to the system in between copies of the file except for rebooting from 2.6.8 to 2.6.11. The only interesting difference between the files is that, in 2.6.8, the Manufacturer and Product strings are visible for the calibrator, while in 2.6.11 they aren't. If there's any other information I can gather, please let me know. Thanks! -lars Created attachment 114597 [details]
The contents of /proc/bus/usb/devices on a fresh 2.6.8 boot.
Created attachment 114598 [details]
The contents of /proc/bus/usb/devices on a fresh 2.6.11 boot.
What is the precise version of the "2.6.8" which works? It's 2.6.8-1.521smp. Lars, I have a stupid question here... The 2.6.8-1.521 is an FC2 kernel. So, how did it end in the picture? Is it a leftover from an upgrade to FC3? Just verify for me that you did a full upgrade to FC3, please, because those messages are caused by a userland process accessing USB devices. I am wondering if you ended with a mixed package set somehow... The easiest is to run "yum update", see what it offers to update. You'll need current udev, hal, hotplug to go with current kernels (and dependencies - kudzu, pcmcia-cs, etc., whatever yum suggests). Did you try to identify the offending process? Yes, I understand that the old kernel worked. The point is to find what kind of access is happening. To add to this, did an originally shipping FC3 kernel work (2.6.9-1.637)? The distance between 2.6.8 and 2.6.11 is too great, it's hard for me to pin the problem onto a particular change. Hi Pete, The machines themselves are running FC2 with just enough updates from FC3 to run FC3's update kernels. I'll bet you probably want to close the bug report on that information alone, no doubt. :) I tried booting the machine in single-user mode and looking at /proc/bus/usb/devices for both kernels, going under the assumption that if the calibration probe's information is correct there, we won't get the USBDEVFS_CONTROL message when I run the calibration software. Under 2.6.9-1.667, booted from the FC3 install disk in rescue mode, the probe's information is correct. (I can't actually boot the machine with the shipping FC3 kernel because it doesn't work with our SATA controller, so it doesn't see our hard drives.) I tried booting the machine to single-user mode on both 2.6.8 and 2.6.11, and looked at all the running processes. The only differences are that on 2.6.11, khpsbpkt, knodemgrd_0, kthread, and minilogd are running. Then, when I run our calibration software and ask it to list the probes available, it sees the probe on 2.6.8 and fails on 2.6.11, printing the USBDEVFS_CONTROL message. Also, as before, in /proc/bus/usb/devices, the probe's information is correct in 2.6.8 and not in 2.6.11. I think that the USBDEVFS_CONTROL message is just a symptom of the kernel's incorrect initialization of this device, which worked in 2.6.9-1.667 and not in 2.6.11-1.14_FC3smp. I hope some of that was useful! cheers, lars The 2.6.9-1.667 was a good call, but the diff is still too big for me to crack the problem based on that. Let's approach it differently. Missing string descriptors in those /proc/bus/usb/messages were a good clue. Please attach two dmesg(8) output instances taken on normally running systems with 2.6.8-1.521 and 2.6.11-14_FC3. I wish to see if probing fails (it probably does), and how it fails. Here are the dmesg outputs. Created attachment 115051 [details]
dmesg from 2.6.8 in single-user mode
Created attachment 115052 [details]
dmesg from 2.6.11 in single-user mode
I need same dmesgs only with the calibrator connected, please. They will cut off at the top a little, but that's ok. Hi Pete, The calibrator's connected in both cases--it's device 5-2. Created attachment 115179 [details]
Annotated usbmon trace
This trace has both success and failure, see two last plugs,
near the end of file.
Created attachment 115180 [details]
Ian's uhci dumps with a failing TD
Fails at same distance, after 56 bytes. TD link switches a page (coincidence?)
I have worked with the customer to debug the issue and I'm going to chekpoint it for now. Here's the status. Scenario appears as following. After the khubd enumerates the calibrator, hotplug is run. Something in it causes /sys files to be read, although I am not sure what. Perhaps it's enough to scan a directory... When they are read, usb_string is called to fetch string descriptors for serial number (if present), manufacturer, product, interface, current configuration. The last request fails. Two hotplug processes are run simultaneously, but this is a red herring. The failure happens reliably with just one. It appears that everything is set up correctly, including the CPU side of the DMA, but the device simply ceases functioning after transferring 56 out of 60 bytes. The HC (UHCI) reports CRC/Timeout and not any host DMA related problems. The problem reproduces with swiotlb off. The Ian's error dump shows TDs well formed. The difference with 2.6.8 is that it does not try to fetch string descriptors for configuration and interface. Lars is going to use a workaround which backs out the fetching of additional descriptors. At this point, the problem needs heavy artillery to make headway. A USB bus analyzer would help to see what actually happens. Another option would be to instrument uhci-hcd heavily. Created attachment 115201 [details]
The workaround (removal of extra descriptor fetches)
An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which may contain a fix for your problem. Please update to this new kernel, and report whether or not it fixes your problem. If you have updated to Fedora Core 4 since this bug was opened, and the problem still occurs with the latest updates for that release, please change the version field of this bug to 'fc4'. Thank you. This bug has been automatically closed as part of a mass update. It had been in NEEDINFO state since July 2005. If this bug still exists in current errata kernels, please reopen this bug. There are a large number of inactive bugs in the database, and this is the only way to purge them. Thank you. |