From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux ppc; en-US; rv:1.8.1.12) Gecko/20080208 Fedora/2.0.0.12-1.fc8 Firefox/2.0.0.12
Description of problem:
Ever since I upgraded to this new version of the kernel (which I was told to do to solve another problem) my iBook G4 12" trackpad has been frequently crashing. Specifically, everything starts up fine and I can use the trackpad normally, but after some hours of use, appletouch module begins reporting heavy touches all over the pad, disrupting normal use. Unloading/reloading the appletouch module appears to reset the device and (tentatively) solve the problem. Previous (2.6.23) kernels did not have this problem, and my trackpad worked fine.
Version-Release number of selected component (if applicable):
kernel-2.6.24.2-10.fc8
How reproducible:
Always
Steps to Reproduce:
1. use trackpad
2. ???
3. madness!
Actual Results:
Trackpad flips out.
Expected Results:
Trackpad probably shouldn't flip out.
Additional info:
[root@localhost Movies]# dmesg | grep -i touch
appletouch: Geyser mode initialized.
input: appletouch as /devices/pci0001:10/0001:10:1a.0/usb2/2-2/2-2:1.0/input/input6
usbcore: registered new interface driver appletouch
appletouch: incomplete data package (first byte: 0, length: 17).
input: appletouch disconnected
appletouch: Geyser mode initialized.
input: appletouch as /devices/pci0001:10/0001:10:1a.0/usb2/2-2/2-2:1.0/input/input8
appletouch: incomplete data package (first byte: 0, length: 17).
usbcore: deregistering interface driver appletouch
input: appletouch disconnected
appletouch: Geyser mode initialized.
input: appletouch as /devices/pci0001:10/0001:10:1a.0/usb2/2-2/2-2:1.0/input/input11
appletouch: incomplete data package (first byte: 0, length: 17).
usbcore: registered new interface driver appletouch
[root@localhost Movies]#
Do the messages happen at the same time as the bad behavior?
No, as far as I can tell, appletouch thinks everything is fine. So I get no interesting messages in dmesg during the misbehavior. Here's a snapshot of synclient running when things are working properly:
[alex@localhost scripts]$ synclient -m 200
   time    x    y   z f  w l r u d m     multi  gl gm gr gdx gdy
  0.000  585  161   0 0  0 0 0 0 0 0  00000000   0  0  0   0   0
  1.601  494  152  84 1  0 0 0 0 0 0  00000000   0  0  0   0   0
  1.801  471  152  81 1  0 0 0 0 0 0  00000000   0  0  0   0   0
  2.001  428  172  86 1  0 0 0 0 0 0  00000000   0  0  0   0   0
  2.201  428  172   0 0  0 0 0 0 0 0  00000000   0  0  0   0   0
Note, in particular, the Z column, which shows 0 for no touch, and ~83 for a normal touch. When the crash occurs (which, oddly, hasn't yet happened since I reported this bug) synclient shows the driver reporting touch values of 250+ (max is 300) for no apparent reason. I'll try to get some synclient output next time it goes haywire. Also, any tips on other sources of debugging information would be greatly appreciated.
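Catching the bad readings does not have to rely on watching the terminal. The following is a minimal sketch, assuming the column layout shown above (time, x, y, z as the first four fields) and using the 250 figure from this comment as the cutoff; the function names are hypothetical and not part of synclient.

```python
import sys

# Cutoff taken from the comment above: misbehaving samples report z of 250+.
Z_THRESHOLD = 250


def parse_sample(line):
    """Return (time, x, y, z) from one `synclient -m` row, or None if the
    line is a header, blank, or otherwise not a numeric sample."""
    fields = line.split()
    if len(fields) < 4:
        return None
    try:
        return float(fields[0]), int(fields[1]), int(fields[2]), int(fields[3])
    except ValueError:
        return None  # e.g. the "time x y z ..." header row


def suspicious_samples(lines, threshold=Z_THRESHOLD):
    """Collect every sample whose z (pressure) is at or above the threshold."""
    found = []
    for line in lines:
        sample = parse_sample(line)
        if sample is not None and sample[3] >= threshold:
            found.append(sample)
    return found


if __name__ == "__main__":
    # Reads synclient output from standard input, e.g. a saved log file.
    for t, x, y, z in suspicious_samples(sys.stdin):
        print(f"{t:.3f}: z={z} at ({x}, {y})")
```

Saving the synclient output to a file and feeding it to the script on standard input would list only the samples worth looking at.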
usbmon is my cure-all for this sort of thing, because it can show precisely what goes on. Unfortunately, usbmon has to be running at the time of the crash, so it produces megabytes of useless data until that moment. I wish there was a way to trip this more often... Maybe some kind of load condition in the system makes two reports stick together. I'm imagining a disk error or some other case that makes the CPU keep interrupts disabled long enough for two mouse events to collide. Mind, this is pure speculation. But if you find a way to make it happen more often, running usbmon becomes reasonable.
I have two ideas on how I might cause the crash more reliably. First, the crash seems to occur after I've left my laptop plugged into the network and running unsupervised for a few hours. (I.e., not interacting with it at all for a while.) The second thing that strikes me is this: The last time the crash happened, I figured out that unloading/reloading appletouch would bring the device back up. It hasn't crashed since, so that makes me wonder if rebooting wouldn't make it unstable again. I'll poke around some more and see if I can't nail down a specific situation where the issue occurs. In the meantime, I can't seem to find any package like "usbmon"... Would you mind pointing me in the right direction on that? It would be nice if I could come back with some detailed logs of what's going on during the issue.
(In reply to comment #4)
> In the mean-time, I can't seem to find any package like "usbmon"... Would you
> mind pointing me in the right direction on that? It would be nice if I could
> come back with some detailed logs of what's going on during the issue.
Install the kernel-doc package and read this file:
/usr/share/doc/kernel-doc-<version>/Documentation/usb/usbmon.txt
I forgot that this is PPC, so usbmon trace is going to be full of "D" records ("D" means that usbmon gives up because of pre-mapped DMA). I have to come up with a better idea.
Hey there. Sorry it's taken so long for me to get back with you on this. I was busy all last week, and it's taken me the better part of two days to get a good snapshot of the crash via usbmon. I don't see the "D" records of which you speak... Is it possibly because of a difference between the internal USB bus and the bus for the external ports? Anyway, I finally got it, so here I attach a very nice snapshot of it dying...
Created attachment 296979 [details] usbmon session where appletouch dies
Okay, so here I start capturing like so:
[root@localhost ~]# mount -t debugfs none_debugs /sys/kernel/debug
[root@localhost ~]# cat /sys/kernel/debug/usbmon/0u > /tmp/appletouch_dies.txt
I swipe the trackpad a few times to demonstrate that everything works, then I leave the trackpad alone for a few minutes. You'll notice a long gap in the microseconds of the records. By microsecond 2445740193, everything's gone to pot. It doesn't respond properly to my touches, and it jitters around the screen even when I'm not touching it. I don't know the protocol the trackpad uses, but it appears to exchange groups of ee643100 records when I let go of the trackpad. After microsecond 2445740193, I don't see any more of those "let go" records, despite the fact that I clearly let go several times. Is it possible that some timing issue is causing the trouble? Since I upgraded to this new kernel, I've been getting these messages at startup and shutdown: "Cannot access the Hardware Clock via any known method. Use the --debug option to see the details of our search for an access method." Anyhoo, please let me know if I can gather any other information.
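The long gap in the timestamps can be located mechanically instead of by scrolling through the capture. This is a minimal sketch, assuming the usbmon text format in which the second whitespace-separated field of each record is a timestamp in microseconds; the function names and the one-second default threshold are illustrative choices, not anything defined by usbmon.

```python
def parse_usbmon_line(line):
    """Split one usbmon text record into its leading fields.

    Returns None for lines that do not look like a record (too few fields
    or a non-numeric timestamp)."""
    fields = line.split()
    if len(fields) < 4:
        return None
    try:
        timestamp_us = int(fields[1])
    except ValueError:
        return None
    return {
        "tag": fields[0],        # URB tag, e.g. efa62d80
        "ts": timestamp_us,      # timestamp in microseconds
        "event": fields[2],      # S (submit), C (callback/completion), E (error)
        "addr": fields[3],       # e.g. Ii:2:003:1
        "rest": fields[4:],      # status, length, data words
    }


def find_gaps(lines, min_gap_us=1_000_000):
    """Return (previous_ts, next_ts, gap_us) for every pair of consecutive
    records separated by at least min_gap_us microseconds."""
    gaps = []
    previous = None
    for line in lines:
        record = parse_usbmon_line(line)
        if record is None:
            continue
        if previous is not None:
            gap = record["ts"] - previous["ts"]
            if gap >= min_gap_us:
                gaps.append((previous["ts"], record["ts"], gap))
        previous = record
    return gaps
```

Passing the open capture file, for example `find_gaps(open("/tmp/appletouch_dies.txt"))`, would list each quiet period along with the timestamps on either side of it, which narrows down where to start reading.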
All right, this is the critical section:
efa62d80 1780028222 S Ii:2:003:1 -115:1 81 <
efa62d80 1780036213 C Ii:2:003:1 0:1 81 = 551f1b16 1a510a2a 005911ed 2f004c11 f227004b 11f33400 5211ef28 003c5113
ee643100 1780036245 S Ci:2:003:0 s a1 01 0300 0000 0008 8 <
ee643100 1780038214 C Ci:2:003:0 0 8 = 00020000 00000000
ee643100 1780038233 S Co:2:003:0 s 21 09 0300 0000 0008 8 = 04020000 00000000
ee643100 1780040212 C Co:2:003:0 0 8 >
efa62d80 1780040227 S Ii:2:003:1 -115:1 81 <
efa62d80 1780044212 C Ii:2:003:1 0:1 81 = 551f1b16 1b510a2b 005a11ee 2f004c11 f227004b 11f33300 5111ee27 003b5113
efa62d80 1780044216 S Ii:2:003:1 -115:1 81 <
efa62d80 2445740193 C Ii:2:003:1 0:1 81 = 55292027 1f511530 005d11f8 34005011 fd2c004f 11fd3900 5411f935 003f511e
efa62d80 2445740221 S Ii:2:003:1 -115:1 81 <
efa62d80 2445748186 C Ii:2:003:1 0:1 81 = 552a203a 1f511530 005d11f8 34004f11 fd2c004f 11fd3800 5411f93c 003f511e
It seems like too much of a coincidence that the input layer decides to issue a1 01 (Get Report) and 21 09 (Set Report) right before everything dies. However, the device delivers one more report before the pause, which looks just fine. I wish I knew what the 04 in the first byte of the report was. This would only be possible to decode if we knew what reports and fields the device has. I wonder if evtest would tell us... I have a copy, but it needs to be built for PPC
http://people.redhat.com/zaitcev/tmp/evtest.c
I'm going to bug Dmitry and ask him to help.
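The setup fields of the two control transfers can be read out one by one. What follows is a sketch under stated assumptions: that usbmon prints the five setup fields after the `s` marker in the order bmRequestType, bRequest, wValue, wIndex, wLength, and that the device follows the USB HID class convention in which the high byte of wValue is the report type and the low byte is the report ID. `decode_setup` and the lookup tables are hypothetical helpers written for this illustration.

```python
# HID class request codes and report types, as defined by the USB HID class
# specification (only the two requests seen in the trace are listed here).
HID_REQUESTS = {0x01: "GET_REPORT", 0x09: "SET_REPORT"}
REPORT_TYPES = {1: "Input", 2: "Output", 3: "Feature"}
REQUEST_KINDS = ("standard", "class", "vendor", "reserved")
RECIPIENTS = {0: "device", 1: "interface", 2: "endpoint", 3: "other"}


def decode_setup(fields):
    """Decode the five hex strings that follow the 's' marker in a usbmon
    control-transfer record."""
    bm_request_type, b_request, w_value, w_index, w_length = (
        int(field, 16) for field in fields
    )
    return {
        # Bit 7 of bmRequestType gives the data direction.
        "direction": "device-to-host" if bm_request_type & 0x80 else "host-to-device",
        # Bits 6..5 give the request kind, bits 4..0 the recipient.
        "kind": REQUEST_KINDS[(bm_request_type >> 5) & 0x3],
        "recipient": RECIPIENTS.get(bm_request_type & 0x1F, "reserved"),
        "request": HID_REQUESTS.get(b_request, hex(b_request)),
        # For HID Get/Set Report: high byte = report type, low byte = report ID.
        "report_type": REPORT_TYPES.get(w_value >> 8, "unknown"),
        "report_id": w_value & 0xFF,
        "interface": w_index,
        "length": w_length,
    }


if __name__ == "__main__":
    for setup in ("a1 01 0300 0000 0008", "21 09 0300 0000 0008"):
        print(setup, "->", decode_setup(setup.split()))
```

Under these assumptions, both transfers are class requests to interface 0 for report type 3 (Feature), report ID 0, with an 8-byte payload: the first reads the report and the second writes it back with a changed first byte. That still does not say what the 04 value means to the device.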
Meanwhile, I grepped through the source a bit. The only way to do the above is through usbhid_submit_report. And the only place which submits both kinds is hiddev. Ergo, an application does this (maybe X). But why? Screensaver? It's a mystery.
Is the old (working) kernel still available? The analysis points squarely at an application (possibly X itself, through evdev) issuing a disruptive ioctl. But this conclusion contradicts the original report, which identifies a kernel regression.
Okay, I managed to find and install some old 2.6.23.9-85.fc8 (known-working) kernel RPMs. (Had to strong-arm rpm to get it to install, but whatever.) I'll be running the old kernel for the next few days to see if using it really eliminates the problem. However, I've already noticed one major difference. Usbmon seems to be streaming LOTS more data to/from the appletouch device in this kernel. With the newer 2.6.24 kernels, usbmon seems to only show data when I'm actually touching the appletouch device. The 2.6.23.9-85.fc8 version shows a non-stop stream of data flowing. I'll post a relevant attachment next...
Created attachment 297703 [details] A few seconds of usbmon output from the 2.6.23 kernel.
I produced this output the same way as before:
[root@localhost ~]# mount -t debugfs none_debugs /sys/kernel/debug
[root@localhost ~]# cat /sys/kernel/debug/usbmon/0u > /tmp/appletouch_usbmon_2.6.23.9-85.fc8.txt
This usbmon capture only represents a few seconds. I started the capture, quickly swiped the trackpad twice, and left it alone for a second, then killed the capture. Looking at the pattern of data, it seems clear to me that the device and device driver are communicating using a markedly different protocol in 2.6.23 than they are in 2.6.24. And to be clear, I have not yet experienced any trackpad misbehavior in this 2.6.23 kernel.
I'm going to poke linux-input for suggestions.
Jiri suggests reverting 2a3e480d4b3392ce8907089094bd074575f9bb2a to see if that helps. He also reminds me that my code inspection was invalid because appletouch is not a part of HID.
(In reply to comment #15)
> Jiri suggests to revert 2a3e480d4b3392ce8907089094bd074575f9bb2a
> and see if that helps.
I'm afraid I don't understand what this means. :-P Sorry if I'm missing something obvious.
> He also reminds that my code inspection was
> invalid because appletouch is not a part of HID.
So this may not be an application-level problem after all?
Apparently not, but your efforts to install the old kernel weren't in vain, because they enabled me to press the regression angle with the upstream. I thought I cc-ed you on it.
I did see a small snapshot of your discussion, although my limited familiarity with kernel stuff prevents me from comprehending everything I read. :-P It's encouraging to know that it might actually be a device driver bug and not some horrible mistake on my part. ;) Unfortunately, I don't know the meaning of this: "[R]evert 2a3e480d4b3392ce8907089094bd074575f9bb2a and see if that helps." What exactly should I do here? Also, I seem to have discovered another interesting piece of information: Once the trackpad starts acting up, if I leave it alone long enough, the trouble can occasionally clear up. (Not frequently, but it's definitely happened more than once now.) Let me know what else I can do to help.
Just installed new kernel-2.6.24.3-34.fc8. Issue is still present.
Alex, no surprise here. I should build you a test kernel with the revert, but it's a bigger pain than just throwing a patch at someone. Once there (assuming Jiri's guess is correct), we'd still need to identify the actual issue, because the patch 2a3e480d4b is pretty long.
Created attachment 298432 [details] Jiri's guess
Ah, so "2a3e480d4b" refers to a specific patch. I'm less confused now. :) I would offer to build and install a test kernel myself, but the ppc boot process scares the willies out of me. I know grub inside and out, but I simply don't know how to install a kernel on this ibook. If you were to link to some instructions and a conveniently-patched kernel tarball, I might be able to build it. (Assuming that's any easier for you than just building an rpm and having me install that.)
http://fedoraproject.org/wiki/Docs/CustomKernel
This message is a reminder that Fedora 8 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 8. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '8'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 8's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 8 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Fedora 8 changed to end-of-life (EOL) status on 2009-01-07. Fedora 8 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.