435419 – appletouch occasionally crashes, imagines touch that isn't there (regression)


Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	8
Hardware:	powerpc
OS:	Linux
Priority:	low
Severity:	low
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2008-02-29 07:45 UTC by Alex Markley
Modified:	2009-01-09 06:05 UTC
CC List:	0 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2009-01-09 06:05:11 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
usbmon session where appletouch dies (842.59 KB, text/plain) 2008-03-06 02:12 UTC, Alex Markley
A few seconds of usbmon output from the 2.6.23 kernel. (76.39 KB, text/plain) 2008-03-12 04:27 UTC, Alex Markley
Jiri's guess (8.71 KB, patch) 2008-03-18 18:13 UTC, Pete Zaitcev

Description Alex Markley 2008-02-29 07:45:17 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux ppc; en-US; rv:1.8.1.12) Gecko/20080208 Fedora/2.0.0.12-1.fc8 Firefox/2.0.0.12

Description of problem:
Ever since I upgraded to this new version of the kernel (which I was told to do to solve another problem) my iBook G4 12" trackpad has been frequently crashing.

Specifically, everything starts up fine and I can use the trackpad normally, but after some hours of use, the appletouch module begins reporting heavy touches all over the pad, disrupting normal use.

Unloading/reloading the appletouch module appears to reset the device and (tentatively) solve the problem.

Previous (2.6.23) kernels did not have this problem, and my trackpad worked fine.

Version-Release number of selected component (if applicable):
kernel-2.6.24.2-10.fc8

How reproducible:
Always


Steps to Reproduce:
1. use trackpad
2. ???
3. madness!

Actual Results:
Trackpad flips out.

Expected Results:
Trackpad probably shouldn't flip out.

Additional info:
[root@localhost Movies]# dmesg | grep -i touch
appletouch: Geyser mode initialized.
input: appletouch as /devices/pci0001:10/0001:10:1a.0/usb2/2-2/2-2:1.0/input/input6
usbcore: registered new interface driver appletouch
appletouch: incomplete data package (first byte: 0, length: 17).
input: appletouch disconnected
appletouch: Geyser mode initialized.
input: appletouch as /devices/pci0001:10/0001:10:1a.0/usb2/2-2/2-2:1.0/input/input8
appletouch: incomplete data package (first byte: 0, length: 17).
usbcore: deregistering interface driver appletouch
input: appletouch disconnected
appletouch: Geyser mode initialized.
input: appletouch as /devices/pci0001:10/0001:10:1a.0/usb2/2-2/2-2:1.0/input/input11
appletouch: incomplete data package (first byte: 0, length: 17).
usbcore: registered new interface driver appletouch
[root@localhost Movies]#

Comment 1 Chuck Ebbert 2008-02-29 21:22:58 UTC

Do the messages happen at the same time as the bad behavior?

Comment 2 Alex Markley 2008-02-29 21:31:46 UTC

No, as far as I can tell, appletouch thinks everything is fine. So I get no
interesting messages in dmesg during the misbehavior.

Here's a snapshot of synclient running when things are working properly:

[alex@localhost scripts]$ synclient -m 200
    time     x    y   z f  w  l r u d m     multi  gl gm gr gdx gdy
   0.000   585  161   0 0  0  0 0 0 0 0  00000000   0  0  0   0   0
   1.601   494  152  84 1  0  0 0 0 0 0  00000000   0  0  0   0   0
   1.801   471  152  81 1  0  0 0 0 0 0  00000000   0  0  0   0   0
   2.001   428  172  86 1  0  0 0 0 0 0  00000000   0  0  0   0   0
   2.201   428  172   0 0  0  0 0 0 0 0  00000000   0  0  0   0   0

Note, in particular, the Z column, which shows 0 for no touch, and ~83 for a
normal touch.

When the crash occurs (which, oddly, hasn't yet happened since I reported this
bug) synclient shows the driver reporting touch values of 250+ (max is 300) for
no apparent reason.
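
A minimal sketch of how such a log could be checked automatically, assuming the column layout shown above and treating z as the pressure reading, as this comment does. The function names and the default threshold of 250 are illustrative choices, not part of synclient.

```python
def parse_synclient(text):
    """Yield (time, x, y, z, fingers) for each data row of `synclient -m` output."""
    for line in text.splitlines():
        fields = line.split()
        # Skip the header row, shell prompts, and anything too short to be data.
        if len(fields) < 5 or fields[0] == "time":
            continue
        try:
            t = float(fields[0])
            x, y, z, f = (int(v) for v in fields[1:5])
        except ValueError:
            continue
        yield t, x, y, z, f


def suspicious_samples(text, z_threshold=250):
    """Return the rows whose z value is at or above z_threshold."""
    return [row for row in parse_synclient(text) if row[3] >= z_threshold]


# The healthy snapshot quoted in this comment.
SAMPLE = """\
    time     x    y   z f  w  l r u d m     multi  gl gm gr gdx gdy
   0.000   585  161   0 0  0  0 0 0 0 0  00000000   0  0  0   0   0
   1.601   494  152  84 1  0  0 0 0 0 0  00000000   0  0  0   0   0
   1.801   471  152  81 1  0  0 0 0 0 0  00000000   0  0  0   0   0
   2.001   428  172  86 1  0  0 0 0 0 0  00000000   0  0  0   0   0
   2.201   428  172   0 0  0  0 0 0 0 0  00000000   0  0  0   0   0
"""

print(suspicious_samples(SAMPLE))  # no row in the healthy snapshot reaches 250
```

Feeding it a log captured during the misbehavior should list the samples where the reported pressure jumps to 250 or more.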

I'll try to get some synclient output next time it goes haywire. Also, any tips
on other sources of debugging information would be greatly appreciated.

Comment 3 Pete Zaitcev 2008-02-29 21:50:37 UTC

The usbmon is my cure-all for this sort of thing, because it can show
precisely what goes on. Unfortunately, usbmon has to be running at the
time of the crash, so it produces megabytes of useless data until that
moment. I wish there was a way to trip this more often... Maybe some kind
of load condition in the system makes two reports stick together.
I'm imagining a disk error or some other case that makes the CPU disable
interrupts long enough for two mouse events to collide. Mind, this is pure
speculation. But if you find a way to make it happen more often, running
usbmon becomes reasonable.

Comment 4 Alex Markley 2008-02-29 23:21:19 UTC

I have two ideas on how I might cause the crash more reliably. First, the crash
seems to occur after I've left my laptop plugged into the network and running
unsupervised for a few hours. (I.e., not interacting with it at all for a while.)

The second thing that strikes me is this: The last time the crash happened, I
figured out that unloading/reloading appletouch would bring the device back up.
It hasn't crashed since, so that makes me wonder if rebooting wouldn't make it
unstable again.

I'll poke around some more and see if I can't nail down a specific situation
where the issue occurs.

In the mean-time, I can't seem to find any package like "usbmon"... Would you
mind pointing me in the right direction on that? It would be nice if I could
come back with some detailed logs of what's going on during the issue.

Comment 5 Chuck Ebbert 2008-02-29 23:26:54 UTC

(In reply to comment #4)
> In the mean-time, I can't seem to find any package like "usbmon"... Would you
> mind pointing me in the right direction on that? It would be nice if I could
> come back with some detailed logs of what's going on during the issue.

Install the kernel-doc package and read this file:

/usr/share/doc/kernel-doc-<version>/Documentation/usb/usbmon.txt

Comment 6 Pete Zaitcev 2008-03-06 00:32:38 UTC

I forgot that this is PPC, so usbmon trace is going to be full of "D"
records ("D" means that usbmon gives up because of pre-mapped DMA).
I have to come up with a better idea.

Comment 7 Alex Markley 2008-03-06 01:55:28 UTC

Hey there. Sorry it's taken so long for me to get back with you on this. I was
busy all last week, and it's taken me the better part of two days to get a good
snapshot of the crash via usbmon.

I don't see the "D" records of which you speak... Is it possibly because of a
difference between the internal USB bus and the bus for the external ports?

Anyway, I finally got it, so here I attach a very nice snapshot of it dying...

Comment 8 Alex Markley 2008-03-06 02:12:08 UTC

Created attachment 296979
usbmon session where appletouch dies

Okay, so here I start capturing like so:

[root@localhost ~]# mount -t debugfs none_debugs /sys/kernel/debug
[root@localhost ~]# cat /sys/kernel/debug/usbmon/0u > /tmp/appletouch_dies.txt

I swipe the trackpad a few times to demonstrate that everything works, then I
leave the trackpad alone for a few minutes.

You'll notice a long gap in the microseconds of the records. By microsecond
2445740193, everything's gone to pot. It doesn't respond properly to my
touches, and it jitters around the screen even when I'm not touching it.

I don't know the protocol the trackpad uses, but it appears to exchange groups
of ee643100 records when I let go of the trackpad. After microsecond
2445740193, I don't see any more of those "let go" records, despite the fact
that I clearly let go several times.
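
The two observations above (the long gap, and the last appearance of the ee643100 records) can be pulled out of a capture mechanically. The following is only a sketch: it assumes the usbmon text layout of URB tag, timestamp in microseconds, event type, and address word at the start of each record, it ignores timestamp wraparound, and the helper names are hypothetical. The sample records are copied from the excerpt quoted in comment 9.

```python
def parse_usbmon(lines):
    """Return (tag, timestamp_us, event, address) tuples from usbmon text lines."""
    records = []
    for line in lines:
        fields = line.split()
        # Continuation lines holding only data words fail these checks and are skipped.
        if len(fields) < 4 or fields[2] not in ("S", "C", "E"):
            continue
        if not fields[1].isdigit():
            continue
        records.append((fields[0], int(fields[1]), fields[2], fields[3]))
    return records


def largest_gap(records):
    """Return (gap_us, before_ts, after_ts) for the biggest jump between records."""
    best = (0, None, None)
    prev = None
    for _tag, ts, _event, _addr in records:
        if prev is not None and ts - prev > best[0]:
            best = (ts - prev, prev, ts)
        prev = ts
    return best


def last_timestamp(records, tag):
    """Return the timestamp of the last record carrying the given URB tag, or None."""
    stamps = [ts for t, ts, _event, _addr in records if t == tag]
    return stamps[-1] if stamps else None


EXCERPT = """\
ee643100 1780038233 S Co:2:003:0 s 21 09 0300 0000 0008 8 = 04020000 00000000
ee643100 1780040212 C Co:2:003:0 0 8 >
efa62d80 1780040227 S Ii:2:003:1 -115:1 81 <
efa62d80 1780044212 C Ii:2:003:1 0:1 81 = 551f1b16 1b510a2b 005a11ee 2f004c11
f227004b 11f33300 5111ee27 003b5113
efa62d80 1780044216 S Ii:2:003:1 -115:1 81 <
efa62d80 2445740193 C Ii:2:003:1 0:1 81 = 55292027 1f511530 005d11f8 34005011
fd2c004f 11fd3900 5411f935 003f511e
"""

records = parse_usbmon(EXCERPT.splitlines())
gap_us, before, after = largest_gap(records)
print(gap_us, before, after)                  # gap of 665695977 us, about 11 minutes
print(last_timestamp(records, "ee643100"))    # last ee643100 record in this excerpt
```

Run against the full attachment instead of the excerpt, the same helpers would report where the capture goes quiet and when the last ee643100 exchange occurred.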

Is it possible that some timing issue is causing the trouble? Since I upgraded
to this new kernel, I've been getting these messages at startup and shutdown:
"Cannot access the Hardware Clock via any known method. Use the --debug option
to see the details of our search for an access method."

Anyhoo, please let me know if I can gather any other information.

Comment 9 Pete Zaitcev 2008-03-08 02:34:32 UTC

All right, this is the critical section:

efa62d80 1780028222 S Ii:2:003:1 -115:1 81 <
efa62d80 1780036213 C Ii:2:003:1 0:1 81 = 551f1b16 1a510a2a 005911ed 2f004c11 f227004b 11f33400 5211ef28 003c5113
ee643100 1780036245 S Ci:2:003:0 s a1 01 0300 0000 0008 8 <
ee643100 1780038214 C Ci:2:003:0 0 8 = 00020000 00000000
ee643100 1780038233 S Co:2:003:0 s 21 09 0300 0000 0008 8 = 04020000 00000000
ee643100 1780040212 C Co:2:003:0 0 8 >
efa62d80 1780040227 S Ii:2:003:1 -115:1 81 <
efa62d80 1780044212 C Ii:2:003:1 0:1 81 = 551f1b16 1b510a2b 005a11ee 2f004c11 f227004b 11f33300 5111ee27 003b5113
efa62d80 1780044216 S Ii:2:003:1 -115:1 81 <
efa62d80 2445740193 C Ii:2:003:1 0:1 81 = 55292027 1f511530 005d11f8 34005011 fd2c004f 11fd3900 5411f935 003f511e
efa62d80 2445740221 S Ii:2:003:1 -115:1 81 <
efa62d80 2445748186 C Ii:2:003:1 0:1 81 = 552a203a 1f511530 005d11f8 34004f11 fd2c004f 11fd3800 5411f93c 003f511e

It seems like too much of a coincidence that the input layer decides
to issue a1 01 (Get Report) and 21 09 (Set Report) right before
everything dies. However, the device delivers one more report
before the pause, which looks just fine.
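
For reference, the five words usbmon prints after "s" are the fields of the USB setup packet (bmRequestType, bRequest, wValue, wIndex, wLength). The sketch below decodes them under the reading used in this comment, namely that they are HID class requests; that reading is an assumption, and the request and report-type labels only apply if it holds. The function name and tables are illustrative.

```python
REQUEST_TYPES = {0: "standard", 1: "class", 2: "vendor", 3: "reserved"}
RECIPIENTS = {0: "device", 1: "interface", 2: "endpoint", 3: "other"}
# HID class request codes and report types (only valid under the HID assumption).
HID_REQUESTS = {0x01: "GET_REPORT", 0x09: "SET_REPORT"}
REPORT_TYPES = {1: "input", 2: "output", 3: "feature"}


def decode_setup(words):
    """Decode the five hexadecimal setup words from a usbmon control record."""
    bm_request_type, b_request, w_value, w_index, w_length = (
        int(word, 16) for word in words.split()
    )
    return {
        # Bit 7 of bmRequestType is the transfer direction.
        "direction": "device-to-host" if bm_request_type & 0x80 else "host-to-device",
        # Bits 6..5 are the request type, bits 4..0 the recipient.
        "type": REQUEST_TYPES[(bm_request_type >> 5) & 0x3],
        "recipient": RECIPIENTS.get(bm_request_type & 0x1F, "reserved"),
        "request": HID_REQUESTS.get(b_request, hex(b_request)),
        # For HID report requests, wValue is (report type << 8) | report ID.
        "report_type": REPORT_TYPES.get(w_value >> 8, "unknown"),
        "report_id": w_value & 0xFF,
        "index": w_index,
        "length": w_length,
    }


print(decode_setup("a1 01 0300 0000 0008"))
print(decode_setup("21 09 0300 0000 0008"))
```

Under that assumption both transfers address an 8-byte feature report with ID 0 on interface 0: the first reads it and the second writes it back with 04 in the first byte.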

I wish I knew what the 04 in the first byte of the report was.
This would only be possible to decode if we knew what reports and
fields the device has. I wonder if evtest would tell us...
I have a copy, but it needs to be built for PPC
 http://people.redhat.com/zaitcev/tmp/evtest.c

I'm going to bug Dmitry and ask him to help.

Comment 10 Pete Zaitcev 2008-03-08 08:26:01 UTC

Meanwhile, I grepped through the source a bit. The only way to do the
above is through usbhid_submit_report. And the only place which submits
both kinds is hiddev. Ergo, an application does this (maybe X). But why?
Screensaver? It's a mystery.

Comment 11 Pete Zaitcev 2008-03-11 01:47:48 UTC

Is the old (working) kernel still available? The analysis points
squarely at an application (possibly X itself, through evdev)
issuing a disruptive ioctl. But this conclusion contradicts the
original report which identifies a kernel regression.

Comment 12 Alex Markley 2008-03-12 04:22:49 UTC

Okay, I managed to find and install some old 2.6.23.9-85.fc8 (known-working)
kernel RPMs. (Had to strong-arm rpm to get it to install, but whatever.)

I'll be running the old kernel for the next few days to see if using it really
eliminates the problem.

However, I've already noticed one major difference. Usbmon seems to be streaming
LOTS more data to/from the appletouch device in this kernel. With the newer
2.6.24 kernels, usbmon seems to only show data when I'm actually touching the
appletouch device. The 2.6.23.9-85.fc8 version shows a non-stop stream of data
flowing.
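
One way to put a number on "lots more data" would be to count completed transfers per second on the trackpad's interrupt endpoint in each capture and compare the two kernels. This is a rough sketch with a hypothetical helper name; it assumes the usbmon text layout of URB tag, microsecond timestamp, event type, and address word, and the sample records are the four completions from the 2.6.24 excerpt in comment 9.

```python
from collections import Counter


def completions_per_second(lines, address):
    """Count usbmon callback ('C') records for one endpoint per whole second."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Record layout: URB tag, timestamp in microseconds, event type, address.
        if len(fields) < 4 or not fields[1].isdigit():
            continue
        if fields[2] == "C" and fields[3] == address:
            counts[int(fields[1]) // 1_000_000] += 1
    return dict(counts)


CAPTURE = [
    "efa62d80 1780036213 C Ii:2:003:1 0:1 81 = 551f1b16 1a510a2a 005911ed 2f004c11 f227004b 11f33400 5211ef28 003c5113",
    "efa62d80 1780044212 C Ii:2:003:1 0:1 81 = 551f1b16 1b510a2b 005a11ee 2f004c11 f227004b 11f33300 5111ee27 003b5113",
    "efa62d80 2445740193 C Ii:2:003:1 0:1 81 = 55292027 1f511530 005d11f8 34005011 fd2c004f 11fd3900 5411f935 003f511e",
    "efa62d80 2445748186 C Ii:2:003:1 0:1 81 = 552a203a 1f511530 005d11f8 34004f11 fd2c004f 11fd3800 5411f93c 003f511e",
]

for second, count in sorted(completions_per_second(CAPTURE, "Ii:2:003:1").items()):
    print(second, count)
```

Applied to a full capture from each kernel, a steady per-second count while the pad is idle would correspond to the non-stop stream seen on 2.6.23, and empty seconds to the quiet behavior seen on 2.6.24.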

I'll post a relevant attachment next...

Comment 13 Alex Markley 2008-03-12 04:27:34 UTC

Created attachment 297703
A few seconds of usbmon output from the 2.6.23 kernel.

I produced this output the same way as before:

[root@localhost ~]# mount -t debugfs none_debugs /sys/kernel/debug
[root@localhost ~]# cat /sys/kernel/debug/usbmon/0u > /tmp/appletouch_usbmon_2.6.23.9-85.fc8.txt

This usbmon capture only represents a few seconds. I started the capture,
quickly swiped the trackpad twice, and left it alone for a second, then killed
the capture.

Looking at the pattern of data, it seems clear to me that the device and device
driver are communicating using a markedly different protocol in 2.6.23 than
they are in 2.6.24.

And to be clear, I have not yet experienced any trackpad misbehavior in this
2.6.23 kernel.

Comment 14 Pete Zaitcev 2008-03-12 05:27:24 UTC

I'm going to poke linux-input for suggestions.

Comment 15 Pete Zaitcev 2008-03-13 00:02:46 UTC

Jiri suggests that we revert 2a3e480d4b3392ce8907089094bd074575f9bb2a
and see if that helps. He also reminds me that my code inspection was
invalid because appletouch is not a part of HID.

Comment 16 Alex Markley 2008-03-13 01:13:47 UTC

(In reply to comment #15)
> Jiri suggests that we revert 2a3e480d4b3392ce8907089094bd074575f9bb2a
> and see if that helps.

I'm afraid I don't understand what this means. :-P Sorry if I'm missing
something obvious.

> He also reminds me that my code inspection was
> invalid because appletouch is not a part of HID.

So this may not be an application-level problem after all?

Comment 17 Pete Zaitcev 2008-03-13 02:45:59 UTC

Apparently not, but your efforts to install the old kernel weren't
in vain, because they enabled me to press the regression angle with
the upstream. I thought I cc-ed you on it.

Comment 18 Alex Markley 2008-03-13 16:14:08 UTC

I did see a small snapshot of your discussion, although my limited familiarity
with kernel stuff prevents me from comprehending everything I read. :-P It's
encouraging to know that it might actually be a device driver bug and not some
horrible mistake on my part. ;)

Unfortunately, I don't know the meaning of this: "[R]evert
2a3e480d4b3392ce8907089094bd074575f9bb2a and see if that helps." What exactly
should I do here?

Also, I seem to have discovered another interesting piece of information: Once
the trackpad starts acting up, if I leave it alone long enough, the trouble can
occasionally clear up. (Not frequently, but it's definitely happened more than
once now.)

Let me know what else I can do to help.

Comment 19 Alex Markley 2008-03-18 16:25:10 UTC

Just installed new kernel-2.6.24.3-34.fc8. Issue is still present.

Comment 20 Pete Zaitcev 2008-03-18 18:13:07 UTC

Alex, no surprise here. I should build you a test kernel with the revert,
but it's a bigger pain than just throwing a patch at someone. Once there
(assuming Jiri's guess is correct), we'd need to identify the actual
issue; the patch 2a3e480d4b is pretty long.

Comment 21 Pete Zaitcev 2008-03-18 18:13:55 UTC

Created attachment 298432
Jiri's guess

Comment 22 Alex Markley 2008-03-18 19:41:14 UTC

Ah, so "2a3e480d4b" refers to a specific patch. I'm less confused now. :)

I would offer to build and install a test kernel myself, but the ppc boot
process scares the willies out of me. I know grub inside and out, but I simply
don't know how to install a kernel on this iBook.

If you were to link to some instructions and a conveniently-patched kernel
tarball, I might be able to build it. (Assuming that's any easier for you than
just building an rpm and having me install that.)

Comment 23 Chuck Ebbert 2008-03-19 14:01:33 UTC

http://fedoraproject.org/wiki/Docs/CustomKernel

Comment 24 Bug Zapper 2008-11-26 09:58:37 UTC

This message is a reminder that Fedora 8 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 8.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '8'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 8's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 8 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 25 Bug Zapper 2009-01-09 06:05:11 UTC

Fedora 8 changed to end-of-life (EOL) status on 2009-01-07. Fedora 8 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.
