Bug 1123584

Summary: Touchpad on Dell xps 13 9333 jumps to left bottom corner occasionally since 3.15
Product: [Fedora] Fedora Reporter: Piotr Popieluch <piotr1212>
Component: kernelAssignee: Benjamin Tissoires <btissoir>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 20CC: btissoir, gansalmon, hdegoede, itamar, jonathan, kernel-maint, madhu.chinakonda, mchehab, piotr1212
Target Milestone: ---Keywords: Reopened
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: kernel-3.16.4-200.fc20 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-10-11 06:51:47 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
hid-recorder output1
none
hid recorder output2
none
0001-HID-rmi-check-sanity-of-the-incoming-report.patch none

Description Piotr Popieluch 2014-07-26 22:05:08 UTC
Description of problem:
When using the touchpad on Dell XPS 13 (9333/Haswell) the cursor jumps to the left bottom corner when being used. This happens about once every 1~5 minutes. In kernel 3.14 this didn't occur. In 3.14 I had to blacklist i2c_hid to get it working.

Version-Release number of selected component (if applicable):
kernel-3.15.5-200.fc20.x86_64

How reproducible:
Use touchpad and it jumps to left bottom corner once in a while.

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:
Cursor not to jump to the bottom left corner.

Additional info:
➜  ~  xinput list
⎡ Virtual core pointer                    	id=2	[master pointer  (3)]
⎜   ↳ Virtual core XTEST pointer              	id=4	[slave  pointer  (2)]
⎜   ↳ SYNAPTICS Synaptics Large Touch Screen  	id=9	[slave  pointer  (2)]
⎜   ↳ DLL060A:00 06CB:2734                    	id=11	[slave  pointer  (2)]
⎣ Virtual core keyboard                   	id=3	[master keyboard (2)]

Comment 1 Hans de Goede 2014-07-27 09:46:06 UTC
Benjamin, this seems to be related to the i2c hid vs touchpad work you've been doing recently, can you take a look at this please ?

Comment 2 Benjamin Tissoires 2014-07-28 19:07:37 UTC
I tried to play today with a prototype of this touchpad. I was not able to reproduce the bug.

Piotr, I'll need some hid-recordings[1] of your touchpad while the bug is appearing.

Can you install hid-replay from the link [1], then start hid-recorder as root and start recording your touchpad (DLL060A:00 06CB:2734). Once you got the jump, hit ctrl-C and attach the hid-recorder output to this bug.

[1] http://bentiss.github.io/hid-replay-docs/Fedora.html

Comment 3 Piotr Popieluch 2014-07-28 19:37:05 UTC
Created attachment 921910 [details]
hid-recorder output1

Comment 4 Piotr Popieluch 2014-07-28 19:37:42 UTC
Created attachment 921911 [details]
hid recorder output2

Comment 5 Piotr Popieluch 2014-07-28 19:39:42 UTC
Hi Benjamin,
Thanks for looking into it. I've added two hid recordings, if there is anything I can do to help you debug, please let me know.

Comment 6 Benjamin Tissoires 2014-07-28 21:18:29 UTC
Thanks for the logs. They indeed show your problem: if you look at the line starting by "E: 6.645366" in the hid-recorder output 1, you will see that the received event contains a lot of "ff ff ff ff ff", and the driver interprets that as a very big jump in the coordinates.

This is obviously not a a big jump in the coordinates, but more likely a partial read of the input register. Apparently, the controller returns a valid read return length, with erroneous data. i2c-hid could normally deal with uncompleted read (if the controller would have said that it was able to only read the first bytes). But in this case, the data are just corrupted.

Hans, do you have any idea if this is valid from the i2c kernel spec? And more specifically, do you think we should prevent that in i2c-hid, or in the specific bus of this platform?

I'll check with Synaptics if they also saw this and if this is related to their firmware.

Comment 7 Hans de Goede 2014-07-28 21:35:34 UTC
i2c is an open collector master slave bus, here is what happens at the lowest levels when the master reads:

1) master sends an address (master controls the data line)
2) slave acks (slave pulls the line down / controls the data line)
3) master reads a data byte (slave controls the data line)
4) master sends ack (master pulls the line down / controls the data line)
5) repeat step 3-4 x times
6) master reads a data byte (slave controls the data line)
7) master sends nack (master lets the data line float, so it will go high due to pull-ups)

Note that the master determines how many bytes get read, there is no way for the slave to indicate it has no data. So what is likely happening here is that the touchpad has nothing to report, and just lets the line float (high) which will result in reading only ff ff ff.

Are we using oob signalling for the smbus for this now, or are we polling ? If we are polling then this happening makes perfect sense.

I don't know the exact protocol, but if there are places were one would always expect a 0, it makes sense to check them, and if they are 1 and the entire byte is ff, treat it as a short read (the further at the end of a single packet the bits checked are the better). Or just assume something ending with a sequence of only ff bytes is a short read.

Comment 8 Hans de Goede 2014-07-28 21:48:27 UTC
Thinking more about this, and seeing that when this happens we do get some bytes, this makes perfect sense.

Assuming the touchpad has an internal fifo for i2c packets, and normally oob signalling is used to indicate a complete packet is in the fifo, and assuming the producer and consumer both work in packet sized chunks, then when were using polling we may hit a race where there is a partial packet queued up in the fifo (so the i2c slave won't nack at step 2 from comment #7), and if we hit this race then we only get the partial packet / a partial read as the fifo runs dry mid way through our read.

And since the slave cannot indicate it has no more data / signal a short read to the master (the i2c controller) this will show up as some data + all ff ff ff.

Comment 9 Piotr Popieluch 2014-08-03 17:09:45 UTC
I've noticed the following message in my logs:

kernel: hid-rmi 0018:06CB:2734.0002: unknown intr source:d3 drivers/hid/hid-rmi.c:383

I'm not sure if this is related but I get the message a lot, about 80 times today. The message occurs often when I experience the described bug, but not every time...

Comment 10 Benjamin Tissoires 2014-08-05 14:27:41 UTC
(In reply to Piotr Popieluch from comment #9)
> I'm not sure if this is related but I get the message a lot, about 80 times
> today. The message occurs often when I experience the described bug, but not
> every time...

This should be related to this bug. Some times, the read fail and you end up receiving 'ff' instead of real data. If the read failed close enough to the begining, then the protocol is busted and these messages may appear.

Comment 11 Piotr Popieluch 2014-08-23 22:57:53 UTC
I've patched xorg-x11-drv-synaptics with this patch 
http://lists.freedesktop.org/archives/xorg-devel/2014-June/042790.html . The patch ignores large cursor jumps and makes my touchpad workable. 

I know that this doesn't fix the problem but is a workaround. 

Should I file a bug/propose a patch on the xorg-x11-drv-synaptics package as long there is no fix for this in the kernel?

see https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1305522 for more info,   comment #105

Comment 12 Benjamin Tissoires 2014-08-25 14:32:14 UTC
> Should I file a bug/propose a patch on the xorg-x11-drv-synaptics package as
> long there is no fix for this in the kernel?

Hmm, not sure about this one. There has been some discussions upstream regarding this particular patch, and it rested in a status quo. So I am not sure Peter will take it right now.

I will try to have a look at the kernel problem today so we can speed this one up.

Comment 13 Benjamin Tissoires 2014-08-25 20:24:06 UTC
Created attachment 930633 [details]
0001-HID-rmi-check-sanity-of-the-incoming-report.patch

Here is a kernel patch which is likely to fix most of the cases. I am afraid that we may lose some information some times, and thus produce a weird behavior (when too many data is corrupted while it should not).

Comment 14 Benjamin Tissoires 2014-08-25 20:42:54 UTC
Piotr, I created a development branch for you to test hid-rmi with the attached patch:

$ git clone https://github.com/bentiss/hid-rmi.git -b f20
$ make
$ sudo rmmod hid-rmi ; sync ; sudo insmod ./hid-rmi.ko

If this solve your problem (without the xorg patch you mentioned above), then I'll just have to push this upstream and in Fedora.

Comment 15 Piotr Popieluch 2014-09-01 15:12:26 UTC
Ben, thank you for the patch and excuse me for my late response, I was on holiday and didn't have my laptop with me. 

I've reverted the xorg patch and built and installed the module from your git branch this morning. I Haven't experienced the problem since and I think it solves my issue. I will test a couple more days to be sure and come back at you.

Comment 16 Benjamin Tissoires 2014-09-05 13:58:55 UTC
Patch sent upstream: https://patchwork.kernel.org/patch/4852381/

I will let the bug know when it has been accepted, so we can ship it in fedora too.

Comment 17 Piotr Popieluch 2014-09-08 07:17:56 UTC
I have not encountered the bug since I applied your patch.

Comment 18 Benjamin Tissoires 2014-09-22 19:25:33 UTC
I forgot to mention it. The patch has been accepted and will be shipped in 3.18.

Josh, can you cherry pick https://git.kernel.org/cgit/linux/kernel/git/jikos/hid.git/commit/?h=for-3.18/rmi&id=5b65c2a0296644dd3dbdd590d6f00174d18c96b3 in Fedora 20 and 21?

Thanks!

Comment 19 Josh Boyer 2014-09-23 13:15:36 UTC
(In reply to Benjamin Tissoires from comment #18)
> I forgot to mention it. The patch has been accepted and will be shipped in
> 3.18.
> 
> Josh, can you cherry pick
> https://git.kernel.org/cgit/linux/kernel/git/jikos/hid.git/commit/?h=for-3.
> 18/rmi&id=5b65c2a0296644dd3dbdd590d6f00174d18c96b3 in Fedora 20 and 21?
> 
> Thanks!

Sure.  (I'll include it in rawhide as well.)

Comment 20 Josh Boyer 2014-09-23 14:45:36 UTC
OK, added to all branches.

Comment 21 Fedora Update System 2014-10-07 02:50:10 UTC
kernel-3.16.4-200.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/kernel-3.16.4-200.fc20

Comment 22 Fedora Update System 2014-10-08 19:04:37 UTC
Package kernel-3.16.4-200.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.16.4-200.fc20'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-12366/kernel-3.16.4-200.fc20
then log in and leave karma (feedback).

Comment 23 Fedora Update System 2014-10-11 06:51:47 UTC
kernel-3.16.4-200.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.