Description of problem:
When using the touchpad on Dell XPS 13 (9333/Haswell) the cursor jumps to the left bottom corner when being used. This happens about once every 1~5 minutes. In kernel 3.14 this didn't occur. In 3.14 I had to blacklist i2c_hid to get it working.
Version-Release number of selected component (if applicable):
Use touchpad and it jumps to left bottom corner once in a while.
Steps to Reproduce:
Cursor not to jump to the bottom left corner.
➜ ~ xinput list
⎡ Virtual core pointer id=2 [master pointer (3)]
⎜ ↳ Virtual core XTEST pointer id=4 [slave pointer (2)]
⎜ ↳ SYNAPTICS Synaptics Large Touch Screen id=9 [slave pointer (2)]
⎜ ↳ DLL060A:00 06CB:2734 id=11 [slave pointer (2)]
⎣ Virtual core keyboard id=3 [master keyboard (2)]
Benjamin, this seems to be related to the i2c hid vs touchpad work you've been doing recently, can you take a look at this please ?
I tried to play today with a prototype of this touchpad. I was not able to reproduce the bug.
Piotr, I'll need some hid-recordings of your touchpad while the bug is appearing.
Can you install hid-replay from the link , then start hid-recorder as root and start recording your touchpad (DLL060A:00 06CB:2734). Once you got the jump, hit ctrl-C and attach the hid-recorder output to this bug.
Created attachment 921910 [details]
Created attachment 921911 [details]
hid recorder output2
Thanks for looking into it. I've added two hid recordings, if there is anything I can do to help you debug, please let me know.
Thanks for the logs. They indeed show your problem: if you look at the line starting by "E: 6.645366" in the hid-recorder output 1, you will see that the received event contains a lot of "ff ff ff ff ff", and the driver interprets that as a very big jump in the coordinates.
This is obviously not a a big jump in the coordinates, but more likely a partial read of the input register. Apparently, the controller returns a valid read return length, with erroneous data. i2c-hid could normally deal with uncompleted read (if the controller would have said that it was able to only read the first bytes). But in this case, the data are just corrupted.
Hans, do you have any idea if this is valid from the i2c kernel spec? And more specifically, do you think we should prevent that in i2c-hid, or in the specific bus of this platform?
I'll check with Synaptics if they also saw this and if this is related to their firmware.
i2c is an open collector master slave bus, here is what happens at the lowest levels when the master reads:
1) master sends an address (master controls the data line)
2) slave acks (slave pulls the line down / controls the data line)
3) master reads a data byte (slave controls the data line)
4) master sends ack (master pulls the line down / controls the data line)
5) repeat step 3-4 x times
6) master reads a data byte (slave controls the data line)
7) master sends nack (master lets the data line float, so it will go high due to pull-ups)
Note that the master determines how many bytes get read, there is no way for the slave to indicate it has no data. So what is likely happening here is that the touchpad has nothing to report, and just lets the line float (high) which will result in reading only ff ff ff.
Are we using oob signalling for the smbus for this now, or are we polling ? If we are polling then this happening makes perfect sense.
I don't know the exact protocol, but if there are places were one would always expect a 0, it makes sense to check them, and if they are 1 and the entire byte is ff, treat it as a short read (the further at the end of a single packet the bits checked are the better). Or just assume something ending with a sequence of only ff bytes is a short read.
Thinking more about this, and seeing that when this happens we do get some bytes, this makes perfect sense.
Assuming the touchpad has an internal fifo for i2c packets, and normally oob signalling is used to indicate a complete packet is in the fifo, and assuming the producer and consumer both work in packet sized chunks, then when were using polling we may hit a race where there is a partial packet queued up in the fifo (so the i2c slave won't nack at step 2 from comment #7), and if we hit this race then we only get the partial packet / a partial read as the fifo runs dry mid way through our read.
And since the slave cannot indicate it has no more data / signal a short read to the master (the i2c controller) this will show up as some data + all ff ff ff.
I've noticed the following message in my logs:
kernel: hid-rmi 0018:06CB:2734.0002: unknown intr source:d3 drivers/hid/hid-rmi.c:383
I'm not sure if this is related but I get the message a lot, about 80 times today. The message occurs often when I experience the described bug, but not every time...
(In reply to Piotr Popieluch from comment #9)
> I'm not sure if this is related but I get the message a lot, about 80 times
> today. The message occurs often when I experience the described bug, but not
> every time...
This should be related to this bug. Some times, the read fail and you end up receiving 'ff' instead of real data. If the read failed close enough to the begining, then the protocol is busted and these messages may appear.
I've patched xorg-x11-drv-synaptics with this patch
http://lists.freedesktop.org/archives/xorg-devel/2014-June/042790.html . The patch ignores large cursor jumps and makes my touchpad workable.
I know that this doesn't fix the problem but is a workaround.
Should I file a bug/propose a patch on the xorg-x11-drv-synaptics package as long there is no fix for this in the kernel?
see https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1305522 for more info, comment #105
> Should I file a bug/propose a patch on the xorg-x11-drv-synaptics package as
> long there is no fix for this in the kernel?
Hmm, not sure about this one. There has been some discussions upstream regarding this particular patch, and it rested in a status quo. So I am not sure Peter will take it right now.
I will try to have a look at the kernel problem today so we can speed this one up.
Created attachment 930633 [details]
Here is a kernel patch which is likely to fix most of the cases. I am afraid that we may lose some information some times, and thus produce a weird behavior (when too many data is corrupted while it should not).
Piotr, I created a development branch for you to test hid-rmi with the attached patch:
$ git clone https://github.com/bentiss/hid-rmi.git -b f20
$ sudo rmmod hid-rmi ; sync ; sudo insmod ./hid-rmi.ko
If this solve your problem (without the xorg patch you mentioned above), then I'll just have to push this upstream and in Fedora.
Ben, thank you for the patch and excuse me for my late response, I was on holiday and didn't have my laptop with me.
I've reverted the xorg patch and built and installed the module from your git branch this morning. I Haven't experienced the problem since and I think it solves my issue. I will test a couple more days to be sure and come back at you.
Patch sent upstream: https://patchwork.kernel.org/patch/4852381/
I will let the bug know when it has been accepted, so we can ship it in fedora too.
I have not encountered the bug since I applied your patch.
I forgot to mention it. The patch has been accepted and will be shipped in 3.18.
Josh, can you cherry pick https://git.kernel.org/cgit/linux/kernel/git/jikos/hid.git/commit/?h=for-3.18/rmi&id=5b65c2a0296644dd3dbdd590d6f00174d18c96b3 in Fedora 20 and 21?
(In reply to Benjamin Tissoires from comment #18)
> I forgot to mention it. The patch has been accepted and will be shipped in
> Josh, can you cherry pick
> 18/rmi&id=5b65c2a0296644dd3dbdd590d6f00174d18c96b3 in Fedora 20 and 21?
Sure. (I'll include it in rawhide as well.)
OK, added to all branches.
kernel-3.16.4-200.fc20 has been submitted as an update for Fedora 20.
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.16.4-200.fc20'
as soon as you are able to, then reboot.
Please go to the following url:
then log in and leave karma (feedback).
kernel-3.16.4-200.fc20 has been pushed to the Fedora 20 stable repository. If problems still persist, please make note of it in this bug report.