Bug 294011 (xen-mouse-stuck-left)

Summary: Mouse pointer gets stuck on the left side of display, under Xen
Product: [Fedora] Fedora
Reporter: Eduardo Habkost <ehabkost>
Component: kernel-xen-2.6
Assignee: Eduardo Habkost <ehabkost>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low
Version: rawhide
CC: airlied, atkac, lucien, mcepl, mykel, rjones, xen-maint, xgl-maint
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Fixed In Version: 2.6-2.6.20-2936.fc7
Doc Type: Bug Fix
Last Closed: 2007-11-12 14:17:25 UTC
Attachments:
- Test program to reproduce the bug (flags: none)

Description Eduardo Habkost 2007-09-17 21:23:00 UTC
Description of problem:

Both the bug and the way to reproduce it are really bizarre, but I can 
reproduce it easily on my machine, and I have had another report of the 
problem. I considered the possibility of this being a kernel bug, but I 
concluded that the X server is not behaving properly, as you can see below.

This is reproducible on both Rawhide and Fedora 7. I managed to reproduce it 
only on i386.

It was reported on this thread on the fedora-xen mailing list:
https://www.redhat.com/archives/fedora-xen/2007-September/msg00049.html


Steps to reproduce:

It is bizarre. Be prepared.  :)

1. Install kernel-xen-2.6.21-2940.fc8xen
2. Boot the Xen kernel
3. Log in on gdm to Gnome
4. Create a dummy cd-rom image (just to make virt-install happy):
   - dd if=/dev/zero bs=1M count=10 of=/tmp/img
5. Start a full-virtualized guest:
   - virt-install -v -c /tmp/img --nodisks -n myguest -r 500 --vnc
6. Quickly, before the guest console (vncviewer) is shown, move the mouse 
vigorously. I move it quickly up and down (fast, large movements), but large, 
fast circular movements also seem to work. Keep moving the mouse until the 
guest console is shown.
7. Approximately half of the time, the cursor will get stuck on the left side 
of the screen less than one second after the console viewer is shown. When 
stuck, the cursor will move to the right only if you move the mouse very 
slowly.
8. If the cursor doesn't get stuck:
   8.1. destroy the guest:
     - Close the guest console viewer
     - xm destroy myguest
     - xm delete myguest
   8.2. Go back to step 5


Actual results:
Cursor gets stuck on the left side of the screen.


Expected results:
Cursor doesn't get stuck on the side of the screen.  :)




Additional info:

- Restarting the X server makes the problem go away. However, if you move 
the mouse heavily while the X server and gdm are restarting, the cursor may 
get stuck again, on the gdm screen. This happens only while the Xen guest is 
still running: I haven't managed to make the cursor get stuck on X server 
restart once the Xen guest has been destroyed.


Things that make me believe this is a Xorg bug:

- When the cursor gets stuck, starting gpm on the text console 
(on /dev/input/mice) works as expected, so the data coming 
from /dev/input/mice is correct.
- Changing the mouse acceleration threshold to a huge value (xset m 1/1 1000) 
makes the cursor behave normally. Enabling cursor acceleration again (xset m 
2/1 4) makes the cursor get stuck on the left again. Changing the threshold to 
a large (but usable) value makes the cursor go to the left of the screen as 
soon as a movement to the right exceeds the acceleration threshold.



Version-Release number of selected component (if applicable):
xorg-x11-server-Xorg-1.3.0.0-23.fc8

Comment 1 Matěj Cepl 2007-09-18 11:50:37 UTC
Just to be sure, please attach /etc/X11/xorg.conf and /var/log/Xorg.0.log to
this bug report as separate uncompressed attachments.

However, I am afraid this is more of a VNC problem.

Comment 2 Adam Tkac 2007-09-18 13:02:14 UTC
*** Bug 242737 has been marked as a duplicate of this bug. ***

Comment 3 Matěj Cepl 2007-09-18 13:18:00 UTC
After consultation with RH people, it seems to be more of a libvirt problem.
Guys, do you have any idea what's going on here (and in bug 242737, which is
from the vnc maintainer)?

Comment 4 Eduardo Habkost 2007-09-18 13:27:06 UTC
I got the problem to reproduce under gdb. I added a breakpoint in the mouse 
acceleration code, at the point where the movement is above the configured 
threshold:

(gdb) c
Continuing.

Breakpoint 6, xf86PostMotionEvent (device=0x84daec8, is_absolute=0, first_valuator=0, num_valuators=2) at xf86Xinput.c:941
941                                 local->dxremaind = ((float)dx * (float)(device->ptrfeed->ctrl.num)) /
(gdb) list
936                      */
937                     if (device->ptrfeed && device->ptrfeed->ctrl.num) {
938                         /* modeled from xf86Events.c */
939                         if (device->ptrfeed->ctrl.threshold) {
940                             if ((abs(dx) + abs(dy)) >= device->ptrfeed->ctrl.threshold) {
941                                 local->dxremaind = ((float)dx * (float)(device->ptrfeed->ctrl.num)) /
942                                     (float)(device->ptrfeed->ctrl.den) + local->dxremaind;
943                                 valuator[0] = (int)local->dxremaind;
944                                 local->dxremaind = local->dxremaind - (float)valuator[0];
945
(gdb) p local->dxremaind
$10 = -nan(0x400000)
(gdb) p local->dyremaind
$11 = 0


Somehow local->dxremaind becomes -NaN.


I added a watchpoint on local->dxremaind and caught the moment the NaN value 
was set:

(gdb) c
Continuing.
Hardware watchpoint 3: ((struct _LocalDeviceRec *) 137218624)->dxremaind

Old value = 0
New value = -nan(0x400000)
Hardware watchpoint 6: ((struct _LocalDeviceRec *) 137218624)->dxremaind

Old value = 0
New value = -nan(0x400000)
xf86PostMotionEvent (device=0x82dcec8, is_absolute=0, first_valuator=0, num_valuators=2) at xf86Xinput.c:946
946                                 local->dyremaind = ((float)dy * (float)(device->ptrfeed->ctrl.num)) /
4: ((struct _LocalDeviceRec *) device->public.devicePrivate)->dxremaind = -nan(0x400000)
2: device = (DeviceIntPtr) 0x82dcec8
1: device->public.devicePrivate = (pointer) 0x82dca40
(gdb) list
941                                 local->dxremaind = ((float)dx * (float)(device->ptrfeed->ctrl.num)) /
942                                     (float)(device->ptrfeed->ctrl.den) + local->dxremaind;
943                                 valuator[0] = (int)local->dxremaind;
944                                 local->dxremaind = local->dxremaind - (float)valuator[0];
945
946                                 local->dyremaind = ((float)dy * (float)(device->ptrfeed->ctrl.num)) /
947                                     (float)(device->ptrfeed->ctrl.den) + local->dyremaind;
948                                 valuator[1] = (int)local->dyremaind;
949                                 local->dyremaind = local->dyremaind - (float)valuator[1];
950                             }
(gdb) bt
#0  xf86PostMotionEvent (device=0x82dcec8, is_absolute=0, first_valuator=0, num_valuators=2) at xf86Xinput.c:946
#1  0x00129570 in ?? () from /usr/lib/xorg/modules/input//mouse_drv.so
#2  0x00129c0a in ?? () from /usr/lib/xorg/modules/input//mouse_drv.so
#3  0x0012a10b in ?? () from /usr/lib/xorg/modules/input//mouse_drv.so
#4  0x080d499a in xf86SigioReadInput (fd=7, closure=0x82dca40) at xf86Events.c:1212
#5  0x080b3821 in xf86SIGIO (sig=29) at ../shared/sigio.c:113
#6  <signal handler called>


Comment 5 Eduardo Habkost 2007-09-18 15:02:05 UTC
The bug is getting interesting: I have added a breakpoint on xf86Xinput.c:941 
with condition 'dx<-1000||dx>1000', and the condition was never met while I 
was reproducing the bug.

While reproducing the bug, a watchpoint on local->dxremaind was never hit, 
until local->dxremaind became -nan(0x400000). Before that, local->dxremaind 
was 0.

Here is the relevant code:

 local->dxremaind = ((float)dx * (float)(device->ptrfeed->ctrl.num)) /
     (float)(device->ptrfeed->ctrl.den) + local->dxremaind;
 valuator[0] = (int)local->dxremaind;
 local->dxremaind = local->dxremaind - (float)valuator[0];

device->ptrfeed->ctrl.num is always 2, device->ptrfeed->ctrl.den is always 1.

dx is an int.

If dxremaind is 0, it doesn't seem to be mathematically possible to get NaN if 
dx is never larger than 1000 or smaller than -1000. So I believe this is a bug 
in either the compiler, the kernel, or the processor. My bet is on kernel-xen.

Comment 6 Eduardo Habkost 2007-09-18 16:52:33 UTC
Created attachment 198581 [details]
Test program to reproduce the bug

I have managed to reproduce the floating-point bug outside the X server.

Run the attached test program, and create a new full-virt guest (as in the
description of this bug). Approximately half of the time, the test program
begins to spit out "nan" right after the guest is initialized. There is no
need to move the mouse while testing anymore.

It really is not an Xorg bug.

Comment 7 Eduardo Habkost 2007-09-18 16:55:25 UTC
Changing component to kernel-xen, as it is likely a kernel-xen bug when 
restoring floating point state.

Comment 8 Matěj Cepl 2007-09-20 09:47:56 UTC
*** Bug 251216 has been marked as a duplicate of this bug. ***

Comment 9 Matěj Cepl 2007-09-20 10:04:10 UTC
*** Bug 222615 has been marked as a duplicate of this bug. ***

Comment 10 Eduardo Habkost 2007-09-20 16:48:27 UTC
I have noticed that the i386 sleazy-fpu patch, which was introduced in 2.6.20, 
triggers the problem.

I thought the problem was that math_state_restore() doesn't call clts() under 
Xen (that is expected when handling the device_not_available trap, under Xen). 
However, the problem persisted with clts() in math_state_restore() (when 
called from switch_to()).

When using the xen-3.1.1-rc1 hypervisor, however, the problem was solved.

We have two problems that need to be fixed:

a) A hypervisor FPU-restore bug under HVM (fixed in xen-3.1.1). This is the 
problem that this bug report is about.
b) sleazy-fpu needs clts() to be called in math_state_restore() (the missing 
call doesn't cause visible problems, but it causes the kernel to trap itself 
when trying to restore math state in switch_to())

Comment 11 Eduardo Habkost 2007-09-20 18:07:19 UTC
Found the fix in xen-3.1.1-rc1:
http://xenbits.xensource.com/xen-3.1-testing.hg?rev/8c24767501ff

Comment 12 Adam Tkac 2007-09-21 07:49:22 UTC
kernel-xen-2.6.21-2942.fc8 looks fine. I don't see any mouse problems.

Comment 13 Fedora Update System 2007-09-25 08:25:07 UTC
kernel-xen-2.6-2.6.20-2936.fc7 has been pushed to the Fedora 7 testing repository.  If problems still persist, please make note of it in this bug report.

Comment 14 Fedora Update System 2007-09-28 21:25:07 UTC
kernel-xen-2.6-2.6.20-2936.fc7 has been pushed to the Fedora 7 stable repository.  If problems still persist, please make note of it in this bug report.