Bug 1727388 - Modifier keypresses (shift+, ctrl+ etc.) sometimes not properly recognized in Firefox Wayland backend in openQA tests
Keywords:
Status: ASSIGNED
Alias: None
Product: Fedora
Classification: Fedora
Component: firefox
Version: 31
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Gecko Maintainer
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: openqa
Depends On:
Blocks:
Reported: 2019-07-05 19:54 UTC by Adam Williamson
Modified: 2019-08-13 18:54 UTC
CC List: 12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:



Description Adam Williamson 2019-07-05 19:54:12 UTC
We have an openQA test which basically starts up Firefox in GNOME and checks that it works. It's an automated version of this manual test:

https://fedoraproject.org/wiki/QA:Testcase_desktop_browser

and it does everything that test describes.

Last year, it started failing quite often, in a distinctive way: the test is supposed to enter a modified keypress - e.g. shift+; to type a ":", or ctrl+t to open a new tab - but the modifier seems to be ignored, so instead a ";" or a "t" is typed.

There are probably 5 or 6 such keypresses in the test (I haven't counted exactly) and the test fails about half the time it's run, so I guess this is happening on 15-20% of modified keypresses, very approximately.
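
A rough sanity check on that estimate, as a minimal sketch. It assumes (these are assumptions, not measurements) that each modified keypress fails independently with the same probability, and that the test fails whenever at least one of them fails. The function name is made up for illustration.

```python
def per_keypress_failure_rate(test_failure_rate, keypresses):
    """Solve 1 - (1 - p) ** n = test_failure_rate for p."""
    return 1 - (1 - test_failure_rate) ** (1 / keypresses)

# Test fails about half the time, with 5 or 6 modified keypresses per run.
for n in (5, 6):
    print(n, round(per_keypress_failure_rate(0.5, n), 3))
```

Under these assumptions the per-keypress failure rate comes out at roughly 11-13%, a little below the 15-20% guess but in the same range.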

Today I dug into this a bit more, and noticed something significant: I believe it only happens when we're testing a Firefox build with the Wayland backend. It has happened on Rawhide tests since 2018-10-08 - which is exactly when 62.0.3-2 landed, which "Enable[d] Wayland backed by default on Fedora 30". It happened on Fedora 30 Branched tests at first - but suddenly stopped happening after https://bodhi.fedoraproject.org/updates/FEDORA-2019-2e62c6961a was pushed stable, which switched Fedora 30 back to the X11 backend. It never happens on Fedora 29 or Fedora 30 update tests, even though those tests run very frequently. So it seems very strongly associated with the Wayland backend.

Unfortunately I haven't yet been able to reproduce this manually. openQA works by running qemu directly (not via libvirt) with the 'std' graphics device and a VNC server display; it then connects to the VNC server and acts as a client, sending keypresses that way. I tested running a similar VM in virt-manager ('VGA' graphics, which is the same as 'std', and VNC backend) but cannot get the bug to happen by manually typing colons into Firefox in that VM. However, it happens very consistently in openQA. In fact it's very annoying, as it makes the test very flaky and I cannot find any workaround.
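
For anyone trying to approximate this setup outside openQA, here is a minimal sketch of a direct qemu command line with the 'std' VGA device and a VNC display. This is not the command os-autoinst actually builds, which has many more options. The function name, disk image path, memory size and display number are all placeholders.

```python
import shlex

def qemu_vnc_argv(disk_image, vnc_display=1, memory_mb=2048):
    """Build an argv list for a qemu VM with 'std' VGA and a VNC display."""
    return [
        "qemu-system-x86_64",
        "-m", str(memory_mb),
        "-vga", "std",                 # the device virt-manager calls 'VGA'
        "-vnc", f":{vnc_display}",     # display N listens on TCP port 5900 + N
        "-drive", f"file={disk_image},format=qcow2",  # placeholder image
    ]

# Print the command instead of running it; the image name is hypothetical.
print(shlex.join(qemu_vnc_argv("fedora-rawhide.qcow2")))
```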

The grubby details of exactly how openQA (in fact os-autoinst, which is openQA's test runner) connects to the VNC server and sends events can be found around here:

https://github.com/os-autoinst/os-autoinst/blob/master/consoles/VNC.pm#L723

To take the example of typing a colon: `map_and_send_key` will take ':' as its input and, when it looks that up in `$self->keymap` (which is set up by `$self->init_x11_keymap` in this case), it will get an array, indicating it needs to send two key inputs, shift and ;. When it gets an array, it will send a 'down' key event for each key in turn, as fast as possible (there is no sleep between them); then it will sleep for 2000 microseconds; then it will send an 'up' key event for each key in turn, again as fast as possible, and *not* reversed (so it releases the shift key very slightly before it releases the ; key).
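
The sequence described above can be sketched in Python (os-autoinst itself is Perl, so this is an illustration, not its code). The message layout is the standard RFB (VNC) KeyEvent: type 4, a down flag, two padding bytes, and a 32-bit keysym. The keysym values are the standard X11 ones for Shift_L and semicolon; that os-autoinst sends exactly these values is an assumption. `KEYMAP`, `key_event` and `send_key` are hypothetical names.

```python
import struct
import time

# Standard X11 keysyms (keysymdef.h).
XK_SHIFT_L = 0xFFE1
XK_SEMICOLON = 0x003B

# Hypothetical stand-in for the keymap lookup: ':' needs shift and ';'.
KEYMAP = {":": [XK_SHIFT_L, XK_SEMICOLON]}

def key_event(keysym, down):
    """Pack one RFB KeyEvent message (8 bytes, network byte order)."""
    return struct.pack("!BBxxI", 4, 1 if down else 0, keysym)

def send_key(send, char, hold_seconds=0.002):
    """Send down events, sleep briefly, then up events in the SAME order."""
    keys = KEYMAP[char]
    for keysym in keys:
        send(key_event(keysym, True))
    time.sleep(hold_seconds)  # 2000 microseconds by default
    for keysym in keys:       # not reversed: shift is released before ';'
        send(key_event(keysym, False))

# Usage: collect the messages in a list instead of writing to a socket.
sent = []
send_key(sent.append, ":")
for message in sent:
    print(message.hex())
```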

It's a bit tricky to know where to report this, since it's kinda at the intersection of qemu and os-autoinst and firefox; but since it seems specific to firefox's wayland backend (we don't have this problem in any other circumstance), I'm filing it there for now.

Comment 1 Adam Williamson 2019-07-05 22:22:44 UTC
After some experimenting, I figured out that changing os-autoinst's key event order avoids this bug. The change is to reverse the array when sending the up events, so it does this:

shift down
; down
(short sleep)
; up
shift up

instead of this:

shift down
; down
(short sleep)
shift up
; up

It seems to solve the problem. I tested by hacking up the openQA test to type in 'https://kernel.org' 30 times, checking that the colon showed up properly each time. I ran that test three times with the original code: in each case it failed within 8 attempts. I ran it three times with the modified code: in each case it succeeded.

I'm going to send a PR for the change to os-autoinst as I think it's a sensible change, but it shouldn't be *necessary*. The key event order that os-autoinst uses is a bit strange but there's no reason it shouldn't work, AFAICS. So it still seems like something else is buggy here.
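
The two orderings compared in this comment can be written as a small sketch. This is illustrative Python, not the os-autoinst change itself; `key_events` and its `reverse_release` parameter are hypothetical names, and the short sleep between the down and up events is left out.

```python
def key_events(keys, reverse_release):
    """Return the (key, action) sequence for one modified keypress."""
    presses = [(key, "down") for key in keys]
    release_order = list(reversed(keys)) if reverse_release else list(keys)
    releases = [(key, "up") for key in release_order]
    return presses + releases

original = key_events(["shift", ";"], reverse_release=False)
modified = key_events(["shift", ";"], reverse_release=True)
print(original)  # shift is released before ';'
print(modified)  # ';' is released before shift
```

Releasing in reverse order matches what a person typically does when typing a shifted character, which may be why only the original ordering exposes the problem.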

Comment 2 Ben Cotton 2019-08-13 17:08:10 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 31 development cycle.
Changing version to '31'.

Comment 3 Ben Cotton 2019-08-13 18:54:28 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 31 development cycle.
Changing version to 31.

