Bug 1727388

Summary: Modifier keypresses (shift+, ctrl+ etc.) sometimes not properly recognized in Firefox Wayland backend in openQA tests
Product: [Fedora] Fedora
Reporter: Adam Williamson <awilliam>
Component: firefox
Assignee: Gecko Maintainer <gecko-bugs-nobody>
Status: CLOSED EOL
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Docs Contact:
Priority: unspecified
Version: 31
CC: 0xalen+redhat, anto.trande, gecko-bugs-nobody, jhorak, john.j5live, kengert, okurz, peter.hutterer, pjasicek, rhughes, rstrode, sandmann
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard: openqa
Last Closed: 2020-11-24 20:05:50 UTC
Type: Bug

Description Adam Williamson 2019-07-05 19:54:12 UTC
We have an openQA test which basically starts up Firefox in GNOME and checks that it works. It's an automated version of this manual test:

https://fedoraproject.org/wiki/QA:Testcase_desktop_browser

and it does everything that test describes.

Last year, it started failing quite often, in a distinctive way: the test is supposed to enter a modified keypress - e.g. shift+; to type a ":", or ctrl+t to open a new tab - but the modifier seems to be ignored, so instead a ";" or a "t" is typed.

There are probably 5 or 6 such keypresses in the test (I haven't counted exactly) and the test fails about half the time it's run, so I guess this is happening on 15-20% of modified keypresses, very approximately.
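As a rough cross-check of that estimate, the per-keypress failure rate can be backed out from the per-run failure rate. This assumes (and it is only an assumption) that each modified keypress fails independently with the same probability p, so that P(run fails) = 1 - (1 - p)^n. A minimal Python sketch; the function name is purely illustrative:

```python
def per_keypress_failure_rate(run_failure_rate, keypresses_per_run):
    """Solve 1 - (1 - p)**n = run_failure_rate for p.

    Assumes every keypress fails independently with the same probability.
    """
    return 1.0 - (1.0 - run_failure_rate) ** (1.0 / keypresses_per_run)


# The test fails about half the time and has 5 or 6 modified keypresses.
for n in (5, 6):
    p = per_keypress_failure_rate(0.5, n)
    print(f"{n} modified keypresses per run: p is about {p:.1%}")
```

Under those assumptions the rate comes out at roughly 11% to 13% per modified keypress; if fewer keypresses per run are actually affected, the figure would be higher.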

Today I dug into this a bit more, and noticed something significant: I believe it only happens when we're testing a Firefox build with the Wayland backend. It happens on Rawhide tests since 2018-10-08 - which is exactly when 62.0.3-2 landed, which "Enable[d] Wayland backed by default on Fedora 30". It happened on Fedora 30 Branched tests at first - but suddenly stopped happening after https://bodhi.fedoraproject.org/updates/FEDORA-2019-2e62c6961a was pushed stable, which switched Fedora 30 back to the X11 backend. It never happens on Fedora 29 or Fedora 30 update tests, even though those tests run very frequently. So it seems very strongly associated with the Wayland backend.

Unfortunately I haven't yet been able to reproduce this manually. openQA works by running qemu (directly, not via libvirt) with the 'std' graphics device and a VNC server display; it then connects to the VNC server and acts as a client, sending keypresses that way. I tested running a similar VM in virt-manager ('VGA' graphics, which is the same as 'std', and VNC backend) but cannot get the bug to happen by manually typing colons into Firefox in that VM. However, it happens very consistently in openQA. In fact it's very annoying, as it makes the test very flaky and I cannot find any workaround.

The grubby details of exactly how openQA (in fact os-autoinst, which is openQA's test runner) connects to the VNC server and sends events can be found around here:

https://github.com/os-autoinst/os-autoinst/blob/master/consoles/VNC.pm#L723

To take the example of typing a colon - that `map_and_send_key` will take ':' as its input and, when it looks that up in `$self->keymap` (which is set up by `$self->init_x11_keymap` in this case) it will get an array, indicating it needs to send two key inputs, shift and ;. When it gets an array it will send a 'down' key event for each key in turn, as fast as possible (there is no sleep between them); then it will sleep for 2000 microseconds; then it will send a key up event for each key in turn, as fast as possible again, and *not* reversed (so it releases the shift key very slightly before it releases the ; key).
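The sequence described above can be sketched roughly as follows. This is Python for illustration, not the actual os-autoinst Perl code: `KEYMAP`, `key_event` and `events_for` are hypothetical names. The keysym values are the standard X11 ones for Shift_L and semicolon, and the byte layout follows the RFB (VNC) KeyEvent message: type 4, a down flag, two padding bytes and a 32-bit keysym.

```python
import struct

# Standard X11 keysym values.
XK_SHIFT_L = 0xFFE1
XK_SEMICOLON = 0x003B

# Hypothetical stand-in for the keymap entry described above:
# ':' maps to an array of two keys, shift and ';'.
KEYMAP = {":": [XK_SHIFT_L, XK_SEMICOLON]}

SLEEP_US = 2000  # pause between the key-down and key-up events


def key_event(keysym, down):
    """Encode one RFB KeyEvent message (big-endian)."""
    return struct.pack(">BBHI", 4, 1 if down else 0, 0, keysym)


def events_for(char):
    """List the steps for one character, in the order described above."""
    keys = KEYMAP[char]
    plan = [("send", key_event(k, True)) for k in keys]     # all keys down
    plan.append(("sleep_us", SLEEP_US))                     # short sleep
    plan += [("send", key_event(k, False)) for k in keys]   # up, NOT reversed
    return plan


for step, value in events_for(":"):
    print(step, value.hex() if isinstance(value, bytes) else value)
```

The point of the sketch is the ordering: the shift release is sent before the ';' release, with nothing but network latency between the two.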

It's a bit tricky to know where to report this, since it's kinda at the intersection of qemu and os-autoinst and firefox; but since it seems specific to firefox's wayland backend (we don't have this problem in any other circumstance), I'm filing it there for now.

Comment 1 Adam Williamson 2019-07-05 22:22:44 UTC
After some experimenting, I figured out that changing os-autoinst's key event order avoids this bug. If I make it reverse the array when sending the up events, so it does this:

shift down
; down
(short sleep)
; up
shift up

instead of this:

shift down
; down
(short sleep)
shift up
; up

It seems to solve the problem. I tested by hacking up the openQA test to type in 'https://kernel.org' 30 times, checking that the colon showed up properly each time. I ran that test three times with the original code: in each case it failed within 8 attempts. I ran it three times with the modified code: in each case it succeeded.
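The difference between the two release orders can be sketched like this. It is an illustrative Python model only, not the os-autoinst change itself, and `key_events` and `modifier_outlasts_key` are hypothetical helper names:

```python
def key_events(keys, reverse_release):
    """Return the (key, action) events for one modified keypress."""
    presses = [(key, "down") for key in keys]
    release_order = list(reversed(keys)) if reverse_release else list(keys)
    releases = [(key, "up") for key in release_order]
    return presses + releases


def modifier_outlasts_key(events, modifier, key):
    """True if the modifier is released only after the modified key."""
    return events.index((modifier, "up")) > events.index((key, "up"))


original = key_events(["shift", ";"], reverse_release=False)
modified = key_events(["shift", ";"], reverse_release=True)

print(original)
print(modifier_outlasts_key(original, "shift", ";"))  # False
print(modified)
print(modifier_outlasts_key(modified, "shift", ";"))  # True
```

With the reversed release order, shift stays held for the whole time ';' is down, which is how a person would normally type the key combination.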

I'm going to send a PR for the change to os-autoinst as I think it's a sensible change, but it shouldn't be *necessary*. The key event order that os-autoinst uses is a bit strange but there's no reason it shouldn't work, AFAICS. So it still seems like something else is buggy here.

Comment 2 Ben Cotton 2019-08-13 17:08:10 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 31 development cycle.
Changing version to '31'.

Comment 3 Ben Cotton 2019-08-13 18:54:28 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 31 development cycle.
Changing version to 31.

Comment 4 Ben Cotton 2020-11-03 16:52:49 UTC
This message is a reminder that Fedora 31 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 31 on 2020-11-24.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '31'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 31 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 5 Ben Cotton 2020-11-24 20:05:50 UTC
Fedora 31 changed to end-of-life (EOL) status on 2020-11-24. Fedora 31 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.