Bug 1481858

Summary: CVE-2017-8379 fix causes major regression in openQA usage (programmatic typing via VNC)
Product: [Fedora] Fedora
Reporter: Adam Williamson <awilliam>
Component: qemu
Assignee: Fedora Virtualization Maintainers <virt-maint>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: unspecified
Version: 26
CC: amit, berrange, cfergeau, dwmw2, itamar, pbonzini, rjones, virt-maint
Hardware: All
OS: Linux
Fixed In Version: qemu-2.9.0-5.fc26
Last Closed: 2017-08-22 20:43:11 UTC
Type: Bug

Description Adam Williamson 2017-08-15 21:44:21 UTC
Since 2017-08-09, openQA has been plagued by failures caused by typing errors: cases where key presses sent by the openQA test driver (os-autoinst) to the virtual machine running the actual test were missed. We've always had such failures, but they're usually quite rare, especially when typing at a console (we have a few tricks to mitigate the problem when typing in graphical desktops); the failure rate would be something like 1-2% of tests. Since 2017-08-10, something like 50% or more of all tests that do any significant amount of typing have been failing.

After some investigation, it looks like the boxes where the tests run were updated on 2017-08-09 (as part of a mass infra update/reboot task), and went from qemu-2.9.0-1.fc26.1 to qemu-2.9.0-3.fc26. The most recent test batch where most tests passed as usual was run with qemu-2.9.0-1.fc26.1; the earliest test batch where a large number of tests failed due to typing errors was run with qemu-2.9.0-3.fc26. I have now forcibly downgraded qemu on the openQA worker host boxes and re-run one set of tests, and for the first time since 2017-08-09, they mostly passed:

https://openqa.fedoraproject.org/tests/overview?distri=fedora&version=25&build=Update-FEDORA-2017-bd0324f3e9&groupid=2

(The failure there is not a typing failure but a screenshot that needs updating.) So we're fairly confident the qemu update is the cause of the problem.

It seems highly likely the culprit is this commit:

https://github.com/qemu/qemu/commit/fa18f36a461984eae50ab957e47ec78dae3c14fc

which is intended to fix CVE-2017-8379. Cole also notes that a SUSE developer, Alex Graf, has landed three commits since that one which seem very relevant to the specific case of openQA:

https://github.com/qemu/qemu/commit/77b0359bf414ad666d1714dc9888f1017c08e283
https://github.com/qemu/qemu/commit/51dbea77a29ea46173373a6dad4ebd95d4661f42
https://github.com/qemu/qemu/commit/d3b0db6dfea6b3a9ee0d96aceb796bdcafa84314

so we're going to check whether applying those commits on top of the 'limit kbd queue depth' commit makes things behave better. I'll do a scratch build and test this soon.
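
As a rough illustration of why a depth limit on a keyboard event queue could show up as missed key presses for a fast programmatic sender like os-autoinst, here is a toy Python model. It is only a sketch and not QEMU's code: the function name simulate_typing, the parameters queue_limit and drain_every, and the numbers used are all hypothetical, and the real input path is more involved than this.

```python
from collections import deque


def simulate_typing(text, queue_limit, drain_every):
    """Toy model of a bounded key queue (hypothetical, not QEMU's logic).

    The sender enqueues one key per tick. The consumer removes one key
    every `drain_every` ticks. A key that arrives while the queue already
    holds `queue_limit` keys is silently dropped.
    Returns (text_actually_delivered, number_of_dropped_keys).
    """
    queue = deque()
    delivered = []
    dropped = 0
    for tick, key in enumerate(text):
        if len(queue) < queue_limit:
            queue.append(key)
        else:
            dropped += 1  # queue full: this key press is lost
        if tick % drain_every == 0 and queue:
            delivered.append(queue.popleft())
    # The sender has finished; the consumer drains whatever is left.
    delivered.extend(queue)
    return "".join(delivered), dropped


if __name__ == "__main__":
    # Generous limit: a slow consumer still receives every key.
    print(simulate_typing("hello world", queue_limit=1024, drain_every=3))
    # Tight limit: the sender outruns the consumer and keys go missing.
    print(simulate_typing("hello world", queue_limit=4, drain_every=3))
```

In this model nothing is lost as long as the queue never fills, which is why slowing the sender down or keeping the queue accounting accurate matters for a typing-heavy test.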

Comment 1 Adam Williamson 2017-08-16 06:23:45 UTC
So I've deployed a scratch build with those three patches applied on openQA staging, and it seems to be behaving *much* better so far. Will plan with Cole tomorrow what to do about sending out updates etc.

Comment 2 Fedora Update System 2017-08-16 22:22:50 UTC
qemu-2.9.0-5.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-a314d15e62

Comment 3 Fedora Update System 2017-08-19 18:54:01 UTC
qemu-2.9.0-5.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-a314d15e62

Comment 4 Fedora Update System 2017-08-22 20:43:11 UTC
qemu-2.9.0-5.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.