Bug 1638578

Summary: Console sometimes corrupted on VT switch in qemu vm with qxl graphics
Product: [Fedora] Fedora
Reporter: Adam Williamson <awilliam>
Component: qemu
Assignee: Fedora Virtualization Maintainers <virt-maint>
Status: CLOSED EOL
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Docs Contact:
Priority: unspecified
Version: 28
CC: amit, awilliam, berrange, cfergeau, crobinso, dwmw2, fziglio, itamar, kraxel, pbonzini, rjones, virt-maint
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Last Closed: 2019-05-28 19:01:16 UTC
Type: Bug
Attachments:
Description: screenshot after the bug happened (case where pre-existing content is corrupted but not fully cleared)
Flags: none
Description: screenshot after the bug happened (case where new VT contents seems to be just drawn over previous VT contents without clear or corruption)
Flags: none

Description Adam Williamson 2018-10-12 00:50:39 UTC
In the last few months I've seen openQA tests sometimes failing because the console display in the VM got messed up. What seems to be going on is, when the test does a VT switch, the display gets corrupted; instead of the display being properly cleared to black and the login prompt of the new VT showing up, the current contents of the display either aren't cleared at all or get sort of messed up, and the login prompt on the new VT is drawn over top of the existing screen content.

I'll attach screenshots and videos of this happening.

The tests run using qemu with qxl as the video adapter and VNC (not SPICE) as the server used for the controller process to 'see' the video from the VM.
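
As a rough illustration of the display setup described above (qxl adapter, display exported over VNC), the relevant part of a QEMU command line can be sketched as follows. This is not the actual openQA invocation, which is not shown in this report: the helper name `qemu_display_args` and the VNC display number are hypothetical. Only the `-vga` and `-vnc` options themselves are standard QEMU flags.

```python
def qemu_display_args(vga="qxl", vnc_display=1):
    """Build the display-related part of a QEMU command line.

    Hypothetical helper for illustration only; the real openQA
    invocation is not shown in this bug report.
    """
    if vga not in ("qxl", "std", "virtio"):
        raise ValueError(f"unexpected video adapter: {vga}")
    # -vga selects the emulated video adapter; -vnc :N serves the
    # guest display over VNC on display N (TCP port 5900 + N).
    return ["-vga", vga, "-vnc", f":{vnc_display}"]


# Print the assumed qxl + VNC configuration as a single command line.
print(" ".join(["qemu-system-x86_64"] + qemu_display_args()))
```

Passing a different adapter name as `vga` changes only the emulated video device and leaves the VNC side untouched, which is what makes the adapter a convenient single variable to flip when bisecting a display problem like this one.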

The openQA worker boxes currently run F28 and are updated to latest stable periodically. The earliest occurrence of this problem I've found so far was on 2018-07-31, though I haven't looked comprehensively, so there *may* be an earlier one. The openQA worker boxes were upgraded from F27 to F28 on 2018-07-12, which incorporated an update to qemu 2.11.2 and a kernel update from 4.16.17-200.fc27 to 4.17.4-200.fc28.x86_64. I don't really see any other update between 2018-07-12 and 2018-07-31 that'd be relevant (there was a kernel update, but the system wasn't rebooted, so it didn't take effect till much later).

Comment 1 Adam Williamson 2018-10-12 00:51:57 UTC
Created attachment 1493100
screenshot after the bug happened (case where pre-existing content is corrupted but not fully cleared)

Comment 2 Adam Williamson 2018-10-12 00:52:54 UTC
Created attachment 1493101
screenshot after the bug happened (case where new VT contents seems to be just drawn over previous VT contents without clear or corruption)

Comment 3 Adam Williamson 2018-10-12 00:56:23 UTC
Videos are too large to attach, but can be found at the following URLs for a while, at least until they get garbage-collected.

Corruption case: https://openqa.fedoraproject.org/tests/292996/file/video.ogv (bug happens around 2:39)
No corruption, no clear case: https://openqa.fedoraproject.org/tests/279262/file/video.ogv (bug happens around 2:02)

Comment 4 Cole Robinson 2018-10-12 01:01:01 UTC
CCing some spice+graphics folks

Gerd, does this type of graphical corruption narrow it down in any way?

Comment 5 Gerd Hoffmann 2018-10-12 06:40:09 UTC
Hmm, never seen this before.
Host kernel should not matter.
Bug might be in qemu, or spice (unlikely though), or guest kernel.
Does it happen with all guests?

Comment 6 Adam Williamson 2018-10-12 15:13:17 UTC
It's happening on F27, F28, F29 and Rawhide tests, yeah, and it doesn't seem like it appeared first for Rawhide, then 29, then 28, then 27 (as you'd sort of expect if it was a guest-side issue).

I'm going to test switching staging to 'std' instead of 'qxl' graphics and see if it seems like this bug stops happening. As it's an intermittent bug, I'll need a few days' worth of data to be sure whether that changes things.
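
To give a feel for why an intermittent bug needs several days of data: if each test run is assumed to hit the corruption independently with probability p, then n clean runs in a row happen by chance with probability (1 - p)^n, and one can solve for the n that pushes this below a chosen threshold. The sketch below does that calculation. The 2% failure rate is a made-up placeholder, not a measured openQA figure, and `clean_runs_needed` is a hypothetical helper name.

```python
import math


def clean_runs_needed(failure_rate, confidence=0.95):
    """Smallest n such that (1 - failure_rate)**n <= 1 - confidence.

    Assumes each run fails independently with the same probability,
    which is a simplification for an intermittent bug.
    """
    if not 0 < failure_rate < 1:
        raise ValueError("failure_rate must be between 0 and 1")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - failure_rate))


# Placeholder assumption: 2% of runs show the corruption.
print(clean_runs_needed(0.02))
```

The rarer the failure, the more clean runs are needed before "it stopped happening" is distinguishable from luck, so a change such as swapping the video adapter has to stay in place for a while before drawing conclusions.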

Comment 7 Ben Cotton 2019-05-02 20:40:21 UTC
This message is a reminder that Fedora 28 is nearing its end of life.
On 2019-May-28 Fedora will stop maintaining and issuing updates for
Fedora 28. It is Fedora's policy to close all bug reports from releases
that are no longer maintained. At that time this bug will be closed as
EOL if it remains open with a Fedora 'version' of '28'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 28 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 8 Gerd Hoffmann 2019-05-03 07:12:23 UTC
(In reply to Adam Williamson from comment #6)
> It's happening on F27, F28, F29 and Rawhide tests, yeah, and it doesn't seem
> like it appeared first for Rawhide, then 29, then 28, then 27 (as you'd sort
> of expect if it was a guest-side issue).

Any change when running a 5.1 guest kernel?

Comment 9 Adam Williamson 2019-05-21 21:32:13 UTC
Seems I did indeed switch openQA to 'std' graphics for almost all cases at some point (probably in response to this), and indeed haven't been having this problem since doing so. So...I don't know. Sorry :/

I've run into so many different graphics issues with openQA tests at this point I keep forgetting what I set to what to avoid what...I could try setting staging back to qxl for a bit to see if this is still happening, I guess.

Comment 10 Ben Cotton 2019-05-28 19:01:16 UTC
Fedora 28 changed to end-of-life (EOL) status on 2019-05-28. Fedora 28 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 11 Adam Williamson 2019-07-22 18:13:02 UTC
FWIW, we just ran into a showstopper with std:

https://bugzilla.redhat.com/show_bug.cgi?id=1732113

so I'm gonna switch back to qxl and see how that goes. Guess we'll find out if this bug is still happening too.

Comment 12 Adam Williamson 2019-07-22 19:24:51 UTC
Huh, well, one immediate result of switching back to qxl was this:

https://openqa.stg.fedoraproject.org/tests/574574#step/_console_wait_login/8

Note the way the bootsplash hasn't cleared properly. Looks a bit like a similar bug we ran into when we tried virtio, actually: https://bugzilla.redhat.com/show_bug.cgi?id=1403365

Comment 13 Gerd Hoffmann 2019-08-01 08:59:19 UTC
Hmm, doesn't reproduce on a quick try.
How does openqa generate the screenshots?

Comment 14 Adam Williamson 2019-08-01 15:57:01 UTC
Oh, I should mention, it doesn't happen all the time, in fact it seems pretty rare (haven't spotted another case since then).

openQA gets the screenshots from the VNC stream, I think.

Comment 15 Gerd Hoffmann 2019-08-02 06:34:08 UTC
(In reply to Adam Williamson from comment #14)
> Oh, I should mention, it doesn't happen all the time, in fact it seems
> pretty rare (haven't spotted another case since then).
> 
> openQA gets the screenshots from the VNC stream, I think.

The screenshot looks like it could be a vnc problem.
A reliable reproducer would be very helpful to pin it down though.