Bug 1807661 - Display corruption on aarch64 virtual machines
Summary: Display corruption on aarch64 virtual machines
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 32
Hardware: aarch64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: AcceptedBlocker
Depends On:
Blocks: ARMTracker BetaBlocker, F32BetaBlocker
Reported: 2020-02-26 21:03 UTC by Paul Whalen
Modified: 2020-03-12 18:57 UTC (History)
CC List: 24 users

Fixed In Version: kernel-5.6.0-0.rc5.git0.2.fc32
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-12 18:57:14 UTC
Type: Bug
bcotton: fedora_prioritized_bug-


Description Paul Whalen 2020-02-26 21:03:51 UTC
1. Please describe the problem:

Since Fedora-Rawhide-20200207.n.2, text shown over VNC on aarch64 guests appears garbled and unreadable. As a result, openQA testing is failing:

https://openqa.stg.fedoraproject.org/tests/733211#step/_console_wait_login/8

This appears to be the first successful compose after the mass rebuild. 

2. What is the Version-Release number of the kernel:

kernel-5.6.0-0.rc0.git5.1.fc32+

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

Last working compose was Fedora-Rawhide-20200204.n.0

Installing the Fedora 31 kernel on the guest helps: text is momentarily garbled but quickly "repairs" itself on screen.

Comment 1 Adam Williamson 2020-03-04 19:46:36 UTC
Note, this isn't actually only affecting the console. If you watch videos of openQA tests you can see corruption occurring while anaconda is running too, and sometimes tests fail on that. e.g. https://openqa.stg.fedoraproject.org/tests/742521#step/_software_selection/27 .

Comment 2 Adam Williamson 2020-03-04 19:48:35 UTC
Note, these tests run with `-device virtio-gpu-pci` for the graphics device.

Comment 3 Adam Williamson 2020-03-04 19:49:02 UTC
Paul, can you check whether this happens on bare metal?

Comment 4 Adam Williamson 2020-03-04 20:32:23 UTC
Also seems to happen if we use `-device VGA` instead of `-device virtio-gpu-pci`, FWIW.
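
For anyone trying to reproduce this outside openQA, here is a rough sketch of a guest command line for each of the two display devices mentioned above. Only the `-device virtio-gpu-pci` and `-device VGA` arguments come from this bug; the machine type, CPU model, memory size, VNC display and the `guest.qcow2` image name are assumptions for illustration, not what openQA actually runs. UEFI firmware and acceleration options are left out and would probably be needed to boot a real Fedora aarch64 image.

```python
import shlex
from typing import List


def qemu_cmd(display_device: str, disk_image: str = "guest.qcow2") -> List[str]:
    """Build a QEMU argv for an aarch64 guest with the given display device.

    Everything except the -device value is an assumed, illustrative setting.
    """
    return [
        "qemu-system-aarch64",
        "-machine", "virt",          # assumed: generic aarch64 virt board
        "-cpu", "cortex-a57",        # assumed CPU model
        "-m", "2048",                # assumed memory size in MiB
        "-device", display_device,   # the device under test, from this bug
        "-drive", f"file={disk_image},format=qcow2,if=virtio",  # hypothetical image
        "-vnc", ":0",                # assumed: serve the display over VNC
    ]


if __name__ == "__main__":
    # Print one command line per display device tried in this bug.
    for device in ("virtio-gpu-pci", "VGA"):
        print(shlex.join(qemu_cmd(device)))
```

Running it just prints the two command lines, so they can be inspected or adjusted before launching a guest by hand.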

Comment 5 Adam Williamson 2020-03-04 22:07:44 UTC
So this is a bit arguable, but I'm going to propose this as a Beta blocker as a violation of the "Bug hinders execution of required Beta test plans or dramatically reduces test coverage" requirement - https://fedoraproject.org/wiki/Fedora_32_Beta_Release_Criteria#Beta_Blocker_Bugs . This bug causes almost all openQA tests to fail on every compose, and openQA is a good part of our test coverage these days. aarch64 is a release-blocking architecture.

I'm also proposing it as a PrioritizedBug, with approximately the same justification - it's a big problem for openQA, and I have not yet been able to figure out a workaround to get the tests running again.

Comment 6 Ben Cotton 2020-03-04 22:14:04 UTC
I'm going to miss the Blocker Review meeting on Monday, so consider me +1 Beta Blocker.

Comment 7 Paul Whalen 2020-03-04 23:37:01 UTC
(In reply to Adam Williamson from comment #3)
> Paul, can you check whether this happens on bare metal?

This does not happen on bare metal (verified on a Seattle system with the Fedora-32-20200304.n.0 compose).

Comment 8 Adam Williamson 2020-03-05 00:07:43 UTC
Thanks! And you wrote 'VNC' in the description, so did you try it with SPICE and find that was OK too? (sadly openQA can't use SPICE...)

Comment 9 Paul Whalen 2020-03-05 21:09:08 UTC
(In reply to Adam Williamson from comment #8)
> Thanks! And you wrote 'VNC' in the description, so did you try it with SPICE
> and find that was OK too? (sadly openQA can't use SPICE...)

Unfortunately, SPICE looks the same.

Comment 10 Adam Williamson 2020-03-05 21:12:54 UTC
So basically, it looks like the problem space here is 'all aarch64 VMs', or something close to it. Let's tag some virt-y folks...

Comment 11 Geoffrey Marr 2020-03-09 23:58:07 UTC
Discussed during the 2020-03-09 blocker review meeting: [0]

The decision to classify this bug as an "AcceptedBlocker" was made as it violates the following criterion:

"The release must be able to host virtual guest instances of the same release", and for its impact on aarch64 testing coverage.

[0] https://meetbot.fedoraproject.org/fedora-blocker-review/2020-03-09/f32-blocker-review.2020-03-09-16.01.txt

Comment 12 Adam Williamson 2020-03-11 02:06:14 UTC
This seems to have been fixed in Rawhide as of Fedora-Rawhide-20200307.n.1. Most tests passed again in that compose and in Fedora-Rawhide-20200309.n.1. It looks like we got a kernel update in Fedora-Rawhide-20200307.n.1:

Package:      kernel-5.6.0-0.rc4.git1.1.fc33
Old package:  kernel-5.6.0-0.rc4.git0.1.fc33
...
Changelog:
  * Fri Mar 06 2020 Jeremy Cline <jcline@redhat.com>
  - Reenable debugging options.

  * Fri Mar 06 2020 Jeremy Cline <jcline@redhat.com> - 5.6.0-0.rc4.git1.1
  - Linux v5.6-rc4-135-gaeb542a1b5c5

Paul, are you able to test with a newer kernel build on F32 - https://koji.fedoraproject.org/koji/buildinfo?buildID=1476218 is the most recent as I write this - and see if that resolves it? Thanks!
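
To keep the Rawhide compose range from this bug straight, here is a small sketch that sorts a compose ID into the window reported so far. The three boundary composes are taken from this bug (last working, first broken, first fixed); the function names, the category labels and the assumption that compose IDs order by date and then respin number are mine, and composes between the last working and first broken ones are treated as unknown since nothing here says how they behaved.

```python
import re
from typing import Tuple

# Matches IDs such as "Fedora-Rawhide-20200207.n.2".
COMPOSE_RE = re.compile(r"^Fedora-Rawhide-(\d{8})\.n\.(\d+)$")


def compose_key(compose_id: str) -> Tuple[int, int]:
    """Return (date, respin) so composes can be compared as tuples."""
    match = COMPOSE_RE.match(compose_id)
    if match is None:
        raise ValueError(f"not a Rawhide nightly compose ID: {compose_id!r}")
    return int(match.group(1)), int(match.group(2))


# Boundaries as reported in this bug.
LAST_GOOD = compose_key("Fedora-Rawhide-20200204.n.0")
FIRST_BAD = compose_key("Fedora-Rawhide-20200207.n.2")
FIRST_FIXED = compose_key("Fedora-Rawhide-20200307.n.1")


def classify(compose_id: str) -> str:
    """Classify a compose relative to the reported display corruption."""
    key = compose_key(compose_id)
    if key <= LAST_GOOD:
        return "unaffected"  # assumed: at or before the last working compose
    if key < FIRST_BAD:
        return "unknown"     # no results reported for this gap
    if key < FIRST_FIXED:
        return "affected"
    return "fixed"


if __name__ == "__main__":
    for cid in ("Fedora-Rawhide-20200204.n.0",
                "Fedora-Rawhide-20200207.n.2",
                "Fedora-Rawhide-20200309.n.1"):
        print(cid, classify(cid))
```

This only covers Rawhide nightly IDs; the Fedora-32 branched composes would need their own boundaries.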

Comment 13 Paul Whalen 2020-03-11 14:09:37 UTC
Confirmed, with kernel-5.6.0-0.rc5.git0.2.fc32 I no longer see the issue.

Comment 14 Fedora Update System 2020-03-11 16:45:21 UTC
FEDORA-2020-55b2b79091 has been submitted as an update to Fedora 32. https://bodhi.fedoraproject.org/updates/FEDORA-2020-55b2b79091

Comment 15 Ben Cotton 2020-03-11 17:02:17 UTC
Rejecting as a Prioritized Bug since it is an accepted blocker: https://meetbot.fedoraproject.org/fedora-meeting/2020-03-11/fedora_prioritized_bugs_and_issues.2020-03-11-15.00.log.html#l-39

Comment 16 Paul Whalen 2020-03-12 17:17:27 UTC
This is fixed in Beta 1.2 which includes kernel-5.6.0-0.rc5.git0.2.fc32.

Comment 17 Fedora Update System 2020-03-12 18:57:14 UTC
kernel-5.6.0-0.rc5.git0.2.fc32 and kernel-headers-5.6.0-0.rc5.git0.1.fc32 have been pushed to the Fedora 32 stable repository. If problems still persist, please make note of it in this bug report.
