Bug 1807661

Summary: Display corruption on aarch64 virtual machines
Product: [Fedora] Fedora
Reporter: Paul Whalen <pwhalen>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 32
CC: airlied, awilliam, bcotton, bskeggs, crobinso, gmarr, hdegoede, ichavero, itamar, jarodwilson, jeremy, jglisse, john.j5live, jonathan, josef, kernel-maint, linville, masami256, mchehab, mjg59, pbonzini, pbrobinson, robatino, steved
Target Milestone: ---
Flags: bcotton: fedora_prioritized_bug-
Target Release: ---   
Hardware: aarch64   
OS: Linux   
Whiteboard: AcceptedBlocker
Fixed In Version: kernel-5.6.0-0.rc5.git0.2.fc32
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-03-12 18:57:14 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 245418, 1705303    

Description Paul Whalen 2020-02-26 21:03:51 UTC
1. Please describe the problem:

Since Fedora-Rawhide-20200207.n.2, text displayed over VNC on aarch64 appears garbled and unreadable. As a result, openQA testing is failing:

https://openqa.stg.fedoraproject.org/tests/733211#step/_console_wait_login/8

This appears to be the first successful compose after the mass rebuild. 

2. What is the Version-Release number of the kernel:

kernel-5.6.0-0.rc0.git5.1.fc32+

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

Last working compose was Fedora-Rawhide-20200204.n.0

Installing the Fedora 31 kernel on the guest helps: text is momentarily garbled but quickly "repairs" itself on screen.
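
For anyone narrowing down the regression window, here is a minimal sketch that parses the two compose IDs quoted above and reports how many days separate them. The regular expression and the helper name `parse_compose_id` are assumptions made for illustration; they only cover the `Fedora-Rawhide-YYYYMMDD.n.N` form seen in this report.

```python
import re
from datetime import date

# Assumed pattern: matches only the nightly Rawhide compose IDs quoted in this bug.
COMPOSE_RE = re.compile(r"^Fedora-Rawhide-(\d{4})(\d{2})(\d{2})\.n\.(\d+)$")


def parse_compose_id(compose_id):
    """Return (compose date, respin number) for a Rawhide nightly compose ID."""
    match = COMPOSE_RE.match(compose_id)
    if match is None:
        raise ValueError(f"unrecognised compose ID: {compose_id!r}")
    year, month, day, respin = (int(part) for part in match.groups())
    return date(year, month, day), respin


last_good = parse_compose_id("Fedora-Rawhide-20200204.n.0")
first_bad = parse_compose_id("Fedora-Rawhide-20200207.n.2")

# Tuples compare by date first, then by respin number.
assert last_good < first_bad
window_days = (first_bad[0] - last_good[0]).days
print(f"regression window: {window_days} days")
```

Any kernel build that landed between those two composes is a candidate for the change that introduced the corruption.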

Comment 1 Adam Williamson 2020-03-04 19:46:36 UTC
Note, this isn't actually only affecting the console. If you watch videos of openQA tests you can see corruption occurring while anaconda is running too, and sometimes tests fail on that. e.g. https://openqa.stg.fedoraproject.org/tests/742521#step/_software_selection/27 .

Comment 2 Adam Williamson 2020-03-04 19:48:35 UTC
Note, these tests run with `-device virtio-gpu-pci` for the graphics device.

Comment 3 Adam Williamson 2020-03-04 19:49:02 UTC
Paul, can you check whether this happens on bare metal?

Comment 4 Adam Williamson 2020-03-04 20:32:23 UTC
Also seems to happen if we use `-device VGA` instead of `-device virtio-gpu-pci`, FWIW.
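
To try both graphics devices by hand, a rough sketch of the two QEMU invocations follows. Only the `-device` values come from the comments above. The machine type, CPU model, memory size and VNC display are assumptions, and the argument list is deliberately incomplete (no firmware, disk or install media), so it is not the command line openQA actually uses.

```python
import shlex


def build_qemu_args(gpu_device, vnc_display=1):
    """Return an illustrative qemu-system-aarch64 argument list.

    Everything except the -device value is an assumption for this sketch.
    """
    return [
        "qemu-system-aarch64",
        "-machine", "virt",        # assumed machine type
        "-cpu", "cortex-a57",      # assumed CPU model
        "-m", "2048",              # assumed guest memory in MiB
        "-device", gpu_device,     # graphics device under test
        "-vnc", f":{vnc_display}", # assumed VNC display number
    ]


# Print one command line per graphics device mentioned in this bug.
for device in ("virtio-gpu-pci", "VGA"):
    print(shlex.join(build_qemu_args(device)))
```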

Comment 5 Adam Williamson 2020-03-04 22:07:44 UTC
So this is a bit arguable, but I'm going to propose this as a Beta blocker as a violation of the "Bug hinders execution of required Beta test plans or dramatically reduces test coverage" requirement - https://fedoraproject.org/wiki/Fedora_32_Beta_Release_Criteria#Beta_Blocker_Bugs . This bug causes almost all openQA tests to fail on every compose, and openQA is a good part of our test coverage these days. aarch64 is a release-blocking architecture.

I'm also proposing it as a PrioritizedBug, with approximately the same justification - it's a big problem for openQA, and I have not yet been able to figure out a workaround to get the tests running again.

Comment 6 Ben Cotton 2020-03-04 22:14:04 UTC
I'm going to miss the Blocker Review meeting on Monday, so consider me +1 Beta Blocker.

Comment 7 Paul Whalen 2020-03-04 23:37:01 UTC
(In reply to Adam Williamson from comment #3)
> Paul, can you check whether this happens on bare metal?

This does not happen on bare metal (verified on a seattle, Fedora-32-20200304.n.0 compose).

Comment 8 Adam Williamson 2020-03-05 00:07:43 UTC
Thanks! And you wrote 'VNC' in the description, so did you try it with SPICE and find that was OK too? (sadly openQA can't use SPICE...)

Comment 9 Paul Whalen 2020-03-05 21:09:08 UTC
(In reply to Adam Williamson from comment #8)
> Thanks! And you wrote 'VNC' in the description, so did you try it with SPICE
> and find that was OK too? (sadly openQA can't use SPICE...)

Unfortunately, SPICE looks the same.

Comment 10 Adam Williamson 2020-03-05 21:12:54 UTC
So basically it looks like the problem space here is 'all aarch64 VMs', or something close to it. Let's tag some virt-y folks...

Comment 11 Geoffrey Marr 2020-03-09 23:58:07 UTC
Discussed during the 2020-03-09 blocker review meeting: [0]

The decision to classify this bug as an "AcceptedBlocker" was made as it violates the following criterion:

"The release must be able host virtual guest instances of the same release" and for its impact on aarch64 testing coverage.

[0] https://meetbot.fedoraproject.org/fedora-blocker-review/2020-03-09/f32-blocker-review.2020-03-09-16.01.txt

Comment 12 Adam Williamson 2020-03-11 02:06:14 UTC
This seems to have been fixed in Rawhide in Fedora-Rawhide-20200307.n.1. Most tests passed again in that compose and Fedora-Rawhide-20200309.n.1. Seems like we got a kernel update in that compose:

Package:      kernel-5.6.0-0.rc4.git1.1.fc33
Old package:  kernel-5.6.0-0.rc4.git0.1.fc33
...
Changelog:
  * Fri Mar 06 2020 Jeremy Cline <jcline>
  - Reenable debugging options.

  * Fri Mar 06 2020 Jeremy Cline <jcline> - 5.6.0-0.rc4.git1.1
  - Linux v5.6-rc4-135-gaeb542a1b5c5

Paul, are you able to test with a newer kernel build on F32 - https://koji.fedoraproject.org/koji/buildinfo?buildID=1476218 is the most recent as I write this - and see if that resolves it? Thanks!

Comment 13 Paul Whalen 2020-03-11 14:09:37 UTC
Confirmed, with kernel-5.6.0-0.rc5.git0.2.fc32 I no longer see the issue.
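
A minimal sketch for checking whether an installed pre-release kernel is at least as new as the build confirmed here. It assumes the `kernel-X.Y.Z-0.rcN.gitM.B.fcNN` naming seen in this report and compares the numeric fields as a tuple. This is a simplification, not RPM's own version comparison, and `kernel_sort_key` is a hypothetical helper name.

```python
import re

# Assumed pattern: covers only the pre-release kernel names quoted in this bug.
NVR_RE = re.compile(
    r"^kernel-(\d+)\.(\d+)\.(\d+)-0\.rc(\d+)\.git(\d+)\.(\d+)\.fc(\d+)$"
)


def kernel_sort_key(nvr):
    """Return (major, minor, patch, rc, git, build) for a pre-release kernel NVR."""
    match = NVR_RE.match(nvr)
    if match is None:
        raise ValueError(f"unrecognised kernel NVR: {nvr!r}")
    major, minor, patch, rc, git, build, _fedora = (int(p) for p in match.groups())
    # The Fedora release number is ignored so fc32 and fc33 builds compare alike.
    return (major, minor, patch, rc, git, build)


# The trailing "+" in the description ("and later") is dropped for parsing.
reported = "kernel-5.6.0-0.rc0.git5.1.fc32"
confirmed_fixed = "kernel-5.6.0-0.rc5.git0.2.fc32"

assert kernel_sort_key(reported) < kernel_sort_key(confirmed_fixed)
```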

Comment 14 Fedora Update System 2020-03-11 16:45:21 UTC
FEDORA-2020-55b2b79091 has been submitted as an update to Fedora 32. https://bodhi.fedoraproject.org/updates/FEDORA-2020-55b2b79091

Comment 15 Ben Cotton 2020-03-11 17:02:17 UTC
Rejecting as a Prioritized Bug since it is an accepted blocker: https://meetbot.fedoraproject.org/fedora-meeting/2020-03-11/fedora_prioritized_bugs_and_issues.2020-03-11-15.00.log.html#l-39

Comment 16 Paul Whalen 2020-03-12 17:17:27 UTC
This is fixed in Beta 1.2 which includes kernel-5.6.0-0.rc5.git0.2.fc32.

Comment 17 Fedora Update System 2020-03-12 18:57:14 UTC
kernel-5.6.0-0.rc5.git0.2.fc32 and kernel-headers-5.6.0-0.rc5.git0.1.fc32 have been pushed to the Fedora 32 stable repository. If problems still persist, please make note of it in this bug report.