Bug 2016782 - kernel-5.15.0-0.rc6.20211021git2f111a6fd5b5.49.fc36 doesn't boot on aarch64 vm's
Summary: kernel-5.15.0-0.rc6.20211021git2f111a6fd5b5.49.fc36 doesn't boot on aarch64 vm's
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: ARMTracker
 
Reported: 2021-10-23 18:12 UTC by Kevin Fenzi
Modified: 2021-12-15 23:18 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed:
Type: Bug
Embargoed:



Description Kevin Fenzi 2021-10-23 18:12:23 UTC
The above kernel doesn't seem to boot on aarch64 VMs. This is breaking rawhide composes, as the compose tries to boot it in qemu to build the Cloud-Base image (and others).

There's a lot of UEFI spew on the console, but the last part is:

[Security] 3rd party image[0] can be loaded after EndOfDxe: VenMedia(1428F772-B64A-441E-B8C3-9EBDD7F893C7)/kernel.
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E2D7D40                            
Loading driver at 0x0012D020000 EntryPoint=0x0012EDD6B54                                            
Loading driver at 0x0012D020000 EntryPoint=0x0012EDD6B54                                            
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E2AAB98                            
ProtectUefiImageCommon - 0x3E2D7D40                                                                 
  - 0x000000012D020000 - 0x0000000004730000                                                         
SetUefiImageMemoryAttributes - 0x000000012D020000 - 0x0000000000010000 (0x0000000000004008)         
SetUefiImageMemoryAttributes - 0x000000012D030000 - 0x0000000001DF0000 (0x0000000000020008)         
SetUefiImageMemoryAttributes - 0x000000012EE20000 - 0x0000000002930000 (0x0000000000004008)         
QemuLoadKernelImage: command line: "inst.method=https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20211023.n.0/compose/Everything/aarch64/os/ inst.ks=file:/ks.cfg console=ttyAMA0 console=tty0 initrd=initrd"
[Bds]Stop Hotkey Service!                                                                           
[Bds]UnregisterKeyNotify: 000C/0000 Success                                                         
[Bds]UnregisterKeyNotify: 0017/0000 Success                                                         
[Bds]UnregisterKeyNotify: 0000/000D Success                                                         
EFI stub: Booting Linux Kernel...                                                                   
EFI stub: Generating empty DTB                                                                      
EFI stub: Loaded initrd from LINUX_EFI_INITRD_MEDIA_GUID device path                                
EFI stub: Exiting boot services...                                                                  
SetUefiImageMemoryAttributes - 0x000000013F520000 - 0x0000000000040000 (0x0000000000000008)         
SetUefiImageMemoryAttributes - 0x000000013C1B0000 - 0x0000000000040000 (0x0000000000000008)         
SetUefiImageMemoryAttributes - 0x000000013C160000 - 0x0000000000040000 (0x0000000000000008)         
SetUefiImageMemoryAttributes - 0x000000013F4E0000 - 0x0000000000030000 (0x0000000000000008)         
SetUefiImageMemoryAttributes - 0x000000013C110000 - 0x0000000000040000 (0x0000000000000008)         
SetUefiImageMemoryAttributes - 0x000000013C020000 - 0x0000000000040000 (0x0000000000000008)         
SetUefiImageMemoryAttributes - 0x00000001343C0000 - 0x0000000000030000 (0x0000000000000008)         
SetUefiImageMemoryAttributes - 0x0000000134380000 - 0x0000000000030000 (0x0000000000000008)

So it seems to have handed off to booting the kernel, but nothing appears after that here. This seems to happen on mustangs and emags at least. 
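
For anyone reading the SetUefiImageMemoryAttributes lines above, the value in parentheses is a bitmask of EFI memory attributes. A minimal sketch for decoding it follows; the bit values are the EFI_MEMORY_* definitions from the UEFI specification, while the helper name decode_attributes is made up for illustration. If this reading is right, the three regions of the loaded kernel image are marked non-executable, read-only, and non-executable in turn, and the later lines just set regions back to plain write-back.

```python
# EFI memory attribute bits (EFI_MEMORY_* in the UEFI specification).
EFI_MEMORY_BITS = {
    0x1: "UC",
    0x2: "WC",
    0x4: "WT",
    0x8: "WB",
    0x10: "UCE",
    0x1000: "WP",
    0x2000: "RP",
    0x4000: "XP",
    0x8000: "NV",
    0x10000: "MORE_RELIABLE",
    0x20000: "RO",
}


def decode_attributes(value):
    """Return the names of the attribute bits set in value, lowest bit first.

    decode_attributes is a hypothetical helper, not part of any firmware tool.
    """
    return [name for bit, name in sorted(EFI_MEMORY_BITS.items()) if value & bit]


if __name__ == "__main__":
    # The three distinct attribute values that appear in the console output.
    for raw in ("0x0000000000004008", "0x0000000000020008", "0x0000000000000008"):
        print(raw, decode_attributes(int(raw, 16)))
```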

I've untagged this kernel for now to try and get a compose through. 

Happy to try and debug more, etc.

Comment 1 Peter Robinson 2021-10-24 11:44:08 UTC
So there's not a huge change between the last two kernels, and a lot of it was audio, which can be ruled out. Looking at the remainder, my guess is that it's a KVM change, assuming this is a guest:

1ca7554d05ac038c98271f8968ed821266ecaa9c mm/thp: decrease nr_thps in file's mapping on THP split
79f9bc5843142b649575f887dccdf1c07ad75c20 mm/secretmem: fix NULL page->mapping dereference in page_is_secretmem()
032146cda85566abcd1c4884d9d23e4e30a07e9a vfs: check fd has read access in kernel_read_file_from_fd()
3ddd60268c24bcac9d744404cc277e9dc52fe6b6 mm, slub: fix incorrect memcg slab count for bulk free
67823a544414def2a36c212abadb55b23bcda00c mm, slub: fix potential use-after-free in slab_debugfs_fops
9037c57681d25e4dcc442d940d6dbe24dd31f461 mm, slub: fix potential memoryleak in kmem_cache_open()
899447f669da76cc3605665e1a95ee877bc464cc mm, slub: fix mismatch between reconstructed freelist depth and cnt
2127d22509aec3a83dffb2a3c736df7ba747a7ce mm, slub: fix two bugs in slab_debug_trace_open()
6d2aec9e123bb9c49cb5c7fc654f25f81e688e8c mm/mempolicy: do not allow illegal MPOL_F_NUMA_BALANCING | MPOL_LOCAL in mbind()
5173ed72bcfcddda21ff274ee31c6472fa150f29 memblock: check memory total_size
a6a0251c6fce496744121b4e08c899f45270dbcc mm/migrate: fix CPUHP state to update node demotion order
76af6a054da4055305ddb28c5eb151b9ee4f74f9 mm/migrate: add CPU hotplug to demotion #ifdef
295be91f7ef0027fca2f2e4788e99731aa931834 mm/migrate: optimize hotplug-time demotion order updates
cb185d5f1ebf900f4ae3bf84cee212e6dd035aca userfaultfd: fix a race between writeprotect and exit_mmap()
8913970c19915bbe773d97d42989cd85b7fdc098 mm/userfaultfd: selftests: fix memory corruption with thp enabled
6e6a8ef088e1222cb1250942f51ad9c1ab219ab2 KVM: arm64: Release mmap_lock when using VM_SHARED with MTE
7615c2a514788559c6684234b8fc27f3a843c2c6 KVM: arm64: Report corrupted refcount at EL2
1d58a17ef54599506d44c45ac95be27273a4d2b1 KVM: arm64: Fix host stage-2 PGD refcount

Comment 2 Andrew Jones 2021-10-25 07:11:41 UTC
(In reply to Peter Robinson from comment #1)
> So there's not a huge change between the last two kernels, a lot of it was
> audio which can be ruled out. Looking at the remainder I'm guessing it's a
> KVM change as I'm guessing this is a guest:
> 

Actually, assuming this is a guest, KVM changes can be ruled out, as guests don't enable KVM.

Also, it looks like an exception is occurring very early in Linux boot. I suspect we wouldn't even see anything with earlycon added to the kernel command line (but that's always worth a shot). My guess is that the initial page tables are messed up somehow, but I suggest we just blindly bisect to find the "bad" commit before we try to analyse anything.
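
To illustrate why a blind bisect is cheap here: with the 18 candidate commits listed in comment 1, a binary search needs at most 5 test boots. The sketch below shows the search only. The commit names and the is_bad predicate are placeholders (in practice is_bad would be "build this commit and see whether the VM boots"), and it assumes the commits are ordered oldest to newest, that the tree before the first one is good, and that the last one is bad. In a real tree, git bisect does this bookkeeping.

```python
def first_bad(commits, is_bad):
    """Binary search for the first bad commit.

    Assumes commits is ordered oldest to newest, the state before
    commits[0] is good, and commits[-1] is bad. is_bad is a hypothetical
    callback standing in for "build and try to boot this commit".
    Returns (first bad commit, number of tests run).
    """
    lo, hi = 0, len(commits) - 1
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(commits[mid]):
            hi = mid          # culprit is at mid or earlier
        else:
            lo = mid + 1      # culprit is after mid
    return commits[lo], tests


if __name__ == "__main__":
    # Placeholder names; the real list is the 18 commits in comment 1.
    commits = [f"c{i}" for i in range(18)]
    culprit, tests = first_bad(commits, lambda c: int(c[1:]) >= 11)
    print(culprit, "found after", tests, "test boots")
```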

Comment 3 Jeremy Linton 2021-12-15 23:18:39 UTC
So, on a libvirt/qemu/kvm/edk setup, it's using ACPI, so "earlycon" will honor the SPCR. Put earlycon by itself on the grub command line after console=tty0 and you should see something more helpful. You might also try replacing console=tty0 with console=ttyAMA0, since the console can be problematic.
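
A rough sketch of the two command line variants suggested above, applied to a shortened version of the command line from the description. The function name add_earlycon and its replace_tty0 parameter are made up for illustration, and the splitting is naive (it assumes no quoted arguments containing spaces).

```python
def add_earlycon(cmdline, replace_tty0=False):
    """Return cmdline with a bare "earlycon" placed after the last console= argument.

    If replace_tty0 is true, console=tty0 is first swapped for
    console=ttyAMA0 and duplicate console= arguments are dropped.
    Hypothetical helper; assumes arguments contain no quoted spaces.
    """
    args = cmdline.split()
    if replace_tty0:
        args = ["console=ttyAMA0" if a == "console=tty0" else a for a in args]
    # Drop repeated console= arguments while keeping the original order.
    seen = set()
    deduped = []
    for a in args:
        if a.startswith("console=") and a in seen:
            continue
        seen.add(a)
        deduped.append(a)
    if "earlycon" not in deduped:
        console_positions = [i for i, a in enumerate(deduped) if a.startswith("console=")]
        # With no console= argument at all, append earlycon at the end.
        idx = console_positions[-1] if console_positions else len(deduped) - 1
        deduped.insert(idx + 1, "earlycon")
    return " ".join(deduped)


if __name__ == "__main__":
    cmdline = "inst.ks=file:/ks.cfg console=ttyAMA0 console=tty0 initrd=initrd"
    print(add_earlycon(cmdline))
    print(add_earlycon(cmdline, replace_tty0=True))
```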

