Bug 1630443 - Armv7 guest fails to boot on AArch64 host with 4.18.x
Summary: Armv7 guest fails to boot on AArch64 host with 4.18.x
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 28
Hardware: aarch64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-09-18 16:24 UTC by Paul Whalen
Modified: 2018-09-26 15:19 UTC
CC List: 18 users

Fixed In Version: kernel-4.18.9-300.fc29.aarch64
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-09-26 15:17:55 UTC
Type: Bug
Embargoed:


Attachments
boot log with kernel ratelimiting disabled (38.28 KB, text/plain)
2018-09-18 17:16 UTC, Paul Whalen

Description Paul Whalen 2018-09-18 16:24:00 UTC
Description of problem:
Attempting to boot an existing armv7 guest on an aarch64 host with a 4.18.x kernel fails with:

[  144.344787] systemd-journald[493]: Failed to send WATCHDOG=1 notification message: Connection refused
[  214.344533] systemd-journald[493]: Failed to send WATCHDOG=1 notification message: Transport endpoint is not connected

Version-Release number of selected component (if applicable):
kernel-4.18.x

How reproducible:
Every time

Steps to Reproduce:

On an aarch64 host running a 4.18.x kernel:

1. curl -O https://dl.fedoraproject.org/pub/fedora/linux/releases/28/Spins/armhfp/images/Fedora-Minimal-armhfp-28-1.1-sda.raw.xz
2. unxz Fedora-Minimal-armhfp-28-1.1-sda.raw.xz
3. virt-builder --get-kernel Fedora-Minimal-armhfp-28-1.1-sda.raw
4. sudo mv Fedora-Minimal-armhfp-28-1.1-sda.raw initramfs-4.16.3-301.fc28.armv7hl.img vmlinuz-4.16.3-301.fc28.armv7hl /var/lib/libvirt/images/
5. sudo virt-install --name Fedora-Minimal-armhfp-28-1.1-sda.raw --ram 4096 --arch armv7l --import --os-variant fedora22 \
                     --disk /var/lib/libvirt/images/Fedora-Minimal-armhfp-28-1.1-sda.raw \
                     --boot kernel=/var/lib/libvirt/images/vmlinuz-4.16.3-301.fc28.armv7hl,initrd=/var/lib/libvirt/images/initramfs-4.16.3-301.fc28.armv7hl.img,kernel_args="console=ttyAMA0 rw root=LABEL=_/ rootwait"

Actual results:
[  144.344787] systemd-journald[493]: Failed to send WATCHDOG=1 notification message: Connection refused
[  214.344533] systemd-journald[493]: Failed to send WATCHDOG=1 notification message: Transport endpoint is not connected


Additional info:

Working as expected with 4.17.x

Comment 1 Paul Whalen 2018-09-18 16:42:36 UTC
Adding systemd.log_level=debug to the kernel args resulted in a kernel panic:

..
[    3.932954] Checked W+X mappings: passed, no W+X pages found
[    3.935144] rodata_test: all tests were successful
[    3.963237] systemd[1]: systemd 238 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid)
[    3.972250] systemd[1]: Detected virtualization qemu.
[    3.974204] systemd[1]: Detected architecture arm.
[    3.975991] systemd[1]: Running in initial RAM disk.

Welcome to Fedora 28 (Twenty Eight) dracut-047-8.git20180305.fc28 (Initramfs)!

[    3.994367] Core dump to |/bin/false pipe failed
[  OK  ] Reached target Initrd Root Device.
[  OK  ] Listening on Journal Socket.
[    4.027361] Core dump to |/bin/false pipe failed
[    4.038244] systemd: 128 output lines suppressed due to ratelimiting
[    4.040676] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004
[    4.040676] 
[    4.044280] CPU: 0 PID: 1 Comm: systemd Not tainted 4.16.3-301.fc28.armv7hl #1
[    4.047343] Hardware name: Generic DT based system
[    4.049396] [<c0311d9c>] (unwind_backtrace) from [<c030c57c>] (show_stack+0x18/0x1c)
[    4.052584] [<c030c57c>] (show_stack) from [<c0aa6960>] (dump_stack+0x80/0xa0)
[    4.055617] [<c0aa6960>] (dump_stack) from [<c0351c10>] (panic+0xc8/0x260)
[    4.058529] [<c0351c10>] (panic) from [<c0356814>] (do_exit+0x5c8/0xac8)
[    4.061315] [<c0356814>] (do_exit) from [<c0356db4>] (do_group_exit+0x64/0xe0)
[    4.064341] [<c0356db4>] (do_group_exit) from [<c0361624>] (get_signal+0x60c/0x640)
[    4.067454] [<c0361624>] (get_signal) from [<c030b994>] (do_signal+0x80/0x3cc)
[    4.070447] [<c030b994>] (do_signal) from [<c030be6c>] (do_work_pending+0x68/0xc8)
[    4.072900] [<c030be6c>] (do_work_pending) from [<c030106c>] (slow_work_pending+0xc/0x20)
[    4.074833] Exception stack(0xee0f7fb0 to 0xee0f7ff8)
[    4.076039] 7fa0:                                     014f41f0 014f41f8 00000008 0000002c
[    4.077967] 7fc0: 014bed5c 00000011 00000000 014f41f0 00000012 014f41f8 00000008 b6f72aa4
[    4.079898] 7fe0: 014f41f0 bec29640 b6e25794 b6a4fd08 800e0010 ffffffff
[    4.081490] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004
[    4.081490]
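The exitcode=0x00000004 in the panic records how init (systemd in the guest) died. A minimal sketch for decoding it, assuming the value uses the same layout as a wait(2) status (low 7 bits: terminating signal; bit 0x80: core dumped; bits 8-15: exit status for a normal exit), which matches the get_signal -> do_group_exit -> do_exit path in the backtrace. The function name decode_init_exitcode is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode the "exitcode=0x..." value printed by an
# "Attempted to kill init!" panic, assuming wait(2)-style encoding:
#   bits 0-6  : terminating signal (0 means init exited normally)
#   bit  7    : core dumped
#   bits 8-15 : exit status of a normal exit
decode_init_exitcode() {
    local code=$(( $1 ))              # bash arithmetic accepts 0x-prefixed hex
    local sig=$(( code & 0x7f ))
    if (( sig == 0 )); then
        echo "exited with status $(( (code >> 8) & 0xff ))"
    else
        local core=""
        (( code & 0x80 )) && core=" (core dumped)"
        # "kill -l N" prints the signal name without the SIG prefix.
        echo "killed by signal $sig (SIG$(kill -l "$sig"))$core"
    fi
}

decode_init_exitcode 0x00000004
```

For this log the value decodes to "killed by signal 4 (SIGILL)": systemd was terminated by an illegal-instruction fault, which also fits the "Core dump to |/bin/false pipe failed" lines. The log alone does not show which instruction faulted.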

Comment 2 Paul Whalen 2018-09-18 17:16:44 UTC
Created attachment 1484456 [details]
boot log with kernel ratelimiting disabled

boot log with kernel ratelimiting disabled using "systemd.log_level=debug systemd.log_target=kmsg log_buf_len=1M printk.devkmsg=on"
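The debugging options from this comment can be combined with the kernel arguments from step 5. A minimal sketch, assuming the image directory and the 4.16.3-301.fc28.armv7hl kernel/initramfs from steps 3 and 4; the names BASE_ARGS, DEBUG_ARGS and boot_opt are hypothetical. In step 5 the double quotes around kernel_args are consumed by the shell, so the function emits the value without them and the whole string is passed as one quoted argument instead.

```shell
#!/usr/bin/env bash
# Kernel arguments used in step 5.
BASE_ARGS="console=ttyAMA0 rw root=LABEL=_/ rootwait"
# Debugging options used for the attached boot log.
DEBUG_ARGS="systemd.log_level=debug systemd.log_target=kmsg log_buf_len=1M printk.devkmsg=on"

# Hypothetical helper: print the value for virt-install's --boot option,
# using the paths from steps 3-5.
boot_opt() {
    local images="/var/lib/libvirt/images"
    local kver="4.16.3-301.fc28.armv7hl"
    printf 'kernel=%s/vmlinuz-%s,initrd=%s/initramfs-%s.img,kernel_args=%s %s' \
        "$images" "$kver" "$images" "$kver" "$BASE_ARGS" "$DEBUG_ARGS"
}
```

When defining a new guest this would be passed as `--boot "$(boot_opt)"` in place of the --boot argument in step 5. For a guest that is already defined, the same arguments belong in the `<cmdline>` element of the domain XML (for example via `virsh edit`).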

Comment 3 Paul Whalen 2018-09-21 20:22:40 UTC
This is working again with kernel-4.18.9-300.fc29.
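Together with "Working as expected with 4.17.x" in the description, this comment brackets the kernels affected in this report. A minimal sketch for checking whether a kernel release string falls in that bracket, assuming GNU coreutils `sort -V` for version ordering. Treating upstream 4.18.9 as the first fixed version is an assumption drawn only from the Fedora build named here (kernel-4.18.9-300.fc29), not a verified upstream cutoff. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Succeeds if version $1 sorts at or after version $2 (GNU sort -V order).
version_ge() {
    [[ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" == "$2" ]]
}

# Hypothetical helper: succeeds if a release string such as
# "4.18.5-200.fc28.aarch64" is >= 4.18 and < 4.18.9 (assumed boundary).
# Only the upstream X.Y.Z part before the first '-' is compared.
in_reported_broken_range() {
    local base="${1%%-*}"
    version_ge "$base" 4.18 && ! version_ge "$base" 4.18.9
}
```

Usage on the host: `in_reported_broken_range "$(uname -r)" && echo "host kernel is in the range reported as broken"`.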

Comment 4 Paul Whalen 2018-09-26 15:16:05 UTC
Looking at the changelog, perhaps this fixed it:

arm64: KVM: Only force FPEXC32_EL2.EN if trapping FPSIMD
commit 7d14919c0d475a795c0127631ac8ecb2b0f31831 upstream.

I think it can be closed now.
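To check whether a particular kernel source tree already carries the fix named in this comment, its history can be searched for the upstream commit ID. A minimal sketch, assuming a local git checkout of the kernel (mainline or linux-stable) and that stable backports mention the upstream ID in their message, as the "commit ... upstream." line quoted above does. The function name has_fpexc_fix is hypothetical.

```shell
#!/usr/bin/env bash
# Upstream commit quoted in the changelog entry above.
UPSTREAM_FIX=7d14919c0d475a795c0127631ac8ecb2b0f31831

# Hypothetical helper: succeeds if revision REV (default HEAD) of the git
# tree at REPO contains the fix, either as the upstream commit itself or as
# a backport whose message references the upstream ID.
has_fpexc_fix() {
    local repo="$1" rev="${2:-HEAD}"
    # Mainline trees contain the commit object itself.
    if git -C "$repo" merge-base --is-ancestor "$UPSTREAM_FIX" "$rev" 2>/dev/null; then
        return 0
    fi
    # Stable backports have a different SHA but cite the upstream one.
    [[ -n "$(git -C "$repo" log --format=%H --grep="$UPSTREAM_FIX" "$rev")" ]]
}
```

For example, `has_fpexc_fix ~/linux-stable v4.18.9` would report whether that tag includes the backport. Searching the full history of a kernel tree with `--grep` can take a while.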

Comment 5 Laura Abbott 2018-09-26 15:17:55 UTC
Thanks for looking. I'll close this and it can be reopened if it shows up again.

Comment 6 Peter Robinson 2018-09-26 15:19:10 UTC
Yes, that fixes the following upstream stable change, e6b673b741ea, and looking at the original, it seems about right for the problems seen:

    KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing
    
    This patch refactors KVM to align the host and guest FPSIMD
    save/restore logic with each other for arm64.  This reduces the
    number of redundant save/restore operations that must occur, and
    reduces the common-case IRQ blackout time during guest exit storms
    by saving the host state lazily and optimising away the need to
    restore the host state before returning to the run loop.

