Bug 2133525 - virt-install on aarch64 failing using --cloud-init option
Status: CLOSED EOL
Product: Fedora
Classification: Fedora
Component: virt-manager
Version: 38
Hardware: aarch64
OS: Unspecified
Assignee: Cole Robinson
QA Contact: Fedora Extras Quality Assurance
Duplicates: 2140164
Reported: 2022-10-10 17:51 UTC by John Villalovos
Modified: 2024-05-21 14:19 UTC (History)
Last Closed: 2024-05-21 14:19:15 UTC
Type: Bug

Attachments
Log of run of reproducer.sh (7.11 KB, text/plain)
2022-10-10 17:51 UTC, John Villalovos
Run with --debug and --cloud-init (26.25 KB, text/plain)
2022-10-11 17:28 UTC, John Villalovos
Run with --debug and an ISO image (88.79 KB, text/plain)
2022-10-11 17:28 UTC, John Villalovos

Description John Villalovos 2022-10-10 17:51:34 UTC
Created attachment 1917114
Log of run of reproducer.sh

Description of problem:

When running virt-install on an aarch64 system (Ampere Altra) with the `--cloud-init` option, the VM starts up and then terminates.  Without `--cloud-init`, the VM boots and reaches a login prompt.

The VM boots, then shows a "Boot Option Restoration" screen with the message "Press any key to stop system reset". After about 5 seconds the VM terminates.

Running the exact same reproducer script on x86_64 works (using the image https://dl.fedoraproject.org/pub/fedora/linux/releases/36/Cloud/x86_64/images/Fedora-Cloud-Base-36-1.5.x86_64.qcow2).

On x86_64 it boots and I can log in as the user `fedora` with the password `Password1`.


Version-Release number of selected component (if applicable):
virt-manager-4.0.0-1.fc36.noarch

How reproducible:
Always

Steps to Reproduce:

Running as a local user who is part of the `libvirt` group.

$ cat reproducer.sh
#!/bin/bash

set -u
set -x
set -e

TOP_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
VIRSH_POOL="${TOP_DIR}/virsh-pool"
mkdir -pv "${VIRSH_POOL}"

LIBVIRT_IMAGE_DIR="/var/lib/libvirt/images/"
FEDORA_QCOW_URL="https://dl.fedoraproject.org/pub/fedora/linux/releases/36/Cloud/aarch64/images/Fedora-Cloud-Base-36-1.5.aarch64.qcow2"
FEDORA_QCOW="${TOP_DIR}/$(basename "${FEDORA_QCOW_URL}")"
FEDORA_QCOW_VIRSH_POOL="${VIRSH_POOL}/$(basename "${FEDORA_QCOW}")"

if [[ ! -f "${FEDORA_QCOW}" ]]; then
    wget "${FEDORA_QCOW_URL}"
fi

CLOUD_INIT_USER_DATA="${VIRSH_POOL}/user-data.txt"
CLOUD_INIT_META_DATA="${VIRSH_POOL}/meta-data.txt"

function cloud_config {
    cat <<EOF > "${CLOUD_INIT_USER_DATA}"
#cloud-config
password: Password1
chpasswd: { expire: False }
ssh_pwauth: True
hostname: foo-tester
EOF

    cat <<EOF > "${CLOUD_INIT_META_DATA}"
instance-id: foo-tester
local-hostname: foo-tester
EOF
}
cloud_config

cp -v "${FEDORA_QCOW}" "${FEDORA_QCOW_VIRSH_POOL}"

virt_args=()
virt_args+=(--name foo-tester)
virt_args+=(--memory 1024)
# virt_args+=(--disk size=10,backing_store="${FEDORA_QCOW_VIRSH_POOL}",bus=virtio)
virt_args+=(--disk "${FEDORA_QCOW_VIRSH_POOL}",size=10,format=qcow2,bus=virtio)
virt_args+=(--os-variant fedora36)
virt_args+=(--nographics)
virt_args+=(--network bridge=virbr0,model=virtio)
virt_args+=(--import)
virt_args+=(--cloud-init user-data="${CLOUD_INIT_USER_DATA}",meta-data="${CLOUD_INIT_META_DATA}")
# virt_args+=(--debug)

virt-install "${virt_args[@]}"

Comment 1 John Villalovos 2022-10-10 20:24:42 UTC
I tried to create my own ISO image to hold the `user-data` and `meta-data` files.

Using the `--cdrom` option to virt-install did not work.

But using `--disk myisoimage.iso,device=cdrom` did work. I was able to log in to the VM as user `fedora` with password `Password1`.
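
For reference, a minimal sketch of how such an ISO could be built and attached. It assumes `genisoimage` is installed (the comment does not say which tool was actually used) and reuses the variable names from the reproducer; the helper names `stage_seed_files` and `build_seed_iso` and the `seed` directory are hypothetical. The NoCloud datasource looks for files named exactly `user-data` and `meta-data` on a volume labelled `cidata`, so the `.txt` files from the reproducer are copied under those names first.

```shell
#!/bin/bash
# Sketch only: stage cloud-init files under the names NoCloud expects.
# $1 = user-data source, $2 = meta-data source, $3 = staging directory.
stage_seed_files() {
    local user_data="$1" meta_data="$2" stage_dir="$3"
    mkdir -p "${stage_dir}"
    # The reproducer writes user-data.txt / meta-data.txt; NoCloud looks for
    # files named exactly "user-data" and "meta-data".
    cp "${user_data}" "${stage_dir}/user-data"
    cp "${meta_data}" "${stage_dir}/meta-data"
}

# Pack the staged files into an ISO labelled "cidata".
# Assumption: genisoimage is installed on the host.
# $1 = staging directory, $2 = output ISO path.
build_seed_iso() {
    local stage_dir="$1" iso="$2"
    genisoimage -output "${iso}" -volid cidata -joliet -rock \
        "${stage_dir}/user-data" "${stage_dir}/meta-data"
}

# Example (not executed here):
#   stage_seed_files "${CLOUD_INIT_USER_DATA}" "${CLOUD_INIT_META_DATA}" "${VIRSH_POOL}/seed"
#   build_seed_iso "${VIRSH_POOL}/seed" "${VIRSH_POOL}/myisoimage.iso"
#   # Attach as a plain CD-ROM disk instead of passing --cloud-init:
#   virt_args+=(--disk "${VIRSH_POOL}/myisoimage.iso",device=cdrom)
```

Attaching the media with `--disk ...,device=cdrom` rather than `--cdrom` keeps the qcow2 image as the boot disk, which matches the case reported as working above.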

Comment 2 Cole Robinson 2022-10-11 16:12:40 UTC
Thanks for the report.

For the failing case, please add --debug and post the failing output.
Then do the same for your manual `--disk ISO,device=cdrom` case

The manual `--cdrom` attempt won't work since that's telling the VM to boot off the cloud-init media

Comment 3 John Villalovos 2022-10-11 17:28:20 UTC
Created attachment 1917336
Run with --debug and --cloud-init

Comment 4 John Villalovos 2022-10-11 17:28:54 UTC
Created attachment 1917337
Run with --debug and an ISO image

Comment 5 John Villalovos 2022-10-11 17:29:30 UTC
(In reply to Cole Robinson from comment #2)
> Thanks for the report.
> 
> For the failing case, please add --debug and post the failing output.
> Then do the same for your manual `--disk ISO,device=cdrom` case
> 
> The manual `--cdrom` attempt won't work since that's telling the VM to boot
> off the cloud-init media

Thanks Cole. I have uploaded the two different log files as attachments.

Comment 6 Cole Robinson 2022-10-12 12:47:41 UTC
We debugged a bit more on IRC. There's one issue with the virtio-scsi controller not being added, but it's not the root issue. I filed that here: https://github.com/virt-manager/virt-manager/issues/445

The main issue is that OVMF is performing a one-time VM reset, which virt-install is not expecting. virt-install expects the VM to shut down only after the OS has booted and cloud-init has run; virt-install then redefines the VM to remove the cloud-init ISO media and drop the cloud-init smbios data from the VM config. But since the VM is resetting before even reaching the OS, virt-install is ejecting the media too early.

I'd like to know why aarch64 OVMF/AAVMF is resetting. kraxel and/or lersek, is this expected? Here's the output with some control codes removed. It says 'Boot Option Restoration' before the reset is invoked.



[Tue, 11 Oct 2022 10:22:46 virt-install 32000] DEBUG (cli:265) Running text console command: virsh --connect qemu:///session console foo-tester
Connected to domain 'foo-tester'
Escape character is ^] (Ctrl + ])
Tpm2GetCapabilityPcrs - 00000004
alg - 4
alg - B
alg - C
alg - D
Image type X64 can't be loaded on AARCH64 UEFI system.
BdsDxe: loading Boot0002 "UEFI Misc Device 2" from PciRoot(0x0)/Pci(0x1,0x4)/Pci(0x0,0x0)
BdsDxe: starting Boot0002 "UEFI Misc Device 2" from PciRoot(0x0)/Pci(0x1,0x4)/Pci(0x0,0x0)
/------------------------------------------------------------------------------\
|                           Boot Option Restoration                            ...
------------------------------------------------------------------------------/
Press any key to stop system reset
Booting in 5 seconds
Booting in 4 seconds
Booting in 3 seconds
Booting in 2 seconds
Booting in 1 second
Reset System
UEFI firmware starting.
SyncPcrAllocationsAndPcrMask!
Tpm2GetCapabilityPcrs - 00000004
alg - 4
alg - B
alg - C
alg - D
Image type X64 can't be loaded on AARCH64 UEFI system.
BdsDxe: loading Boot0004 "Fedora" from HD(1,GPT,5C481284-4AF4-437F-93B6-6FB2721A75D2,0x800,0x32000)/\EFI\fedora\shimaa64.efi
BdsDxe: starting Boot0004 "Fedora" from HD(1,GPT,5C481284-4AF4-437F-93B6-6FB2721A75D2,0x800,0x32000)/\EFI\fedora\shimaa64.efi
GNU GRUB  version 2.06

/---------------------------------------------------------------------------....
\----------------------------------------------------------------------------/
     Use the ^ and v keys to select which entry is highlighted.
      Press enter to boot the selected OS, `e' to edit the commands
      before booting or `c' for a command-line. ESC to return previous
      menu.

Comment 7 Gerd Hoffmann 2022-10-13 09:43:23 UTC
> The main issue is that OVMF is performing a one time VM reset, which
> virt-install is not expecting. virt-install is expecting the VM to shutdown
> only after the OS is booted and cloud-init has run, then virt-install
> redefines the VM to remove cloud-init ISO media and drop cloud-init smbios
> data from the VM config. But since the VM is reseting before even hitting
> the OS, virt-install is ejecting media too early.

> Tpm2GetCapabilityPcrs - 00000004

I think this is TPM initialization.  I have seen this "Boot Option Restoration"
screen in guests with TPM added, and I know that certain TPM operations need to
be done early in boot (PEI phase).  When changing TPM config options in the OVMF
setup menu (like enabling/disabling TPM banks), OVMF goes through a reboot too,
to actually apply them.

Comment 8 Laszlo Ersek 2022-10-24 15:14:45 UTC
Gerd is right about the TPM PPI (Physical Presence Interface) opcodes heavily interfering with the normal boot process, but IMO this is something else.

Namely, IMO what you're seeing is the "fallback behavior" of shim. "Boot Option Restoration" is a message from "fallback.c" in the "shim" project. This "fallback" logic is active when you have an installed UEFI operating system on your disk (such as Fedora or RHEL), but you have no UEFI Boot Options for booting specifically that operating system. In such cases, a default / fallback boot logic takes place in the platform firmware (the UEFI boot manager), and the installed OS provides a utility -- named the same as the normal boot loader on *removable* UEFI media -- for restoring (recreating) UEFI boot options. That's what you are seeing here. Once the "fallback" portion of "shim" restores / recreates a UEFI boot option for launching the installed OS, it resets the system.

The root cause of this problem is that you define a UEFI domain (a disk image, effectively) without a matching UEFI varstore file (one that would be in sync with the completed installation of the existing OS image on the disk). Shim perceives this as a loss of UEFI Boot Options, recreates them, and then reboots the VM.
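
One way to observe this from inside the guest is to list the UEFI boot entries and look for one labelled after the installed OS. A minimal sketch, assuming `efibootmgr` is available in the guest and that the entry label is `Fedora`, as in the console output in comment 6; `has_boot_entry` is a hypothetical helper that reads `efibootmgr` output on stdin and treats its argument as a regular expression.

```shell
#!/bin/bash
# Hypothetical helper: succeed if efibootmgr-style output on stdin contains a
# BootXXXX entry whose label starts with $1 (for example "Fedora").
# Active entries are printed as "Boot0004* Fedora", inactive ones without the "*".
has_boot_entry() {
    local label="$1"
    grep -Eq "^Boot[0-9A-Fa-f]{4}\*? +${label}"
}

# Example (run inside the guest, not executed here):
#   if efibootmgr | has_boot_entry Fedora; then
#       echo "Fedora boot option present in the varstore"
#   else
#       echo "no Fedora boot option; shim fallback is expected to recreate it"
#   fi
```

On a freshly imported cloud image with an empty varstore, the first boot would be expected to take the "no Fedora boot option" branch, and later boots the other one.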

Comment 9 Cole Robinson 2022-11-27 21:44:51 UTC
*** Bug 2140164 has been marked as a duplicate of this bug. ***

Comment 10 Cole Robinson 2022-11-27 22:30:04 UTC
Thanks for the info Laszlo + Gerd. I looked at shim fallback.c; the reset only happens when TPM is present, otherwise it just boots the first image. Justification is here: https://github.com/rhboot/shim/commit/431b8a2e

I reproduced this with an x86_64 cloud image too, so it's not aarch64-specific. We are less likely to hit this on x86_64 since EFI isn't the default there.
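
Since the reset depends on a TPM being present, a quick check of the domain XML shows whether a given VM is affected. A minimal sketch, assuming the domain name and connection URI from the log in comment 6; `domain_has_tpm` is a hypothetical helper that reads libvirt domain XML on stdin.

```shell
#!/bin/bash
# Hypothetical helper: succeed if libvirt domain XML on stdin defines a TPM device.
domain_has_tpm() {
    grep -q '<tpm[ >]'
}

# Example (not executed here):
#   if virsh --connect qemu:///session dumpxml foo-tester | domain_has_tpm; then
#       echo "TPM present: shim fallback is expected to reset the VM once"
#   else
#       echo "no TPM: shim fallback should boot the first image directly"
#   fi
```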

Comment 11 Daniel Berrangé 2022-11-28 09:10:17 UTC
(In reply to Cole Robinson from comment #10)
> Thanks for the info Laszlo + Gerd. I looked at shim fallback.c; the reset
> only happens when TPM is present, otherwise it just boots the first image.
> Justification is here: https://github.com/rhboot/shim/commit/431b8a2e
> 
> I reproduced with x86_64 and cloud image too, so it's not aarch64 specific.
> We are less likely to hit this on x86_64 since efi isn't the default.

We want to be able to move people to EFI for x86_64 though, so we definitely need a solution.

If we assume that *every* Linux pre-built disk image that has EFI support is going to contain 'shim' as the first bootloader, then we know we will always get this early reboot.

We can query libosinfo to ask what disks have EFI support IIRC. 

I see the initial XML is being given:

-  <sysinfo type="smbios">
-    <system>
-      <entry name="serial">ds=nocloud</entry>
-    </system>
-  </sysinfo>
-  <on_reboot>destroy</on_reboot>
 </domain>

IIUC, this use of the SMBIOS should be redundant according to docs:

https://cloudinit.readthedocs.io/en/latest/topics/datasources/nocloud.html

[quote]
You can provide meta-data and user-data to a local vm boot via files on a vfat or iso9660 filesystem. The filesystem volume label must be cidata or CIDATA.

Alternatively, you can provide meta-data via kernel command line or SMBIOS “serial number” option. 
[/quote]

IOW, as long as we use the label 'cidata', there's no need for SMBIOS settings.

Git history shows cloud-init has supported 'cidata' since 2017, and 'CIDATA' since 2019.
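
A small sketch of checking that a given ISO carries one of the two accepted labels. `is_nocloud_label` is a hypothetical helper; reading the label with `blkid` assumes it can probe the image file directly, and `myisoimage.iso` is the name used in comment 1.

```shell
#!/bin/bash
# Hypothetical helper: succeed if $1 is one of the two volume labels named in
# the NoCloud docs quoted above ("cidata" or "CIDATA"). Mixed case is rejected.
is_nocloud_label() {
    case "$1" in
        cidata|CIDATA) return 0 ;;
        *) return 1 ;;
    esac
}

# Example (not executed here):
#   label=$(blkid -o value -s LABEL myisoimage.iso)
#   if is_nocloud_label "${label}"; then
#       echo "label '${label}' is one NoCloud looks for"
#   else
#       echo "label '${label}' will not be picked up by NoCloud"
#   fi
```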

Comment 12 Ben Cotton 2023-04-25 18:03:10 UTC
This message is a reminder that Fedora Linux 36 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 36 on 2023-05-16.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '36'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version. Note that the version field may be hidden.
Click the "Show advanced fields" button if you do not see it.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 36 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 13 Ludek Smid 2023-05-25 15:46:24 UTC
Fedora Linux 36 entered end-of-life (EOL) status on 2023-05-16.

Fedora Linux 36 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux
please feel free to reopen this bug against that version. Note that the version
field may be hidden. Click the "Show advanced fields" button if you do not see
the version field.

If you are unable to reopen this bug, please file a new report against an
active release.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 14 Cole Robinson 2023-08-29 16:45:22 UTC
Reopening, I believe this is still relevant for virt-install/virt-manager

Comment 15 Aoife Moloney 2024-05-07 15:50:50 UTC
This message is a reminder that Fedora Linux 38 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 38 on 2024-05-21.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '38'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version. Note that the version field may be hidden.
Click the "Show advanced fields" button if you do not see it.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 38 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 16 Aoife Moloney 2024-05-21 14:19:15 UTC
Fedora Linux 38 entered end-of-life (EOL) status on 2024-05-21.

Fedora Linux 38 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux
please feel free to reopen this bug against that version. Note that the version
field may be hidden. Click the "Show advanced fields" button if you do not see
the version field.

If you are unable to reopen this bug, please file a new report against an
active release.

Thank you for reporting this bug and we are sorry it could not be fixed.
