Bug 1353689

Summary: AAVMF: Drops to shell with uninitialized NVRAM file
Product: Red Hat Enterprise Linux 7
Reporter: Andrea Bolognani <abologna>
Component: ovmf
Assignee: Laszlo Ersek <lersek>
Status: CLOSED CURRENTRELEASE
QA Contact: Chao Yang <chayang>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 7.3
CC: abologna, chayang, drjones, juzhang, rjones
Target Milestone: rc
Keywords: OtherQA
Target Release: ---
Hardware: aarch64
OS: Linux
Whiteboard:
Fixed In Version: ovmf-20160608-2.git988715a.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-10-10 10:56:57 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description                         Flags
Working boot on RHEL 7.3 nightly    none
Failing boot on RHEL 7.3 nightly    none
Working boot on Fedora 24           none
Failing boot on Fedora 24           none
Working boot on Debian testing      none
Failing boot on Debian testing      none

Description Andrea Bolognani 2016-07-07 17:45:10 UTC
The problem can be reproduced very easily by starting with a
working guest and replacing its NVRAM file with a copy of the
NVRAM template, like so:

  $ sudo cp -p \
    /var/lib/libvirt/qemu/nvram/guest_VARS.fd .

  $ cat /usr/share/AAVMF/AAVMF_VARS.fd | \
    sudo tee /var/lib/libvirt/qemu/nvram/guest_VARS.fd \
    >/dev/null

Starting the guest now will cause AAVMF to drop to a shell.

If the backup copy taken previously is restored, like so:

  $ sudo cat guest_VARS.fd | \
    sudo tee /var/lib/libvirt/qemu/nvram/guest_VARS.fd \
    >/dev/null

the guest will again boot normally.
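
To tell whether a guest is in the state described above, one option is to
compare its varstore against the template byte for byte. The following is a
minimal sketch, not part of the original report; the function name
nvram_is_pristine is hypothetical, and reading the file under
/var/lib/libvirt may require root.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a guest varstore is byte-identical to the
# pristine NVRAM template, i.e. it has never been written to by the firmware.
nvram_is_pristine() {
    template=$1
    varstore=$2
    # cmp -s prints nothing and exits 0 only when both files are identical.
    cmp -s -- "$template" "$varstore"
}

# Example call (paths taken from the reproduction steps above):
#   nvram_is_pristine /usr/share/AAVMF/AAVMF_VARS.fd \
#       /var/lib/libvirt/qemu/nvram/guest_VARS.fd \
#       && echo "varstore is still the blank template"
```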

Attached are logs for working and failing boots with Fedora
24, RHEL 7.3 nightly and Debian testing as guest OS.

Running on the host:

  kernel-4.5.0-0.44.el7.aarch64
  AAVMF-20160608-1.git988715a.el7.noarch
  qemu-kvm-rhev-2.6.0-11.el7.aarch64

Comment 1 Andrea Bolognani 2016-07-07 17:46:01 UTC
Created attachment 1177401
Working boot on RHEL 7.3 nightly

Comment 2 Andrea Bolognani 2016-07-07 17:46:31 UTC
Created attachment 1177402
Failing boot on RHEL 7.3 nightly

Comment 3 Andrea Bolognani 2016-07-07 17:46:56 UTC
Created attachment 1177403
Working boot on Fedora 24

Comment 4 Andrea Bolognani 2016-07-07 17:47:19 UTC
Created attachment 1177404
Failing boot on Fedora 24

Comment 5 Andrea Bolognani 2016-07-07 17:47:42 UTC
Created attachment 1177405
Working boot on Debian testing

Comment 6 Andrea Bolognani 2016-07-07 17:48:03 UTC
Created attachment 1177406
Failing boot on Debian testing

Comment 9 Richard W.M. Jones 2016-07-07 22:15:15 UTC
For whatever reason the virt-builder Fedora 24 aarch64 image boots fine
from a "blank" varstore.

Laszlo asked me to post the libvirt XML which unfortunately I don't have
(transient guest - now deleted), but I have the next best thing which is
the qemu command line from the log file:

LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/bin/qemu-system-aarch64 -name tmp-f24,debug-threads=on -S -machine virt-2.6,accel=kvm,usb=off -cpu host -drive file=/usr/share/edk2/aarch64/QEMU_EFI-pflash.raw,if=pflash,format=raw,unit=0,readonly=on -drive file=/var/tmp/tmp-f24.qcow2.nvram,if=pflash,format=raw,unit=1 -m 4096 -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -uuid 80568c6b-441c-4b58-8593-8499df33fabf -nographic -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-3-tmp-f24/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -no-shutdown -boot strict=on -device i82801b11-bridge,id=pci.1,bus=pcie.0,addr=0x1 -device pci-bridge,chassis_nr=2,id=pci.2,bus=pci.1,addr=0x1 -device virtio-scsi-device,id=scsi0 -usb -drive file=/var/tmp/tmp-f24.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none,aio=native -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 -netdev tap,fd=25,id=hostnet0,vhost=on,vhostfd=27 -device virtio-net-device,netdev=hostnet0,id=net0,mac=52:54:00:b9:42:fd -serial pty -msg timestamp=on

The /var/tmp/tmp-f24.qcow2.nvram file mentioned there is a copy of
/usr/share/edk2/aarch64/vars-template-pflash.raw, from
edk2-aarch64-20160418gita8c39ba-1.fc24.noarch .

Comment 10 Laszlo Ersek 2016-07-08 10:13:46 UTC
The problem is that ArmVirtPkg at the moment lacks the equivalent of OvmfPkg commit 14b2ebc30c8b.

And, Rich is not experiencing the bug because his command line specifies a scsi-hd device on a virtio-scsi-device HBA, while Andrea used a virtio-blk-device disk. QemuBootOrderLib in edk2 (used by both OVMF and AAVMF) works a bit differently with the OpenFirmware device paths that correspond to one vs. the other of these two device types. In Rich's case, the issue was masked.
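
When triaging similar reports, it can help to check which of the two disk device models a guest's QEMU command line uses. The sketch below is an illustration only: the function name qemu_disk_model is hypothetical, and it does plain string matching on the command line rather than anything QemuBootOrderLib does.

```shell
#!/bin/sh
# Hypothetical helper: print the disk device model named on a QEMU command
# line, to distinguish the virtio-blk-device case from the scsi-hd case.
# If both appear, virtio-blk-device is reported because it is checked first.
qemu_disk_model() {
    case "$1" in
        *"-device virtio-blk-device"*) echo virtio-blk-device ;;
        *"-device scsi-hd"*)           echo scsi-hd ;;
        *)                             echo unknown ;;
    esac
}
```

Passing it the command line from comment 9 would report scsi-hd, the configuration in which the issue was masked.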

Comment 11 Laszlo Ersek 2016-07-08 10:32:40 UTC
Posted upstream patch:
http://thread.gmane.org/gmane.comp.bios.edk2.devel/14298

Comment 12 Laszlo Ersek 2016-07-08 11:19:10 UTC
Upstream commit efadd41590b4.

Comment 14 Andrea Bolognani 2016-07-08 13:14:36 UTC
I tried the scratch build you provided, and it works as
advertised: both Fedora 24 and RHEL 7.3 recover from having
the NVRAM overwritten by the template, and from the AAVMF
log I can clearly see it's fallback.efi's doing.

Debian, on the other hand, not having any fallback.efi or
even BOOTAA64.EFI, still drops to the AAVMF shell.
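
Whether a guest can recover this way can be checked from inside the guest
(or on a mounted disk image) by looking for the UEFI default boot path for
AArch64, EFI/BOOT/BOOTAA64.EFI, on the EFI system partition. This is a
minimal sketch; the function name esp_has_default_loader is hypothetical,
and the mount point passed to it is whatever the system uses.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the mounted EFI system partition contains
# EFI/BOOT/BOOTAA64.EFI. FAT is case-insensitive, so names are matched
# ignoring case. -mindepth, -maxdepth and -ipath are GNU/BSD find
# extensions, not POSIX.
esp_has_default_loader() {
    esp_root=$1
    # Depth 3 below the ESP root is exactly <root>/EFI/BOOT/<file>.
    find "$esp_root" -mindepth 3 -maxdepth 3 -type f \
        -ipath '*/EFI/BOOT/BOOTAA64.EFI' | grep -q .
}

# Example call (mount point is an assumption):
#   esp_has_default_loader /boot/efi || echo "no default boot loader on ESP"
```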

The only thing that I should note is that, when it drops to
shell, it takes way longer than before to do so, and in
fact I thought it was stuck completely at first. But it got
there eventually, after what felt like 30 seconds or so.

Comment 16 Laszlo Ersek 2016-07-08 14:05:37 UTC
(In reply to Andrea Bolognani from comment #14)
> I tried the scratch build you provided, and it works as
> advertised: both Fedora 24 and RHEL 7.3 recover from having
> the NVRAM overwritten by the template, and from the AAVMF
> log I can clearly see it's fallback.efi's doing.

Thanks for the test! I'll post the backport soon.

Comment 23 Miroslav Rezanina 2016-07-12 07:16:01 UTC
Fix included in ovmf-20160608-2.git988715a.el7

Comment 26 Laszlo Ersek 2017-10-10 10:56:57 UTC
Fixed in RHEL-ALT-7.3.