Bug 1633328 - Armv7 guest fails to boot with qemu-3.0.0-1
Summary: Armv7 guest fails to boot with qemu-3.0.0-1
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: qemu
Version: 29
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: Fedora Virtualization Maintainers
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: AcceptedFreezeException RejectedBlocker
Keywords:
Depends On:
Blocks: ARMTracker F29FinalFreezeException
 
Reported: 2018-09-26 17:04 UTC by Paul Whalen
Modified: 2018-11-28 12:23 UTC
CC: 16 users

Clone Of:
Last Closed:


Attachments (Terms of Use)
F29 QEMU-3.0.0-1 (28.64 KB, text/plain)
2018-09-26 17:08 UTC, Paul Whalen

Description Paul Whalen 2018-09-26 17:04:19 UTC
Description of problem:
Attempting to boot an armv7 disk image drops to a dracut emergency shell with an error that the root disk does not exist.


Version-Release number of selected component (if applicable):
qemu-3.0.0-1.fc29

How reproducible:
every time

Steps to Reproduce:
1. curl -O https://dl.fedoraproject.org/pub/fedora/linux/releases/28/Spins/armhfp/images/Fedora-Minimal-armhfp-28-1.1-sda.raw.xz
2. unxz Fedora-Minimal-armhfp-28-1.1-sda.raw.xz
3. virt-builder --get-kernel Fedora-Minimal-armhfp-28-1.1-sda.raw
4. sudo mv Fedora-Minimal-armhfp-28-1.1-sda.raw initramfs-4.16.3-301.fc28.armv7hl.img vmlinuz-4.16.3-301.fc28.armv7hl /var/lib/libvirt/images/
5. sudo virt-install --name Fedora-Minimal-armhfp-28-1.1-sda.raw --ram 4096 --arch armv7l --import --os-variant fedora22 \
                     --disk /var/lib/libvirt/images/Fedora-Minimal-armhfp-28-1.1-sda.raw \
                     --boot kernel=/var/lib/libvirt/images/vmlinuz-4.16.3-301.fc28.armv7hl,initrd=/var/lib/libvirt/images/initramfs-4.16.3-301.fc28.armv7hl.img,kernel_args="console=ttyAMA0 rw root=LABEL=_/ rootwait"

Actual results:

[  202.492461] dracut-initqueue[337]: Warning: dracut-initqueue timeout - starting timeout scripts
[  202.495303] dracut-initqueue[337]: Warning: Could not boot.
         Starting Setup Virtual Console...
[  OK  ] Started Setup Virtual Console.
[  204.918542] kauditd_printk_skb: 3 callbacks suppressed
[  204.918545] audit: type=1130 audit(1537978726.743:14): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-vconsole-setup comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
         Starting Dracut Emergency Shell...
[  204.929870] audit: type=1131 audit(1537978726.749:15): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-vconsole-setup comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Warning: /dev/disk/by-label/_x2f does not exist

Generating "/run/initramfs/rdsosreport.txt"


Entering emergency mode. Exit the shell to continue.
Type "journalctl" to view system logs.
You might want to save "/run/initramfs/rdsosreport.txt" to a USB stick or /boot
after mounting them and attach it to a bug report.


dracut:/# blkid
dracut:/# lsmod
Module                  Size  Used by
dm_multipath           24576  0
crc32_arm_ce           16384  0
gpio_keys              20480  0
virtio_mmio            16384  0
virtio                 16384  1 virtio_mmio
virtio_ring            24576  1 virtio_mmio

Additional info:
Downgrading to qemu-2.12.0-4.fc29 works as expected, after recreating the VM due to an error about the machine type (virt-3.0).

Comment 1 Paul Whalen 2018-09-26 17:08 UTC
Created attachment 1487390 [details]
F29 QEMU-3.0.0-1

Comment 2 Peter Robinson 2018-09-26 18:00:14 UTC
I wonder if this isn't similar to the DT issue that was seen on power64

https://bugzilla.redhat.com/show_bug.cgi?id=1624539

Comment 3 Paul Whalen 2018-10-02 18:38:22 UTC
Proposing as a blocker for F29 Final, per the criterion: "The release must be able to host virtual guest instances of the same release."

Comment 4 Cole Robinson 2018-10-02 20:52:01 UTC
I will try and reproduce tomorrow

Comment 5 Cole Robinson 2018-10-03 17:20:12 UTC
Here's the culprit:

commit 17ec075a651a3f9613429c2d97018fce459ed943
Author: Eric Auger <eric.auger@redhat.com>
Date:   Fri Jun 22 13:28:37 2018 +0100

    hw/arm/virt: Use 256MB ECAM region by default
    
    With this patch, virt-3.0 machine uses a new 256MB ECAM region
    by default instead of the legacy 16MB one, if highmem is set
    (LPAE supported by the guest) and (!firmware_loaded || aarch64).
    
    Indeed aarch32 mode FW may not support this high ECAM region.
    
    Signed-off-by: Eric Auger <eric.auger@redhat.com>
    Reviewed-by: Laszlo Ersek <lersek@redhat.com>
    Reviewed-by: Andrew Jones <drjones@redhat.com>
    Message-id: 1529072910-16156-11-git-send-email-eric.auger@redhat.com
    Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
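
The default described in the commit message reduces to a small predicate; a sketch in shell (the function name and 0/1 flag convention are mine, not QEMU's):

```shell
# Mirrors the virt-3.0 default from the commit message above:
#   vms->highmem_ecam &= vms->highmem && (!firmware_loaded || aarch64);
# Arguments are 0/1 flags: highmem, firmware_loaded, aarch64.
highmem_ecam() {
    local highmem=$1 firmware_loaded=$2 aarch64=$3
    [ "$highmem" = 1 ] && { [ "$firmware_loaded" = 0 ] || [ "$aarch64" = 1 ]; }
}

# The failing case from this report: highmem on (the default), no external
# firmware, 32-bit guest -- the high ECAM region gets enabled anyway.
if highmem_ecam 1 0 0; then
    echo "high 256MB ECAM enabled"
else
    echo "legacy 16MB ECAM"
fi
```

This is why the report's guest breaks: nothing on the virt-install command line in comment 0 clears highmem, and a 32-bit guest without external firmware still satisfies the predicate.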



It's tied to -machine virt-3.0 and later, so -M virt-2.12, for example, will work. This 'workaround' patch makes things work, but it's only for testing:

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 281ddcdf6e..cad6074927 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1505,6 +1505,7 @@ static void machvirt_init(MachineState *machine)
     }
 
     vms->highmem_ecam &= vms->highmem && (!firmware_loaded || aarch64);
+    vms->highmem_ecam = 0;
 
     create_rtc(vms, pic);


CCing Eric, Drew, Laszlo. Do any of you know what's going on?

Comment 6 Laszlo Ersek 2018-10-04 08:50:49 UTC
Hi Cole,

we discussed this on the upstream QEMU list. The most important message is the following:

http://mid.mail-archive.com/CAFEAcA-hobu_=VKhYQ+UZ1WX_JFfwCarkPKQhiBtXc3fre57rg@mail.gmail.com

Let me elaborate.

(1) On the QEMU level, the "highmem" machine property would originally control *only* whether the 64-bit PCI MMIO aperture -- for allocating the MMIO BARs of PCI devices -- would be exposed to the guest. Dependent on "highmem", this would occur *in addition* to the 32-bit MMIO aperture, or not occur. In either case, the 32-bit MMIO aperture would be present. IOW, the 64-bit aperture was an optional *addition*, controlled by "highmem".

(2) 32-bit guest kernels need to be built with LPAE support in order to utilize the 64-bit aperture. This guest kernel config is not mandatory, therefore the "highmem" property was designed as follows: "it defaults to 'on', and should your guest kernel lack LPAE support, you are responsible for setting it to 'off'".

(3) In the series that contains the commit you identified, Eric extended the meaning of "highmem" (after discussion with the community). "highmem" would no longer *only* control whether the optional, additional 64-bit MMIO aperture would be present, but it would also *move* the memory-mapped PCI Express config space area ("ECAM") from below 4GB to above 4GB (while enlarging it too).

Here the argument (seen in Peter's message above) was that the 32-bit guest kernel needs the exact same "LPAE support" config option for using the high (*moved*) ECAM as for using the 64-bit MMIO aperture.

(4) Grepping the "src" subdirectory of the libvirtd source, at commit 8ba65c4d9571 ("qemu: fix up permissions for pre-created UNIX sockets", 2018-10-03), I find no hits for "highmem". The git log has no match either.

This tells me that libvirt has never specified "highmem=off", regardless of LPAE support in 32-bit ARM guest kernels. While earlier this might have worked by chance -- due to the additional 64-bit MMIO aperture apparently not tripping up anything, despite the guest kernel being unable to access it for lack of LPAE --, this is no longer the case. If libvirt doesn't clear "highmem", then the ECAM range will *move* from below 4GB to above it, and the guest kernel -- if it lacks LPAE support -- will fail to probe any PCI devices at all.

In brief, it means that the original design

  if your 32-bit ARM guest kernel lacks LPAE, then set highmem=off

is now "enforced".

Comment 7 Laszlo Ersek 2018-10-04 08:58:19 UTC
Interestingly, checking the latest F28 kernel build for 32-bit ARM:

https://koji.fedoraproject.org/koji/buildinfo?buildID=1148681

I see there are separate RPMs (subpackages) for "with LPAE" and "without LPAE":

- kernel-core-4.18.11-200.fc28.armv7hl.rpm
- kernel-lpae-core-4.18.11-200.fc28.armv7hl.rpm

Indeed, "lib/modules/4.18.11-200.fc28.armv7hl/config" in the former says,

> # CONFIG_ARM_LPAE is not set

while in the latter, it says

> CONFIG_ARM_LPAE=y

So, if "Fedora-Minimal-armhfp-28-1.1-sda.raw.xz" (from comment 0) contains the former kernel, then it should be booted with "-machine highmem=off" (at least on the "virt-3.0" machine type).

Comment 8 Laszlo Ersek 2018-10-04 09:01:50 UTC
... Yup, I've downloaded the disk image from the URL cited in comment 0, and checked it with "guestfish". The file called "/config-4.16.3-301.fc28.armv7hl" on the 2nd partition of the disk image has

> # CONFIG_ARM_LPAE is not set
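
The check from this comment can be scripted; a sketch where the guestfish extraction (assuming libguestfs-tools is installed and /boot is on the 2nd partition, as above) is shown in comments, and the decision logic runs on the config line quoted above as a stand-in:

```shell
# In practice, pull the kernel config out of the guest image, e.g.:
#   guestfish --ro -a Fedora-Minimal-armhfp-28-1.1-sda.raw -m /dev/sda2 \
#       cat /config-4.16.3-301.fc28.armv7hl > guest.config
# Here we use the config line quoted in this comment as a stand-in:
echo '# CONFIG_ARM_LPAE is not set' > guest.config

if grep -q '^CONFIG_ARM_LPAE=y' guest.config; then
    echo "LPAE kernel: high ECAM is usable"
else
    echo "non-LPAE kernel: boot with -machine highmem=off on virt-3.0+"
fi
```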

Comment 9 Laszlo Ersek 2018-10-04 09:10:52 UTC
(Note: there is a bit more to the "high ECAM default" than I described above. In addition to pre-3.0 machine types, and to the manual highmem=off setting, the high ECAM also gets disabled if the VCPU is 32-bit *and* it uses externally provided firmware (in practice, UEFI).

However, this special case is not relevant here, as the report in comment 0 doesn't include UEFI firmware.)

Comment 10 Peter Robinson 2018-10-04 09:32:29 UTC
(In reply to Laszlo Ersek from comment #8)
> ... Yup, I've downloaded the disk image from the URL cited in comment 0, and
> checked it with "guestfish". The file called
> "/config-4.16.3-301.fc28.armv7hl" on the 2nd partition of the disk image has
> 
> > # CONFIG_ARM_LPAE is not set

Yes, we have both because devices that don't support LPAE take a big performance hit running an LPAE kernel. Anything that is a Cortex-A9/A8, plus a number of others, doesn't support LPAE, as it was only introduced with the Cortex-A7/A15 and later designs (the Cortex numbering is not linear in terms of features).

Comment 11 Peter Robinson 2018-10-04 09:34:31 UTC
> So, if "Fedora-Minimal-armhfp-28-1.1-sda.raw.xz" (from comment 0) contains
> the former kernel, then it should be booted with "-machine highmem=off" (at
> least on the "virt-3.0" machine type).

But how do you know whether a kernel supports LPAE or not until you actually start booting it? There are actually a number of PCIe devices, not even virtual ones, that don't work without bounce buffers to get to 64-bit address space.

Comment 12 Laszlo Ersek 2018-10-04 11:34:39 UTC
(In reply to Peter Robinson from comment #11)
> > So, if "Fedora-Minimal-armhfp-28-1.1-sda.raw.xz" (from comment 0)
> > contains the former kernel, then it should be booted with "-machine
> > highmem=off" (at least on the "virt-3.0" machine type).
>
> But how do you know whether a kernel supports LPAE or not until you
> actually start booting it.

QEMU leaves this decision to the user. As in, "know your guest kernel".

At the level of libvirt (and of other management tools), such QEMU knobs are
usually exposed as XML elements and attributes in the domain schema.

Then, in order to save the end-user the trouble of manual configuration (in
the domain XML), at least when using well-known guests, the libosinfo
project provides "sane defaults" dependent on the guest distro.

(E.g. libosinfo knows whether a guest distribution has drivers for
virtio-1.0 devices, or only for virtio-0.9.5. Same for XHCI (USB3). And so
on.)

The distro selection key for libosinfo is derived by the management tools
(such as virt-install, virt-manager)
- either automatically (they recognize the ISO, for example),
- or from user information (the "--os-variant fedora22" option from
  comment 0).

> There's actually a number of PCI-E devices, not even virtual, that don't
> work without bounce buffers to get to 64 bit address space.

I'm sorry, I don't follow. Can you please elaborate?

Bounce buffers work around device address width limitations, for DMA
purposes (i.e., bi-lateral access to RAM, by CPU and by device).

LPAE mitigates CPU address width limitations (i.e., uni-lateral access to
RAM, ECAM, and 64-bit MMIO BARs, by the CPU). The "highmem" machine type
property similarly controls where ECAM and 64-bit MMIO BARs can be placed.
LPAE and the corresponding "highmem" QEMU property are not related to DMA,
or to device address width.

Comment 13 Cole Robinson 2018-10-05 17:16:10 UTC
So I don't really know what the way forward is here for F29. virt-install should continue to work if you pass --machine virt-2.11, so there's a temporary workaround. For F29+ with UEFI support we should be able to drop the virt-builder/kernel step altogether, so that should cover a lot of cases too. There's the qemu workaround patch, but that breaks compat with upstream qemu...

Regardless, it's quite annoying that anytime someone wants to run a stock ARM kernel with qemu they will need to know to pass the magic highmem=off... is there no way to automatically determine if it's needed, or make kernels fail with an error in this case?

Comment 14 Laszlo Ersek 2018-10-05 19:56:45 UTC
Hi Cole,

the kernel already fails with an error, when it realizes it cannot access ECAM (and, as a result, it will fail to probe any PCIE devices). Please see the guest dmesg attached to comment 1:

> [    4.488278] pci-host-generic 4010000000.pcie: can't claim ECAM area [mem 0x10000000-0x1fffffff]: address conflict with pcie@10000000 [mem 0x10000000-0x3efeffff]
> [    4.491662] pci-host-generic: probe of 4010000000.pcie failed with error -16

The guest-physical address 0x40_1000_0000 is where the high ECAM area starts. (See also QEMU commit 601d626d148a, "hw/arm/virt: Add a new 256MB ECAM region", 2018-06-22; part of the same series.) The kernel truncates the address to 32 bits (0x1000_0000) and then complains.
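
The truncation can be checked directly with shell arithmetic; a minimal sketch:

```shell
# High ECAM base on virt-3.0 (QEMU commit 601d626d148a) and the address a
# non-LPAE 32-bit kernel effectively sees after truncating to 32 bits.
ecam_base=$(( 0x4010000000 ))
truncated=$(( ecam_base & 0xFFFFFFFF ))

printf '0x%x -> 0x%x\n' "$ecam_base" "$truncated"
```

The truncated value is the 0x10000000 that collides with the legacy ECAM window in the dmesg lines above.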

I don't know how the highmem value could be deduced automatically. Normally such knobs belong to libosinfo.


Referring to the release criterion cited in comment 3, i.e. "the release must be able to host virtual guest instances of the same release", I think that might be satisfied here. On the appropriate "known issues" page in the Fedora Wiki, we could state:

"""
For running the 32-bit ARM images as QEMU/KVM libvirt domains, pass the following option to virt-install:
--qemu-commandline='-machine highmem=off'
"""

Comment 15 František Zatloukal 2018-10-08 18:01:41 UTC
Discussed during the 2018-10-08 blocker review meeting: [1]

The decision to classify this bug as an AcceptedFreezeException and RejectedBlocker was made:

"It seems the story here is complex and a simple 'fix' may not be possible. given that, we reject it as a blocker as it *is* possible to run ARM-on-ARM virt, it just requires a non-default arg for some guest cases. We grant an FE for any simple, testable mitigation that appears, and will document the issue in Common Bugs"

[1] https://meetbot-raw.fedoraproject.org/fedora-blocker-review/2018-10-08/f29-blocker-review.2018-10-08-16.00.log.txt

Comment 16 Laszlo Ersek 2018-11-28 12:23:01 UTC
*** Bug 1654225 has been marked as a duplicate of this bug. ***

