Description of problem:
Attempting to boot an armv7 disk image drops to dracut with an error that the disk does not exist.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. curl -O https://dl.fedoraproject.org/pub/fedora/linux/releases/28/Spins/armhfp/images/Fedora-Minimal-armhfp-28-1.1-sda.raw.xz
2. unxz Fedora-Minimal-armhfp-28-1.1-sda.raw.xz
3. virt-builder --get-kernel Fedora-Minimal-armhfp-28-1.1-sda.raw
4. sudo mv Fedora-Minimal-armhfp-28-1.1-sda.raw initramfs-4.16.3-301.fc28.armv7hl.img vmlinuz-4.16.3-301.fc28.armv7hl /var/lib/libvirt/images/
5. sudo virt-install --name Fedora-Minimal-armhfp-28-1.1-sda.raw --ram 4096 --arch armv7l --import --os-variant fedora22 \
--disk /var/lib/libvirt/images/Fedora-Minimal-armhfp-28-1.1-sda.raw \
--boot kernel=/var/lib/libvirt/images/vmlinuz-4.16.3-301.fc28.armv7hl,initrd=/var/lib/libvirt/images/initramfs-4.16.3-301.fc28.armv7hl.img,kernel_args="console=ttyAMA0 rw root=LABEL=_/ rootwait"
[ 202.492461] dracut-initqueue: Warning: dracut-initqueue timeout - starting timeout scripts
[ 202.495303] dracut-initqueue: Warning: Could not boot.
Starting Setup Virtual Console...
[ OK ] Started Setup Virtual Console.
[ 204.918542] kauditd_printk_skb: 3 callbacks suppressed
[ 204.918545] audit: type=1130 audit(1537978726.743:14): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-vconsole-setup comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Starting Dracut Emergency Shell...
[ 204.929870] audit: type=1131 audit(1537978726.749:15): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-vconsole-setup comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Warning: /dev/disk/by-label/_x2f does not exist
Entering emergency mode. Exit the shell to continue.
Type "journalctl" to view system logs.
You might want to save "/run/initramfs/rdsosreport.txt" to a USB stick or /boot
after mounting them and attach it to a bug report.
Module Size Used by
dm_multipath 24576 0
crc32_arm_ce 16384 0
gpio_keys 20480 0
virtio_mmio 16384 0
virtio 16384 1 virtio_mmio
virtio_ring 24576 1 virtio_mmio
Downgrading to qemu-2.12.0-4.fc29 works as expected, after recreating the VM due to an error with the new machine type (virt-3.0).
Created attachment 1487390
I wonder if this isn't similar to the DT issue that was seen on power64
Proposing as a blocker for F29 final, criterion: "The release must be able to host virtual guest instances of the same release."
I will try and reproduce tomorrow
Here's the culprit:
Author: Eric Auger <firstname.lastname@example.org>
Date: Fri Jun 22 13:28:37 2018 +0100
hw/arm/virt: Use 256MB ECAM region by default
With this patch, virt-3.0 machine uses a new 256MB ECAM region
by default instead of the legacy 16MB one, if highmem is set
(LPAE supported by the guest) and (!firmware_loaded || aarch64).
Indeed aarch32 mode FW may not support this high ECAM region.
Signed-off-by: Eric Auger <email@example.com>
Reviewed-by: Laszlo Ersek <firstname.lastname@example.org>
Reviewed-by: Andrew Jones <email@example.com>
Signed-off-by: Peter Maydell <firstname.lastname@example.org>
It's tied to -machine virt-3.0 and later, so -M virt-2.12, for example, will work. This 'workaround' patch makes things work, but it's just for testing:
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 281ddcdf6e..cad6074927 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1505,6 +1505,7 @@ static void machvirt_init(MachineState *machine)
     vms->highmem_ecam &= vms->highmem && (!firmware_loaded || aarch64);
+    vms->highmem_ecam = 0;
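For reference, the -M virt-2.12 workaround mentioned above, applied to the reproducer from comment 0 (a sketch, not a verified invocation; only the added --machine argument differs from the original command):

```shell
# Same virt-install call as in comment 0, pinning the pre-3.0 machine type
# so the ECAM stays in the legacy 16MB region below 4GB.
sudo virt-install --name Fedora-Minimal-armhfp-28-1.1-sda.raw --ram 4096 \
    --arch armv7l --machine virt-2.12 --import --os-variant fedora22 \
    --disk /var/lib/libvirt/images/Fedora-Minimal-armhfp-28-1.1-sda.raw \
    --boot kernel=/var/lib/libvirt/images/vmlinuz-4.16.3-301.fc28.armv7hl,initrd=/var/lib/libvirt/images/initramfs-4.16.3-301.fc28.armv7hl.img,kernel_args="console=ttyAMA0 rw root=LABEL=_/ rootwait"
```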
CCing Eric, Drew, Laszlo. Do any of you know what's going on?
We discussed this on the upstream QEMU list. The most important message is the following:
Let me elaborate.
(1) On the QEMU level, the "highmem" machine property would originally control *only* whether the 64-bit PCI MMIO aperture -- for allocating the MMIO BARs of PCI devices -- would be exposed to the guest. Dependent on "highmem", this would occur *in addition* to the 32-bit MMIO aperture, or not occur. In either case, the 32-bit MMIO aperture would be present. IOW, the 64-bit aperture was an optional *addition*, controlled by "highmem".
(2) 32-bit guest kernels need to be built with LPAE support in order to utilize the 64-bit aperture. This guest kernel config is not mandatory, therefore the "highmem" property was designed as follows: "it defaults to 'on', and should your guest kernel lack LPAE support, you are responsible for setting it to 'off'".
(3) In the series that contains the commit you identified, Eric extended the meaning of "highmem" (after discussion with the community). "highmem" would no longer *only* control whether the optional, additional 64-bit MMIO aperture would be present, but it would also *move* the memory-mapped PCI Express config space area ("ECAM") from under 4GB, above 4GB (while enlarging it too).
Here the argument (seen in Peter's message above) was that the 32-bit guest kernel needs the exact same "LPAE support" config option for using the high (*moved*) ECAM as for using the 64-bit MMIO aperture.
(4) Grepping the "src" subdirectory of the libvirt source, at commit 8ba65c4d9571 ("qemu: fix up permissions for pre-created UNIX sockets", 2018-10-03), I find no hits for "highmem". The git log has no match either.
This tells me that libvirt has never specified "highmem=off", regardless of LPAE support in 32-bit ARM guest kernels. While earlier this might have worked by chance -- due to the additional 64-bit MMIO aperture apparently not tripping up anything, despite the guest kernel being unable to access it, due to lack of LPAE --, this is no longer the case. If libvirt doesn't clear "highmem", then the ECAM range will *move* from under 4GB above it, and the guest kernel -- if it lacks LPAE support -- will fail to probe any PCI devices at all.
In brief, it means that the original design
if your 32-bit ARM guest kernel lacks LPAE, then set highmem=off
is now "enforced".
Interestingly, checking the latest F28 kernel build for 32-bit ARM:
I see there are separate RPMs (subpackages) "without LPAE" and "with LPAE":
Indeed, "lib/modules/4.18.11-200.fc28.armv7hl/config" in the former says,
> # CONFIG_ARM_LPAE is not set
while in the latter, it says
> CONFIG_ARM_LPAE=y
So, if "Fedora-Minimal-armhfp-28-1.1-sda.raw.xz" (from comment 0) contains the former kernel, then it should be booted with "-machine highmem=off" (at least on the "virt-3.0" machine type).
... Yup, I've downloaded the disk image from the URL cited in comment 0, and checked it with "guestfish". The file called "/config-4.16.3-301.fc28.armv7hl" on the 2nd partition of the disk image has
> # CONFIG_ARM_LPAE is not set
(Note: there is a bit more to the "high ECAM default" than I described above. In addition to pre-3.0 machine types, and to the manual highmem=off setting, the high ECAM also gets disabled if the VCPU is 32-bit *and* it uses externally provided firmware (in practice, UEFI).
However, this special case is not relevant here, as the report in comment 0 doesn't include UEFI firmware.)
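As a practical aid, the check described above can be scripted. This is a hypothetical helper (the function name and the idea of automating the decision are mine, not from this thread), operating on a kernel config file extracted e.g. with guestfish as in comment 8:

```shell
# Hypothetical helper: given an extracted kernel config file, suggest the
# highmem setting. Non-LPAE 32-bit kernels cannot reach the high ECAM region.
needs_highmem() {
    if grep -q '^CONFIG_ARM_LPAE=y' "$1"; then
        echo "highmem=on"    # LPAE kernel: high ECAM above 4GB is reachable
    else
        echo "highmem=off"   # non-LPAE kernel: keep ECAM below 4GB
    fi
}
```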
(In reply to Laszlo Ersek from comment #8)
> ... Yup, I've downloaded the disk image from the URL cited in comment 0, and
> checked it with "guestfish". The file called
> "/config-4.16.3-301.fc28.armv7hl" on the 2nd partition of the disk image has
> > # CONFIG_ARM_LPAE is not set
Yes, we have both because devices that don't support LPAE take a big performance hit running an LPAE kernel. Anything that is a Cortex-A9/A8, and a number of others, doesn't support LPAE, as that was only introduced with the Cortex-A7/A15 and later designs (the Cortex numbering is not linear in terms of features).
> So, if "Fedora-Minimal-armhfp-28-1.1-sda.raw.xz" (from comment 0) contains
> the former kernel, then it should be booted with "-machine highmem=off" (at
> least on the "virt-3.0" machine type).
But how do you know whether a kernel supports LPAE or not until you actually start booting it? There are actually a number of PCIe devices, not even virtual ones, that don't work without bounce buffers to get to 64-bit address space.
(In reply to Peter Robinson from comment #11)
> > So, if "Fedora-Minimal-armhfp-28-1.1-sda.raw.xz" (from comment 0)
> > contains the former kernel, then it should be booted with "-machine
> > highmem=off" (at least on the "virt-3.0" machine type).
> But how do you know whether a kernel supports LPAE or not until you
> actually start booting it?
QEMU leaves this decision to the user. As in, "know your guest kernel".
At the level of libvirt (and of other management tools), such QEMU knobs are
usually exposed as XML elements and attributes in the domain schema.
Then, in order to save the end-user the trouble of manual configuration (in
the domain XML), at least when using well-known guests, the libosinfo
project provides "sane defaults" dependent on the guest distro.
(E.g. libosinfo knows whether a guest distribution has drivers for
virtio-1.0 devices, or only for virtio-0.9.5. Same for XHCI (USB3). And so on.)
The distro selection key for libosinfo is derived by the management tools
(such as virt-install, virt-manager)
- either automatically (they recognize the ISO, for example),
- or from user information (the "--os-variant fedora22" option from comment 0).
> There are actually a number of PCIe devices, not even virtual ones, that
> don't work without bounce buffers to get to 64-bit address space.
I'm sorry, I don't follow. Can you please elaborate?
Bounce buffers work around device address width limitations, for DMA
purposes (i.e., bi-lateral access to RAM, by CPU and by device).
LPAE mitigates CPU address width limitations (i.e., uni-lateral access to
RAM, ECAM, and 64-bit MMIO BARs, by the CPU). The "highmem" machine type
property similarly controls where ECAM and 64-bit MMIO BARs can be placed.
LPAE and the corresponding "highmem" QEMU property are not related to DMA,
or to device address width.
So I don't really know what the way forward is here for F29. virt-install should continue to work if you pass --machine virt-2.11, so there's a temporary workaround. With UEFI support in F29+ we should be able to drop the virt-builder/kernel step altogether, so that too should cover a lot of cases. There's the qemu workaround patch, but that breaks compat with upstream qemu...
Regardless, it's quite annoying that anytime someone wants to run a stock ARM kernel with qemu they will need to know to pass the magic highmem=off... Is there no way to automatically determine if it's needed, or to make kernels fail with an error in this case?
The kernel already fails with an error when it realizes it cannot access the ECAM (and, as a result, it will fail to probe any PCIe devices). Please see the guest dmesg attached to comment 1:
> [ 4.488278] pci-host-generic 4010000000.pcie: can't claim ECAM area [mem 0x10000000-0x1fffffff]: address conflict with pcie@10000000 [mem 0x10000000-0x3efeffff]
> [ 4.491662] pci-host-generic: probe of 4010000000.pcie failed with error -16
The guest-physical address 0x40_1000_0000 is where the high ECAM area starts. (See also QEMU commit 601d626d148a, "hw/arm/virt: Add a new 256MB ECAM region", 2018-06-22; part of the same series.) The kernel truncates the address to 32 bits (0x1000_0000) and then complains.
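The truncation can be reproduced with plain shell arithmetic (the addresses are taken from the dmesg above and QEMU commit 601d626d148a):

```shell
# High ECAM base 0x40_1000_0000 masked to 32 bits yields 0x1000_0000,
# which lands inside the 32-bit pcie window [0x10000000, 0x3efeffff]
# and triggers the "address conflict" error quoted above.
printf '0x%x\n' $(( 0x4010000000 & 0xFFFFFFFF ))   # prints 0x10000000
```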
I don't know how the highmem value could be deduced automatically. Normally such knobs belong to libosinfo.
Referring to the release criterion cited in comment 3, i.e. "the release must be able to host virtual guest instances of the same release", I think that might be satisfied here. On the appropriate "known issues" page in the Fedora Wiki, we could state:
For running the 32-bit ARM images as QEMU/KVM libvirt domains, pass the following option to virt-install:
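The option itself was not preserved in this excerpt. One plausible form -- assuming virt-install's --qemu-commandline passthrough is used to reach the QEMU machine property, which is an assumption, not the wording of the original comment -- would be:

```shell
# Assumption: the appended -machine option merges with the one libvirt
# already generates; the exact option recommended originally is not
# preserved in this excerpt.
virt-install ... --qemu-commandline='-machine highmem=off' ...
```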
Discussed during the 2018-10-08 blocker review meeting: 
The decision to classify this bug as an AcceptedFreezeException and RejectedBlocker was made:
"It seems the story here is complex and a simple 'fix' may not be possible. given that, we reject it as a blocker as it *is* possible to run ARM-on-ARM virt, it just requires a non-default arg for some guest cases. We grant an FE for any simple, testable mitigation that appears, and will document the issue in Common Bugs"
*** Bug 1654225 has been marked as a duplicate of this bug. ***