Bug 1684466
Summary: boot of rhel8 guest fails with 98 virtio disks using multifunction pcie-root-ports

Product:          Red Hat Enterprise Linux Advanced Virtualization
Reporter:         Yiqian Wei <yiwei>
Component:        qemu-kvm
Assignee:         Sergio Lopez <slopezpa>
Status:           CLOSED ERRATA
QA Contact:       Yiqian Wei <yiwei>
Severity:         low
Priority:         low
Version:          8.0
CC:               chayang, coli, ddepaula, jinzhao, juzhang, rbalakri, slopezpa, virt-maint, xuwei, yiwei
Target Milestone: rc
Target Release:   8.0
Hardware:         Unspecified
OS:               Unspecified
Fixed In Version: qemu-kvm-4.0.0-3.module+el8.1.0+3265+26c4ed71
Last Closed:      2019-11-06 07:13:36 UTC
Type:             Bug
Description
Yiqian Wei
2019-03-01 10:40:57 UTC
Created attachment 1539786 [details]
boot guest with cmd
Created attachment 1539789 [details]
Full console log
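The exact qemu-kvm invocation lives in the "boot guest with cmd" attachment above and is not reproduced here. As a rough sketch only, a topology like the one in the summary (98 virtio-blk disks, each behind a pcie-root-port packed 8 functions per slot via multifunction) can be generated along these lines; all IDs, bus names, chassis numbers, and addresses below are illustrative assumptions, not taken from the attached command:

```shell
# Hedged sketch: emit -device arguments packing 98 pcie-root-ports into
# multifunction slots (8 functions each) plus one virtio-blk disk per port.
# IDs, chassis numbers, and slot addresses are illustrative assumptions.
args=""
for i in $(seq 0 97); do
  slot=$(( i / 8 )); func=$(( i % 8 ))
  mf=""; [ "$func" -eq 0 ] && mf=",multifunction=on"
  args="$args -device pcie-root-port,id=rp$i,bus=pcie.0,chassis=$(( i + 1 )),addr=0x$(printf '%x' $(( slot + 1 ))).$func$mf"
  args="$args -device virtio-blk-pci,drive=drv$i,bus=rp$i"
done
# 98 root ports + 98 disks = 196 -device entries
echo "$args" | tr ' ' '\n' | grep -c -- '-device'
```

Each root port consumes one PCI function on the root complex, so 98 ports fit in 13 slots (12 full multifunction slots plus a partial one).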
Actual results:
[ 185.591489] random: fast init done
[ TIME ] Timed out waiting for device dev-ma…d\x2d74\x2d\x2d184\x2dswap.device.
[DEPEND] Dependency failed for Resume from h…/dev/mapper/rhel_vm--74--184-swap.
[ OK ] Reached target Local File Systems (Pre).
[ OK ] Reached target Local File Systems.
Starting Create Volatile Files and Directories...
[ OK ] Started Create Volatile Files and Directories.
[ OK ] Reached target System Initialization.
[ OK ] Reached target Basic System.
[ 318.489514] dracut-initqueue[392]: Warning: dracut-initqueue timeout - starting timeout scripts
[ 319.072988] dracut-initqueue[392]: Warning: dracut-initqueue timeout - starting timeout scripts
---
[ 384.061781] dracut-initqueue[392]: Warning: dracut-initqueue timeout - starting timeout scripts
[ 384.064983] dracut-initqueue[392]: Warning: Could not boot.
Starting Setup Virtual Console...
[ OK ] Started Setup Virtual Console.
Starting Dracut Emergency Shell...
Warning: /dev/mapper/rhel_vm--74--184-root does not exist
Warning: /dev/rhel_vm-74-184/root does not exist
Warning: /dev/rhel_vm-74-184/swap does not exist
Generating "/run/initramfs/rdsosreport.txt"
Entering emergency mode. Exit the shell to continue.
Type "journalctl" to view system logs.
You might want to save "/run/initramfs/rdsosreport.txt" to a USB stick or /boot
after mounting them and attach it to a bug report.
dracut:/#
Created attachment 1539790 [details]
/run/initramfs/rdsosreport.txt
Sergio Lopez

I've reproduced the issue here, identifying two problems:

- Extremely slow PCI device initialization. This is caused by the intel-iommu creating a FlatView for each device, and it's being worked on upstream. We should probably create a separate BZ for tracking this.

- The guest fails to initialize the AHCI PCI device which holds the boot disk. The problem here is that the guest runs out of IRQ vectors (-28 is ENOSPC), with each virtio device consuming two (config + vq). I think this is more a limitation than an actual bug, and we should simply document it somewhere.

Upstream has a patch addressing this issue, included in qemu-4.0:

commit 4b519ef1de9a7cb8123abadab9e6c5697373087c
Author: Peter Xu <peterx>
Date:   Wed Mar 13 17:43:23 2019 +0800

    intel-iommu: optimize nodmar memory regions

    Previously we have per-device system memory aliases when DMAR is
    disabled by the system. It will slow the system down if there are lots
    of devices, especially when DMAR is disabled, because each of the
    aliased system address spaces will contain O(N) slots, and rendering
    such N address spaces will be O(N^2) complexity.

    This patch introduces a shared nodmar memory region, and for each
    device we only create an alias to the shared memory region. With the
    aliasing, the QEMU memory core API will be able to detect when devices
    are sharing the same address space (which is the nodmar address space)
    when rendering the FlatViews, and the total number of FlatViews can be
    dramatically reduced when there are a lot of devices.

    Suggested-by: Paolo Bonzini <pbonzini>
    Signed-off-by: Peter Xu <peterx>
    Message-Id: <20190313094323.18263-1-peterx>
    Signed-off-by: Paolo Bonzini <pbonzini>

Yiqian Wei

(In reply to Sergio Lopez from comment #4)

Hi Sergio,

I have tested this issue with the fixed version; the detailed test results follow.

> - Extremely slow PCI device initialization. This is caused by the
> intel-iommu creating a FlatView for each device, and it's being worked on
> upstream. We should probably create a separate BZ for tracking this.

Compared with comment 0, PCI device initialization is much faster. For detailed information, see the attachment (fixed log). Is this the expected result? If so, we can verify this issue based on the test results of the fixed version.

> - The guest fails to initialize the AHCI PCI device which holds the boot
> disk. The problem here is that the guest runs out of IRQ vectors (-28 is
> ENOSPC), with each virtio device consuming two (config + vq). I think this
> is more a limitation than an actual bug, and we should simply document it
> somewhere.

For the above issue, the guest also fails to reach login when booted with 98 disks. I also tried a RHEL 7.7 guest following comment 0, and that guest boots successfully with 98 disks. I am not sure whether the issue you mentioned is the same as https://bugzilla.redhat.com/show_bug.cgi?id=1526370; could you help check? If yes, we can track the above issue through bz1526370.

Thanks,
Jing

Created attachment 1579564 [details]
fixed log

Sergio Lopez

(In reply to Yiqian Wei from comment #7)

> Compared with comment 0, PCI device initialization is much faster.
> Is this the expected result? If so, we can verify this issue based on the
> test results of the fixed version.

Yes, we can consider it verified.

> For the above issue, the guest also fails to reach login when booted with
> 98 disks. I also tried a RHEL 7.7 guest following comment 0, and that
> guest boots successfully with 98 disks.

This is a limitation on the guest (it's running out of interrupt vectors), so this isn't exactly a bug. We should probably just document it somewhere and move on.

> I am not sure whether the issue you mentioned is the same as
> https://bugzilla.redhat.com/show_bug.cgi?id=1526370; could you help check?

No, it isn't exactly the same.

Thanks,
Sergio

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2019:3723
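As a back-of-the-envelope check of the vector exhaustion described in comment 4 (the numbers are illustrative only; the per-device count assumes one config plus one virtqueue vector, and the guest additionally needs vectors for its non-virtio devices such as the AHCI controller):

```shell
# Illustrative arithmetic only: each virtio-blk-pci device with a single
# virtqueue claims roughly 2 vectors (1 config + 1 virtqueue) in this setup.
disks=98
per_dev=2
echo $(( disks * per_dev ))   # vectors consumed by the disks alone
# Inside a running guest, actual allocations could be inspected with e.g.:
#   grep -c virtio /proc/interrupts
```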