Bug 1983208
| Summary: | i386/pc: Fix creation of >= 1Tb guests on AMD systems with IOMMU | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Terry Bowman (AMD) <tbowman> | |
| Component: | qemu-kvm | Assignee: | John Allen (AMD) <johnalle> | |
| qemu-kvm sub component: | CPU Models | QA Contact: | Yanghang Liu <yanghliu> | |
| Status: | CLOSED ERRATA | Docs Contact: | ||
| Severity: | medium | |||
| Priority: | medium | CC: | alex.williamson, chayang, ctatman, imammedo, jinzhao, johnalle, jon.grimm, juzhang, mrezanin, mst, nilal, pradeepvineshreddy.kodamati, suravee.suthikulpanit, terry.bowman, virt-maint, wei.huang2, yanghliu, yfu | |
| Version: | 9.1 | Keywords: | Triaged | |
| Target Milestone: | beta | Flags: | pm-rhel: mirror+ | |
| Target Release: | 9.2 | |||
| Hardware: | x86_64 | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | qemu-kvm-7.2.0-1.el9 | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | 1982898 | |||
| : | 2024367 (view as bug list) | Environment: | ||
| Last Closed: | 2023-05-09 07:19:27 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | 1982898, 2135806 | |||
| Bug Blocks: | 1950418, 2024367 | |||
|
Description
Terry Bowman (AMD)
2021-07-16 20:27:28 UTC
Assigned to Amnon for initial triage per bz process and age of bug created or assigned to virt-maint without triage.

Take care to resolve the cloned to bug 1982898 as well.

There wasn't any progress on the feature from the AMD side upstream, moving it to 9.1 for now.

Moved this BZ to virt-maint (backlog) while we wait for the upstream progress to be made.

New upstream patch set on qemu-devel:
07/02 Joao Martins ( 0) [PATCH RFCv2 0/4] i386/pc: Fix creation of >= 1010G guests on AMD systems with IOMMU

New upstream patch set on qemu-devel:
23/02 Joao Martins ( 0) [PATCH v3 0/6] i386/pc: Fix creation of >= 1010G guests on AMD systems with IOMMU

New upstream patch set on qemu-devel:
20/04 Joao Martins (162) [PATCH v4 0/5] i386/pc: Fix creation of >= 1010G guests on AMD systems with IOMMU

New upstream patch set on qemu-devel:
20/05 Joao Martins ( 0) [PATCH v5 0/5] i386/pc: Fix creation of >= 1010G guests on AMD systems with IOMMU

New upstream set:
15/07 Joao Martins (219) [PATCH v8 00/11] i386/pc: Fix creation of >= 1010G guests on AMD systems with IOMMU

This has now been merged in upstream qemu, e5b6555fb8e8a91dd1d6.
Nitesh: What's the timeline here, it feels a bit late for 9.1 - do we backport or just sit back and wait for it to land in the 9.2 rebase?
(I'm not sure if we have hardware to test it)

Since the fix doesn't look trivial, we are approaching the freeze of the 9.1 normal development cycle, and we may not have hardware to test it, let's defer this to 9.2.

Hi Terry, Can I assign this BZ to you?
Since the patches are already upstream, they should come as part of the qemu rebase. Hence, no dev work should be required.
However, we need an assignee who could help QE coordinate the testing with AMD (if required) or answer QE's questions.
Thanks

(In reply to Nitesh Narayan Lal from comment #18)
> Hi Terry, Can I assign this BZ to you?
> Since the patches are already upstream, they should come as part of the qemu
> rebase. Hence, no dev work should be required.
> However, we need an assignee who could help QE coordinate the testing with
> AMD (if required) or answer QE's questions.
> Thanks

Hi Nitesh,
Please assign to John Allen.
Thanks,
Terry.

Assigning it to John. Will mark this as TestOnly once we have the QEMU rebase BZ.

It looks to me as if this code is in our 9.2 initial backports; so it's looking promising.

Do we need any firmware changes?

On my new favourite AMD box, I've just created a 1.2T VM and passed a host PCIe device through - not thoroughly tested though yet.

With qemu-kvm-7.0.0-13.el9 we get:

2022-12-08 21:37:20.340+0000: starting up libvirt version: 8.9.0, package: 2.el9 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2022-11-02-10:18:30, ), qemu version: 7.0.0qemu-kvm-7.0.0-13.el9, kernel: 5.14.0-205.el9.x86_64, hostname: virtlab1023.lab.eng.rdu2.redhat.com
LC_ALL=C \
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin \
HOME=/var/lib/libvirt/qemu/domain-1-rhel9.1 \
XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-1-rhel9.1/.local/share \
XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-1-rhel9.1/.cache \
XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-1-rhel9.1/.config \
/usr/libexec/qemu-kvm \
-name guest=rhel9.1,debug-threads=on \
-S \
-object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-1-rhel9.1/master-key.aes"}' \
-blockdev '{"driver":"file","filename":"/usr/share/edk2/ovmf/OVMF_CODE.secboot.fd","node-name":"libvirt-pflash0-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-pflash0-format","read-only":true,"driver":"raw","file":"libvirt-pflash0-storage"}' \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/qemu/nvram/rhel9.1_VARS.fd","node-name":"libvirt-pflash1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-pflash1-format","read-only":false,"driver":"raw","file":"libvirt-pflash1-storage"}' \
-machine pc-q35-rhel9.0.0,usb=off,smm=on,dump-guest-core=off,memory-backend=pc.ram,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format \
-accel kvm \
-cpu host,migratable=on \
-global driver=cfi.pflash01,property=secure,value=on \
-m 1331200 \
-object '{"qom-type":"memory-backend-ram","id":"pc.ram","size":1395864371200}' \
-overcommit mem-lock=off \
-smp 32,sockets=32,cores=1,threads=1 \
-uuid 3f7a7b7c-dae9-4098-87f6-2a32ce69739f \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=33,server=on,wait=off \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-no-shutdown \
-global ICH9-LPC.disable_s3=1 \
-global ICH9-LPC.disable_s4=1 \
-boot strict=on \
-device '{"driver":"pcie-root-port","port":16,"chassis":1,"id":"pci.1","bus":"pcie.0","multifunction":true,"addr":"0x2"}' \
-device '{"driver":"pcie-root-port","port":17,"chassis":2,"id":"pci.2","bus":"pcie.0","addr":"0x2.0x1"}' \
-device '{"driver":"pcie-root-port","port":18,"chassis":3,"id":"pci.3","bus":"pcie.0","addr":"0x2.0x2"}' \
-device '{"driver":"pcie-root-port","port":19,"chassis":4,"id":"pci.4","bus":"pcie.0","addr":"0x2.0x3"}' \
-device '{"driver":"pcie-root-port","port":20,"chassis":5,"id":"pci.5","bus":"pcie.0","addr":"0x2.0x4"}' \
-device '{"driver":"pcie-root-port","port":21,"chassis":6,"id":"pci.6","bus":"pcie.0","addr":"0x2.0x5"}' \
-device '{"driver":"pcie-root-port","port":22,"chassis":7,"id":"pci.7","bus":"pcie.0","addr":"0x2.0x6"}' \
-device '{"driver":"pcie-root-port","port":23,"chassis":8,"id":"pci.8","bus":"pcie.0","addr":"0x2.0x7"}' \
-device '{"driver":"pcie-root-port","port":24,"chassis":9,"id":"pci.9","bus":"pcie.0","multifunction":true,"addr":"0x3"}' \
-device '{"driver":"pcie-root-port","port":25,"chassis":10,"id":"pci.10","bus":"pcie.0","addr":"0x3.0x1"}' \
-device '{"driver":"pcie-root-port","port":26,"chassis":11,"id":"pci.11","bus":"pcie.0","addr":"0x3.0x2"}' \
-device '{"driver":"pcie-root-port","port":27,"chassis":12,"id":"pci.12","bus":"pcie.0","addr":"0x3.0x3"}' \
-device '{"driver":"pcie-root-port","port":28,"chassis":13,"id":"pci.13","bus":"pcie.0","addr":"0x3.0x4"}' \
-device '{"driver":"pcie-root-port","port":29,"chassis":14,"id":"pci.14","bus":"pcie.0","addr":"0x3.0x5"}' \
-device '{"driver":"qemu-xhci","p2":15,"p3":15,"id":"usb","bus":"pci.2","addr":"0x0"}' \
-device '{"driver":"virtio-serial-pci","id":"virtio-serial0","bus":"pci.3","addr":"0x0"}' \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/images/rhel-guest-image-9.1-20221027.3.x86_64.qcow2","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"driver":"qcow2","file":"libvirt-1-storage","backing":null}' \
-device '{"driver":"virtio-blk-pci","bus":"pci.4","addr":"0x0","drive":"libvirt-1-format","id":"virtio-disk0","bootindex":1}' \
-netdev tap,fd=34,vhost=on,vhostfd=36,id=hostnet0 \
-device '{"driver":"virtio-net-pci","netdev":"hostnet0","id":"net0","mac":"52:54:00:10:d6:bf","bus":"pci.1","addr":"0x0"}' \
-chardev pty,id=charserial0 \
-device '{"driver":"isa-serial","chardev":"charserial0","id":"serial0","index":0}' \
-chardev socket,id=charchannel0,fd=32,server=on,wait=off \
-device '{"driver":"virtserialport","bus":"virtio-serial0.0","nr":1,"chardev":"charchannel0","id":"channel0","name":"org.qemu.guest_agent.0"}' \
-device '{"driver":"usb-tablet","id":"input0","bus":"usb.0","port":"1"}' \
-audiodev '{"id":"audio1","driver":"none"}' \
-vnc 127.0.0.1:0,audiodev=audio1 \
-device '{"driver":"virtio-vga","id":"video0","max_outputs":1,"bus":"pcie.0","addr":"0x1"}' \
-device '{"driver":"vfio-pci","host":"0000:63:00.0","id":"hostdev0","bus":"pci.7","addr":"0x0"}' \
-device '{"driver":"virtio-balloon-pci","id":"balloon0","bus":"pci.5","addr":"0x0"}' \
-object '{"qom-type":"rng-random","id":"objrng0","filename":"/dev/urandom"}' \
-device '{"driver":"virtio-rng-pci","rng":"objrng0","id":"rng0","bus":"pci.6","addr":"0x0"}' \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
char device redirected to /dev/pts/1 (label charserial0)
2022-12-08T21:37:20.674754Z qemu-kvm: -device {"driver":"vfio-pci","host":"0000:63:00.0","id":"hostdev0","bus":"pci.7","addr":"0x0"}: VFIO_MAP_DMA failed: Invalid argument
2022-12-08T21:37:20.680306Z qemu-kvm: -device {"driver":"vfio-pci","host":"0000:63:00.0","id":"hostdev0","bus":"pci.7","addr":"0x0"}: vfio 0000:63:00.0: failed to setup container for group 57: memory listener initialization failed: Region pc.ram: vfio_dma_map(0x55dadf9ad670, 0x100000000, 0x14480000000, 0x7e2c97e00000) = -22 (Invalid argument)
2022-12-08 21:37:20.727+0000: shutting down, reason=failed

but with qemu-kvm-7.1.0-5.el9 we get:

/usr/libexec/qemu-kvm \
-name guest=rhel9.1,debug-threads=on \
-S \
-object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-3-rhel9.1/master-key.aes"}' \
-blockdev '{"driver":"file","filename":"/usr/share/edk2/ovmf/OVMF_CODE.secboot.fd","node-name":"libvirt-pflash0-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-pflash0-format","read-only":true,"driver":"raw","file":"libvirt-pflash0-storage"}' \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/qemu/nvram/rhel9.1_VARS.fd","node-name":"libvirt-pflash1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-pflash1-format","read-only":false,"driver":"raw","file":"libvirt-pflash1-storage"}' \
-machine pc-q35-rhel9.0.0,usb=off,smm=on,dump-guest-core=off,memory-backend=pc.ram,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format \
-accel kvm \
-cpu host,migratable=on \
-global driver=cfi.pflash01,property=secure,value=on \
-m 1331200 \
-object '{"qom-type":"memory-backend-ram","id":"pc.ram","size":1395864371200}' \
-overcommit mem-lock=off \
-smp 32,sockets=32,cores=1,threads=1 \
-uuid 3f7a7b7c-dae9-4098-87f6-2a32ce69739f \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=34,server=on,wait=off \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-no-shutdown \
-global ICH9-LPC.disable_s3=1 \
-global ICH9-LPC.disable_s4=1 \
-boot strict=on \
-device '{"driver":"pcie-root-port","port":16,"chassis":1,"id":"pci.1","bus":"pcie.0","multifunction":true,"addr":"0x2"}' \
-device '{"driver":"pcie-root-port","port":17,"chassis":2,"id":"pci.2","bus":"pcie.0","addr":"0x2.0x1"}' \
-device '{"driver":"pcie-root-port","port":18,"chassis":3,"id":"pci.3","bus":"pcie.0","addr":"0x2.0x2"}' \
-device '{"driver":"pcie-root-port","port":19,"chassis":4,"id":"pci.4","bus":"pcie.0","addr":"0x2.0x3"}' \
-device '{"driver":"pcie-root-port","port":20,"chassis":5,"id":"pci.5","bus":"pcie.0","addr":"0x2.0x4"}' \
-device '{"driver":"pcie-root-port","port":21,"chassis":6,"id":"pci.6","bus":"pcie.0","addr":"0x2.0x5"}' \
-device '{"driver":"pcie-root-port","port":22,"chassis":7,"id":"pci.7","bus":"pcie.0","addr":"0x2.0x6"}' \
-device '{"driver":"pcie-root-port","port":23,"chassis":8,"id":"pci.8","bus":"pcie.0","addr":"0x2.0x7"}' \
-device '{"driver":"pcie-root-port","port":24,"chassis":9,"id":"pci.9","bus":"pcie.0","multifunction":true,"addr":"0x3"}' \
-device '{"driver":"pcie-root-port","port":25,"chassis":10,"id":"pci.10","bus":"pcie.0","addr":"0x3.0x1"}' \
-device '{"driver":"pcie-root-port","port":26,"chassis":11,"id":"pci.11","bus":"pcie.0","addr":"0x3.0x2"}' \
-device '{"driver":"pcie-root-port","port":27,"chassis":12,"id":"pci.12","bus":"pcie.0","addr":"0x3.0x3"}' \
-device '{"driver":"pcie-root-port","port":28,"chassis":13,"id":"pci.13","bus":"pcie.0","addr":"0x3.0x4"}' \
-device '{"driver":"pcie-root-port","port":29,"chassis":14,"id":"pci.14","bus":"pcie.0","addr":"0x3.0x5"}' \
-device '{"driver":"qemu-xhci","p2":15,"p3":15,"id":"usb","bus":"pci.2","addr":"0x0"}' \
-device '{"driver":"virtio-serial-pci","id":"virtio-serial0","bus":"pci.3","addr":"0x0"}' \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/images/rhel-guest-image-9.1-20221027.3.x86_64.qcow2","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"driver":"qcow2","file":"libvirt-1-storage","backing":null}' \
-device '{"driver":"virtio-blk-pci","bus":"pci.4","addr":"0x0","drive":"libvirt-1-format","id":"virtio-disk0","bootindex":1}' \
-netdev tap,fd=35,vhost=on,vhostfd=37,id=hostnet0 \
-device '{"driver":"virtio-net-pci","netdev":"hostnet0","id":"net0","mac":"52:54:00:10:d6:bf","bus":"pci.1","addr":"0x0"}' \
-chardev pty,id=charserial0 \
-device '{"driver":"isa-serial","chardev":"charserial0","id":"serial0","index":0}' \
-chardev socket,id=charchannel0,fd=33,server=on,wait=off \
-device '{"driver":"virtserialport","bus":"virtio-serial0.0","nr":1,"chardev":"charchannel0","id":"channel0","name":"org.qemu.guest_agent.0"}' \
-device '{"driver":"usb-tablet","id":"input0","bus":"usb.0","port":"1"}' \
-audiodev '{"id":"audio1","driver":"none"}' \
-vnc 127.0.0.1:0,audiodev=audio1 \
-device '{"driver":"virtio-vga","id":"video0","max_outputs":1,"bus":"pcie.0","addr":"0x1"}' \
-device '{"driver":"vfio-pci","host":"0000:63:00.0","id":"hostdev0","bus":"pci.7","addr":"0x0"}' \
-device '{"driver":"virtio-balloon-pci","id":"balloon0","bus":"pci.5","addr":"0x0"}' \
-object '{"qom-type":"rng-random","id":"objrng0","filename":"/dev/urandom"}' \
-device '{"driver":"virtio-rng-pci","rng":"objrng0","id":"rng0","bus":"pci.6","addr":"0x0"}' \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
char device redirected to /dev/pts/1 (label charserial0)

so it looks fine, and the guest sees it, and shows it in ip link.
(Although I don't have a cable plugged in to test it yet)

Alex:
Can you please check that the qemu commandline shown in the previous comment is what I should have for testing VFIO devices (I'm passing one port of an X710 through)

Note I foolishly also tried hotplugging it and the guest (after a long long set of timeouts from stuff getting paused during the memory locking) spat out:

i40e 000:07:00.0: enabling device (0000 -> 00002)
... Cannot map registers, bar size 0x0 too small, aborting
i40e: probe of 0000:07:00.0 failed with error -12

(In reply to Dr. David Alan Gilbert from comment #23)
> Alex:
> Can you please check that the qemu commandline shown in the previous
> comment is what I should have for testing VFIO
> devices (I'm passing one port of an X710 through)

Looks fine to me, IIRC this support was transparent as far as the command line and I don't see anything special about the vfio-pci device specification, as it should be.

Is there a migration compatibility issue here though? AIUI, this patch shifts the guest above-4G memory to way, way, way above 4G, past this host physical memory hole. Both the working and non-working VMs above use the pc-q35-rhel9.0.0 machine type, but the fact that one works and one doesn't suggests the memory layouts are different.

> Note I foolishly also tried hotplugging it and the guest (after a long long
> set of timeouts from
> stuff getting paused during the memory locking) spat out:
>
> i40e 000:07:00.0: enabling device (0000 -> 00002)
> ... Cannot map registers, bar size 0x0 too small, aborting
> i40e: probe of 0000:07:00.0 failed with error -12

Taking a long time to pin the memory is not unexpected with a VM this large, but I have no explanation why the resulting device shows up with a zero sized BAR in the end.

(In reply to Alex Williamson from comment #24)
> (In reply to Dr. David Alan Gilbert from comment #23)
> > Alex:
> > Can you please check that the qemu commandline shown in the previous
> > comment is what I should have for testing VFIO
> > devices (I'm passing one port of an X710 through)
>
> Looks fine to me, IIRC this support was transparent as far as the command
> line and I don't see anything special about the vfio-pci device
> specification, as it should be.

Great.

> Is there a migration compatibility issue here though? AIUI, this patch
> shifts the guest above-4G memory to way, way, way above 4G, past this host
> physical memory hole. Both the working and non-working VMs above use the
> pc-q35-rhel9.0.0 machine type, but the fact that one works and one doesn't
> suggests the memory layouts are different.

Right; there's a pcmc->enforce_amd_1tb_hole which is set on new machine types.

> > Note I foolishly also tried hotplugging it and the guest (after a long long
> > set of timeouts from
> > stuff getting paused during the memory locking) spat out:
> >
> > i40e 000:07:00.0: enabling device (0000 -> 00002)
> > ... Cannot map registers, bar size 0x0 too small, aborting
> > i40e: probe of 0000:07:00.0 failed with error -12
>
> Taking a long time to pin the memory is not unexpected with a VM this large,
> but I have no explanation why the resulting device shows up with a zero
> sized BAR in the end.

OK, I'll see if it's repeatable and if so that's a separate bug.
(Note I've not actually sent a packet on this device, since I don't have the cable plugged in yet, but I'll see what I can do)

(In reply to Dr. David Alan Gilbert from comment #25)
> (In reply to Alex Williamson from comment #24)
> > (In reply to Dr. David Alan Gilbert from comment #23)
> > > Alex:
> > > Can you please check that the qemu commandline shown in the previous
> > > comment is what I should have for testing VFIO
> > > devices (I'm passing one port of an X710 through)
> >
> > Looks fine to me, IIRC this support was transparent as far as the command
> > line and I don't see anything special about the vfio-pci device
> > specification, as it should be.
>
> Great.
>
> > Is there a migration compatibility issue here though? AIUI, this patch
> > shifts the guest above-4G memory to way, way, way above 4G, past this host
> > physical memory hole. Both the working and non-working VMs above use the
> > pc-q35-rhel9.0.0 machine type, but the fact that one works and one doesn't
> > suggests the memory layouts are different.
>
> Right; there's a pcmc->enforce_amd_1tb_hole which is set on new machine
> types.

And I've just confirmed that on the RHEL9.2 7.2.0rc4 rebuild, this works nicely on the pc-q35-rhel9.2.0 but not on the older rhel9.0.0 machine type.

Moving to ONQA since it seems to work in the rc release for me.

Hi David,

I feel a little confused about the current bug status.

Is there any downstream qemu-kvm package that QE can use to verify this bug?

(In reply to Yanghang Liu from comment #28)
> Hi David,
>
> I feel a little confused about the current bug status.
>
> Is there any downstream qemu-kvm package that QE can use to verify this bug?

You should find they've just landed in the 7.2.0 rpms created on the 16th and 20th.

Dave, John, can one of you please also share the list of upstream commits so that we can add them to the devel dashboard as required by the QEMU rebase process.

Making this BZ dependent on QEMU 7.2 rebase BZ and adding the fixed in version from qemu 7.2 rebase BZ (2135806).

The Reproducer in qemu-kvm-7.0.0-13.el9.x86_64:
[1] import a domain with 1TB memory and a hostdev PF
# virt-install --machine=q35 --noreboot --name=rhel92 --memory=1048576 --vcpus=16 --graphics type=vnc,port=5992,listen=0.0.0.0 --boot=uefi --network bridge=switch,model=virtio,mac=52:54:00:00:92:92 --import --noautoconsole --disk path=/home/images/RHEL92.qcow2,bus=virtio,cache=none,format=qcow2,io=threads,size=20 --hostdev pci_0000_e2_00_0 --osinfo detect=on,require=off
[2] start the domain
# virsh start rhel92
error: Failed to start domain 'rhel92'
error: internal error: qemu unexpectedly closed the monitor: 2023-01-10T09:01:06.019103Z qemu-kvm: -device {"driver":"vfio-pci","host":"0000:21:00.0","id":"hostdev0","bus":"pci.3","addr":"0x0"}: VFIO_MAP_DMA failed: Invalid argument
2023-01-10T09:01:06.024601Z qemu-kvm: -device {"driver":"vfio-pci","host":"0000:21:00.0","id":"hostdev0","bus":"pci.3","addr":"0x0"}: vfio 0000:21:00.0: failed to setup container for group 29: memory listener initialization failed: Region pc.ram: vfio_dma_map(0x5563efb50a40, 0x100000000, 0xff80000000, 0x7e8cbfe00000) = -22 (Invalid argument)
note: The same domain but with 4GB memory and a hostdev PF can be started successfully
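
The two failing vfio_dma_map calls logged in this bug (size 0x14480000000 for the 1.2T guest, size 0xff80000000 for the 1TB guest, both at IOVA 0x100000000) can be checked against the reserved range just below 1 TiB with a short sketch. The constants and helper names below are assumptions for illustration: the range 0xFD00000000 to 0xFFFFFFFFFF is the HyperTransport region as upstream QEMU defines it, not a value quoted in this bug, and the placement function is a simplified model of the enforce_amd_1tb_hole behaviour mentioned in the comments, not the actual QEMU code (which also accounts for other regions).

```python
# Simplified model of the AMD 1 TiB hole check discussed in this bug.
# All names are hypothetical; constants are assumed from upstream QEMU.
GIB = 1 << 30
AMD_HT_START = 0xFD_0000_0000          # 1012 GiB, start of reserved range
AMD_HT_END = 0xFF_FFFF_FFFF            # last byte below 1 TiB
AMD_ABOVE_1TB_START = AMD_HT_END + 1   # first usable address after the hole
ABOVE_4G_START = 0x1_0000_0000         # IOVA seen in the vfio_dma_map errors


def overlaps_ht_hole(iova: int, size: int) -> bool:
    """Return True if [iova, iova + size) touches the reserved range."""
    last = iova + size - 1
    return iova <= AMD_HT_END and last >= AMD_HT_START


def above_4g_start(size: int, enforce_amd_1tb_hole: bool) -> int:
    """Pick where above-4G RAM starts (simplified, RAM only)."""
    if enforce_amd_1tb_hole and overlaps_ht_hole(ABOVE_4G_START, size):
        # Newer machine types: move above-4G RAM past the hole.
        return AMD_ABOVE_1TB_START
    # Older machine types keep the legacy layout for compatibility.
    return ABOVE_4G_START


if __name__ == "__main__":
    # Sizes taken from the two error messages in this bug, plus a 4GB
    # guest assuming 2 GiB of its RAM sits below 4G.
    cases = {
        "1.2T guest": 0x14480000000,
        "1TB guest": 0xFF80000000,
        "4GB guest": 2 * GIB,
    }
    for name, size in cases.items():
        hit = overlaps_ht_hole(ABOVE_4G_START, size)
        start = above_4g_start(size, enforce_amd_1tb_hole=True)
        print(f"{name}: overlaps hole={hit}, relocated start={start:#x}")
```

Under these assumptions both logged mappings cross the reserved range while the 4GB guest does not, which is consistent with the note above. The 2 GiB below-4G figure matches the difference between the -m value and the mapped size in the earlier log, but it is still an inference rather than something stated in the bug.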

(In reply to Yanghang Liu from comment #37)
> The Reproducer in qemu-kvm-7.0.0-13.el9.x86_64:
>
> [1] import a domain with 1TB memory and a hostdev PF
> # virt-install --machine=q35 --noreboot --name=rhel92 --memory=1048576
> --vcpus=16 --graphics type=vnc,port=5992,listen=0.0.0.0 --boot=uefi
> --network bridge=switch,model=virtio,mac=52:54:00:00:92:92 --import
> --noautoconsole --disk
> path=/home/images/RHEL92.qcow2,bus=virtio,cache=none,format=qcow2,io=threads,
> size=20 --hostdev pci_0000_e2_00_0 --osinfo detect=on,require=off

sorry for a typo here. It's "--hostdev pci_0000_21_00_0" instead of "--hostdev pci_0000_e2_00_0"

> [2] start the domain
> # virsh start rhel92
> error: Failed to start domain 'rhel92'
> error: internal error: qemu unexpectedly closed the monitor:
> 2023-01-10T09:01:06.019103Z qemu-kvm: -device
> {"driver":"vfio-pci","host":"0000:21:00.0","id":"hostdev0","bus":"pci.3",
> "addr":"0x0"}: VFIO_MAP_DMA failed: Invalid argument
> 2023-01-10T09:01:06.024601Z qemu-kvm: -device
> {"driver":"vfio-pci","host":"0000:21:00.0","id":"hostdev0","bus":"pci.3",
> "addr":"0x0"}: vfio 0000:21:00.0: failed to setup container for group 29:
> memory listener initialization failed: Region pc.ram:
> vfio_dma_map(0x5563efb50a40, 0x100000000, 0xff80000000, 0x7e8cbfe00000) =
> -22 (Invalid argument)
>
> note: The same domain but with 4GB memory and a hostdev PF can be started successfully

Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

The verification in qemu-kvm-7.2.0-2.el9.x86_64:
[1] import a domain with 1TB memory and a hostdev PF
# virt-install --machine=q35 --noreboot --name=rhel92 --memory=1048576 --vcpus=16 --graphics type=vnc,port=5992,listen=0.0.0.0 --boot=uefi --network bridge=switch,model=virtio,mac=52:54:00:00:92:92 --import --noautoconsole --disk path=/home/images/RHEL92.qcow2,bus=virtio,cache=none,format=qcow2,io=threads,size=20 --hostdev pci_0000_21_00_0 --osinfo detect=on,require=off
[2] start the domain <-- The domain with 1TB memory and a hostdev PF can be started successfully
# virsh start rhel92
[3] check the PF status in the domain
# ifconfig
enp3s0np0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
ether 0c:42:a1:d1:d1:c4 txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
# dmesg
[ 6.058810] mlx5_core 0000:03:00.0: firmware version: 22.35.1012
[ 6.058878] mlx5_core 0000:03:00.0: 126.024 Gb/s available PCIe bandwidth, limited by 16.0 GT/s PCIe x8 link at 0000:00:02.2 (capable of 252.048 Gb/s with 16.0 GT/s PCIe x16 link)
[ 6.423640] mlx5_core 0000:03:00.0: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[ 6.425136] mlx5_core 0000:03:00.0: E-Switch: Total vports 2, per vport: max uc(128) max mc(2048)
[ 6.435626] mlx5_core 0000:03:00.0: Port module event: module 0, Cable unplugged
[ 6.436919] mlx5_core 0000:03:00.0: mlx5_pcie_event:289:(pid 101): PCIe slot power capability was not advertised.
[ 6.454875] mlx5_core 0000:03:00.0: mlx5e: IPSec ESP acceleration enabled
[ 6.456300] mlx5_core 0000:03:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
[ 6.655209] mlx5_core 0000:03:00.0: Supported tc offload range - chains: 4294967294, prios: 4294967295
[ 6.826782] mlx5_core 0000:03:00.0 enp3s0np0: renamed from eth0
[ 7.496821] mlx5_core 0000:03:00.0 enp3s0np0: Link down
# lshw -c network -businfo
Bus info Device Class Description
=======================================================
pci@0000:03:00.0 enp3s0np0 network MT2892 Family [ConnectX-6 Dx]

Hi John,

Could you please check comment 37 and comment 40?

Is it enough for QE to verify this bug?

Feel free to let me know if you need QE to do more tests.

Hi David,
The detailed host info is as follows:
host name: dell-per7525-26.lab.eng.pek2.redhat.com
memory size: 1.5T
CPU model: AMD EPYC-Rome
BIOS Model name: AMD EPYC 7713 64-Core Processor
CPU family: 25
Model: 1
Thread(s) per core: 2
Core(s) per socket: 64
Socket(s): 2
Stepping: 1
kernel version: 5.14.0-228.el9.x86_64
Let me know if you want to know more details about the host :)

Move bug status to VERIFIED based on comment 37 and comment 40

(In reply to Yanghang Liu from comment #41)
> Hi John,
>
> Could you please check comment 37 and comment 40?
>
> Is it enough for QE to verify this bug?
>
> Feel free to let me know if you need QE to do more tests.

It looks OK to me, but I'm not intimately familiar with the issue. Let me check with IOMMU SMEs here and see if there is any additional testing they would like to see for the bug.

(In reply to John Allen (AMD) from comment #48)
> (In reply to Yanghang Liu from comment #41)
> > Hi John,
> >
> > Could you please check comment 37 and comment 40?
> >
> > Is it enough for QE to verify this bug?
> >
> > Feel free to let me know if you need QE to do more tests.
>
> It looks OK to me, but I'm not intimately familiar with the issue. Let me
> check with IOMMU SMEs here and see if there is any additional testing they
> would like to see for the bug.

The IOMMU SMEs got back to me and I think we're fine with the testing that has been done. No additional testing desired.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: qemu-kvm security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:2162 |