Bug 1983208 - i386/pc: Fix creation of >= 1Tb guests on AMD systems with IOMMU
Summary: i386/pc: Fix creation of >= 1Tb guests on AMD systems with IOMMU
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: qemu-kvm
Version: 9.1
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: beta
Target Release: 9.2
Assignee: John Allen (AMD)
QA Contact: Yanghang Liu
URL:
Whiteboard:
Depends On: 1982898 2135806
Blocks: amd9.0bugs, amdserver9.0bugs 2024367
 
Reported: 2021-07-16 20:27 UTC by Terry Bowman (AMD)
Modified: 2023-05-09 07:42 UTC (History)

Fixed In Version: qemu-kvm-7.2.0-1.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1982898
Clones: 2024367
Environment:
Last Closed: 2023-05-09 07:19:27 UTC
Type: Bug
Target Upstream Version:
Embargoed:
pm-rhel: mirror+



Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2023:2162 0 None None None 2023-05-09 07:19:56 UTC

Description Terry Bowman (AMD) 2021-07-16 20:27:28 UTC
+++ This bug was initially created as a clone of Bug #1982898 +++

Short Description:
This series lets Qemu properly spawn i386 guests with >= 1Tb with VFIO, particularly when running on AMD systems with an IOMMU.

Upstream Patches (RFC):
https://lore.kernel.org/qemu-devel/20210622154905.30858-1-joao.m.martins@oracle.com/

Description of problem:

Since Linux v5.4, VFIO validates the IOVA passed to the DMA_MAP ioctl and
returns -EINVAL when it is not valid. On x86, Intel hosts aren't particularly
affected by this extra validation. But AMD systems with IOMMU have a hole just
below the 1TB boundary which is *reserved* for HyperTransport I/O addresses, located
at FD_0000_0000h - FF_FFFF_FFFFh. See IOMMU manual [1], specifically
section '2.1.2 IOMMU Logical Topology', Table 3 on what those addresses mean.

VFIO DMA_MAP calls in this IOVA address range fail this check and hence return
-EINVAL, consequently failing the creation of guests bigger than 1010G. Example
of the failure:

qemu-system-x86_64: -device vfio-pci,host=0000:41:10.1,bootindex=-1: VFIO_MAP_DMA: -22
qemu-system-x86_64: -device vfio-pci,host=0000:41:10.1,bootindex=-1: vfio 0000:41:10.1: 
	failed to setup container for group 258: memory listener initialization failed:
		Region pc.ram: vfio_dma_map(0x55ba53e7a9d0, 0x100000000, 0xff30000000, 0x7ed243e00000) = -22 (Invalid argument)
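
To make the failure above concrete, here is a minimal sketch (not QEMU or kernel code) that checks whether a DMA mapping intersects the reserved HyperTransport range quoted in this description. The helper name overlaps_ht_hole is hypothetical, and reading the second and third vfio_dma_map() arguments as IOVA and size is an assumption based on the error line.

```python
# Reserved HyperTransport range on AMD hosts, as quoted in the description.
HT_START = 0xFD_0000_0000
HT_END = 0xFF_FFFF_FFFF  # inclusive


def overlaps_ht_hole(iova: int, size: int) -> bool:
    """Return True if [iova, iova + size) intersects the reserved range."""
    last = iova + size - 1
    return iova <= HT_END and last >= HT_START


# IOVA and size as they appear in the vfio_dma_map() failure line above.
iova, size = 0x1_0000_0000, 0xFF_3000_0000
print(hex(iova + size))              # 0x10030000000, past the 1TB boundary
print(overlaps_ht_hole(iova, size))  # True

# A small mapping well below the hole, for contrast (hypothetical values).
print(overlaps_ht_hole(0x1_0000_0000, 4 << 30))  # False
```

The single above-4G mapping starts at 4G and runs past 1TB, so it necessarily covers the whole reserved range, which matches the -22 (EINVAL) result in the log.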

Prior to v5.4, we could map using these IOVAs *but* that's still not the right thing
to do and could trigger certain IOMMU events (e.g. INVALID_DEVICE_REQUEST), or
spurious guest VF failures from the resultant IOMMU target abort (see Errata 1155[2])
as documented on the links down below.

This series tries to address that by dealing with this AMD-specific 1Tb hole,
similarly to how we deal with the 4G hole today in x86 in general. 
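
A rough sketch of the idea, under stated assumptions: if guest RAM placed at 4G would reach the reserved range, place it at 1TB instead. This is a simplified model and not the actual patch. The function and constant names are hypothetical, and the real series may also account for other regions (for example device memory or the 64-bit PCI hole) when deciding.

```python
GiB = 1 << 30
FOUR_GIB = 4 * GiB
HT_START = 0xFD_0000_0000           # start of the reserved range (1012G)
ABOVE_1TB_START = 0x100_0000_0000   # first address past the reserved range


def above_4g_start(above_4g_size: int, avoid_hole: bool) -> int:
    """Pick the base address for guest RAM that does not fit below 4G."""
    last = FOUR_GIB + above_4g_size - 1
    if avoid_hole and last >= HT_START:
        # RAM starting at 4G would run into the hole, so start past it.
        return ABOVE_1TB_START
    return FOUR_GIB


# Size taken from the failing vfio_dma_map() call in this description.
size = 0xFF_3000_0000
print(hex(above_4g_start(size, avoid_hole=False)))  # 0x100000000
print(hex(above_4g_start(size, avoid_hole=True)))   # 0x10000000000

# 1008G above 4G ends just below the hole, so nothing moves.
print(hex(above_4g_start(1008 * GiB, avoid_hole=True)))  # 0x100000000
```

The last case lines up with the "bigger than 1010G" threshold mentioned above, assuming 2G of guest RAM sits below 4G: 1008G placed at 4G ends exactly where the reserved range begins.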



Comment 1 John Ferlan 2021-07-26 19:21:37 UTC
Assigned to Amnon for initial triage per bz process and age of bug created or assigned to virt-maint without triage.

Take care to resolve the bug this one was cloned from (bug 1982898) as well.

Comment 2 Igor Mammedov 2022-01-06 08:52:48 UTC
There hasn't been any progress on the feature from the AMD side upstream,
so moving it to 9.1 for now.

Comment 6 Nitesh Narayan Lal 2022-01-18 19:34:55 UTC
Moved this BZ to virt-maint (backlog) while we wait for the upstream progress to be made.

Comment 8 Dr. David Alan Gilbert 2022-02-08 19:58:23 UTC
New upstream patch set on qemu-devel:
07/02 Joao Martins      (  0) [PATCH RFCv2 0/4] i386/pc: Fix creation of >= 1010G guests on AMD systems with IOMMU

Comment 9 Dr. David Alan Gilbert 2022-02-24 15:08:51 UTC
New upstream patch set on qemu-devel:
23/02 Joao Martins      (  0) [PATCH v3 0/6] i386/pc: Fix creation of >= 1010G guests on AMD systems with IOMMU

Comment 10 Dr. David Alan Gilbert 2022-04-21 19:01:39 UTC
New upstream patch set on qemu-devel:
20/04 Joao Martins      (162) [PATCH v4 0/5] i386/pc: Fix creation of >= 1010G guests on AMD systems with IOMMU

Comment 11 Dr. David Alan Gilbert 2022-06-09 14:43:44 UTC
New upstream patch set on qemu-devel:
20/05 Joao Martins      (  0) [PATCH v5 0/5] i386/pc: Fix creation of >= 1010G guests on AMD systems with IOMMU

Comment 13 Dr. David Alan Gilbert 2022-07-18 11:01:51 UTC
New upstream set:
15/07 Joao Martins      (219) [PATCH v8 00/11] i386/pc: Fix creation of >= 1010G guests on AMD systems with IOMMU

Comment 14 Dr. David Alan Gilbert 2022-07-27 09:13:57 UTC
This has now been merged in upstream qemu, e5b6555fb8e8a91dd1d6.

Nitesh: What's the timeline here, it feels a bit late for 9.1 - do we backport or just sit back and wait for it to land in the 9.2 rebase?
(I'm not sure if we have hardware to test it)

Comment 15 Nitesh Narayan Lal 2022-07-27 13:13:40 UTC
Since the fix doesn't look trivial, we are approaching the freeze of the 9.1 normal development cycle, and we may not have hardware to test it, let's defer this to 9.2.

Comment 18 Nitesh Narayan Lal 2022-08-24 18:08:21 UTC
Hi Terry, Can I assign this BZ to you?
Since the patches are already upstream, they should come as part of the qemu rebase. Hence, no dev work should be required.
However, we need an assignee who could help QE coordinate the testing with AMD (if required) or answer QE's questions.
Thanks

Comment 19 Terry Bowman (AMD) 2022-08-24 19:22:56 UTC
(In reply to Nitesh Narayan Lal from comment #18)
> Hi Terry, Can I assign this BZ to you?
> Since the patches are already upstream, they should come as part of the qemu
> rebase. Hence, no dev work should be required.
> However, we need an assignee who could help QE coordinate the testing with
> AMD (if required) or answer QE's questions.
> Thanks

Hi Nitesh,

Please assign to John Allen.

Comment 20 Nitesh Narayan Lal 2022-08-24 19:28:23 UTC
Thanks, Terry. Assigning it to John.
Will mark this as TestOnly once we have the QEMU rebase BZ.

Comment 21 Dr. David Alan Gilbert 2022-11-17 17:47:47 UTC
It looks to me as if this code is in our 9.2 initial backports; so it's looking promising.
Do we need any firmware changes?

Comment 22 Dr. David Alan Gilbert 2022-12-08 21:43:07 UTC
On my new favourite AMD box, I've just created a 1.2T VM and passed a host PCIe device through - not thoroughly tested though yet.

With qemu-kvm-7.0.0-13.el9
we get:

2022-12-08 21:37:20.340+0000: starting up libvirt version: 8.9.0, package: 2.el9 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2022-11-02-10:18:30, ), qemu version: 7.0.0qemu-kvm-7.0.0-13.el9, kernel: 5.14.0-205.el9.x86_64, hostname: virtlab1023.lab.eng.rdu2.redhat.com
LC_ALL=C \
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin \
HOME=/var/lib/libvirt/qemu/domain-1-rhel9.1 \
XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-1-rhel9.1/.local/share \
XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-1-rhel9.1/.cache \
XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-1-rhel9.1/.config \
/usr/libexec/qemu-kvm \
-name guest=rhel9.1,debug-threads=on \
-S \
-object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-1-rhel9.1/master-key.aes"}' \
-blockdev '{"driver":"file","filename":"/usr/share/edk2/ovmf/OVMF_CODE.secboot.fd","node-name":"libvirt-pflash0-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-pflash0-format","read-only":true,"driver":"raw","file":"libvirt-pflash0-storage"}' \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/qemu/nvram/rhel9.1_VARS.fd","node-name":"libvirt-pflash1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-pflash1-format","read-only":false,"driver":"raw","file":"libvirt-pflash1-storage"}' \
-machine pc-q35-rhel9.0.0,usb=off,smm=on,dump-guest-core=off,memory-backend=pc.ram,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format \
-accel kvm \
-cpu host,migratable=on \
-global driver=cfi.pflash01,property=secure,value=on \
-m 1331200 \
-object '{"qom-type":"memory-backend-ram","id":"pc.ram","size":1395864371200}' \
-overcommit mem-lock=off \
-smp 32,sockets=32,cores=1,threads=1 \
-uuid 3f7a7b7c-dae9-4098-87f6-2a32ce69739f \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=33,server=on,wait=off \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-no-shutdown \
-global ICH9-LPC.disable_s3=1 \
-global ICH9-LPC.disable_s4=1 \
-boot strict=on \
-device '{"driver":"pcie-root-port","port":16,"chassis":1,"id":"pci.1","bus":"pcie.0","multifunction":true,"addr":"0x2"}' \
-device '{"driver":"pcie-root-port","port":17,"chassis":2,"id":"pci.2","bus":"pcie.0","addr":"0x2.0x1"}' \
-device '{"driver":"pcie-root-port","port":18,"chassis":3,"id":"pci.3","bus":"pcie.0","addr":"0x2.0x2"}' \
-device '{"driver":"pcie-root-port","port":19,"chassis":4,"id":"pci.4","bus":"pcie.0","addr":"0x2.0x3"}' \
-device '{"driver":"pcie-root-port","port":20,"chassis":5,"id":"pci.5","bus":"pcie.0","addr":"0x2.0x4"}' \
-device '{"driver":"pcie-root-port","port":21,"chassis":6,"id":"pci.6","bus":"pcie.0","addr":"0x2.0x5"}' \
-device '{"driver":"pcie-root-port","port":22,"chassis":7,"id":"pci.7","bus":"pcie.0","addr":"0x2.0x6"}' \
-device '{"driver":"pcie-root-port","port":23,"chassis":8,"id":"pci.8","bus":"pcie.0","addr":"0x2.0x7"}' \
-device '{"driver":"pcie-root-port","port":24,"chassis":9,"id":"pci.9","bus":"pcie.0","multifunction":true,"addr":"0x3"}' \
-device '{"driver":"pcie-root-port","port":25,"chassis":10,"id":"pci.10","bus":"pcie.0","addr":"0x3.0x1"}' \
-device '{"driver":"pcie-root-port","port":26,"chassis":11,"id":"pci.11","bus":"pcie.0","addr":"0x3.0x2"}' \
-device '{"driver":"pcie-root-port","port":27,"chassis":12,"id":"pci.12","bus":"pcie.0","addr":"0x3.0x3"}' \
-device '{"driver":"pcie-root-port","port":28,"chassis":13,"id":"pci.13","bus":"pcie.0","addr":"0x3.0x4"}' \
-device '{"driver":"pcie-root-port","port":29,"chassis":14,"id":"pci.14","bus":"pcie.0","addr":"0x3.0x5"}' \
-device '{"driver":"qemu-xhci","p2":15,"p3":15,"id":"usb","bus":"pci.2","addr":"0x0"}' \
-device '{"driver":"virtio-serial-pci","id":"virtio-serial0","bus":"pci.3","addr":"0x0"}' \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/images/rhel-guest-image-9.1-20221027.3.x86_64.qcow2","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"driver":"qcow2","file":"libvirt-1-storage","backing":null}' \
-device '{"driver":"virtio-blk-pci","bus":"pci.4","addr":"0x0","drive":"libvirt-1-format","id":"virtio-disk0","bootindex":1}' \
-netdev tap,fd=34,vhost=on,vhostfd=36,id=hostnet0 \
-device '{"driver":"virtio-net-pci","netdev":"hostnet0","id":"net0","mac":"52:54:00:10:d6:bf","bus":"pci.1","addr":"0x0"}' \
-chardev pty,id=charserial0 \
-device '{"driver":"isa-serial","chardev":"charserial0","id":"serial0","index":0}' \
-chardev socket,id=charchannel0,fd=32,server=on,wait=off \
-device '{"driver":"virtserialport","bus":"virtio-serial0.0","nr":1,"chardev":"charchannel0","id":"channel0","name":"org.qemu.guest_agent.0"}' \
-device '{"driver":"usb-tablet","id":"input0","bus":"usb.0","port":"1"}' \
-audiodev '{"id":"audio1","driver":"none"}' \
-vnc 127.0.0.1:0,audiodev=audio1 \
-device '{"driver":"virtio-vga","id":"video0","max_outputs":1,"bus":"pcie.0","addr":"0x1"}' \
-device '{"driver":"vfio-pci","host":"0000:63:00.0","id":"hostdev0","bus":"pci.7","addr":"0x0"}' \
-device '{"driver":"virtio-balloon-pci","id":"balloon0","bus":"pci.5","addr":"0x0"}' \
-object '{"qom-type":"rng-random","id":"objrng0","filename":"/dev/urandom"}' \
-device '{"driver":"virtio-rng-pci","rng":"objrng0","id":"rng0","bus":"pci.6","addr":"0x0"}' \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
char device redirected to /dev/pts/1 (label charserial0)
2022-12-08T21:37:20.674754Z qemu-kvm: -device {"driver":"vfio-pci","host":"0000:63:00.0","id":"hostdev0","bus":"pci.7","addr":"0x0"}: VFIO_MAP_DMA failed: Invalid argument
2022-12-08T21:37:20.680306Z qemu-kvm: -device {"driver":"vfio-pci","host":"0000:63:00.0","id":"hostdev0","bus":"pci.7","addr":"0x0"}: vfio 0000:63:00.0: failed to setup container for group 57: memory listener initialization failed: Region pc.ram: vfio_dma_map(0x55dadf9ad670, 0x100000000, 0x14480000000, 0x7e2c97e00000) = -22 (Invalid argument)
2022-12-08 21:37:20.727+0000: shutting down, reason=failed
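
For anyone triaging similar logs, a small sketch that pulls the numbers out of the vfio_dma_map() failure line above and relates them to the -m value. The parser is hypothetical (not part of libvirt or QEMU), and reading the arguments as (container, iova, size, vaddr) is an assumption based on the message text.

```python
import re

GiB = 1 << 30
MiB = 1 << 20

# Tail of the failure line from the log above.
LINE = ("Region pc.ram: vfio_dma_map(0x55dadf9ad670, 0x100000000, "
        "0x14480000000, 0x7e2c97e00000) = -22 (Invalid argument)")

PATTERN = re.compile(
    r"vfio_dma_map\((0x[0-9a-f]+), (0x[0-9a-f]+), "
    r"(0x[0-9a-f]+), (0x[0-9a-f]+)\) = (-?\d+)"
)


def parse_dma_map_failure(line):
    """Return iova, size and return code, or None if the line doesn't match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return {
        "iova": int(m.group(2), 16),
        "size": int(m.group(3), 16),
        "ret": int(m.group(5)),
    }


info = parse_dma_map_failure(LINE)
print(info["iova"] // GiB, info["size"] // GiB, info["ret"])  # 4 1298 -22

# "-m 1331200" is in MiB; the remainder is the RAM placed below 4G.
print((1331200 * MiB - info["size"]) // GiB)  # 2
```

So the failing mapping is 1298G of guest RAM starting at 4G, which runs straight through the reserved range just below 1TB.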



but with qemu-kvm-7.1.0-5.el9 we get:

/usr/libexec/qemu-kvm \
-name guest=rhel9.1,debug-threads=on \
-S \
-object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-3-rhel9.1/master-key.aes"}' \
-blockdev '{"driver":"file","filename":"/usr/share/edk2/ovmf/OVMF_CODE.secboot.fd","node-name":"libvirt-pflash0-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-pflash0-format","read-only":true,"driver":"raw","file":"libvirt-pflash0-storage"}' \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/qemu/nvram/rhel9.1_VARS.fd","node-name":"libvirt-pflash1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-pflash1-format","read-only":false,"driver":"raw","file":"libvirt-pflash1-storage"}' \
-machine pc-q35-rhel9.0.0,usb=off,smm=on,dump-guest-core=off,memory-backend=pc.ram,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format \
-accel kvm \
-cpu host,migratable=on \
-global driver=cfi.pflash01,property=secure,value=on \
-m 1331200 \
-object '{"qom-type":"memory-backend-ram","id":"pc.ram","size":1395864371200}' \
-overcommit mem-lock=off \
-smp 32,sockets=32,cores=1,threads=1 \
-uuid 3f7a7b7c-dae9-4098-87f6-2a32ce69739f \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=34,server=on,wait=off \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-no-shutdown \
-global ICH9-LPC.disable_s3=1 \
-global ICH9-LPC.disable_s4=1 \
-boot strict=on \
-device '{"driver":"pcie-root-port","port":16,"chassis":1,"id":"pci.1","bus":"pcie.0","multifunction":true,"addr":"0x2"}' \
-device '{"driver":"pcie-root-port","port":17,"chassis":2,"id":"pci.2","bus":"pcie.0","addr":"0x2.0x1"}' \
-device '{"driver":"pcie-root-port","port":18,"chassis":3,"id":"pci.3","bus":"pcie.0","addr":"0x2.0x2"}' \
-device '{"driver":"pcie-root-port","port":19,"chassis":4,"id":"pci.4","bus":"pcie.0","addr":"0x2.0x3"}' \
-device '{"driver":"pcie-root-port","port":20,"chassis":5,"id":"pci.5","bus":"pcie.0","addr":"0x2.0x4"}' \
-device '{"driver":"pcie-root-port","port":21,"chassis":6,"id":"pci.6","bus":"pcie.0","addr":"0x2.0x5"}' \
-device '{"driver":"pcie-root-port","port":22,"chassis":7,"id":"pci.7","bus":"pcie.0","addr":"0x2.0x6"}' \
-device '{"driver":"pcie-root-port","port":23,"chassis":8,"id":"pci.8","bus":"pcie.0","addr":"0x2.0x7"}' \
-device '{"driver":"pcie-root-port","port":24,"chassis":9,"id":"pci.9","bus":"pcie.0","multifunction":true,"addr":"0x3"}' \
-device '{"driver":"pcie-root-port","port":25,"chassis":10,"id":"pci.10","bus":"pcie.0","addr":"0x3.0x1"}' \
-device '{"driver":"pcie-root-port","port":26,"chassis":11,"id":"pci.11","bus":"pcie.0","addr":"0x3.0x2"}' \
-device '{"driver":"pcie-root-port","port":27,"chassis":12,"id":"pci.12","bus":"pcie.0","addr":"0x3.0x3"}' \
-device '{"driver":"pcie-root-port","port":28,"chassis":13,"id":"pci.13","bus":"pcie.0","addr":"0x3.0x4"}' \
-device '{"driver":"pcie-root-port","port":29,"chassis":14,"id":"pci.14","bus":"pcie.0","addr":"0x3.0x5"}' \
-device '{"driver":"qemu-xhci","p2":15,"p3":15,"id":"usb","bus":"pci.2","addr":"0x0"}' \
-device '{"driver":"virtio-serial-pci","id":"virtio-serial0","bus":"pci.3","addr":"0x0"}' \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/images/rhel-guest-image-9.1-20221027.3.x86_64.qcow2","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"driver":"qcow2","file":"libvirt-1-storage","backing":null}' \
-device '{"driver":"virtio-blk-pci","bus":"pci.4","addr":"0x0","drive":"libvirt-1-format","id":"virtio-disk0","bootindex":1}' \
-netdev tap,fd=35,vhost=on,vhostfd=37,id=hostnet0 \
-device '{"driver":"virtio-net-pci","netdev":"hostnet0","id":"net0","mac":"52:54:00:10:d6:bf","bus":"pci.1","addr":"0x0"}' \
-chardev pty,id=charserial0 \
-device '{"driver":"isa-serial","chardev":"charserial0","id":"serial0","index":0}' \
-chardev socket,id=charchannel0,fd=33,server=on,wait=off \
-device '{"driver":"virtserialport","bus":"virtio-serial0.0","nr":1,"chardev":"charchannel0","id":"channel0","name":"org.qemu.guest_agent.0"}' \
-device '{"driver":"usb-tablet","id":"input0","bus":"usb.0","port":"1"}' \
-audiodev '{"id":"audio1","driver":"none"}' \
-vnc 127.0.0.1:0,audiodev=audio1 \
-device '{"driver":"virtio-vga","id":"video0","max_outputs":1,"bus":"pcie.0","addr":"0x1"}' \
-device '{"driver":"vfio-pci","host":"0000:63:00.0","id":"hostdev0","bus":"pci.7","addr":"0x0"}' \
-device '{"driver":"virtio-balloon-pci","id":"balloon0","bus":"pci.5","addr":"0x0"}' \
-object '{"qom-type":"rng-random","id":"objrng0","filename":"/dev/urandom"}' \
-device '{"driver":"virtio-rng-pci","rng":"objrng0","id":"rng0","bus":"pci.6","addr":"0x0"}' \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
char device redirected to /dev/pts/1 (label charserial0)

so it looks fine, and the guest sees it, and shows it in ip link.
(Although I don't have a cable plugged in to test it yet)

Comment 23 Dr. David Alan Gilbert 2022-12-08 21:48:15 UTC
Alex: 
  Can you please check that the qemu commandline shown in the previous comment is what I should have for testing VFIO
devices (I'm passing one port of an X710 through)

Note I foolishly also tried hotplugging it and the guest (after a long long set of timeouts from
stuff getting paused during the memory locking) spat out:

  i40e 000:07:00.0: enabling device (0000 -> 00002)
  ... Cannot map registers, bar size 0x0 too small, aborting
  i40e: probe of 0000:07:00.0 failed with error -12

Comment 24 Alex Williamson 2022-12-08 22:43:00 UTC
(In reply to Dr. David Alan Gilbert from comment #23)
> Alex: 
>   Can you please check that the qemu commandline shown in the previous
> comment is what I should have for testing VFIO
> devices (I'm passing one port of an X710 through)

Looks fine to me, IIRC this support was transparent as far as the command line and I don't see anything special about the vfio-pci device specification, as it should be.

Is there a migration compatibility issue here though?  AIUI, this patch shifts the guest above-4G memory to way, way, way above 4G, past this host physical memory hole.  Both the working and non-working VMs above use the pc-q35-rhel9.0.0 machine type, but the fact that one works and one doesn't suggests the memory layouts are different.
 
> Note I foolishly also tried hotplugging it and the guest (after a long long
> set of timeouts from
> stuff getting paused during the memory locking) spat out:
> 
>   i40e 000:07:00.0: enabling device (0000 -> 00002)
>   ... Cannot map registers, bar size 0x0 too small, aborting
>   i40e: probe of 0000:07:00.0 failed with error -12

Taking a long time to pin the memory is not unexpected with a VM this large, but I have no explanation why the resulting device shows up with a zero sized BAR in the end.

Comment 25 Dr. David Alan Gilbert 2022-12-08 23:26:42 UTC
(In reply to Alex Williamson from comment #24)
> (In reply to Dr. David Alan Gilbert from comment #23)
> > Alex: 
> >   Can you please check that the qemu commandline shown in the previous
> > comment is what I should have for testing VFIO
> > devices (I'm passing one port of an X710 through)
> 
> Looks fine to me, IIRC this support was transparent as far as the command
> line and I don't see anything special about the vfio-pci device
> specification, as it should be.

Great.

> Is there a migration compatibility issue here though?  AIUI, this patch
> shifts the guest above-4G memory to way, way, way above 4G, past this host
> physical memory hole.  Both the working and non-working VMs above use the
> pc-q35-rhel9.0.0 machine type, but the fact that one works and one doesn't
> suggests the memory layouts are different.

Right; there's a pcmc->enforce_amd_1tb_hole which is set on new machine types.

> > Note I foolishly also tried hotplugging it and the guest (after a long long
> > set of timeouts from
> > stuff getting paused during the memory locking) spat out:
> > 
> >   i40e 000:07:00.0: enabling device (0000 -> 00002)
> >   ... Cannot map registers, bar size 0x0 too small, aborting
> >   i40e: probe of 0000:07:00.0 failed with error -12
> 
> Taking a long time to pin the memory is not unexpected with a VM this large,
> but I have no explanation why the resulting device shows up with a zero
> sized BAR in the end.

OK, I'll see if it's repeatable and if so that's a separate bug.

(Note I've not actually sent a packet on this device, since I don't have the cable plugged in yet, but I'll see what I can do)

Comment 26 Dr. David Alan Gilbert 2022-12-09 10:43:15 UTC
(In reply to Dr. David Alan Gilbert from comment #25)
> (In reply to Alex Williamson from comment #24)
> > (In reply to Dr. David Alan Gilbert from comment #23)
> > > Alex: 
> > >   Can you please check that the qemu commandline shown in the previous
> > > comment is what I should have for testing VFIO
> > > devices (I'm passing one port of an X710 through)
> > 
> > Looks fine to me, IIRC this support was transparent as far as the command
> > line and I don't see anything special about the vfio-pci device
> > specification, as it should be.
> 
> Great.
> 
> > Is there a migration compatibility issue here though?  AIUI, this patch
> > shifts the guest above-4G memory to way, way, way above 4G, past this host
> > physical memory hole.  Both the working and non-working VMs above use the
> > pc-q35-rhel9.0.0 machine type, but the fact that one works and one doesn't
> > suggests the memory layouts are different.
> 
> Right; there's a pcmc->enforce_amd_1tb_hole which is set on new machine
> types.

And I've just confirmed that on the RHEL9.2 7.2.0rc4 rebuild, this works nicely on the
pc-q35-rhel9.2.0 but not on the older rhel9.0.0 machine type.
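
To illustrate why gating on the machine type keeps the guest memory layout stable for existing guests, here is a toy model of the behaviour reported in this comment for the 7.2.0rc4 rebuild. Only the flag name enforce_amd_1tb_hole comes from this thread. The table, the function, and the per-machine-type values are assumptions inferred from the test result, not copied from the QEMU source.

```python
GiB = 1 << 30
HT_START = 0xFD_0000_0000  # start of the reserved HyperTransport range

# Hypothetical compat table: only the two machine types named in this bug.
ENFORCE_AMD_1TB_HOLE = {
    "pc-q35-rhel9.0.0": False,  # older type keeps the old layout
    "pc-q35-rhel9.2.0": True,   # newer type avoids the hole
}


def relocates_above_1tb(machine_type, host_is_amd, last_used_gpa):
    """Model: relocation needs the flag, an AMD host, and RAM reaching the hole."""
    enforce = ENFORCE_AMD_1TB_HOLE.get(machine_type, False)
    return enforce and host_is_amd and last_used_gpa >= HT_START


# Last guest address for 1298G of RAM starting at 4G, as in comment 22.
last = 4 * GiB + 1298 * GiB - 1

print(relocates_above_1tb("pc-q35-rhel9.2.0", True, last))   # True
print(relocates_above_1tb("pc-q35-rhel9.0.0", True, last))   # False
print(relocates_above_1tb("pc-q35-rhel9.2.0", False, last))  # False
```

Under this model a guest on the older machine type keeps RAM at 4G and still hits the VFIO_MAP_DMA failure, while a guest on the newer type gets the relocated layout.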

Comment 27 Dr. David Alan Gilbert 2022-12-15 12:05:22 UTC
Moving to ONQA since it seems to work in the rc release for me.

Comment 28 Yanghang Liu 2022-12-16 06:48:58 UTC
Hi David,

I feel a little confused about the current bug status.

Is there a downstream qemu-kvm package that QE can use to verify this bug?

Comment 32 Dr. David Alan Gilbert 2022-12-24 00:13:37 UTC
(In reply to Yanghang Liu from comment #28)
> Hi David,
> 
> I feel a little confused about the current bug status.
> 
> Is there a downstream qemu-kvm package that QE can use to verify this bug?

You should find they've just landed in the 7.2.0 rpms created on the 16th and 20th.

Comment 33 Nitesh Narayan Lal 2022-12-29 07:35:24 UTC
Dave, John, can one of you please also share the list of upstream commits so that we can add them to the devel dashboard as required by the QEMU rebase process.
Making this BZ dependent on QEMU 7.2 rebase BZ and adding the fixed in version from qemu 7.2 rebase BZ (2135806).

Comment 37 Yanghang Liu 2023-01-10 09:07:58 UTC
The Reproducer in qemu-kvm-7.0.0-13.el9.x86_64:

[1] import a domain with 1TB memory and a hostdev PF
# virt-install --machine=q35 --noreboot --name=rhel92 --memory=1048576 --vcpus=16 --graphics type=vnc,port=5992,listen=0.0.0.0 --boot=uefi --network bridge=switch,model=virtio,mac=52:54:00:00:92:92 --import --noautoconsole --disk path=/home/images/RHEL92.qcow2,bus=virtio,cache=none,format=qcow2,io=threads,size=20 --hostdev pci_0000_e2_00_0 --osinfo detect=on,require=off

[2] start the domain 
# virsh start rhel92 
error: Failed to start domain 'rhel92'
error: internal error: qemu unexpectedly closed the monitor: 2023-01-10T09:01:06.019103Z qemu-kvm: -device {"driver":"vfio-pci","host":"0000:21:00.0","id":"hostdev0","bus":"pci.3","addr":"0x0"}: VFIO_MAP_DMA failed: Invalid argument
2023-01-10T09:01:06.024601Z qemu-kvm: -device {"driver":"vfio-pci","host":"0000:21:00.0","id":"hostdev0","bus":"pci.3","addr":"0x0"}: vfio 0000:21:00.0: failed to setup container for group 29: memory listener initialization failed: Region pc.ram: vfio_dma_map(0x5563efb50a40, 0x100000000, 0xff80000000, 0x7e8cbfe00000) = -22 (Invalid argument)

note: The same domain but with 4GB memory and a hostdev PF can be started successfully

Comment 38 Yanghang Liu 2023-01-10 09:40:24 UTC
(In reply to Yanghang Liu from comment #37)
> The Reproducer in qemu-kvm-7.0.0-13.el9.x86_64:
> 
> [1] import a domain with 1TB memory and a hostdev PF
> # virt-install --machine=q35 --noreboot --name=rhel92 --memory=1048576
> --vcpus=16 --graphics type=vnc,port=5992,listen=0.0.0.0 --boot=uefi
> --network bridge=switch,model=virtio,mac=52:54:00:00:92:92 --import
> --noautoconsole --disk
> path=/home/images/RHEL92.qcow2,bus=virtio,cache=none,format=qcow2,io=threads,
> size=20 --hostdev pci_0000_e2_00_0 --osinfo detect=on,require=off

sorry for a typo here.

It's "--hostdev pci_0000_21_00_0" instead of "--hostdev pci_0000_e2_00_0"

> [2] start the domain 
> # virsh start rhel92 
> error: Failed to start domain 'rhel92'
> error: internal error: qemu unexpectedly closed the monitor:
> 2023-01-10T09:01:06.019103Z qemu-kvm: -device
> {"driver":"vfio-pci","host":"0000:21:00.0","id":"hostdev0","bus":"pci.3",
> "addr":"0x0"}: VFIO_MAP_DMA failed: Invalid argument
> 2023-01-10T09:01:06.024601Z qemu-kvm: -device
> {"driver":"vfio-pci","host":"0000:21:00.0","id":"hostdev0","bus":"pci.3",
> "addr":"0x0"}: vfio 0000:21:00.0: failed to setup container for group 29:
> memory listener initialization failed: Region pc.ram:
> vfio_dma_map(0x5563efb50a40, 0x100000000, 0xff80000000, 0x7e8cbfe00000) =
> -22 (Invalid argument)
> 
> note: The same domain but with 4GB memory and a hostdev PF can be started successfully

Comment 39 Yanan Fu 2023-01-10 09:48:08 UTC
Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Comment 40 Yanghang Liu 2023-01-10 09:52:38 UTC
The verification in qemu-kvm-7.2.0-2.el9.x86_64:

[1] import a domain with 1TB memory and a hostdev PF
# virt-install --machine=q35 --noreboot --name=rhel92 --memory=1048576 --vcpus=16 --graphics type=vnc,port=5992,listen=0.0.0.0 --boot=uefi --network bridge=switch,model=virtio,mac=52:54:00:00:92:92 --import --noautoconsole --disk path=/home/images/RHEL92.qcow2,bus=virtio,cache=none,format=qcow2,io=threads,size=20 --hostdev pci_0000_21_00_0 --osinfo detect=on,require=off

[2] start the domain  <-- The domain with 1TB memory and a hostdev PF can be started successfully
# virsh start rhel92 

[3] check the PF status in the domain
# ifconfig
enp3s0np0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether 0c:42:a1:d1:d1:c4  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

# dmesg
[    6.058810] mlx5_core 0000:03:00.0: firmware version: 22.35.1012
[    6.058878] mlx5_core 0000:03:00.0: 126.024 Gb/s available PCIe bandwidth, limited by 16.0 GT/s PCIe x8 link at 0000:00:02.2 (capable of 252.048 Gb/s with 16.0 GT/s PCIe x16 link)
[    6.423640] mlx5_core 0000:03:00.0: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[    6.425136] mlx5_core 0000:03:00.0: E-Switch: Total vports 2, per vport: max uc(128) max mc(2048)
[    6.435626] mlx5_core 0000:03:00.0: Port module event: module 0, Cable unplugged
[    6.436919] mlx5_core 0000:03:00.0: mlx5_pcie_event:289:(pid 101): PCIe slot power capability was not advertised.
[    6.454875] mlx5_core 0000:03:00.0: mlx5e: IPSec ESP acceleration enabled
[    6.456300] mlx5_core 0000:03:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
[    6.655209] mlx5_core 0000:03:00.0: Supported tc offload range - chains: 4294967294, prios: 4294967295
[    6.826782] mlx5_core 0000:03:00.0 enp3s0np0: renamed from eth0
[    7.496821] mlx5_core 0000:03:00.0 enp3s0np0: Link down

# lshw -c network -businfo
Bus info          Device     Class          Description
=======================================================
pci@0000:03:00.0  enp3s0np0  network        MT2892 Family [ConnectX-6 Dx]

Comment 41 Yanghang Liu 2023-01-10 09:56:50 UTC
Hi John,

Could you please check comment 37 and comment 40 ?

Is it enough for QE to verify this bug ?

Feel free to let me know if you need QE to do more tests.

Comment 43 Yanghang Liu 2023-01-10 10:11:24 UTC
Hi David,

The detailed host info is as follows:
host name:  dell-per7525-26.lab.eng.pek2.redhat.com
memory size: 1.5T
CPU model: AMD EPYC-Rome
           BIOS Model name:     AMD EPYC 7713 64-Core Processor                
           CPU family:          25
           Model:               1
           Thread(s) per core:  2
           Core(s) per socket:  64
           Socket(s):           2
           Stepping:            1
kernel version: 5.14.0-228.el9.x86_64

Let me know if you want to know more details about the host :)

Comment 47 Yanghang Liu 2023-01-10 14:18:56 UTC
Move bug status to VERIFIED based on comment 37 and comment 40

Comment 48 John Allen (AMD) 2023-01-10 19:12:07 UTC
(In reply to Yanghang Liu from comment #41)
> Hi John,
> 
> Could you please check comment 37 and comment 40 ?
> 
> Is it enough for QE to verify this bug ?
> 
> Feel free to let me know if you need QE to do more tests.

It looks OK to me, but I'm not intimately familiar with the issue. Let me check with IOMMU SMEs here and see if there is any additional testing they would like to see for the bug.

Comment 50 John Allen (AMD) 2023-01-23 17:11:17 UTC
(In reply to John Allen (AMD) from comment #48)
> (In reply to Yanghang Liu from comment #41)
> > Hi John,
> > 
> > Could you please check comment 37 and comment 40 ?
> > 
> > Is it enough for QE to verify this bug ?
> > 
> > Feel free to let me know if you need QE to do more tests.
> 
> It looks OK to me, but I'm not intimately familiar with the issue. Let me
> check with IOMMU SMEs here and see if there is any additional testing they
> would like to see for the bug.

The IOMMU SMEs got back to me and I think we're fine with the testing that has been done. No additional testing desired.

Comment 54 errata-xmlrpc 2023-05-09 07:19:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: qemu-kvm security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:2162

