Bug 1427005
Summary: | [RFE] libvirt support of VT-d protected device assignment | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Peter Xu <peterx> |
Component: | libvirt | Assignee: | Ján Tomko <jtomko> |
Status: | CLOSED ERRATA | QA Contact: | Jingjing Shao <jishao> |
Severity: | medium | Docs Contact: | |
Priority: | high | ||
Version: | 7.4 | CC: | ailan, alex.williamson, dyuan, hhuang, jasowang, jsuchane, jtomko, laine, lhuang, mtessun, peterx, rbalakri, xuzhang, yafu, yalzhang, yama |
Target Milestone: | rc | Keywords: | FutureFeature |
Target Release: | 7.4 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | libvirt-3.2.0-6.el7 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2017-08-01 17:24:15 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1335808 | ||
Bug Blocks: |
Description
Peter Xu
2017-02-27 04:15:20 UTC
Upstream patches:
https://www.redhat.com/archives/libvir-list/2017-March/msg01072.html

Any feedback on:
* how to properly probe if QEMU supports kernel-irqchip=split
* the documentation patches
is especially welcome.

Pushed upstream as:

commit 8023b21a95f271e51810de7f1362e609eaadc1e4
Author:     Ján Tomko <jtomko>
CommitDate: 2017-05-15 15:41:17 +0200

    conf: add <ioapic driver> to <features>

    Add a new <ioapic> element with a driver attribute.
    Possible values are qemu and kvm. With 'qemu', the I/O
    APIC can be put in the userspace even for KVM domains.

    https://bugzilla.redhat.com/show_bug.cgi?id=1427005

commit 6b5c6314b2f7a3b54c94a591e6b0dcd13ef1c6ce
Author:     Ján Tomko <jtomko>
CommitDate: 2017-05-15 15:44:11 +0200

    qemu: format kernel_irqchip on the command line

    Add kernel_irqchip=split/on to the QEMU command line
    and a capability that looks for it in query-command-line-options
    output. For the 'split' option, use a version check
    since it cannot be reasonably probed.

    https://bugzilla.redhat.com/show_bug.cgi?id=1427005

commit 2020e2c6f2656ca1aa9032859ccde76185c37c39
Author:     Ján Tomko <jtomko>
CommitDate: 2017-05-15 15:44:11 +0200

    conf: add <driver intremap> to <iommu>

    Add a new attribute to control interrupt remapping.

    https://bugzilla.redhat.com/show_bug.cgi?id=1427005

commit 04028a9db9f2657e8d57d1e4705073c908aa248c
Author:     Ján Tomko <jtomko>
CommitDate: 2017-05-15 15:44:11 +0200

    qemu: format intel-iommu,intremap on the command line

    https://bugzilla.redhat.com/show_bug.cgi?id=1427005

commit d12781b47eb0c9f3a498d88b632c327aa08aaf8a
Author:     Ján Tomko <jtomko>
CommitDate: 2017-05-15 15:44:11 +0200

    conf: add caching_mode attribute to iommu device

    Add a new attribute to control the caching mode.

    https://bugzilla.redhat.com/show_bug.cgi?id=1427005

commit a56914486ca67f921ee6e3ce26b5787fccb47155
Author:     Ján Tomko <jtomko>
CommitDate: 2017-05-15 15:44:11 +0200

    qemu: format caching-mode on iommu command line

    Format the caching-mode option for the intel-iommu device,
    based on its <driver caching> attribute value.

    https://bugzilla.redhat.com/show_bug.cgi?id=1427005

commit 3a276c6524026b661ed7bee4539fc5387b963611
Author:     Ján Tomko <jtomko>
CommitDate: 2017-05-15 15:44:12 +0200

    conf: split out virDomainIOMMUDefCheckABIStability

commit 935d927aa881753fff30f6236eedcf9680bca638
Author:     Ján Tomko <jtomko>
CommitDate: 2017-05-15 15:44:12 +0200

    conf: add ABI stability checks for IOMMU options

    https://bugzilla.redhat.com/show_bug.cgi?id=1427005

git describe: v3.3.0-47-g935d927

Hi Jan,

I tried to verify this bug but found three issues, listed below.
Can you help check them and give some feedback? Thank you in advance.

(1) If I add the iommu device without caching_mode='on', the host crashes when I attach a VF to the guest.
(2) The value intremap='split' is not allowed in the guest XML, but the error message mentions 'split'.
(3) In the guest, the devices share the same IOMMU group, and the device cannot be attached to the nested guest.

Details:

(1) If I add the following XML to the guest, without caching_mode='on':

<iommu model='intel'/>

or

<ioapic driver='qemu'/>
....
<iommu model='intel'> <driver intremap='on'/> ===> or <driver intremap='off'/> </iommu> # virsh list Id Name State ---------------------------------------------------- 16 q35-js running # cat vf.xml <interface type='hostdev' managed='yes'> <mac address='02:24:6b:89:bc:e9'/> <source> <address type='pci' domain='0x0000' bus='0x86' slot='0x10' function='0x1'/> </source> </interface> # virsh attach-device q35-js vf.xml The host will crash (2) Add the xml as below without add " <ioapic driver='qemu'/> " <iommu model='intel'> <driver intremap='on'/> </iommu> Start the guest and get the error # virsh start q35-js error: Failed to start domain q35-js error: internal error: qemu unexpectedly closed the monitor: 2017-05-26T06:23:33.798217Z qemu-kvm: -chardev pty,id=charserial0: char device redirected to /dev/pts/1 (label charserial0) 2017-05-26T06:23:33.857340Z qemu-kvm: -device intel-iommu,intremap=on: Intel Interrupt Remapping cannot work with kernel-irqchip=on, please use 'split|off'. But If I change the attribute intremap='split' <iommu model='intel'> <driver intremap='split'/> </iommu> I also get error # virsh edit q35-js error: XML document failed to validate against schema: Unable to validate doc against /usr/share/libvirt/schemas/domain.rng Extra element devices in interleave Element domain failed to validate content Failed. Try again? [y,n,i,f,?]: error: XML error: unknown intremap value: split (3) Add the xml as below to guest <ioapic driver='qemu'/> ... <iommu model='intel'> <driver intremap='on' caching_mode='on'/> </iommu> # virsh start q35-js Domain q35-js started # ps -ef | grep iommu .... -device intel-iommu,intremap=on,caching-mode=on ... 
login in the guest # cat /proc/cmdline BOOT_IMAGE=/vmlinuz-3.10.0-663.el7.x86_64 root=/dev/mapper/rhel-root ro console=tty0 console=ttyS0,115200 reboot=pci biosdevname=0 crashkernel=auto rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap rhgb quiet intel_iommu=on # lspci | grep Eth 00:03.0 Ethernet controller: Red Hat, Inc Virtio network device 03:01.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8100/8101L/8139 PCI Fast Ethernet Adapter (rev 20) # virsh nodedev-list --tree +- pci_0000_00_1e_0 | | | +- pci_0000_02_00_0 | | | +- pci_0000_03_01_0 | | | +- net_enp3s1_52_54_00_1c_10_ac # virsh nodedev-dumpxml pci_0000_03_01_0 <device> <name>pci_0000_03_01_0</name> <path>/sys/devices/pci0000:00/0000:00:1e.0/0000:02:00.0/0000:03:01.0</path> <parent>pci_0000_02_00_0</parent> <driver> <name>8139cp</name> </driver> <capability type='pci'> <domain>0</domain> <bus>3</bus> <slot>1</slot> <function>0</function> <product id='0x8139'>RTL-8100/8101L/8139 PCI Fast Ethernet Adapter</product> <vendor id='0x10ec'>Realtek Semiconductor Co., Ltd.</vendor> <iommuGroup number='9'> <address domain='0x0000' bus='0x00' slot='0x1e' function='0x0'/> <address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/> <address domain='0x0000' bus='0x03' slot='0x01' function='0x0'/> </iommuGroup> </capability> </device> # cat pf.xml <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x03' slot='0x01' function='0x0'/> </source> </hostdev> # virsh attach-device rhel7 pf.xml error: Failed to attach device from pf.xml error: internal error: unable to execute QEMU command 'device_add': vfio error: 0000:03:01.0: failed to setup container for group 9: failed to set iommu for container: Device or resource busy 1) not a libvirt bug 2) libvirt should catch that and report a nicer error. Please file a new bug. 3) AFAIK that does not work on bare metal either. 
If the devices are in the same IOMMU group, they all need to be detached from the host before assigning one of them to the guest. (In reply to Ján Tomko from comment #15) > 1) not a libvirt bug I think it is really a bug for it causes the host crash. So is there another bug to track this issue? or need to file a new bug for other component ? > 2) libvirt should catch that and report a nicer error. Please file a new bug. OK. file a new bug for this. https://bugzilla.redhat.com/show_bug.cgi?id=1457610 > 3) AFAIK that does not work on bare metal either. If the devices are in the > same IOMMU group, they all need to be detached from the host before > assigning one of them to the guest. I try this but still get error. the details are as below. libvirt-3.2.0-7.el7.x86_64 # virsh nodedev-list --tree +- pci_0000_b4_00_0 | +- pci_0000_b5_00_0 | +- net_eth0_52_54_00_ee_67_31 # virsh nodedev-dumpxml pci_0000_b5_00_0 <device> <name>pci_0000_b5_00_0</name> <path>/sys/devices/pci0000:b4/0000:b4:00.0/0000:b5:00.0</path> <parent>pci_0000_b4_00_0</parent> <driver> <name>virtio-pci</name> </driver> <capability type='pci'> <domain>0</domain> <bus>181</bus> <slot>0</slot> <function>0</function> <product id='0x1041'>Virtio network device</product> <vendor id='0x1af4'>Red Hat, Inc</vendor> <iommuGroup number='9'> <address domain='0x0000' bus='0xb4' slot='0x00' function='0x0'/> <address domain='0x0000' bus='0xb5' slot='0x00' function='0x0'/> </iommuGroup> <numa node='0'/> <pci-express> <link validity='cap' port='0' speed='2.5' width='1'/> <link validity='sta' speed='2.5' width='1'/> </pci-express> </capability> </device> # virsh nodedev-dumpxml pci_0000_b4_00_0 <====it is pcieport <device> <name>pci_0000_b4_00_0</name> <path>/sys/devices/pci0000:b4/0000:b4:00.0</path> <parent>computer</parent> <driver> <name>pcieport</name> </driver> <capability type='pci'> <domain>0</domain> <bus>180</bus> <slot>0</slot> <function>0</function> <product id='0x3420'>7500/5520/5500/X58 I/O Hub PCI Express 
Root Port 0</product> <vendor id='0x8086'>Intel Corporation</vendor> <capability type='pci-bridge'/> <iommuGroup number='9'> <address domain='0x0000' bus='0xb4' slot='0x00' function='0x0'/> <address domain='0x0000' bus='0xb5' slot='0x00' function='0x0'/> </iommuGroup> <numa node='0'/> <pci-express> <link validity='cap' port='0' speed='2.5' width='1'/> <link validity='sta' speed='2.5' width='1'/> </pci-express> </capability> </device> # virsh nodedev-detach pci_0000_b5_00_0 Device pci_0000_b5_00_0 detached # virsh nodedev-dumpxml pci_0000_b5_00_0 |grep driver -A2 <driver> <name>vfio-pci</name> </driver> # virsh nodedev-detach pci_0000_b4_00_0 Device pci_0000_b4_00_0 detached # virsh nodedev-dumpxml pci_0000_b4_00_0 | grep driver -A # # cat pf.xml <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0xb5' slot='0x00' function='0x0'/> </source> </hostdev> # virsh attach-device rhel7 pf.xml [ 5591.704766] Out of memory: Kill process 4947 (qemu-kvm) score 474 or sacrifice child [ 5591.706008] Killed process 4947 (qemu-kvm) total-vm:2026324kB, anon-rss:184776kB, file-rss:0kB, shmem-rss:0kB error: Failed to attach device from pf.xml error: internal error: child reported: Kernel does not provide mount namespace: No such file or directory # virsh domstate rhel7 --reason shut off (crashed) (In reply to Jingjing Shao from comment #16) > (In reply to Ján Tomko from comment #15) > > 1) not a libvirt bug > > I think it is really a bug for it causes the host crash. So is there another > bug to track this issue? or need to file a new bug for other component ? I found two bugs to track this issues, they may be caused by the same reason. https://bugzilla.redhat.com/show_bug.cgi?id=1441605 https://bugzilla.redhat.com/show_bug.cgi?id=1450309#c3 > > > 3) AFAIK that does not work on bare metal either. 
If the devices are in the
> > same IOMMU group, they all need to be detached from the host before
> > assigning one of them to the guest.
>
> I try this but still get error. the details are as below.

Please help to check the third issue and give me some feedback, thank you in advance.

(In reply to Jingjing Shao from comment #16)
> # virsh attach-device rhel7 pf.xml
> [ 5591.704766] Out of memory: Kill process 4947 (qemu-kvm) score 474 or
> sacrifice child
> [ 5591.706008] Killed process 4947 (qemu-kvm) total-vm:2026324kB,
> anon-rss:184776kB, file-rss:0kB, shmem-rss:0kB

Seems like qemu cannot allocate enough memory. With the iommu device, more
locked memory might be required. Did you try setting the hard_limit to
something huge?

<memtune>
  <hard_limit unit='G'>100</hard_limit>
</memtune>

This should also increase the amount of memory QEMU can lock.

> error: Failed to attach device from pf.xml
> error: internal error: child reported: Kernel does not provide mount
> namespace: No such file or directory
>
> # virsh domstate rhel7 --reason
> shut off (crashed)

I am not sure if this error is relevant after the OOM error. But if it's
relevant, see if you can reproduce the same error with namespaces = [] in
qemu.conf.

(In reply to Ján Tomko from comment #18)
> (In reply to Jingjing Shao from comment #16)
> > # virsh attach-device rhel7 pf.xml
> > [ 5591.704766] Out of memory: Kill process 4947 (qemu-kvm) score 474 or
> > sacrifice child
> > [ 5591.706008] Killed process 4947 (qemu-kvm) total-vm:2026324kB,
> > anon-rss:184776kB, file-rss:0kB, shmem-rss:0kB
>
> Seems like qemu cannot allocate enough memory. With the iommu device, more
> locked memory might be required. Did you try setting the hard_limit to
> something huge?

No, I did not set the hard_limit as below and I add the memory part # virsh dumpxml q35-js <domain type='kvm' id='62'> <name>q35-js</name> <uuid>34cc0dae-8998-480c-b2db-171ce1e7461a</uuid> <memory unit='KiB'>1048576</memory> <currentMemory unit='KiB'>1048576</currentMemory> <vcpu placement='static'>1</vcpu> > > <memtune> > <hard_limit unit='G'>100</hard_limit> > </memtune> > > This should also increase the amount of memory QEMU can lock. > > > error: Failed to attach device from pf.xml > > error: internal error: child reported: Kernel does not provide mount > > namespace: No such file or directory > > > > # virsh domstate rhel7 --reason > > shut off (crashed) > > I am not sure if this error is relevant after the OOM error. But if it's > relevant, see if you can reproduce the same error with namespaces = [] in > qemu.conf. I add the namespaces = [] in the qemu.conf and can not reproduce this error but I got another error as below # virsh attach-device rhel7 pf.xml [ 990.408929] Out of memory: Kill process 2975 (qemu-kvm) score 557 or sacrifice child [ 990.410051] Killed process 2975 (qemu-kvm) total-vm:2054172kB, anon-rss:438364kB, file-rss:12kB, shmem-rss:0kB error: Failed to attach device from pf.xml error: Unable to write to '/sys/fs/cgroup/devices/machine.slice/machine-qemu\x2d1\x2drhel7.scope/devices.deny': No such file or directory # virsh domstate rhel7 --reason shut off (crashed) Can you help to check this issue ? About case (3) in Comment 14 - the problem isn't because the other devices are in the same iommu group as the device you're trying to assign - those other two devices are PCI controllers (a dmi-to-pci-bridge and a pci-bridge), and it's normal for the parent PCI controllers of a device to be in the same IOMMU group. That isn't a problem because they are (or at least *should be*) excepted from the "all devices in the group must be unbound from their host drivers" rule by vfio. 
So something else is causing this error:

error: internal error: unable to execute QEMU command 'device_add': vfio error: 0000:03:01.0: failed to setup container for group 9: failed to set iommu for container: Device or resource busy

(In reply to Jingjing Shao from comment #19)
> I add the namespaces = [] in the qemu.conf and can not reproduce this error
> but I got another error as below
>
> # virsh attach-device rhel7 pf.xml
> [ 990.408929] Out of memory: Kill process 2975 (qemu-kvm) score 557 or
> sacrifice child
> [ 990.410051] Killed process 2975 (qemu-kvm) total-vm:2054172kB,
> anon-rss:438364kB, file-rss:12kB, shmem-rss:0kB

The qemu process was killed by the OOM killer here. Is there enough free memory on the host?
Was this with or without the hard_limit set?

> error: Failed to attach device from pf.xml
> error: Unable to write to
> '/sys/fs/cgroup/devices/machine.slice/machine-qemu\x2d1\x2drhel7.scope/
> devices.deny': No such file or directory

Hi Ján,

Thanks for your patient reply. It really was caused by the memory problem.
I tried a test with an l1 guest with 5G memory and an l2 (nested) guest with 1G memory and got the expected result.

But if the memory of the two guests is not suitable, I still hit the error above.

So do we have some doc for the memory configuration?

i.e.
1. What is the minimum memory of the l1 guest?
2. If the l1 guest has the minimum memory, what is the maximum memory of the nested guest?

I tested the four scenarios below. Can you help check whether they are enough to verify this bug?

1. Host ->virtual network-> l1 guest ->virtual network-> l2 guest
2. Host ->virtual network-> l1 guest -> pci assignment -> l2 guest
3. Host ->vf pci assignment-> l1 guest ->virtual network-> l2 guest
4. Host ->vf pci assignment-> l1 guest ->pci assignment-> l2 guest

And a stretch test:
Host ->vf pci assignment with numa node-> l1 guest ->pci assignment -> l2 guest

I will provide the detailed steps below one by one.

Preparation:
1.
Add the info to l1 guest q35-js ... <features> <ioapic driver='qemu'/> </features> ... <iommu model='intel'> <driver intremap='on' caching_mode='on'/> </iommu> ... 2. start the guest # virsh start q35-js Domain q35-js started 3. Check the qemu command line -device intel-iommu,intremap=on,caching-mode=on scenario 1 : Host ->virtual network-> l1 guest ->virtual network-> l2 guest 1. Prepare a l1 guest with xml as below <interface type='network'> <mac address='52:54:00:ee:67:31'/> <source network='default' bridge='vir'/> <target dev='vnet0'/> <model type='virtio'/> <alias name='net1'/> <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/> </interface> 2. check the device in the l1 guest # virsh nodedev-list --tree +- pci_0000_00_1b_0 | | | +- net_eth0_52_54_00_98_14_7e | # virsh nodedev-dumpxml pci_0000_00_1b_0 <device> <name>pci_0000_00_1b_0</name> <path>/sys/devices/pci0000:00/0000:00:1b.0</path> <parent>computer</parent> <driver> <name>virtio-pci</name> </driver> <capability type='pci'> <domain>0</domain> <bus>0</bus> <slot>27</slot> <function>0</function> <product id='0x1000'>Virtio network device</product> <vendor id='0x1af4'>Red Hat, Inc</vendor> <iommuGroup number='11'> <address domain='0x0000' bus='0x00' slot='0x1b' function='0x0'/> </iommuGroup> </capability> </device> 3. Attach the device with macvtap and add the xml as below to l2 guest rhel7 <interface type='direct'> <mac address='52:54:00:b1:9c:b0'/> <source dev='eth0' mode='bridge'/> <model type='rtl8139'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> </interface> 4. 
start the l2 guest rhel7 and check the device in the l2 guest # virsh nodedev-dumpxml pci_0000_00_02_0 <device> <name>pci_0000_00_02_0</name> <path>/sys/devices/pci0000:00/0000:00:02.0</path> <parent>computer</parent> <driver> <name>8139cp</name> </driver> <capability type='pci'> <domain>0</domain> <bus>0</bus> <slot>2</slot> <function>0</function> <product id='0x8139'>RTL-8100/8101L/8139 PCI Fast Ethernet Adapter</product> <vendor id='0x10ec'>Realtek Semiconductor Co., Ltd.</vendor> </capability> </device> scenario 2 : Host ->virtual network-> l1 guest -> pci assignment -> l2 guest Repeat the step1~2 in scenario 1 ... 3. Start the guest and attach the device to guest with the vfio pci assignment # cat pf3.xml <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x00' slot='0x1b' function='0x0'/> </source> </hostdev> # virsh attach-device rhel7 pf3.xml Device attached successfully # virsh dumpxml rhel7 | grep interface ... <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x00' slot='0x1b' function='0x0'/> </source> <alias name='hostdev0'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> </hostdev> ... 
Check the xml of device # virsh nodedev-dumpxml pci_0000_00_1b_0 <device> <name>pci_0000_00_1b_0</name> <path>/sys/devices/pci0000:00/0000:00:1b.0</path> <parent>computer</parent> <driver> <name>vfio-pci</name> </driver> <capability type='pci'> <domain>0</domain> <bus>0</bus> <slot>27</slot> <function>0</function> <product id='0x1000'>Virtio network device</product> <vendor id='0x1af4'>Red Hat, Inc</vendor> <iommuGroup number='11'> <address domain='0x0000' bus='0x00' slot='0x1b' function='0x0'/> </iommuGroup> </capability> </device> 4.login the l2 guest rhel7 and check the device # lspci 00:02.0 Ethernet controller: Red Hat, Inc Virtio network device (rev 01) # virsh nodedev-dumpxml pci_0000_00_02_0 <device> <name>pci_0000_00_02_0</name> <path>/sys/devices/pci0000:00/0000:00:02.0</path> <parent>computer</parent> <driver> <name>virtio-pci</name> </driver> <capability type='pci'> <domain>0</domain> <bus>0</bus> <slot>2</slot> <function>0</function> <product id='0x1000'>Virtio network device</product> <vendor id='0x1af4'>Red Hat, Inc</vendor> </capability> </device> scenario 3 : Host ->vf pci assignment-> l1 guest ->virtual network-> l2 guest (with numa node) 1. prepare l1 guest with numa configuration, attach vf to the guest, check the dumpxml of device # virsh dumpxml q35-js <numa> <cell id='0' cpus='0' memory='5242880' unit='KiB'/> </numa> .... <interface type='hostdev' managed='yes'> <mac address='02:24:6b:89:bc:e9'/> <driver name='vfio'/> <source> <address type='pci' domain='0x0000' bus='0x86' slot='0x10' function='0x3'/> </source> <alias name='hostdev0'/> <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/> </interface> .... 
#lspci +- pci_0000_b4_01_0 | +- pci_0000_b6_00_0 # virsh nodedev-dumpxml pci_0000_b6_00_0 <device> <name>pci_0000_b6_00_0</name> <path>/sys/devices/pci0000:b4/0000:b4:01.0/0000:b6:00.0</path> <parent>pci_0000_b4_01_0</parent> <driver> <name>vfio-pci</name> </driver> <capability type='pci'> <domain>0</domain> <bus>182</bus> <slot>0</slot> <function>0</function> <product id='0x10ed'>82599 Ethernet Controller Virtual Function</product> <vendor id='0x8086'>Intel Corporation</vendor> <iommuGroup number='15'> <address domain='0x0000' bus='0xb4' slot='0x01' function='0x0'/> <address domain='0x0000' bus='0xb6' slot='0x00' function='0x0'/> </iommuGroup> <numa node='0'/> <pci-express> <link validity='cap' port='0' width='0'/> <link validity='sta' width='0'/> </pci-express> </capability> </device> Check the numa_node # cat /sys/devices/pci0000:b4/0000:b4:01.0/0000:b6:00.0/numa_node 0 2. Create a virtual network whose source is this device to l2 guest rhel7 <interface type='direct'> <mac address='52:54:00:ef:ab:ac'/> <source dev='enp182s0' mode='bridge'/> <target dev='macvtap0'/> <model type='virtio'/> <alias name='net0'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> </interface> login the l2 guest to check this device # lspci 00:02.0 Ethernet controller: Red Hat, Inc Virtio network device scenario 4: Host ->vf pci assignment-> l1 guest ->pci assignment-> l2 guest (With numa node) Repeat the step1 in scenario 3 2,Attach the vf device to l2 guest rhel7 # cat pf.xml <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0xb6' slot='0x00' function='0x0'/> </source> </hostdev> # virsh attach-device rhel7 pf.xml Device attached successfully # virsh dumpxml rhel7 ... 
<hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0xb6' slot='0x00' function='0x0'/> </source> <alias name='hostdev0'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> </hostdev> ... 3. Check the device info in guest2 # lspci 00:03.0 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01) # virsh nodedev-dumpxml pci_0000_00_03_0 <device> <name>pci_0000_00_03_0</name> <path>/sys/devices/pci0000:00/0000:00:03.0</path> <parent>computer</parent> <driver> <name>ixgbevf</name> </driver> <capability type='pci'> <domain>0</domain> <bus>0</bus> <slot>3</slot> <function>0</function> <product id='0x10ed'>82599 Ethernet Controller Virtual Function</product> <vendor id='0x8086'>Intel Corporation</vendor> <pci-express> <link validity='cap' port='0' width='0'/> <link validity='sta' width='0'/> </pci-express> </capability> </device> (In reply to Jingjing Shao from comment #23) > Hi Ján, > > Thanks your patient reply . It really caused by the memory problem. > I try the test which include a l1 guest with 5G memory and a l2 guest > (nested)with 1G memory and get the result as expected. > > But if the memory of two guests is not suitable, I still meet the error info > as above. > > So does we have some doc for the memory configuration ? > Yes. We have documented this to be an undecidable problem: http://libvirt.org/formatdomain.html#elementsMemoryTuning Since commit 7e66766 using <memoryBacking><locked> implies no limit on locked memory. (setting <hard_limit> also works, because it also influences the locked memory limit) > i.e. > 1. what is minimum memory of l1 guest? > 2. If the l1 guest is with minimum memory, what is maximum memory of nested > guest? 
So the unhelpful but honest answer to these questions is:
the minimum/maximum that successfully works for your use-case
+/- some memory to reduce the chance of failing later

>
> And I test with four scenarios as below and can you help to check if they
> are enough to verify this bug?

Yes, these look sufficient to me.

>
> 1. Host ->virtual network-> l1 guest ->virtual network-> l2 guest
> 2. Host ->virtual network-> l1 guest -> pci assignment -> l2 guest
> 3. Host ->vf pci assignment-> l1 guest ->virtual network-> l2 guest
> 4. Host ->vf pci assignment-> l1 guest ->pci assignment-> l2 guest
>
> And stretch test
> Host ->vf pci assignment with numa node-> l1 guest ->pci assignment -> l2
> guest

According to comment 29, changing the status to verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1846
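
For anyone repeating the verification steps from this report, the configuration pitfalls seen in the comments can be checked before starting the guest. The following is a minimal sketch in Python, not libvirt's own validation: the function name check_vtd_config is hypothetical, and the three rules only encode what was observed in this thread (libvirt rejecting intremap='split', QEMU refusing intremap=on without a userspace I/O APIC, and the host crash reported when a VF was attached without caching_mode='on'). It assumes <ioapic> sits under <features> and <iommu> under <devices>, as in the XML shown above.

```python
import xml.etree.ElementTree as ET


def check_vtd_config(domain_xml):
    """Return a list of problems found in a libvirt domain XML string.

    Hypothetical helper: the rules mirror observations from this bug
    report only; they are not libvirt's validation logic.
    """
    root = ET.fromstring(domain_xml)
    problems = []

    # <ioapic driver='qemu'/> is expected under <features>.
    ioapic = root.find("./features/ioapic")
    ioapic_driver = ioapic.get("driver") if ioapic is not None else None

    # <iommu model='intel'> is expected under <devices>.
    iommu = root.find("./devices/iommu")
    if iommu is None:
        return ["no <iommu> device found under <devices>"]

    driver = iommu.find("driver")
    intremap = driver.get("intremap") if driver is not None else None
    caching = driver.get("caching_mode") if driver is not None else None

    # libvirt reported "unknown intremap value: split" in this thread.
    if intremap not in (None, "on", "off"):
        problems.append("unknown intremap value: %s" % intremap)

    # QEMU refused intremap=on while kernel-irqchip=on was in effect.
    if intremap == "on" and ioapic_driver != "qemu":
        problems.append("intremap='on' requires <ioapic driver='qemu'/>")

    # Advisory only: the crash without caching_mode='on' was tracked
    # in other bugs, not fixed by this one.
    if caching != "on":
        problems.append("caching_mode='on' is not set")

    return problems


if __name__ == "__main__":
    # The configuration from case (2) in comment 14.
    sample = """
    <domain type='kvm'>
      <devices>
        <iommu model='intel'>
          <driver intremap='on'/>
        </iommu>
      </devices>
    </domain>
    """
    for problem in check_vtd_config(sample):
        print(problem)
```

The check returns strings rather than raising, so it can be run against the output of virsh dumpxml and report every issue at once. The memory sizing question from the later comments is deliberately left out, since the thread concludes there is no fixed minimum.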