Description of problem:

VM fails to run with a vhostuser server socket on ppc64le.

First of all, I ran:

virt-install --connect=qemu:///system \
--network vhostuser,source_type=unix,source_path=/tmp/vhost-sock0,source_mode=server,model=virtio,driver_queues=2 \
--network network=default \
--name=rhel_loopback \
--disk path=/opt/images/rhel-8.3-ppc64le-kvm.qcow2,format=qcow2 \
--ram 8192 \
--memorybacking hugepages=on,hugepages.page0.size=2,hugepages.page0.unit=M \
--vcpus=4,cpuset=73,74,75,76 \
--numatune mode=strict,nodeset=8 \
--nographics --noautoconsole \
--import

The guest domain can be created successfully, but it crashes after a while. When I add the option '--qemu-commandline="-machine ic-mode=xics,kernel-irqchip=on"' to the virt-install command, the system can boot, but the vhostuser server port still cannot be used.

Version-Release number of selected component (if applicable):
DPDK version: 19.11.3
Open vSwitch version: 2.13.2
RPM: openvswitch2.13-2.13.0-77.el8fdp.ppc64le
System: Power 9 with RHEL-8.3.0
libvirt version: 6.6.0, package: 12.module+el8.3.1+9458+e57b3fac (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2021-01-14-16:23:49)
qemu version: 4.2.0, qemu-kvm-4.2.0-34.module+el8.3.0+8829+e7a0a3ea.1
kernel: 4.18.0-240.el8.ppc64le
hostname: netqe-p9-03.lab3.eng.bos.redhat.com

How reproducible:

Steps to Reproduce:
1. Install openvswitch on the Power host and run it as the root user, because DPDK on Power can only run successfully as root:

yum install -y http://download-node-02.eng.bos.redhat.com/brewroot/packages/openvswitch-selinux-extra-policy/1.0/18.el8fdp/noarch/openvswitch-selinux-extra-policy-1.0-18.el8fdp.noarch.rpm
yum install -y http://download-node-02.eng.bos.redhat.com/brewroot/packages/openvswitch2.13/2.13.0/77.el8fdp/ppc64le/openvswitch2.13-2.13.0-77.el8fdp.ppc64le.rpm
sed -i -e 's/OVS_USER_ID="openvswitch:hugetlbfs"/OVS_USER_ID="root:root"/' /etc/sysconfig/openvswitch
systemctl enable openvswitch
systemctl start openvswitch

2. Create a dpdkvhostuserclient port on the OVS bridge (a minimal sketch follows the Expected results section below).

3. Install libvirt and set the qemu group configuration to hugetlbfs:

sed -i -e 's/#group = "root"/group = "hugetlbfs"/' /etc/libvirt/qemu.conf

4. Enable and start libvirtd:

systemctl enable libvirtd
systemctl start libvirtd

5. Create a guest with a vhostuser server port:

virt-install --connect=qemu:///system \
--network vhostuser,source_type=unix,source_path=/tmp/vhost-sock0,source_mode=server,model=virtio,driver_queues=2 \
--network network=default \
--name=rhel_loopback \
--disk path=/opt/images/rhel-8.3-ppc64le-kvm.qcow2,format=qcow2 \
--ram 8192 \
--memorybacking hugepages=on,hugepages.page0.size=2,hugepages.page0.unit=M,locked=yes,access.mode=shared \
--cpu numa.cell0.memory=8388608,numa.cell0.cpus=0-3 \
--vcpus=4,cpuset=73,74,75,76 \
--numatune mode=strict,nodeset=8 \
--nographics \
--noautoconsole \
--qemu-commandline=" -M pseries,ic-mode=xics,kernel-irqchip=on" \
--import

Actual results:
The guest rhel_loopback should be running. However, without --qemu-commandline=" -M pseries,ic-mode=xics,kernel-irqchip=on" the VM rhel_loopback shuts down after about 10 seconds: I can create the guest successfully, but cannot start it, because of the vhostuser interface issue. Even with the --qemu-commandline=" -M pseries,ic-mode=xics,kernel-irqchip=on" option, the vhostuser server port still cannot communicate with the client port on the OVS bridge.

Expected results:
The VM runs, and the vhostuser server and client can interact with each other.
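Here is a minimal sketch of step 2 above (creating the dpdkvhostuserclient port). It mirrors the full commands given later in this report; the bridge name, port name, and socket path are the ones used in the virt-install command:

ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
systemctl restart openvswitch
ovs-vsctl add-br ovs_pvp_br0 -- set bridge ovs_pvp_br0 datapath_type=netdev
ovs-vsctl add-port ovs_pvp_br0 vhost0 -- \
    set Interface vhost0 type=dpdkvhostuserclient -- \
    set Interface vhost0 options:vhost-server-path="/tmp/vhost-sock0"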
Additional info:

Here are some related bugs on x86_64. I am not sure whether running DPDK as the root user, rather than as hugetlbfs, is the key to this problem.

https://docs.openvswitch.org/en/latest/topics/dpdk/vhost-user/?highlight=dpdk
https://git.qemu.org/?p=qemu.git;a=blob;f=docs/specs/vhost-user.txt;h=7890d7169;hb=HEAD

vhostuser socket creation fails due to selinux
https://bugzilla.redhat.com/show_bug.cgi?id=1597285

Fixing the permission mismatch for DPDK vhost user ports with openvswitch and qemu
https://bugzilla.redhat.com/show_bug.cgi?id=1478791

So I opened this bug to discuss whether there is a workaround for DPDK vhost user ports with openvswitch and qemu on Power Systems (ppc64le).
Ping, can you please attach debug logs? Also, does this happen solely on ppc64le?
I mainly work on ppc64le, but my teammates ran the same test on x86_64 and said there are no issues; they could not reproduce it when using the hugetlbfs user to run openvswitch, libvirt and qemu on an x86_64 machine.
Created attachment 1764579 [details] virt-install command without ic-mode=xics,kernel-irqchip=on dmesg
Created attachment 1764580 [details] withoutxics the log of var_log_libvirt_qemu_rhel_loopback.log
Created attachment 1764581 [details] withoutxics-_var_log_openvswitch_ovs-vswitchd
Created attachment 1764582 [details] withoutxics.domain.xml XML file of the domain generated by the virt-install command without the option --qemu-commandline=" -M pseries,ic-mode=xics,kernel-irqchip=on"
Created attachment 1764583 [details] withxics the log of var_log_libvirt_qemu_rhel_loopback.log The log of /var/log/libvirt/qemu/rhel_loopback.log when adding the virt-install command option: --qemu-commandline=" -M pseries,ic-mode=xics,kernel-irqchip=on"
Created attachment 1764585 [details] the domain xml file of the rhel_loopback when with xics option The domain XML file of rhel_loopback when using the virt-install command with the option --qemu-commandline=" -M pseries,ic-mode=xics,kernel-irqchip=on"
Created attachment 1764586 [details] withxics the log of withxics-_var_log_openvswitch_ovs-vswitchd.log The log of /var/log/openvswitch/ovs-vswitchd.log when using the virt-install command with --qemu-commandline=" -M pseries,ic-mode=xics,kernel-irqchip=on"
From attached withoutxics_var_log_libvirt_qemu_rhel_loopback.log.txt:

2021-03-19T02:57:45.884305Z qemu-kvm: -chardev socket,id=charnet0,path=/tmp/vhost-sock0,server: info: QEMU waiting for connection on: disconnected:unix:/tmp/vhost-sock0,server
char device redirected to /dev/pts/1 (label charserial0)
2021-03-19T02:57:46.809705Z qemu-kvm: Failed to read from slave.
2021-03-19T02:58:03.711792Z qemu-kvm: warning: kernel_irqchip allowed but unavailable: IRQ_XIVE capability must be present for KVM
Falling back to kernel-irqchip=off
2021-03-19 02:58:19.805+0000: shutting down, reason=crashed

This means that qemu crashed. I'm not sure it's because of that warning it printed, but looking into the code, this is the XML snippet that enables kernel_irqchip:

<features>
  <ioapic driver='qemu'/>
</features>

I did not find anything for '-machine ic-mode=xics' and honestly, I have no idea what it does.

But I guess the reason that ovs disconnects from the socket (leaving the NIC unusable from inside the VM) has something to do with this (from attached withoutxics-_var_log_openvswitch_ovs-vswitchd-1.log):

2021-03-19T02:49:59.289Z|00020|dpdk|WARN|EAL: No available hugepages reported in hugepages-1048576kB
2021-03-19T02:52:43.055Z|00097|dpif_netdev|ERR|There is no available (non-isolated) pmd thread for port 'dpdk0' queue 0. This queue will not be polled. Is pmd-cpu-mask set to zero? Or are all PMDs isolated to other queues?
2021-03-19T02:52:43.055Z|00098|dpif_netdev|ERR|There is no available (non-isolated) pmd thread for port 'dpdk0' queue 1. This queue will not be polled. Is pmd-cpu-mask set to zero? Or are all PMDs isolated to other queues?
2021-03-19T02:57:50.206Z|00117|netdev_dpdk|ERR|Failed to create mempool "ovscc694b5f00021580016384" with a request of 8192 mbufs
2021-03-19T02:57:50.206Z|00118|netdev_dpdk|ERR|Failed to create memory pool for netdev vhost0, with MTU 1500 on socket 0: Cannot allocate memory
2021-03-19T02:57:50.206Z|00119|dpif_netdev|ERR|Failed to set interface vhost0 new configuration

Anyway, I don't think this is a libvirt issue. I'm not sure what's the correct component to switch the bug to for further investigation.

BTW: I can see you enable 2MiB hugepages - are those available on ppc hosts? I had no idea.
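Given the mempool allocation failure above, it may be worth checking hugepage availability directly on the host. A hedged sketch (node8 is the NUMA node OVS was configured to use elsewhere in this report):

cat /sys/devices/system/node/node8/hugepages/hugepages-2048kB/free_hugepages
grep -i huge /proc/meminfo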
(In reply to Michal Privoznik from comment #10)
> From attached withoutxics_var_log_libvirt_qemu_rhel_loopback.log.txt:
> ...
> 2021-03-19 02:58:19.805+0000: shutting down, reason=crashed
> 
> This means that qemu crashed. ...
> I did not find anything for '-machine ic-mode=xics' and honestly, I have no
> idea what it does.

Hi Michal, qemu would crash at first. Then, with the option '--qemu-commandline=" -M pseries,ic-mode=xics,kernel-irqchip=on"' appended to virt-install, it could boot successfully. Ping uploaded six attachments; I think you should focus on the attachments mentioned at comment 7, comment 8 and comment 9, which were captured when the guest booted successfully with the option '--qemu-commandline=" -M pseries,ic-mode=xics,kernel-irqchip=on"'.

After the VM booted successfully, we failed to run 'testpmd' in the VM. I suspect it is an issue related to dpdk or the IOMMU type/group; please help check the output below:

[root@localhost ~]# testpmd -c 0x7 -n 4 --socket-mem 1024,0 -w 0001:00:01.0 -- --burst 64 -i --rxq=2 --txq=2 --rxd=4096 --txd=1024 --coremask=0x6 --auto-start --port-topology=chained --log-level=0
EAL: Detected 4 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: No available hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: WARNING! Base virtual address hint (0x100ab0000 != 0x7ffb9fe00000) not respected!
EAL: This may cause issues with mapping memory into secondary processes
EAL: WARNING! Base virtual address hint (0x101720000 != 0x7ff79fc00000) not respected!
EAL: This may cause issues with mapping memory into secondary processes
EAL: WARNING! Base virtual address hint (0x102390000 != 0x7ff39fa00000) not respected!
EAL: This may cause issues with mapping memory into secondary processes
EAL: WARNING! Base virtual address hint (0x103000000 != 0x7fef9f800000) not respected!
EAL: This may cause issues with mapping memory into secondary processes
EAL: PCI device 0001:00:01.0 on NUMA socket -1
EAL:   Invalid NUMA socket, default to 0
EAL:   probe driver: 1af4:1000 net_virtio
EAL: 0001:00:01.0 failed to select IOMMU type
EAL: Can't write to PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (12)
EAL: Can't write to PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (12)
EAL: Can't write to PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (0)
EAL: Can't read from PCI bar (0) : offset (1e)
EAL: Can't write to PCI bar (0) : offset (4)
EAL: Can't read from PCI bar (0) : offset (14)
EAL: Can't read from PCI bar (0) : offset (18)
EAL: Can't write to PCI bar (0) : offset (e)
EAL: Can't read from PCI bar (0) : offset (c)
virtio_init_queue(): virtqueue does not exist
EAL: fail to disable req notifier.
EAL: fail to disable req notifier.
EAL: Requested device 0001:00:01.0 cannot be used
testpmd: No probed ethernet devices
Interactive-mode selected
Fail: input rxq (2) can't be greater than max_rx_queues (0) of port 0
EAL: Error - exiting with code: 1
  Cause: rxq 2 invalid - must be >= 0 && <= 0
(In reply to Jianwen Ji from comment #11)
> Hi Michal, qemu would crash at first. Then, with the option
> '--qemu-commandline=" -M pseries,ic-mode=xics,kernel-irqchip=on"' appended
> to virt-install, it could boot successfully. Ping uploaded six attachments;
> I think you should focus on the attachments mentioned at comment 7, comment
> 8 and comment 9, which were captured when the guest booted successfully with
> that option.

Yeah, I'm not saying this isn't a bug. It clearly is. However, nothing indicates this is a libvirt bug. Let's switch over to qemu for further analysis.

> After the VM booted successfully, we failed to run 'testpmd' in the VM. I
> suspect it is an issue related to dpdk or the IOMMU type/group; please help
> check the output below:
> 
> [root@localhost ~]# testpmd -c 0x7 -n 4 --socket-mem 1024,0 -w 0001:00:01.0
> -- --burst 64 -i --rxq=2 --txq=2 --rxd=4096 --txd=1024 --coremask=0x6
> --auto-start --port-topology=chained --log-level=0

Where could one get this testpmd utility?
(In reply to Michal Privoznik from comment #12)
> Yeah, I'm not saying this isn't a bug. It clearly is. However, nothing
> indicates this is a libvirt bug. Let's switch over to qemu for further
> analysis.
> ...
> Where could one get this testpmd utility?

This tool is provided by the dpdk package. Please install dpdk, then testpmd or dpdk-testpmd will be installed under the /usr/bin/ directory.
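For example (a hedged sketch; as noted above, the exact binary name depends on the dpdk package version):

yum install -y dpdk
rpm -ql dpdk | grep testpmd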
Let me put some background here. We are doing Open vSwitch PVP testing on a Power9 machine with RHEL-8.3.0 by following the steps at [1] (which describes the setup on an x86_64 machine). Due to some limitations and differences between x86_64 and ppc64le, in order to make OVS/Qemu boot successfully we applied a few workarounds and specific configuration on ppc64le; please refer to the bug Description for the workarounds we did. Generally we followed [1].

Now we are stuck at the section 'You can quickly check if your VM is setup correctly by starting testpmd as follows' of the chapter 'Create the loopback Virtual Machine' at [1]; please see the testpmd output mentioned at comment 11. We can't identify what the root cause of the testpmd failure is; maybe it is a dpdk issue, a qemu issue, a ppc64le issue or something else.

[1] https://github.com/chaudron/ovs_perf/blob/RHEL8/README.md#full-day-pvp-test
(In reply to Michal Privoznik from comment #10)
> ...
> BTW: I can see you enable 2MiB hugepages - are those available on ppc hosts?
> I had no idea.

For hugepages on Power systems, as doc [1] describes, the static huge page sizes on IBM POWER8 systems are 16MiB and 16GiB, as opposed to 2MiB and 1GiB on AMD64 and Intel 64 and on IBM POWER9.

[1] https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/virtualization_deployment_and_administration_guide/appe-kvm_on_multiarch
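A quick way to confirm which sizes the running kernel actually supports on a given host (a hedged sketch):

ls /sys/kernel/mm/hugepages/
# on a POWER9 host this typically lists hugepages-2048kB and hugepages-1048576kB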
Hi,

I'm trying to reproduce the problem on P9.

(In reply to Ping Zhang from comment #0)
...
> 2.create a dpdkvhostuserclient on the ovs bridge

Could you provide:
- the commands you use to create the ovs bridge
- the commands you use to create the dpdkvhostuserclient on the bridge
- the host kernel command line
- the content of /proc/meminfo

Thanks
Thanks Laurent for quickly investigating this bug.

Let me clarify 2 points first:
1. The warning "kernel_irqchip allowed but unavailable: IRQ_XIVE capability must be present for KVM" is in line with expectations. If the host firmware is too old, this message will always be reported when launching a guest.
2. For "QEMU waiting for connection on: disconnected:unix:/tmp/vhost-sock0,server", you can refer to https://docs.openvswitch.org/en/latest/topics/dpdk/vhost-user/?highlight=dpdk#vhost-user-client
> If the corresponding dpdkvhostuserclient port has not yet been configured in OVS with vhost-server-path=/path/to/socket, QEMU will print a log similar to the following:
>> QEMU waiting for connection on: disconnected:unix:/path/to/socket,server

I am also interested in the steps used to configure the ovs bridge. I tried to reproduce it with the steps provided by BZ 1516114, and host and guest can ping each other.

OVS configuration:

ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
systemctl restart openvswitch
ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev
ovs-vsctl add-port br0 vhost-vm-1 -- set Interface vhost-vm-1 type=dpdkvhostuserclient options:vhost-server-path=/tmp/vhost-vm-1
ip addr add 192.168.2.1/24 dev br0; ip link set dev br0 up

ovs-vsctl show
1ffdc4e6-63e1-427d-b2eb-c69e2491be4d
    Bridge br0
        datapath_type: netdev
        Port vhost-vm-1
            Interface vhost-vm-1
                type: dpdkvhostuserclient
                options: {vhost-server-path="/tmp/vhost-vm-1"}
        Port br0
            Interface br0
                type: internal
        Port vhost-vm1
            Interface vhost-vm1
                type: dpdkvhostuser
    ovs_version: "2.13.4"

Launch guest with:

-chardev socket,id=charnet0,path=/tmp/vhost-vm-1,server \
-netdev vhost-user,chardev=charnet0,queues=2,id=hostnet0 \
-device virtio-net-pci,mq=on,vectors=6,netdev=hostnet0,id=net0,mac=52:54:00:98:d6:d7,bus=pci.0,addr=0x6
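If it helps, two quick checks on the OVS side (a hedged sketch; the port name matches the configuration above):

# did the vhost-user socket actually connect?
ovs-vsctl --columns=name,status list Interface vhost-vm-1
# are the port's rx queues assigned to a PMD thread?
ovs-appctl dpif-netdev/pmd-rxq-show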
(In reply to Yihuang Yu from comment #17)
> Thanks Laurent for quickly investigating this bug.
> ...
> I am also interested in the steps used to configure the ovs bridge. I tried
> to reproduce it with the steps provided by BZ 1516114, and host and guest
> can ping each other.

Hi Yihuang, as mentioned at comment 14, currently we are stuck with failing to run 'testpmd' in the VM. Ping will reply to the questions asked by Laurent at comment 16.
(In reply to Laurent Vivier from comment #16)
> Hi,
> 
> I'm trying to reproduce the problem on P9.
> ...
> Could you provide:
> - the commands you use to create the ovs bridge
> - the commands you use to create the dpdkvhostuserclient on the bridge
> 
> Thanks

#Disable SELinux
sed -i -e 's/SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config
setenforce permissive

# Set the hugepages
# Because we reserve 16G of memory for hugepages on this P9 system,
# the maximum number of 1G hugepages that can be set is 16,
# and the maximum number of 2M hugepages that can be set is 8192.
sed -i -e 's/GRUB_CMDLINE_LINUX="/GRUB_CMDLINE_LINUX="default_hugepagesz=2M hugepagesz=2M hugepages=8192 /' /etc/default/grub
grub2-mkconfig -o /boot/grub2/grub.cfg

Our Power Systems are multi-NUMA systems, so the cores we assign to both Open vSwitch and Qemu need to be on the same NUMA node as the network card. For some more background information on this, see the "OVS-DPDK Parameters: Dealing with multi-NUMA" blog post.

[root@netqe-p9-03 ~]# lscpu |grep -E "^CPU\(s\)|On-line|Thread\(s\) per core"
CPU(s):              128
On-line CPU(s) list: 0-127
Thread(s) per core:  4

[root@netqe-p9-03 ~]# lstopo-no-graphics
Machine (252GB total)
  NUMANode L#0 (P#0 124GB)
    Package L#0
      L3 L#0 (10MB) + L2 L#0 (512KB)
        L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
          PU L#0 (P#0)
          PU L#1 (P#1)
          PU L#2 (P#2)
          PU L#3 (P#3)
......
  NUMANode L#1 (P#8 128GB)
    Package L#1
      L3 L#8 (10MB) + L2 L#8 (512KB)
        L1d L#16 (32KB) + L1i L#16 (32KB) + Core L#16
          PU L#64 (P#64)
          PU L#65 (P#65)
          PU L#66 (P#66)
          PU L#67 (P#67)
  HostBridge L#9
    PCIBridge
      PCI 8086:1583
        Net L#11 "enP48p1s0f0"
      PCI 8086:1583
        Net L#12 "enP48p1s0f1"

The card I used for testing is the 40G XL710 on NUMA node 8 of this P9 system.

Now we apply the cpu-partitioning profile and configure the isolated core mask:

# Isolated_cpu
yum -y install driverctl tuned tuned-profiles-cpu-partitioning lshw numactl rdma-core libibverbs
systemctl enable tuned
systemctl start tuned
echo isolated_cores=1-31,65-95 >> /etc/tuned/cpu-partitioning-variables.conf
tuned-adm profile cpu-partitioning
sed -i -e 's/GRUB_CMDLINE_LINUX="/GRUB_CMDLINE_LINUX="isolcpus=1-31,65-95 /' /etc/default/grub
grub2-editenv - unset kernelopts
grub2-mkconfig -o /boot/grub2/grub.cfg
reboot

#Setup Open vSwitch
yum install -y http://download-node-02.eng.bos.redhat.com/brewroot/packages/openvswitch-selinux-extra-policy/1.0/23.el8fdp/noarch/openvswitch-selinux-extra-policy-1.0-23.el8fdp.noarch.rpm
yum install -y http://download-node-02.eng.bos.redhat.com/brewroot/packages/openvswitch2.13/2.13.0/77.el8fdp/ppc64le/openvswitch2.13-2.13.0-77.el8fdp.ppc64le.rpm

#XL710-Q2 40G
driverctl -v set-override 0030:01:00.0 vfio-pci
driverctl -v set-override 0030:01:00.1 vfio-pci

To make sure dpdk and openvswitch run as the root user, modify the config file /etc/sysconfig/openvswitch:

sed -i -e 's/OVS_USER_ID="openvswitch:hugetlbfs"/OVS_USER_ID="root:root"/' /etc/sysconfig/openvswitch

Then start Open vSwitch, and automatically start it after every reboot:

systemctl enable openvswitch
systemctl start openvswitch

Create 8192 hugepages for dpdk. The XL710 card is on NUMA node 8, so:

# XL710-Q2 40G
echo 8192 > /sys/devices/system/node/node8/hugepages/hugepages-2048kB/nr_hugepages

# XL710-Q2 40G
ovs-vsctl set Open_vSwitch . other_config={}
ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem="0,0,0,0,0,0,0,0,2048"
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x1f80000000000000000
ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x40000000000000000

# Do not forget this step after configuring dpdk
systemctl restart openvswitch

For the Physical to Virtual back to Physical (PVP) test we only need one bridge with two ports. In addition, we will configure our interfaces with 2 receive queues:

ovs-vsctl --if-exists del-br ovs_pvp_br0
ovs-vsctl add-br ovs_pvp_br0 -- \
    set bridge ovs_pvp_br0 datapath_type=netdev
# XL710-Q2 40G
ovs-vsctl add-port ovs_pvp_br0 dpdk0 -- \
    set Interface dpdk0 type=dpdk -- \
    set Interface dpdk0 options:dpdk-devargs=0030:01:00.0 -- \
    set interface dpdk0 options:n_rxq=2 \
        other_config:pmd-rxq-affinity="0:68,1:69" -- \
    set Interface dpdk0 ofport_request=1
ovs-vsctl add-port ovs_pvp_br0 vhost0 -- \
    set Interface vhost0 type=dpdkvhostuserclient -- \
    set Interface vhost0 options:vhost-server-path="/tmp/vhost-sock0" -- \
    set interface vhost0 options:n_rxq=2 \
        other_config:pmd-rxq-affinity="0:68,1:69" -- \
    set Interface vhost0 ofport_request=2

Above are all the commands I used to create the ovs bridge and the dpdkvhostuserclient on the bridge.

> - the host kernel command line

[root@netqe-p9-03 ~]# cat /proc/cmdline
root=/dev/mapper/rhel_netqe--p9--03-root ro isolcpus=1-31,65-95 default_hugepagesz=2M hugepagesz=2M hugepages=8192 crashkernel=auto rd.lvm.lv=rhel_netqe-p9-03/root rd.lvm.lv=rhel_netqe-p9-03/swap skew_tick=1 nohz=on nohz_full=1-31,65-95 rcu_nocbs=1-31,65-95 tuned.non_isolcpus=ffffffff,00000001,ffffffff,00000001 intel_pstate=disable nosoftlockup

> - the content of /proc/meminfo

[root@netqe-p9-03 ~]# cat /proc/meminfo
MemTotal:       263733120 kB
MemFree:        226488768 kB
MemAvailable:   225854720 kB
Buffers:             4352 kB
Cached:            368960 kB
SwapCached:             0 kB
Active:            343872 kB
Inactive:          244288 kB
Active(anon):      246336 kB
Inactive(anon):     18176 kB
Active(file):       97536 kB
Inactive(file):    226112 kB
Unevictable:        74560 kB
Mlocked:            74560 kB
SwapTotal:        4194240 kB
SwapFree:         4194240 kB
Dirty:                  0 kB
Writeback:              0 kB
AnonPages:         290624 kB
Mapped:            154560 kB
Shmem:              30592 kB
KReclaimable:      183424 kB
Slab:             1670336 kB
SReclaimable:      183424 kB
SUnreclaim:       1486912 kB
KernelStack:        23552 kB
PageTables:          4096 kB
NFS_Unstable:           0 kB
Bounce:                 0 kB
WritebackTmp:           0 kB
CommitLimit:    127672192 kB
Committed_AS:      983744 kB
VmallocTotal:   549755813888 kB
VmallocUsed:            0 kB
VmallocChunk:           0 kB
Percpu:            188416 kB
HardwareCorrupted:      0 kB
AnonHugePages:          0 kB
ShmemHugePages:         0 kB
ShmemPmdMapped:         0 kB
CmaTotal:        13434880 kB
CmaFree:         13434880 kB
HugePages_Total:     8192
HugePages_Free:      7168
HugePages_Rsvd:         0
HugePages_Surp:         0
Hugepagesize:        2048 kB
Hugetlb:         16777216 kB

[root@netqe-p9-03 ~]# lscpu
Architecture:        ppc64le
Byte Order:          Little Endian
CPU(s):              128
On-line CPU(s) list: 0-127
Thread(s) per core:  4
Core(s) per socket:  16
Socket(s):           2
NUMA node(s):        2
Model:               2.2 (pvr 004e 1202)
Model name:          POWER9, altivec supported
CPU max MHz:         3800.0000
CPU min MHz:         2166.0000
L1d cache:           32K
L1i cache:           32K
L2 cache:            512K
L3 cache:            10240K
NUMA node0 CPU(s):   0-63
NUMA node8 CPU(s):   64-127

[root@netqe-p9-03 ~]# virsh freepages --all
Node 0:
64KiB: 1601892
2048KiB: 4096
1048576KiB: 0

Node 0:
64KiB: 1601892
2048KiB: 4096
1048576KiB: 0
Created attachment 1766698 [details] the detailed commands used to configure the test environment
Ping,

thank you for all the new details.

Could you check the host kernel logs to see if you have any KVM related errors?

I'm trying to run some basic DPDK tests on a P9 and qemu exits for no apparent reason, but I have the following error in the host kernel logs:

[17777.759100] CPU 2/KVM[14222]: unhandled signal 11 at 0000000000000028 nip 000000013482aefc lr 000000013482aef8 code 1
(In reply to Laurent Vivier from comment #23)
> Could you check the host kernel logs to see if you have any KVM related
> errors?
> ...
> [17777.759100] CPU 2/KVM[14222]: unhandled signal 11 at 0000000000000028 nip
> 000000013482aefc lr 000000013482aef8 code 1

Tested with a 5.12.0-rc4 host kernel, same result.
(In reply to Laurent Vivier from comment #23)
> I'm trying to run some basic DPDK tests on a P9 and qemu exits for no
> apparent reason, but I have the following error in the host kernel logs:
> 
> [17777.759100] CPU 2/KVM[14222]: unhandled signal 11 at 0000000000000028 nip
> 000000013482aefc lr 000000013482aef8 code 1

Laurent,

We had similar host kernel error logs at first. After we appended the option '-M pseries,ic-mode=xics,kernel-irqchip=on' to qemu-kvm, or '--qemu-commandline=" -M pseries,ic-mode=xics,kernel-irqchip=on"' to virt-install, qemu could boot successfully and no more of the above host kernel error logs appeared.

You may also need to do this configuration:
sed -i -e 's/#group = "root"/group = "hugetlbfs"/' /etc/libvirt/qemu.conf
It seems the problem can depend on the host type.

With my simple test, I don't have the KVM error with an IBM 8335-GTW (witherspoon), but I do have the KVM error with a SuperMicro 9006-22P (p9dsu2u).

Ping,

could you check your server type, and if it's not an IBM Witherspoon, re-run your test on a Witherspoon (without the '-M pseries,ic-mode=xics,kernel-irqchip=on' parameter)?

Thanks
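One way to check the machine type from the running host (a hedged sketch; on POWER hosts the model string is exposed through the device tree):

cat /proc/device-tree/model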
(In reply to Jianwen Ji from comment #11)
...
> [root@localhost ~]# testpmd -c 0x7 -n 4 --socket-mem 1024,0 -w 0001:00:01.0
> -- --burst 64 -i --rxq=2 --txq=2 --rxd=4096 --txd=1024 --coremask=0x6
> --auto-start --port-topology=chained --log-level=0
> ...
> EAL: 0001:00:01.0 failed to select IOMMU type
> ...
> virtio_init_queue(): virtqueue does not exist
> EAL: fail to disable req notifier.
> EAL: Requested device 0001:00:01.0 cannot be used
> testpmd: No probed ethernet devices
> Interactive-mode selected
> Fail: input rxq (2) can't be greater than max_rx_queues (0) of port 0
> EAL: Error - exiting with code: 1
>   Cause: rxq 2 invalid - must be >= 0 && <= 0

Maxime,

any idea why we have

"failed to select IOMMU type"

and

"Fail: input rxq (2) can't be greater than max_rx_queues (0) of port 0"

?

Thanks
(In reply to Laurent Vivier from comment #27)
> Maxime,
> 
> any idea why we have
> 
> "failed to select IOMMU type"

Are you trying to use it with or without a vIOMMU?
If with vIOMMU, are you enabling its support in ovs's other_config?
It is achieved with this (and can be done even if not using vIOMMU):
# ovs-vsctl set Open_vSwitch . other_config:vhost-iommu-support=true

If no vIOMMU is involved, is the guest VFIO module probed with the enable_unsafe_noiommu_mode=Y parameter?

> and
> 
> "Fail: input rxq (2) can't be greater than max_rx_queues (0) of port 0"
> 
> ?

It might be because no port was successfully initialized. If we solve the first issue, this one may just disappear.
(In reply to Maxime Coquelin from comment #28)
> Are you trying to use it with or without a vIOMMU?
> If with vIOMMU, are you enabling its support in ovs's other_config?

It's without vIOMMU.

> It is achieved with this (and can be done even if not using vIOMMU):
> # ovs-vsctl set Open_vSwitch . other_config:vhost-iommu-support=true

My testcase only uses testpmd in guest and host, it doesn't involve ovs. (To simplify, I'm following https://www.redhat.com/en/blog/hands-vhost-user-warm-welcome-dpdk)

> If no vIOMMU is involved, is the guest VFIO module probed with the
> enable_unsafe_noiommu_mode=Y parameter?

It's on POWER9, and it doesn't seem to have such a parameter with vfio.
David, do we support vfio _inside_ a pseries guest?
(In reply to Laurent Vivier from comment #29)
> It's on POWER9, and it doesn't seem to have such a parameter with vfio.

Judging by the Kconfig, nothing seems to prevent enabling it on POWER9:

menuconfig VFIO_NOIOMMU
	bool "VFIO No-IOMMU support"
	depends on VFIO
	help
	  VFIO is built on the ability to isolate devices using the IOMMU.
	  Only with an IOMMU can userspace access to DMA capable devices be
	  considered secure.  VFIO No-IOMMU mode enables IOMMU groups for
	  devices without IOMMU backing for the purpose of re-using the VFIO
	  infrastructure in a non-secure mode.  Use of this mode will result
	  in an unsupportable kernel and will therefore taint the kernel.
	  Device assignment to virtual machines is also not possible with
	  this mode since there is no IOMMU to provide DMA translation.

	  If you don't know what to do here, say N.

Maybe it is not enabled in your kernel?
(In reply to Maxime Coquelin from comment #31)
> ...
> Maybe it is not enabled in your kernel?

Yes, you're right:

# grep CONFIG_VFIO_NOIOMMU /boot/config-4.18.0-240.el8.ppc64le
# CONFIG_VFIO_NOIOMMU is not set

I'm going to try to build a kernel with that option enabled.

But I think this also means we can't support this in RHEL 8, because we don't enable new options, and we will not in RHEL 9, as we don't support KVM on POWER anymore.

Thanks
(In reply to Laurent Vivier from comment #26)
> could you check your server type, and if it's not an IBM Witherspoon, re-run
> your test on a Witherspoon (without the '-M
> pseries,ic-mode=xics,kernel-irqchip=on' parameter)?

The model of the P9 we are running tests on is 9006-22P (supermicro,p9dsu2u). For more details, please refer to https://beaker.engineering.redhat.com/view/netqe-p9-03.lab3.eng.bos.redhat.com#details .

We'll try our tests on a Witherspoon.
(In reply to Maxime Coquelin from comment #31)
> ...
> Maybe it is not enabled in your kernel?

I've built a kernel with this option but the result is the same:

# modprobe vfio enable_unsafe_noiommu_mode=Y

[it seems the parameter is ignored:]

# cat /sys/module/vfio/parameters/enable_unsafe_noiommu_mode
N
# echo Y > /sys/module/vfio/parameters/enable_unsafe_noiommu_mode
# cat /sys/module/vfio/parameters/enable_unsafe_noiommu_mode
Y
# sudo modprobe vfio-pci

# testpmd -l 0,1,2 --socket-mem 1024 -n 4 --proc-type auto --file-prefix pg -- --portmask=3 --forward-mode=macswap --port-topology=chained --disable-rss -i --rxq=1 --txq=1 --rxd=256 --txd=256 --nb-cores=2 --auto-start
EAL: Detected 3 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Auto-detected process type: PRIMARY
EAL: Multi-process socket /var/run/dpdk/pg/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: No available hugepages reported in hugepages-2048kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: WARNING! Base virtual address hint (0x180050000 != 0x1c0000000) not respected!
EAL: This may cause issues with mapping memory into secondary processes
EAL: WARNING! Base virtual address hint (0x2c0060000 != 0x7ff780000000) not respected!
EAL: This may cause issues with mapping memory into secondary processes
EAL: WARNING! Base virtual address hint (0x3000b0000 != 0x7fff8f840000) not respected!
EAL: This may cause issues with mapping memory into secondary processes
EAL: WARNING! Base virtual address hint (0x4400c0000 != 0x7fef40000000) not respected!
EAL: This may cause issues with mapping memory into secondary processes
EAL: WARNING! Base virtual address hint (0x480110000 != 0x7fff8d880000) not respected!
EAL: This may cause issues with mapping memory into secondary processes
EAL: WARNING! Base virtual address hint (0x5c0120000 != 0x7fe700000000) not respected!
EAL: This may cause issues with mapping memory into secondary processes
EAL: PCI device 0000:00:01.0 on NUMA socket -1
EAL:   Invalid NUMA socket, default to 0
EAL:   probe driver: 1af4:1000 net_virtio
EAL: PCI device 0001:00:07.0 on NUMA socket -1
EAL:   Invalid NUMA socket, default to 0
EAL:   probe driver: 1af4:1000 net_virtio
EAL: 0001:00:07.0 failed to select IOMMU type
EAL: Can't write to PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (12)
EAL: Can't write to PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (12)
EAL: Can't write to PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (0)
EAL: Can't read from PCI bar (0) : offset (1e)
EAL: Can't write to PCI bar (0) : offset (4)
EAL: Can't read from PCI bar (0) : offset (14)
EAL: Can't read from PCI bar (0) : offset (18)
EAL: Can't write to PCI bar (0) : offset (e)
EAL: Can't read from PCI bar (0) : offset (c)
virtio_init_queue(): virtqueue does not exist
EAL: fail to disable req notifier.
EAL: fail to disable req notifier.
EAL: Requested device 0001:00:07.0 cannot be used
EAL: PCI device 0002:00:08.0 on NUMA socket -1
EAL:   Invalid NUMA socket, default to 0
EAL:   probe driver: 1af4:1000 net_virtio
EAL: 0002:00:08.0 failed to select IOMMU type
EAL: Can't write to PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (12)
EAL: Can't write to PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (12)
EAL: Can't write to PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (0)
EAL: Can't read from PCI bar (0) : offset (1e)
EAL: Can't write to PCI bar (0) : offset (4)
EAL: Can't read from PCI bar (0) : offset (14)
EAL: Can't read from PCI bar (0) : offset (18)
EAL: Can't write to PCI bar (0) : offset (e)
EAL: Can't read from PCI bar (0) : offset (c)
virtio_init_queue(): virtqueue does not exist
EAL: fail to disable req notifier.
EAL: fail to disable req notifier.
EAL: Requested device 0002:00:08.0 cannot be used
testpmd: No probed ethernet devices
Set macswap packet forwarding mode
Interactive-mode selected
Fail: input rxq (1) can't be greater than max_rx_queues (0) of port 0
EAL: Error - exiting with code: 1
  Cause: rxq 1 invalid - must be >= 0 && <= 0
I've checked with strace the reason for the ENODEV:

openat(AT_FDCWD, "/dev/vfio/vfio", O_RDWR) = 11
ioctl(11, VFIO_GET_API_VERSION, 0) = 0
ioctl(11, VFIO_CHECK_EXTENSION, 0x1) = 0
ioctl(11, VFIO_CHECK_EXTENSION, 0x7) = 1
ioctl(11, VFIO_CHECK_EXTENSION, 0x8) = 1
...
readlink("/sys/bus/pci/devices/0001:00:07.0/iommu_group", "../../../kernel/iommu_groups/1", 4096) = 30
openat(AT_FDCWD, "/dev/vfio/1", O_RDWR) = 25
ioctl(25, VFIO_GROUP_GET_STATUS, 0x7fffd7d94410) = 0
ioctl(25, VFIO_GROUP_SET_CONTAINER, 0x7fffd7d94408) = 0
ioctl(11, VFIO_SET_IOMMU, 0x1) = -1 ENODEV (No such device)
ioctl(11, VFIO_SET_IOMMU, 0x7) = -1 EPERM (Operation not permitted)
ioctl(11, VFIO_SET_IOMMU, 0x8) = -1 ENODEV (No such device)
write(1, "EAL: 0001:00:07.0 failed to select IOMMU type\n", 48) = 48
...
readlink("/sys/bus/pci/devices/0002:00:08.0/iommu_group", "../../../kernel/iommu_groups/2", 4096) = 30
openat(AT_FDCWD, "/dev/vfio/2", O_RDWR) = 25
ioctl(25, VFIO_GROUP_GET_STATUS, 0x7fffd7d94410) = 0
ioctl(25, VFIO_GROUP_SET_CONTAINER, 0x7fffd7d94408) = 0
ioctl(11, VFIO_SET_IOMMU, 0x1) = -1 ENODEV (No such device)
ioctl(11, VFIO_SET_IOMMU, 0x7) = -1 EPERM (Operation not permitted)
ioctl(11, VFIO_SET_IOMMU, 0x8) = -1 ENODEV (No such device)
write(1, "EAL: 0002:00:08.0 failed to select IOMMU type\n", 48) = 48
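For reference, the third argument of VFIO_SET_IOMMU is the IOMMU type; the values in the trace map to these constants from the VFIO UAPI (a hedged lookup; the header path assumes kernel-headers is installed):

grep -E 'define VFIO_(TYPE1_IOMMU|SPAPR_TCE_v2_IOMMU|NOIOMMU_IOMMU)' /usr/include/linux/vfio.h
# 0x1 = VFIO_TYPE1_IOMMU        -> ENODEV
# 0x7 = VFIO_SPAPR_TCE_v2_IOMMU -> EPERM
# 0x8 = VFIO_NOIOMMU_IOMMU      -> ENODEV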
(In reply to Laurent Vivier from comment #35)
> I've checked with strace the reason for the ENODEV:
> ...
> ioctl(11, VFIO_SET_IOMMU, 0x1) = -1 ENODEV (No such device)
> ioctl(11, VFIO_SET_IOMMU, 0x7) = -1 EPERM (Operation not permitted)
> ioctl(11, VFIO_SET_IOMMU, 0x8) = -1 ENODEV (No such device)
> write(1, "EAL: 0001:00:07.0 failed to select IOMMU type\n", 48) = 48

Alex, do you know why we have the ENODEV and EPERM (it's in a KVM guest on POWER9)?
I've enabled the CONFIG_VFIO_NOIOMMU option in a RHEL8 kernel.

Thanks
(In reply to Laurent Vivier from comment #36)
> Alex, do you know why we have the ENODEV and EPERM (it's in a KVM guest on
> POWER9)?
> I've enabled the CONFIG_VFIO_NOIOMMU option in a RHEL8 kernel.

A device bound to vfio-pci making use of no-iommu will create a noiommu vfio group file, ex. /dev/vfio/noiommu-1. Only these groups can be used with the no-iommu driver; this is meant to make sure that no-iommu is not a directly fungible iommu backend, the userspace driver needs to be aware of the difference. The openat() calls are successfully opening a regular vfio group file, which suggests there is some sort of vIOMMU support in the VM. These groups cannot be used with no-iommu.
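A quick way to see which case you are in from inside the guest (a hedged sketch, based on the naming convention described above):

ls -l /dev/vfio/
# no-iommu groups show up as /dev/vfio/noiommu-N; IOMMU-backed groups as plain /dev/vfio/N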
(In reply to Alex Williamson from comment #37)
> A device bound to vfio-pci making use of no-iommu will create a noiommu vfio
> group file, ex. /dev/vfio/noiommu-1. Only these groups can be used with the
> no-iommu driver; this is meant to make sure that no-iommu is not a directly
> fungible iommu backend, the userspace driver needs to be aware of the
> difference. The openat() calls are successfully opening a regular vfio
> group file, which suggests there is some sort of vIOMMU support in the VM.
> These groups cannot be used with no-iommu.

Thank you Alex.

This explains why ioctl(VFIO_SET_IOMMU) with VFIO_NOIOMMU_IOMMU (8) returns ENODEV and why ioctl(VFIO_SET_IOMMU) with VFIO_SPAPR_TCE_v2_IOMMU (7) returns EPERM.

So I think the vIOMMU support we have here is the TCE v2.
> do we support vfio _inside_ a pseries guest?

AFAIK, yes.

I'm confused as to why noiommu is coming into this discussion. pseries guests *always* have a (paravirtualized) vIOMMU - it's part of the PAPR spec.
(In reply to David Gibson from comment #39)
> > do we support vfio _inside_ a pseries guest?
> 
> AFAIK, yes.
> 
> I'm confused as to why noiommu is coming into this discussion. pseries
> guests *always* have a (paravirtualized) vIOMMU - it's part of the PAPR spec.

It's my fault, I misunderstood the use of vIOMMU in pseries.

So the question now is why the PAPR vIOMMU doesn't work with testpmd (DPDK).
(In reply to Laurent Vivier from comment #26)
> could you check your server type, and if it's not an IBM Witherspoon, re-run
> your test on a Witherspoon (without the '-M
> pseries,ic-mode=xics,kernel-irqchip=on' parameter)?

The server I used to test is a 9006-22P (supermicro,p9dsu2u). I re-ran the virt-install command on an 8335-GTC (ibm,witherspoon) without the '-M pseries,ic-mode=xics,kernel-irqchip=on' parameter and it works well; none of the issues seen on the Boston-type machine appear. Maybe you are right, it seems to depend on the host type.
vhost-user is not supported on ppc64le, so closing as WONTFIX.