Description of problem:

VM fails to run with a vhostuser server socket on ppc64le.

First of all, I ran:

virt-install --connect=qemu:///system \
--network vhostuser,source_type=unix,source_path=/tmp/vhost-sock0,source_mode=server,model=virtio,driver_queues=2 \
--network network=default \
--name=rhel_loopback \
--disk path=/opt/images/rhel-8.3-ppc64le-kvm.qcow2,format=qcow2 \
--ram 8192 \
--memorybacking hugepages=on,hugepages.page0.size=2,hugepages.page0.unit=M \
--vcpus=4,cpuset=73,74,75,76 \
--numatune mode=strict,nodeset=8 \
--nographics --noautoconsole \
--import

The guest domain can be created successfully, but it crashes after a while. When I add the option '--qemu-commandline="-machine ic-mode=xics,kernel-irqchip=on"' to the virt-install command, the system can boot, but the vhostuser server port still cannot be used.

Version-Release number of selected component (if applicable):
DPDK version: 19.11.3
Open vSwitch version: 2.13.2
RPM: openvswitch2.13-2.13.0-77.el8fdp.ppc64le
System: Power 9 with RHEL-8.3.0
libvirt version: 6.6.0, package: 12.module+el8.3.1+9458+e57b3fac (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2021-01-14-16:23:49)
qemu version: 4.2.0, qemu-kvm-4.2.0-34.module+el8.3.0+8829+e7a0a3ea.1
kernel: 4.18.0-240.el8.ppc64le
hostname: netqe-p9-03.lab3.eng.bos.redhat.com

How reproducible:

Steps to Reproduce:
1. Install openvswitch on the Power host and run it as the root user, because DPDK on Power can only run successfully as root:

yum install -y http://download-node-02.eng.bos.redhat.com/brewroot/packages/openvswitch-selinux-extra-policy/1.0/18.el8fdp/noarch/openvswitch-selinux-extra-policy-1.0-18.el8fdp.noarch.rpm
yum install -y http://download-node-02.eng.bos.redhat.com/brewroot/packages/openvswitch2.13/2.13.0/77.el8fdp/ppc64le/openvswitch2.13-2.13.0-77.el8fdp.ppc64le.rpm
sed -i -e 's/OVS_USER_ID="openvswitch:hugetlbfs"/OVS_USER_ID="root:root"/' /etc/sysconfig/openvswitch
systemctl enable openvswitch
systemctl start openvswitch

2. Create a dpdkvhostuserclient port on the OVS bridge (a minimal sketch follows the Expected results section below).

3. Install libvirt and set the qemu group configuration to hugetlbfs:

sed -i -e 's/#group = "root"/group = "hugetlbfs"/' /etc/libvirt/qemu.conf

4. Enable and start libvirtd:

systemctl enable libvirtd
systemctl start libvirtd

5. Create a guest with a vhostuser server port:

virt-install --connect=qemu:///system \
--network vhostuser,source_type=unix,source_path=/tmp/vhost-sock0,source_mode=server,model=virtio,driver_queues=2 \
--network network=default \
--name=rhel_loopback \
--disk path=/opt/images/rhel-8.3-ppc64le-kvm.qcow2,format=qcow2 \
--ram 8192 \
--memorybacking hugepages=on,hugepages.page0.size=2,hugepages.page0.unit=M,locked=yes,access.mode=shared \
--cpu numa.cell0.memory=8388608,numa.cell0.cpus=0-3 \
--vcpus=4,cpuset=73,74,75,76 \
--numatune mode=strict,nodeset=8 \
--nographics \
--noautoconsole \
--qemu-commandline=" -M pseries,ic-mode=xics,kernel-irqchip=on" \
--import

Actual results:
The guest rhel_loopback should be running. However, without --qemu-commandline=" -M pseries,ic-mode=xics,kernel-irqchip=on" the VM rhel_loopback shuts down after about 10 seconds: I can create the guest successfully, but cannot start it, because of the vhostuser interface issue. Even with the --qemu-commandline=" -M pseries,ic-mode=xics,kernel-irqchip=on" option, the vhostuser server port still cannot communicate with the client port on the OVS bridge.

Expected results:
The VM runs, and the vhostuser server and client can interact with each other.
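Here is a minimal sketch of step 2 above (creating the dpdkvhostuserclient port). It mirrors the full commands given later in this report; the bridge name, port name, and socket path are the ones used in the virt-install command:

ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
systemctl restart openvswitch
ovs-vsctl add-br ovs_pvp_br0 -- set bridge ovs_pvp_br0 datapath_type=netdev
ovs-vsctl add-port ovs_pvp_br0 vhost0 -- \
    set Interface vhost0 type=dpdkvhostuserclient -- \
    set Interface vhost0 options:vhost-server-path="/tmp/vhost-sock0"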
Additional info:

Here are some related bugs on x86_64. I am not sure whether running DPDK as the root user, rather than as hugetlbfs, is the key to this problem.

https://docs.openvswitch.org/en/latest/topics/dpdk/vhost-user/?highlight=dpdk
https://git.qemu.org/?p=qemu.git;a=blob;f=docs/specs/vhost-user.txt;h=7890d7169;hb=HEAD

vhostuser socket creation fails due to selinux
https://bugzilla.redhat.com/show_bug.cgi?id=1597285

Fixing the permission mismatch for DPDK vhost user ports with openvswitch and qemu
https://bugzilla.redhat.com/show_bug.cgi?id=1478791

So I opened this bug to discuss whether there is a workaround for DPDK vhost user ports with openvswitch and qemu on Power Systems (ppc64le).
Ping, can you please attach debug logs? Also, does this happen solely on ppc64le?
I mainly work on ppc64le, but my teammates ran the same test on x86_64 and said there are no issues; they could not reproduce it when using the hugetlbfs user to run openvswitch, libvirt and qemu on an x86_64 machine.
Created attachment 1764579 [details] virt-install command without ic-mode=xics,kernel-irqchip=on dmesg
Created attachment 1764580 [details] withoutxics the log of var_log_libvirt_qemu_rhel_loopback.log
Created attachment 1764581 [details] withoutxics-_var_log_openvswitch_ovs-vswitchd
Created attachment 1764582 [details] withoutxics.domain.xml XML file of the domain generated by the virt-install command without the option --qemu-commandline=" -M pseries,ic-mode=xics,kernel-irqchip=on"
Created attachment 1764583 [details] withxics the log of var_log_libvirt_qemu_rhel_loopback.log The log of /var/log/libvirt/qemu/rhel_loopback.log when adding the virt-install command option: --qemu-commandline=" -M pseries,ic-mode=xics,kernel-irqchip=on"
Created attachment 1764585 [details] the domain xml file of the rhel_loopback when with xics option The domain XML file of rhel_loopback when using the virt-install command with the option --qemu-commandline=" -M pseries,ic-mode=xics,kernel-irqchip=on"
Created attachment 1764586 [details] withxics the log of withxics-_var_log_openvswitch_ovs-vswitchd.log The log of /var/log/openvswitch/ovs-vswitchd.log when using the virt-install command with --qemu-commandline=" -M pseries,ic-mode=xics,kernel-irqchip=on"
From attached withoutxics_var_log_libvirt_qemu_rhel_loopback.log.txt:

2021-03-19T02:57:45.884305Z qemu-kvm: -chardev socket,id=charnet0,path=/tmp/vhost-sock0,server: info: QEMU waiting for connection on: disconnected:unix:/tmp/vhost-sock0,server
char device redirected to /dev/pts/1 (label charserial0)
2021-03-19T02:57:46.809705Z qemu-kvm: Failed to read from slave.
2021-03-19T02:58:03.711792Z qemu-kvm: warning: kernel_irqchip allowed but unavailable: IRQ_XIVE capability must be present for KVM
Falling back to kernel-irqchip=off
2021-03-19 02:58:19.805+0000: shutting down, reason=crashed

This means that qemu crashed. I'm not sure it's because of that warning it printed, but looking into the code, this is the XML snippet that enables kernel_irqchip:

<features>
  <ioapic driver='qemu'/>
</features>

I did not find anything for '-machine ic-mode=xics' and honestly, I have no idea what it does.

But I guess the reason that ovs disconnects from the socket (leaving the NIC unusable from inside the VM) has something to do with this (from attached withoutxics-_var_log_openvswitch_ovs-vswitchd-1.log):

2021-03-19T02:49:59.289Z|00020|dpdk|WARN|EAL: No available hugepages reported in hugepages-1048576kB
2021-03-19T02:52:43.055Z|00097|dpif_netdev|ERR|There is no available (non-isolated) pmd thread for port 'dpdk0' queue 0. This queue will not be polled. Is pmd-cpu-mask set to zero? Or are all PMDs isolated to other queues?
2021-03-19T02:52:43.055Z|00098|dpif_netdev|ERR|There is no available (non-isolated) pmd thread for port 'dpdk0' queue 1. This queue will not be polled. Is pmd-cpu-mask set to zero? Or are all PMDs isolated to other queues?
2021-03-19T02:57:50.206Z|00117|netdev_dpdk|ERR|Failed to create mempool "ovscc694b5f00021580016384" with a request of 8192 mbufs
2021-03-19T02:57:50.206Z|00118|netdev_dpdk|ERR|Failed to create memory pool for netdev vhost0, with MTU 1500 on socket 0: Cannot allocate memory
2021-03-19T02:57:50.206Z|00119|dpif_netdev|ERR|Failed to set interface vhost0 new configuration

Anyway, I don't think this is a libvirt issue. I'm not sure what's the correct component to switch the bug to for further investigation.

BTW: I can see you enable 2MiB hugepages - are those available on ppc hosts? I had no idea.
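Given the mempool allocation failure above, it may be worth checking hugepage availability directly on the host. A hedged sketch (node8 is the NUMA node OVS was configured to use elsewhere in this report):

cat /sys/devices/system/node/node8/hugepages/hugepages-2048kB/free_hugepages
grep -i huge /proc/meminfo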
(In reply to Michal Privoznik from comment #10)
> From attached withoutxics_var_log_libvirt_qemu_rhel_loopback.log.txt:
> ...
> 2021-03-19 02:58:19.805+0000: shutting down, reason=crashed
> 
> This means that qemu crashed. ...
> I did not find anything for '-machine ic-mode=xics' and honestly, I have no
> idea what it does.

Hi Michal, qemu would crash at first. Then, with the option '--qemu-commandline=" -M pseries,ic-mode=xics,kernel-irqchip=on"' appended to virt-install, it could boot successfully. Ping uploaded six attachments; I think you should focus on the attachments mentioned at comment 7, comment 8 and comment 9, which were captured when the guest booted successfully with the option '--qemu-commandline=" -M pseries,ic-mode=xics,kernel-irqchip=on"'.

After the VM booted successfully, we failed to run 'testpmd' in the VM. I suspect it is an issue related to dpdk or the IOMMU type/group; please help check the output below:

[root@localhost ~]# testpmd -c 0x7 -n 4 --socket-mem 1024,0 -w 0001:00:01.0 -- --burst 64 -i --rxq=2 --txq=2 --rxd=4096 --txd=1024 --coremask=0x6 --auto-start --port-topology=chained --log-level=0
EAL: Detected 4 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: No available hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: WARNING! Base virtual address hint (0x100ab0000 != 0x7ffb9fe00000) not respected!
EAL: This may cause issues with mapping memory into secondary processes
EAL: WARNING! Base virtual address hint (0x101720000 != 0x7ff79fc00000) not respected!
EAL: This may cause issues with mapping memory into secondary processes
EAL: WARNING! Base virtual address hint (0x102390000 != 0x7ff39fa00000) not respected!
EAL: This may cause issues with mapping memory into secondary processes
EAL: WARNING! Base virtual address hint (0x103000000 != 0x7fef9f800000) not respected!
EAL: This may cause issues with mapping memory into secondary processes
EAL: PCI device 0001:00:01.0 on NUMA socket -1
EAL:   Invalid NUMA socket, default to 0
EAL:   probe driver: 1af4:1000 net_virtio
EAL: 0001:00:01.0 failed to select IOMMU type
EAL: Can't write to PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (12)
EAL: Can't write to PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (12)
EAL: Can't write to PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (0)
EAL: Can't read from PCI bar (0) : offset (1e)
EAL: Can't write to PCI bar (0) : offset (4)
EAL: Can't read from PCI bar (0) : offset (14)
EAL: Can't read from PCI bar (0) : offset (18)
EAL: Can't write to PCI bar (0) : offset (e)
EAL: Can't read from PCI bar (0) : offset (c)
virtio_init_queue(): virtqueue does not exist
EAL: fail to disable req notifier.
EAL: fail to disable req notifier.
EAL: Requested device 0001:00:01.0 cannot be used
testpmd: No probed ethernet devices
Interactive-mode selected
Fail: input rxq (2) can't be greater than max_rx_queues (0) of port 0
EAL: Error - exiting with code: 1
  Cause: rxq 2 invalid - must be >= 0 && <= 0
(In reply to Jianwen Ji from comment #11)
> Hi Michal, qemu would crash at first. Then, with the option
> '--qemu-commandline=" -M pseries,ic-mode=xics,kernel-irqchip=on"' appended
> to virt-install, it could boot successfully. Ping uploaded six attachments;
> I think you should focus on the attachments mentioned at comment 7, comment
> 8 and comment 9, which were captured when the guest booted successfully with
> that option.

Yeah, I'm not saying this isn't a bug. It clearly is. However, nothing indicates this is a libvirt bug. Let's switch over to qemu for further analysis.

> After the VM booted successfully, we failed to run 'testpmd' in the VM. I
> suspect it is an issue related to dpdk or the IOMMU type/group; please help
> check the output below:
> 
> [root@localhost ~]# testpmd -c 0x7 -n 4 --socket-mem 1024,0 -w 0001:00:01.0
> -- --burst 64 -i --rxq=2 --txq=2 --rxd=4096 --txd=1024 --coremask=0x6
> --auto-start --port-topology=chained --log-level=0

Where could one get this testpmd utility?
(In reply to Michal Privoznik from comment #12)
> Yeah, I'm not saying this isn't a bug. It clearly is. However, nothing
> indicates this is a libvirt bug. Let's switch over to qemu for further
> analysis.
> ...
> Where could one get this testpmd utility?

This tool is provided by the dpdk package. Please install dpdk, then testpmd or dpdk-testpmd will be installed under the /usr/bin/ directory.
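For example (a hedged sketch; as noted above, the exact binary name depends on the dpdk package version):

yum install -y dpdk
rpm -ql dpdk | grep testpmd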
Let me put some background here. We are doing Open vSwitch PVP testing on a Power9 machine with RHEL-8.3.0 by following the steps at [1] (which describes the setup on an x86_64 machine). Due to some limitations and differences between x86_64 and ppc64le, in order to make OVS/Qemu boot successfully we applied a few workarounds and specific configuration on ppc64le; please refer to the bug Description for the workarounds we did. Generally we followed [1].

Now we are stuck at the section 'You can quickly check if your VM is setup correctly by starting testpmd as follows' of the chapter 'Create the loopback Virtual Machine' at [1]; please see the testpmd output mentioned at comment 11. We can't identify what the root cause of the testpmd failure is; maybe it is a dpdk issue, a qemu issue, a ppc64le issue or something else.

[1] https://github.com/chaudron/ovs_perf/blob/RHEL8/README.md#full-day-pvp-test
(In reply to Michal Privoznik from comment #10)
> ...
> BTW: I can see you enable 2MiB hugepages - are those available on ppc hosts?
> I had no idea.

For hugepages on Power systems, as doc [1] describes, the static huge page sizes on IBM POWER8 systems are 16MiB and 16GiB, as opposed to 2MiB and 1GiB on AMD64 and Intel 64 and on IBM POWER9.

[1] https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/virtualization_deployment_and_administration_guide/appe-kvm_on_multiarch
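A quick way to confirm which sizes the running kernel actually supports on a given host (a hedged sketch):

ls /sys/kernel/mm/hugepages/
# on a POWER9 host this typically lists hugepages-2048kB and hugepages-1048576kB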
Hi,

I'm trying to reproduce the problem on P9.

(In reply to Ping Zhang from comment #0)
...
> 2.create a dpdkvhostuserclient on the ovs bridge

Could you provide:
- the commands you use to create the ovs bridge
- the commands you use to create the dpdkvhostuserclient on the bridge
- the host kernel command line
- the content of /proc/meminfo

Thanks
Thanks Laurent for quickly investigating this bug.

Let me clarify 2 points first:
1. The warning "kernel_irqchip allowed but unavailable: IRQ_XIVE capability must be present for KVM" is in line with expectations. If the host firmware is too old, this message will always be reported when launching a guest.
2. For "QEMU waiting for connection on: disconnected:unix:/tmp/vhost-sock0,server", you can refer to https://docs.openvswitch.org/en/latest/topics/dpdk/vhost-user/?highlight=dpdk#vhost-user-client
> If the corresponding dpdkvhostuserclient port has not yet been configured in OVS with vhost-server-path=/path/to/socket, QEMU will print a log similar to the following:
>> QEMU waiting for connection on: disconnected:unix:/path/to/socket,server

I am also interested in the steps used to configure the ovs bridge. I tried to reproduce it with the steps provided by BZ 1516114, and host and guest can ping each other.

OVS configuration:

ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
systemctl restart openvswitch
ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev
ovs-vsctl add-port br0 vhost-vm-1 -- set Interface vhost-vm-1 type=dpdkvhostuserclient options:vhost-server-path=/tmp/vhost-vm-1
ip addr add 192.168.2.1/24 dev br0; ip link set dev br0 up

ovs-vsctl show
1ffdc4e6-63e1-427d-b2eb-c69e2491be4d
    Bridge br0
        datapath_type: netdev
        Port vhost-vm-1
            Interface vhost-vm-1
                type: dpdkvhostuserclient
                options: {vhost-server-path="/tmp/vhost-vm-1"}
        Port br0
            Interface br0
                type: internal
        Port vhost-vm1
            Interface vhost-vm1
                type: dpdkvhostuser
    ovs_version: "2.13.4"

Launch guest with:

-chardev socket,id=charnet0,path=/tmp/vhost-vm-1,server \
-netdev vhost-user,chardev=charnet0,queues=2,id=hostnet0 \
-device virtio-net-pci,mq=on,vectors=6,netdev=hostnet0,id=net0,mac=52:54:00:98:d6:d7,bus=pci.0,addr=0x6
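If it helps, two quick checks on the OVS side (a hedged sketch; the port name matches the configuration above):

# did the vhost-user socket actually connect?
ovs-vsctl --columns=name,status list Interface vhost-vm-1
# are the port's rx queues assigned to a PMD thread?
ovs-appctl dpif-netdev/pmd-rxq-show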
(In reply to Yihuang Yu from comment #17)
> Thanks Laurent for quickly investigating this bug.
> ...
> I am also interested in the steps used to configure the ovs bridge. I tried
> to reproduce it with the steps provided by BZ 1516114, and host and guest
> can ping each other.

Hi Yihuang, as mentioned at comment 14, currently we are stuck with failing to run 'testpmd' in the VM. Ping will reply to the questions asked by Laurent at comment 16.
(In reply to Laurent Vivier from comment #16)
> Hi,
> 
> I'm trying to reproduce the problem on P9.
> ...
> Could you provide:
> - the commands you use to create the ovs bridge
> - the commands you use to create the dpdkvhostuserclient on the bridge
> 
> Thanks

#Disable SELinux
sed -i -e 's/SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config
setenforce permissive

# Set the hugepages
# Because we reserve 16G of memory for hugepages on this P9 system,
# the maximum number of 1G hugepages that can be set is 16,
# and the maximum number of 2M hugepages that can be set is 8192.
sed -i -e 's/GRUB_CMDLINE_LINUX="/GRUB_CMDLINE_LINUX="default_hugepagesz=2M hugepagesz=2M hugepages=8192 /' /etc/default/grub
grub2-mkconfig -o /boot/grub2/grub.cfg

Our Power Systems are multi-NUMA systems, so the cores we assign to both Open vSwitch and Qemu need to be on the same NUMA node as the network card. For some more background information on this, see the "OVS-DPDK Parameters: Dealing with multi-NUMA" blog post.

[root@netqe-p9-03 ~]# lscpu |grep -E "^CPU\(s\)|On-line|Thread\(s\) per core"
CPU(s):              128
On-line CPU(s) list: 0-127
Thread(s) per core:  4

[root@netqe-p9-03 ~]# lstopo-no-graphics
Machine (252GB total)
  NUMANode L#0 (P#0 124GB)
    Package L#0
      L3 L#0 (10MB) + L2 L#0 (512KB)
        L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
          PU L#0 (P#0)
          PU L#1 (P#1)
          PU L#2 (P#2)
          PU L#3 (P#3)
......
  NUMANode L#1 (P#8 128GB)
    Package L#1
      L3 L#8 (10MB) + L2 L#8 (512KB)
        L1d L#16 (32KB) + L1i L#16 (32KB) + Core L#16
          PU L#64 (P#64)
          PU L#65 (P#65)
          PU L#66 (P#66)
          PU L#67 (P#67)
  HostBridge L#9
    PCIBridge
      PCI 8086:1583
        Net L#11 "enP48p1s0f0"
      PCI 8086:1583
        Net L#12 "enP48p1s0f1"

The card I used for testing is the 40G XL710 on NUMA node 8 of this P9 system.

Now we apply the cpu-partitioning profile and configure the isolated core mask:

# Isolated_cpu
yum -y install driverctl tuned tuned-profiles-cpu-partitioning lshw numactl rdma-core libibverbs
systemctl enable tuned
systemctl start tuned
echo isolated_cores=1-31,65-95 >> /etc/tuned/cpu-partitioning-variables.conf
tuned-adm profile cpu-partitioning
sed -i -e 's/GRUB_CMDLINE_LINUX="/GRUB_CMDLINE_LINUX="isolcpus=1-31,65-95 /' /etc/default/grub
grub2-editenv - unset kernelopts
grub2-mkconfig -o /boot/grub2/grub.cfg
reboot

#Setup Open vSwitch
yum install -y http://download-node-02.eng.bos.redhat.com/brewroot/packages/openvswitch-selinux-extra-policy/1.0/23.el8fdp/noarch/openvswitch-selinux-extra-policy-1.0-23.el8fdp.noarch.rpm
yum install -y http://download-node-02.eng.bos.redhat.com/brewroot/packages/openvswitch2.13/2.13.0/77.el8fdp/ppc64le/openvswitch2.13-2.13.0-77.el8fdp.ppc64le.rpm

#XL710-Q2 40G
driverctl -v set-override 0030:01:00.0 vfio-pci
driverctl -v set-override 0030:01:00.1 vfio-pci

To make sure dpdk and openvswitch run as the root user, modify the config file /etc/sysconfig/openvswitch:

sed -i -e 's/OVS_USER_ID="openvswitch:hugetlbfs"/OVS_USER_ID="root:root"/' /etc/sysconfig/openvswitch

Then start Open vSwitch, and automatically start it after every reboot:

systemctl enable openvswitch
systemctl start openvswitch

Create 8192 hugepages for dpdk. The XL710 card is on NUMA node 8, so:

# XL710-Q2 40G
echo 8192 > /sys/devices/system/node/node8/hugepages/hugepages-2048kB/nr_hugepages

# XL710-Q2 40G
ovs-vsctl set Open_vSwitch . other_config={}
ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem="0,0,0,0,0,0,0,0,2048"
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x1f80000000000000000
ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x40000000000000000

# Do not forget this step after configuring dpdk
systemctl restart openvswitch

For the Physical to Virtual back to Physical (PVP) test we only need one bridge with two ports. In addition, we will configure our interfaces with 2 receive queues:

ovs-vsctl --if-exists del-br ovs_pvp_br0
ovs-vsctl add-br ovs_pvp_br0 -- \
    set bridge ovs_pvp_br0 datapath_type=netdev
# XL710-Q2 40G
ovs-vsctl add-port ovs_pvp_br0 dpdk0 -- \
    set Interface dpdk0 type=dpdk -- \
    set Interface dpdk0 options:dpdk-devargs=0030:01:00.0 -- \
    set interface dpdk0 options:n_rxq=2 \
        other_config:pmd-rxq-affinity="0:68,1:69" -- \
    set Interface dpdk0 ofport_request=1
ovs-vsctl add-port ovs_pvp_br0 vhost0 -- \
    set Interface vhost0 type=dpdkvhostuserclient -- \
    set Interface vhost0 options:vhost-server-path="/tmp/vhost-sock0" -- \
    set interface vhost0 options:n_rxq=2 \
        other_config:pmd-rxq-affinity="0:68,1:69" -- \
    set Interface vhost0 ofport_request=2

Above are all the commands I used to create the ovs bridge and the dpdkvhostuserclient on the bridge.

> - the host kernel command line

[root@netqe-p9-03 ~]# cat /proc/cmdline
root=/dev/mapper/rhel_netqe--p9--03-root ro isolcpus=1-31,65-95 default_hugepagesz=2M hugepagesz=2M hugepages=8192 crashkernel=auto rd.lvm.lv=rhel_netqe-p9-03/root rd.lvm.lv=rhel_netqe-p9-03/swap skew_tick=1 nohz=on nohz_full=1-31,65-95 rcu_nocbs=1-31,65-95 tuned.non_isolcpus=ffffffff,00000001,ffffffff,00000001 intel_pstate=disable nosoftlockup

> - the content of /proc/meminfo

[root@netqe-p9-03 ~]# cat /proc/meminfo
MemTotal:       263733120 kB
MemFree:        226488768 kB
MemAvailable:   225854720 kB
Buffers:             4352 kB
Cached:            368960 kB
SwapCached:             0 kB
Active:            343872 kB
Inactive:          244288 kB
Active(anon):      246336 kB
Inactive(anon):     18176 kB
Active(file):       97536 kB
Inactive(file):    226112 kB
Unevictable:        74560 kB
Mlocked:            74560 kB
SwapTotal:        4194240 kB
SwapFree:         4194240 kB
Dirty:                  0 kB
Writeback:              0 kB
AnonPages:         290624 kB
Mapped:            154560 kB
Shmem:              30592 kB
KReclaimable:      183424 kB
Slab:             1670336 kB
SReclaimable:      183424 kB
SUnreclaim:       1486912 kB
KernelStack:        23552 kB
PageTables:          4096 kB
NFS_Unstable:           0 kB
Bounce:                 0 kB
WritebackTmp:           0 kB
CommitLimit:    127672192 kB
Committed_AS:      983744 kB
VmallocTotal:   549755813888 kB
VmallocUsed:            0 kB
VmallocChunk:           0 kB
Percpu:            188416 kB
HardwareCorrupted:      0 kB
AnonHugePages:          0 kB
ShmemHugePages:         0 kB
ShmemPmdMapped:         0 kB
CmaTotal:        13434880 kB
CmaFree:         13434880 kB
HugePages_Total:     8192
HugePages_Free:      7168
HugePages_Rsvd:         0
HugePages_Surp:         0
Hugepagesize:        2048 kB
Hugetlb:         16777216 kB

[root@netqe-p9-03 ~]# lscpu
Architecture:        ppc64le
Byte Order:          Little Endian
CPU(s):              128
On-line CPU(s) list: 0-127
Thread(s) per core:  4
Core(s) per socket:  16
Socket(s):           2
NUMA node(s):        2
Model:               2.2 (pvr 004e 1202)
Model name:          POWER9, altivec supported
CPU max MHz:         3800.0000
CPU min MHz:         2166.0000
L1d cache:           32K
L1i cache:           32K
L2 cache:            512K
L3 cache:            10240K
NUMA node0 CPU(s):   0-63
NUMA node8 CPU(s):   64-127

[root@netqe-p9-03 ~]# virsh freepages --all
Node 0:
64KiB: 1601892
2048KiB: 4096
1048576KiB: 0

Node 0:
64KiB: 1601892
2048KiB: 4096
1048576KiB: 0
Created attachment 1766698 [details] the detailed commands used to configure the test environment
Ping,

thank you for all the new details.

Could you check the host kernel logs to see if you have any KVM related errors?

I'm trying to run some basic DPDK tests on a P9 and qemu exits for no apparent reason, but I have the following error in the host kernel logs:

[17777.759100] CPU 2/KVM[14222]: unhandled signal 11 at 0000000000000028 nip 000000013482aefc lr 000000013482aef8 code 1
(In reply to Laurent Vivier from comment #23)
> Could you check the host kernel logs to see if you have any KVM related
> errors?
> ...
> [17777.759100] CPU 2/KVM[14222]: unhandled signal 11 at 0000000000000028 nip
> 000000013482aefc lr 000000013482aef8 code 1

Tested with a 5.12.0-rc4 host kernel, same result.
(In reply to Laurent Vivier from comment #23)
> I'm trying to run some basic DPDK tests on a P9 and qemu exits for no
> apparent reason, but I have the following error in the host kernel logs:
> 
> [17777.759100] CPU 2/KVM[14222]: unhandled signal 11 at 0000000000000028 nip
> 000000013482aefc lr 000000013482aef8 code 1

Laurent,

We had similar host kernel error logs at first. After we appended the option '-M pseries,ic-mode=xics,kernel-irqchip=on' to qemu-kvm, or '--qemu-commandline=" -M pseries,ic-mode=xics,kernel-irqchip=on"' to virt-install, qemu could boot successfully and no more of the above host kernel error logs appeared.

You may also need to do this configuration:
sed -i -e 's/#group = "root"/group = "hugetlbfs"/' /etc/libvirt/qemu.conf
It seems the problem can depend on the host type.

With my simple test, I don't have the KVM error with an IBM 8335-GTW (witherspoon), but I do have the KVM error with a SuperMicro 9006-22P (p9dsu2u).

Ping,

could you check your server type, and if it's not an IBM Witherspoon, re-run your test on a Witherspoon (without the '-M pseries,ic-mode=xics,kernel-irqchip=on' parameter)?

Thanks
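One way to check the machine type from the running host (a hedged sketch; on POWER hosts the model string is exposed through the device tree):

cat /proc/device-tree/model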
(In reply to Jianwen Ji from comment #11)
...
> [root@localhost ~]# testpmd -c 0x7 -n 4 --socket-mem 1024,0 -w 0001:00:01.0
> -- --burst 64 -i --rxq=2 --txq=2 --rxd=4096 --txd=1024 --coremask=0x6
> --auto-start --port-topology=chained --log-level=0
> ...
> EAL: 0001:00:01.0 failed to select IOMMU type
> ...
> virtio_init_queue(): virtqueue does not exist
> EAL: fail to disable req notifier.
> EAL: Requested device 0001:00:01.0 cannot be used
> testpmd: No probed ethernet devices
> Interactive-mode selected
> Fail: input rxq (2) can't be greater than max_rx_queues (0) of port 0
> EAL: Error - exiting with code: 1
>   Cause: rxq 2 invalid - must be >= 0 && <= 0

Maxime,

any idea why we have

"failed to select IOMMU type"

and

"Fail: input rxq (2) can't be greater than max_rx_queues (0) of port 0"

?

Thanks
(In reply to Laurent Vivier from comment #27)
> Maxime,
> 
> any idea why we have
> 
> "failed to select IOMMU type"

Are you trying to use it with or without a vIOMMU?
If with vIOMMU, are you enabling its support in ovs's other_config?
It is achieved with this (and can be done even if not using vIOMMU):
# ovs-vsctl set Open_vSwitch . other_config:vhost-iommu-support=true

If no vIOMMU is involved, is the guest VFIO module probed with the enable_unsafe_noiommu_mode=Y parameter?

> and
> 
> "Fail: input rxq (2) can't be greater than max_rx_queues (0) of port 0"
> 
> ?

It might be because no port was successfully initialized. If we solve the first issue, this one may just disappear.
(In reply to Maxime Coquelin from comment #28)
> Are you trying to use it with or without a vIOMMU?
> If with vIOMMU, are you enabling its support in ovs's other_config?

It's without vIOMMU.

> It is achieved with this (and can be done even if not using vIOMMU):
> # ovs-vsctl set Open_vSwitch . other_config:vhost-iommu-support=true

My testcase only uses testpmd in guest and host, it doesn't involve ovs. (To simplify, I'm following https://www.redhat.com/en/blog/hands-vhost-user-warm-welcome-dpdk)

> If no vIOMMU is involved, is the guest VFIO module probed with the
> enable_unsafe_noiommu_mode=Y parameter?

It's on POWER9, and it doesn't seem to have such a parameter with vfio.
David, do we support vfio _inside_ a pseries guest?
(In reply to Laurent Vivier from comment #29)
> It's on POWER9, and it doesn't seem to have such a parameter with vfio.

Judging by the Kconfig, nothing seems to prevent enabling it on POWER9:

menuconfig VFIO_NOIOMMU
	bool "VFIO No-IOMMU support"
	depends on VFIO
	help
	  VFIO is built on the ability to isolate devices using the IOMMU.
	  Only with an IOMMU can userspace access to DMA capable devices be
	  considered secure.  VFIO No-IOMMU mode enables IOMMU groups for
	  devices without IOMMU backing for the purpose of re-using the VFIO
	  infrastructure in a non-secure mode.  Use of this mode will result
	  in an unsupportable kernel and will therefore taint the kernel.
	  Device assignment to virtual machines is also not possible with
	  this mode since there is no IOMMU to provide DMA translation.

	  If you don't know what to do here, say N.

Maybe it is not enabled in your kernel?
(In reply to Maxime Coquelin from comment #31)
> ...
> Maybe it is not enabled in your kernel?

Yes, you're right:

# grep CONFIG_VFIO_NOIOMMU /boot/config-4.18.0-240.el8.ppc64le
# CONFIG_VFIO_NOIOMMU is not set

I'm going to try to build a kernel with that option enabled.

But I think this also means we can't support this in RHEL 8, because we don't enable new options, and we will not in RHEL 9, as we don't support KVM on POWER anymore.

Thanks
(In reply to Laurent Vivier from comment #26)
> could you check your server type, and if it's not an IBM Witherspoon, re-run
> your test on a Witherspoon (without the '-M
> pseries,ic-mode=xics,kernel-irqchip=on' parameter)?

The model of the P9 we are running tests on is 9006-22P (supermicro,p9dsu2u). For more details, please refer to https://beaker.engineering.redhat.com/view/netqe-p9-03.lab3.eng.bos.redhat.com#details .

We'll try our tests on a Witherspoon.
(In reply to Maxime Coquelin from comment #31)
> ...
> Maybe it is not enabled in your kernel?

I've built a kernel with this option but the result is the same:

# modprobe vfio enable_unsafe_noiommu_mode=Y

[it seems the parameter is ignored:]

# cat /sys/module/vfio/parameters/enable_unsafe_noiommu_mode
N
# echo Y > /sys/module/vfio/parameters/enable_unsafe_noiommu_mode
# cat /sys/module/vfio/parameters/enable_unsafe_noiommu_mode
Y
# sudo modprobe vfio-pci

# testpmd -l 0,1,2 --socket-mem 1024 -n 4 --proc-type auto --file-prefix pg -- --portmask=3 --forward-mode=macswap --port-topology=chained --disable-rss -i --rxq=1 --txq=1 --rxd=256 --txd=256 --nb-cores=2 --auto-start
EAL: Detected 3 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Auto-detected process type: PRIMARY
EAL: Multi-process socket /var/run/dpdk/pg/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: No available hugepages reported in hugepages-2048kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: WARNING! Base virtual address hint (0x180050000 != 0x1c0000000) not respected!
EAL: This may cause issues with mapping memory into secondary processes
EAL: WARNING! Base virtual address hint (0x2c0060000 != 0x7ff780000000) not respected!
EAL: This may cause issues with mapping memory into secondary processes
EAL: WARNING! Base virtual address hint (0x3000b0000 != 0x7fff8f840000) not respected!
EAL: This may cause issues with mapping memory into secondary processes
EAL: WARNING! Base virtual address hint (0x4400c0000 != 0x7fef40000000) not respected!
EAL: This may cause issues with mapping memory into secondary processes
EAL: WARNING! Base virtual address hint (0x480110000 != 0x7fff8d880000) not respected!
EAL: This may cause issues with mapping memory into secondary processes
EAL: WARNING! Base virtual address hint (0x5c0120000 != 0x7fe700000000) not respected!
EAL: This may cause issues with mapping memory into secondary processes
EAL: PCI device 0000:00:01.0 on NUMA socket -1
EAL:   Invalid NUMA socket, default to 0
EAL:   probe driver: 1af4:1000 net_virtio
EAL: PCI device 0001:00:07.0 on NUMA socket -1
EAL:   Invalid NUMA socket, default to 0
EAL:   probe driver: 1af4:1000 net_virtio
EAL: 0001:00:07.0 failed to select IOMMU type
EAL: Can't write to PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (12)
EAL: Can't write to PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (12)
EAL: Can't write to PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (0)
EAL: Can't read from PCI bar (0) : offset (1e)
EAL: Can't write to PCI bar (0) : offset (4)
EAL: Can't read from PCI bar (0) : offset (14)
EAL: Can't read from PCI bar (0) : offset (18)
EAL: Can't write to PCI bar (0) : offset (e)
EAL: Can't read from PCI bar (0) : offset (c)
virtio_init_queue(): virtqueue does not exist
EAL: fail to disable req notifier.
EAL: fail to disable req notifier.
EAL: Requested device 0001:00:07.0 cannot be used
EAL: PCI device 0002:00:08.0 on NUMA socket -1
EAL:   Invalid NUMA socket, default to 0
EAL:   probe driver: 1af4:1000 net_virtio
EAL: 0002:00:08.0 failed to select IOMMU type
EAL: Can't write to PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (12)
EAL: Can't write to PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (12)
EAL: Can't write to PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (0)
EAL: Can't read from PCI bar (0) : offset (1e)
EAL: Can't write to PCI bar (0) : offset (4)
EAL: Can't read from PCI bar (0) : offset (14)
EAL: Can't read from PCI bar (0) : offset (18)
EAL: Can't write to PCI bar (0) : offset (e)
EAL: Can't read from PCI bar (0) : offset (c)
virtio_init_queue(): virtqueue does not exist
EAL: fail to disable req notifier.
EAL: fail to disable req notifier.
EAL: Requested device 0002:00:08.0 cannot be used
testpmd: No probed ethernet devices
Set macswap packet forwarding mode
Interactive-mode selected
Fail: input rxq (1) can't be greater than max_rx_queues (0) of port 0
EAL: Error - exiting with code: 1
  Cause: rxq 1 invalid - must be >= 0 && <= 0
I've checked with strace the reason for the ENODEV:

openat(AT_FDCWD, "/dev/vfio/vfio", O_RDWR) = 11
ioctl(11, VFIO_GET_API_VERSION, 0) = 0
ioctl(11, VFIO_CHECK_EXTENSION, 0x1) = 0
ioctl(11, VFIO_CHECK_EXTENSION, 0x7) = 1
ioctl(11, VFIO_CHECK_EXTENSION, 0x8) = 1
...
readlink("/sys/bus/pci/devices/0001:00:07.0/iommu_group", "../../../kernel/iommu_groups/1", 4096) = 30
openat(AT_FDCWD, "/dev/vfio/1", O_RDWR) = 25
ioctl(25, VFIO_GROUP_GET_STATUS, 0x7fffd7d94410) = 0
ioctl(25, VFIO_GROUP_SET_CONTAINER, 0x7fffd7d94408) = 0
ioctl(11, VFIO_SET_IOMMU, 0x1) = -1 ENODEV (No such device)
ioctl(11, VFIO_SET_IOMMU, 0x7) = -1 EPERM (Operation not permitted)
ioctl(11, VFIO_SET_IOMMU, 0x8) = -1 ENODEV (No such device)
write(1, "EAL: 0001:00:07.0 failed to select IOMMU type\n", 48) = 48
...
readlink("/sys/bus/pci/devices/0002:00:08.0/iommu_group", "../../../kernel/iommu_groups/2", 4096) = 30
openat(AT_FDCWD, "/dev/vfio/2", O_RDWR) = 25
ioctl(25, VFIO_GROUP_GET_STATUS, 0x7fffd7d94410) = 0
ioctl(25, VFIO_GROUP_SET_CONTAINER, 0x7fffd7d94408) = 0
ioctl(11, VFIO_SET_IOMMU, 0x1) = -1 ENODEV (No such device)
ioctl(11, VFIO_SET_IOMMU, 0x7) = -1 EPERM (Operation not permitted)
ioctl(11, VFIO_SET_IOMMU, 0x8) = -1 ENODEV (No such device)
write(1, "EAL: 0002:00:08.0 failed to select IOMMU type\n", 48) = 48
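For reference, the third argument of VFIO_SET_IOMMU is the IOMMU type; the values in the trace map to these constants from the VFIO UAPI (a hedged lookup; the header path assumes kernel-headers is installed):

grep -E 'define VFIO_(TYPE1_IOMMU|SPAPR_TCE_v2_IOMMU|NOIOMMU_IOMMU)' /usr/include/linux/vfio.h
# 0x1 = VFIO_TYPE1_IOMMU        -> ENODEV
# 0x7 = VFIO_SPAPR_TCE_v2_IOMMU -> EPERM
# 0x8 = VFIO_NOIOMMU_IOMMU      -> ENODEV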
(In reply to Laurent Vivier from comment #35)
> I've checked with strace the reason for the ENODEV:
> ...
> ioctl(11, VFIO_SET_IOMMU, 0x1) = -1 ENODEV (No such device)
> ioctl(11, VFIO_SET_IOMMU, 0x7) = -1 EPERM (Operation not permitted)
> ioctl(11, VFIO_SET_IOMMU, 0x8) = -1 ENODEV (No such device)
> write(1, "EAL: 0001:00:07.0 failed to select IOMMU type\n", 48) = 48

Alex, do you know why we have the ENODEV and EPERM (it's in a KVM guest on POWER9)?
I've enabled the CONFIG_VFIO_NOIOMMU option in a RHEL8 kernel.

Thanks
(In reply to Laurent Vivier from comment #36)
> Alex, do you know why we have the ENODEV and EPERM (it's in a KVM guest on
> POWER9)?
> I've enabled the CONFIG_VFIO_NOIOMMU option in a RHEL8 kernel.

A device bound to vfio-pci making use of no-iommu will create a noiommu vfio group file, ex. /dev/vfio/noiommu-1. Only these groups can be used with the no-iommu driver; this is meant to make sure that no-iommu is not a directly fungible iommu backend, the userspace driver needs to be aware of the difference. The openat() calls are successfully opening a regular vfio group file, which suggests there is some sort of vIOMMU support in the VM. These groups cannot be used with no-iommu.
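A quick way to see which case you are in from inside the guest (a hedged sketch, based on the naming convention described above):

ls -l /dev/vfio/
# no-iommu groups show up as /dev/vfio/noiommu-N; IOMMU-backed groups as plain /dev/vfio/N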
(In reply to Alex Williamson from comment #37)
> A device bound to vfio-pci making use of no-iommu will create a noiommu vfio
> group file, ex. /dev/vfio/noiommu-1. Only these groups can be used with the
> no-iommu driver; this is meant to make sure that no-iommu is not a directly
> fungible iommu backend, the userspace driver needs to be aware of the
> difference. The openat() calls are successfully opening a regular vfio
> group file, which suggests there is some sort of vIOMMU support in the VM.
> These groups cannot be used with no-iommu.

Thank you Alex.

This explains why ioctl(VFIO_SET_IOMMU) with VFIO_NOIOMMU_IOMMU (8) returns ENODEV and why ioctl(VFIO_SET_IOMMU) with VFIO_SPAPR_TCE_v2_IOMMU (7) returns EPERM.

So I think the vIOMMU support we have here is the TCE v2.
> do we support vfio _inside_ a pseries guest?

AFAIK, yes.

I'm confused as to why noiommu is coming into this discussion. pseries guests *always* have a (paravirtualized) vIOMMU - it's part of the PAPR spec.
(In reply to David Gibson from comment #39)
> > do we support vfio _inside_ a pseries guest?
> 
> AFAIK, yes.
> 
> I'm confused as to why noiommu is coming into this discussion. pseries
> guests *always* have a (paravirtualized) vIOMMU - it's part of the PAPR spec.

It's my fault, I misunderstood the use of vIOMMU in pseries.

So the question now is why the PAPR vIOMMU doesn't work with testpmd (DPDK).
(In reply to Laurent Vivier from comment #26)
> could you check your server type, and if it's not an IBM Witherspoon, re-run
> your test on a Witherspoon (without the '-M
> pseries,ic-mode=xics,kernel-irqchip=on' parameter)?

The server I used to test is a 9006-22P (supermicro,p9dsu2u). I re-ran the virt-install command on an 8335-GTC (ibm,witherspoon) without the '-M pseries,ic-mode=xics,kernel-irqchip=on' parameter and it works well; none of the issues seen on the Boston-type machine appear. Maybe you are right, it seems to depend on the host type.
vhost-user is not supported on ppc64le, so closing as WONTFIX.