Bug 1516114 - RHEL8.0 (was: RHEL7.5) [Power] Guest can't send out packets with dpdkvhostuser backend and rx_mrgbuff=on - Fast Train
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.0
Hardware: ppc64le
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 8.1
Assignee: Virtualization Maintenance
QA Contact: Yihuang Yu
URL:
Whiteboard:
Depends On:
Blocks: 1451450 1586275 1624641 1756269
 
Reported: 2017-11-22 05:22 UTC by Zhengtong
Modified: 2020-02-14 11:34 UTC
CC List: 23 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-09-26 07:21:18 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
Example to boot ovs-dpdk in x86 (1.99 KB, application/x-shellscript), 2017-12-01 08:30 UTC, Pei Zhang
XML file to boot guest with dpdkvhostuser ports (1.50 KB, text/plain), 2018-01-03 22:03 UTC, Mick Tarsel
OVS Logs and Debugging (5.10 KB, application/x-gzip), 2018-02-19 19:49 UTC, Mick Tarsel
ovs vswitchd log file (198.50 KB, text/plain), 2019-08-28 04:44 UTC, Yihuang Yu


Links
System ID Private Priority Status Summary Last Updated
IBM Linux Technology Center 162979 0 None None None 2019-08-05 02:55:31 UTC

Description Zhengtong 2017-11-22 05:22:18 UTC
Description of problem:
Guest can't send out packets with the dpdkvhostuser backend

Version-Release number of selected component (if applicable):
Host kernel: 3.10.0-796.el7.ppc64le
Guest kernel: 3.10.0-784.el7.ppc64le
qemu-kvm-rhev-2.10.0-6.el7.ppc64le
openvswitch-2.7.3-2.git20171010.el7fdp.ppc64le
SLOF-20170724-2.git89f519f.el7.noarch

How reproducible:
Mostly

Steps to Reproduce:
1. Configure hugepages on the host and mount hugetlbfs at /mnt/hugetlbfs
[root@ibm-p8-rhevm-14 ~]# echo 768 > /proc/sys/vm/nr_hugepages
[root@ibm-p8-rhevm-14 home]# mount -t hugetlbfs none /mnt/hugetlbfs

2. Add dpdk-init=true for ovs
[root@ibm-p8-rhevm-14 ~]# ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true

3. Start the openvswitch daemon with "systemctl start openvswitch"

4. Create an OVS bridge and assign an IP to the ovs-br0 interface
[root@ibm-p8-rhevm-14 ~]# ovs-vsctl add-br ovs-br0 -- set bridge ovs-br0 datapath_type=netdev
[root@ibm-p8-rhevm-14 ~]# ifconfig ovs-br0 192.168.1.2/24 up

5. Add an OVS port to the bridge:
[root@ibm-p8-rhevm-14 ~]# ovs-vsctl add-port ovs-br0 vhost-user1 -- set Interface vhost-user1 type=dpdkvhostuser

[root@ibm-p8-rhevm-14 ~]# ovs-vsctl show
f0f66fea-e8d3-4215-8735-e071d3f27e78
    Bridge "ovs-br0"
        Port "ovs-br0"
            Interface "ovs-br0"
                type: internal
        Port "vhost-user1"
            Interface "vhost-user1"
                type: dpdkvhostuser
    ovs_version: "2.7.3"

6. Boot a guest with the vhost-user1 interface as the backend of a virtio-net-pci device
/usr/libexec/qemu-kvm ... \
 ...
 -chardev socket,id=char1,path=/run/openvswitch/vhost-user1 \
 -netdev vhost-user,id=mynet0,chardev=char1,vhostforce=on \
 -device virtio-net-pci,mac=9a:54:55:56:57:58,id=idMCKaId,netdev=mynet0,bus=pci.0,addr=0x5 \
 ...

7. Assign an IP to the NIC inside the guest
[root@localhost ~]# ifconfig eth0 192.168.1.10/24 up

8. Run a ping test to the ovs-br0 interface on the host.

Actual results:
Can't ping through
[root@localhost ~]# ping 192.168.1.2 -c 5
ping 192.168.1.2 -c 5
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
From 192.168.1.10 icmp_seq=1 Destination Host Unreachable
From 192.168.1.10 icmp_seq=2 Destination Host Unreachable
From 192.168.1.10 icmp_seq=3 Destination Host Unreachable
From 192.168.1.10 icmp_seq=4 Destination Host Unreachable
From 192.168.1.10 icmp_seq=5 Destination Host Unreachable

--- 192.168.1.2 ping statistics ---
5 packets transmitted, 0 received, +5 errors, 100% packet loss, time 4007ms
pipe 3


Expected results:
Ping from the guest should go through.

Additional info:

Comment 2 Zhengtong 2017-11-22 05:23:15 UTC
Full command to boot the guest:
[root@ibm-p8-rhevm-14 home]# cat 1guest.sh 
/usr/libexec/qemu-kvm \
 -name avocado-vt-vm1 \
 -sandbox off \
 -machine pseries \
 -m 2048 \
 -mem-path /mnt/hugetlbfs \
 -nodefaults \
 -vga std \
 -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/avocado_9K/monitor-qmpmonitor1-20171120-050813-zs9gqHXL,server,nowait \
 -mon chardev=qmp_id_qmpmonitor1,mode=control \
 -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/avocado_9K/monitor-catch_monitor-20171120-050813-zs9gqHXL,server,nowait \
 -mon chardev=qmp_id_catch_monitor,mode=control \
 -chardev socket,id=serial_id_serial0,path=/var/tmp/avocado_9K/serial-serial0-20171120-050813-zs9gqHXL,server,nowait \
 -device spapr-vty,reg=0x30000000,chardev=serial_id_serial0 \
 -device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x3 \
 -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4 \
 -drive id=drive_image1,if=none,snapshot=on,aio=threads,cache=none,format=qcow2,file=/home/staf-kvm-devel/vt_test_images/rhel75-ppc64le-virtio-scsi.qcow2 \
 -device scsi-hd,id=image1,drive=drive_image1 \
 -smp 4 \
 -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
 -device usb-kbd \
 -device usb-mouse \
 -vnc :0 \
 -rtc base=utc,clock=host \
 -monitor stdio \
 -enable-kvm \
 -global spapr-pci-host-bridge.pgsz=0x1011000 \
 -chardev socket,id=char1,path=/run/openvswitch/vhost-user1 \
 -netdev vhost-user,id=mynet0,chardev=char1,vhostforce=on \
 -device virtio-net-pci,mac=9a:54:55:56:57:58,id=idMCKaId,netdev=mynet0,bus=pci.0,addr=0x5 \

Comment 3 Zhengtong 2017-11-22 05:24:18 UTC
The TX and RX statistics for eth0 inside the guest are still 0 after the ping test.


[root@localhost ~]# ifconfig eth0 192.168.1.10/24 up
ifconfig eth0 192.168.1.10/24 up
[root@localhost ~]# ping 192.168.1.2 -c 5
ping 192.168.1.2 -c 5
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
From 192.168.1.10 icmp_seq=1 Destination Host Unreachable
From 192.168.1.10 icmp_seq=2 Destination Host Unreachable
From 192.168.1.10 icmp_seq=3 Destination Host Unreachable
From 192.168.1.10 icmp_seq=4 Destination Host Unreachable
From 192.168.1.10 icmp_seq=5 Destination Host Unreachable

--- 192.168.1.2 ping statistics ---
5 packets transmitted, 0 received, +5 errors, 100% packet loss, time 4007ms
pipe 3
[root@localhost ~]# ifconfig
ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.10  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::9854:55ff:fe56:5758  prefixlen 64  scopeid 0x20<link>
        ether 9a:54:55:56:57:58  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Comment 4 Zhengtong 2017-11-23 00:56:05 UTC
I haven't had a chance to try it on an x86 host, because I hit the problem below while starting the OVS service on x86.

Nov 23 08:50:13 dhcp-9-122.nay.redhat.com dhclient[24144]: DHCPREQUEST on eno1 to 10.72.17.5 port 67 (xid=0x415f057b)
Nov 23 08:50:30 dhcp-9-122.nay.redhat.com dhclient[24144]: DHCPREQUEST on eno1 to 10.72.17.5 port 67 (xid=0x415f057b)
Nov 23 08:50:39 dhcp-9-122.nay.redhat.com systemd[1]: ovs-vswitchd.service start operation timed out. Terminating.
Nov 23 08:50:39 dhcp-9-122.nay.redhat.com systemd[1]: Failed to start Open vSwitch Forwarding Unit.


I haven't figured out a solution to this issue on x86 yet.

Comment 5 David Gibson 2017-11-28 00:25:33 UTC
This is probably a POWER-specific problem.  However, I know so little about DPDK and/or OVS that I don't really understand what's supposed to be happening.

I'm hoping someone who knows about OVS can look at this first, and tell me what's going wrong in lower-level terms I can then debug on the POWER side.

Comment 7 Pei Zhang 2017-12-01 08:30:29 UTC
Created attachment 1361469 [details]
Example to boot ovs-dpdk in x86

(In reply to Amnon Ilan from comment #6)
> (In reply to Zhengtong from comment #4)
> > I haven't get a chance to try it on x86 host. because I hit a problem in the
> > below while start ovs service on x86. 
> > 
> > Nov 23 08:50:13 dhcp-9-122.nay.redhat.com dhclient[24144]: DHCPREQUEST on
> > eno1 to 10.72.17.5 port 67 (xid=0x415f057b)
> > Nov 23 08:50:30 dhcp-9-122.nay.redhat.com dhclient[24144]: DHCPREQUEST on
> > eno1 to 10.72.17.5 port 67 (xid=0x415f057b)
> > Nov 23 08:50:39 dhcp-9-122.nay.redhat.com systemd[1]: ovs-vswitchd.service
> > start operation timed out. Terminating.
> > Nov 23 08:50:39 dhcp-9-122.nay.redhat.com systemd[1]: Failed to start Open
> > vSwitch Forwarding Unit.
> > 
> > 
> > I haven't figure out a solution on this issue on x86 yet.
> 
> Pei, can you have a look?
> Can we also try it without OVS? (PVP)

On x86, we boot OVS like this: besides "ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true", we also need the options below:

ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="1024,1024"
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask="0x1"

I have attached full scripts to this Comment.


Also, it seems the memory share option is missing; it's needed for vhost-user.

-m 8G \
-object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages,share=yes,size=8G,host-nodes=0,policy=bind \
-numa node,nodeid=0,cpus=0-3,memdev=ram-node0 \
...


Best Regards,
Pei

Comment 8 Zhengtong 2017-12-04 08:24:25 UTC
I am trying the steps provided by Pei Zhang, but so far I still can't get the packets sent out.

I hit the "Cannot get a virtual area: Cannot allocate memory" error while running the script (I modified some parts so it suits my scenario).

[root@ibm-p8-09 home]# sh boot_ovs_client.sh 
killing old ovs process
probing ovs kernel module
clean env
init ovs db and boot db server
2017-12-04T08:20:06Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovsdb-server.log
start ovs vswitch daemon
2017-12-04T08:20:06Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovs-vswitchd.log
2017-12-04T08:20:06Z|00002|ovs_numa|INFO|Discovered 48 CPU cores on NUMA node 0
2017-12-04T08:20:06Z|00003|ovs_numa|INFO|Discovered 48 CPU cores on NUMA node 1
2017-12-04T08:20:06Z|00004|ovs_numa|INFO|Discovered 2 NUMA nodes and 96 CPU cores
2017-12-04T08:20:06Z|00005|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2017-12-04T08:20:06Z|00006|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected
2017-12-04T08:20:06Z|00007|dpdk|INFO|DPDK Enabled - initializing...
2017-12-04T08:20:06Z|00008|dpdk|INFO|No vhost-sock-dir provided - defaulting to /var/run/openvswitch
2017-12-04T08:20:06Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x1 --socket-mem 1024,1024
2017-12-04T08:20:06Z|00010|dpdk|INFO|EAL: Detected 24 lcore(s)
2017-12-04T08:20:06Z|00011|dpdk|INFO|EAL: Probing VFIO support...
2017-12-04T08:20:06Z|00012|dpdk|INFO|EAL: VFIO support initialized
2017-12-04T08:20:07Z|00013|dpdk|ERR|EAL: Cannot get a virtual area: Cannot allocate memory
2017-12-04T08:20:07Z|00014|dpdk|ERR|EAL: Cannot get a virtual area: Cannot allocate memory
2017-12-04T08:20:07Z|00015|dpdk|ERR|EAL: Cannot get a virtual area: Cannot allocate memory
2017-12-04T08:20:07Z|00016|dpdk|ERR|EAL: Cannot get a virtual area: Cannot allocate memory
2017-12-04T08:20:07Z|00017|dpdk|ERR|EAL: Cannot get a virtual area: Cannot allocate memory
2017-12-04T08:20:07Z|00018|dpdk|ERR|EAL: Cannot get a virtual area: Cannot allocate memory
2017-12-04T08:20:07Z|00019|dpdk|ERR|EAL: Cannot get a virtual area: Cannot allocate memory
2017-12-04T08:20:07Z|00020|dpdk|ERR|EAL: Cannot get a virtual area: Cannot allocate memory
2017-12-04T08:20:07Z|00021|dpdk|ERR|EAL: Cannot get a virtual area: Cannot allocate memory
2017-12-04T08:20:07Z|00022|dpdk|ERR|EAL: Cannot get a virtual area: Cannot allocate memory
...

2017-12-04T08:20:08Z|01035|dpdk|ERR|EAL: Cannot get a virtual area: Cannot allocate memory
2017-12-04T08:20:08Z|01036|dpdk|ERR|EAL: Cannot get a virtual area: Cannot allocate memory
2017-12-04T08:20:08Z|01037|dpdk|INFO|DPDK pdump packet capture enabled
2017-12-04T08:20:08Z|01038|dpdk|INFO|DPDK Enabled - initialized
2017-12-04T08:20:08Z|01039|timeval|WARN|Unreasonably long 2337ms poll interval (79ms user, 2147ms system)
2017-12-04T08:20:08Z|01040|timeval|WARN|faults: 523 minor, 0 major
2017-12-04T08:20:08Z|01041|timeval|WARN|disk: 0 reads, 128 writes
2017-12-04T08:20:08Z|01042|timeval|WARN|context switches: 1226 voluntary, 18 involuntary
2017-12-04T08:20:08Z|01043|coverage|INFO|Event coverage, avg rate over last: 5 seconds, last minute, last hour,  hash=65a3af2a:
2017-12-04T08:20:08Z|01044|coverage|INFO|bridge_reconfigure         0.0/sec     0.000/sec        0.0000/sec   total: 1
2017-12-04T08:20:08Z|01045|coverage|INFO|cmap_expand                0.0/sec     0.000/sec        0.0000/sec   total: 9
2017-12-04T08:20:08Z|01046|coverage|INFO|miniflow_malloc            0.0/sec     0.000/sec        0.0000/sec   total: 11
2017-12-04T08:20:08Z|01047|coverage|INFO|hmap_expand                0.0/sec     0.000/sec        0.0000/sec   total: 384
2017-12-04T08:20:08Z|01048|coverage|INFO|txn_unchanged              0.0/sec     0.000/sec        0.0000/sec   total: 2
2017-12-04T08:20:08Z|01049|coverage|INFO|txn_incomplete             0.0/sec     0.000/sec        0.0000/sec   total: 1
2017-12-04T08:20:08Z|01050|coverage|INFO|poll_create_node           0.0/sec     0.000/sec        0.0000/sec   total: 40
2017-12-04T08:20:08Z|01051|coverage|INFO|seq_change                 0.0/sec     0.000/sec        0.0000/sec   total: 54
2017-12-04T08:20:08Z|01052|coverage|INFO|pstream_open               0.0/sec     0.000/sec        0.0000/sec   total: 1
2017-12-04T08:20:08Z|01053|coverage|INFO|stream_open                0.0/sec     0.000/sec        0.0000/sec   total: 1
2017-12-04T08:20:08Z|01054|coverage|INFO|util_xalloc                0.0/sec     0.000/sec        0.0000/sec   total: 9445
2017-12-04T08:20:08Z|01055|coverage|INFO|netdev_get_hwaddr          0.0/sec     0.000/sec        0.0000/sec   total: 2
2017-12-04T08:20:08Z|01056|coverage|INFO|netlink_received           0.0/sec     0.000/sec        0.0000/sec   total: 3
2017-12-04T08:20:08Z|01057|coverage|INFO|netlink_sent               0.0/sec     0.000/sec        0.0000/sec   total: 1
2017-12-04T08:20:08Z|01058|coverage|INFO|90 events never hit
creating bridge and ports


In the end, the OVS service started successfully despite that error, but I still cannot get the packets sent out.

Modified script:
-----------------------
[root@ibm-p8-09 home]# cat boot_ovs_client.sh 
#!/bin/bash

set -e

echo "killing old ovs process"
pkill -f ovs- || true
pkill -f ovsdb || true

echo "probing ovs kernel module"
modprobe -r openvswitch || true
modprobe openvswitch

echo "clean env"
DB_FILE=/etc/openvswitch/conf.db
rm -rf /var/run/openvswitch
mkdir /var/run/openvswitch
rm -f $DB_FILE

echo "init ovs db and boot db server"
export DB_SOCK=/var/run/openvswitch/db.sock
ovsdb-tool create /etc/openvswitch/conf.db /usr/share/openvswitch/vswitch.ovsschema
ovsdb-server --remote=punix:$DB_SOCK --remote=db:Open_vSwitch,Open_vSwitch,manager_options --pidfile --detach --log-file
ovs-vsctl --no-wait init

echo "start ovs vswitch daemon"
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="1024,1024"
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask="0x1"
ovs-vswitchd unix:$DB_SOCK --pidfile --detach --log-file=/var/log/openvswitch/ovs-vswitchd.log

echo "creating bridge and ports"

ovs-vsctl --if-exists del-br ovsbr0
ovs-vsctl add-br ovsbr0 -- set bridge ovsbr0 datapath_type=netdev
ovs-vsctl add-port ovsbr0 vhost-user1 -- set Interface vhost-user1 type=dpdkvhostuser 
ovs-vsctl add-port ovsbr0 vhost-user2 -- set Interface vhost-user2 type=dpdkvhostuser 
ovs-ofctl del-flows ovsbr0
ovs-ofctl add-flow ovsbr0 "in_port=1,idle_timeout=0 actions=output:2"
ovs-ofctl add-flow ovsbr0 "in_port=2,idle_timeout=0 actions=output:1"
--------------------------


Debug still going on....

Comment 9 Amnon Ilan 2017-12-04 11:46:04 UTC
(In reply to Zhengtong from comment #8)

> 
> Debug still going on....

Might be useful to try the PVP setup (with testpmd in host instead of OVS)

Comment 10 Pei Zhang 2017-12-05 10:40:47 UTC
(In reply to Amnon Ilan from comment #9)
> (In reply to Zhengtong from comment #8)
> 
> > 
> > Debug still going on....
> 
> Might be useful to try the PVP setup (with testpmd in host instead of OVS) 

Testing vhost-user without ovs: 

Note: As there are no physical network cards here that DPDK supports, maybe we shouldn't call the testing below PVP; however, it is still a way to test vhost-user without OVS.

Thanks Maxime for providing the correct way to boot testpmd in this scenario.
 

1. In host, boot testpmd with 2 vhost-user ports, and create an IO loopback.
# testpmd -l 1,3,5,7 \
--socket-mem=1024,1024 -n 4 \
--vdev 'net_vhost0,iface=/tmp/vhost-user0' \
--vdev 'net_vhost1,iface=/tmp/vhost-user1' -- \
--portmask=3 --disable-hw-vlan -i --rxq=1 --txq=1 \
--nb-cores=2 --forward-mode=io

testpmd> set fwd io
Set io packet forwarding mode
testpmd> start tx_first 64


2. In host, boot VM1 using one vhost-user port
# /usr/libexec/qemu-kvm \
-name guest=rhel7.5_nonrt_1 \
-cpu host \
-m 4G \
-smp 4,sockets=1,cores=4,threads=1 \
-object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages,share=yes,size=4G,host-nodes=0,policy=bind \
-numa node,nodeid=0,cpus=0-3,memdev=ram-node0 \
-drive file=/home/images_nfv-virt-rt-kvm/rhel7.5_nonrt_1.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,cache=none,aio=threads \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0 \
-chardev socket,id=charnet1,path=/tmp/vhost-user0 \
-netdev vhost-user,chardev=charnet1,id=hostnet1 \
-device virtio-net-pci,netdev=hostnet1,id=net1,mac=18:66:da:5f:dd:02 \
-monitor stdio \
-vnc :2


3. In host, boot VM2 using another vhost-user port
# /usr/libexec/qemu-kvm \
-name guest=rhel7.5_nonrt_2 \
-cpu host \
-m 4G \
-smp 4,sockets=1,cores=4,threads=1 \
-object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages,share=yes,size=4G,host-nodes=0,policy=bind \
-numa node,nodeid=0,cpus=0-3,memdev=ram-node0 \
-drive file=/home/images_nfv-virt-rt-kvm/rhel7.5_nonrt_2.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,cache=none,aio=threads \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0 \
-chardev socket,id=charnet1,path=/tmp/vhost-user1 \
-netdev vhost-user,chardev=charnet1,id=hostnet1 \
-device virtio-net-pci,netdev=hostnet1,id=net1,mac=18:66:da:5f:dd:03 \
-monitor stdio \
-vnc :3

4. Start your ping testing between VM1 and VM2.


Zhengtong, could you please have a try? And please let me know if I can be of further help.

Best Regards,
Pei

Comment 11 Zhengtong 2017-12-06 07:29:34 UTC
Thanks for the steps suggestion, Pei.

But I still hit the "EAL: Cannot get a virtual area: Cannot allocate memory" issue while starting up the testpmd daemon.

[root@ibm-p8-09 home]# testpmd -l 48,56,64,72 --socket-mem=1024,1024 -n 4 --vdev 'net_vhost0,iface=/tmp/vhost-user10' --vdev 'net_vhost1,iface=/tmp/vhost-user11' -- --portmask=3 --disable-hw-vlan -i --rxq=1 --txq=1 --nb-cores=2 --forward-mode=io

...
EAL: Cannot get a virtual area: Cannot allocate memory
EAL: Cannot get a virtual area: Cannot allocate memory
EAL: Cannot get a virtual area: Cannot allocate memory
EAL: WARNING: Master core has no memory on local socket!
PMD: Initializing pmd_vhost for net_vhost0
PMD: Creating VHOST-USER backend on numa socket 16
PMD: Initializing pmd_vhost for net_vhost1
PMD: Creating VHOST-USER backend on numa socket 16
EAL: No probed ethernet devices
Invalid port 0
Interactive-mode selected
Set io packet forwarding mode
USER1: create a new mbuf pool <mbuf_pool_socket_0>: n=171456, size=2176, socket=0
Done
testpmd> set fwd io
Set io packet forwarding mode
testpmd> start tx_first 64
io packet forwarding - ports=0 - cores=0 - streams=0 - NUMA support disabled, MP over anonymous pages disabled

  io packet forwarding - CRC stripping enabled - packets/burst=32
  nb forwarding cores=2 - nb forwarding ports=0
  RX queues=1 - RX desc=128 - RX free threshold=0
  RX threshold registers: pthresh=0 hthresh=0 wthresh=0
  TX queues=1 - TX desc=512 - TX free threshold=0
  TX threshold registers: pthresh=0 hthresh=0 wthresh=0
  TX RS bit threshold=0 - TXQ flags=0x0
testpmd> 



The vhost-user sockets were not created after testpmd started up. Hugepages were set up before starting testpmd.

I am trying to modify the values of the testpmd parameters.

Comment 12 Zhengtong 2017-12-06 11:11:20 UTC
Whether with testpmd or with OVS-DPDK, I always come across the error:

"EAL: Cannot get a virtual area: Cannot allocate memory"

So far, I haven't gotten ping packets through between the two VMs.

I ran the testpmd application with a command like this:


# testpmd -l  0,8,16,24 --socket-mem=1024,1024,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1024,1024 -n 4 --vdev 'net_vhost0,iface=/tmp/vhost-user10' --vdev 'net_vhost1,iface=/tmp/vhost-user11' -- --portmask=3 --disable-hw-vlan -i --rxq=1 --txq=1 --nb-cores=2 --forward-mode=io


The application can start up despite the error above, but pinging between the 2 VMs still fails.

From a quick check of the DPDK source code and compile flags, some PowerPC-related flags were not set while compiling, for example RTE_ARCH_PPC_64. I am not sure whether these flags have any relationship with the issue.

Comment 13 Zhengtong 2017-12-08 08:18:02 UTC
Amnon, 
do you have any ideas on this error, or any suggestions on setting up the environment?

Comment 14 Pei Zhang 2017-12-12 10:16:16 UTC
Hi Kevin, 

It seems DPDK's testpmd cannot boot up on the PowerPC host.

In Zhengtong's testing (see Comment 12), the error shows that this is a memory issue. On x86 we test with 1G/2M hugepage sizes, but on PowerPC it's 16M. So we don't know the correct way to boot testpmd (especially its memory setup) on a PowerPC host.

Do you have any suggestions or ideas? Thanks.


Best Regards,
Pei

Comment 15 Kevin Traynor 2017-12-13 14:04:33 UTC
Hi Pei, I haven't done any testing on PowerPC. Josh or John may be best placed to comment on this. I agree it seems to be a memory issue and nothing really to do with the VMs. I suggest checking memory:

cat /proc/meminfo | grep Huge
cat /sys/devices/system/node/node*/meminfo | fgrep Huge
grep hugepages /proc/mounts

and try to get a basic testpmd run working on the host with 2 physical NICs and no error messages, something like:

testpmd -c 0x1f -n 4 --socket-mem 1024,1024 -w 0000:00:03.0 -w 0000:00:04.0 -- -i --disable-hw-vlan --rxq=1 --txq=1 --rxd=256 --txd=256 --forward-mode=io --auto-start

thanks,
Kevin.

Comment 16 John W. Linville 2017-12-13 19:17:42 UTC
It has been a while since I tested with such an environment. IIRC, there was a particular qemu command line option that had to be used to make things work. I'll see if I can find it...

Comment 17 John W. Linville 2017-12-13 19:50:16 UTC
http://docs.openvswitch.org/en/latest/topics/dpdk/vhost-user/

I think these lines are key:

    <numa>
      <cell id='0' cpus='0-1' memory='4194304' unit='KiB' memAccess='shared'/>
    </numa>

In particular, the "memAccess='shared'" option. IIRC, that made the dpdkvhostuser ports start working for me on ppc64le.
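
For context, here is a sketch of where those lines sit in a full libvirt domain XML, together with hugepage backing (the memory size and cpu range are just the example values from the OVS documentation above, not settings taken from this bug):

    <memoryBacking>
      <hugepages/>
    </memoryBacking>
    <cpu>
      <numa>
        <cell id='0' cpus='0-1' memory='4194304' unit='KiB' memAccess='shared'/>
      </numa>
    </cpu>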

Comment 18 Zhengtong 2017-12-14 01:42:42 UTC
John, did you hit the "EAL: Cannot get a virtual area: Cannot allocate memory" issue while setting up the environment on the host, no matter whether with testpmd or dpdk-ovs?

I found the testpmd/dpdk-ovs daemon can still start up despite that issue.
I want to confirm that this issue has no effect on the packet send/receive function.

thanks
Zhengtong

Comment 19 John W. Linville 2017-12-14 16:01:37 UTC
Honestly, I'm not sure. That sounds familiar, but I don't recall if it impacted functionality or not.

Comment 20 Zhengtong 2017-12-28 09:15:04 UTC
(In reply to John W. Linville from comment #17)
> http://docs.openvswitch.org/en/latest/topics/dpdk/vhost-user/
> 
> I think these lines are key:
> 
>     <numa>
>       <cell id='0' cpus='0-1' memory='4194304' unit='KiB'
> memAccess='shared'/>
>     </numa>
> 
> In particular, the "memAccess='shared'" option. IIRC, that made the
> dpdkvhostuser ports start working for me on ppc64le.

I tested with the command line in qemu:
"
 -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages,share=yes,size=1G,host-nodes=0,policy=bind \
 -numa node,nodeid=0,cpus=0-3,memdev=ram-node0 \

"
("share=yes"here is the same as the memAccess=shared in libvirt xml file)

The guests still cannot reach each other.


Besides server mode, I also tried client mode with the "dpdkvhostuserclient" interface type on the OVS side, and the target server path wasn't even created.


[root@ibm-p8-rhevm-07 ~]# ovs-vsctl add-port ovsbr0 vhost-user1 -- set Interface vhost-user1 type=dpdkvhostuserclient options:vhost-server-path=/tmp/vhostuser1.sock
[root@ibm-p8-rhevm-07 ~]# ovs-vsctl add-port ovsbr0 vhost-user2 -- set Interface vhost-user2 type=dpdkvhostuserclient options:vhost-server-path=/tmp/vhostuser2.sock

[root@ibm-p8-rhevm-07 ~]# ls /tmp/vhost*
ls: cannot access /tmp/vhost*: No such file or directory

I think this functionality still has some issues.


The openvswitch version I used: openvswitch-2.9.0-0.2.20171212git6625e43.el7fdb.ppc64le

Comment 21 Zhengtong 2017-12-28 11:25:00 UTC
Again with "dpdkvhostuserclient":

I found the steps in comment #20 should be adjusted.

1. Boot up guest with :
...
 -chardev socket,id=char1,path=/tmp/vhostuser1.sock,server\
 -netdev vhost-user,id=mynet0,chardev=char1,vhostforce=on\
 -device virtio-net-pci,mac=9a:54:55:56:57:12,id=idMCKaId,netdev=mynet0,bus=pci.0,addr=0x5
...


...
 -chardev socket,id=char1,path=/tmp/vhostuser2.sock,server\
 -netdev vhost-user,id=mynet0,chardev=char1,vhostforce=on\
 -device virtio-net-pci,mac=9a:54:55:56:57:13,id=idMCKaId,netdev=mynet0,bus=pci.0,addr=0x5
...

2. Then set up the OVS ports on the OVS bridge:
#ovs-vsctl add-br ovsbr0 -- set bridge ovsbr0 datapath_type=netdev
#ovs-vsctl add-port ovsbr0 vhost-user1 -- set Interface vhost-user1 type=dpdkvhostuserclient options:vhost-server-path=/tmp/vhostuser1.sock
#ovs-vsctl add-port ovsbr0 vhost-user2 -- set Interface vhost-user2 type=dpdkvhostuserclient options:vhost-server-path=/tmp/vhostuser2.sock
#ovs-ofctl del-flows ovsbr0
#ovs-ofctl add-flow ovsbr0 "in_port=1,idle_timeout=0 actions=output:2"
#ovs-ofctl add-flow ovsbr0 "in_port=2,idle_timeout=0 actions=output:1"

3. After the guests boot up, configure an IP for each of them.

4. The guests cannot ping each other.

Comment 24 John W. Linville 2018-01-03 15:49:55 UTC
Mick Tarsel, do you have any input on what it takes to get dpdkvhostuser ports working on ppc64le hosts? It seems like we may be missing some specific qemu option or something like that. I had this working months ago, but I don't recall the specific combination now.

Comment 25 Mick Tarsel 2018-01-03 22:02:09 UTC
I tried this on my ppc box about a dozen times (with a reboot) and was unable to reproduce; however, I'm running RHEL 7.4 GA. I will upgrade to 7.5 and try again later, but thought I'd share my results before I upgrade. I'm sticking with a single guest, as per the first comment when the bug was submitted. Also, I have NUMA turned off with 1024 hugepages (16 MB each).

# rpm -qa | grep openv
openvswitch-2.7.3-2.git20171010.el7fdp.ppc64le

Machine info:
# uname -r
3.10.0-693.1.1.el7.ppc64le

# cat /proc/cmdline 
root=UUID=7e27ea60-9338-4fbf-8cdf-ea4d7a39102b ro crashkernel=auto hugepages=1024 isolcpus=40-43,48-51,56-59 numa=off

(after some tests)
# grep Huge /proc/meminfo 
AnonHugePages:         0 kB
HugePages_Total:    1024
HugePages_Free:      768
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:      16384 kB

# cat /etc/os-release 
NAME="Red Hat Enterprise Linux Server"
VERSION="7.4 (Maipo)"

Here are the steps on host:
# ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev
# ovs-vsctl add-port br0 vhost-vm1 -- set Interface vhost-vm1 type=dpdkvhostuser 
# ip add add 192.168.2.1/24 dev br0; ip link set dev br0 up

# ovs-vsctl show
d023a6c0-b5ab-4aa2-9b64-fee4a5e8b23f
    Bridge "br0"
        Port "br0"
            Interface "br0"
                type: internal
        Port "vhost-vm1"
            Interface "vhost-vm1"
                type: dpdkvhostuser
    ovs_version: "2.7.3"

# ip addr show br0
17: br0: <BROADCAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 1000
    link/ether aa:81:5a:97:84:4c brd ff:ff:ff:ff:ff:ff
    inet 192.168.2.1/24 scope global br0
       valid_lft forever preferred_lft forever
    inet6 fe80::a881:5aff:fe97:844c/64 scope link 
       valid_lft forever preferred_lft forever

Make changes to the attached XML file in order to boot it on your machine (the qcow2 location should be the only change needed).
# virsh define vhost-guest.xml
# virsh start v1 --console

Inside guest:
# ip add add 192.168.2.10/24 dev eth0; ip link set dev eth0 up
# ip route add default via 192.168.2.1
# ping 192.168.2.1

VM output. MAC address is from XML file.
[root@nwpoktuleta124 PVP-ppc64le]# virsh console v1
Connected to domain v1
Escape character is ^]

:/# ip add
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:00:00:00:00:01 brd ff:ff:ff:ff:ff:ff
    inet 192.168.2.10/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::200:ff:fe00:1/64 scope link 
       valid_lft forever preferred_lft forever
:/# ping -c3 192.168.2.1
PING 192.168.2.1 (192.168.2.1) 56(84) bytes of data.
64 bytes from 192.168.2.1: icmp_seq=1 ttl=64 time=0.129 ms
64 bytes from 192.168.2.1: icmp_seq=2 ttl=64 time=0.046 ms
64 bytes from 192.168.2.1: icmp_seq=3 ttl=64 time=0.088 ms

--- 192.168.2.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1998ms
rtt min/avg/max/mdev = 0.046/0.087/0.129/0.035 ms

Additionally, I can see packets on the host with this command:
# watch -n.5 'ovs-ofctl dump-ports br0 vhost-vm1'

Try this XML file along with the ip commands rather than ifconfig. I'm not so sure RHEL 7.5 will make much difference for _this_ setup but I'll install it and see what happens. Nothing in your QEMU command sticks out to me as a problem so I'd agree with John and maybe you're missing something. Don't worry about the EAL memory issue for now.

Comment 26 Mick Tarsel 2018-01-03 22:03:24 UTC
Created attachment 1376622 [details]
XML file to boot guest with dpdkvhostuser ports

Comment 27 Mick Tarsel 2018-01-03 22:06:57 UTC
Please reply with your results when using attached XML file.

Comment 28 Tony Breeds 2018-01-04 00:13:27 UTC
Based on Mick's summary and looking at the OpenStack code, this will impact OpenStack on ppc64le, so if this is a KVM bug it'd be good to get it fixed in 7.5, as that's what the next RHOS release will be based on.

Comment 29 Zhengtong 2018-01-04 10:56:31 UTC
(In reply to Mick Tarsel from comment #27)
> Please reply with your results when using attached XML file.

Finally, I can get packets through between the guest and host with a dpdkvhostuser port.


1. I am not sure if this is a bug. The packets can only get through when we set "mrg_rxbuf=off" for the virtio-net-pci device (see the example device line at the end of this comment). If "mrg_rxbuf=on", the ARP reply packets are dropped by the OVS bridge port br0, which is why I could not get through previously.

2. When I set:
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="4096"

and then run "systemctl restart openvswitch",
there is an error saying "dpdk|ERR|EAL: unsupported cpu type", which blocks the service from starting.

But I can start the ovs-vswitchd service manually with the script supplied by Pei Zhang, and the packets can get through if "mrg_rxbuf=off" is set.


Version:
openvswitch-2.8.0-4.el7fdb.ppc64le  /  3.10.0-823.el7.ppc64le
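
For reference, a minimal sketch of the corresponding QEMU arguments with mergeable RX buffers disabled, reusing the chardev/netdev names from comment 2 (mrg_rxbuf is a standard virtio-net device property; this is an illustration, not the exact command line used above):

 -chardev socket,id=char1,path=/run/openvswitch/vhost-user1 \
 -netdev vhost-user,id=mynet0,chardev=char1,vhostforce=on \
 -device virtio-net-pci,mac=9a:54:55:56:57:58,id=idMCKaId,netdev=mynet0,bus=pci.0,addr=0x5,mrg_rxbuf=off \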

Comment 30 John W. Linville 2018-01-04 18:52:14 UTC
"dpdk|ERR|EAL: unsupported cpu type" certainly seems suspicious, like you are running on the wrong kind of hardware. Could you attach your /proc/cpuinfo?

Comment 31 Zhengtong 2018-01-05 02:25:39 UTC
[root@ibm-p8-garrison-05 home]# LD_SHOW_AUXV=1 /bin/true 
AT_DCACHEBSIZE:  0x80
AT_ICACHEBSIZE:  0x80
AT_UCACHEBSIZE:  0x0
AT_SYSINFO_EHDR: 0x3fffaf960000
AT_HWCAP:        true_le archpmu vsx arch_2_06 dfp ic_snoop smt mmu fpu altivec ppc64 ppc32
AT_PAGESZ:       65536
AT_CLKTCK:       100
AT_PHDR:         0x10000040
AT_PHENT:        56
AT_PHNUM:        9
AT_BASE:         0x3fffaf980000
AT_FLAGS:        0x0
AT_ENTRY:        0x1000147c
AT_UID:          0
AT_EUID:         0
AT_GID:          0
AT_EGID:         0
AT_SECURE:       0
AT_RANDOM:       0x3fffcd1ca8b2
AT_HWCAP2:       htm-nosc vcrypto tar isel ebb dscr htm arch_2_07
AT_EXECFN:       /bin/true
AT_PLATFORM:     power8
AT_BASE_PLATFORM:power8
[root@ibm-p8-garrison-05 home]# cat /proc/cpuinfo 
processor	: 0
cpu		: POWER8NVL (raw), altivec supported
clock		: 4023.000000MHz
revision	: 1.0 (pvr 004c 0100)

processor	: 8
cpu		: POWER8NVL (raw), altivec supported
clock		: 4023.000000MHz
revision	: 1.0 (pvr 004c 0100)

processor	: 16
cpu		: POWER8NVL (raw), altivec supported
clock		: 4023.000000MHz
revision	: 1.0 (pvr 004c 0100)

processor	: 24
cpu		: POWER8NVL (raw), altivec supported
clock		: 4023.000000MHz
revision	: 1.0 (pvr 004c 0100)

processor	: 32
cpu		: POWER8NVL (raw), altivec supported
clock		: 4023.000000MHz
revision	: 1.0 (pvr 004c 0100)

processor	: 40
cpu		: POWER8NVL (raw), altivec supported
clock		: 4023.000000MHz
revision	: 1.0 (pvr 004c 0100)

processor	: 48
cpu		: POWER8NVL (raw), altivec supported
clock		: 4023.000000MHz
revision	: 1.0 (pvr 004c 0100)

processor	: 56
cpu		: POWER8NVL (raw), altivec supported
clock		: 4023.000000MHz
revision	: 1.0 (pvr 004c 0100)

processor	: 64
cpu		: POWER8NVL (raw), altivec supported
clock		: 4023.000000MHz
revision	: 1.0 (pvr 004c 0100)

processor	: 72
cpu		: POWER8NVL (raw), altivec supported
clock		: 4023.000000MHz
revision	: 1.0 (pvr 004c 0100)

processor	: 80
cpu		: POWER8NVL (raw), altivec supported
clock		: 4023.000000MHz
revision	: 1.0 (pvr 004c 0100)

processor	: 88
cpu		: POWER8NVL (raw), altivec supported
clock		: 4023.000000MHz
revision	: 1.0 (pvr 004c 0100)

processor	: 96
cpu		: POWER8NVL (raw), altivec supported
clock		: 4023.000000MHz
revision	: 1.0 (pvr 004c 0100)

processor	: 104
cpu		: POWER8NVL (raw), altivec supported
clock		: 4023.000000MHz
revision	: 1.0 (pvr 004c 0100)

processor	: 112
cpu		: POWER8NVL (raw), altivec supported
clock		: 4023.000000MHz
revision	: 1.0 (pvr 004c 0100)

processor	: 120
cpu		: POWER8NVL (raw), altivec supported
clock		: 4023.000000MHz
revision	: 1.0 (pvr 004c 0100)

timebase	: 512000000
platform	: PowerNV
model		: 8335-GTB        
machine		: PowerNV 8335-GTB        
firmware	: OPAL v3

Comment 32 Zhengtong 2018-01-08 03:00:52 UTC
Mick, 

the packets can only be transmitted when "mrg_rxbuf=off". Is this expected?

Thanks
Zhengtong

Comment 33 Mick Tarsel 2018-01-08 18:43:50 UTC
Yes, mergeable RX buffers should be turned off for this setup; I cannot get it to ping any other way. The OVS documentation recommends disabling mergeable buffers for dpdkvhostuser ports in order to increase performance compared to 'out of the box' performance.

Comment 34 Zhengtong 2018-01-09 04:50:14 UTC
Per comment https://bugzilla.redhat.com/show_bug.cgi?id=1451450#c25

I think we can close this bug if mrg_rxbuf=off is the expected configuration.

Comment 35 Jens Freimann 2018-01-15 10:01:21 UTC
So according to the OVS documentation, mrg_rxbuf=off is only required to improve performance, but it should actually also work with mrg_rxbuf=on?

Do we know if it is the same on x86?

Comment 36 Zhengtong 2018-01-16 04:55:52 UTC
Hi pei,

Could you help check it in an x86 environment?

Thanks
Zhengtong

Comment 37 Pei Zhang 2018-01-17 03:54:11 UTC
Hi Zhengtong, Jens,

On x86, both mrg_rxbuf=off and mrg_rxbuf=on work well.

Tests:
(1) Ping works well.
(2) DPDK's testpmd works well.

Versions:
3.10.0-829.el7.x86_64
qemu-kvm-rhev-2.10.0-16.el7.x86_64
dpdk-17.11-5.el7.x86_64
openvswitch-2.9.0-0.3.20171212git6625e43.el7fdb.x86_64


Best Regards,
Pei

Comment 39 Mick Tarsel 2018-02-02 21:10:28 UTC
I have continued to try and get to the bottom of this with no luck :( I have tried different settings with the guest, host machine, and the OVS port and still cannot ping ovs bridge from guest with mrg_rxbuf=on. 

First of all, according to DPDK commit message 284ae3e9ff9a92575c28c858efd2c85c8de6d440,

 On IBM POWER system, the nr_overcommit_hugepages should be set to the same value as nr_hugepages.
+    For example, if the required page number is 128, the following commands are used::
+
+        echo 128 > /sys/kernel/mm/hugepages/hugepages-16384kB/nr_hugepages
+        echo 128 > /sys/kernel/mm/hugepages/hugepages-16384kB/nr_overcommit_hugepages

This will remove the EAL memory warning when starting openvswitch via systemctl.
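
Applied to the host from comment 25 (1024 hugepages of 16 MB each), a sketch of that recommendation would look like the following; the page count is only the example value from that comment:

# echo 1024 > /sys/kernel/mm/hugepages/hugepages-16384kB/nr_hugepages
# echo 1024 > /sys/kernel/mm/hugepages/hugepages-16384kB/nr_overcommit_hugepages
# grep Huge /proc/meminfo    # verify HugePages_Total and Hugepagesize afterwards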

Using the same XML file attached to this bz with 2 queues for the virtio host driver.

I've also added this option to the OVS vhost interface:
# ovs-vsctl set Interface vhost0 options:n_rxq=2

And inside the guest, I do 
# ethtool -L eth0 combined 2

The QEMU process has mq=on (multi-queue) with an appropriate number of vectors and queues.
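
For illustration, a sketch of what those multi-queue QEMU arguments typically look like, reusing the names from comment 2 (with 2 queues, vectors is commonly set to 2*queues+2 = 6; this is an assumed example, not the exact command line used here):

 -netdev vhost-user,id=mynet0,chardev=char1,queues=2 \
 -device virtio-net-pci,mac=9a:54:55:56:57:58,netdev=mynet0,mq=on,vectors=6 \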

OVS reports dropped TX packets while ping is running in the guest. The guest has zero RX bytes/packets received, so the ARP exchange is incomplete in the guest. I've tried this with NetworkManager disabled in the guest as well... in all cases the guest still cannot ping the bridge.

The only suspicious error I see is when the guest is shutting down, and it shows up every time the guest shuts down. Whether mrg_rxbuf=on or off, I see the following in /var/log/messages (reported from lib/librte_vhost/socket.c):

ovs-vswitchd[10627]: ovs|01918|dpdk(vhost_thread1)|ERR|VHOST_CONFIG: recvmsg failed
libvirtd: 2018-02-02 20:30:28.153+0000: 5065: error : qemuMonitorIO:697 : internal error: End of file from qemu monitor

Perhaps this is unrelated, since it happens every time the guest shuts down, but it could be a clue about the dpdkvhostuser sockets. Other than this error, I don't have any other serious leads right now.

Still using 
openvswitch-2.7.3-2.git20171010.el7fdp.ppc64le
which is using dpdk 16.11.3

Anyone have any recommendations?

Comment 40 Mick Tarsel 2018-02-09 02:08:31 UTC
A couple updates here:
1. Without dpdkvhostuser ports, but with mrg_rxbuf=on and OVS connecting the guests, the ping works
==========

Adding this to the XML file makes the ping work; it could be another route to work backwards from, comparing why this works while dpdkvhostuser ports do not.
  
<interface type='bridge'>
      <source bridge='ovsbr'/>
      <virtualport type='openvswitch'>
      </virtualport>
      <model type='virtio'/>
      <driver queues='2'>
        <host mrg_rxbuf='on'/>
      </driver>
    </interface>

Once the VM is booted, the vnet devices appear in the ovs-vsctl show output under the ovsbr bridge. This narrows the problem down a little more, specifically to DPDK and the use of dpdkvhostuser ports.

2. With dpdkvhostuser ports and mrg_rxbuf=on, it appears the TX drops are due to is_vhost_running() returning false.
===========

I traced backwards from where the TX drops were reported by OVS via the ovs-ofctl dump-ports command. I attached gdb to OVS with 'gdb -p $(pgrep ovs-vswitchd)' and then had the guest continuously ping the bridge. I found the TX drops reported in __netdev_dpdk_vhost_send(), specifically in the first if statement, because of is_vhost_running(). With mrg_rxbuf=on I can see the return value of is_vhost_running() is false and so tx_dropped++. I think is_vhost_running() failed because of netdev_dpdk_get_vid(), but by this point in my gdb session all packets will drop because I have "paused" the ovs-vswitchd process while it is loaded in gdb. The guest will report Destination Host Unreachable no matter whether mrg_rxbuf is on or off.

With mrg_rxbuf=off, the pings work until I attach gdb to ovs-vswitchd. If I set my breakpoints fast enough, I can see is_vhost_running() returns true, and so the TX drop counter does not increment at all.

I'm not sure whether the return value of is_vhost_running() is the root cause of the problem or just a symptom of a bigger one. I tried running "ovs-ofctl mod-port br0 vhost0 up" but it does not make a difference, and pings still fail with mrg_rxbuf=on. I'll admit it's a sketchy debug environment because I need to be pumping data through OVS to get gdb output, but I cannot add print messages to an rpm, so gdb is all I have right now.
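
For anyone repeating this, a minimal sketch of the gdb session described above (assuming the openvswitch debuginfo package is installed so the OVS 2.7 symbols are visible; exact local variable names depend on the lib/netdev-dpdk.c source):

# gdb -p $(pgrep ovs-vswitchd)
(gdb) break __netdev_dpdk_vhost_send     # vhost TX path in lib/netdev-dpdk.c
(gdb) continue                           # now run ping inside the guest until the breakpoint hits
(gdb) next                               # step over the is_vhost_running() check
(gdb) info locals                        # inspect locals such as the vid and the tx_dropped counter
(gdb) detach

Note that while ovs-vswitchd sits at a breakpoint all traffic is dropped, as described above.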

Comment 41 David Gibson 2018-02-13 00:46:21 UTC
Sorry, I'm not sure what question you want me to answer here.  The problems still seem to be only described in terms of OVS terminology which means nothing to me.

The eof on the qemu monitor is interesting, but I'm not sure quite what to make of it.  I might have guessed that was due to qemu hitting some error and exiting, but the other information here suggests that's not happening.

Can we have a look at the last libvirt logs (especially the log libvirt forwards from qemu) leading up to that monitor EOF?

Comment 42 Mick Tarsel 2018-02-16 00:20:30 UTC
After I was able to build the rpm on my machine, I have _finally_ found where the issue stems from, using some debug print messages.

With mergeable RX buffers turned on using DPDK 16.11.3-stable, it fails to get any descriptors from the vring. It will try to get all available ring items yet still cannot get a buffer to use. 

The code path taken with mergeable buffers turned on goes through rte_vhost_enqueue_burst(), a DPDK function, which is called from __netdev_dpdk_vhost_send(), an OVS function, and which branches on:

       if (dev->features & (1 << VIRTIO_NET_F_MRG_RXBUF)){....

As stated before, the problem is in DPDK. The pings do not complete because of virtio_dev_merge_rx()'s call to reserve_avail_buf_mergeable(). Inside this function there is:

                /*
                 * if we tried all available ring items, and still
                 * can't get enough buf, it means something abnormal
                 * happened.
                 */
                if (unlikely(tries >= vq->size)){
                        return -1;
                }

This failure trickles back to rte_vhost_enqueue_burst(), which then does not enqueue any packets, so no packets are received by the guest.

dpdkvhostuser ports are deprecated in later OVS releases (not in v2.7); however, this could impact OpenStack if mrg_rxbuf=on (the default setting) in guests using OVS version 2.7 as a switch.

What are the next steps?

Comment 43 Jens Freimann 2018-02-19 14:29:20 UTC
Thanks for debugging, Mick! It would be helpful to get the DPDK logs. They can be enabled with 'ovs-appctl vlog/set dpdk:file:dbg'.

Comment 44 Mick Tarsel 2018-02-19 19:49:30 UTC
Created attachment 1398007 [details]
OVS Logs and Debugging

Attached are ovs logs from /var/log/openvswitch and some helpful debug print messages. Here are my commands on host:

# systemctl start openvswitch
# ovs-appctl vlog/set dpdk:file:dbg
# ovs-vsctl show
7975c6e8-b0dd-44b6-9045-3fa20811b988
    ovs_version: "2.7.3"
# ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev
# ovs-vsctl add-port br0 vhost1 -- set Interface vhost1 type=dpdkvhostuser
# ip add add 192.168.1.1/24 dev br0
# ip link set dev br0 up

# virsh dumpxml dpdk1 | grep mrg
        <host mrg_rxbuf='on'/>


In the guest
==============
dpdk1 login: root
Password: 
Last login: Mon Feb 19 11:05:35 on hvc0
[root@dpdk1 ~]# ip add
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 52:54:00:1d:23:09 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.2/24 brd 192.168.1.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe1d:2309/64 scope link 
       valid_lft forever preferred_lft forever
[root@dpdk1 ~]# ping -c 1 192.168.1.1
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
From 192.168.1.2 icmp_seq=1 Destination Host Unreachable

--- 192.168.1.1 ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms

[root@dpdk1 ~]# 

At this point I copied the logs from ovs.

Comment 46 IBM Bug Proxy 2018-08-29 17:03:41 UTC
------- Comment From seg.com 2018-08-29 12:50 EDT-------
Where are we with this bug? I am not at all clear who is expected to take what next step.

Comment 47 Jens Freimann 2018-08-30 09:00:51 UTC
(In reply to IBM Bug Proxy from comment #46)
> ------- Comment From seg.com 2018-08-29 12:50 EDT-------
> Where are we with this bug? I am not at all clear who is expected to take
> what next step.

I will get back to work on this very soon and make an update with the next steps.

Comment 48 IBM Bug Proxy 2018-10-11 07:30:23 UTC
------- Comment From urjawere.com 2018-10-11 03:21 EDT-------
Any Updates on this bug ?

Comment 51 IBM Bug Proxy 2019-05-10 17:40:32 UTC
------- Comment From drc 2019-05-10 13:38 EDT-------
According to libvirt documentation (https://libvirt.org/formatdomain.html#elementsDriverBackendOptions):

"Offloading options for the host and guest can be configured using the following sub-elements:

host
The csum, gso, tso4, tso6, ecn and ufo attributes with possible values on and off can be used to turn off host offloading options. By default, the supported offloads are enabled by QEMU. Since 1.2.9 (QEMU only) The mrg_rxbuf attribute can be used to control mergeable rx buffers on the host side. Possible values are on (default) and off. Since 1.2.13 (QEMU only)"

I have successfully used DPDK (v18.11.1) on RHEL 7.6 with testpmd/vhost as the backend for virtio in the guest (also RHEL 7.6).  In fact, setting "mrg_rxbuf=off" is not currently supported in DPDK v18.11.1 on ppc_64 architecture and produces the following error message:

testpmd> start
PANIC in virtio_recv_pkts_vec():
Wrong weak function linked by linker

Comment 54 Yihuang Yu 2019-08-28 04:44:34 UTC
Created attachment 1608818 [details]
ovs vswitchd log file

Update status:

host environment:
# rpm -qa | grep -P 'openvswitch|qemu-kvm-\d'
qemu-kvm-4.1.0-5.module+el8.1.0+4076+b5e41ebc.ppc64le
openvswitch-2.9.0-3.el8+7.ppc64le

2 issues:

1) 'add/del-br/port' executed via 'ovs-vsctl' will hang (after 'ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true')
# ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true

# ovs-vsctl add-br ovs-br0 -- set Bridge ovs-br0 datapath_type=netdev
^C2019-08-28T02:57:14Z|00002|fatal_signal|WARN|terminating with signal 2 (Interrupt)

# ovs-vsctl add-port ovs-br0 vhost-user1 -- set Interface vhost-user1 type=dpdkvhostuserclient options:vhost-server-path=/tmp/vhost-user1
^C2019-08-28T02:58:33Z|00002|fatal_signal|WARN|terminating with signal 2 (Interrupt)

# ovs-vsctl show
3040dbd4-1092-4194-b8b0-251dd0fef1bc
    Bridge "ovs-br0"
        Port "ovs-br0"
            Interface "ovs-br0"
                type: internal
        Port "vhost-user1"
            Interface "vhost-user1"
                type: dpdkvhostuserclient
                options: {vhost-server-path="/tmp/vhost-user1"}
    ovs_version: "2.9.0"

# /usr/libexec/qemu-kvm  -name avocado-vt-vm1  -machine pseries  -m 2048  -mem-path /mnt/hugetlbfs  -nodefaults  -vga std  -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1,server,nowait  -mon chardev=qmp_id_qmpmonitor1,mode=control  -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor,server,nowait  -mon chardev=qmp_id_catch_monitor,mode=control  -chardev socket,id=serial_id_serial0,path=/var/tmp/serial-serial,server,nowait  -device spapr-vty,reg=0x30000000,chardev=serial_id_serial0  -device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x3  -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4  -drive id=drive_image1,if=none,snapshot=on,aio=threads,cache=none,format=qcow2,file=/home/rhel810-ppc64le-virtio-scsi.qcow2  -device scsi-hd,id=image1,drive=drive_image1  -smp 4  -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  -device usb-kbd  -device usb-mouse  -vnc :0  -rtc base=utc,clock=host  -monitor stdio  -enable-kvm  -global spapr-pci-host-bridge.pgsz=0x1011000  -chardev socket,id=char1,path=/tmp/vhost-user1,server  -netdev vhost-user,id=mynet0,chardev=char1,vhostforce=on  -device virtio-net-pci,mac=9a:54:55:56:57:58,id=idMCKaId,netdev=mynet0,bus=pci.0,addr=0x5
qemu-kvm: -chardev socket,id=char1,path=/tmp/vhost-user1,server: info: QEMU waiting for connection on: disconnected:unix:/tmp/vhost-user1,server
QEMU 4.1.0 monitor - type 'help' for more information
(qemu) qemu-kvm: Failed to read msg header. Read -1 instead of 12. Original request 11.
qemu-kvm: vhost VQ 0 ring restore failed: -1: Input/output error (5)
qemu-kvm: Failed to read msg header. Read 0 instead of 12. Original request 11.
qemu-kvm: vhost VQ 1 ring restore failed: -1: Invalid argument (22)
qemu-kvm: Failed to read from slave.
qemu-kvm: Failed to set msg fds.
qemu-kvm: Failed to set msg fds.
qemu-kvm: Failed to set msg fds.


2) Still hit 'dpdk|ERR|EAL: Cannot get a virtual area: Cannot allocate memory' while running the provided script
2019-08-28T03:31:16Z|02059|dpdk|ERR|EAL: Cannot get a virtual area: Cannot allocate memory
2019-08-28T03:31:16Z|02060|dpdk|ERR|EAL: Cannot get a virtual area: Cannot allocate memory
2019-08-28T03:31:16Z|02061|dpdk|ERR|EAL: Cannot get a virtual area: Cannot allocate memory
2019-08-28T03:31:16Z|02062|dpdk|INFO|EAL: PCI device 0002:01:00.0 on NUMA socket 0
2019-08-28T03:31:16Z|02063|dpdk|INFO|EAL:   probe driver: 8086:1589 net_i40e
2019-08-28T03:31:16Z|02064|dpdk|INFO|EAL: PCI device 0002:01:00.1 on NUMA socket 0
2019-08-28T03:31:16Z|02065|dpdk|INFO|EAL:   probe driver: 8086:1589 net_i40e
2019-08-28T03:31:16Z|02066|dpdk|INFO|EAL: PCI device 0002:01:00.2 on NUMA socket 0
2019-08-28T03:31:16Z|02067|dpdk|INFO|EAL:   probe driver: 8086:1589 net_i40e
2019-08-28T03:31:16Z|02068|dpdk|INFO|EAL: PCI device 0002:01:00.3 on NUMA socket 0
2019-08-28T03:31:16Z|02069|dpdk|INFO|EAL:   probe driver: 8086:1589 net_i40e
2019-08-28T03:31:16Z|02070|dpdk|INFO|DPDK Enabled - initialized
2019-08-28T03:31:16Z|02071|timeval|WARN|Unreasonably long 6951ms poll interval (259ms user, 6692ms system)
2019-08-28T03:31:16Z|02072|timeval|WARN|faults: 300 minor, 0 major
2019-08-28T03:31:16Z|02073|timeval|WARN|disk: 0 reads, 256 writes
2019-08-28T03:31:16Z|02074|timeval|WARN|context switches: 10 voluntary, 10 involuntary
2019-08-28T03:31:16Z|02075|coverage|INFO|Event coverage, avg rate over last: 5 seconds, last minute, last hour,  hash=1d9a65c8:
2019-08-28T03:31:16Z|02076|coverage|INFO|bridge_reconfigure         0.2/sec     0.017/sec        0.0003/sec   total: 1
2019-08-28T03:31:16Z|02077|coverage|INFO|cmap_expand                1.8/sec     0.150/sec        0.0025/sec   total: 9
2019-08-28T03:31:16Z|02078|coverage|INFO|miniflow_malloc            2.4/sec     0.200/sec        0.0033/sec   total: 12
2019-08-28T03:31:16Z|02079|coverage|INFO|hmap_expand               76.6/sec     6.383/sec        0.1064/sec   total: 383
2019-08-28T03:31:16Z|02080|coverage|INFO|txn_unchanged              0.4/sec     0.033/sec        0.0006/sec   total: 2
2019-08-28T03:31:16Z|02081|coverage|INFO|txn_incomplete             0.2/sec     0.017/sec        0.0003/sec   total: 1
2019-08-28T03:31:16Z|02082|coverage|INFO|poll_create_node           8.0/sec     0.667/sec        0.0111/sec   total: 40
2019-08-28T03:31:16Z|02083|coverage|INFO|seq_change                 9.0/sec     0.750/sec        0.0125/sec   total: 45
2019-08-28T03:31:16Z|02084|coverage|INFO|pstream_open               0.2/sec     0.017/sec        0.0003/sec   total: 1
2019-08-28T03:31:16Z|02085|coverage|INFO|stream_open                0.2/sec     0.017/sec        0.0003/sec   total: 1
2019-08-28T03:31:16Z|02086|coverage|INFO|util_xalloc              2283.8/sec   190.317/sec        3.1719/sec   total: 11419
2019-08-28T03:31:16Z|02087|coverage|INFO|netdev_get_hwaddr          0.2/sec     0.017/sec        0.0003/sec   total: 1
2019-08-28T03:31:16Z|02088|coverage|INFO|netlink_received           0.6/sec     0.050/sec        0.0008/sec   total: 3
2019-08-28T03:31:16Z|02089|coverage|INFO|netlink_sent               0.2/sec     0.017/sec        0.0003/sec   total: 1
2019-08-28T03:31:16Z|02090|coverage|INFO|90 events never hit
2019-08-28T03:31:16Z|02091|poll_loop|INFO|wakeup due to [POLLIN] on fd 11 (<->/var/run/openvswitch/db.sock) at lib/stream-fd.c:157 (100% CPU usage)
creating bridge and ports

Due to the above two issues, I cannot try to reproduce this bug on RHEL 8.

Comment 62 IBM Bug Proxy 2019-09-06 22:30:23 UTC
------- Comment From wilder.com 2019-09-06 18:28 EDT-------
(In reply to comment #31)
> Created attachment 137332 [details]
> ovs vswitchd log file
> ------- Comment on attachment From yihyu 2019-08-28 04:44:34
> EDT-------
> Update status:
> host environment:
> # rpm -qa | grep -P 'openvswitch|qemu-kvm-\d'
> qemu-kvm-4.1.0-5.module+el8.1.0+4076+b5e41ebc.ppc64le
> openvswitch-2.9.0-3.el8+7.ppc64le

openvswitch-2.9.0-3.el8+7.ppc64le is using DPDK v17.11. The "Cannot get a virtual area: Cannot allocate memory" error and the hang when setting dpdk-init=true on ppc64le are known issues with this version of DPDK.

Please try openvswitch2.11-2.11.0-18.el8fdp; this version of OVS uses DPDK v18.11.2 and should not have these issues.

Also, please ensure you are testing on POWER9; DPDK is not supported on POWER8 or earlier.

Thank you for testing.

Comment 63 Yihuang Yu 2019-09-09 03:24:51 UTC
(In reply to IBM Bug Proxy from comment #62)
> ------- Comment From wilder.com 2019-09-06 18:28 EDT-------
> (In reply to comment #31)
> > Created attachment 137332 [details]
> > ovs vswitchd log file
> > ------- Comment on attachment From yihyu 2019-08-28 04:44:34
> > EDT-------
> > Update status:
> > host environment:
> > # rpm -qa | grep -P 'openvswitch|qemu-kvm-\d'
> > qemu-kvm-4.1.0-5.module+el8.1.0+4076+b5e41ebc.ppc64le
> > openvswitch-2.9.0-3.el8+7.ppc64le
> 
> openvswitch-2.9.0-3.el8+7.ppc64le is using dpdk v17.11, The "Cannot get a
> virtual area: Cannot allocate memory" error and the hang when setting
> dpdk-init=true on ppc64le are known issues with this version of dpdk..
> 
> Please try openvswitch2.11-2.11.0-18.el8fdp,  this version of ovs is using
> dpdk v18.11.2 and should not have these issue.
> 
> Also please insure you are testing on Power9,  dpdk is not supported on
> Power8 or earler.
> 
> Thank you for testing.

Hi IBM,
I used the latest openvswitch, but the problem still occurs, though not quite the same as before.
# rpm -qa | grep -E 'openvswitch|dpdk'
dpdk-18.11.2-2.el8.ppc64le
openvswitch2.11-2.11.0-21.el8fdp.ppc64le
dpdk-tools-18.11.2-2.el8.ppc64le
openvswitch-selinux-extra-policy-1.0-18.el8fdp.noarch
openvswitch2.11-devel-2.11.0-21.el8fdp.ppc64le
dpdk-devel-18.11.2-2.el8.ppc64le

# lscpu
Architecture:        ppc64le
Byte Order:          Little Endian
CPU(s):              128
On-line CPU(s) list: 0-127
Thread(s) per core:  4
Core(s) per socket:  16
Socket(s):           2
NUMA node(s):        2
Model:               2.2 (pvr 004e 1202)
Model name:          POWER9, altivec supported

# export DB_SOCK=/var/run/openvswitch/db.sock
# ovsdb-tool create /etc/openvswitch/conf.db /usr/share/openvswitch/vswitch.ovsschema
# ovsdb-server --remote=punix:$DB_SOCK --remote=db:Open_vSwitch,Open_vSwitch,manager_options --pidfile --detach --log-file
# ovs-vsctl --no-wait init
2019-09-09T03:14:31Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovsdb-server.log
# ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
# ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="1024,1024"
# ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask="0x1"
# ovs-vswitchd unix:$DB_SOCK --pidfile --detach --log-file=/var/log/openvswitch/ovs-vswitchd.log
2019-09-09T03:16:36Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovs-vswitchd.log
2019-09-09T03:16:36Z|00002|ovs_numa|INFO|Discovered 72 CPU cores on NUMA node 0
2019-09-09T03:16:36Z|00003|ovs_numa|INFO|Discovered 1 NUMA nodes and 72 CPU cores
2019-09-09T03:16:36Z|00004|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2019-09-09T03:16:36Z|00005|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected
2019-09-09T03:16:36Z|00006|dpdk|INFO|Using DPDK 18.11.2
2019-09-09T03:16:36Z|00007|dpdk|INFO|DPDK Enabled - initializing...
2019-09-09T03:16:36Z|00008|dpdk|INFO|No vhost-sock-dir provided - defaulting to /var/run/openvswitch
2019-09-09T03:16:36Z|00009|dpdk|INFO|IOMMU support for vhost-user-client disabled.
2019-09-09T03:16:36Z|00010|dpdk|INFO|Per port memory for DPDK devices disabled.
2019-09-09T03:16:36Z|00011|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x1 --socket-mem 1024,1024 --socket-limit 1024,1024.
2019-09-09T03:16:36Z|00012|dpdk|INFO|EAL: Detected 144 lcore(s)
2019-09-09T03:16:36Z|00013|dpdk|INFO|EAL: Detected 2 NUMA nodes
2019-09-09T03:16:36Z|00014|dpdk|INFO|EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
2019-09-09T03:16:36Z|00015|dpdk|INFO|EAL: Selected IOVA mode 'PA'
2019-09-09T03:16:36Z|00016|dpdk|WARN|EAL: No free hugepages reported in hugepages-1048576kB
2019-09-09T03:16:36Z|00017|dpdk|INFO|EAL: Probing VFIO support...
2019-09-09T03:16:36Z|00018|dpdk|WARN|EAL: WARNING! Base virtual address hint (0x100aa0000 != 0x7ffbb3e00000) not respected!
2019-09-09T03:16:36Z|00019|dpdk|WARN|EAL:    This may cause issues with mapping memory into secondary processes
2019-09-09T03:16:36Z|00020|dpdk|WARN|EAL: WARNING! Base virtual address hint (0x101710000 != 0x7ff7b3c00000) not respected!
2019-09-09T03:16:36Z|00021|dpdk|WARN|EAL:    This may cause issues with mapping memory into secondary processes
2019-09-09T03:16:36Z|00022|dpdk|WARN|EAL: WARNING! Base virtual address hint (0x102380000 != 0x7ff3b3a00000) not respected!
2019-09-09T03:16:36Z|00023|dpdk|WARN|EAL:    This may cause issues with mapping memory into secondary processes
2019-09-09T03:16:36Z|00024|dpdk|WARN|EAL: WARNING! Base virtual address hint (0x102ff0000 != 0x7fefb3800000) not respected!
2019-09-09T03:16:36Z|00025|dpdk|WARN|EAL:    This may cause issues with mapping memory into secondary processes
2019-09-09T03:16:36Z|00026|dpdk|WARN|EAL: WARNING! Base virtual address hint (0x103c60000 != 0x7febb3600000) not respected!
2019-09-09T03:16:36Z|00027|dpdk|WARN|EAL:    This may cause issues with mapping memory into secondary processes
2019-09-09T03:16:36Z|00028|dpdk|WARN|EAL: WARNING! Base virtual address hint (0x1048d0000 != 0x7fe7b3400000) not respected!
2019-09-09T03:16:36Z|00029|dpdk|WARN|EAL:    This may cause issues with mapping memory into secondary processes
2019-09-09T03:16:36Z|00030|dpdk|WARN|EAL: WARNING! Base virtual address hint (0x105540000 != 0x7fe3b3200000) not respected!
2019-09-09T03:16:36Z|00031|dpdk|WARN|EAL:    This may cause issues with mapping memory into secondary processes
2019-09-09T03:16:36Z|00032|dpdk|WARN|EAL: WARNING! Base virtual address hint (0x1061b0000 != 0x7fdfb3000000) not respected!
2019-09-09T03:16:36Z|00033|dpdk|WARN|EAL:    This may cause issues with mapping memory into secondary processes
2019-09-09T03:16:36Z|00034|dpdk|ERR|EAL: Not enough memory available on socket 1! Requested: 1024MB, available: 0MB
EAL: FATAL: Cannot init memory
2019-09-09T03:16:36Z|00035|dpdk|ERR|EAL: Cannot init memory
2019-09-09T03:16:36Z|00036|dpdk|EMER|Unable to initialize DPDK: Cannot allocate memory
ovs-vswitchd: Cannot init EAL (Cannot allocate memory)
2019-09-09T03:16:36Z|00002|daemon_unix|ERR|fork child died before signaling startup (killed (Aborted), core dumped)
ovs-vswitchd: could not detach from foreground session

I don't know much about using openvswitch+dpdk on ppc64le, so I am not sure whether my configuration is correct.

Thanks for your notification,
Yihuang

Comment 67 IBM Bug Proxy 2019-09-19 17:30:23 UTC
------- Comment From wilder.com 2019-09-19 13:21 EDT-------
(In reply to comment #33)

> I don't know much about using openvswitch+dpdk on ppc64le, so I am not
> sure whether my configuration is correct.
> Thanks for your notification,
> Yihuang

Thank you for testing this. I will attempt to set this up and post the configuration if/when I get it working.

From the error log you provided I suspect an issue with hugepage configuration.
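
In particular, "Not enough memory available on socket 1! Requested: 1024MB, available: 0MB" suggests no hugepages are reserved on NUMA node 1. A minimal sketch of reserving pages on both nodes at runtime, assuming 1 GB hugepages (substitute the hugepages-* directory and the counts for whatever the system actually uses):

# echo 8 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
# echo 8 > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages

dpdk-socket-mem="1024,1024" asks EAL for 1 GB on each node, so each node needs at least 1 GB of free hugepage memory.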

Comment 69 IBM Bug Proxy 2019-09-23 21:30:25 UTC
------- Comment From wilder.com 2019-09-23 17:24 EDT-------
Sorry for the delay.

I am seeing the same issue with openvswitch2.11-2.11.0-18.el8fdp when initializing DPDK:

2019-09-20T21:01:04.052Z|00017|dpdk|INFO|EAL: rte_mem_virt2phy(): cannot open /proc/self/pagemap: Permission denied
2019-09-20T21:01:04.052Z|00018|dpdk|ERR|EAL: Cannot obtain physical addresses: Permission denied. Only vfio will function.
2019-09-20T21:01:04.116Z|00019|dpdk|ERR|EAL: Cannot init memory
2019-09-20T21:01:04.116Z|00020|dpdk|EMER|Unable to initialize DPDK: Cannot allocate memory

It is occurring on both x86_64 and ppc64.

I am looking into it.

Comment 70 Jens Freimann 2019-09-24 09:40:35 UTC
Are you running one of the components as non-root? Can you please check what user qemu is running as? 
And what is the "user =" setting in /etc/libvirt/qemu.conf?
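
A quick, illustrative way to check both (not from the original comment):

# grep -E '^ *(user|group) *=' /etc/libvirt/qemu.conf
# ps -o user=,cmd= -C ovs-vswitchd,qemu-kvm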

I found this BZ and upstream fix which seem to be related:
https://bugzilla.redhat.com/show_bug.cgi?id=1478791#c10
https://mail.openvswitch.org/pipermail/ovs-dev/2017-June/333423.html

Comment 71 David J. Wilder 2019-09-24 21:14:38 UTC
(In reply to Jens Freimann from comment #70)
> Are you running one of the components as non-root? Can you please check what
> user qemu is running as? 
> And what is the "user =" setting in /etc/libvirt/qemu.conf?
> 
> I found this BZ and upstream fix which seem to be related:
> https://bugzilla.redhat.com/show_bug.cgi?id=1478791#c10
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-June/333423.html

Thanks for the pointer, Jens. That BZ deals with permissions on the vhostuser socket, but the issue I am seeing occurs before I get to that point, during DPDK initialization.

If I run ovs as a non-root user (the default configuration of this rpm) then I see the error:
EAL: rte_mem_virt2phy(): cannot open /proc/self/pagemap: Permission denied

I am seeing this error on both ppc64le and x86_64. I see no issues if I run OVS as the root user.

I found this in the dpdk documentation: 
http://doc.dpdk.org/guides/linux_gsg/enable_func.html#running-dpdk-applications-without-root-privileges

"The instructions below will allow running DPDK as non-root with older Linux kernel versions. However, since version 4.0, the kernel does not allow unprivileged processes to read the physical address information from the pagemaps file, making it impossible for those processes to use HW devices which require physical addresses."

I first suspected the MLX5 driver was causing the issue, as both my x86 and Power systems have CX5 cards, but I found that MLX5 is not enabled in the ppc build (it is enabled in x86), so I am a little confused.

Continuing to investigate.

Comment 72 Jens Freimann 2019-09-25 09:54:58 UTC
(In reply to David J. Wilder from comment #71)
> (In reply to Jens Freimann from comment #70)
> > Are you running one of the components as non-root? Can you please check what
> > user qemu is running as? 
> > And what is the "user =" setting in /etc/libvirt/qemu.conf?
> > 
> > I found this BZ and upstream fix which seem to be related:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1478791#c10
> > https://mail.openvswitch.org/pipermail/ovs-dev/2017-June/333423.html
> 
> Thanks for the pointer Jens.  This is dealing with permissions on the
> vhostuser socket, but the issue I am seeing occurs before I get to that
> point, during the init of dpdk.
> 
> If I run ovs as a non-root user (the default configuration of this rpm) then
> I see the error:
> EAL: rte_mem_virt2phy(): cannot open /proc/self/pagemap: Permission denied
> 
> I am seeing this error on both ppc64le and x86_64  I see no issues if I run
> ovs as the root user. 

So the goal is to run dpdk as non-root?

> I found this in the dpdk documentation: 
> http://doc.dpdk.org/guides/linux_gsg/enable_func.html#running-dpdk-
> applications-without-root-privileges
> 
> "The instructions below will allow running DPDK as non-root with older Linux
> kernel versions. However, since version 4.0, the kernel does not allow
> unprivileged processes to read the physical address information from the
> pagemaps file, making it impossible for those processes to use HW devices
> which require physical addresses."
> I first suspected the MLX5 driver is causing the issue as both my x86 and
> power systems have CX5 cards,  but I did not find MLX5 was enabled in the
> ppc build (it is enabled in x86), so I am a little confused.

I talked to my colleague Maxime. I think the problem is that the sPAPR IOMMU
does not support IOVA-as-VA mode, only IOVA-as-PA.
That is why we see: 2019-09-09T03:16:36Z|00015|dpdk|INFO|EAL: Selected IOVA mode 'PA'
and also: 
  WARNING! Base virtual address hint (0x100aa0000 != 0x7ffbb3e00000) not respected!
  dpdk|WARN|EAL:    This may cause issues with mapping memory into secondary processes

There's a patch in dpdk that ensures this mode is selected on power:

commit b48e0e2d9cb471941703eb26dc0dbd4fb9840d40
Author: Jonas Pfefferle <jpf.com>
Date:   Fri Nov 3 13:05:19 2017 +0100

    bus/pci: fix IOMMU class for sPAPR
    
    PPC64 sPAPR iommu does not support iova as va.
    Use pa mode instead.
    
    Fixes: 815c7deaed2d ("pci: get IOMMU class on Linux")
    
    Signed-off-by: Jonas Pfefferle <jpf.com>


So with IOVA-as-PA mode, the addresses for all memory areas in DPDK are physical addresses.
This always needs access to the system's page map (/proc/self/pagemap), which requires root access.

That means, for the moment, there is no non-root DPDK on ppc.
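
(The selected mode can be confirmed quickly from the vswitchd log, e.g. "grep 'IOVA mode' /var/log/openvswitch/ovs-vswitchd.log", which on this system shows EAL selecting 'PA'. On x86_64 with devices bound to vfio-pci, EAL can instead select IOVA mode 'VA', which does not need /proc/self/pagemap.)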

Comment 73 Maxime Coquelin 2019-09-25 10:20:02 UTC
Hi,

I think running as non-root is the way to go, but it would require new PPC64-specific development in DPDK.
More specifically, we would need to implement a new IOVA mode in DPDK that makes use of an IOVA allocator.

However, this is not related to the initial problem reported in this BZ.
I think a new BZ should be created for supporting OVS-DPDK running as non-root on PPC64.

Thanks,
Maxime

Comment 74 David J. Wilder 2019-09-25 19:33:40 UTC
Hi-
Thank you for the response. I agree that running non-root DPDK is a separate issue from this bug and should be addressed in a new bugzilla; I will open a new bug for running DPDK as a non-root user. To be clear, it is OVS that runs as a non-root user, and DPDK is linked into ovs-vswitchd. Using the openvswitch2.11-2.11.0-18.el8.ppc64le rpm, I have been unable to configure OVS to run as the root user when using systemd to manage OVS. I can run OVS as root if I start it by hand (sudo /usr/share/openvswitch/scripts/ovs-ctl start).

Regarding the issue in this bug: "Guest can't send out packets with dpdkvhostuser backend with rx_mrgbuff=on"

Using a RHEL8 host and a RHEL8 VM, I can now run the simple test described in this bug (pinging over the vhostuser interface) with Rx mergeable buffers set both on and off:

   <interface type='vhostuser'>
      <mac address='52:54:00:2a:9f:eb'/>
      <source type='unix' path='/tmp/vhost-vm-1' mode='server'/>
      <model type='virtio'/>
      <driver name='vhost' queues='2' rx_queue_size='1024'>
        <host mrg_rxbuf='on'/>
      </driver>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </interface>

Details of my testing:
openvswitchd is run as root:root
qemu is run as root:root
openvswitch2.11-2.11.0-18.el8.ppc64le
kernel: both host and vm: 4.18.0-80.el8.ppc64le

I suspect the originally reported problem with mrg_rxbuf='on' was due to a lack of support for mergeable RX buffers in the virtio driver of the RHEL7.5 ppc64le VM. This appears to be resolved in the RHEL8 kernel.
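
A way to double-check, from inside the guest, whether the feature was actually negotiated is to read the virtio feature bits from sysfs; bit 15 is VIRTIO_NET_F_MRG_RXBUF. The device name below (virtio0) is an assumption and depends on the guest:

# cut -c16 /sys/bus/virtio/devices/virtio0/features

A '1' means mergeable RX buffers were negotiated; '0' means they were not.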

Unfortunately, due to the issue with running DPDK as a non-root user, it is currently not possible to test DPDK with OVS using the current (non-root by default) packaging of OVS.

The original issue in this bug has been addressed; therefore I suggest we close this bug and pick up the "run dpdk as a non-root user" issue in a new bug (which I will open shortly).

Please advise if you disagree about closing this bug.

Regards
  Dave

Comment 75 Jens Freimann 2019-09-26 07:21:18 UTC
As per Dave's suggestion, we are closing this bug, since the original problem is fixed with a newer version.
Dave will open a new BZ for the problem of running dpdk as non-root on ppc64.

Thanks Dave!

regards
Jens

Comment 76 IBM Bug Proxy 2019-10-02 18:00:29 UTC
------- Comment From wilder.com 2019-10-02 13:55 EDT-------
This is a follow-up to document how I configured OVS to validate the dpdkvhostuser interface on ppc64.
This is not a test of running dpdk in the VM.

Host config:
4.18.0-80.el8.ppc64le

# rpm -qa | grep openvswitch
openvswitch2.11-2.11.0-21.el8.ppc64le

Host hugepage configuration
root=UUID=8f07c025-880c-4fb3-9714-f9c4ae1025b9 ro crashkernel=auto ipv6.disable=1 default_hugepagesz=1G hugepagesz=1G hugepages=128

# fgrep Huge /proc/meminfo
ShmemHugePages:        0 kB
HugePages_Total:     128
HugePages_Free:      127
Hugepagesize:    1048576 kB
Hugetlb:        134217728 kB
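
Since the earlier EAL failure was per-NUMA-node, it may also be worth confirming how those pages are split across nodes (illustrative command):

# cat /sys/devices/system/node/node*/hugepages/hugepages-1048576kB/nr_hugepages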

VM Interface configuration:
#    <interface type='vhostuser'>
#      <mac address='52:54:00:2a:9f:eb'/>
#      <source type='unix' path='/tmp/vhost-vm-1' mode='server'/>
#      <model type='virtio'/>

Run OVS as root:
Edit  /etc/sysconfig/openvswitch
+  OVS_USER_ID="root:root"

Start ovs and verify ovs-vswitchd is running as the root user.
$ sudo systemctl start openvswitch

configure ovs:
ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev
ovs-vsctl add-port br0 vhost-vm1 -- set Interface vhost-vm1 type=dpdkvhostuser
VHOST_USER_SOCKET_PATH=/tmp/vhost-vm-1
ovs-vsctl add-port br0 vhost-vm-1 \
-- set Interface vhost-vm-1 type=dpdkvhostuserclient \
options:vhost-server-path=$VHOST_USER_SOCKET_PATH
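
Note the pairing here: with type=dpdkvhostuserclient, OVS is the vhost-user client and QEMU creates the socket, which is why the libvirt <source> element above uses mode='server'. The configured socket path can be read back with, for example:

# ovs-vsctl get Interface vhost-vm-1 options:vhost-server-path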

$ ovs-vsctl show
16f19714-dd2e-45c6-8860-4eb754677539
    Port "vhost-vm-1"
        Interface "vhost-vm-1"
            type: dpdkvhostuserclient
            options: {vhost-server-path="/tmp/vhost-vm-1"}
    ovs_version: "2.11.0"

Bring up the br0 interface:
ip addr add 192.168.2.1/24 dev br0; ip link set dev br0 up

Boot the VM, configure VM's eth0 interface and ping the bridge.

(run this in the VM)
$ ip addr add 192.168.2.2/24 dev eth0; ip link set dev eth0 up

$ ping 192.168.2.1
# PING 192.168.2.1 (192.168.2.1) 56(84) bytes of data.
# 64 bytes from 192.168.2.1: icmp_seq=1 ttl=64 time=2.32 ms
<...>
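
To double-check that the traffic really traverses the vhostuser port, the port counters on the host can be watched while the ping runs (illustrative command):

# ovs-ofctl dump-ports br0 vhost-vm-1

Both the rx and tx packet counts for the port should increase.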

