Bug 1526948

Summary: Do PVP testing with vIOMMU and Q35 pcie multifunction, guest dpdk's testpmd boot up with errors
Product: Red Hat Enterprise Linux 7
Component: qemu-kvm-rhev
Version: 7.5
Hardware: Unspecified
OS: Unspecified
Status: CLOSED DUPLICATE
Severity: high
Priority: high
Target Milestone: rc
Reporter: Pei Zhang <pezhang>
Assignee: Maxime Coquelin <maxime.coquelin>
QA Contact: Pei Zhang <pezhang>
CC: chayang, jasowang, jinzhao, juzhang, knoel, marcel, maxime.coquelin, michen, virt-maint, yfu
Type: Bug
Last Closed: 2018-02-06 09:02:43 UTC

Description Pei Zhang 2017-12-18 09:17:12 UTC
Description of problem:
This is PVP testing with vIOMMU:
- Host dpdk's testpmd uses "iommu-support=1"
- Boot qemu with vIOMMU and add the iommu option to the vhost-user ports
- In guest, load vfio (not noiommu-vfio)

When qemu is booted with Q35 pcie multifunction, the guest's dpdk testpmd boots with errors:
"EAL:   0000:03:00.0 VFIO group is not viable!
EAL: Can't write to PCI bar (0) : offset (12)"
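
Note: one way to confirm the guest's vfio module really runs in full IOMMU mode (assuming the kernel exposes the noiommu parameter) is to check that unsafe noiommu mode is off:

# cat /sys/module/vfio/parameters/enable_unsafe_noiommu_mode
N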


Version-Release number of selected component (if applicable):
kernel-3.10.0-820.el7.x86_64
qemu-kvm-rhev-2.10.0-12.el7.x86_64
dpdk-17.11-4.el7.x86_64


How reproducible:
100%


Steps to Reproduce:
1. In host, boot testpmd with 2 vhost-user ports and "iommu-support=1". For the full command, see [1].

--vdev net_vhost0,iface=/tmp/vhost-user1,client=0,iommu-support=1 \
--vdev net_vhost1,iface=/tmp/vhost-user2,client=0,iommu-support=1 \
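
Note: after testpmd starts, a quick sanity check is that both vhost-user sockets exist on the host:

# ls -l /tmp/vhost-user1 /tmp/vhost-user2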

2. In host, boot qemu with 2 vhost-user ports and pcie "multifunction=on", so the four pcie-root-ports occupy functions 0x2.0-0x2.3 of one slot. For the full command, see [2].

-device pcie-root-port,id=root.1,chassis=1,multifunction=on,addr=0x2.0 \
-device pcie-root-port,id=root.2,chassis=2,addr=0x2.1 \
-device pcie-root-port,id=root.3,chassis=3,addr=0x2.2 \
-device pcie-root-port,id=root.4,chassis=4,addr=0x2.3 \
-chardev socket,id=charnet1,path=/tmp/vhost-user1 \
-netdev vhost-user,chardev=charnet1,id=hostnet1 \
-device virtio-net-pci,netdev=hostnet1,id=net1,mac=18:66:da:5f:dd:02,iommu_platform=on,ats=on,bus=root.3 \
-chardev socket,id=charnet2,path=/tmp/vhost-user2 \
-netdev vhost-user,chardev=charnet2,id=hostnet2 \
-device virtio-net-pci,netdev=hostnet2,id=net2,mac=18:66:da:5f:dd:03,iommu_platform=on,ats=on,bus=root.4 \

3. Check pci topology in guest.
# lspci -vvv -t
-[0000:00]-+-00.0  Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
           +-01.0  Device 1234:1111
           +-02.0-[01]----00.0  Red Hat, Inc. Virtio block device
           +-02.1-[02]----00.0  Red Hat, Inc. Virtio network device
           +-02.2-[03]----00.0  Red Hat, Inc. Virtio network device
           +-02.3-[04]----00.0  Red Hat, Inc. Virtio network device
           +-1f.0  Intel Corporation 82801IB (ICH9) LPC Interface Controller
           +-1f.2  Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode]
           \-1f.3  Intel Corporation 82801I (ICH9 Family) SMBus Controller


4. In guest, load the vfio modules, reserve hugepages, and bind the 2 vhost-user ports to vfio-pci.
# echo 3 >  /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
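
(Optional: reading the same sysfs file back confirms the pages were actually reserved.)

# cat /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
3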

# modprobe vfio
# modprobe vfio-pci
# dpdk-devbind --status

# dpdk-devbind --bind=vfio-pci 0000:03:00.0
# dpdk-devbind --bind=vfio-pci 0000:04:00.0
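
Note: "VFIO group is not viable" usually means some device in the same IOMMU group is not bound to vfio-pci (or left unbound). The group members can be listed via standard sysfs, e.g. for the first port:

# ls /sys/bus/pci/devices/0000:03:00.0/iommu_group/devices/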

5. In guest, start dpdk's testpmd; it fails with errors like "Can't write to PCI bar (0) : offset (12)" as below.

EAL: Detected 4 lcore(s)
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using unreliable clock cycles !
EAL: PCI device 0000:03:00.0 on NUMA socket -1
EAL:   Invalid NUMA socket, default to 0
EAL:   probe driver: 1af4:1041 net_virtio
EAL:   0000:03:00.0 VFIO group is not viable!
EAL: Can't write to PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (12)
EAL: Can't write to PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (12)
EAL: Can't write to PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (0)
EAL: Can't write to PCI bar (0) : offset (4)
EAL: Can't write to PCI bar (0) : offset (14)
EAL: Can't write to PCI bar (0) : offset (e)
EAL: Can't read from PCI bar (0) : offset (c)
EAL: Requested device 0000:03:00.0 cannot be used
EAL: PCI device 0000:04:00.0 on NUMA socket -1
EAL:   Invalid NUMA socket, default to 0
EAL:   probe driver: 1af4:1041 net_virtio
EAL:   0000:04:00.0 VFIO group is not viable!
EAL: Can't write to PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (12)
EAL: Can't write to PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (12)
EAL: Can't write to PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (0)
EAL: Can't write to PCI bar (0) : offset (4)
EAL: Can't write to PCI bar (0) : offset (14)
EAL: Can't write to PCI bar (0) : offset (e)
EAL: Can't read from PCI bar (0) : offset (c)
EAL: Requested device 0000:04:00.0 cannot be used
EAL: No probed ethernet devices
Interactive-mode selected
USER1: create a new mbuf pool <mbuf_pool_socket_0>: n=163456, size=2176, socket=0
Done

testpmd> show port stats all 
(empty here)


Actual results:
In guest, dpdk's testpmd boots up with errors; the vhost-user ports cannot be accessed by testpmd.


Expected results:
dpdk's testpmd should boot up successfully in guest.


Additional info:
1. Probably this is a Q35 multifunction issue or a vIOMMU issue, because:

(1) multifunction with no vIOMMU works well.

(2) vIOMMU with no multifunction works well, with root ports like below:

-device pcie-root-port,id=root.1,chassis=1 \
-device pcie-root-port,id=root.2,chassis=2 \
-device pcie-root-port,id=root.3,chassis=3 \
-device pcie-root-port,id=root.4,chassis=4 \
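
One way to compare the two configurations is to list the guest's IOMMU groups in each case (standard sysfs layout). This is an assumption, not verified here: with multifunction=on the root ports share one slot and may lack ACS, so the devices behind them could end up in a single group, which would explain the "group is not viable" error.

# find /sys/kernel/iommu_groups/ -type l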


Reference:
[1] 
/usr/bin/testpmd \
-l 1,3,5,7,9 \
--socket-mem 1024,1024 \
-n 4 \
--vdev net_vhost0,iface=/tmp/vhost-user1,client=0,iommu-support=1 \
--vdev net_vhost1,iface=/tmp/vhost-user2,client=0,iommu-support=1 \
-- \
--portmask=f \
--disable-hw-vlan \
-i \
--rxq=1 --txq=1 \
--nb-cores=4 \
--forward-mode=io

[2]
/usr/libexec/qemu-kvm -name rhel7.5_nonrt \
-M q35,kernel-irqchip=split \
-cpu host -m 8G \
-device intel-iommu,intremap=on,caching-mode=on,device-iotlb=on \
-object memory-backend-file,id=mem,size=8G,mem-path=/dev/hugepages,share=on \
-numa node,memdev=mem -mem-prealloc \
-smp 4,sockets=1,cores=4,threads=1 \
-device pcie-root-port,id=root.1,chassis=1,multifunction=on,addr=0x2.0 \
-device pcie-root-port,id=root.2,chassis=2,addr=0x2.1 \
-device pcie-root-port,id=root.3,chassis=3,addr=0x2.2 \
-device pcie-root-port,id=root.4,chassis=4,addr=0x2.3 \
-drive file=/home/images_nfv-virt-rt-kvm/rhel7.5_nonrt.qcow2,format=qcow2,if=none,id=drive-virtio-blk0,werror=stop,rerror=stop \
-device virtio-blk-pci,drive=drive-virtio-blk0,id=virtio-blk0,bus=root.1,iommu_platform=on,ats=on \
-netdev tap,id=hostnet0,vhost=on \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=88:66:da:5f:dd:01,bus=root.2 \
-chardev socket,id=charnet1,path=/tmp/vhost-user1 \
-netdev vhost-user,chardev=charnet1,id=hostnet1 \
-device virtio-net-pci,netdev=hostnet1,id=net1,mac=18:66:da:5f:dd:02,iommu_platform=on,ats=on,bus=root.3 \
-chardev socket,id=charnet2,path=/tmp/vhost-user2 \
-netdev vhost-user,chardev=charnet2,id=hostnet2 \
-device virtio-net-pci,netdev=hostnet2,id=net2,mac=18:66:da:5f:dd:03,iommu_platform=on,ats=on,bus=root.4 \
-vnc :2 \
-monitor stdio \

Comment 2 Pei Zhang 2017-12-18 09:24:21 UTC
There is also a possibility that this bug belongs to dpdk. Please correct me if I set the wrong component. Thanks.

Best Regards,
Pei

Comment 3 Maxime Coquelin 2017-12-18 11:18:24 UTC
Hi Pei,

Thanks for reporting the issue.

Have you tried using the virtio-net driver in the guest instead of DPDK?
Also, have you tried using the vhost-kernel backend on the host side?

Doing these two tests would help to narrow down the possibilities.

Thanks,
Maxime

Comment 4 Pei Zhang 2017-12-19 10:53:29 UTC
(In reply to Maxime Coquelin from comment #3)
> Hi Pei,
> 
> Thanks for reporting the issue.
> 
> Have you tried using the virtio-net driver in the guest instead of DPDK?

Hi Maxime,

The virtio-net driver doesn't work well.

Here is my testing:
1. In host, boot testpmd with 2 vhost-user ports and "iommu-support=1", same as step 1 above. Also start forwarding:

testpmd> set portlist 0,2,1,3
testpmd> start 


2. In host, boot qemu with 2 vhost-user ports and pcie "multifunction=on", same as step 2 above.

3. In guest, set IPs on eth1 and eth2 for ping testing.
# ifconfig eth1 192.168.1.1/24
# ifconfig eth2 192.168.2.1/24


4. On another host B, set IPs.
# ifconfig p2p1 192.168.1.2/24
# ifconfig p2p2 192.168.2.2/24

5. Start ping testing from Host B to guest. 
# ping 192.168.1.1 -i 0.001

6. After 10 seconds, start ping testing from guest to Host B.
# ping 192.168.2.2 -i 0.001

7. Several issues are found:
(1) After several minutes (less than 5), both the guest and the qemu terminal lose response for a while (around 10 seconds), but they do recover. The ping log looks like below.

...
64 bytes from 192.168.1.1: icmp_seq=23582 ttl=64 time=0.043 ms
64 bytes from 192.168.1.1: icmp_seq=23583 ttl=64 time=0.042 ms
64 bytes from 192.168.1.1: icmp_seq=23584 ttl=64 time=0.043 ms
64 bytes from 192.168.1.1: icmp_seq=23585 ttl=64 time=0.043 ms
64 bytes from 192.168.1.1: icmp_seq=23586 ttl=64 time=0.042 ms
64 bytes from 192.168.1.1: icmp_seq=23587 ttl=64 time=0.044 ms
64 bytes from 192.168.1.1: icmp_seq=23588 ttl=64 time=0.042 ms
64 bytes from 192.168.1.1: icmp_seq=23589 ttl=64 time=112 ms
64 bytes from 192.168.1.1: icmp_seq=23590 ttl=64 time=105 ms
64 bytes from 192.168.1.1: icmp_seq=23591 ttl=64 time=95.4 ms
64 bytes from 192.168.1.1: icmp_seq=23592 ttl=64 time=85.3 ms
64 bytes from 192.168.1.1: icmp_seq=23593 ttl=64 time=75.3 ms
64 bytes from 192.168.1.1: icmp_seq=23594 ttl=64 time=65.2 ms
64 bytes from 192.168.1.1: icmp_seq=23595 ttl=64 time=55.2 ms
64 bytes from 192.168.1.1: icmp_seq=23596 ttl=64 time=45.1 ms
64 bytes from 192.168.1.1: icmp_seq=23597 ttl=64 time=35.1 ms
64 bytes from 192.168.1.1: icmp_seq=23598 ttl=64 time=25.0 ms
64 bytes from 192.168.1.1: icmp_seq=23599 ttl=64 time=15.0 ms
64 bytes from 192.168.1.1: icmp_seq=23600 ttl=64 time=5037 ms
64 bytes from 192.168.1.1: icmp_seq=23601 ttl=64 time=15141 ms
64 bytes from 192.168.1.1: icmp_seq=23602 ttl=64 time=15147 ms
64 bytes from 192.168.1.1: icmp_seq=23603 ttl=64 time=15137 ms
64 bytes from 192.168.1.1: icmp_seq=23604 ttl=64 time=15127 ms
64 bytes from 192.168.1.1: icmp_seq=23605 ttl=64 time=15117 ms
64 bytes from 192.168.1.1: icmp_seq=23606 ttl=64 time=15107 ms
64 bytes from 192.168.1.1: icmp_seq=23607 ttl=64 time=15097 ms
64 bytes from 192.168.1.1: icmp_seq=23608 ttl=64 time=15087 ms
64 bytes from 192.168.1.1: icmp_seq=23609 ttl=64 time=15077 ms
64 bytes from 192.168.1.1: icmp_seq=23610 ttl=64 time=15067 ms
64 bytes from 192.168.1.1: icmp_seq=23611 ttl=64 time=15057 ms
64 bytes from 192.168.1.1: icmp_seq=23612 ttl=64 time=15049 ms
64 bytes from 192.168.1.1: icmp_seq=23613 ttl=64 time=15039 ms
64 bytes from 192.168.1.1: icmp_seq=23614 ttl=64 time=15029 ms
64 bytes from 192.168.1.1: icmp_seq=23615 ttl=64 time=15019 ms
64 bytes from 192.168.1.1: icmp_seq=23616 ttl=64 time=15009 ms
64 bytes from 192.168.1.1: icmp_seq=23617 ttl=64 time=14999 ms
64 bytes from 192.168.1.1: icmp_seq=23618 ttl=64 time=14989 ms
64 bytes from 192.168.1.1: icmp_seq=23619 ttl=64 time=14978 ms
64 bytes from 192.168.1.1: icmp_seq=23620 ttl=64 time=14968 ms
64 bytes from 192.168.1.1: icmp_seq=23621 ttl=64 time=14958 ms
64 bytes from 192.168.1.1: icmp_seq=23622 ttl=64 time=14952 ms
64 bytes from 192.168.1.1: icmp_seq=23623 ttl=64 time=14953 ms
64 bytes from 192.168.1.1: icmp_seq=23624 ttl=64 time=15536 ms
64 bytes from 192.168.1.1: icmp_seq=23625 ttl=64 time=15526 ms
From 192.168.1.2 icmp_seq=24944 Destination Host Unreachable
From 192.168.1.2 icmp_seq=24945 Destination Host Unreachable
From 192.168.1.2 icmp_seq=24946 Destination Host Unreachable
From 192.168.1.2 icmp_seq=24947 Destination Host Unreachable
From 192.168.1.2 icmp_seq=24948 Destination Host Unreachable
From 192.168.1.2 icmp_seq=24949 Destination Host Unreachable
From 192.168.1.2 icmp_seq=24950 Destination Host Unreachable
From 192.168.1.2 icmp_seq=24951 Destination Host Unreachable
From 192.168.1.2 icmp_seq=24952 Destination Host Unreachable
From 192.168.1.2 icmp_seq=24953 Destination Host Unreachable
From 192.168.1.2 icmp_seq=24954 Destination Host Unreachable
From 192.168.1.2 icmp_seq=24955 Destination Host Unreachable
64 bytes from 192.168.1.1: icmp_seq=23626 ttl=64 time=16755 ms
64 bytes from 192.168.1.1: icmp_seq=23627 ttl=64 time=16759 ms
64 bytes from 192.168.1.1: icmp_seq=23628 ttl=64 time=16749 ms
64 bytes from 192.168.1.1: icmp_seq=23629 ttl=64 time=17739 ms


(2) After the guest loses response and recovers, rebooting it with "(qemu) system_reset" sometimes makes qemu quit with a segmentation fault; reproduced about 2 out of 5 times.

(3) Sometimes dpdk's testpmd in the host quits with a segmentation fault after the guest is rebooted.


> Also, have you tried using the vhost-kernel backend on the host side?

The vhost-kernel backend hits the same issue; the qemu command looks like below. Guest dpdk's testpmd still fails with:
"EAL:   0000:04:00.0 VFIO group is not viable!
EAL: Can't write to PCI bar (0) : offset (12)".

...
-device pcie-root-port,id=root.1,chassis=1,multifunction=on,addr=0x2.0 \
-device pcie-root-port,id=root.2,chassis=2,addr=0x2.1 \
-device pcie-root-port,id=root.3,chassis=3,addr=0x2.2 \
-device pcie-root-port,id=root.4,chassis=4,addr=0x2.3 \
-netdev tap,id=hostnet0,vhost=on \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=88:66:da:5f:dd:01,bus=root.2,iommu_platform=on,ats=on \
-netdev tap,id=hostnet1,vhost=on,script=/etc/qemu-ifup1,downscript=/etc/qemu-ifdown1 \
-device virtio-net-pci,netdev=hostnet1,id=net1,mac=88:66:da:5f:dd:02,bus=root.3,iommu_platform=on,ats=on \
-netdev tap,id=hostnet2,vhost=on,script=/etc/qemu-ifup2,downscript=/etc/qemu-ifdown2 \
-device virtio-net-pci,netdev=hostnet2,id=net2,mac=88:66:da:5f:dd:03,bus=root.4,iommu_platform=on,ats=on \


Thanks,
Pei

> Doing these two tests would help to narrow down the possibilities.
> 
> Thanks,
> Maxime

Comment 5 Maxime Coquelin 2018-02-01 12:25:12 UTC
This is still to be confirmed, but it looks like a duplicate of bug 1540964.

Comment 6 Maxime Coquelin 2018-02-06 09:02:43 UTC

*** This bug has been marked as a duplicate of bug 1540964 ***