Bug 1402222
Summary: Device IOTLB support in qemu
Product: Red Hat Enterprise Linux 7
Reporter: jason wang <jasowang>
Component: qemu-kvm-rhev
Assignee: Wei <wexu>
Status: CLOSED ERRATA
QA Contact: Pei Zhang <pezhang>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 7.4
CC: ailan, chayang, hannsj_uhl, jasowang, juzhang, mrezanin, mst, pezhang, virt-maint, wexu, xiywang
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.9.0-1.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-01 23:39:45 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1283104, 1395265
Attachments:
Description
jason wang
2016-12-07 05:33:20 UTC
Three parts:
- Device IOTLB support in the Intel IOMMU
- Address Translation Service (ATS) for PCI and virtio-pci
- Device IOTLB API support for vhost

jason wang (in reply to comment #1):
For the vhost part we have bug #1283257.

Pei Zhang:
Hi Wei,
QE is verifying this bug. Could you please give some check points? And is the command line below correct? Are any options missing? Thanks.

qemu command line:
# /usr/libexec/qemu-kvm -name rhel7.4 -M q35,kernel-irqchip=split \
-device intel-iommu,device-iotlb=on,intremap=true,caching-mode=true \
-cpu host -m 8G -numa node \
-smp 4,sockets=1,cores=4,threads=1 \
-device pcie-root-port,id=root.1,slot=1 \
-netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
-device virtio-net-pci,netdev=hostnet0,id=net0,bus=root.1,mac=88:66:da:5f:dd:01,iommu_platform=on,ats=on \
-device pcie-root-port,id=root.2,slot=2 \
-drive file=/home/images_nfv-virt-rt-kvm/rhel7.4_rt.qcow2,format=qcow2,if=none,id=drive-virtio-blk0,werror=stop,rerror=stop \
-device virtio-blk-pci,drive=drive-virtio-blk0,id=virtio-blk0,bus=root.2 \
-vnc :2 \
-monitor stdio \

Best Regards,
Pei

Wei (comment #7):
Hi Pei,
Your cli looks good overall. My test was based on upstream qemu synced to the latest code about 3 months ago; the only difference seems to be the option used for the root port device. Here is my qemu cli:

/home/src/qemu/x86_64-softmmu/qemu-system-x86_64 /sdc/home/VMs/rhel7.3.qcow2 \
-netdev tap,id=hn1,script=/etc/qemu-ifup-wei,vhost=on \
-device virtio-net-pci,netdev=hn1,mac=52:54:00:11:35:10 \
-netdev tap,id=hn2,script=/etc/qemu-ifup-private1,vhost=on \
-device ioh3420,id=root.1,chassis=1 \
-device virtio-net-pci,netdev=hn2,id=v0,mq=off,mac=52:54:00:11:e3:11,bus=root.1,disable-modern=off,disable-legacy=on,iommu_platform=on,ats=on \
-netdev tap,id=hn3,vhost=on,script=/etc/qemu-ifup-private2 \
-device ioh3420,id=root.2,chassis=2 \
-device virtio-net-pci,netdev=hn3,id=v1,mq=off,mac=52:54:00:11:e3:12,bus=root.2,disable-modern=off,disable-legacy=on,iommu_platform=on,ats=on \
-smp 3 -m 6G -enable-kvm -cpu host -vnc 0.0.0.0:3 \
-M q35,kernel-irqchip=split \
-device intel-iommu,device-iotlb=on,intremap

Other check points:
1. This feature needs host and guest kernel support; please update both to the latest 7.4 build.
2. Enable the IOMMU in the guest by adding the corresponding kernel parameter to the guest's grub configuration; the host kernel does not need it. (A hedged example follows the attachment note below.)
3. Run dpdk/l2fwd/testpmd inside the guest, with vfio passing through the virtio-net devices.

Created attachment 1279558 [details]
testing topology of testing vhost_net with dpdk
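Regarding check point 2 above: a minimal sketch of one way to enable the IOMMU on a RHEL 7 guest's kernel command line, assuming the stock grub2/grubby tooling (the exact command is not part of the original report):

# grubby --update-kernel=ALL --args="intel_iommu=on"   # add intel_iommu=on to every installed kernel entry
# reboot                                               # the parameter takes effect on the next boot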
Verification:
3.10.0-666.el7.x86_64
qemu-kvm-rhev-2.9.0-5.el7.x86_64
dpdk-16.11-4.el7fdp.x86_64 (in guest)
Environment:
Please refer to the attachment.
Steps:
1. Stop NetworkManager.
2. Boot the guest with '-device intel-iommu,device-iotlb=on' and two vhost_net network devices (a third device is used only to access the guest over ssh). A quick in-guest check of the vIOMMU is sketched after the command line below.
# /usr/libexec/qemu-kvm -name rhel7.4 -M q35,kernel-irqchip=split \
-device intel-iommu,device-iotlb=on,intremap \
-cpu host -m 8G -numa node \
-smp 4,sockets=1,cores=4,threads=1 \
-device pcie-root-port,id=root.1,slot=1 \
-device pcie-root-port,id=root.2,slot=2 \
-device pcie-root-port,id=root.3,slot=3 \
-device pcie-root-port,id=root.4,slot=4 \
-netdev tap,id=hostnet0,vhost=on \
-netdev tap,id=hostnet1,vhost=on,script=/etc/qemu-ifup2,downscript=/etc/qemu-ifdown2 \
-netdev tap,id=hostnet2,vhost=on,script=/etc/qemu-ifup2,downscript=/etc/qemu-ifdown2 \
-device virtio-net-pci,netdev=hostnet0,id=net0,bus=root.1,mac=88:66:da:5f:dd:11,iommu_platform=on,ats=on \
-device virtio-net-pci,netdev=hostnet1,id=net1,bus=root.2,mac=88:66:da:5f:dd:12,iommu_platform=on,ats=on \
-device virtio-net-pci,netdev=hostnet2,id=net2,bus=root.3,mac=88:66:da:5f:dd:13,iommu_platform=on,ats=on \
-drive file=/home/images_nfv-virt-rt-kvm/rhel7.4_nonrt.qcow2,format=qcow2,if=none,id=drive-virtio-blk0,werror=stop,rerror=stop \
-device virtio-blk-pci,drive=drive-virtio-blk0,id=virtio-blk0,bus=root.4 \
-vnc :2 \
-monitor stdio \
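Before binding devices, one way to confirm inside the guest that the emulated IOMMU is actually active (these particular checks are an assumption for illustration, not part of the original steps):

# dmesg | grep -i -e DMAR -e IOMMU      # expect DMAR / "IOMMU enabled" messages from the guest kernel
# ls /sys/kernel/iommu_groups/          # non-empty once the guest kernel has created IOMMU groups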
3. Bind these two vhost_net devices to vfio-pci in the guest (a status check is sketched after the bind commands).
# modprobe vfio
# modprobe vfio-pci
# dpdk-devbind --bind=vfio-pci 02:00.0
# dpdk-devbind --bind=vfio-pci 03:00.0
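To confirm the binding took effect (an extra verification step assumed here, not in the original report):

# dpdk-devbind --status                 # the two virtio NICs should be listed as using the vfio-pci driver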
4. Reserve hugepages in the guest (an alternative boot-time reservation is sketched after the command below).
# echo 3 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
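Runtime allocation of 1 GiB hugepages can fail once guest memory is fragmented. A commonly used alternative, shown only as a hedged sketch (the parameters below are an assumption, not taken from this report), is to reserve them on the guest kernel command line and reboot:

# grubby --update-kernel=ALL --args="default_hugepagesz=1G hugepagesz=1G hugepages=3"
# reboot
# grep -i huge /proc/meminfo            # confirm HugePages_Total and Hugepagesize after the reboot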
5. Start MoonGen on another host; the lua file is attached in a later comment.
Default parameter:
Packet Size: 64 Byte
Running time: 60s
Stream Rate: 0.35Mpps
(Note: 0.35 Mpps was chosen because, over several test runs, it appears to be the maximum throughput in this case.)
# ./build/MoonGen rfc1242.lua
6. Start testpmd with '--forward-mode=macswap' in the guest. The results look good.
/usr/bin/testpmd \
-l 1,2,3 \
-n 4 \
-d /usr/lib64/librte_pmd_virtio.so.1 \
-w 0000:02:00.0 -w 0000:03:00.0 \
-- \
--nb-cores=2 \
--disable-hw-vlan \
-i \
--disable-rss \
--rxq=1 --txq=1 \
--forward-mode=macswap
testpmd> quit
Telling cores to stop...
Waiting for lcores to finish...
---------------------- Forward statistics for port 0 ----------------------
RX-packets: 10499485 RX-dropped: 0 RX-total: 10499485
TX-packets: 10365296 TX-dropped: 0 TX-total: 10365296
----------------------------------------------------------------------------
---------------------- Forward statistics for port 1 ----------------------
RX-packets: 10365296 RX-dropped: 0 RX-total: 10365296
TX-packets: 10499485 TX-dropped: 0 TX-total: 10499485
----------------------------------------------------------------------------
+++++++++++++++ Accumulated forward statistics for all ports+++++++++++++++
RX-packets: 20864781 RX-dropped: 0 RX-total: 20864781
TX-packets: 20864781 TX-dropped: 0 TX-total: 20864781
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Done.
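As a rough sanity check on these numbers (arithmetic added here, not part of the original comment): at the offered rate of 0.35 Mpps over the 60 s run, about 0.35e6 x 60 = 21 million frames are expected in total, and the accumulated RX count of 20,864,781 is consistent with that, supporting the note in step 5 that 0.35 Mpps is close to the maximum rate this setup forwards without loss.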
Created attachment 1279561 [details]
lua file used in step 5 in Comment 8

Pei Zhang (comment #10, in reply to Wei from comment #7):
Hi Wei, I confirmed with the Q35 QE team that the PCIe root port usage has been updated to '-device pcie-root-port,id=root.1,slot=1' in RHEL 7.4.
Could you please check whether Comment 8 can verify this bug? Thanks.

Amnon Ilan (comment #11):
Pei, can you please compare the throughput with and without IOTLB/IOMMU, similar to what you did here:
https://bugzilla.redhat.com/show_bug.cgi?id=1335808#c11

Wei (in reply to Pei Zhang from comment #10):
It is good to use this option as well. Yes, Comment 8 works, and it is also good to do a benchmark comparison as Amnon suggested in comment 11.

Created attachment 1279953 [details]
testing topology of testing throughput of vhost_net

Pei Zhang (comment #13, in reply to Amnon Ilan from comment #11):
Summary: the throughput values are the same with and without '-device intel-iommu,device-iotlb=on'.

Default parameters when testing throughput (a sketch of how the search/validation parameters are used follows below):
Traffic Generator: MoonGen
Acceptable Loss: 0.002%
Frame Size: 64 Byte
Unidirectional: No
Search run time: 10s
Validation run time: 30s
Virtio features: default
CPU: Intel(R) Xeon(R) CPU E5-2698 v3 @ 2.30GHz
NIC: 10-Gigabit X540-AT2

==Results==
- With '-device intel-iommu,device-iotlb=on'
No    Throughput(Mpps)    packets_loss_rate
1     0.377479            0.000000%
2     0.377479            0.000009%

- Without '-device intel-iommu,device-iotlb=on'
No    Throughput(Mpps)    packets_loss_rate
1     0.377479            0.000000%
2     0.377479            0.000000%

==Some Highlights==
(1) This tests bidirectional throughput; please refer to the attachment of this comment for the topology chart.
(2) Performance is better with hugepages, so the test uses 1G hugepages; without hugepages the throughput is below 0.12 Mpps (the last 0.12 Mpps validation FAILED).
(3) Which host NUMA node the guest's memory and cores are placed on does not seem to affect the results, so no CPU pinning was set; with memory and cores on NUMA node 1 the throughput is even slightly lower, about 0.230184 Mpps.
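The Acceptable Loss / Search run time / Validation run time parameters above describe a throughput search: binary-search the highest rate whose measured loss stays under the threshold, then re-check that rate with a longer validation run. A minimal shell sketch of that logic, with a hypothetical run_trial helper standing in for the actual MoonGen/opnfv-vsperf script (all names and numbers here are illustrative assumptions, not taken from the report):

#!/bin/bash
# Sketch only: run_trial <rate_kpps> <seconds> is a placeholder assumed to drive
# the traffic generator at the given rate and print the measured loss in percent.
ACCEPTABLE_LOSS=0.002        # percent, as in the parameters above
SEARCH_TIME=10               # seconds per search trial
VALIDATION_TIME=30           # seconds for the final validation run
lo=0                         # kpps
hi=14880                     # kpps; 14.88 Mpps is 64-byte line rate on 10GbE
best=0
while [ $((hi - lo)) -gt 10 ]; do             # stop once the search window is <= 10 kpps
    rate=$(( (lo + hi) / 2 ))
    loss=$(run_trial "$rate" "$SEARCH_TIME")
    if awk -v l="$loss" -v a="$ACCEPTABLE_LOSS" 'BEGIN { exit !(l <= a) }'; then
        best=$rate; lo=$rate                  # loss acceptable: search higher
    else
        hi=$rate                              # too much loss: search lower
    fi
done
loss=$(run_trial "$best" "$VALIDATION_TIME")  # longer validation run at the found rate
echo "throughput: ${best} kpps, validation loss: ${loss}%"

The opnfv-vsperf.lua file linked in the next comment performs the real measurement inside MoonGen; this sketch only illustrates the search logic.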
Key steps:

==Qemu command line with '-device intel-iommu,device-iotlb=on':
# /usr/libexec/qemu-kvm -name rhel7.4 -M q35,kernel-irqchip=split \
-device intel-iommu,device-iotlb=on,intremap \
-cpu host -m 8G \
-object memory-backend-file,id=mem,size=8G,mem-path=/dev/hugepages,share=on \
-numa node,memdev=mem -mem-prealloc \
-smp 4,sockets=1,cores=4,threads=1 \
-device pcie-root-port,id=root.1,slot=1 \
-device pcie-root-port,id=root.2,slot=2 \
-device pcie-root-port,id=root.3,slot=3 \
-device pcie-root-port,id=root.4,slot=4 \
-netdev tap,id=hostnet0,vhost=on \
-netdev tap,id=hostnet1,vhost=on,script=/etc/qemu-ifup0,downscript=/etc/qemu-ifdown0 \
-netdev tap,id=hostnet2,vhost=on,script=/etc/qemu-ifup1,downscript=/etc/qemu-ifdown1 \
-device virtio-net-pci,netdev=hostnet0,id=net0,bus=root.1,mac=88:66:da:5f:dd:11,iommu_platform=on,ats=on \
-device virtio-net-pci,netdev=hostnet1,id=net1,bus=root.2,mac=88:66:da:5f:dd:12,iommu_platform=on,ats=on \
-device virtio-net-pci,netdev=hostnet2,id=net2,bus=root.3,mac=88:66:da:5f:dd:13,iommu_platform=on,ats=on \
-drive file=/home/images_nfv-virt-rt-kvm/rhel7.4_nonrt.qcow2,format=qcow2,if=none,id=drive-virtio-blk0,werror=stop,rerror=stop \
-device virtio-blk-pci,drive=drive-virtio-blk0,id=virtio-blk0,bus=root.4 \
-vnc :2 \
-monitor stdio \

==Qemu command line without '-device intel-iommu,device-iotlb=on':
# /usr/libexec/qemu-kvm -name rhel7.4 -M q35 \
-cpu host -m 8G \
-object memory-backend-file,id=mem,size=8G,mem-path=/dev/hugepages,share=on \
-numa node,memdev=mem -mem-prealloc \
-smp 4,sockets=1,cores=4,threads=1 \
-device pcie-root-port,id=root.1,slot=1 \
-device pcie-root-port,id=root.2,slot=2 \
-device pcie-root-port,id=root.3,slot=3 \
-device pcie-root-port,id=root.4,slot=4 \
-netdev tap,id=hostnet0,vhost=on \
-netdev tap,id=hostnet1,vhost=on,script=/etc/qemu-ifup0,downscript=/etc/qemu-ifdown0 \
-netdev tap,id=hostnet2,vhost=on,script=/etc/qemu-ifup1,downscript=/etc/qemu-ifdown1 \
-device virtio-net-pci,netdev=hostnet0,id=net0,bus=root.1,mac=88:66:da:5f:dd:11 \
-device virtio-net-pci,netdev=hostnet1,id=net1,bus=root.2,mac=88:66:da:5f:dd:12 \
-device virtio-net-pci,netdev=hostnet2,id=net2,bus=root.3,mac=88:66:da:5f:dd:13 \
-drive file=/home/images_nfv-virt-rt-kvm/rhel7.4_nonrt.qcow2,format=qcow2,if=none,id=drive-virtio-blk0,werror=stop,rerror=stop \
-device virtio-blk-pci,drive=drive-virtio-blk0,id=virtio-blk0,bus=root.4 \
-vnc :2 \
-monitor stdio \

Thanks,
Pei

Pei Zhang:
Some additional info: in Comment 8 I was testing with 1 NIC; in Comment 13 I was testing with 2 NICs, measuring the throughput with this lua file:
https://github.com/atheurer/MoonGen/blob/opnfv-dev/examples/opnfv-vsperf.lua

Amnon Ilan (comment #15):
@Jason, @Wei, is the throughput low? And how come we do not see any difference with and without vIOMMU?

jason wang (comment #16, in reply to Amnon Ilan from comment #15):
I suspect there is some misconfiguration in the setup.

Pei Zhang, several questions; let's try without the IOMMU first:
- Can you measure and report the pps on the host interfaces?
- Are you using a 40G or 10G NIC for the testing?
- Which NIC are you using in the host?
- Is the number better if you use macvtap instead of tap?
- What is the number if you just use pktgen to inject traffic into tap0?

Thanks
Pei Zhang (comment #17, in reply to jason wang from comment #16):
Hi Jason, thanks for your questions.

> - Can you measure and report the pps on the host interfaces?
What do you mean by "measure and report pps in host interface"? Could you please explain a little more?

> - Are you using a 40G or 10G NIC for the testing?
10G NIC.

> - Which NIC are you using in the host?
Two 10-Gigabit X540-AT2 cards.

> - Is the number better if you use macvtap instead of tap?
No, I didn't test macvtap. I can test it if needed.

> - What is the number if you just use pktgen to inject traffic into tap0?
The number I reported is the throughput measured by MoonGen: MoonGen generates packets on one port, the packets are forwarded by dpdk's testpmd in the guest (nic0 -> switch -> tap0 -> virtio_net -> testpmd), and MoonGen then receives them on the other port. Please see the attachment of Comment 13. So I did not use pktgen.

Thanks,
Pei

jason wang (comment #18, in reply to Pei Zhang from comment #17):
> What do you mean by "measure and report pps in host interface"?
I mean, e.g., if your host interfaces are enp0s3/enp0s4, please measure their pps while you are running the test.

> Two 10-Gigabit X540-AT2 cards.
Can you use ethtool -i $interface to see its driver?

> No, I didn't test macvtap. I can test it if needed.
Yes please.

> So I did not use pktgen.
I see. What I want is, e.g., to run pktgen on tap0 directly:
pktgen -> tap0 -> virtio_net -> testpmd -> tap1

Thanks
Pei Zhang (in reply to jason wang from comment #18):
> Can you use ethtool -i $interface to see its driver?

# ethtool -i p1p1
driver: ixgbe
version: 4.4.0-k-rh7.4
firmware-version: 0x8000059e
expansion-rom-version:
bus-info: 0000:81:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

# ethtool -i p1p2
driver: ixgbe
version: 4.4.0-k-rh7.4
firmware-version: 0x8000059e
expansion-rom-version:
bus-info: 0000:81:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

> - Is the number better if you use macvtap instead of tap?
> Yes please.
OK.

> I see. What I want is, e.g., to run pktgen on tap0 directly:
> pktgen -> tap0 -> virtio_net -> testpmd -> tap1
OK.

Thanks,
Pei

Pei Zhang (comment #20):
Summary: The performance seems very close with and without vIOMMU.

Flow chart: pktgen -> tap0 -> virtio_net -> testpmd -> tap1

==Results with vIOMMU==
[Run 1]
TX tap0: 1702552 pkts/s RX tap0: 0 pkts/s
TX tap0: 1667841 pkts/s RX tap0: 0 pkts/s
TX tap0: 1612958 pkts/s RX tap0: 0 pkts/s
TX tap0: 1620435 pkts/s RX tap0: 0 pkts/s
TX tap0: 1663647 pkts/s RX tap0: 0 pkts/s
TX tap1: 0 pkts/s RX tap1: 678904 pkts/s
TX tap1: 0 pkts/s RX tap1: 678458 pkts/s
TX tap1: 0 pkts/s RX tap1: 678194 pkts/s
TX tap1: 0 pkts/s RX tap1: 677958 pkts/s
TX tap1: 0 pkts/s RX tap1: 677852 pkts/s

[Run 2]
TX tap0: 1531125 pkts/s RX tap0: 0 pkts/s
TX tap0: 1411951 pkts/s RX tap0: 0 pkts/s
TX tap0: 1511287 pkts/s RX tap0: 0 pkts/s
TX tap0: 1609893 pkts/s RX tap0: 0 pkts/s
TX tap0: 1467514 pkts/s RX tap0: 0 pkts/s
TX tap1: 0 pkts/s RX tap1: 685235 pkts/s
TX tap1: 0 pkts/s RX tap1: 685570 pkts/s
TX tap1: 0 pkts/s RX tap1: 685947 pkts/s
TX tap1: 0 pkts/s RX tap1: 685732 pkts/s
TX tap1: 0 pkts/s RX tap1: 685521 pkts/s

==Results without vIOMMU==
[Run 1]
TX tap0: 1318424 pkts/s RX tap0: 0 pkts/s
TX tap0: 1317694 pkts/s RX tap0: 0 pkts/s
TX tap0: 1317627 pkts/s RX tap0: 0 pkts/s
TX tap0: 1316501 pkts/s RX tap0: 0 pkts/s
TX tap0: 1316072 pkts/s RX tap0: 0 pkts/s
TX tap0: 1316133 pkts/s RX tap0: 0 pkts/s
TX tap1: 0 pkts/s RX tap1: 684954 pkts/s
TX tap1: 0 pkts/s RX tap1: 684990 pkts/s
TX tap1: 0 pkts/s RX tap1: 685265 pkts/s
TX tap1: 0 pkts/s RX tap1: 662117 pkts/s
TX tap1: 0 pkts/s RX tap1: 648610 pkts/s
TX tap1: 0 pkts/s RX tap1: 648114 pkts/s

[Run 2]
TX tap0: 1433924 pkts/s RX tap0: 0 pkts/s
TX tap0: 1434691 pkts/s RX tap0: 0 pkts/s
TX tap0: 1433702 pkts/s RX tap0: 0 pkts/s
TX tap0: 1435838 pkts/s RX tap0: 0 pkts/s
TX tap0: 1431688 pkts/s RX tap0: 0 pkts/s
TX tap1: 0 pkts/s RX tap1: 676788 pkts/s
TX tap1: 0 pkts/s RX tap1: 676351 pkts/s
TX tap1: 0 pkts/s RX tap1: 676295 pkts/s
TX tap1: 0 pkts/s RX tap1: 676208 pkts/s
TX tap1: 0 pkts/s RX tap1: 676281 pkts/s

Steps:
1. Boot the VM. With vIOMMU, refer to [1]; without vIOMMU, refer to [2].

2. Pin the vhost threads to cores (9, 11) that are in the same NUMA node as the network device.
# ps -ef | grep vhost
...
root 50322 2 0 02:45 ? 00:00:00 [vhost-50310]
root 50330 2 0 02:45 ? 00:00:00 [vhost-50310]
# taskset -cp 9 50322
# taskset -cp 11 50330
3. Pin the vCPUs to cores (1, 3, 5, 7) that are in the same NUMA node as the network device.
(qemu) info cpus
* CPU #0: pc=0xffffffff816a7596 (halted) thread_id=50429
  CPU #1: pc=0xffffffff816a7596 (halted) thread_id=50430
  CPU #2: pc=0xffffffff816a7596 (halted) thread_id=50431
  CPU #3: pc=0xffffffff816a7596 (halted) thread_id=50432
# taskset -cp 1 50429
# taskset -cp 3 50430
# taskset -cp 5 50431
# taskset -cp 7 50432

4. In the VM, load vfio.
With vIOMMU:
# modprobe vfio
# modprobe vfio-pci
Without vIOMMU:
# modprobe vfio enable_unsafe_noiommu_mode=Y
# modprobe vfio-pci

5. In the VM, bind the NICs to vfio-pci and reserve hugepages.
# dpdk-devbind --bind=vfio-pci 0000:02:00.0
# dpdk-devbind --bind=vfio-pci 0000:03:00.0
# echo 4 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages

6. Start testpmd, refer to [3].

7. Start pktgen on tap0, refer to [4].
# sh pktgen.sh tap0

8. Monitor pps on tap0 and tap1, refer to [5].
# sh pps.sh tap0
# sh pps.sh tap1

[1] Boot VM with vIOMMU
# /usr/libexec/qemu-kvm -name rhel7.4 -M q35,kernel-irqchip=split \
-device intel-iommu,device-iotlb=on,intremap \
-cpu host -m 8G \
-object memory-backend-file,id=mem,size=8G,mem-path=/dev/hugepages,share=on \
-numa node,memdev=mem -mem-prealloc \
-smp 4,sockets=1,cores=4,threads=1 \
-device pcie-root-port,id=root.1,slot=1 \
-device pcie-root-port,id=root.2,slot=2 \
-device pcie-root-port,id=root.3,slot=3 \
-netdev tap,id=hostnet1,vhost=on,script=/etc/qemu-ifup2,downscript=/etc/qemu-ifdown2 \
-netdev tap,id=hostnet2,vhost=on,script=/etc/qemu-ifup3,downscript=/etc/qemu-ifdown3 \
-device virtio-net-pci,netdev=hostnet1,id=net1,bus=root.2,mac=88:66:da:5f:dd:12,iommu_platform=on,ats=on \
-device virtio-net-pci,netdev=hostnet2,id=net2,bus=root.3,mac=88:66:da:5f:dd:13,iommu_platform=on,ats=on \
-drive file=/home/images_nfv-virt-rt-kvm/rhel7.4_nonrt.qcow2,format=qcow2,if=none,id=drive-virtio-blk0,werror=stop,rerror=stop \
-device virtio-blk-pci,drive=drive-virtio-blk0,id=virtio-blk0,bus=root.1 \
-vnc :2 \
-monitor stdio \

[2] Boot VM without vIOMMU
# /usr/libexec/qemu-kvm -name rhel7.4 -M q35 \
-cpu host -m 8G \
-object memory-backend-file,id=mem,size=8G,mem-path=/dev/hugepages,share=on \
-numa node,memdev=mem -mem-prealloc \
-smp 4,sockets=1,cores=4,threads=1 \
-device pcie-root-port,id=root.1,slot=1 \
-device pcie-root-port,id=root.2,slot=2 \
-device pcie-root-port,id=root.3,slot=3 \
-netdev tap,id=hostnet1,vhost=on,script=/etc/qemu-ifup2,downscript=/etc/qemu-ifdown2 \
-netdev tap,id=hostnet2,vhost=on,script=/etc/qemu-ifup3,downscript=/etc/qemu-ifdown3 \
-device virtio-net-pci,netdev=hostnet1,id=net1,bus=root.2,mac=88:66:da:5f:dd:12 \
-device virtio-net-pci,netdev=hostnet2,id=net2,bus=root.3,mac=88:66:da:5f:dd:13 \
-drive file=/home/images_nfv-virt-rt-kvm/rhel7.4_nonrt.qcow2,format=qcow2,if=none,id=drive-virtio-blk0,werror=stop,rerror=stop \
-device virtio-blk-pci,drive=drive-virtio-blk0,id=virtio-blk0,bus=root.1 \
-vnc :2 \
-monitor stdio \

[3] Boot testpmd with macswap
# /usr/bin/testpmd \
-l 1,2,3 \
-n 4 \
-d /usr/lib64/librte_pmd_virtio.so.1 \
-w 0000:02:00.0 -w 0000:03:00.0 \
-- \
--nb-cores=2 \
--disable-hw-vlan \
-i \
--disable-rss \
--rxq=1 --txq=1 \
--forward-mode=macswap

[4] script of pktgen.sh
# cat pktgen.sh
#!/bin/sh
# usage: sh pktgen.sh $device $queues
modprobe -r pktgen
modprobe pktgen
echo reset > /proc/net/pktgen/pgctrl
ifconfig $1 up

function pgset() {
    local result
    echo $1 > $PGDEV
    result=`cat $PGDEV | fgrep "Result: OK:"`
    if [ "$result" = "" ]; then
        cat $PGDEV | fgrep Result:
    fi
}

function pg() {
    echo inject > $PGDEV
    cat $PGDEV
}

for i in 0 `seq $(($2-1))`
do
    echo "Adding queue 0 of $1"
    dev=$1@$i
    PGDEV=/proc/net/pktgen/kpktgend_$i
    pgset "rem_device_all"
    pgset "add_device $dev"
    pgset "max_before_softirq 100000"

    # Configure the individual devices
    echo "Configuring devices $dev"
    PGDEV=/proc/net/pktgen/$dev
    pgset "queue_map_min $i"
    pgset "queue_map_max $i"
    pgset "count 10000000"
    pgset "min_pkt_size 60"
    pgset "max_pkt_size 60"
    pgset "dst $DST_system_ip"
    pgset "dst_mac 88:66:da:5f:dd:12"
    pgset "udp_src_min 0"
    pgset "udp_src_max 65535"
    pgset "udp_dst_min 0"
    pgset "udp_dst_max 65535"
done

# Time to run
PGDEV=/proc/net/pktgen/pgctrl
echo "Running... ctrl^C to stop"
pgset "start"
echo "Done"

[5] script of pps.sh
# cat pps.sh
#!/bin/bash
INTERVAL="1"  # update interval in seconds
if [ -z "$1" ]; then
    echo
    echo usage: $0 [network-interface]
    echo
    echo e.g. $0 eth0
    echo
    echo shows packets-per-second
    exit
fi
IF=$1
while true
do
    R1=`cat /sys/class/net/$1/statistics/rx_packets`
    T1=`cat /sys/class/net/$1/statistics/tx_packets`
    sleep $INTERVAL
    R2=`cat /sys/class/net/$1/statistics/rx_packets`
    T2=`cat /sys/class/net/$1/statistics/tx_packets`
    TXPPS=`expr $T2 - $T1`
    RXPPS=`expr $R2 - $R1`
    echo "TX $1: $TXPPS pkts/s RX $1: $RXPPS pkts/s"
done

Versions for Comment 21 above:
3.10.0-675.el7.x86_64
qemu-kvm-rhev-2.9.0-7.el7.x86_64
dpdk-16.11-4.el7fdp.x86_64 (in guest)

jason wang (comment #22, in reply to Pei Zhang from comment #20):
> Summary: The performance seems very close with and without vIOMMU.
Thanks for the testing. Just to confirm, are you saying vIOMMU is faster?
Thanks

Pei Zhang (in reply to jason wang from comment #22):
No, I cannot say vIOMMU is faster; it is just very close. The results are not that stable, so sometimes vIOMMU is faster, but not always. I did another 5 runs to confirm this (for each run I rebooted the VM and then ran the test):
http://pastebin.test.redhat.com/490772
Thanks,
Pei

Pei Zhang (comment #24):
Hi Amnon, Jason,
Based on Comment 13, Comment 20 and Comment 23, the throughput performance is very close with and without vIOMMU. So can QE verify this bug?
Thanks,
Pei
Amnon Ilan (comment #25, in reply to Pei Zhang from comment #24):
I think it can be verified now (keeping the needinfo for Jason to comment on that).

In reply to Amnon Ilan from comment #25:
Yes, I think so. Thanks.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2017:2392