Bug 1402222

Summary: Device IOTLB support in qemu
Product: Red Hat Enterprise Linux 7
Component: qemu-kvm-rhev
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
Reporter: jason wang <jasowang>
Assignee: Wei <wexu>
QA Contact: Pei Zhang <pezhang>
CC: ailan, chayang, hannsj_uhl, jasowang, juzhang, mrezanin, mst, pezhang, virt-maint, wexu, xiywang
Target Milestone: rc
Fixed In Version: qemu-kvm-rhev-2.9.0-1.el7
Type: Bug
Last Closed: 2017-08-01 23:39:45 UTC
Bug Blocks: 1283104, 1395265
Attachments:
- testing topology of testing vhost_net with dpdk (flags: none)
- lua file used in step 5 in Comment 8 (flags: none)
- testing topology of testing throughput of vhost_net (flags: none)

Description jason wang 2016-12-07 05:33:20 UTC
Description of problem:

QEMU needs to support the device IOTLB API so that vhost can cooperate with a userspace-emulated IOMMU.

Comment 1 jason wang 2016-12-07 05:35:46 UTC
Three parts:

- Device IOTLB support in intel IOMMU
- Address Translation Service for PCI and virtio-pci
- Device IOTLB API support for vhost

Comment 3 Amnon Ilan 2017-01-25 09:43:36 UTC
(In reply to jason wang from comment #1)
> Three parts:
> 
> - Device IOTLB support in intel IOMMU
> - Address Translation Service for PCI and virtio-pci
> - Device IOTLB API support for vhost

For the vhost part we have bug#1283257

Comment 4 Wei 2017-02-20 16:02:39 UTC
See Also:
https://bugzilla.redhat.com/show_bug.cgi?id=1425127

Comment 6 Pei Zhang 2017-05-12 07:36:37 UTC
Hi Wei,

QE is verifying this bug. Could you please suggest some checkpoints? Also, is the command line below correct, or are any options missing? Thanks.

qemu command line:
# /usr/libexec/qemu-kvm -name rhel7.4 -M q35,kernel-irqchip=split \
-device intel-iommu,device-iotlb=on,intremap=true,caching-mode=true \
-cpu host -m 8G -numa node \
-smp 4,sockets=1,cores=4,threads=1 \
-device pcie-root-port,id=root.1,slot=1 \
-netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
-device virtio-net-pci,netdev=hostnet0,id=net0,bus=root.1,mac=88:66:da:5f:dd:01,iommu_platform=on,ats=on \
-device pcie-root-port,id=root.2,slot=2 \
-drive file=/home/images_nfv-virt-rt-kvm/rhel7.4_rt.qcow2,format=qcow2,if=none,id=drive-virtio-blk0,werror=stop,rerror=stop \
-device virtio-blk-pci,drive=drive-virtio-blk0,id=virtio-blk0,bus=root.2 \
-vnc :2 \
-monitor stdio



Best Regards,
Pei

Comment 7 Wei 2017-05-14 16:14:04 UTC
Hi Pei,
Your command line looks good overall. My test was based on upstream QEMU, synced to the latest code about three months ago. The only difference seems to be the option for the root port device. Here is my QEMU command line:

/home/src/qemu/x86_64-softmmu/qemu-system-x86_64 /sdc/home/VMs/rhel7.3.qcow2 \
-netdev tap,id=hn1,script=/etc/qemu-ifup-wei,vhost=on \
-device virtio-net-pci,netdev=hn1,mac=52:54:00:11:35:10 \
-netdev tap,id=hn2,script=/etc/qemu-ifup-private1,vhost=on \
-device ioh3420,id=root.1,chassis=1 \
-device virtio-net-pci,netdev=hn2,id=v0,mq=off,mac=52:54:00:11:e3:11,bus=root.1,disable-modern=off,disable-legacy=on,iommu_platform=on,ats=on \
-netdev tap,id=hn3,vhost=on,script=/etc/qemu-ifup-private2 \
-device ioh3420,id=root.2,chassis=2 \
-device virtio-net-pci,netdev=hn3,id=v1,mq=off,mac=52:54:00:11:e3:12,bus=root.2,disable-modern=off,disable-legacy=on,iommu_platform=on,ats=on \
-smp 3 -m 6G -enable-kvm -cpu host -vnc 0.0.0.0:3 \
-M q35,kernel-irqchip=split \
-device intel-iommu,device-iotlb=on,intremap
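
Both command lines above share a handful of options related to this feature. A minimal sketch of a helper that reports which of them are absent from a command line string follows. The function name missing_viommu_opts is hypothetical, and the option list is only what the two command lines in this thread have in common, not an authoritative requirement list.

```shell
# Hypothetical helper: print the vIOMMU-related options that do NOT
# appear in a QEMU command line passed as one string.
# The option list is taken from the command lines quoted in this thread.
missing_viommu_opts() {
    cmdline=$1
    missing=""
    for opt in kernel-irqchip=split intel-iommu device-iotlb=on \
               iommu_platform=on ats=on; do
        case "$cmdline" in
            *"$opt"*) ;;                      # option present
            *) missing="$missing $opt" ;;     # option absent
        esac
    done
    # Strip the leading space before printing.
    echo "${missing# }"
}
```

An empty result means every listed option was found; the check is a plain substring match, so it does not validate which device each option belongs to.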

Other checkpoints:
1. This feature needs both host and guest kernel support, so please update both of them to the latest 7.4 build.
2. Enable the IOMMU in the guest through the kernel parameter in the grub configuration; the host kernel does not need it.
3. Run dpdk/l2fwd/testpmd inside the guest, with the virtio-net devices passed through via vfio.
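
For the second checkpoint, a small sketch of how the guest kernel command line could be checked is shown here. It assumes the grub parameter in question is intel_iommu=on, which is the usual switch for the Intel IOMMU driver but is not named in this thread; the function name guest_iommu_enabled is hypothetical.

```shell
# Return success if a kernel command line enables the Intel IOMMU.
# Assumption: the relevant parameter is intel_iommu=on.
guest_iommu_enabled() {
    case " $1 " in
        *" intel_iommu=on "*) return 0 ;;
    esac
    return 1
}

# Inside the guest: inspect the command line of the running kernel.
if guest_iommu_enabled "$(cat /proc/cmdline 2>/dev/null)"; then
    echo "guest IOMMU parameter present"
else
    echo "guest IOMMU parameter missing"
fi
```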

Comment 8 Pei Zhang 2017-05-17 07:51:36 UTC
Created attachment 1279558 [details]
testing topology of testing vhost_net with dpdk

Verification:
3.10.0-666.el7.x86_64
qemu-kvm-rhev-2.9.0-5.el7.x86_64
dpdk-16.11-4.el7fdp.x86_64(in guest)

Environment:
Please refer to the attachment.

Steps:
1. Stop NetworkManager.

2. Boot the guest with '-device intel-iommu,device-iotlb=on' and 2 vhost_net network devices (a third one is used for ssh access to the guest).

# /usr/libexec/qemu-kvm -name rhel7.4 -M q35,kernel-irqchip=split \
-device intel-iommu,device-iotlb=on,intremap \
-cpu host -m 8G -numa node \
-smp 4,sockets=1,cores=4,threads=1 \
-device pcie-root-port,id=root.1,slot=1 \
-device pcie-root-port,id=root.2,slot=2 \
-device pcie-root-port,id=root.3,slot=3 \
-device pcie-root-port,id=root.4,slot=4 \
-netdev tap,id=hostnet0,vhost=on \
-netdev tap,id=hostnet1,vhost=on,script=/etc/qemu-ifup2,downscript=/etc/qemu-ifdown2 \
-netdev tap,id=hostnet2,vhost=on,script=/etc/qemu-ifup2,downscript=/etc/qemu-ifdown2 \
-device virtio-net-pci,netdev=hostnet0,id=net0,bus=root.1,mac=88:66:da:5f:dd:11,iommu_platform=on,ats=on \
-device virtio-net-pci,netdev=hostnet1,id=net1,bus=root.2,mac=88:66:da:5f:dd:12,iommu_platform=on,ats=on \
-device virtio-net-pci,netdev=hostnet2,id=net2,bus=root.3,mac=88:66:da:5f:dd:13,iommu_platform=on,ats=on \
-drive file=/home/images_nfv-virt-rt-kvm/rhel7.4_nonrt.qcow2,format=qcow2,if=none,id=drive-virtio-blk0,werror=stop,rerror=stop \
-device virtio-blk-pci,drive=drive-virtio-blk0,id=virtio-blk0,bus=root.4 \
-vnc :2 \
-monitor stdio

3. Bind these two vhost_net devices to vfio in the guest

# modprobe vfio
# modprobe vfio-pci

# dpdk-devbind --bind=vfio-pci 02:00.0
# dpdk-devbind --bind=vfio-pci 03:00.0
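
A minimal sketch for confirming the binding afterwards by reading the driver symlink in sysfs is given below. The helper name pci_driver and its optional sysfs-root argument are hypothetical; sysfs uses the full address form with the PCI domain (0000:02:00.0), which is assumed here to correspond to the short form used above.

```shell
# Print the driver currently bound to a PCI device, or "none".
# $1: full PCI address (e.g. 0000:02:00.0)
# $2: sysfs root, defaults to /sys (overridable for testing)
pci_driver() {
    link="${2:-/sys}/bus/pci/devices/$1/driver"
    if [ -L "$link" ]; then
        # The symlink points at .../bus/pci/drivers/<driver name>
        basename "$(readlink "$link")"
    else
        echo none
    fi
}

for dev in 0000:02:00.0 0000:03:00.0; do
    echo "$dev: $(pci_driver "$dev")"
done
```

After a successful bind, both devices would be expected to report vfio-pci.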

4. Reserve hugepages in the guest
# echo 3 >  /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
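
Writing to nr_hugepages is a request, and the kernel may reserve fewer pages than asked for, so reading the value back is a useful check. A short sketch, with the hypothetical helper name hugepages_ok:

```shell
# Succeed if at least the requested number of huge pages is reserved.
# $1: requested count; $2: path to an nr_hugepages file
hugepages_ok() {
    got=$(cat "$2" 2>/dev/null) || return 1
    [ "$got" -ge "$1" ]
}

f=/sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
if hugepages_ok 3 "$f"; then
    echo "reserved $(cat "$f") x 1G hugepages"
else
    echo "fewer than 3 x 1G hugepages reserved"
fi
```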

5. Start MoonGen on another host. I'll upload the lua file later.
Default parameters:
Packet Size: 64 Byte
Running time: 60s
Stream Rate: 0.35Mpps
(Note: I chose 0.35Mpps because, over several test runs, this seemed to be the maximum throughput in this case.)

# ./build/MoonGen rfc1242.lua

6. Start testpmd with '--forward-mode=macswap' in the guest. The results look good.
/usr/bin/testpmd \
-l 1,2,3 \
-n 4 \
-d /usr/lib64/librte_pmd_virtio.so.1 \
-w 0000:02:00.0 -w 0000:03:00.0 \
-- \
--nb-cores=2 \
--disable-hw-vlan \
-i \
--disable-rss \
--rxq=1 --txq=1 \
--forward-mode=macswap

testpmd> quit 
Telling cores to stop...
Waiting for lcores to finish...

  ---------------------- Forward statistics for port 0  ----------------------
  RX-packets: 10499485       RX-dropped: 0             RX-total: 10499485
  TX-packets: 10365296       TX-dropped: 0             TX-total: 10365296
  ----------------------------------------------------------------------------

  ---------------------- Forward statistics for port 1  ----------------------
  RX-packets: 10365296       RX-dropped: 0             RX-total: 10365296
  TX-packets: 10499485       TX-dropped: 0             TX-total: 10499485
  ----------------------------------------------------------------------------

  +++++++++++++++ Accumulated forward statistics for all ports+++++++++++++++
  RX-packets: 20864781       RX-dropped: 0             RX-total: 20864781
  TX-packets: 20864781       TX-dropped: 0             TX-total: 20864781
  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Done.
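
The "looks good" judgement above rests on the dropped counters all being zero. A minimal sketch that sums them from saved testpmd output is shown here; the helper name total_dropped is hypothetical, and it assumes the statistics layout printed above. It stops at the accumulated section so that per-port drops are not counted twice.

```shell
# Sum the per-port "RX-dropped:" and "TX-dropped:" counters from
# testpmd forward statistics read on standard input.
total_dropped() {
    awk '/Accumulated/ { exit }      # skip the accumulated totals
    {
        for (i = 1; i < NF; i++)
            if ($i == "RX-dropped:" || $i == "TX-dropped:")
                sum += $(i + 1)
    }
    END { print sum + 0 }'
}
```

For example, with the statistics saved to a file named stats.txt (a hypothetical name), `total_dropped < stats.txt` prints the sum.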

Comment 9 Pei Zhang 2017-05-17 07:54:49 UTC
Created attachment 1279561 [details]
lua file used in step 5 in Comment 8

Comment 10 Pei Zhang 2017-05-17 07:58:23 UTC
(In reply to Wei from comment #7)
> Hi Pei,
> Your cli looks good overall, my test was based on the upstream qemu synced
> to the latest code about 3 months ago, seems the only difference is the
> option for root device, here is my qemu cli. 

Hi Wei, I confirmed with Q35 QE that the PCIe root port usage has been updated to '-device pcie-root-port,id=root.1,slot=1' in rhel7.4.


> Other check points:
> 1. This feature needs host and guest kernel support, please update both of
> them to the latest 7.4 build.
> 2. Enable iommu in guest by indicate the grub parameter, and the host kernel
> doesn't need it.
> 3. Run dpdk/l2fwd/testpmd inside the guest with vfio passing through
> virtio-net devices.

Could you please check whether the steps in Comment 8 are sufficient to verify this bug? Thanks.

Comment 11 Amnon Ilan 2017-05-17 08:19:30 UTC
Pei, 
Can you please compare the throughput with and without IOTLB/IOMMU? 
similar to what you did here:
https://bugzilla.redhat.com/show_bug.cgi?id=1335808#c11

Comment 12 Wei 2017-05-17 08:27:23 UTC
(In reply to Pei Zhang from comment #10)
> (In reply to Wei from comment #7)
> > Hi Pei,
> > Your cli looks good overall, my test was based on the upstream qemu synced
> > to the latest code about 3 months ago, seems the only difference is the
> > option for root device, here is my qemu cli. 
> 
> Hi Wei, I confirmed with Q35 QE, the usage of pcie has been updated as
> '-device pcie-root-port,id=root.1,slot=1' in rhel7.4.
> 

It is good to use this option as well.

> 
> 
> > Other check points:
> > 1. This feature needs host and guest kernel support, please update both of
> > them to the latest 7.4 build.
> > 2. Enable iommu in guest by indicate the grub parameter, and the host kernel
> > doesn't need it.
> > 3. Run dpdk/l2fwd/testpmd inside the guest with vfio passing through
> > virtio-net devices.
> 
> Could you please check if Comment 8 can verify this bug? Thanks.

Yes, it works, and it is also good to do a benchmark comparison as Amnon suggested in comment 11.

Comment 13 Pei Zhang 2017-05-18 10:12:19 UTC
Created attachment 1279953 [details]
testing topology of testing throughput of vhost_net

(In reply to Amnon Ilan from comment #11)
> Pei, 
> Can you please compare the throughput with and without IOTLB/IOMMU? 
> similar to what you did here:
> https://bugzilla.redhat.com/show_bug.cgi?id=1335808#c11

Summary: The throughput values are the same with and without '-device intel-iommu,device-iotlb=on'.

- Default Parameters when testing throughput:
  Traffic Generator: MoonGen
  Acceptable Loss: 0.002%
  Frame Size: 64Byte
  Unidirectional: No 
  Search run time: 10s
  Validation run time: 30s
  Virtio features: default
  CPU: Intel(R) Xeon(R) CPU E5-2698 v3 @ 2.30GHz
  NIC: 10-Gigabit X540-AT2


==Results==
- With '-device intel-iommu,device-iotlb=on'
No   Throughput(Mpps)    packets_loss_rate
1    0.377479            0.000000%
2    0.377479            0.000009%

- Without '-device intel-iommu,device-iotlb=on'
No   Throughput(Mpps)    packets_loss_rate
1    0.377479            0.000000%
2    0.377479            0.000000%
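
To put these figures in context, the theoretical maximum frame rate of the link can be worked out. The sketch below assumes standard Ethernet framing, where each frame occupies its own size plus 20 bytes on the wire (7-byte preamble, 1-byte start delimiter, 12-byte inter-frame gap). The function names are hypothetical, and whether the reported throughput is per direction or combined is not stated here, so the percentage is only a rough guide.

```shell
# Theoretical maximum frame rate of an Ethernet link, in packets/s.
# $1: link speed in Gbit/s; $2: frame size in bytes
line_rate_pps() {
    awk -v gbps="$1" -v frame="$2" \
        'BEGIN { printf "%d\n", gbps * 1e9 / ((frame + 20) * 8) }'
}

# Measured throughput as a percentage of line rate.
# $1: measured Mpps; $2: link speed in Gbit/s; $3: frame size in bytes
pct_of_line_rate() {
    awk -v mpps="$1" -v gbps="$2" -v frame="$3" \
        'BEGIN { printf "%.2f\n", 100 * mpps * 1e6 * (frame + 20) * 8 / (gbps * 1e9) }'
}

line_rate_pps 10 64               # 10G link, 64-byte frames
pct_of_line_rate 0.377479 10 64   # the throughput measured above
```

For a 10G link and 64-byte frames the line rate is about 14.88 Mpps, so 0.377479 Mpps is roughly 2.5% of it.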


==Some Highlights==
(1) This tests bidirectional throughput; please refer to the attachment of this comment for the topology chart.

(2) Performance is better with hugepages, so the tests use 1G hugepages.
   - The throughput without hugepages is less than 0.12 Mpps (the last validation, at 0.12 Mpps, FAILED).

(3) Where the guest's memory and cores are located on the host does not seem to affect the test results, so I didn't set any CPU pinning.
   - The throughput with memory and cores in NUMA node 1 is even a bit lower, about 0.230184 Mpps.


Key steps:
==Qemu command line with '-device intel-iommu,device-iotlb=on':
# /usr/libexec/qemu-kvm -name rhel7.4 -M q35,kernel-irqchip=split \
-device intel-iommu,device-iotlb=on,intremap \
-cpu host -m 8G \
-object memory-backend-file,id=mem,size=8G,mem-path=/dev/hugepages,share=on \
-numa node,memdev=mem -mem-prealloc \
-smp 4,sockets=1,cores=4,threads=1 \
-device pcie-root-port,id=root.1,slot=1 \
-device pcie-root-port,id=root.2,slot=2 \
-device pcie-root-port,id=root.3,slot=3 \
-device pcie-root-port,id=root.4,slot=4 \
-netdev tap,id=hostnet0,vhost=on \
-netdev tap,id=hostnet1,vhost=on,script=/etc/qemu-ifup0,downscript=/etc/qemu-ifdown0 \
-netdev tap,id=hostnet2,vhost=on,script=/etc/qemu-ifup1,downscript=/etc/qemu-ifdown1 \
-device virtio-net-pci,netdev=hostnet0,id=net0,bus=root.1,mac=88:66:da:5f:dd:11,iommu_platform=on,ats=on \
-device virtio-net-pci,netdev=hostnet1,id=net1,bus=root.2,mac=88:66:da:5f:dd:12,iommu_platform=on,ats=on \
-device virtio-net-pci,netdev=hostnet2,id=net2,bus=root.3,mac=88:66:da:5f:dd:13,iommu_platform=on,ats=on \
-drive file=/home/images_nfv-virt-rt-kvm/rhel7.4_nonrt.qcow2,format=qcow2,if=none,id=drive-virtio-blk0,werror=stop,rerror=stop \
-device virtio-blk-pci,drive=drive-virtio-blk0,id=virtio-blk0,bus=root.4 \
-vnc :2 \
-monitor stdio

==Qemu command line without '-device intel-iommu,device-iotlb=on':
# /usr/libexec/qemu-kvm -name rhel7.4 -M q35 \
-cpu host -m 8G \
-object memory-backend-file,id=mem,size=8G,mem-path=/dev/hugepages,share=on \
-numa node,memdev=mem -mem-prealloc \
-smp 4,sockets=1,cores=4,threads=1 \
-device pcie-root-port,id=root.1,slot=1 \
-device pcie-root-port,id=root.2,slot=2 \
-device pcie-root-port,id=root.3,slot=3 \
-device pcie-root-port,id=root.4,slot=4 \
-netdev tap,id=hostnet0,vhost=on \
-netdev tap,id=hostnet1,vhost=on,script=/etc/qemu-ifup0,downscript=/etc/qemu-ifdown0 \
-netdev tap,id=hostnet2,vhost=on,script=/etc/qemu-ifup1,downscript=/etc/qemu-ifdown1 \
-device virtio-net-pci,netdev=hostnet0,id=net0,bus=root.1,mac=88:66:da:5f:dd:11 \
-device virtio-net-pci,netdev=hostnet1,id=net1,bus=root.2,mac=88:66:da:5f:dd:12 \
-device virtio-net-pci,netdev=hostnet2,id=net2,bus=root.3,mac=88:66:da:5f:dd:13 \
-drive file=/home/images_nfv-virt-rt-kvm/rhel7.4_nonrt.qcow2,format=qcow2,if=none,id=drive-virtio-blk0,werror=stop,rerror=stop \
-device virtio-blk-pci,drive=drive-virtio-blk0,id=virtio-blk0,bus=root.4 \
-vnc :2 \
-monitor stdio


Thanks,
Pei

Comment 14 Pei Zhang 2017-05-18 10:17:30 UTC
Some additional info:

In Comment 8, I was testing with 1 NIC. 

And in Comment 13, I was testing with 2 NICs, and testing the throughput value using this lua file. https://github.com/atheurer/MoonGen/blob/opnfv-dev/examples/opnfv-vsperf.lua

Comment 15 Amnon Ilan 2017-05-18 14:15:40 UTC
@Jason, @Wei, is the throughput low? and how come we do not see 
any difference with and without vIOMMU?

Comment 16 jason wang 2017-05-22 07:39:53 UTC
(In reply to Amnon Ilan from comment #15)
> @Jason, @Wei, is the throughput low? and how come we do not see 
> any difference with and without vIOMMU?

I suspect there's some misconfiguration in the setup.

Pei Zhang, I have several questions. Let's try without the IOMMU first:

- Can you measure and report the pps on the host interface?
- Are you using a 40G or a 10G NIC for the testing?
- Which NIC are you using on the host?
- Is the number better if you use macvtap instead of tap?
- What is the number if you, e.g., just use pktgen to inject traffic into tap0?

Thanks

Comment 17 Pei Zhang 2017-05-22 08:43:19 UTC
(In reply to jason wang from comment #16)
> (In reply to Amnon Ilan from comment #15)
> > @Jason, @Wei, is the throughput low? and how come we do not see 
> > any difference with and without vIOMMU?
> 
> I suspect there's some misconfiguration in the setup.
> 
> Pei Zhang, several questions, let's try not using IOMMU first:

Hi Jason, thanks for your questions.

> - Can you measure and report pps in host interface?

What do you mean by "measure and report pps in host interface"? Could you please explain a little more?

> - Are you using a 40G or 10G nic to do the testing?

10G nic.

> - Which NIC are you used in host?

Two 10-Gigabit X540-AT2 cards.

> - Is the number be better if you use macvtap instead of tap?

No, I didn't test macvtap. I can test it if needed.

> - What's the number e.g just using pktgen to inject traffic in tap0?

This number is the throughput value measured by MoonGen. MoonGen generates packets from one port; the packets are forwarded by DPDK's testpmd in the guest (nic0 -> switch -> tap0 -> virtio_net -> testpmd) and are finally received on another port. Please see the attachment of Comment 13.

So I didn't use pktgen. 



Thanks,
Pei

> Thanks

Comment 18 jason wang 2017-05-22 08:48:19 UTC
(In reply to Pei Zhang from comment #17)
> (In reply to jason wang from comment #16)
> > (In reply to Amnon Ilan from comment #15)
> > > @Jason, @Wei, is the throughput low? and how come we do not see 
> > > any difference with and without vIOMMU?
> > 
> > I suspect there's some misconfiguration in the setup.
> > 
> > Pei Zhang, several questions, let's try not using IOMMU first:
> 
> Hi Jason, thanks for your questions.
> 
> > - Can you measure and report pps in host interface?
> 
> What do you mean by "measure and report pps in host interface", could you
> please explain a little more? 

I mean, e.g., if your host interface is enp0s3/enp0s4, please measure its pps while you are running the test.

> 
> > - Are you using a 40G or 10G nic to do the testing?
> 
> 10G nic.
> 
> > - Which NIC are you used in host?
> 
> Two 10-Gigabit X540-AT2 cards.

Can you use ethtool -i $interface to see its driver?

> 
> > - Is the number be better if you use macvtap instead of tap?
> 
> No, I didn't test macvtap. I can test it if needed.

Yes please.

> 
> > - What's the number e.g just using pktgen to inject traffic in tap0?
> 
> This number is throughput value tested by MoonGen. MoonGen generate packets
> from one port, after packets forwarding by dpdk'testpmd in guest(nic0 ->
> switch -> tap0 -> virtio_net -> testpmd), finally receive the packets from
> another port. Please see attachment of Comment 13. 

I see. What I want is, e.g., to run pktgen on tap0 directly:

pktgen -> tap0 -> virtio_net ->testpmd -> tap1

Thanks

> 
> So I didn't use pktgen. 
> 
> 
> 
> Thanks,
> Pei
> 
> > Thanks

Comment 19 Pei Zhang 2017-05-22 09:08:14 UTC
(In reply to jason wang from comment #18)
> (In reply to Pei Zhang from comment #17)
> > (In reply to jason wang from comment #16)
> > > (In reply to Amnon Ilan from comment #15)
> > > > @Jason, @Wei, is the throughput low? and how come we do not see 
> > > > any difference with and without vIOMMU?
> > > 
> > > I suspect there's some misconfiguration in the setup.
> > > 
> > > Pei Zhang, several questions, let's try not using IOMMU first:
> > 
> > Hi Jason, thanks for your questions.
> > 
> > > - Can you measure and report pps in host interface?
> > 
> > What do you mean by "measure and report pps in host interface", could you
> > please explain a little more? 
> 
> I mean e.g if your host interface is enp0s3/enp0s4, please measure its pps
> when you are doing the test.

Ok. I'll test this with pktgen. 

> > 
> > > - Are you using a 40G or 10G nic to do the testing?
> > 
> > 10G nic.
> > 
> > > - Which NIC are you used in host?
> > 
> > Two 10-Gigabit X540-AT2 cards.
> 
> Can you use ethtool -i $interface to see its driver?

# ethtool -i p1p1
driver: ixgbe
version: 4.4.0-k-rh7.4
firmware-version: 0x8000059e
expansion-rom-version: 
bus-info: 0000:81:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

# ethtool -i p1p2
driver: ixgbe
version: 4.4.0-k-rh7.4
firmware-version: 0x8000059e
expansion-rom-version: 
bus-info: 0000:81:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
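
When only the driver name is needed, the field can be picked out of this output. A minimal sketch with a hypothetical helper name, ethtool_driver, which reads `ethtool -i` output on standard input:

```shell
# Print the value of the "driver:" field from `ethtool -i` output.
ethtool_driver() {
    awk -F': ' '$1 == "driver" { print $2; exit }'
}

# Example use on the host, for the interfaces shown above:
#   ethtool -i p1p1 | ethtool_driver
```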



> > 
> > > - Is the number be better if you use macvtap instead of tap?
> > 
> > No, I didn't test macvtap. I can test it if needed.
> 
> Yes please.

OK. 

> > 
> > > - What's the number e.g just using pktgen to inject traffic in tap0?
> > 
> > This number is throughput value tested by MoonGen. MoonGen generate packets
> > from one port, after packets forwarding by dpdk'testpmd in guest(nic0 ->
> > switch -> tap0 -> virtio_net -> testpmd), finally receive the packets from
> > another port. Please see attachment of Comment 13. 
> 
> I see, what I want is e.g run pktgen on tap0 directly:
> 
> pktgen -> tap0 -> virtio_net ->testpmd -> tap1

OK.

Thanks,
Pei


> Thanks
> 
> > 
> > So I didn't use pktgen. 
> > 
> > 
> > 
> > Thanks,
> > Pei
> > 
> > > Thanks

Comment 20 Pei Zhang 2017-06-05 08:59:18 UTC
Summary: The performance is very close with and without vIOMMU.

Flow chart:  pktgen -> tap0 -> virtio_net ->testpmd -> tap1

==Results with vIOMMU==
[Run 1]
TX tap0: 1702552 pkts/s RX tap0: 0 pkts/s
TX tap0: 1667841 pkts/s RX tap0: 0 pkts/s
TX tap0: 1612958 pkts/s RX tap0: 0 pkts/s
TX tap0: 1620435 pkts/s RX tap0: 0 pkts/s
TX tap0: 1663647 pkts/s RX tap0: 0 pkts/s

TX tap1: 0 pkts/s RX tap1: 678904 pkts/s
TX tap1: 0 pkts/s RX tap1: 678458 pkts/s
TX tap1: 0 pkts/s RX tap1: 678194 pkts/s
TX tap1: 0 pkts/s RX tap1: 677958 pkts/s
TX tap1: 0 pkts/s RX tap1: 677852 pkts/s

[Run 2]
TX tap0: 1531125 pkts/s RX tap0: 0 pkts/s
TX tap0: 1411951 pkts/s RX tap0: 0 pkts/s
TX tap0: 1511287 pkts/s RX tap0: 0 pkts/s
TX tap0: 1609893 pkts/s RX tap0: 0 pkts/s
TX tap0: 1467514 pkts/s RX tap0: 0 pkts/s

TX tap1: 0 pkts/s RX tap1: 685235 pkts/s
TX tap1: 0 pkts/s RX tap1: 685570 pkts/s
TX tap1: 0 pkts/s RX tap1: 685947 pkts/s
TX tap1: 0 pkts/s RX tap1: 685732 pkts/s
TX tap1: 0 pkts/s RX tap1: 685521 pkts/s


==Results without vIOMMU==
[Run 1] 
TX tap0: 1318424 pkts/s RX tap0: 0 pkts/s
TX tap0: 1317694 pkts/s RX tap0: 0 pkts/s
TX tap0: 1317627 pkts/s RX tap0: 0 pkts/s
TX tap0: 1316501 pkts/s RX tap0: 0 pkts/s
TX tap0: 1316072 pkts/s RX tap0: 0 pkts/s
TX tap0: 1316133 pkts/s RX tap0: 0 pkts/s

TX tap1: 0 pkts/s RX tap1: 684954 pkts/s
TX tap1: 0 pkts/s RX tap1: 684990 pkts/s
TX tap1: 0 pkts/s RX tap1: 685265 pkts/s
TX tap1: 0 pkts/s RX tap1: 662117 pkts/s
TX tap1: 0 pkts/s RX tap1: 648610 pkts/s
TX tap1: 0 pkts/s RX tap1: 648114 pkts/s

[Run 2]
TX tap0: 1433924 pkts/s RX tap0: 0 pkts/s
TX tap0: 1434691 pkts/s RX tap0: 0 pkts/s
TX tap0: 1433702 pkts/s RX tap0: 0 pkts/s
TX tap0: 1435838 pkts/s RX tap0: 0 pkts/s
TX tap0: 1431688 pkts/s RX tap0: 0 pkts/s

TX tap1: 0 pkts/s RX tap1: 676788 pkts/s
TX tap1: 0 pkts/s RX tap1: 676351 pkts/s
TX tap1: 0 pkts/s RX tap1: 676295 pkts/s
TX tap1: 0 pkts/s RX tap1: 676208 pkts/s
TX tap1: 0 pkts/s RX tap1: 676281 pkts/s
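
Since the per-second samples fluctuate, comparing averages per run is easier than comparing individual lines. A minimal sketch that averages the RX rate from saved pps.sh output, assuming the line format shown above; the helper name avg_rx_pps is hypothetical.

```shell
# Average the RX rate over pps.sh output lines read on standard input.
# Expected line format: "TX tap1: 0 pkts/s RX tap1: 678904 pkts/s"
avg_rx_pps() {
    awk '$5 == "RX" { sum += $7; n++ }
         END {
             if (n) printf "%.0f\n", sum / n
             else   print "no samples"
         }'
}
```

Feeding it the tap1 lines of each run gives one number per run for the with and without vIOMMU cases.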


Steps:
1. Boot the VM. With vIOMMU, refer to [1]. Without vIOMMU, refer to [2].

2. Pin the vhost threads to cores (9, 11) that are in the same NUMA node as the network device.
# ps -ef | grep vhost
...
root      50322      2  0 02:45 ?        00:00:00 [vhost-50310]
root      50330      2  0 02:45 ?        00:00:00 [vhost-50310]

# taskset -cp 9 50322
# taskset -cp 11 50330
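
The manual lookup and pinning above can be sketched as two small helpers. Both names (vhost_pids, pin_plan) are hypothetical; pin_plan only prints the taskset commands so that they can be reviewed before being run.

```shell
# Print the PID of every vhost kernel thread found in `ps -ef` output
# read on standard input (command column looks like "[vhost-50310]").
vhost_pids() {
    awk '$NF ~ /^\[vhost-[0-9]+\]$/ { print $2 }'
}

# Print one taskset command per PID, pairing PIDs with cores in order.
# $1: space-separated PIDs; $2: space-separated host cores
pin_plan() {
    cores=$2
    for pid in $1; do
        core=${cores%% *}                 # first remaining core
        [ -n "$core" ] || break           # ran out of cores
        echo "taskset -cp $core $pid"
        case "$cores" in
            *" "*) cores=${cores#* } ;;   # drop the core just used
            *)     cores="" ;;
        esac
    done
}
```

For example, `pin_plan "$(ps -ef | vhost_pids | tr '\n' ' ')" "9 11"` prints the commands for the cores used in this step.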

3. Pin the vCPUs to cores (1, 3, 5, 7) that are in the same NUMA node as the network device.
(qemu) info cpus
* CPU #0: pc=0xffffffff816a7596 (halted) thread_id=50429
  CPU #1: pc=0xffffffff816a7596 (halted) thread_id=50430
  CPU #2: pc=0xffffffff816a7596 (halted) thread_id=50431
  CPU #3: pc=0xffffffff816a7596 (halted) thread_id=50432

# taskset -cp 1 50429
# taskset -cp 3 50430
# taskset -cp 5 50431
# taskset -cp 7 50432
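
The thread IDs can be pulled out of saved `info cpus` output instead of being copied by hand. A minimal sketch, assuming the output format shown above; the helper name vcpu_tids is hypothetical.

```shell
# Print the thread_id of every vCPU listed by the QEMU monitor command
# `info cpus` (output read on standard input), one per line.
vcpu_tids() {
    sed -n 's/.*thread_id=\([0-9][0-9]*\).*/\1/p'
}
```

Each printed ID can then be passed to `taskset -cp <core> <thread_id>` as in the commands above.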

4. In the VM, load vfio
With vIOMMU:
# modprobe vfio
# modprobe vfio-pci

Without vIOMMU:
# modprobe vfio enable_unsafe_noiommu_mode=Y
# modprobe vfio-pci
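
The choice between the two variants can be sketched as a check for IOMMU groups in sysfs, on the assumption that a guest booted with a working vIOMMU has entries under /sys/kernel/iommu_groups and one without it has none. The helper name vfio_modprobe_args is hypothetical.

```shell
# Print the modprobe arguments for vfio, depending on whether any
# IOMMU groups exist. $1: groups directory (defaults to the sysfs path)
vfio_modprobe_args() {
    dir=${1:-/sys/kernel/iommu_groups}
    if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
        echo "vfio"                                  # vIOMMU available
    else
        echo "vfio enable_unsafe_noiommu_mode=Y"     # no-IOMMU mode
    fi
}

echo "modprobe $(vfio_modprobe_args)"
```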

5. In the VM, bind the NICs to vfio and reserve hugepages
# dpdk-devbind --bind=vfio-pci 0000:02:00.0
# dpdk-devbind --bind=vfio-pci 0000:03:00.0

# echo 4 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages

6. Start testpmd, refer to [3]

7. Start pktgen on tap0, refer to [4]
# sh pktgen.sh tap0

8. Monitor pps on tap0 and tap1, refer to [5]
# sh pps.sh tap0
# sh pps.sh tap1


[1] Boot VM with vIOMMU
# /usr/libexec/qemu-kvm -name rhel7.4 -M q35,kernel-irqchip=split \
-device intel-iommu,device-iotlb=on,intremap \
-cpu host -m 8G \
-object memory-backend-file,id=mem,size=8G,mem-path=/dev/hugepages,share=on \
-numa node,memdev=mem -mem-prealloc \
-smp 4,sockets=1,cores=4,threads=1 \
-device pcie-root-port,id=root.1,slot=1 \
-device pcie-root-port,id=root.2,slot=2 \
-device pcie-root-port,id=root.3,slot=3 \
-netdev tap,id=hostnet1,vhost=on,script=/etc/qemu-ifup2,downscript=/etc/qemu-ifdown2 \
-netdev tap,id=hostnet2,vhost=on,script=/etc/qemu-ifup3,downscript=/etc/qemu-ifdown3 \
-device virtio-net-pci,netdev=hostnet1,id=net1,bus=root.2,mac=88:66:da:5f:dd:12,iommu_platform=on,ats=on \
-device virtio-net-pci,netdev=hostnet2,id=net2,bus=root.3,mac=88:66:da:5f:dd:13,iommu_platform=on,ats=on \
-drive file=/home/images_nfv-virt-rt-kvm/rhel7.4_nonrt.qcow2,format=qcow2,if=none,id=drive-virtio-blk0,werror=stop,rerror=stop \
-device virtio-blk-pci,drive=drive-virtio-blk0,id=virtio-blk0,bus=root.1 \
-vnc :2 \
-monitor stdio

[2] Boot VM without vIOMMU
# /usr/libexec/qemu-kvm -name rhel7.4 -M q35 \
-cpu host -m 8G \
-object memory-backend-file,id=mem,size=8G,mem-path=/dev/hugepages,share=on \
-numa node,memdev=mem -mem-prealloc \
-smp 4,sockets=1,cores=4,threads=1 \
-device pcie-root-port,id=root.1,slot=1 \
-device pcie-root-port,id=root.2,slot=2 \
-device pcie-root-port,id=root.3,slot=3 \
-netdev tap,id=hostnet1,vhost=on,script=/etc/qemu-ifup2,downscript=/etc/qemu-ifdown2 \
-netdev tap,id=hostnet2,vhost=on,script=/etc/qemu-ifup3,downscript=/etc/qemu-ifdown3 \
-device virtio-net-pci,netdev=hostnet1,id=net1,bus=root.2,mac=88:66:da:5f:dd:12 \
-device virtio-net-pci,netdev=hostnet2,id=net2,bus=root.3,mac=88:66:da:5f:dd:13 \
-drive file=/home/images_nfv-virt-rt-kvm/rhel7.4_nonrt.qcow2,format=qcow2,if=none,id=drive-virtio-blk0,werror=stop,rerror=stop \
-device virtio-blk-pci,drive=drive-virtio-blk0,id=virtio-blk0,bus=root.1 \
-vnc :2 \
-monitor stdio


[3] Boot testpmd with macswap
# /usr/bin/testpmd \
-l 1,2,3 \
-n 4 \
-d /usr/lib64/librte_pmd_virtio.so.1 \
-w 0000:02:00.0 -w 0000:03:00.0 \
-- \
--nb-cores=2 \
--disable-hw-vlan \
-i \
--disable-rss \
--rxq=1 --txq=1 \
--forward-mode=macswap


[4] Script pktgen.sh
# cat pktgen.sh

#!/bin/bash
# usage: sh pktgen.sh $device [$queues]   (queues defaults to 1)

DEV=$1
QUEUES=${2:-1}

modprobe -r pktgen
modprobe pktgen
echo reset > /proc/net/pktgen/pgctrl

ifconfig "$DEV" up

# Write one pktgen command to $PGDEV; print the result line if it failed.
function pgset() {
    local result

    echo "$1" > "$PGDEV"

    result=$(grep -F "Result: OK:" "$PGDEV")
    if [ "$result" = "" ]; then
        grep -F Result: "$PGDEV"
    fi
}

# Inject packets on $PGDEV and show its status (not used below).
function pg() {
    echo inject > "$PGDEV"
    cat "$PGDEV"
}

for i in $(seq 0 $((QUEUES - 1)))
do
    echo "Adding queue $i of $DEV"
    dev=$DEV@$i

    # Attach the device to kernel pktgen thread $i
    PGDEV=/proc/net/pktgen/kpktgend_$i
    pgset "rem_device_all"
    pgset "add_device $dev"
    pgset "max_before_softirq 100000"

    # Configure the individual devices
    echo "Configuring devices $dev"

    PGDEV=/proc/net/pktgen/$dev

    pgset "queue_map_min $i"
    pgset "queue_map_max $i"
    pgset "count 10000000"
    pgset "min_pkt_size 60"
    pgset "max_pkt_size 60"
    # DST_system_ip is not set in this script; it must come from the environment
    pgset "dst $DST_system_ip"
    pgset "dst_mac 88:66:da:5f:dd:12"
    pgset "udp_src_min 0"
    pgset "udp_src_max 65535"
    pgset "udp_dst_min 0"
    pgset "udp_dst_max 65535"
done

# Time to run

PGDEV=/proc/net/pktgen/pgctrl

echo "Running... ctrl^C to stop"

pgset "start"

echo "Done"

[5] Script pps.sh
# cat pps.sh
#!/bin/bash

INTERVAL="1"  # update interval in seconds

if [ -z "$1" ]; then
        echo
        echo "usage: $0 [network-interface]"
        echo
        echo "e.g. $0 eth0"
        echo
        echo "shows packets-per-second"
        exit 1
fi

IF=$1

while true
do
        # Sample the interface counters at the start and end of the interval
        R1=$(cat /sys/class/net/$IF/statistics/rx_packets)
        T1=$(cat /sys/class/net/$IF/statistics/tx_packets)
        sleep $INTERVAL
        R2=$(cat /sys/class/net/$IF/statistics/rx_packets)
        T2=$(cat /sys/class/net/$IF/statistics/tx_packets)
        # Scale the difference to packets per second
        TXPPS=$(( (T2 - T1) / INTERVAL ))
        RXPPS=$(( (R2 - R1) / INTERVAL ))
        echo "TX $IF: $TXPPS pkts/s RX $IF: $RXPPS pkts/s"
done

Comment 21 Pei Zhang 2017-06-05 09:04:58 UTC
Versions used in Comment 20 above:
3.10.0-675.el7.x86_64
qemu-kvm-rhev-2.9.0-7.el7.x86_64
dpdk-16.11-4.el7fdp.x86_64(in guest)

Comment 22 jason wang 2017-06-05 09:17:14 UTC
(In reply to Pei Zhang from comment #20)
> Summary: The performance seems very close between with and without vIOMMU.
> 
> Flow chart:  pktgen -> tap0 -> virtio_net ->testpmd -> tap1
> 
> ==Results with vIOMMU==
> [Run 1]
> TX tap0: 1702552 pkts/s RX tap0: 0 pkts/s
> TX tap0: 1667841 pkts/s RX tap0: 0 pkts/s
> TX tap0: 1612958 pkts/s RX tap0: 0 pkts/s
> TX tap0: 1620435 pkts/s RX tap0: 0 pkts/s
> TX tap0: 1663647 pkts/s RX tap0: 0 pkts/s
> 
> TX tap1: 0 pkts/s RX tap1: 678904 pkts/s
> TX tap1: 0 pkts/s RX tap1: 678458 pkts/s
> TX tap1: 0 pkts/s RX tap1: 678194 pkts/s
> TX tap1: 0 pkts/s RX tap1: 677958 pkts/s
> TX tap1: 0 pkts/s RX tap1: 677852 pkts/s

Thanks for the testing.

Just to confirm, are you saying vIOMMU is faster?

Thanks

Comment 23 Pei Zhang 2017-06-05 10:11:06 UTC
(In reply to jason wang from comment #22)
> (In reply to Pei Zhang from comment #20)
> > Summary: The performance seems very close between with and without vIOMMU.
> > 
> > Flow chart:  pktgen -> tap0 -> virtio_net ->testpmd -> tap1
> > 
> > ==Results with vIOMMU==
> > [Run 1]
> > TX tap0: 1702552 pkts/s RX tap0: 0 pkts/s
> > TX tap0: 1667841 pkts/s RX tap0: 0 pkts/s
> > TX tap0: 1612958 pkts/s RX tap0: 0 pkts/s
> > TX tap0: 1620435 pkts/s RX tap0: 0 pkts/s
> > TX tap0: 1663647 pkts/s RX tap0: 0 pkts/s
> > 
> > TX tap1: 0 pkts/s RX tap1: 678904 pkts/s
> > TX tap1: 0 pkts/s RX tap1: 678458 pkts/s
> > TX tap1: 0 pkts/s RX tap1: 678194 pkts/s
> > TX tap1: 0 pkts/s RX tap1: 677958 pkts/s
> > TX tap1: 0 pkts/s RX tap1: 677852 pkts/s
> 
> Thanks for the testing.
> 
> Just to confirm, are you saying vIOMMU is faster? 

No, I cannot say vIOMMU is faster. The results are just very close.

The results are not that stable: sometimes vIOMMU is faster, but not always. I did another 5 runs to check this. (For each run, I rebooted the VM and then ran the test.)

http://pastebin.test.redhat.com/490772


Thanks,
Pei

> Thanks

Comment 24 Pei Zhang 2017-06-16 06:11:04 UTC
Hi Amnon, Jason,

Based on Comment 13, Comment 20 and Comment 23, the throughput is very close with and without vIOMMU.

So can QE verify this bug? 


Thanks,
Pei

Comment 25 Amnon Ilan 2017-06-20 18:56:24 UTC
(In reply to Pei Zhang from comment #24)
> 
> Based on Comment 13, Comment 20, Comment 23, the throughput performance is
> very close between with and without vIOMMU. 
> 
> So can QE verify this bug? 

I think it can be verified now (keeping the needinfo for Jason to comment 
on that)

Comment 27 jason wang 2017-06-21 01:55:35 UTC
(In reply to Amnon Ilan from comment #25)
> (In reply to Pei Zhang from comment #24)
> > 
> > Based on Comment 13, Comment 20, Comment 23, the throughput performance is
> > very close between with and without vIOMMU. 
> > 
> > So can QE verify this bug? 
> 
> I think it can be verified now (keeping the needinfo for Jason to comment 
> on that)

Yes I think so.

Thanks

Comment 29 errata-xmlrpc 2017-08-01 23:39:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392
