Bug 1283257

Summary: [RFE] IOMMU support in Vhost-net
Product: Red Hat Enterprise Linux 7
Reporter: Amnon Ilan <ailan>
Component: kernel
Assignee: Wei <wexu>
Kernel sub component: KVM
QA Contact: Quan Wenli <wquan>
Status: CLOSED ERRATA
Docs Contact:
Severity: unspecified
Priority: high
CC: ailan, chayang, hannsj_uhl, huding, jasowang, juzhang, mtessun, peterx, pezhang, virt-maint, weliao, wquan, xfu, yfu
Version: 7.3
Keywords: FutureFeature
Target Milestone: rc
Target Release: 7.4
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: kernel-3.10.0-658.el7
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-01 20:02:32 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1283104, 1288337, 1395265, 1401433

Description Amnon Ilan 2015-11-18 14:26:30 UTC
Description of problem:

Vhost-net should properly support the IOMMU in order to allow guests to securely access devices from user space (e.g. in the DPDK-in-guest case).

Comment 1 jason wang 2016-08-23 07:18:48 UTC
Not 7.3 material. Deferring to 7.4.

Comment 2 Wei 2017-02-20 16:03:03 UTC
See also:
https://bugzilla.redhat.com/show_bug.cgi?id=1425127

Comment 3 jason wang 2017-03-23 06:10:12 UTC
Note for QE:

To test this, you need a command line like:

           -M q35 \
           -device intel-iommu,device-iotlb=on,intremap \
           -device ioh3420,id=root.1,chassis=1 \
           -device virtio-net-pci,netdev=hn0,id=v0,bus=root.1,disable-modern=off,disable-legacy=on,iommu_platform=on,ats=on \

This means you need:
[1] the q35 chipset
[2] an Intel IOMMU with device-iotlb and interrupt remapping enabled
[3] a PCIe root port (ioh3420)
[4] a modern virtio-net-pci device with both iommu_platform and ats enabled

In the guest:
[1] add intel_iommu=on to the kernel command line (a sketch follows below)
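
A minimal sketch of one way to do this on a RHEL 7 guest, assuming grubby is available; the exact boot-loader workflow is up to the tester:

# Append intel_iommu=on to all installed kernel entries, then reboot
grubby --update-kernel=ALL --args="intel_iommu=on"
reboot
# After reboot, confirm the option took effect
grep intel_iommu /proc/cmdline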

For stress testing:
[1] netperf UDP with intel_iommu=on and with intel_iommu=strict (see the sketch after this list)
[2] pktgen to exercise both rx and tx
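
As an illustration of [1], a hedged sketch of a netperf UDP stress run; the peer address 192.168.1.1 and the duration are placeholders, not values from this bug:

# On the traffic peer:
netserver
# In the guest (repeat with intel_iommu=on and with intel_iommu=strict), tx stress:
netperf -H 192.168.1.1 -t UDP_STREAM -l 300
# For rx stress, run netserver in the guest and netperf from the peer instead.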

For performance testing:
Checking dpdk l2fwd performance should be sufficient (a sketch follows below).
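
A rough sketch of such an l2fwd run in the guest, assuming the DPDK l2fwd example binary is available and both virtio ports are already bound to vfio-pci; the core list and port mask are placeholders:

# Forward between the two virtio ports; -p 0x3 selects ports 0 and 1
l2fwd -l 1-2 -n 4 -- -p 0x3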

Comment 4 Quan Wenli 2017-03-23 06:27:25 UTC
(In reply to jason wang from comment #3)
> Note for QE:
> 
> To test this, need cli like:
> 
>            -M q35 \
>            -device intel-iommu,device-iotlb=on,intremap \
>            -device ioh3420,id=root.1,chassis=1 \
>            -device
> virtio-net-pci,netdev=hn0,id=v0,bus=root.1,disable-modern=off,disable-
> legacy=on,iommu_platform=on,ats=on \
> 
> This means you need:
> [1] q35 chipset
> [2] intel IOMMU with device-iotlb and interrupt remapping enabled
> [3] pcie switch (ioh3420)
> [4] modern virtio-net-pci device with both iommu_platform and ats enabled
> 

So we can simply set "iommu_platform=off,ats=off" to disable the vIOMMU?
 
> In guest:
> [1] add intel_iommu=on to kernel command line
> 
> For stress testing:
> [1] netperf UDP with intel_iommu=on|strict
> [2] pktgen to test both rx and tx
> 
> For performance testing:
> Checking dpdk l2fwd performance should be sufficient

For this part, @pezhang, will your NFV team test it?

Comment 5 jason wang 2017-03-23 06:31:38 UTC
(In reply to Quan Wenli from comment #4)
> (In reply to jason wang from comment #3)
> > Note for QE:
> > 
> > To test this, need cli like:
> > 
> >            -M q35 \
> >            -device intel-iommu,device-iotlb=on,intremap \
> >            -device ioh3420,id=root.1,chassis=1 \
> >            -device
> > virtio-net-pci,netdev=hn0,id=v0,bus=root.1,disable-modern=off,disable-
> > legacy=on,iommu_platform=on,ats=on \
> > 
> > This means you need:
> > [1] q35 chipset
> > [2] intel IOMMU with device-iotlb and interrupt remapping enabled
> > [3] pcie switch (ioh3420)
> > [4] modern virtio-net-pci device with both iommu_platform and ats enabled
> > 
> 
> So we can just simply set "iommu_platform=off,ats=off" to disable vIOMMU?
>  

You also need to remove "-device intel-iommu".
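
Putting this together with comment 3, the non-vIOMMU variant of the device section would look roughly like:

           -M q35 \
           -device ioh3420,id=root.1,chassis=1 \
           -device virtio-net-pci,netdev=hn0,id=v0,bus=root.1,disable-modern=off,disable-legacy=on,iommu_platform=off,ats=off \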

Thanks

Comment 6 Pei Zhang 2017-03-23 06:41:22 UTC
(In reply to Quan Wenli from comment #4)
> (In reply to jason wang from comment #3)
> > Note for QE:
> > 
> > To test this, need cli like:
> > 
> >            -M q35 \
> >            -device intel-iommu,device-iotlb=on,intremap \
> >            -device ioh3420,id=root.1,chassis=1 \
> >            -device
> > virtio-net-pci,netdev=hn0,id=v0,bus=root.1,disable-modern=off,disable-
> > legacy=on,iommu_platform=on,ats=on \
> > 
> > This means you need:
> > [1] q35 chipset
> > [2] intel IOMMU with device-iotlb and interrupt remapping enabled
> > [3] pcie switch (ioh3420)
> > [4] modern virtio-net-pci device with both iommu_platform and ats enabled
> > 
> 
> So we can just simply set "iommu_platform=off,ats=off" to disable vIOMMU?
>  
> > In guest:
> > [1] add intel_iommu=on to kernel command line
> > 
> > For stress testing:
> > [1] netperf UDP with intel_iommu=on|strict
> > [2] pktgen to test both rx and tx
> > 
> > For performance testing:
> > Checking dpdk l2fwd performance should be sufficient
> 
> for it, @pezhang, will your NFV team test it ?

Wenli, NFV testing can cover this.

Best Regards,
Pei

Comment 7 Rafael Aquini 2017-04-27 03:57:32 UTC
Patch(es) committed to the kernel repository and an interim kernel build is undergoing testing

Comment 9 Rafael Aquini 2017-05-01 13:18:02 UTC
Patch(es) available on kernel-3.10.0-658.el7

Comment 11 Quan Wenli 2017-06-05 09:59:57 UTC
Hi Jason, Wei,

Could you help check the following results comparing no-iommu mode and vfio mode? We see:
1. An 8% improvement in vfio mode for guest rx.
2. No difference for guest tx, and the tx pps is poor; it looks similar to https://bugzilla.redhat.com/show_bug.cgi?id=1401433#c18

Packages:

host: 3.10.0-677.el7.x86_64
guest: 3.10.0-677.el7.x86_64
qemu-kvm-rhev-2.9.0-7.el7.x86_64


Steps:
1. Boot up the guest with the device-iotlb/vIOMMU setup:

numactl -c 1 -m 1 /usr/libexec/qemu-kvm /home/kvm_autotest_root/images/RHEL-Server-7.2-64.qcow2 \
    -netdev tap,id=hn0,queues=1,vhost=on,script=/etc/qemu-ifup-atbr0 \
    -device ioh3420,id=root.1,chassis=1 \
    -device virtio-net-pci,netdev=hn0,id=v0,mq=off,mac=00:00:05:00:00:07,bus=root.1 \
    -netdev tap,id=hn1,queues=1,vhost=on,script=/etc/qemu-ifup-atbr0 \
    -device ioh3420,id=root.2,chassis=2 \
    -device virtio-net-pci,netdev=hn1,id=v1,mq=off,mac=00:00:05:00:00:08,bus=root.2 \
    -m 6G -enable-kvm -cpu host -vnc :11 -smp 4 \
    -monitor tcp:0:4444,server,nowait -M q35,kernel-irqchip=split -monitor stdio

2. Pin the 4 vCPUs and the 2 vhost threads to individual host cores within one NUMA node (sketched just below).
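
One way to do the pinning, as a sketch; the thread IDs come from the QEMU monitor and from ps, and the core numbers here are placeholders:

# vCPU thread IDs: run "info cpus" in the QEMU monitor (tcp:0:4444 above)
# vhost thread IDs: kernel threads named vhost-<qemu pid>
ps -eo pid,comm | grep vhost
# Pin each thread to its own core in the same NUMA node, e.g.:
taskset -pc 8  <vcpu0-tid>
taskset -pc 9  <vcpu1-tid>
taskset -pc 10 <vcpu2-tid>
taskset -pc 11 <vcpu3-tid>
taskset -pc 12 <vhost0-tid>
taskset -pc 13 <vhost1-tid>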

3. In the guest:

3.1 Install dpdk-17.05-2.el7fdb.x86_64.rpm, dpdk-devel-17.05-2.el7fdb.x86_64.rpm and dpdk-tools-17.05-2.el7fdb.x86_64.rpm
3.2 Add "intel_iommu=on" to the kernel command line, then reboot the guest
3.3 echo 2048 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
3.4 # ifconfig eth0 down && ifconfig eth1 down
3.5 # modprobe vfio                                  ====> test with vfio mode
 or # modprobe vfio enable_unsafe_noiommu_mode=Y     ====> test with no-iommu mode
    # modprobe vfio-pci
3.6 # lspci | grep Eth
    # dpdk-devbind --bind=vfio-pci 0000:01:00.0
    # dpdk-devbind --bind=vfio-pci 0000:02:00.0
3.7 Run testpmd, then start forwarding:
# /usr/bin/testpmd \
-l 1,2,3 \
-n 4 \
-d /usr/lib64/librte_pmd_virtio.so.1 \
-w 0000:01:00.0 -w 0000:02:00.0 \
-- \
--nb-cores=2 \
--disable-hw-vlan \
-i \
--disable-rss \
--rxq=1 --txq=1

testpmd> start

4.Run "pktgen.sh tap0 "on host
  meanwhile run pps.sh tap0 on host to gather guest rx pps performance.
  and run pps.sh tap1 on host to gather guest tx pps performance.
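
Purely as an illustration (not the actual QE script), a minimal pps.sh-style measurement could sample the interface counters like this:

#!/bin/bash
# Illustrative only: report rx/tx pps for interface $1 by sampling the
# kernel packet counters one second apart.
dev=$1
rx1=$(cat /sys/class/net/$dev/statistics/rx_packets)
tx1=$(cat /sys/class/net/$dev/statistics/tx_packets)
sleep 1
rx2=$(cat /sys/class/net/$dev/statistics/rx_packets)
tx2=$(cat /sys/class/net/$dev/statistics/tx_packets)
echo "$dev rx pps: $((rx2 - rx1))  tx pps: $((tx2 - tx1))"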

5. Results (pps)
           no-iommu mode  | vfio mode
Guest rx |   968645       |   1050282    ------> ~8% improvement
Guest tx |   362869       |   364116     ------> no difference

Comment 12 Quan Wenli 2017-06-15 06:59:22 UTC
Hi Jason, Wei,

Could you help check the performance results in comment #11? Are they as expected?

Comment 13 jason wang 2017-06-15 07:38:24 UTC
(In reply to Quan Wenli from comment #12)
> Hi, jason, wei 
> 
> Could you help check performance results in comment #11, is it as expected?

Kind of, except for the low tx pps.

What's the tx number in no-iommu mode before 658? If it's still low, this bug can be verified and you may open a new bug to track the tx issue.

Thanks

Comment 14 Quan Wenli 2017-06-15 10:00:38 UTC
(In reply to jason wang from comment #13)
> (In reply to Quan Wenli from comment #12)
> > Hi, jason, wei 
> > 
> > Could you help check performance results in comment #11, is it as expected?
> 
> Kind of except for the low pps on tx.
> 
> What's the tx number of no-iommu mode before 658? If it's still low, this
> bug can be verified and you may open a new bug for tracking tx issue.
> 
> Thanks

Retested with a kernel-679 guest: the tx pps goes up to 0.52 Mpps for both no-iommu and vfio mode. After downgrading the guest kernel to 657 there is no ethernet device in the guest, so I cannot get any pps numbers.

Do you think 0.52 Mpps tx is still bad?

Comment 15 jason wang 2017-06-15 10:04:03 UTC
(In reply to Quan Wenli from comment #14)
> (In reply to jason wang from comment #13)
> > (In reply to Quan Wenli from comment #12)
> > > Hi, jason, wei 
> > > 
> > > Could you help check performance results in comment #11, is it as expected?
> > 
> > Kind of except for the low pps on tx.
> > 
> > What's the tx number of no-iommu mode before 658? If it's still low, this
> > bug can be verified and you may open a new bug for tracking tx issue.
> > 
> > Thanks
> 
> Retest again with kernel-679 guest, the tx pps number is up to 0.52 for both
> no-iommu and vfio mode, downgrade kernel to 657, there is no ethernet in
> guest, so  can not get any pps.

You need to clear iommu_platform, I think.

> 
> Do you think the 0.52 tx pps is still bad ?

Not good at least.

Thanks

Comment 16 Quan Wenli 2017-06-16 08:50:10 UTC
(In reply to jason wang from comment #15)
> (In reply to Quan Wenli from comment #14)
> > (In reply to jason wang from comment #13)
> > > (In reply to Quan Wenli from comment #12)
> > > > Hi, jason, wei 
> > > > 
> > > > Could you help check performance results in comment #11, is it as expected?
> > > 
> > > Kind of except for the low pps on tx.
> > > 
> > > What's the tx number of no-iommu mode before 658? If it's still low, this
> > > bug can be verified and you may open a new bug for tracking tx issue.
> > > 
> > > Thanks
> > 
> > Retest again with kernel-679 guest, the tx pps number is up to 0.52 for both
> > no-iommu and vfio mode, downgrade kernel to 657, there is no ethernet in
> > guest, so  can not get any pps.
> 
> You need clear iommu_platform I think.

Without iommu_platform=on,ats=on, 657 kernel         -> 530173 pps
Without iommu_platform=on,ats=on, 679 kernel         -> 529250 pps
With iommu_platform=on,ats=on,    679 kernel         -> 529689 pps
With iommu_platform=on,ats=on,    kernel-4.11.0-rc5+ -> 528384 pps

The tx pps numbers are almost identical. Do you think I need to open a new bug for tracking the low tx performance (0.5 Mpps)?

 
> 
> > 
> > Do you think the 0.52 tx pps is still bad ?
> 
> Not good at least.
> 
> Thanks

Comment 17 jason wang 2017-06-16 09:57:55 UTC
(In reply to Quan Wenli from comment #16)
> (In reply to jason wang from comment #15)
> > (In reply to Quan Wenli from comment #14)
> > > (In reply to jason wang from comment #13)
> > > > (In reply to Quan Wenli from comment #12)
> > > > > Hi, jason, wei 
> > > > > 
> > > > > Could you help check performance results in comment #11, is it as expected?
> > > > 
> > > > Kind of except for the low pps on tx.
> > > > 
> > > > What's the tx number of no-iommu mode before 658? If it's still low, this
> > > > bug can be verified and you may open a new bug for tracking tx issue.
> > > > 
> > > > Thanks
> > > 
> > > Retest again with kernel-679 guest, the tx pps number is up to 0.52 for both
> > > no-iommu and vfio mode, downgrade kernel to 657, there is no ethernet in
> > > guest, so  can not get any pps.
> > 
> > You need clear iommu_platform I think.
> 
> no iommu_platform=on,ats=on 657 kernel - > 530173 pps
> no iommu_platform=on,ats=on 679 kernel - > 529250 pps
> enable iommu_platform=on,ats=on 679 kernel - > 529689 pps
> enable iommu_platform=on,ats=on  kernel-4.11.0-rc5+ - > 528384 pps
> 
> 
> They are almost similar on tx pps. do you think I need to open a new bug for
> tracking low tx performance(0.5mpps). 

According to your test, it was not introduced by the IOMMU support. Please open a new bug and flag it for 7.5. Then we can verify this bug.

Thanks

> 
>  
> > 
> > > 
> > > Do you think the 0.52 tx pps is still bad ?
> > 
> > Not good at least.
> > 
> > Thanks

Comment 18 Quan Wenli 2017-06-19 05:59:44 UTC
(In reply to jason wang from comment #17)
> (In reply to Quan Wenli from comment #16)
> > (In reply to jason wang from comment #15)
> > > (In reply to Quan Wenli from comment #14)
> > > > (In reply to jason wang from comment #13)
> > > > > (In reply to Quan Wenli from comment #12)
> > > > > > Hi, jason, wei 
> > > > > > 
> > > > > > Could you help check performance results in comment #11, is it as expected?
> > > > > 
> > > > > Kind of except for the low pps on tx.
> > > > > 
> > > > > What's the tx number of no-iommu mode before 658? If it's still low, this
> > > > > bug can be verified and you may open a new bug for tracking tx issue.
> > > > > 
> > > > > Thanks
> > > > 
> > > > Retest again with kernel-679 guest, the tx pps number is up to 0.52 for both
> > > > no-iommu and vfio mode, downgrade kernel to 657, there is no ethernet in
> > > > guest, so  can not get any pps.
> > > 
> > > You need clear iommu_platform I think.
> > 
> > no iommu_platform=on,ats=on 657 kernel - > 530173 pps
> > no iommu_platform=on,ats=on 679 kernel - > 529250 pps
> > enable iommu_platform=on,ats=on 679 kernel - > 529689 pps
> > enable iommu_platform=on,ats=on  kernel-4.11.0-rc5+ - > 528384 pps
> > 
> > 
> > They are almost similar on tx pps. do you think I need to open a new bug for
> > tracking low tx performance(0.5mpps). 
> 
> According to your test, it was not introduced by iommu support. Please open
> a bug and flag it to 7.5. And we can verify this bug.
> 

Set this to VERIFIED and opened bug 1462633 to track the low tx issue.

> Thanks
> 
> > 
> >  
> > > 
> > > > 
> > > > Do you think the 0.52 tx pps is still bad ?
> > > 
> > > Not good at least.
> > > 
> > > Thanks

Comment 20 errata-xmlrpc 2017-08-01 20:02:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:1842
