Bug 488882
Summary: cxgb3 driver very slow under Xen with HW acceleration enabled

Product: Red Hat Enterprise Linux 5
Component: kernel-xen
Version: 5.3
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Target Milestone: rc
Target Release: ---
Reporter: Mark Wagner <mwagner>
Assignee: Paolo Bonzini <pbonzini>
QA Contact: Virtualization Bugs <virt-bugs>
CC: agospoda, clalance, divy, herbert.xu, indranil, leiwang, mjenner, mwagner, pbonzini, peterm, xen-maint, yuzhang
Doc Type: Bug Fix
Last Closed: 2011-01-13 20:46:36 UTC
Bug Blocks: 514490
Description: Mark Wagner, 2009-03-06 02:37:59 UTC
Any statistics about drops or invalid checksums after this run? I'm guessing that if you look at the stats for the device before and after the run you would see a lot more than 87380 bytes received. Do you have this setup somewhere where I can take a look?

A before and after:

[root@perf10 ~]# ethtool -k peth3
Offload parameters for peth3:
Cannot get device rx csum settings: No such device
Cannot get device tx csum settings: No such device
Cannot get device scatter-gather settings: No such device
Cannot get device tcp segmentation offload settings: No such device
Cannot get device udp large send offload settings: No such device
Cannot get device generic segmentation offload settings: No such device
no offload info available

[root@perf10 ~]# ethtool -k peth2
Offload parameters for peth2:
Cannot get device udp large send offload settings: Operation not supported
rx-checksumming: on
tx-checksumming: off
scatter-gather: off
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: off

[root@perf10 ~]# ethtool -S peth2
NIC statistics:
TxOctetsOK : 134044256554
TxFramesOK : 107255105
TxMulticastFramesOK: 262
TxBroadcastFramesOK: 56
TxPauseFrames : 0
TxUnderrun : 0
TxExtUnderrun : 0
TxFrames64 : 87
TxFrames65To127 : 19544570
TxFrames128To255 : 17101
TxFrames256To511 : 59350
TxFrames512To1023 : 427272
TxFrames1024To1518 : 87206725
TxFrames1519ToMax : 0
RxOctetsOK : 79921913968
RxFramesOK : 79557155
RxMulticastFramesOK: 35673
RxBroadcastFramesOK: 1184
RxPauseFrames : 0
RxFCSErrors : 0
RxSymbolErrors : 0
RxShortErrors : 0
RxJabberErrors : 0
RxLengthErrors : 0
RxFIFOoverflow : 0
RxFrames64 : 91
RxFrames65To127 : 28118586
RxFrames128To255 : 1410
RxFrames256To511 : 5334
RxFrames512To1023 : 175308
RxFrames1024To1518 : 51256426
RxFrames1519ToMax : 0
PhyFIFOErrors : 0
TSO : 7374
VLANextractions : 0
VLANinsertions : 0
TxCsumOffload : 7707
RxCsumGood : 79520496
LroAggregated : 0
LroFlushed : 0
LroNoDesc : 0
RxDrops : 0
CheckTXEnToggled : 0
CheckResets : 0

[root@perf10 ~]# ethtool -K peth2 tx on sg on tso on

Jump on guest, run netperf:

[root@dhcp47-134 np2.4]# ./netperf -l 15 -H 172.17.10.15
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.10.15 (172.17.10.15) port 0 AF_INET : spin interval : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
87380  16384   16384    15.17      198.53

[root@perf10 ~]# ethtool -S peth2
NIC statistics:
TxOctetsOK : 134438894870
TxFramesOK : 107515094
TxMulticastFramesOK: 262
TxBroadcastFramesOK: 57
TxPauseFrames : 0
TxUnderrun : 0
TxExtUnderrun : 0
TxFrames64 : 88
TxFrames65To127 : 19544582
TxFrames128To255 : 17102
TxFrames256To511 : 59352
TxFrames512To1023 : 427275
TxFrames1024To1518 : 87466695
TxFrames1519ToMax : 0
RxOctetsOK : 79924285765
RxFramesOK : 79590973
RxMulticastFramesOK: 35726
RxBroadcastFramesOK: 1184
RxPauseFrames : 0
RxFCSErrors : 0
RxSymbolErrors : 0
RxShortErrors : 0
RxJabberErrors : 0
RxLengthErrors : 0
RxFIFOoverflow : 0
RxFrames64 : 93
RxFrames65To127 : 28152397
RxFrames128To255 : 1412
RxFrames256To511 : 5337
RxFrames512To1023 : 175308
RxFrames1024To1518 : 51256426
RxFrames1519ToMax : 0
PhyFIFOErrors : 0
TSO : 13380
VLANextractions : 0
VLANinsertions : 0
TxCsumOffload : 13875
RxCsumGood : 79554260
LroAggregated : 0
LroFlushed : 0
LroNoDesc : 0
RxDrops : 0
CheckTXEnToggled : 0
CheckResets : 0

Here is a diff of the two outputs from a second, similar run:

[root@perf10 ~]# diff -w bz1.txt bz2.txt
2,3c2,3
< TxOctetsOK : 134438894870
< TxFramesOK : 107515094
---
> TxOctetsOK : 134770288502
> TxFramesOK : 107733415
5c5
< TxBroadcastFramesOK: 57
---
> TxBroadcastFramesOK: 58
9,10c9,10
< TxFrames64 : 88
< TxFrames65To127 : 19544582
---
> TxFrames64 : 89
> TxFrames65To127 : 19544591
12,14c12,14
< TxFrames256To511 : 59352
< TxFrames512To1023 : 427275
< TxFrames1024To1518 : 87466695
---
> TxFrames256To511 : 59353
> TxFrames512To1023 : 427277
> TxFrames1024To1518 : 87685003
16,18c16,18
< RxOctetsOK : 79924299370
< RxFramesOK : 79591070
< RxMulticastFramesOK: 35817
---
> RxOctetsOK : 79926309990
> RxFramesOK : 79619767
> RxMulticastFramesOK: 35829
27,28c27,28
< RxFrames64 : 93
< RxFrames65To127 : 28152485
---
> RxFrames64 : 94
> RxFrames65To127 : 28181179
30c30
< RxFrames256To511 : 5343
---
> RxFrames256To511 : 5345
35c35
< TSO : 13380
---
> TSO : 18338
38,39c38,39
< TxCsumOffload : 13875
< RxCsumGood : 79554260
---
> TxCsumOffload : 18959
> RxCsumGood : 79582944

Mark, can you try just turning TSO off without turning tx checksum offload off too? Thanks! Herbert

I tried with tso off and there were "spurts" of traffic but not a decent flow. The average throughput was less than half of what I was able to get with tx off as well.

OK, please set up a machine with a Xen guest running and give me remote access so I can try to debug this. Thanks!

I think that giving Herbert access to the machine fulfilled the need for the needinfo flag.

Divy, have you done much testing with Xen? One opinion is that the way in which you free skbs (noted in the large comment in t3_eth_xmit) means that bursts like this will be expected when used in conjunction with virtualization (due to the limited number of pages available that need to be re-used quickly). Do you have any thoughts about trying to add a tx-completion interrupt to try and address this?

Created attachment 350893 [details]
Tx credit return management for Dom0's Xen
Hi Andy,
We've not done much Xen testing in a RHEL context. However, we ship our driver in both Citrix's XenServer and VMware's ESX, and we have hit this kind of performance degradation there. We did not correlate it with tx hw assist, but we root-caused it, and it points to the opinion you mention :)
In all the virtualized environments we have tested, the VM app's send buffer frees up its load only when the hypervisor's driver frees the corresponding skb.
cxgb3, however, does not free a TX skb on DMA completion.
The driver relies on FW-generated credit returns posted on the receive control queues.
In non-virtual environments, the driver programs the HW to coalesce these credit returns to minimize the FW management load, and relies on skb_orphan() to free up space in the app's send buffer; skbs are freed when credit returns are received.
This does not work for VMs: skb_orphan() won't free up space in the virtualized app's send buffer.
The attached patch provides a much more aggressive credit return policy, and has solved our perf issues on other virtualized platforms.
Cheers,
Divy
(In reply to comment #11)

Divy, thanks for the quick response. I think this could be a nice solution, but I'm curious what the impact would be if we just removed the dependency on CONFIG_XEN and made those changes permanent. What would the drop in performance be when using a bare-metal kernel? And what about using KVM? This would certainly still be a problem in that environment.

Andy, it shouldn't be a problem for KVM because KVM doesn't do per-page tracking, which is a Xen-specific hack.

(In reply to comment #13)
> Andy, it shouldn't be a problem for KVM because KVM doesn't do per-page
> tracking which is a Xen-specific hack.

Good to know, Herbert. /me needs some Xen and KVM lessons. :-)

Hi Andy,
Making these changes permanent would have an impact on performance on bare-metal kernels.
Instead of receiving one control packet carrying coalesced credit returns, you'll have one per sent packet: more pressure on the PCI bus and on the FW. It is better not to use this configuration if you do not need it. I'll ask our QA team to start testing KVM on RHEL 5.4. I also need to get up to speed on KVM. Cheers, Divy

FWIW I'm experimenting with a new TX interrupt mitigation mechanism that will hopefully resolve this problem without creating a different path for Xen.

Herbert, should we go with Divy's patch or wait for your work to be complete? Do you have a BZ for your work?

Just as it should not be a problem for KVM, it shouldn't be a problem for PCI passthrough to Xen HVM guests either. However, I have no idea about PCI passthrough to PV guests. Those run under CONFIG_XEN, so they would use the fix. Herbert/Divy, would they need it? My guess is "yes", but I'd like a confirmation.

PV passthrough doesn't need the fix, but I think it should still work, albeit with the same effect as if you'd applied the fix to a normal kernel. Divy, can you confirm?

Yes, the fix does not change the driver's overall behavior; it just requests that the HW indicate TX completions more often than otherwise needed.

I think I'll make the change only for dom0 then.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Do we know if the bad performance is also visible under KVM? Herbert mentioned it is not a problem in comment #13. Still, some numbers would be nice to have...

In kernel-2.6.18-223.el5. You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5 Detailed testing feedback is always welcomed.
Hi Mark,
As we do not have a Chelsio 10GbE card on hand, would you please help to verify this bug if convenient? Thanks a lot :)
Lei Wang

I can't help at this point in time.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html