Bug 488882
| Summary: | cxgb3 driver very slow under Xen with HW acceleration enabled | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Mark Wagner <mwagner> |
| Component: | kernel-xen | Assignee: | Paolo Bonzini <pbonzini> |
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | CC: | agospoda, clalance, divy, herbert.xu, indranil, leiwang, mjenner, mwagner, pbonzini, peterm, xen-maint, yuzhang |
| Version: | 5.3 | Target Milestone: | rc |
| Target Release: | --- | Hardware: | All |
| OS: | Linux | Whiteboard: | |
| Fixed In Version: | kernel-2.6.18-223.el5 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2011-01-13 20:46:36 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | Bug Depends On: | |
| Bug Blocks: | 514490 | Attachments: | Tx credit return management for Dom0's Xen |
Description Mark Wagner 2009-03-06 02:37:59 UTC
Any statistics about drops or invalid checksums after this run? I'm guessing that if you look at the stats for the device before and after the run you would see a lot more than 87380 bytes received. Do you have this setup somewhere where I can take a look?

Here is a before and after:
[root@perf10 ~]# ethtool -k peth3
Offload parameters for peth3:
Cannot get device rx csum settings: No such device
Cannot get device tx csum settings: No such device
Cannot get device scatter-gather settings: No such device
Cannot get device tcp segmentation offload settings: No such device
Cannot get device udp large send offload settings: No such device
Cannot get device generic segmentation offload settings: No such device
no offload info available
[root@perf10 ~]# ethtool -k peth2
Offload parameters for peth2:
Cannot get device udp large send offload settings: Operation not supported
rx-checksumming: on
tx-checksumming: off
scatter-gather: off
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: off
[root@perf10 ~]# ethtool -S peth2
NIC statistics:
TxOctetsOK : 134044256554
TxFramesOK : 107255105
TxMulticastFramesOK: 262
TxBroadcastFramesOK: 56
TxPauseFrames : 0
TxUnderrun : 0
TxExtUnderrun : 0
TxFrames64 : 87
TxFrames65To127 : 19544570
TxFrames128To255 : 17101
TxFrames256To511 : 59350
TxFrames512To1023 : 427272
TxFrames1024To1518 : 87206725
TxFrames1519ToMax : 0
RxOctetsOK : 79921913968
RxFramesOK : 79557155
RxMulticastFramesOK: 35673
RxBroadcastFramesOK: 1184
RxPauseFrames : 0
RxFCSErrors : 0
RxSymbolErrors : 0
RxShortErrors : 0
RxJabberErrors : 0
RxLengthErrors : 0
RxFIFOoverflow : 0
RxFrames64 : 91
RxFrames65To127 : 28118586
RxFrames128To255 : 1410
RxFrames256To511 : 5334
RxFrames512To1023 : 175308
RxFrames1024To1518 : 51256426
RxFrames1519ToMax : 0
PhyFIFOErrors : 0
TSO : 7374
VLANextractions : 0
VLANinsertions : 0
TxCsumOffload : 7707
RxCsumGood : 79520496
LroAggregated : 0
LroFlushed : 0
LroNoDesc : 0
RxDrops : 0
CheckTXEnToggled : 0
CheckResets : 0
[root@perf10 ~]# ethtool -K peth2 tx on sg on tso on
Jump on the guest, run netperf:
[root@dhcp47-134 np2.4]# ./netperf -l 15 -H 172.17.10.15
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.10.15 (172.17.10.15) port 0 AF_INET : spin interval : demo
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 15.17 198.53
[root@perf10 ~]# ethtool -S peth2
NIC statistics:
TxOctetsOK : 134438894870
TxFramesOK : 107515094
TxMulticastFramesOK: 262
TxBroadcastFramesOK: 57
TxPauseFrames : 0
TxUnderrun : 0
TxExtUnderrun : 0
TxFrames64 : 88
TxFrames65To127 : 19544582
TxFrames128To255 : 17102
TxFrames256To511 : 59352
TxFrames512To1023 : 427275
TxFrames1024To1518 : 87466695
TxFrames1519ToMax : 0
RxOctetsOK : 79924285765
RxFramesOK : 79590973
RxMulticastFramesOK: 35726
RxBroadcastFramesOK: 1184
RxPauseFrames : 0
RxFCSErrors : 0
RxSymbolErrors : 0
RxShortErrors : 0
RxJabberErrors : 0
RxLengthErrors : 0
RxFIFOoverflow : 0
RxFrames64 : 93
RxFrames65To127 : 28152397
RxFrames128To255 : 1412
RxFrames256To511 : 5337
RxFrames512To1023 : 175308
RxFrames1024To1518 : 51256426
RxFrames1519ToMax : 0
PhyFIFOErrors : 0
TSO : 13380
VLANextractions : 0
VLANinsertions : 0
TxCsumOffload : 13875
RxCsumGood : 79554260
LroAggregated : 0
LroFlushed : 0
LroNoDesc : 0
RxDrops : 0
CheckTXEnToggled : 0
CheckResets : 0
Here is a diff of the two outputs from a second, similar run; note that the TSO and TxCsumOffload counters keep climbing, confirming the offloads were active during the run:
[root@perf10 ~]# diff -w bz1.txt bz2.txt
2,3c2,3
< TxOctetsOK : 134438894870
< TxFramesOK : 107515094
---
> TxOctetsOK : 134770288502
> TxFramesOK : 107733415
5c5
< TxBroadcastFramesOK: 57
---
> TxBroadcastFramesOK: 58
9,10c9,10
< TxFrames64 : 88
< TxFrames65To127 : 19544582
---
> TxFrames64 : 89
> TxFrames65To127 : 19544591
12,14c12,14
< TxFrames256To511 : 59352
< TxFrames512To1023 : 427275
< TxFrames1024To1518 : 87466695
---
> TxFrames256To511 : 59353
> TxFrames512To1023 : 427277
> TxFrames1024To1518 : 87685003
16,18c16,18
< RxOctetsOK : 79924299370
< RxFramesOK : 79591070
< RxMulticastFramesOK: 35817
---
> RxOctetsOK : 79926309990
> RxFramesOK : 79619767
> RxMulticastFramesOK: 35829
27,28c27,28
< RxFrames64 : 93
< RxFrames65To127 : 28152485
---
> RxFrames64 : 94
> RxFrames65To127 : 28181179
30c30
< RxFrames256To511 : 5343
---
> RxFrames256To511 : 5345
35c35
< TSO : 13380
---
> TSO : 18338
38,39c38,39
< TxCsumOffload : 13875
< RxCsumGood : 79554260
---
> TxCsumOffload : 18959
> RxCsumGood : 79582944
Mark, can you try just turning TSO off without turning tx checksum offload off too? Thanks!

Herbert, I tried with tso off and there were "spurts" of traffic but not a decent flow. The average throughput was less than half of what I was able to get with tx off as well.

OK, please set up a machine with a Xen guest running and give me remote access so I can try to debug this. Thanks!

I think that giving Herbert access to the machine fulfilled the need for the needinfo flag.

Divy, have you done much testing with Xen? One opinion is that the way in which you free skbs (noted in the large comment in t3_eth_xmit) means that bursts like this are to be expected in conjunction with virtualization, due to the limited number of pages available that need to be re-used quickly. Do you have any thoughts about adding a tx-completion interrupt to address this?

Created attachment 350893 [details]
Tx credit return management for Dom0's Xen
Hi Andy,
We have not done much Xen testing in a RHEL context; however, we ship our driver in both Citrix XenServer and VMware ESX, and we have hit this kind of performance degradation there. We did not correlate it with tx hw assist at the time, but we have root-caused it, and it points to the opinion you mention :)
In all the virtualized environments we have tested, the VM application's send buffer frees up its load only when the hypervisor's driver frees the corresponding skb.
cxgb3, however, does not free a TX skb on DMA completion.
The driver relies on FW-generated credit returns posted on the receive control queues.
In non-virtualized environments, the driver programs the HW to coalesce these credit returns to minimize the FW management load, and relies on skb_orphan() to free up space in the application's send buffer; skbs are freed when credit returns are received.
This does not work for VMs: skb_orphan() will not free up space in a virtualized application's send buffer.
The attached patch provides a much more aggressive credit return policy, and has solved our perf issues on other virtualized platforms.
Cheers,
Divy
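[To make the policy concrete, here is a minimal sketch of the idea Divy describes, assuming cxgb3-style names (F_WR_COMPL, V_WR_OP, FW_WROPCODE_TUNNEL_TX_PKT, the work request header's wr_hi field) and a hypothetical aggressive_credits knob. It is not the literal contents of attachment 350893.]

```c
/*
 * Sketch only -- not the literal patch from attachment 350893.
 * Idea: ask the FW for a credit return on every TX work request,
 * so each skb (and the guest page behind it) is freed as soon as
 * the credit comes back, instead of waiting for the FW to coalesce
 * credit returns across many packets.
 *
 * F_WR_COMPL, V_WR_OP and FW_WROPCODE_TUNNEL_TX_PKT follow cxgb3
 * naming conventions; aggressive_credits is a hypothetical knob.
 */
static int aggressive_credits;	/* enabled when serving Xen guests */

static void write_wr_hdr(struct work_request_hdr *wrp)
{
	unsigned int compl = aggressive_credits ? F_WR_COMPL : 0;

	/* F_WR_COMPL asks the FW to post a credit return for this WR */
	wrp->wr_hi = htonl(V_WR_OP(FW_WROPCODE_TUNNEL_TX_PKT) | compl);
}
```

[The trade-off named later in the thread is visible here: with the flag set on every work request, the FW generates one control packet per sent packet instead of one per batch.]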
(In reply to comment #11)

Divy, thanks for the quick response. I think this could be a nice solution, but I'm curious what the impact would be if we just removed the dependency on CONFIG_XEN and made those changes permanent. What would be the drop in performance when using a bare-metal kernel? And what about using KVM? This would certainly still be a problem in that environment.

Andy, it shouldn't be a problem for KVM because KVM doesn't do per-page tracking, which is a Xen-specific hack.

(In reply to comment #13) Good to know, Herbert. /me needs some Xen and KVM lessons. :-)

Hi Andy, making these changes permanent would have an impact on performance on bare-metal kernels. Instead of receiving one control packet returning coalesced credit returns, you would have one per sent packet: more pressure on the PCI bus and on the FW. It is better not to use this configuration if you do not need it. I'll ask our QA team to start testing KVM on RHEL 5.4. I also need to get up to speed on KVM. Cheers, Divy

FWIW, I'm experimenting with a new TX interrupt mitigation mechanism that will hopefully resolve this problem without creating a different path for Xen.

Herbert, should we go with Divy's patch or wait for your work to be complete? Do you have a BZ for it?

Just as it should not be a problem for KVM, it shouldn't be a problem for PCI passthrough to Xen HVM guests either. However, I have no idea about PCI passthrough to PV guests. Those run under CONFIG_XEN, so they would use the fix. Herbert/Divy, would they need it? My guess is "yes", but I'd like a confirmation.

PV passthrough doesn't need the fix, but I think it should still work, albeit with the same effect as if you'd applied the fix to a normal kernel. Divy, can you confirm?

Yes, the fix does not change the driver's overall behavior; it just requests that the HW indicate TX completions more often than otherwise needed.

I think I'll make the change only for dom0 then.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
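[Regarding "I think I'll make the change only for dom0 then": a minimal sketch of such a gate, assuming the RHEL5 Xen kernel's is_initial_xendomain() helper, a runtime test that is true only in the privileged domain. Under this gating, PV guests (which also run with CONFIG_XEN) keep the coalesced behavior, and bare-metal kernels drop the branch at compile time.]

```c
/*
 * Sketch: enable the aggressive credit-return policy only in dom0,
 * which drives the physical cxgb3 NIC on behalf of the guests and
 * so is the only domain that pays the extra FW/PCI cost usefully.
 * is_initial_xendomain() is assumed to be the RHEL5 Xen-kernel
 * test for "am I the privileged domain?".
 */
static inline int cxgb3_wants_aggressive_credits(void)
{
#ifdef CONFIG_XEN
	return is_initial_xendomain();	/* dom0 only, not PV guests */
#else
	return 0;			/* bare metal: coalesce as before */
#endif
}
```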
Do we know if the bad performance is also visible under KVM? Herbert mentioned it is not a problem in comment #13, but some numbers would still be nice to have...

Fixed in kernel-2.6.18-223.el5. You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5. Detailed testing feedback is always welcomed.

Hi Mark, as we do not have a Chelsio 10GbE card on hand, would you please help to verify this bug if convenient? Thanks a lot :) Lei Wang

I can't help at this point in time.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html