Bug 1314111 - vxlan processing occurs on one cpu only
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Paolo Abeni
QA Contact: Network QE
Depends On:
Blocks: 1323132
Reported: 2016-03-02 19:04 EST by Hannes Frederic Sowa
Modified: 2016-04-01 07:03 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-03-30 11:19:48 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Hannes Frederic Sowa 2016-03-02 19:04:28 EST
Investigate whether vxlan processes packets on only one CPU. Different flows should be processed on different CPUs and get sent out via different tx queues. If this is not the case, we need to fix it.
Comment 3 Paolo Abeni 2016-03-30 11:19:48 EDT
Short summary: with vxlan offload, vxlan processing works correctly; without h/w offload, the single-CPU issue is unsolvable.

On h/w with vxlan offload, vxlan processing is correctly spread across the available cpus.
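
One way to observe this spread, assuming a Linux host that exposes /proc/softirqs, is to compare the per-CPU NET_RX softirq counters before and after sending traffic. This is only a sketch: the helper name net_rx_per_cpu is hypothetical, and it is not necessarily how the result above was measured.

```shell
#!/bin/sh
# Print "CPUn count" for the NET_RX softirq row of a /proc/softirqs-style file.
# The optional argument is a path; it defaults to /proc/softirqs.
net_rx_per_cpu() {
    awk '
        # First line is the header: CPU0 CPU1 ...
        NR == 1 { for (i = 1; i <= NF; i++) cpu[i] = $i; next }
        # The NET_RX row holds one counter per CPU, after the row label.
        $1 == "NET_RX:" { for (i = 2; i <= NF; i++) print cpu[i - 1], $i }
    ' "${1:-/proc/softirqs}"
}

if [ -r /proc/softirqs ]; then
    net_rx_per_cpu
fi
```

Taking one snapshot before and one after a test run with several vxlan flows shows which CPUs' counters grew; if only a single CPU's NET_RX counter increases, receive processing is concentrated there.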

If the NIC lacks vxlan offload, then the rx-hash by default ignores the UDP ports and all vxlan flows collide on the same CPU (the IPs in the outer header are the same in all flows).
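
The collision can be illustrated with a toy model. The sketch below is not how a NIC computes its hash (real hardware typically uses a Toeplitz hash with a device-specific key); cksum is only a stand-in, and the addresses, source ports, queue count and the helper name pick_queue are all assumptions made for illustration.

```shell
#!/bin/sh
# Toy model of rx queue selection: hash the chosen header fields and map the
# result onto a queue index. cksum stands in for the NIC's real hash function.
NUM_QUEUES=4        # hypothetical number of rx queues

pick_queue() {
    hash=$(printf '%s' "$*" | cksum | cut -d' ' -f1)
    echo $((hash % NUM_QUEUES))
}

SRC_IP=192.0.2.1    # outer source IP (tunnel endpoint, documentation address)
DST_IP=192.0.2.2    # outer destination IP (tunnel endpoint, documentation address)
DST_PORT=4789       # IANA-assigned vxlan UDP port

# Each inner flow gets a different outer UDP source port, but the outer IPs
# stay the same, so an IP-only hash cannot tell the flows apart.
for src_port in 49152 50001 51234 60000; do
    q_ip=$(pick_queue "$SRC_IP" "$DST_IP")
    q_l4=$(pick_queue "$SRC_IP" "$DST_IP" "$src_port" "$DST_PORT")
    echo "outer sport $src_port: IP-only hash -> queue $q_ip, IP+ports hash -> queue $q_l4"
done
```

The IP-only column is the same for every flow because its input never changes, while the IP+ports column depends on the outer source port and can therefore differ between flows.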

To avoid such collision, the following setting could be used:

ethtool -N <interface> rx-flow-hash udp4 sdfn

but the above will introduce reordering if the outer UDP datagram is fragmented (only the first fragment carries the UDP header, so later fragments can hash to a different queue), and that may break existing applications.
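
A minimal sketch of inspecting, applying and reverting this setting follows. The interface name eth0 is a placeholder, the wrapper function names are hypothetical, and the script only prints the commands unless DRY_RUN=0 is set. In the ethtool field letters, s and d select the source and destination IP, while f and n select bytes 0-1 and 2-3 of the layer 4 header (the UDP source and destination ports).

```shell
#!/bin/sh
IFACE=${IFACE:-eth0}      # placeholder interface name
DRY_RUN=${DRY_RUN:-1}     # 1: only print the commands; 0: execute them

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Show which fields currently feed the rx hash for UDP over IPv4.
show_udp4_hash() { run ethtool -n "$1" rx-flow-hash udp4; }

# Hash on IPs and UDP ports, so vxlan flows between two endpoints can spread.
enable_udp4_port_hash() { run ethtool -N "$1" rx-flow-hash udp4 sdfn; }

# Go back to hashing on the IP addresses only, e.g. if fragmented traffic
# turns out to be reordered.
disable_udp4_port_hash() { run ethtool -N "$1" rx-flow-hash udp4 sd; }

show_udp4_hash "$IFACE"
enable_udp4_port_hash "$IFACE"
```

Whether the driver accepts the change depends on the hardware; not every NIC supports configuring the UDP hash fields.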

Moreover, without vxlan offloading we lose:

* LRO/GRO, because we don't get CHECKSUM_PARTIAL frames, or because the checksumming logic does not look as deep into the packet as we require (outer frames don't have a checksum by default)

* LCO (local checksum offload), which cannot be used without CHECKSUM_PARTIAL

* transmit checksum offload, which is also not possible because most hardware lacks CHECKSUM_PARTIAL support

We should certainly either consider only newer network cards with CHECKSUM_PARTIAL support or depend on vxlan (or, later, geneve) offloading.
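
To see which of these offloads a given NIC reports, the output of ethtool -k can be filtered for the relevant features. This is a sketch under assumptions: eth0 is a placeholder, offload_summary is a hypothetical helper, and the exact feature list printed (including tx-udp_tnl-segmentation for UDP tunnel segmentation) depends on the kernel and driver.

```shell
#!/bin/sh
# Keep only the checksum, LRO/GRO and UDP tunnel offload lines from
# "ethtool -k" output read on stdin.
offload_summary() {
    grep -E '^[[:space:]]*(rx-checksumming|tx-checksumming|generic-receive-offload|large-receive-offload|tx-udp_tnl-segmentation):' || true
}

IFACE=${IFACE:-eth0}      # placeholder interface name
if command -v ethtool >/dev/null 2>&1; then
    ethtool -k "$IFACE" 2>/dev/null | offload_summary
fi
```

A feature shown as "off [fixed]" cannot be enabled on that device.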
