Investigate whether vxlan only processes packets on one CPU. Different flows should actually be processed on different CPUs and get sent out via different tx queues. If this is not the case, we need to fix it.
Short summary: with vxlan offload, vxlan processing works correctly; without h/w offload, the single-CPU issue is unsolvable.

On h/w with vxlan offload, vxlan processing is correctly spread across the available CPUs. If the NIC lacks vxlan offload, then the rx-hash by default ignores the UDP ports and all vxlan flows collide on the same CPU (the IPs in the external header are the same in all flows). To avoid such collisions, the following setting could be used:

    ethtool -N <interface> rx-flow-hash udp4 sdfn

but the above will introduce reordering if the external UDP datagram is fragmented (fragments after the first carry no UDP header, so they cannot be hashed on the ports), and that may break existing applications.

Moreover, without vxlan offloading we lose:

* LRO/GRO, because we either don't get CHECKSUM_PARTIAL frames, or the checksumming logic does not look as deep into the packet as we require (outer frames don't have a checksum by default)
* without CHECKSUM_PARTIAL, no use of LCO (local checksum offload)
* checksum offload on transmit is also not possible, because most hardware lacks CHECKSUM_PARTIAL support

We should certainly only consider either new network cards with CHECKSUM_PARTIAL support, or depend on vxlan (or, later, geneve) offloading.
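
To illustrate the collision described above, here is a minimal sketch in Python. It is not the NIC's actual algorithm: SHA-256 stands in for the hardware RSS hash, and the queue count, tunnel endpoint addresses and source-port range are hypothetical values chosen for the example. It assumes the vxlan sender varies the outer UDP source port per inner flow, which is what makes hashing on the ports useful at all.

```python
import hashlib
import ipaddress
import struct

N_QUEUES = 8  # hypothetical number of RX queues / CPUs


def rx_queue(src_ip, dst_ip, src_port=None, dst_port=None):
    """Pick an RX queue from the outer header fields.

    SHA-256 is only a stand-in for the NIC's real RSS hash.
    When the ports are omitted, only the two IPs are hashed,
    which models the default udp4 behaviour described above.
    """
    key = ipaddress.ip_address(src_ip).packed + ipaddress.ip_address(dst_ip).packed
    if src_port is not None:
        key += struct.pack("!HH", src_port, dst_port)
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % N_QUEUES


# Two tunnel endpoints (documentation addresses) carrying 64 inner flows.
# Assumption: each inner flow gets its own outer UDP source port.
VTEP_A, VTEP_B = "192.0.2.1", "192.0.2.2"
VXLAN_PORT = 4789
flows = [(VTEP_A, VTEP_B, 49152 + i, VXLAN_PORT) for i in range(64)]

# Hash on outer IPs only: every flow has the same key.
ip_only = {rx_queue(src, dst) for src, dst, _, _ in flows}
# Hash on outer IPs and UDP ports (the "sdfn" setting).
with_ports = {rx_queue(*flow) for flow in flows}

print("queues used, IP-only hash:", len(ip_only))
print("queues used, IP+port hash:", len(with_ports))
```

With the IP-only key, every flow between the two endpoints produces an identical hash input, so they all land on a single queue. Adding the ports makes the input differ per flow, so the flows can spread across queues.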