| Summary: | Asymmetric instance bandwidth using Floating IP w/ tunneling (GRE/VXLAN) through Neutron | |||
|---|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Brent Holden <bholden> | |
| Component: | openstack-neutron | Assignee: | Bob Kukura <rkukura> | |
| Status: | CLOSED CANTFIX | QA Contact: | Ofer Blaut <oblaut> | |
| Severity: | high | Docs Contact: | ||
| Priority: | high | |||
| Version: | 3.0 | CC: | bholden, breeler, chrisw, hateya, jtaleric, lpeer, perfbz, rkukura, twilson, yeylon | |
| Target Milestone: | --- | |||
| Target Release: | 4.0 | |||
| Hardware: | x86_64 | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | Known Issue | ||
| Doc Text: |
When “generic receive offload” (GRO) is enabled while using GRE or VXLAN tunneling, inbound bandwidth available to instances from an external network using an OpenStack Networking router is extremely low.
Workaround: Disable GRO offloading on the network node where the l3-agent runs by adding the following line to /etc/sysconfig/network-scripts/ifcfg-ethX:
ETHTOOL_OPTS="-K ethX gro off"
where ethX is the network interface device used for the external network. Either reboot or run "ifdown ethX; ifup ethX" for the setting to take effect.
This will provide more symmetric bandwidth and faster inbound data flow.
|
Story Points: | --- | |
| Clone Of: | ||||
| : | 1042507 | Environment: | ||
| Last Closed: | 2013-12-08 20:39:48 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
|
Description
Brent Holden
2013-11-21 19:17:33 UTC
From my latest email to RHOS-TECH on this:
Here is a packet capture of the checksum issue in my environment:
14:05:47.706700 IP (tos 0x0, ttl 64, id 11789, offset 0, flags [DF], proto GRE (47), length 1442)
192.168.3.200 > 192.168.3.202: GREv0, Flags [key present], key=0x1, length 1422
IP (tos 0x8, ttl 63, id 11789, offset 0, flags [DF], proto TCP (6), length 1400)
22.16.1.9.ssh > 10.0.0.6.46691: Flags [.], cksum 0x2689 (incorrect -> 0x0444), seq 16401:17749, ack 320, win 188, options [nop,nop,TS val 2375193 ecr 2421340], length 1348
Netperf going from the Guest to a physical machine:
[root@mongo1 ~]# netperf -4 -l 60 -H 22.16.1.9 -T1,1 -- -m 1024
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 22.16.1.9 () port 0 AF_INET : demo : cpu bind
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 1024 60.01 3067.70
Netperf going from a physical machine to the Guest:
[root@sandyone ~]# netperf -4 -l 60 -H 22.16.1.3 -T1,1 -- -m 1024
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 22.16.1.3 () port 0 AF_INET : demo : cpu bind
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 1024 64.00 0.06
Netperf going from the Neutron host (where br-ex with eth5 attached resides) to the Guest:
[root@athos ~]# netperf -4 -l 60 -H 22.16.1.3 -T1,1 -- -m 1024
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 22.16.1.3 () port 0 AF_INET : demo : cpu bind
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 1024 60.01 1907.41
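To put a number on the asymmetry in the runs above, the throughput column can be pulled out of each netperf result and the two directions divided. This is only a minimal sketch: the function names `netperf_throughput` and `asymmetry_ratio` are hypothetical helpers, and the parsing assumes the TCP stream output layout shown in this report (a result line of five fields starting with the receive socket size).

```shell
#!/bin/sh
# Hypothetical helper: read netperf TCP_STREAM output on stdin and print the
# throughput (last field of the numeric result line, in 10^6 bits/sec).
netperf_throughput() {
    awk 'NF == 5 && $1 ~ /^[0-9]+$/ { t = $5 } END { print t }'
}

# Hypothetical helper: print outbound/inbound throughput as a whole number.
asymmetry_ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0f\n", a / b }'
}

# Example with the figures from this report (guest to physical: 3067.70,
# physical to guest: 0.06):
# asymmetry_ratio 3067.70 0.06
```

With the figures reported above, the guest-to-physical direction is roughly 51,000 times faster than the physical-to-guest direction.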
I linked an upstream Launchpad bug that may be related. That bug hasn't been completely resolved yet, but suggests that having GRO offloading enabled could be at least part of the problem. The email threads at http://lists.openstack.org/pipermail/openstack/2013-October/thread.html#1778 and http://lists.openstack.org/pipermail/openstack/2013-November/thread.html#2705 are also related to this upstream bug.
On the chance that this issue is related to GRO offloading, please try running "ethtool -k eth5" on the network node, and if that shows "generic-receive-offload: on", try turning it off with "ethtool -K eth5 gro off", and see if that helps.

Brent indicated in email that disabling GRO resolved the issue. Brent, can you update the bug with any relevant details?
BobK

I also re-tested with GRO disabled. Here are my results:
External machine to Guest (where the issue existed):
[root@sandyone ~]# netperf -4 -H 22.16.1.5 -l 60 -- -m 1024
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 22.16.1.5 () port 0 AF_INET : demo
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 1024 60.01 2405.41
[root@sandyone ~]#
Guest to external machine:
-bash-4.1# netperf -4 -H 22.16.1.9 -l 60 -- -m 1024
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 22.16.1.9 () port 0 AF_INET : demo
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 1024 60.01 3423.49
-bash-4.1#

Thanks Joe. Still not quite symmetric, but much better. Looks like the outbound bandwidth improved a bit too, but not sure if this is significant. Disabling GRO everywhere (not just the network node) might provide further improvement. Not sure if there are any disadvantages to this.
Concluding that this is a kernel bug, and we need to document disabling GRO when using GRE with OpenStack until it is fixed.
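The check-then-disable step suggested in this thread can be sketched as a small script. This is an illustration, not a tested fix: `gro_state` and `disable_gro_if_on` are hypothetical function names, eth5 is the external interface from this report, and the script assumes it runs as root on the network node. It uses only the two ethtool invocations quoted above.

```shell
#!/bin/sh
# Hypothetical helper: read `ethtool -k` output on stdin and print the state
# of generic-receive-offload ("on" or "off").
gro_state() {
    awk '$1 == "generic-receive-offload:" { print $2; exit }'
}

# Hypothetical helper: turn GRO off on the given interface only if it is
# currently on, then report what the state was before the change.
disable_gro_if_on() {
    iface="$1"
    state="$(ethtool -k "$iface" | gro_state)"
    if [ "$state" = "on" ]; then
        ethtool -K "$iface" gro off
    fi
    echo "$iface: generic-receive-offload was ${state:-unknown}"
}

# Example (run as root on the network node):
# disable_gro_if_on eth5
```

A change made this way does not survive a reboot, so it is suited to confirming the diagnosis before re-running netperf.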
Adding a documentation flag and leaving this bug on Neutron to capture and document the issue. Bob, did you report the relevant bug on the kernel?

NEEDINFO Bob Kukura
Could you please supply a command (or reference to explanatory text) in the Doc Text's workaround, to show the user how to disable GRO offloading on the network node? Thanks

I've cloned this as BZ 1042507 against the kernel.
Livnat, shouldn't we keep this open to track eventually verifying the kernel fix with OpenStack?

I've updated the wording of the doc text slightly, and added the command to disable GRO.

This has been reproduced using both provider external networks and bridge-based (br-ex) external networks.

Updated doc text workaround to persistently turn off GRO by adding:
ETHTOOL_OPTS="-K ethX gro off"
to /etc/sysconfig/network-scripts/ifcfg-ethX.

(In reply to Bob Kukura from comment #9)
> I've cloned this as BZ 1042507 against the kernel.
>
> Livnat, shouldn't we keep this open to track eventually verifying the kernel
> fix with OpenStack?
>
I don't think we need to. I added a comment on the kernel bug to ask Joe or Ofer to verify also in the context of Neutron. |
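The persistent workaround described above amounts to adding one line to the interface's ifcfg file. A minimal sketch follows, assuming a hypothetical helper named `persist_gro_off` whose optional second argument (a config path override, useful for trying it on a copy of the file) is also an assumption. It refuses to touch a file that already sets ETHTOOL_OPTS so that existing options are not overridden.

```shell
#!/bin/sh
# Hypothetical helper: append the ETHTOOL_OPTS line from the doc text to the
# ifcfg file of the given interface, unless ETHTOOL_OPTS is already set there.
persist_gro_off() {
    iface="$1"
    cfg="${2:-/etc/sysconfig/network-scripts/ifcfg-$iface}"
    if grep -q '^ETHTOOL_OPTS=' "$cfg"; then
        echo "ETHTOOL_OPTS already set in $cfg; edit it by hand" >&2
        return 1
    fi
    # Make sure the new setting starts on its own line.
    if [ -s "$cfg" ] && [ -n "$(tail -c 1 "$cfg")" ]; then
        echo >> "$cfg"
    fi
    printf 'ETHTOOL_OPTS="-K %s gro off"\n' "$iface" >> "$cfg"
}

# Example (run as root on the network node, eth5 as in this report):
# persist_gro_off eth5 && { ifdown eth5; ifup eth5; }
```

As the doc text notes, the setting takes effect only after a reboot or after cycling the interface with ifdown and ifup.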