Bug 1378000
Summary: | [RHEL73] The Pod with QoS setting cannot reach outside network on RHEL7.3 | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Hongan Li <hongli> | |
Component: | Networking | Assignee: | Dan Williams <dcbw> | |
Status: | CLOSED ERRATA | QA Contact: | Meng Bo <bmeng> | |
Severity: | medium | Docs Contact: | ||
Priority: | medium | |||
Version: | 3.3.0 | CC: | aloughla, aos-bugs, bbennett, eparis, hongli, jbrouer, jneedle, rkhan | |
Target Milestone: | --- | |||
Target Release: | 3.3.1 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | Bug Fix | ||
Doc Text: |
Cause: recent 7.3 beta kernels changed how traffic shaping is configured on network interfaces, exposing a bug in openshift-sdn's traffic shaping feature.
Consequence: when traffic shaping was enabled for a pod, no traffic could be sent or received from the pod.
Fix: the openshift-sdn bug was fixed.
Result: traffic shaping functionality with openshift-sdn should now work correctly, though no customers have been impacted as the combination of beta kernels and openshift is unsupported.
|
Story Points: | --- | |
Clone Of: | ||||
: | 1378697 1378698 (view as bug list) | Environment: | ||
Last Closed: | 2016-10-27 15:42:23 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1375561, 1378697, 1378698 |
Description
Hongan Li
2016-09-21 09:39:26 UTC
The OpenShift code hasn't changed since this feature was added, and it worked at that time, so confirming that it worked on 7.2 would help narrow down the cause. Can we confirm whether or not this worked on RHEL 7.2?

It works fine on RHEL 7.2.

Root-caused the problem: it is a result of the kernel changes made for https://bugzilla.redhat.com/show_bug.cgi?id=1152231. Pushed an origin PR to fix it: https://github.com/openshift/origin/pull/11126

@dcbw please make sure this is also patched in ose-3.3, not just origin/master.

Thank you Dan!

(In reply to hongli from comment #0)
> / # ifconfig eth0
> eth0 Link encap:Ethernet HWaddr 02:42:0A:01:00:05
> inet addr:10.1.0.5 Bcast:0.0.0.0 Mask:255.255.255.0
> inet6 addr: fe80::42:aff:fe01:5/64 Scope:Link
> UP BROADCAST RUNNING MULTICAST MTU:1410 Metric:1
> RX packets:2 errors:0 dropped:0 overruns:0 frame:0
> TX packets:26 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:0
               ^^^^^^^^^^^^
> RX bytes:180 (180.0 B) TX bytes:1404 (1.3 KiB)

(Thanks for providing the ifconfig output.)

The important info from ifconfig is txqueuelen==0. I guess eth0 is part of a veth pair. It is not clear which kernel qdisc is being used.

The short answer is that userspace MUST set tx_queue_len on an interface with QoS/qdisc, so DCBW's commit/fix is the right solution: https://github.com/openshift/origin/pull/11126/commits/8a2fbcf4fd7530d79e2

I recommend using that fix to make OpenShift set a queue length. It has *always* been a misconfiguration to add a qdisc to an interface with txqueuelen==0 (ifconfig syntax). For some qdiscs (htb, fifo, gred, plug, sfb), the kernel had workarounds that set txqueuelen=1 (2 for HTB). These workarounds were actually quite bad: things seem to work, but work poorly, because a queue of 1 packet is not sufficient. I'm actually happy that we backported the fix which removed[1] these workarounds, because it exposed a problem like this instead of leaving a semi-working solution.
Thus, setting the queue length is actually a fix for OpenShift regardless of the kernel used.

Following the PR (https://github.com/openshift/origin/pull/11126), I changed openshift-sdn-ovs manually, and the Pod with QoS settings works well now.

The test environment for 3.3.1 is not ready yet; will verify as soon as it is.

Verified in OpenShift 3.3.1.1 with RHEL 7.3 Beta: the bug has been fixed.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:2084
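
For readers who want to see the ordering discussed in the comments (give the interface a non-zero transmit queue length first, then attach the qdisc), here is a minimal Python sketch. It is not the patch from the PR and does not reproduce the openshift-sdn-ovs script. The function name `shaping_commands`, the device name `veth0`, the queue length of 1000, and the choice of a `tbf` qdisc with these burst/latency values are all assumptions made for illustration. The sketch only builds the `ip` and `tc` command lines; it does not execute them.

```python
# Illustrative sketch only: builds the commands, does not run them.
# All names and values here are assumptions, not taken from the actual fix.

DEFAULT_TXQUEUELEN = 1000  # assumed value; pick what suits the interface


def shaping_commands(dev, rate, txqueuelen=DEFAULT_TXQUEUELEN):
    """Return the commands to shape egress traffic on `dev`.

    The queue length is set before the qdisc is added, because a qdisc
    on an interface with txqueuelen == 0 is a misconfiguration.
    """
    if txqueuelen <= 0:
        raise ValueError("txqueuelen must be positive when a qdisc is attached")
    return [
        # 1. Give the interface a real transmit queue.
        ["ip", "link", "set", "dev", dev, "txqueuelen", str(txqueuelen)],
        # 2. Attach a rate-limiting qdisc (tbf chosen here for brevity).
        ["tc", "qdisc", "add", "dev", dev, "root", "tbf",
         "rate", rate, "burst", "32kbit", "latency", "400ms"],
    ]


if __name__ == "__main__":
    for cmd in shaping_commands("veth0", "1mbit"):
        print(" ".join(cmd))
```

Running the commands for real would require root privileges and an existing interface, which is why the sketch stops at printing them.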