Bug 1378000 - [RHEL73] The Pod with QoS setting cannot reach outside network on RHEL7.3
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Release: 3.3.1
Assigned To: Dan Williams
QA Contact: Meng Bo
Blocks: 1375561 1378697 1378698
Reported: 2016-09-21 05:39 EDT by hongli
Modified: 2016-10-27 11:42 EDT
CC List: 8 users

Doc Type: Bug Fix
Doc Text:
Cause: Recent RHEL 7.3 beta kernels no longer apply an old workaround that gave some qdiscs a 1- or 2-packet queue when they were attached to an interface with txqueuelen 0. This exposed a bug in openshift-sdn's traffic shaping feature, which attached a qdisc to such an interface without setting a queue length.
Consequence: When traffic shaping was enabled for a pod, no traffic could be sent to or received from the pod.
Fix: openshift-sdn now sets a transmit queue length on the interface when traffic shaping is configured.
Result: Traffic shaping with openshift-sdn should now work correctly. No customers were affected, because running OpenShift on beta kernels is unsupported.
Clones: 1378697 1378698
Last Closed: 2016-10-27 11:42:23 EDT
Type: Bug


External Trackers
Tracker: Red Hat Product Errata
ID: RHBA-2016:2084
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat OpenShift Container Platform 3.3.1.3 bug fix update
Last Updated: 2016-10-27 15:41:25 EDT

Description hongli 2016-09-21 05:39:26 EDT
Description of problem:
A pod with QoS settings cannot reach the outside network on RHEL 7.3.


Version-Release number of selected component (if applicable):
oc v3.3.0.31
kubernetes v1.3.0+52492b4

ovs-vsctl (Open vSwitch) 2.4.0
Compiled Mar 22 2016 08:42:47
DB Schema 7.12.1

Red Hat Enterprise Linux Server release 7.3 Beta (Maipo)


How reproducible:
always

Steps to Reproduce:
1. Create the iperf pod with QoS settings (see iperf.yaml under Additional info).
2. oc rsh iperf
3. Inside the pod, run ifconfig eth0 and ping 10.1.0.1.

Actual results:
/ # ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 02:42:0A:01:00:05  
          inet addr:10.1.0.5  Bcast:0.0.0.0  Mask:255.255.255.0
          inet6 addr: fe80::42:aff:fe01:5/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1410  Metric:1
          RX packets:2 errors:0 dropped:0 overruns:0 frame:0
          TX packets:26 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:180 (180.0 B)  TX bytes:1404 (1.3 KiB)

/ # 
/ # ping 10.1.0.1
PING 10.1.0.1 (10.1.0.1): 56 data bytes
^C
--- 10.1.0.1 ping statistics ---
7 packets transmitted, 0 packets received, 100% packet loss


Expected results:
A pod with QoS settings can reach the outside network.

Additional info:
1. No issue occurs when the pod is created without QoS settings.
2. iperf.yaml
apiVersion: v1
kind: Pod
metadata:
  name: iperf
  annotations:
    kubernetes.io/ingress-bandwidth: 3M
    kubernetes.io/egress-bandwidth: 2M
spec:
  containers:
  - name: iperf
    image: yadu/hello-openshift-iperf
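The two annotations in iperf.yaml carry the shaping rates as quantity strings. As a hedged illustration, assuming the values are interpreted as bits per second with decimal SI suffixes (so 3M means 3,000,000 bit/s), this Python sketch converts them. parse_bandwidth is a hypothetical helper that accepts only a small subset of the real Kubernetes quantity syntax; it is not OpenShift code.

```python
from decimal import Decimal, InvalidOperation

# Assumption: decimal SI suffixes only (k, M, G, T). Binary suffixes such as
# "Mi" and other forms accepted by the real quantity parser are rejected.
_SI_MULTIPLIERS = {"": 1, "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_bandwidth(value: str) -> int:
    """Hypothetical helper: convert a value such as '3M' to bits per second."""
    value = value.strip()
    suffix = value[-1] if value and value[-1].isalpha() else ""
    number = value[: len(value) - len(suffix)]
    if suffix not in _SI_MULTIPLIERS or not number:
        raise ValueError(f"unsupported bandwidth value: {value!r}")
    try:
        amount = Decimal(number) * _SI_MULTIPLIERS[suffix]
    except InvalidOperation as exc:
        raise ValueError(f"unsupported bandwidth value: {value!r}") from exc
    if amount < 0:
        raise ValueError(f"bandwidth must not be negative: {value!r}")
    return int(amount)


# The annotations from iperf.yaml.
annotations = {
    "kubernetes.io/ingress-bandwidth": "3M",
    "kubernetes.io/egress-bandwidth": "2M",
}
rates = {name: parse_bandwidth(v) for name, v in annotations.items()}
print(rates)
# {'kubernetes.io/ingress-bandwidth': 3000000, 'kubernetes.io/egress-bandwidth': 2000000}
```

Rejecting unknown suffixes outright, rather than guessing, keeps a typo such as "3Mb" from silently becoming a different rate.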
Comment 1 Dan Williams 2016-09-23 14:00:30 EDT
The OpenShift code hasn't changed since this feature was added and it worked at that time, so confirming that it worked on 7.2 would help narrow down the cause.

Can we confirm whether or not this worked on RHEL 7.2?
Comment 2 hongli 2016-09-25 21:40:23 EDT
It works fine on RHEL 7.2.
Comment 3 Dan Williams 2016-09-27 16:16:58 EDT
I've root-caused the problem: it is a result of kernel changes made for https://bugzilla.redhat.com/show_bug.cgi?id=1152231.

Pushed an origin PR to fix: https://github.com/openshift/origin/pull/11126
Comment 4 Eric Paris 2016-09-27 23:42:48 EDT
@dcbw please make sure this is also patched in ose-3.3. Not just origin/master. Thank you Dan!
Comment 5 Jesper Brouer 2016-09-28 04:29:20 EDT
(In reply to hongli from comment #0)
> / # ifconfig eth0
> eth0      Link encap:Ethernet  HWaddr 02:42:0A:01:00:05  
>           inet addr:10.1.0.5  Bcast:0.0.0.0  Mask:255.255.255.0
>           inet6 addr: fe80::42:aff:fe01:5/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1410  Metric:1
>           RX packets:2 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:26 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
                         ^^^^^^^^^^^^
>           RX bytes:180 (180.0 B)  TX bytes:1404 (1.3 KiB)

(Thanks for providing ifconfig output)

The important info from ifconfig is the txqueuelen==0.

I guess eth0 is part of a veth pair.

It is not clear which kernel qdisc is being used.
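The txqueuelen field is the one to check in output like the pod's eth0 above. As an illustration only, this Python helper (tx_queue_len_from_ifconfig, a hypothetical name, not part of any OpenShift tooling) pulls that value out of ifconfig output. It assumes one of the two formats listed in its docstring.

```python
import re


def tx_queue_len_from_ifconfig(output: str) -> int:
    """Extract the transmit queue length from ifconfig output.

    Handles the 'txqueuelen:0' style shown in this report and the
    'txqueuelen 1000' style printed by newer net-tools builds.
    """
    match = re.search(r"txqueuelen:?\s*(\d+)", output)
    if match is None:
        raise ValueError("no txqueuelen field in ifconfig output")
    return int(match.group(1))


# Excerpt of the pod's eth0 output from the Description.
POD_ETH0 = """\
eth0      Link encap:Ethernet  HWaddr 02:42:0A:01:00:05
          UP BROADCAST RUNNING MULTICAST  MTU:1410  Metric:1
          collisions:0 txqueuelen:0
"""

qlen = tx_queue_len_from_ifconfig(POD_ETH0)
if qlen == 0:
    print("eth0 has txqueuelen 0; attaching a qdisc here is a misconfiguration")
```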
Comment 6 Jesper Brouer 2016-09-28 04:31:09 EDT
The short answer: userspace MUST set tx_queue_len on an interface that has a QoS qdisc, so DCBW's commit is the right fix:

 https://github.com/openshift/origin/pull/11126/commits/8a2fbcf4fd7530d79e2

I recommend using that fix, so that OpenShift sets a queue length.


It has *always* been a misconfiguration to add a qdisc to an interface with txqueuelen==0 (ifconfig syntax). For some qdiscs (htb, fifo, gred, plug, sfb) the kernel had workarounds that set txqueuelen=1 (2 for HTB). These workarounds were actually quite bad: things seemed to work, but worked poorly, because a queue of 1 packet is not sufficient. I'm actually happy that we backported the fix that removed[1] these workarounds, because it exposed a problem like this instead of leaving a semi-working setup. Thus, setting the queue length is a fix for OpenShift regardless of which kernel is used.
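The rule above can be sketched from the userspace side in Python. This is a minimal sketch, not the code from the origin PR: it assumes the transmit queue length is read and written through the sysfs attribute /sys/class/net/<iface>/tx_queue_len (ip link set dev <iface> txqueuelen <n> does the same job), and the helper name ensure_tx_queue_len and the 1000-packet default are assumptions. The length is set before the qdisc is attached because some qdiscs, such as pfifo, take their default limit from the queue length when they are created.

```python
import tempfile
from pathlib import Path

# Assumption: 1000 packets, the usual Ethernet default; the value used by the
# actual openshift-sdn fix is not stated in this report.
DEFAULT_TX_QUEUE_LEN = 1000


def ensure_tx_queue_len(iface: str, qlen: int = DEFAULT_TX_QUEUE_LEN,
                        sysfs_net: Path = Path("/sys/class/net")) -> int:
    """Hypothetical helper: give iface a non-zero transmit queue length.

    Call it before attaching a qdisc. Returns the queue length in effect
    afterwards. Writing the real sysfs attribute requires CAP_NET_ADMIN.
    """
    if qlen <= 0:
        raise ValueError("queue length must be positive")
    attr = sysfs_net / iface / "tx_queue_len"
    current = int(attr.read_text().strip())
    if current == 0:
        # txqueuelen 0 is the misconfiguration described above; newer kernels
        # no longer paper over it, so set a real queue length explicitly.
        attr.write_text(f"{qlen}\n")
        return qlen
    return current  # already non-zero: leave the existing setting alone


# Usage: a temporary directory stands in for /sys/class/net so the sketch
# runs without root. The fake veth0 starts with txqueuelen 0.
with tempfile.TemporaryDirectory() as tmp:
    fake_sysfs = Path(tmp)
    (fake_sysfs / "veth0").mkdir()
    (fake_sysfs / "veth0" / "tx_queue_len").write_text("0\n")
    new_len = ensure_tx_queue_len("veth0", sysfs_net=fake_sysfs)
    stored = int((fake_sysfs / "veth0" / "tx_queue_len").read_text())

print(new_len, stored)  # 1000 1000
```

Only a zero length is overwritten, so an interface that an administrator already tuned keeps its value.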
Comment 7 hongli 2016-09-28 05:40:48 EDT
Following the PR: https://github.com/openshift/origin/pull/11126

I changed openshift-sdn-ovs manually, and the pod with QoS settings now works well.
Comment 10 hongli 2016-10-09 21:46:47 EDT
The test environment for 3.3.1 is not ready yet; I will verify this as soon as it is.
Comment 11 hongli 2016-10-11 01:46:02 EDT
Verified on OpenShift 3.3.1.1 with RHEL 7.3 Beta; the bug has been fixed.
Comment 13 errata-xmlrpc 2016-10-27 11:42:23 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:2084
