Bug 1430729 - [3.5] Router fails to be deployed because the /healthz probe is blocked by iptables.
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 3.5.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Assigned To: Phil Cameron
QA Contact: zhaozhanqi
Reported: 2017-03-09 07:45 EST by Johnny Liu
Modified: 2018-06-28 03:02 EDT
Doc Type: No Doc Update
Last Closed: 2017-04-12 15:14:44 EDT
Type: Bug
External Tracker: Red Hat Product Errata RHBA-2017:0884 (normal, SHIPPED_LIVE) - Red Hat OpenShift Container Platform 3.5 RPM Release Advisory (last updated 2017-04-12 18:50:07 EDT)

Description Johnny Liu 2017-03-09 07:45:02 EST
Description of problem:
It seems that https://github.com/openshift/ose/commit/612dc5117a96e262764c3b0e574ef224252413f7 introduced this bug; please see the following details.

Version-Release number of selected component (if applicable):
atomic-openshift-3.5.0.49-1.git.0.c8e072a.el7.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Launch instances on OpenStack.
2. Set up the environment via the openshift-ansible installer.
3. Check the router pod status after installation.

Actual results:
The router fails to be deployed.
# oc get pod
router-1-zs593             0/1       CrashLoopBackOff   35         1h

# oc describe po router-1-zs593
<--snip-->
  1h	4m	27	{kubelet openshift-133.xxx}	spec.containers{router}	Normal	Created		(events with common reason combined)
  1h	4m	27	{kubelet openshift-133.xxx}	spec.containers{router}	Normal	Started		(events with common reason combined)
  1h	4m	38	{kubelet openshift-133.xxx}	spec.containers{router}	Warning	Unhealthy	Liveness probe failed: dial tcp 10.14.6.133:1936: getsockopt: no route to host
  1h	4m	5	{kubelet openshift-133.xxx}	spec.containers{router}	Warning	Unhealthy	Readiness probe failed: Get http://localhost:1936/healthz: dial tcp [::1]:1936: getsockopt: connection refused
  1h	4m	27	{kubelet openshift-133.xxx}	spec.containers{router}	Normal	Killing		(events with common reason combined)
<--snip-->

# oc get hostsubnet
NAME                               HOST                               HOST IP       SUBNET
openshift-133.xxx   openshift-133.xxx   10.14.6.133   10.129.0.0/23
openshift-137.xxx   openshift-137.xxx   10.14.6.137   10.128.0.0/23

For openshift-133.xxx, its external IP is 10.14.6.133, its internal IP is 192.168.2.108.

# iptables -L -n|more
Chain INPUT (policy ACCEPT)
num  target     prot opt source               destination         
1    KUBE-NODEPORT-NON-LOCAL  all  --  0.0.0.0/0            0.0.0.0/0            /* Ensure that non-local NodePort traffic can flow */
2    KUBE-FIREWALL  all  --  0.0.0.0/0            0.0.0.0/0           
3    ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            /* traffic from docker */
4    ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            /* traffic from SDN */
5    ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0            multiport dports 4789 /* 001 vxlan incoming */
6    ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
7    ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           
8    INPUT_direct  all  --  0.0.0.0/0            0.0.0.0/0           
9    INPUT_ZONES_SOURCE  all  --  0.0.0.0/0            0.0.0.0/0           
10   INPUT_ZONES  all  --  0.0.0.0/0            0.0.0.0/0           
11   DROP       all  --  0.0.0.0/0            0.0.0.0/0            ctstate INVALID
12   REJECT     all  --  0.0.0.0/0            0.0.0.0/0            reject-with icmp-host-prohibited

If the 12th rule (the REJECT rule) in the INPUT chain above is deleted, 10.14.6.133:1936 becomes accessible and the router is deployed successfully.
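
For reference, that rule deletion can be done with standard iptables commands; rule numbers vary per host, so list them first. This is only a temporary diagnostic step (it does not persist across a firewall reload), not the actual fix:

# iptables -L INPUT -n --line-numbers
# iptables -D INPUT 12

(The "12" must match the current number of the REJECT rule in the listing.)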

# oc get dc router -o yaml
<--snip-->
        livenessProbe:
          failureThreshold: 3
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          tcpSocket:
            port: 1936
          timeoutSeconds: 1
<--snip-->


Expected results:
The router is deployed successfully.

Additional info:
In 3.4 there is no such issue, because the 3.4 haproxy router uses the following healthz probe:
<--snip-->
        livenessProbe:
          failureThreshold: 3
          httpGet:
            host: localhost
            path: /healthz
            port: 1936
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
<--snip-->

That probe accesses localhost:1936, which is not blocked by iptables.
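
As a possible stop-gap before the fix, the router dc's liveness probe could presumably be switched back to the 3.4-style httpGet against localhost. A minimal sketch using oc patch (not from this bug report; probe values copied from the 3.4 snippet above, and it relies on strategic merge semantics to drop the tcpSocket field):

# oc patch dc/router -p '{"spec":{"template":{"spec":{"containers":[{"name":"router","livenessProbe":{"tcpSocket":null,"httpGet":{"host":"localhost","path":"/healthz","port":1936,"scheme":"HTTP"}}}]}}}}'

Editing the dc with oc edit and pasting the 3.4 probe block achieves the same result.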
Comment 1 Phil Cameron 2017-03-09 17:26:12 EST
Rolled back fix for 1405440
PR 13331
Comment 2 Phil Cameron 2017-03-13 09:42:08 EDT
PR 13331 MERGED
Comment 3 Phil Cameron 2017-03-13 09:46:14 EDT
bmeng@redhat.com
This is a rollback of a fix that didn't work properly. The original resolution documented increasing maxconn to 20000 to work around this problem.
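
For completeness, if the documented maxconn=20000 workaround is what's needed, it would typically be applied through the router deployment's environment. Sketch only; ROUTER_MAX_CONNECTIONS is assumed to be the relevant variable here and may not be available in every 3.x router image, so verify against the router image documentation first:

# oc set env dc/router ROUTER_MAX_CONNECTIONS=20000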
Comment 4 Phil Cameron 2017-03-14 09:30:49 EDT
Pull request:
https://github.com/openshift/origin/pull/13331
Comment 5 Troy Dawson 2017-03-14 10:22:44 EDT
This has been merged into OCP and is in OCP v3.5.0.52 or newer.
Comment 7 zhaozhanqi 2017-03-14 22:29:44 EDT
Verified this bug on v3.5.0.52

The router pod works well.

Checked that the Liveness and Readiness probes use http-get against localhost:

    Liveness:		http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Readiness:		http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
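
On an upgraded cluster, the same check can presumably be repeated with a one-liner (the label selector is assumed; adjust it if the router pods are labelled differently):

# oc describe pod -l deploymentconfig=router | grep -E 'Liveness|Readiness'
    Liveness:		http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Readiness:		http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3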
Comment 9 errata-xmlrpc 2017-04-12 15:14:44 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0884
