Bug 1466133
| Summary: | router pod cannot be running when set the stats-port to 0 | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | zhaozhanqi <zzhao> | 
| Component: | Networking | Assignee: | Phil Cameron <pcameron> | 
| Networking sub component: | router | QA Contact: | zhaozhanqi <zzhao> | 
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | medium | ||
| Priority: | medium | CC: | aos-bugs, bbennett, bmeng, ccoleman, zzhao | 
| Version: | 3.6.0 | ||
| Target Milestone: | --- | ||
| Target Release: | 3.7.0 | ||
| Hardware: | All | ||
| OS: | All | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | No Doc Update | |
| Doc Text: | Cause: no doc changes needed; Consequence: ; Fix: ; Result: | Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2017-11-28 21:59:33 UTC | Type: | Bug | 
| Regression: | --- | Mount Type: | --- | 
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
ROUTER_METRICS_TYPE=haproxy and statsPort=0 are not supported together. I'll fix that; it is not a release blocker.

Per Clayton: The template router command line validation should reject these options.

Closed docs PR 5446.

Revised PR 16621 to provide a valid listening port.

Commits pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/54ec92533ad37ae99887e9794b60da0391e529e3
Router stats-port=0 error

stats-port=0 properly disables statistics.

Fixes bug: 1466133
https://bugzilla.redhat.com/show_bug.cgi?id=1466133

https://github.com/openshift/origin/commit/0c514c9ec63f59a138c0a05ea2e011b0b7496953
Merge pull request #16621 from pecameron/bz1466133

Automatic merge from submit-queue.

Router stats-port=0 error

When type=haproxy-router, stats-port must not be 0.

Fixes bug: 1466133
https://bugzilla.redhat.com/show_bug.cgi?id=1466133

Tested this bug on v3.7.0-0.143.2; it can still be reproduced with the same error:
urstable
Node-Selectors:	<none>
Tolerations:	<none>
Events:
  FirstSeen	LastSeen	Count	From					SubObjectPath		Type		Reason			Message
  ---------	--------	-----	----					-------------		--------	------			-------
  8m		8m		2	default-scheduler						Warning		FailedScheduling	No nodes are available that match all of the following predicates:: PodFitsHostPorts (1).
  8m		8m		1	default-scheduler						Normal		Scheduled		Successfully assigned rotuer2-2-z6gtz to ip-172-18-13-227.ec2.internal
  8m		8m		1	kubelet, ip-172-18-13-227.ec2.internal				Normal		SuccessfulMountVolume	MountVolume.SetUp succeeded for volume "server-certificate" 
  8m		8m		1	kubelet, ip-172-18-13-227.ec2.internal				Normal		SuccessfulMountVolume	MountVolume.SetUp succeeded for volume "router-token-cp848" 
  8m		7m		3	kubelet, ip-172-18-13-227.ec2.internal	spec.containers{router}	Normal		Pulled			Container image "registry.ops.openshift.com/openshift3/ose-haproxy-router:v3.7.0-0.143.2" already present on machine
  7m		7m		2	kubelet, ip-172-18-13-227.ec2.internal	spec.containers{router}	Normal		Killing			Killing container with id docker://router:pod "rotuer2-2-z6gtz_default(8461a823-ae4a-11e7-ab8a-0e432c832c92)" container "router" is unhealthy, it will be killed and re-created.
  8m		6m		3	kubelet, ip-172-18-13-227.ec2.internal	spec.containers{router}	Normal		Created			Created container
  8m		6m		3	kubelet, ip-172-18-13-227.ec2.internal	spec.containers{router}	Normal		Started			Started container
  7m		6m		6	kubelet, ip-172-18-13-227.ec2.internal	spec.containers{router}	Warning		Unhealthy		Liveness probe failed: Get http://localhost:1936/healthz: dial tcp [::1]:1936: getsockopt: connection refused
  7m		6m		6	kubelet, ip-172-18-13-227.ec2.internal	spec.containers{router}	Warning		Unhealthy		Readiness probe failed: Get http://localhost:1936/healthz: dial tcp [::1]:1936: getsockopt: connection refused
  5m		2m		10	kubelet, ip-172-18-13-227.ec2.internal				Warning		FailedSync		Error syncing pod
[root@ip-172-18-7-84 ~]# oc version
oc v3.7.0-0.143.2
kubernetes v1.7.0+80709908fd
features: Basic-Auth GSSAPI Kerberos SPNEGO
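The rule quoted in the merge commit message for PR 16621 ("When type=haproxy-router, stats-port must not be 0") can be illustrated with a small check. This is only a sketch in Python, not the code from openshift/origin; the function name `validate_router_options` and the error wording are hypothetical, and the fix that actually shipped may behave differently.

```python
def validate_router_options(metrics_type, stats_port):
    """Return a list of validation errors; an empty list means accepted.

    Hypothetical helper: it only mirrors the rule quoted in the commit
    message, rejecting stats-port=0 when haproxy metrics are requested.
    """
    errors = []
    if metrics_type == "haproxy" and stats_port == 0:
        errors.append("stats-port must not be 0 when metrics type is haproxy")
    return errors


# The combination from this bug report is rejected.
print(validate_router_options("haproxy", 0))
# A non-zero stats port passes the check.
print(validate_router_options("haproxy", 1936))
```

Rejecting the combination at start-up would surface the problem as a clear error message rather than as repeated probe failures.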
Please verify that the router pod with commit https://github.com/openshift/origin/commit/54ec92533ad37ae99887e9794b60da0391e529e3 is running. We need to track down what is going on with this.

This works on my cluster.

        - name: STATS_PASSWORD
          value: sLzdR6SgDJ
        - name: STATS_PORT
          value: "0"
        - name: STATS_USERNAME
          value: admin

# oc rsh router-ab-17-sxwgb env | grep -e STAT -e LISTEN
STATS_PASSWORD=sLzdR6SgDJ
STATS_PORT=0
STATS_USERNAME=admin

# oc logs router-ab-17-sxwgb
I1011 12:37:50.742015       1 template.go:246] Starting template router (v3.7.0-alpha.1+5d7f1b8-859-dirty)
I1011 12:37:51.346813       1 router.go:441] Router reloaded:
 - Checking http://localhost:80 ...
 - Health check ok : 0 retry attempt(s).
I1011 12:37:51.346849       1 router.go:230] Router is including routes in all namespaces
E1011 12:37:51.549196       1 router_controller.go:174] route route-secure already exposes www.example.com and is older
E1011 12:37:51.576546       1 router_controller.go:174] route route-secure already exposes www.example.com and is older
I1011 12:37:51.610080       1 router.go:441] Router reloaded:
 - Checking http://localhost:80 ...
 - Health check ok : 0 retry attempt(s).

Changed to modified, merged 16621
Sat Oct 7 16:59:05 2017 -0700
In OSE v3.7.0-0.145.0

Thanks for your comment. I will retest this bug once v3.7.0-0.145.0 comes out.

Verified this bug on v3.7.0-0.153.0; it works well.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188
Description of problem:
From the help info:
--stats-port=1936: If the underlying router implementation can provide statistics this is a hint to expose it on this port. Specify 0 if you want to turn off exposing the statistics.

When the stats-port is set to 0, the router cannot run and fails with the error:
Readiness probe failed: Get http://localhost:1936/healthz: dial tcp [::1]:1936: getsockopt: connection refused

Version-Release number of selected component (if applicable):
openshift v3.6.126.1
kubernetes v1.6.1+5115d708d7
etcd 3.2.0

How reproducible:
always

Steps to Reproduce:
1. oadm router router2 --stats-port=0
2. oc describe pod routerxxx
3.

Actual results:
step 2:
      ROUTER_EXTERNAL_HOST_PARTITION_PATH:
      ROUTER_EXTERNAL_HOST_PASSWORD:
      ROUTER_EXTERNAL_HOST_PRIVKEY:	/etc/secret-volume/router.pem
      ROUTER_EXTERNAL_HOST_USERNAME:
      ROUTER_EXTERNAL_HOST_VXLAN_GW_CIDR:
      ROUTER_LISTEN_ADDR:	0.0.0.0:0
      ROUTER_METRICS_TYPE:	haproxy
      ROUTER_SERVICE_HTTPS_PORT:	443
      ROUTER_SERVICE_HTTP_PORT:	80
      ROUTER_SERVICE_NAME:	router2
      ROUTER_SERVICE_NAMESPACE:	default
      ROUTER_SUBDOMAIN:
      STATS_PASSWORD:	RYWpLfG6SR
      STATS_PORT:	0
      STATS_USERNAME:	admin
    Mounts:
      /etc/pki/tls/private from server-certificate (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from router-token-2vntm (ro)
Conditions:
  Type		Status
  Initialized	True
  Ready		False
  PodScheduled	True
Volumes:
  server-certificate:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	router2-certs
    Optional:	false
  router-token-2vntm:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	router-token-2vntm
    Optional:	false
QoS Class:	Burstable
Node-Selectors:	<none>
Tolerations:	<none>
Events:
  FirstSeen	LastSeen	Count	From	SubObjectPath	Type	Reason	Message
  ---------	--------	-----	----	-------------	--------	------	-------
  7m	6m	3	default-scheduler		Warning	FailedScheduling	No nodes are available that match all of the following predicates:: PodFitsHostPorts (1).
  6m	6m	1	default-scheduler		Normal	Scheduled	Successfully assigned router2-2-2jwx5 to host-8-174-59.host.centralci.eng.rdu2.redhat.com
  6m	1m	7	kubelet, host-8-174-59.host.centralci.eng.rdu2.redhat.com	spec.containers{router}	Normal	Pulled	Container image "brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/ose-haproxy-router:v3.6.126.1" already present on machine
  6m	1m	7	kubelet, host-8-174-59.host.centralci.eng.rdu2.redhat.com	spec.containers{router}	Normal	Created	Created container
  6m	1m	7	kubelet, host-8-174-59.host.centralci.eng.rdu2.redhat.com	spec.containers{router}	Normal	Started	Started container
  6m	1m	15	kubelet, host-8-174-59.host.centralci.eng.rdu2.redhat.com	spec.containers{router}	Warning	Unhealthy	Readiness probe failed: Get http://localhost:1936/healthz: dial tcp [::1]:1936: getsockopt: connection refused
  6m	1m	15	kubelet, host-8-174-59.host.centralci.eng.rdu2.redhat.com	spec.containers{router}	Warning	Unhealthy	Liveness probe failed: Get http://localhost:1936/healthz: dial tcp [::1]:1936: getsockopt: connection refused
  6m	1m	7	kubelet, host-8-174-59.host.centralci.eng.rdu2.redhat.com	spec.containers{router}	Normal	Killing	Killing container with id docker://router:pod "router2-2-2jwx5_default(2298d32b-5c78-11e7-9c5a-fa163ed597dc)" container "router" is unhealthy, it will be killed and re-created.
  4m	5s	18	kubelet, host-8-174-59.host.centralci.eng.rdu2.redhat.com	spec.containers{router}	Warning	BackOff	Back-off restarting failed container
  4m	5s	18	kubelet, host-8-174-59.host.centralci.eng.rdu2.redhat.com		Warning	FailedSync	Error syncing pod

Expected results:
The router should be running.

Additional info:
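The output in this description shows STATS_PORT set to 0 while both probes still request http://localhost:1936/healthz. A short Python sketch of how that mismatch could be spotted from a pod's environment and probe URL follows. The helper name `probe_targets_disabled_stats` is hypothetical, and the fallback of 1936 is taken from the --stats-port help text quoted in the description.

```python
from urllib.parse import urlparse


def probe_targets_disabled_stats(env, probe_url):
    """Return True when STATS_PORT is 0 but the probe URL still names a port.

    Hypothetical diagnostic helper: `env` is a dict of the container's
    environment variables as shown by `oc describe pod`.
    """
    stats_port = int(env.get("STATS_PORT", "1936"))
    probe_port = urlparse(probe_url).port
    return stats_port == 0 and probe_port is not None


# Values copied from the describe output in this report.
env = {
    "ROUTER_LISTEN_ADDR": "0.0.0.0:0",
    "ROUTER_METRICS_TYPE": "haproxy",
    "STATS_PORT": "0",
}
print(probe_targets_disabled_stats(env, "http://localhost:1936/healthz"))
```

With the values from this report the check prints True, matching the "connection refused" probe failures in the events list.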