Bug 1466133

Summary: Router pod cannot run when stats-port is set to 0
Product: OpenShift Container Platform
Reporter: zhaozhanqi <zzhao>
Component: Networking
Networking sub component: router
Assignee: Phil Cameron <pcameron>
QA Contact: zhaozhanqi <zzhao>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
CC: aos-bugs, bbennett, bmeng, ccoleman, zzhao
Version: 3.6.0
Target Release: 3.7.0
Hardware: All
OS: All
Doc Type: No Doc Update
Doc Text: no doc changes needed
Type: Bug
Last Closed: 2017-11-28 21:59:33 UTC

Description zhaozhanqi 2017-06-29 06:59:22 UTC
Description of problem:
from the help info:
      --stats-port=1936: If the underlying router implementation can provide statistics this is a hint to expose it on this port. Specify 0 if you want to turn off exposing the statistics.

When the stats-port is set to 0, the router pod cannot run and fails with the error:
Readiness probe failed: Get http://localhost:1936/healthz: dial tcp [::1]:1936: getsockopt: connection refused
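
The pod's environment (shown under "Actual results" below) ends up with ROUTER_LISTEN_ADDR=0.0.0.0:0 and STATS_PORT=0, while the generated liveness/readiness probes still target port 1936, so nothing answers them. A minimal, hypothetical Go sketch of that mismatch (not the actual origin code; the helper names are made up for illustration):

package main

import "fmt"

const defaultStatsPort = 1936

// statsListenAddr mirrors the env dump below: STATS_PORT=0 yields "0.0.0.0:0",
// so the router exposes no stats listener at all.
func statsListenAddr(statsPort int) string {
	return fmt.Sprintf("0.0.0.0:%d", statsPort)
}

// probePort models what the generated probes appear to do: fall back to the
// default stats port, so the probes still hit 1936 and get connection refused.
func probePort(statsPort int) int {
	if statsPort <= 0 {
		return defaultStatsPort
	}
	return statsPort
}

func main() {
	fmt.Println(statsListenAddr(0)) // 0.0.0.0:0
	fmt.Println(probePort(0))       // 1936
}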

Version-Release number of selected component (if applicable):
openshift v3.6.126.1
kubernetes v1.6.1+5115d708d7
etcd 3.2.0


How reproducible:
always

Steps to Reproduce:
1. oadm router router2 --stats-port=0
2. oc describe pod routerxxx
3.

Actual results:

step 2:
      ROUTER_EXTERNAL_HOST_PARTITION_PATH:	
      ROUTER_EXTERNAL_HOST_PASSWORD:		
      ROUTER_EXTERNAL_HOST_PRIVKEY:		/etc/secret-volume/router.pem
      ROUTER_EXTERNAL_HOST_USERNAME:		
      ROUTER_EXTERNAL_HOST_VXLAN_GW_CIDR:	
      ROUTER_LISTEN_ADDR:			0.0.0.0:0
      ROUTER_METRICS_TYPE:			haproxy
      ROUTER_SERVICE_HTTPS_PORT:		443
      ROUTER_SERVICE_HTTP_PORT:			80
      ROUTER_SERVICE_NAME:			router2
      ROUTER_SERVICE_NAMESPACE:			default
      ROUTER_SUBDOMAIN:				
      STATS_PASSWORD:				RYWpLfG6SR
      STATS_PORT:				0
      STATS_USERNAME:				admin
    Mounts:
      /etc/pki/tls/private from server-certificate (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from router-token-2vntm (ro)
Conditions:
  Type		Status
  Initialized 	True 
  Ready 	False 
  PodScheduled 	True 
Volumes:
  server-certificate:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	router2-certs
    Optional:	false
  router-token-2vntm:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	router-token-2vntm
    Optional:	false
QoS Class:	Burstable
Node-Selectors:	<none>
Tolerations:	<none>
Events:
  FirstSeen	LastSeen	Count	From								SubObjectPath		Type		Reason			Message
  ---------	--------	-----	----								-------------		--------	------			-------
  7m		6m		3	default-scheduler									Warning		FailedScheduling	No nodes are available that match all of the following predicates:: PodFitsHostPorts (1).
  6m		6m		1	default-scheduler									Normal		Scheduled		Successfully assigned router2-2-2jwx5 to host-8-174-59.host.centralci.eng.rdu2.redhat.com
  6m		1m		7	kubelet, host-8-174-59.host.centralci.eng.rdu2.redhat.com	spec.containers{router}	Normal		Pulled			Container image "brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/ose-haproxy-router:v3.6.126.1" already present on machine
  6m		1m		7	kubelet, host-8-174-59.host.centralci.eng.rdu2.redhat.com	spec.containers{router}	Normal		Created			Created container
  6m		1m		7	kubelet, host-8-174-59.host.centralci.eng.rdu2.redhat.com	spec.containers{router}	Normal		Started			Started container
  6m		1m		15	kubelet, host-8-174-59.host.centralci.eng.rdu2.redhat.com	spec.containers{router}	Warning		Unhealthy		Readiness probe failed: Get http://localhost:1936/healthz: dial tcp [::1]:1936: getsockopt: connection refused
  6m		1m		15	kubelet, host-8-174-59.host.centralci.eng.rdu2.redhat.com	spec.containers{router}	Warning		Unhealthy		Liveness probe failed: Get http://localhost:1936/healthz: dial tcp [::1]:1936: getsockopt: connection refused
  6m		1m		7	kubelet, host-8-174-59.host.centralci.eng.rdu2.redhat.com	spec.containers{router}	Normal		Killing			Killing container with id docker://router:pod "router2-2-2jwx5_default(2298d32b-5c78-11e7-9c5a-fa163ed597dc)" container "router" is unhealthy, it will be killed and re-created.
  4m		5s		18	kubelet, host-8-174-59.host.centralci.eng.rdu2.redhat.com	spec.containers{router}	Warning		BackOff			Back-off restarting failed container
  4m		5s		18	kubelet, host-8-174-59.host.centralci.eng.rdu2.redhat.com				Warning		FailedSync		Error syncing pod


Expected results:

The router pod should be running.

Additional info:

Comment 1 Clayton Coleman 2017-06-29 14:56:18 UTC
ROUTER_METRICS_TYPE=haproxy and statsPort=0 are not supported together.  I'll fix that, not a release blocker.

Comment 2 Ben Bennett 2017-09-15 20:14:17 UTC
Per Clayton: The template router command line validation should reject these options.
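
A minimal sketch of the kind of command line validation being suggested here (illustrative only, not the code from PR 16621; the function name is made up):

package main

import (
	"errors"
	"fmt"
)

// validateStatsPort rejects the unsupported combination from comments 1 and 2:
// haproxy metrics together with --stats-port=0.
func validateStatsPort(metricsType string, statsPort int) error {
	if metricsType == "haproxy" && statsPort == 0 {
		return errors.New("--stats-port=0 is not supported with ROUTER_METRICS_TYPE=haproxy; keep the default stats port (1936) or disable haproxy metrics")
	}
	return nil
}

func main() {
	if err := validateStatsPort("haproxy", 0); err != nil {
		fmt.Println("error:", err)
	}
}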

Comment 3 Phil Cameron 2017-09-29 19:48:49 UTC
PR 16621 
https://github.com/openshift/origin/pull/16621

Comment 4 Phil Cameron 2017-10-02 15:36:54 UTC
Docs PR 5446
https://github.com/openshift/openshift-docs/pull/5446

Comment 5 Phil Cameron 2017-10-02 20:32:47 UTC
Closed docs PR 5446
Revised PR 16621 to provide a valid listening port.

Comment 6 openshift-github-bot 2017-10-07 23:59:09 UTC
Commits pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/54ec92533ad37ae99887e9794b60da0391e529e3
Router stats-port=0 error

stats-port=0 properly disables statistics.

Fixes bug: 1466133
https://bugzilla.redhat.com/show_bug.cgi?id=1466133

https://github.com/openshift/origin/commit/0c514c9ec63f59a138c0a05ea2e011b0b7496953
Merge pull request #16621 from pecameron/bz1466133

Automatic merge from submit-queue.

Router stats-port=0 error

When type=haproxy-router, stats-port must not be 0.

Fixes bug: 1466133
https://bugzilla.redhat.com/show_bug.cgi?id=1466133

Comment 7 zhaozhanqi 2017-10-11 06:34:24 UTC
Tested this bug on v3.7.0-0.143.2.

It can still be reproduced with the same error:

QoS Class:	Burstable
Node-Selectors:	<none>
Tolerations:	<none>
Events:
  FirstSeen	LastSeen	Count	From					SubObjectPath		Type		Reason			Message
  ---------	--------	-----	----					-------------		--------	------			-------
  8m		8m		2	default-scheduler						Warning		FailedScheduling	No nodes are available that match all of the following predicates:: PodFitsHostPorts (1).
  8m		8m		1	default-scheduler						Normal		Scheduled		Successfully assigned rotuer2-2-z6gtz to ip-172-18-13-227.ec2.internal
  8m		8m		1	kubelet, ip-172-18-13-227.ec2.internal				Normal		SuccessfulMountVolume	MountVolume.SetUp succeeded for volume "server-certificate" 
  8m		8m		1	kubelet, ip-172-18-13-227.ec2.internal				Normal		SuccessfulMountVolume	MountVolume.SetUp succeeded for volume "router-token-cp848" 
  8m		7m		3	kubelet, ip-172-18-13-227.ec2.internal	spec.containers{router}	Normal		Pulled			Container image "registry.ops.openshift.com/openshift3/ose-haproxy-router:v3.7.0-0.143.2" already present on machine
  7m		7m		2	kubelet, ip-172-18-13-227.ec2.internal	spec.containers{router}	Normal		Killing			Killing container with id docker://router:pod "rotuer2-2-z6gtz_default(8461a823-ae4a-11e7-ab8a-0e432c832c92)" container "router" is unhealthy, it will be killed and re-created.
  8m		6m		3	kubelet, ip-172-18-13-227.ec2.internal	spec.containers{router}	Normal		Created			Created container
  8m		6m		3	kubelet, ip-172-18-13-227.ec2.internal	spec.containers{router}	Normal		Started			Started container
  7m		6m		6	kubelet, ip-172-18-13-227.ec2.internal	spec.containers{router}	Warning		Unhealthy		Liveness probe failed: Get http://localhost:1936/healthz: dial tcp [::1]:1936: getsockopt: connection refused
  7m		6m		6	kubelet, ip-172-18-13-227.ec2.internal	spec.containers{router}	Warning		Unhealthy		Readiness probe failed: Get http://localhost:1936/healthz: dial tcp [::1]:1936: getsockopt: connection refused
  5m		2m		10	kubelet, ip-172-18-13-227.ec2.internal				Warning		FailedSync		Error syncing pod
[root@ip-172-18-7-84 ~]# oc version
oc v3.7.0-0.143.2
kubernetes v1.7.0+80709908fd
features: Basic-Auth GSSAPI Kerberos SPNEGO

Comment 8 Phil Cameron 2017-10-11 12:43:41 UTC
Please verify that a router pod built from commit
https://github.com/openshift/origin/commit/54ec92533ad37ae99887e9794b60da0391e529e3
is running. We need to track down what is going on here.


This works on my cluster.
        - name: STATS_PASSWORD
          value: sLzdR6SgDJ
        - name: STATS_PORT
          value: "0"
        - name: STATS_USERNAME
          value: admin

# oc rsh router-ab-17-sxwgb env | grep -e STAT -e LISTEN
STATS_PASSWORD=sLzdR6SgDJ
STATS_PORT=0
STATS_USERNAME=admin


# oc logs router-ab-17-sxwgb
I1011 12:37:50.742015       1 template.go:246] Starting template router (v3.7.0-alpha.1+5d7f1b8-859-dirty)
I1011 12:37:51.346813       1 router.go:441] Router reloaded:
 - Checking http://localhost:80 ...
 - Health check ok : 0 retry attempt(s).
I1011 12:37:51.346849       1 router.go:230] Router is including routes in all namespaces
E1011 12:37:51.549196       1 router_controller.go:174] route route-secure already exposes www.example.com and is older
E1011 12:37:51.576546       1 router_controller.go:174] route route-secure already exposes www.example.com and is older
I1011 12:37:51.610080       1 router.go:441] Router reloaded:
 - Checking http://localhost:80 ...
 - Health check ok : 0 retry attempt(s).

Comment 11 Phil Cameron 2017-10-12 15:01:23 UTC
Changed to MODIFIED. PR 16621 merged Sat Oct 7 16:59:05 2017 -0700.
In OSE v3.7.0-0.145.0.

Comment 12 zhaozhanqi 2017-10-13 02:20:28 UTC
Thanks for your comment. I will retest this bug once v3.7.0-0.145.0 comes out.

Comment 13 zhaozhanqi 2017-10-16 07:04:35 UTC
Verified this bug on v3.7.0-0.153.0; it works well.

Comment 17 errata-xmlrpc 2017-11-28 21:59:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188