Bug 1572699 - router liveness/readiness failed to check http://localhost:1936/healthz
Summary: router liveness/readiness failed to check http://localhost:1936/healthz
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.6.0
Hardware: Unspecified
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.6.z
Assignee: Ben Bennett
QA Contact: Meng Bo
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-04-27 15:39 UTC by DeShuai Ma
Modified: 2018-04-27 17:58 UTC
CC List: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-04-27 17:58:03 UTC
Target Upstream Version:
Embargoed:



Description DeShuai Ma 2018-04-27 15:39:11 UTC
Description of problem:
While verifying a 3.6 hotfix, I found that the router's liveness and readiness probes fail. Both are configured as:
    Liveness:   http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
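For anyone reproducing this by hand: an http-get probe like the ones above is, roughly, an HTTP GET that the kubelet counts as a success for any status code from 200 through 399 and as a failure otherwise, including when the connection is refused. With #failure=3, three consecutive liveness failures get the container killed and re-created, which matches the restarts shown below. The following is a minimal Python sketch of that check, not the kubelet's code. `http_probe` and `should_restart` are hypothetical helper names. Because the router runs with host networking, the sketch would be run on the router's node.

```python
from __future__ import annotations

import urllib.error
import urllib.request

# Probes do not go through an HTTP proxy, so disable proxy handling explicitly.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def http_probe(url: str, timeout: float = 1.0) -> bool:
    """Approximate an http-get probe: success means a 2xx or 3xx status."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; judge them by their status code.
        return 200 <= err.code < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, name resolution failure, or timeout.
        return False


def should_restart(results: list[bool], failure_threshold: int = 3) -> bool:
    """Liveness semantics (sketch): restart once the last N results all failed."""
    if len(results) < failure_threshold:
        return False
    return not any(results[-failure_threshold:])
```

For example, `http_probe("http://localhost:1936/healthz")` run on the router's node would be expected to return False for as long as nothing accepts connections on port 1936.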


Version-Release number of selected component (if applicable):
oc v3.6.112
kubernetes v1.6.1+5115d708d7
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://dma-master-nfs-1:8443
openshift v3.6.112
kubernetes v1.6.1+5115d708d7

How reproducible:
Always

Steps to Reproduce:
1. Deploy the router and check its status:
[root@dma-master-nfs-1 ~]# oc get po
NAME                      READY     STATUS    RESTARTS   AGE
docker-registry-4-rv7x9   1/1       Running   0          51m
router-1-deploy           0/1       Error     0          41m
router-2-deploy           0/1       Error     0          16m
router-3-1gnwb            0/1       Running   1          46s
router-3-deploy           1/1       Running   0          50s
[root@dma-master-nfs-1 ~]# oc logs router-3-1gnwb
I0427 15:07:11.148681       1 template.go:246] Starting template router (v3.6.173.0.112)
I0427 15:07:11.183136       1 metrics.go:43] Router health and metrics port listening at 0.0.0.0:1935
I0427 15:07:11.293812       1 router.go:554] Router reloaded:
 - Checking http://localhost:80 ...
 - Health check ok : 0 retry attempt(s).
I0427 15:07:11.293867       1 router.go:240] Router is including routes in all namespaces
I0427 15:07:11.395950       1 router.go:554] Router reloaded:
 - Checking http://localhost:80 ...
 - Health check ok : 0 retry attempt(s).
[root@dma-master-nfs-1 ~]#
[root@dma-master-nfs-1 ~]# oc describe po router-3-1gnwb
Name:                   router-3-1gnwb
Namespace:              default
Security Policy:        hostnetwork
Node:                   dma-node-registry-router-1/10.1.2.5
Start Time:             Fri, 27 Apr 2018 15:06:28 +0000
Labels:                 deployment=router-3
                        deploymentconfig=router
                        router=router
Annotations:            kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"default","name":"router-3","uid":"8c7ee964-4a2c-11e8-9f73-000d3a948cd7...
                        openshift.io/deployment-config.latest-version=3
                        openshift.io/deployment-config.name=router
                        openshift.io/deployment.name=router-3
                        openshift.io/scc=hostnetwork
Status:                 Running
IP:                     10.1.2.5
Controllers:            ReplicationController/router-3
Containers:
  router:
    Container ID:       docker://559abf68c7d4d2131196ee9376d8020abe8b4f923b329e1bde319fefb2eb4464
    Image:              registry.access.redhat.com/openshift3/ose-haproxy-router:v3.6
    Image ID:           docker-pullable://registry.access.redhat.com/openshift3/ose-haproxy-router@sha256:5fb9607463b80b1e0bda1966f37f7b038399ab8188c38fee348fbeeb06ecee0d
    Ports:              80/TCP, 443/TCP, 1936/TCP, 1935/TCP
    State:              Running
      Started:          Fri, 27 Apr 2018 15:07:10 +0000
    Last State:         Terminated
      Reason:           Error
      Exit Code:        2
      Started:          Fri, 27 Apr 2018 15:06:32 +0000
      Finished:         Fri, 27 Apr 2018 15:07:08 +0000
    Ready:              False
    Restart Count:      1
    Requests:
      cpu:      100m
      memory:   256Mi
    Liveness:   http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      DEFAULT_CERTIFICATE_DIR:                  /etc/pki/tls/private
      ROUTER_CIPHERS:                          
      ROUTER_EXTERNAL_HOST_HOSTNAME:           
      ROUTER_EXTERNAL_HOST_HTTPS_VSERVER:      
      ROUTER_EXTERNAL_HOST_HTTP_VSERVER:       
      ROUTER_EXTERNAL_HOST_INSECURE:            false
      ROUTER_EXTERNAL_HOST_INTERNAL_ADDRESS:   
      ROUTER_EXTERNAL_HOST_PARTITION_PATH:     
      ROUTER_EXTERNAL_HOST_PASSWORD:           
      ROUTER_EXTERNAL_HOST_PRIVKEY:             /etc/secret-volume/router.pem
      ROUTER_EXTERNAL_HOST_USERNAME:           
      ROUTER_EXTERNAL_HOST_VXLAN_GW_CIDR:      
      ROUTER_LISTEN_ADDR:                       0.0.0.0:1935
      ROUTER_METRICS_TYPE:                      haproxy
      ROUTER_SERVICE_HTTPS_PORT:                443
      ROUTER_SERVICE_HTTP_PORT:                 80
      ROUTER_SERVICE_NAME:                      router
      ROUTER_SERVICE_NAMESPACE:                 default
      ROUTER_SUBDOMAIN:                        
      STATS_PASSWORD:                           EloFJXtMXO
      STATS_PORT:                               1936
      STATS_USERNAME:                           admin
    Mounts:
      /etc/pki/tls/private from server-certificate (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from router-token-jtk8f (ro)
Conditions:
  Type          Status
  Initialized   True
  Ready         False
  PodScheduled  True
Volumes:
  server-certificate:
    Type:       Secret (a volume populated by a Secret)
    SecretName: router-certs
    Optional:   false
  router-token-jtk8f:
    Type:       Secret (a volume populated by a Secret)
    SecretName: router-token-jtk8f
    Optional:   false
QoS Class:      Burstable
Node-Selectors: <none>
Tolerations:    <none>
Events:
  FirstSeen     LastSeen        Count   From                                    SubObjectPath           Type            Reason          Message
  ---------     --------        -----   ----                                    -------------           --------        ------          -------
  1m            1m              1       default-scheduler                                               Normal          Scheduled       Successfully assigned router-3-1gnwb to dma-node-registry-router-1
  1m            1m              1       kubelet, dma-node-registry-router-1     spec.containers{router} Normal          Created         Created container with id 68507389ed10cd7ec1aab641d4e18021013bb1406c2af31eacef60e88c85cc2c
  1m            1m              1       kubelet, dma-node-registry-router-1     spec.containers{router} Normal          Started         Started container with id 68507389ed10cd7ec1aab641d4e18021013bb1406c2af31eacef60e88c85cc2c
  1m            25s             2       kubelet, dma-node-registry-router-1     spec.containers{router} Normal          Pulled          Container image "registry.access.redhat.com/openshift3/ose-haproxy-router:v3.6" already present on machine
  25s           25s             1       kubelet, dma-node-registry-router-1     spec.containers{router} Normal          Killing         Killing container with id docker://68507389ed10cd7ec1aab641d4e18021013bb1406c2af31eacef60e88c85cc2c:pod "router-3-1gnwb_default(8f68dde8-4a2c-11e8-9f73-000d3a948cd7)" container "router" is unhealthy, it will be killed and re-created.
  23s           23s             1       kubelet, dma-node-registry-router-1     spec.containers{router} Normal          Created         Created container with id 559abf68c7d4d2131196ee9376d8020abe8b4f923b329e1bde319fefb2eb4464
  23s           23s             1       kubelet, dma-node-registry-router-1     spec.containers{router} Normal          Started         Started container with id 559abf68c7d4d2131196ee9376d8020abe8b4f923b329e1bde319fefb2eb4464
  45s           5s              4       kubelet, dma-node-registry-router-1     spec.containers{router} Warning         Unhealthy       Liveness probe failed: Get http://localhost:1936/healthz: dial tcp [::1]:1936: getsockopt: connection refused
  45s           5s              4       kubelet, dma-node-registry-router-1     spec.containers{router} Warning         Unhealthy       Readiness probe failed: Get http://localhost:1936/healthz: dial tcp [::1]:1936: getsockopt: connection refused
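One quick consistency check on the output above: the router log reports its health and metrics listener on 0.0.0.0:1935 (matching ROUTER_LISTEN_ADDR), while both probes target port 1936 (STATS_PORT) and get "connection refused" there. Below is a small Python sketch that extracts the two ports from `oc logs` and `oc describe` text and flags a mismatch. The helper names and regular expressions are assumptions fitted to the output format shown here; this is not an OpenShift tool.

```python
import re
from typing import Optional, Tuple
from urllib.parse import urlparse

# Matches e.g. "Router health and metrics port listening at 0.0.0.0:1935"
_LISTEN_RE = re.compile(r"listening at (\S+)")
# Matches e.g. "Liveness:   http-get http://localhost:1936/healthz delay=10s ..."
_PROBE_RE = re.compile(r"http-get (\S+)")


def listen_port(log_text: str) -> Optional[int]:
    """Port from the router's 'listening at host:port' log line, if present."""
    m = _LISTEN_RE.search(log_text)
    if m is None:
        return None
    _, _, port = m.group(1).rpartition(":")
    return int(port) if port.isdigit() else None


def probe_port(describe_text: str) -> Optional[int]:
    """Port of the first http-get probe URL in `oc describe` output."""
    m = _PROBE_RE.search(describe_text)
    return urlparse(m.group(1)).port if m else None


def ports_match(log_text: str, describe_text: str) -> Tuple[Optional[int], Optional[int], bool]:
    """Return (listen port, probe port, whether they agree)."""
    listen, probe = listen_port(log_text), probe_port(describe_text)
    return listen, probe, listen is not None and listen == probe
```

Fed the log line and probe line above, this would report `(1935, 1936, False)`. A mismatch is only a lead, not a diagnosis. This report was ultimately resolved by identifying that the wrong build had been deployed.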

Actual results:


Expected results:


Additional info:

Comment 2 DeShuai Ma 2018-04-27 16:04:47 UTC
Deployed another router; its pod logs are below:
[root@dma-master-nfs-1 ~]# oc get po 
NAME                      READY     STATUS    RESTARTS   AGE
docker-registry-4-rv7x9   1/1       Running   0          1h
router-1-deploy           0/1       Error     0          1h
router-2-deploy           0/1       Error     0          1h
router-3-deploy           0/1       Error     0          57m
router-4-deploy           0/1       Error     0          44m
router-5-deploy           0/1       Error     0          33m
router-6-6nk5j            0/1       Running   1          45s
router-6-deploy           1/1       Running   0          50s
[root@dma-master-nfs-1 ~]# oc logs router-6-6nk5j
I0427 16:03:42.617428       1 template.go:246] Starting template router (v3.6.173.0.96)
I0427 16:03:42.647580       1 metrics.go:43] Router health and metrics port listening at 0.0.0.0:1935
I0427 16:03:42.844955       1 router.go:554] Router reloaded:
 - Checking http://localhost:80 ...
 - Health check ok : 0 retry attempt(s).
I0427 16:03:42.845010       1 router.go:240] Router is including routes in all namespaces
I0427 16:03:43.047871       1 router.go:554] Router reloaded:
 - Checking http://localhost:80 ...
 - Health check ok : 0 retry attempt(s).

Comment 3 DeShuai Ma 2018-04-27 16:30:36 UTC
After removing '::1         localhost localhost.localdomain localhost6 localhost6.localdomain6' from /etc/hosts and restarting the master and node, I redeployed the router and still got the same error.

My OS is RHEL 7.5; could that be related?

// on the node:
[root@dma-node-registry-router-1 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4

//master node
[root@dma-master-nfs-1 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4

// rsh into the router container to check /etc/hosts, then check the router
[root@dma-master-nfs-1 ~]# oc rsh router-9-p6nkp
sh-4.2$ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
sh-4.2$ exit
exit
[root@dma-master-nfs-1 ~]# oc get po 
NAME                      READY     STATUS    RESTARTS   AGE
docker-registry-4-rv7x9   1/1       Running   0          2h
router-1-deploy           0/1       Error     0          2h
router-2-deploy           0/1       Error     0          1h
router-3-deploy           0/1       Error     0          1h
router-4-deploy           0/1       Error     0          1h
router-5-deploy           0/1       Error     0          56m
router-9-deploy           1/1       Running   0          56s
router-9-p6nkp            0/1       Running   1          51s
[root@dma-master-nfs-1 ~]# oc describe po router-9-p6nkp
Name:			router-9-p6nkp
Namespace:		default
Security Policy:	hostnetwork
Node:			dma-master-nfs-1/10.1.2.4
Start Time:		Fri, 27 Apr 2018 16:26:01 +0000
Labels:			deployment=router-9
			deploymentconfig=router
			router=router
Annotations:		kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"default","name":"router-9","uid":"a965057d-4a37-11e8-bc04-000d3a948cd7...
			openshift.io/deployment-config.latest-version=9
			openshift.io/deployment-config.name=router
			openshift.io/deployment.name=router-9
			openshift.io/scc=hostnetwork
Status:			Running
IP:			10.1.2.4
Controllers:		ReplicationController/router-9
Containers:
  router:
    Container ID:	docker://4fb7175cdf64ff10b06dd5c70a05f53f16f31a43d52d14262cf4856816002a4d
    Image:		registry.access.redhat.com/openshift3/ose-haproxy-router:v3.6.173.0.96
    Image ID:		docker-pullable://registry.access.redhat.com/openshift3/ose-haproxy-router@sha256:d08bfa25f21c74a21d0cd14b96c538a8dc8cec0a69b6df1eecac2bdb0b6d8d44
    Ports:		80/TCP, 443/TCP, 1936/TCP, 1935/TCP
    State:		Running
      Started:		Fri, 27 Apr 2018 16:26:43 +0000
    Last State:		Terminated
      Reason:		Error
      Exit Code:	2
      Started:		Fri, 27 Apr 2018 16:26:05 +0000
      Finished:		Fri, 27 Apr 2018 16:26:41 +0000
    Ready:		False
    Restart Count:	1
    Requests:
      cpu:	100m
      memory:	256Mi
    Liveness:	http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Readiness:	http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      DEFAULT_CERTIFICATE_DIR:			/etc/pki/tls/private
      ROUTER_CIPHERS:				
      ROUTER_EXTERNAL_HOST_HOSTNAME:		
      ROUTER_EXTERNAL_HOST_HTTPS_VSERVER:	
      ROUTER_EXTERNAL_HOST_HTTP_VSERVER:	
      ROUTER_EXTERNAL_HOST_INSECURE:		false
      ROUTER_EXTERNAL_HOST_INTERNAL_ADDRESS:	
      ROUTER_EXTERNAL_HOST_PARTITION_PATH:	
      ROUTER_EXTERNAL_HOST_PASSWORD:		
      ROUTER_EXTERNAL_HOST_PRIVKEY:		/etc/secret-volume/router.pem
      ROUTER_EXTERNAL_HOST_USERNAME:		
      ROUTER_EXTERNAL_HOST_VXLAN_GW_CIDR:	
      ROUTER_LISTEN_ADDR:			0.0.0.0:1935
      ROUTER_METRICS_TYPE:			haproxy
      ROUTER_SERVICE_HTTPS_PORT:		443
      ROUTER_SERVICE_HTTP_PORT:			80
      ROUTER_SERVICE_NAME:			router
      ROUTER_SERVICE_NAMESPACE:			default
      ROUTER_SUBDOMAIN:				
      STATS_PASSWORD:				EloFJXtMXO
      STATS_PORT:				1936
      STATS_USERNAME:				admin
    Mounts:
      /etc/pki/tls/private from server-certificate (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from router-token-jtk8f (ro)
Conditions:
  Type		Status
  Initialized 	True 
  Ready 	False 
  PodScheduled 	True 
Volumes:
  server-certificate:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	router-certs
    Optional:	false
  router-token-jtk8f:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	router-token-jtk8f
    Optional:	false
QoS Class:	Burstable
Node-Selectors:	<none>
Tolerations:	<none>
Events:
  FirstSeen	LastSeen	Count	From				SubObjectPath		Type		Reason		Message
  ---------	--------	-----	----				-------------		--------	------		-------
  1m		1m		1	default-scheduler					Normal		Scheduled	Successfully assigned router-9-p6nkp to dma-master-nfs-1
  59s		59s		1	kubelet, dma-master-nfs-1	spec.containers{router}	Normal		Created		Created container with id 33f8f902bd45c509614dbe5daa9464749baed4f14b11c53caa5ab755cbb28a64
  59s		59s		1	kubelet, dma-master-nfs-1	spec.containers{router}	Normal		Started		Started container with id 33f8f902bd45c509614dbe5daa9464749baed4f14b11c53caa5ab755cbb28a64
  1m		23s		2	kubelet, dma-master-nfs-1	spec.containers{router}	Normal		Pulled		Container image "registry.access.redhat.com/openshift3/ose-haproxy-router:v3.6.173.0.96" already present on machine
  23s		23s		1	kubelet, dma-master-nfs-1	spec.containers{router}	Normal		Killing		Killing container with id docker://33f8f902bd45c509614dbe5daa9464749baed4f14b11c53caa5ab755cbb28a64:pod "router-9-p6nkp_default(ac5e9e29-4a37-11e8-bc04-000d3a948cd7)" container "router" is unhealthy, it will be killed and re-created.
  22s		22s		1	kubelet, dma-master-nfs-1	spec.containers{router}	Normal		Created		Created container with id 4fb7175cdf64ff10b06dd5c70a05f53f16f31a43d52d14262cf4856816002a4d
  21s		21s		1	kubelet, dma-master-nfs-1	spec.containers{router}	Normal		Started		Started container with id 4fb7175cdf64ff10b06dd5c70a05f53f16f31a43d52d14262cf4856816002a4d
  43s		3s		4	kubelet, dma-master-nfs-1	spec.containers{router}	Warning		Unhealthy	Liveness probe failed: Get http://localhost:1936/healthz: dial tcp 127.0.0.1:1936: getsockopt: connection refused
  43s		3s		4	kubelet, dma-master-nfs-1	spec.containers{router}	Warning		Unhealthy	Readiness probe failed: Get http://localhost:1936/healthz: dial tcp 127.0.0.1:1936: getsockopt: connection refused
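The two describe outputs differ in one respect. With the ::1 entry in the node's /etc/hosts, the probe dialed [::1]:1936. After the entry was removed, it dialed 127.0.0.1:1936. Both attempts were refused. That suggests localhost resolution was not the problem and that nothing was accepting connections on port 1936 on either loopback address.

Below is a minimal Python sketch for checking this on a node. It lists the TCP addresses `localhost` resolves to through the system resolver and tries to connect to each one. The helper name is hypothetical, and the kubelet's Go resolver may order or filter addresses differently.

```python
import socket
from typing import List, Tuple


def loopback_listeners(port: int, host: str = "localhost") -> List[Tuple[str, bool]]:
    """For each address `host` resolves to, report whether a TCP connect to `port` succeeds."""
    results: List[Tuple[str, bool]] = []
    seen = set()
    for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        addr = sockaddr[0]
        if addr in seen:
            continue  # getaddrinfo can return the same address more than once
        seen.add(addr)
        try:
            with socket.socket(family, socktype, proto) as s:
                s.settimeout(1.0)  # same 1s budget as the probes' timeout=1s
                accepting = s.connect_ex(sockaddr) == 0
        except OSError:
            # e.g. IPv6 disabled on this host, or a connect timeout
            accepting = False
        results.append((addr, accepting))
    return results
```

Run on the router's node as `loopback_listeners(1936)`. A result where every address shows False would be consistent with the refusals above.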

Comment 4 Ben Bennett 2018-04-27 17:58:03 UTC
The problem was that they had built the wrong version.  v3.6.112 was built, but v3.6.173.0.113-1 is the latest.

