Bug 1572699
| Summary: | router liveness/readiness failed to check http://localhost:1936/healthz | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | DeShuai Ma <dma> |
| Component: | Networking | Assignee: | Ben Bennett <bbennett> |
| Status: | CLOSED NOTABUG | QA Contact: | Meng Bo <bmeng> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 3.6.0 | CC: | aos-bugs, mifiedle |
| Target Milestone: | --- | ||
| Target Release: | 3.6.z | ||
| Hardware: | Unspecified | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2018-04-27 17:58:03 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Deployed another router and added the oc logs output:
[root@dma-master-nfs-1 ~]# oc get po
NAME READY STATUS RESTARTS AGE
docker-registry-4-rv7x9 1/1 Running 0 1h
router-1-deploy 0/1 Error 0 1h
router-2-deploy 0/1 Error 0 1h
router-3-deploy 0/1 Error 0 57m
router-4-deploy 0/1 Error 0 44m
router-5-deploy 0/1 Error 0 33m
router-6-6nk5j 0/1 Running 1 45s
router-6-deploy 1/1 Running 0 50s
[root@dma-master-nfs-1 ~]# oc logs router-6-6nk5j
I0427 16:03:42.617428 1 template.go:246] Starting template router (v3.6.173.0.96)
I0427 16:03:42.647580 1 metrics.go:43] Router health and metrics port listening at 0.0.0.0:1935
I0427 16:03:42.844955 1 router.go:554] Router reloaded:
- Checking http://localhost:80 ...
- Health check ok : 0 retry attempt(s).
I0427 16:03:42.845010 1 router.go:240] Router is including routes in all namespaces
I0427 16:03:43.047871 1 router.go:554] Router reloaded:
- Checking http://localhost:80 ...
- Health check ok : 0 retry attempt(s).
After removing '::1 localhost localhost.localdomain localhost6 localhost6.localdomain6' from /etc/hosts, restarting the master/node, and redeploying the router, the error is still the same.
My OS is RHEL 7.5. Is that related?
//on node:
[root@dma-node-registry-router-1 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
//master node
[root@dma-master-nfs-1 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
//rsh into the router container and check /etc/hosts
[root@dma-master-nfs-1 ~]# oc rsh router-9-p6nkp
sh-4.2$ cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
sh-4.2$ exit
exit
[root@dma-master-nfs-1 ~]# oc get po
NAME READY STATUS RESTARTS AGE
docker-registry-4-rv7x9 1/1 Running 0 2h
router-1-deploy 0/1 Error 0 2h
router-2-deploy 0/1 Error 0 1h
router-3-deploy 0/1 Error 0 1h
router-4-deploy 0/1 Error 0 1h
router-5-deploy 0/1 Error 0 56m
router-9-deploy 1/1 Running 0 56s
router-9-p6nkp 0/1 Running 1 51s
[root@dma-master-nfs-1 ~]# oc describe po router-9-p6nkp
Name: router-9-p6nkp
Namespace: default
Security Policy: hostnetwork
Node: dma-master-nfs-1/10.1.2.4
Start Time: Fri, 27 Apr 2018 16:26:01 +0000
Labels: deployment=router-9
deploymentconfig=router
router=router
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"default","name":"router-9","uid":"a965057d-4a37-11e8-bc04-000d3a948cd7...
openshift.io/deployment-config.latest-version=9
openshift.io/deployment-config.name=router
openshift.io/deployment.name=router-9
openshift.io/scc=hostnetwork
Status: Running
IP: 10.1.2.4
Controllers: ReplicationController/router-9
Containers:
router:
Container ID: docker://4fb7175cdf64ff10b06dd5c70a05f53f16f31a43d52d14262cf4856816002a4d
Image: registry.access.redhat.com/openshift3/ose-haproxy-router:v3.6.173.0.96
Image ID: docker-pullable://registry.access.redhat.com/openshift3/ose-haproxy-router@sha256:d08bfa25f21c74a21d0cd14b96c538a8dc8cec0a69b6df1eecac2bdb0b6d8d44
Ports: 80/TCP, 443/TCP, 1936/TCP, 1935/TCP
State: Running
Started: Fri, 27 Apr 2018 16:26:43 +0000
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Fri, 27 Apr 2018 16:26:05 +0000
Finished: Fri, 27 Apr 2018 16:26:41 +0000
Ready: False
Restart Count: 1
Requests:
cpu: 100m
memory: 256Mi
Liveness: http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
Environment:
DEFAULT_CERTIFICATE_DIR: /etc/pki/tls/private
ROUTER_CIPHERS:
ROUTER_EXTERNAL_HOST_HOSTNAME:
ROUTER_EXTERNAL_HOST_HTTPS_VSERVER:
ROUTER_EXTERNAL_HOST_HTTP_VSERVER:
ROUTER_EXTERNAL_HOST_INSECURE: false
ROUTER_EXTERNAL_HOST_INTERNAL_ADDRESS:
ROUTER_EXTERNAL_HOST_PARTITION_PATH:
ROUTER_EXTERNAL_HOST_PASSWORD:
ROUTER_EXTERNAL_HOST_PRIVKEY: /etc/secret-volume/router.pem
ROUTER_EXTERNAL_HOST_USERNAME:
ROUTER_EXTERNAL_HOST_VXLAN_GW_CIDR:
ROUTER_LISTEN_ADDR: 0.0.0.0:1935
ROUTER_METRICS_TYPE: haproxy
ROUTER_SERVICE_HTTPS_PORT: 443
ROUTER_SERVICE_HTTP_PORT: 80
ROUTER_SERVICE_NAME: router
ROUTER_SERVICE_NAMESPACE: default
ROUTER_SUBDOMAIN:
STATS_PASSWORD: EloFJXtMXO
STATS_PORT: 1936
STATS_USERNAME: admin
Mounts:
/etc/pki/tls/private from server-certificate (ro)
/var/run/secrets/kubernetes.io/serviceaccount from router-token-jtk8f (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
server-certificate:
Type: Secret (a volume populated by a Secret)
SecretName: router-certs
Optional: false
router-token-jtk8f:
Type: Secret (a volume populated by a Secret)
SecretName: router-token-jtk8f
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
1m 1m 1 default-scheduler Normal Scheduled Successfully assigned router-9-p6nkp to dma-master-nfs-1
59s 59s 1 kubelet, dma-master-nfs-1 spec.containers{router} Normal Created Created container with id 33f8f902bd45c509614dbe5daa9464749baed4f14b11c53caa5ab755cbb28a64
59s 59s 1 kubelet, dma-master-nfs-1 spec.containers{router} Normal Started Started container with id 33f8f902bd45c509614dbe5daa9464749baed4f14b11c53caa5ab755cbb28a64
1m 23s 2 kubelet, dma-master-nfs-1 spec.containers{router} Normal Pulled Container image "registry.access.redhat.com/openshift3/ose-haproxy-router:v3.6.173.0.96" already present on machine
23s 23s 1 kubelet, dma-master-nfs-1 spec.containers{router} Normal Killing Killing container with id docker://33f8f902bd45c509614dbe5daa9464749baed4f14b11c53caa5ab755cbb28a64:pod "router-9-p6nkp_default(ac5e9e29-4a37-11e8-bc04-000d3a948cd7)" container "router" is unhealthy, it will be killed and re-created.
22s 22s 1 kubelet, dma-master-nfs-1 spec.containers{router} Normal Created Created container with id 4fb7175cdf64ff10b06dd5c70a05f53f16f31a43d52d14262cf4856816002a4d
21s 21s 1 kubelet, dma-master-nfs-1 spec.containers{router} Normal Started Started container with id 4fb7175cdf64ff10b06dd5c70a05f53f16f31a43d52d14262cf4856816002a4d
43s 3s 4 kubelet, dma-master-nfs-1 spec.containers{router} Warning Unhealthy Liveness probe failed: Get http://localhost:1936/healthz: dial tcp 127.0.0.1:1936: getsockopt: connection refused
43s 3s 4 kubelet, dma-master-nfs-1 spec.containers{router} Warning Unhealthy Readiness probe failed: Get http://localhost:1936/healthz: dial tcp 127.0.0.1:1936: getsockopt: connection refused
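One detail visible in the output above: the probes target port 1936, while ROUTER_LISTEN_ADDR (and the router log line "Router health and metrics port listening at 0.0.0.0:1935") point at port 1935. The output alone does not establish that this is the root cause. The following is a minimal Python sketch for flagging such a mismatch in `oc describe po` text. The helper names (`probe_port`, `listen_port`, `ports_match`) are hypothetical and not part of any OpenShift tooling, and the regular expressions assume the field layout shown in this report.

```python
import re


def probe_port(describe_text):
    """Return the port of the first 'Liveness: http-get <url>' line, or None."""
    m = re.search(r"Liveness:\s+http-get\s+http://[^:/\s]+:(\d+)/", describe_text)
    return int(m.group(1)) if m else None


def listen_port(describe_text):
    """Return the port from 'ROUTER_LISTEN_ADDR: host:port', or None."""
    m = re.search(r"ROUTER_LISTEN_ADDR:\s+\S*:(\d+)", describe_text)
    return int(m.group(1)) if m else None


def ports_match(describe_text):
    """True only if both ports were found and are equal."""
    probe = probe_port(describe_text)
    listen = listen_port(describe_text)
    return probe is not None and probe == listen


# Lines copied from the 'oc describe po router-9-p6nkp' output in this report.
sample = """\
Liveness: http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
ROUTER_LISTEN_ADDR: 0.0.0.0:1935
STATS_PORT: 1936
"""

print(probe_port(sample), listen_port(sample), ports_match(sample))
```

For the sample lines, the probe port (1936) and the listen port (1935) differ, so `ports_match` returns False.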
The problem was that they had built the wrong version. v3.6.112 was built, but v3.6.173.0.113-1 is the latest.
Description of problem:
When verifying a 3.6 hotfix, the router liveness/readiness probes fail:
Liveness: http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
Version-Release number of selected component (if applicable):
oc v3.6.112
kubernetes v1.6.1+5115d708d7
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://dma-master-nfs-1:8443
openshift v3.6.112
kubernetes v1.6.1+5115d708d7
How reproducible:
Always
Steps to Reproduce:
1. Deploy router and check router status
[root@dma-master-nfs-1 ~]# oc get po
NAME READY STATUS RESTARTS AGE
docker-registry-4-rv7x9 1/1 Running 0 51m
router-1-deploy 0/1 Error 0 41m
router-2-deploy 0/1 Error 0 16m
router-3-1gnwb 0/1 Running 1 46s
router-3-deploy 1/1 Running 0 50s
[root@dma-master-nfs-1 ~]# oc logs router-3-1gnwb
I0427 15:07:11.148681 1 template.go:246] Starting template router (v3.6.173.0.112)
I0427 15:07:11.183136 1 metrics.go:43] Router health and metrics port listening at 0.0.0.0:1935
I0427 15:07:11.293812 1 router.go:554] Router reloaded:
- Checking http://localhost:80 ...
- Health check ok : 0 retry attempt(s).
I0427 15:07:11.293867 1 router.go:240] Router is including routes in all namespaces
I0427 15:07:11.395950 1 router.go:554] Router reloaded:
- Checking http://localhost:80 ...
- Health check ok : 0 retry attempt(s).
[root@dma-master-nfs-1 ~]# oc describe po router-3-1gnwb
Name: router-3-1gnwb
Namespace: default
Security Policy: hostnetwork
Node: dma-node-registry-router-1/10.1.2.5
Start Time: Fri, 27 Apr 2018 15:06:28 +0000
Labels: deployment=router-3
deploymentconfig=router
router=router
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"default","name":"router-3","uid":"8c7ee964-4a2c-11e8-9f73-000d3a948cd7...
openshift.io/deployment-config.latest-version=3
openshift.io/deployment-config.name=router
openshift.io/deployment.name=router-3
openshift.io/scc=hostnetwork
Status: Running
IP: 10.1.2.5
Controllers: ReplicationController/router-3
Containers:
router:
Container ID: docker://559abf68c7d4d2131196ee9376d8020abe8b4f923b329e1bde319fefb2eb4464
Image: registry.access.redhat.com/openshift3/ose-haproxy-router:v3.6
Image ID: docker-pullable://registry.access.redhat.com/openshift3/ose-haproxy-router@sha256:5fb9607463b80b1e0bda1966f37f7b038399ab8188c38fee348fbeeb06ecee0d
Ports: 80/TCP, 443/TCP, 1936/TCP, 1935/TCP
State: Running
Started: Fri, 27 Apr 2018 15:07:10 +0000
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Fri, 27 Apr 2018 15:06:32 +0000
Finished: Fri, 27 Apr 2018 15:07:08 +0000
Ready: False
Restart Count: 1
Requests:
cpu: 100m
memory: 256Mi
Liveness: http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
Environment:
DEFAULT_CERTIFICATE_DIR: /etc/pki/tls/private
ROUTER_CIPHERS:
ROUTER_EXTERNAL_HOST_HOSTNAME:
ROUTER_EXTERNAL_HOST_HTTPS_VSERVER:
ROUTER_EXTERNAL_HOST_HTTP_VSERVER:
ROUTER_EXTERNAL_HOST_INSECURE: false
ROUTER_EXTERNAL_HOST_INTERNAL_ADDRESS:
ROUTER_EXTERNAL_HOST_PARTITION_PATH:
ROUTER_EXTERNAL_HOST_PASSWORD:
ROUTER_EXTERNAL_HOST_PRIVKEY: /etc/secret-volume/router.pem
ROUTER_EXTERNAL_HOST_USERNAME:
ROUTER_EXTERNAL_HOST_VXLAN_GW_CIDR:
ROUTER_LISTEN_ADDR: 0.0.0.0:1935
ROUTER_METRICS_TYPE: haproxy
ROUTER_SERVICE_HTTPS_PORT: 443
ROUTER_SERVICE_HTTP_PORT: 80
ROUTER_SERVICE_NAME: router
ROUTER_SERVICE_NAMESPACE: default
ROUTER_SUBDOMAIN:
STATS_PASSWORD: EloFJXtMXO
STATS_PORT: 1936
STATS_USERNAME: admin
Mounts:
/etc/pki/tls/private from server-certificate (ro)
/var/run/secrets/kubernetes.io/serviceaccount from router-token-jtk8f (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
server-certificate:
Type: Secret (a volume populated by a Secret)
SecretName: router-certs
Optional: false
router-token-jtk8f:
Type: Secret (a volume populated by a Secret)
SecretName: router-token-jtk8f
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
1m 1m 1 default-scheduler Normal Scheduled Successfully assigned router-3-1gnwb to dma-node-registry-router-1
1m 1m 1 kubelet, dma-node-registry-router-1 spec.containers{router} Normal Created Created container with id 68507389ed10cd7ec1aab641d4e18021013bb1406c2af31eacef60e88c85cc2c
1m 1m 1 kubelet, dma-node-registry-router-1 spec.containers{router} Normal Started Started container with id 68507389ed10cd7ec1aab641d4e18021013bb1406c2af31eacef60e88c85cc2c
1m 25s 2 kubelet, dma-node-registry-router-1 spec.containers{router} Normal Pulled Container image "registry.access.redhat.com/openshift3/ose-haproxy-router:v3.6" already present on machine
25s 25s 1 kubelet, dma-node-registry-router-1 spec.containers{router} Normal Killing Killing container with id docker://68507389ed10cd7ec1aab641d4e18021013bb1406c2af31eacef60e88c85cc2c:pod "router-3-1gnwb_default(8f68dde8-4a2c-11e8-9f73-000d3a948cd7)" container "router" is unhealthy, it will be killed and re-created.
23s 23s 1 kubelet, dma-node-registry-router-1 spec.containers{router} Normal Created Created container with id 559abf68c7d4d2131196ee9376d8020abe8b4f923b329e1bde319fefb2eb4464
23s 23s 1 kubelet, dma-node-registry-router-1 spec.containers{router} Normal Started Started container with id 559abf68c7d4d2131196ee9376d8020abe8b4f923b329e1bde319fefb2eb4464
45s 5s 4 kubelet, dma-node-registry-router-1 spec.containers{router} Warning Unhealthy Liveness probe failed: Get http://localhost:1936/healthz: dial tcp [::1]:1936: getsockopt: connection refused
45s 5s 4 kubelet, dma-node-registry-router-1 spec.containers{router} Warning Unhealthy Readiness probe failed: Get http://localhost:1936/healthz: dial tcp [::1]:1936: getsockopt: connection refused
Actual results:
Expected results:
Additional info:
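The probe events in this report show the kubelet dialing `[::1]:1936` in one run and `127.0.0.1:1936` in another, both refused. To see which addresses `localhost` resolves to on a host and whether anything accepts connections on a given port, a rough Python sketch such as the one below may help. `check_tcp` is a hypothetical helper written for illustration, and the 1-second default timeout is an assumption chosen to mirror the probe's `timeout=1s`.

```python
import socket


def check_tcp(host, port, timeout=1.0):
    """Try a TCP connect to every address that `host` resolves to.

    Returns a list of (address, ok) tuples, one per resolved address,
    where ok is True if the connection was accepted.
    """
    results = []
    for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        try:
            # Socket creation is inside the try block because it can fail
            # too, for example when IPv6 is disabled on the host.
            with socket.socket(family, socktype, proto) as s:
                s.settimeout(timeout)
                s.connect(sockaddr)
            ok = True
        except OSError:
            # Covers connection refused, timeouts, and unreachable addresses.
            ok = False
        results.append((sockaddr[0], ok))
    return results
```

Calling `check_tcp("localhost", 1936)` and `check_tcp("localhost", 1935)` on the node would list each resolved address with a flag showing whether the connection succeeded, which separates a name-resolution difference (IPv4 versus IPv6) from the case where nothing listens on the port at all.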