Description of problem:
While verifying a 3.6 hotfix, the router pod fails both of its health probes:
  Liveness: http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
  Readiness: http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3

Version-Release number of selected component (if applicable):
oc v3.6.112
kubernetes v1.6.1+5115d708d7
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://dma-master-nfs-1:8443
openshift v3.6.112
kubernetes v1.6.1+5115d708d7

How reproducible:
Always

Steps to Reproduce:
1. Deploy the router and check its status:

[root@dma-master-nfs-1 ~]# oc get po
NAME                      READY     STATUS    RESTARTS   AGE
docker-registry-4-rv7x9   1/1       Running   0          51m
router-1-deploy           0/1       Error     0          41m
router-2-deploy           0/1       Error     0          16m
router-3-1gnwb            0/1       Running   1          46s
router-3-deploy           1/1       Running   0          50s
[root@dma-master-nfs-1 ~]# oc logs router-3-1gnwb
I0427 15:07:11.148681 1 template.go:246] Starting template router (v3.6.173.0.112)
I0427 15:07:11.183136 1 metrics.go:43] Router health and metrics port listening at 0.0.0.0:1935
I0427 15:07:11.293812 1 router.go:554] Router reloaded:
 - Checking http://localhost:80 ...
 - Health check ok : 0 retry attempt(s).
I0427 15:07:11.293867 1 router.go:240] Router is including routes in all namespaces
I0427 15:07:11.395950 1 router.go:554] Router reloaded:
 - Checking http://localhost:80 ...
 - Health check ok : 0 retry attempt(s).
[root@dma-master-nfs-1 ~]#
[root@dma-master-nfs-1 ~]# oc describe po router-3-1gnwb
Name: router-3-1gnwb
Namespace: default
Security Policy: hostnetwork
Node: dma-node-registry-router-1/10.1.2.5
Start Time: Fri, 27 Apr 2018 15:06:28 +0000
Labels: deployment=router-3
        deploymentconfig=router
        router=router
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"default","name":"router-3","uid":"8c7ee964-4a2c-11e8-9f73-000d3a948cd7...
             openshift.io/deployment-config.latest-version=3
             openshift.io/deployment-config.name=router
             openshift.io/deployment.name=router-3
             openshift.io/scc=hostnetwork
Status: Running
IP: 10.1.2.5
Controllers: ReplicationController/router-3
Containers:
  router:
    Container ID: docker://559abf68c7d4d2131196ee9376d8020abe8b4f923b329e1bde319fefb2eb4464
    Image: registry.access.redhat.com/openshift3/ose-haproxy-router:v3.6
    Image ID: docker-pullable://registry.access.redhat.com/openshift3/ose-haproxy-router@sha256:5fb9607463b80b1e0bda1966f37f7b038399ab8188c38fee348fbeeb06ecee0d
    Ports: 80/TCP, 443/TCP, 1936/TCP, 1935/TCP
    State: Running
      Started: Fri, 27 Apr 2018 15:07:10 +0000
    Last State: Terminated
      Reason: Error
      Exit Code: 2
      Started: Fri, 27 Apr 2018 15:06:32 +0000
      Finished: Fri, 27 Apr 2018 15:07:08 +0000
    Ready: False
    Restart Count: 1
    Requests:
      cpu: 100m
      memory: 256Mi
    Liveness: http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Readiness: http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      DEFAULT_CERTIFICATE_DIR: /etc/pki/tls/private
      ROUTER_CIPHERS:
      ROUTER_EXTERNAL_HOST_HOSTNAME:
      ROUTER_EXTERNAL_HOST_HTTPS_VSERVER:
      ROUTER_EXTERNAL_HOST_HTTP_VSERVER:
      ROUTER_EXTERNAL_HOST_INSECURE: false
      ROUTER_EXTERNAL_HOST_INTERNAL_ADDRESS:
      ROUTER_EXTERNAL_HOST_PARTITION_PATH:
      ROUTER_EXTERNAL_HOST_PASSWORD:
      ROUTER_EXTERNAL_HOST_PRIVKEY: /etc/secret-volume/router.pem
      ROUTER_EXTERNAL_HOST_USERNAME:
      ROUTER_EXTERNAL_HOST_VXLAN_GW_CIDR:
      ROUTER_LISTEN_ADDR: 0.0.0.0:1935
      ROUTER_METRICS_TYPE: haproxy
      ROUTER_SERVICE_HTTPS_PORT: 443
      ROUTER_SERVICE_HTTP_PORT: 80
      ROUTER_SERVICE_NAME: router
      ROUTER_SERVICE_NAMESPACE: default
      ROUTER_SUBDOMAIN:
      STATS_PASSWORD: EloFJXtMXO
      STATS_PORT: 1936
      STATS_USERNAME: admin
    Mounts:
      /etc/pki/tls/private from server-certificate (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from router-token-jtk8f (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  server-certificate:
    Type: Secret (a volume populated by a Secret)
    SecretName: router-certs
    Optional: false
  router-token-jtk8f:
    Type: Secret (a volume populated by a Secret)
    SecretName: router-token-jtk8f
    Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: <none>
Events:
  FirstSeen LastSeen Count From SubObjectPath Type Reason Message
  --------- -------- ----- ---- ------------- -------- ------ -------
  1m 1m 1 default-scheduler Normal Scheduled Successfully assigned router-3-1gnwb to dma-node-registry-router-1
  1m 1m 1 kubelet, dma-node-registry-router-1 spec.containers{router} Normal Created Created container with id 68507389ed10cd7ec1aab641d4e18021013bb1406c2af31eacef60e88c85cc2c
  1m 1m 1 kubelet, dma-node-registry-router-1 spec.containers{router} Normal Started Started container with id 68507389ed10cd7ec1aab641d4e18021013bb1406c2af31eacef60e88c85cc2c
  1m 25s 2 kubelet, dma-node-registry-router-1 spec.containers{router} Normal Pulled Container image "registry.access.redhat.com/openshift3/ose-haproxy-router:v3.6" already present on machine
  25s 25s 1 kubelet, dma-node-registry-router-1 spec.containers{router} Normal Killing Killing container with id docker://68507389ed10cd7ec1aab641d4e18021013bb1406c2af31eacef60e88c85cc2c:pod "router-3-1gnwb_default(8f68dde8-4a2c-11e8-9f73-000d3a948cd7)" container "router" is unhealthy, it will be killed and re-created.
  23s 23s 1 kubelet, dma-node-registry-router-1 spec.containers{router} Normal Created Created container with id 559abf68c7d4d2131196ee9376d8020abe8b4f923b329e1bde319fefb2eb4464
  23s 23s 1 kubelet, dma-node-registry-router-1 spec.containers{router} Normal Started Started container with id 559abf68c7d4d2131196ee9376d8020abe8b4f923b329e1bde319fefb2eb4464
  45s 5s 4 kubelet, dma-node-registry-router-1 spec.containers{router} Warning Unhealthy Liveness probe failed: Get http://localhost:1936/healthz: dial tcp [::1]:1936: getsockopt: connection refused
  45s 5s 4 kubelet, dma-node-registry-router-1 spec.containers{router} Warning Unhealthy Readiness probe failed: Get http://localhost:1936/healthz: dial tcp [::1]:1936: getsockopt: connection refused

Actual results:

Expected results:

Additional info:
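In the output above, the router logs its health and metrics listener at 0.0.0.0:1935 (ROUTER_LISTEN_ADDR), while both probes target port 1936 (STATS_PORT). That is consistent with the "connection refused" events. Below is a minimal Python sketch of what an HTTP probe with these settings does. It assumes only the semantics visible in `oc describe`: a GET on /healthz with timeout=1s, failure after #failure=3 consecutive misses, and a 2xx/3xx status counting as success. It is not the kubelet's code. All helper names (`http_probe`, `should_restart`, `start_healthz_server`, `unused_port`) are hypothetical. The sketch serves /healthz on one port and probes a different port where nothing listens, which reproduces the refused connection. It uses 127.0.0.1 rather than localhost to sidestep the ::1 question raised later in this report.

```python
import http.server
import socket
import threading
import urllib.request
from typing import Sequence, Tuple


class HealthzHandler(http.server.BaseHTTPRequestHandler):
    """Answers 200 on /healthz and 404 elsewhere (hypothetical stand-in for the router's endpoint)."""

    def do_GET(self):
        self.send_response(200 if self.path == "/healthz" else 404)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the sketch quiet


def start_healthz_server() -> Tuple[int, http.server.HTTPServer]:
    """Serve /healthz on an ephemeral 127.0.0.1 port in a background thread."""
    server = http.server.HTTPServer(("127.0.0.1", 0), HealthzHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server.server_address[1], server


def unused_port() -> int:
    """Pick a port with (very likely) nothing listening on it."""
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def http_probe(port: int, path: str = "/healthz", timeout: float = 1.0) -> bool:
    """One HTTP probe: success if the GET returns a 2xx/3xx status within `timeout`."""
    try:
        with urllib.request.urlopen(f"http://127.0.0.1:{port}{path}", timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except OSError:
        # URLError (e.g. connection refused), HTTPError (4xx/5xx) and timeouts
        # are all OSError subclasses, so any of them counts as a failed probe.
        return False


def should_restart(results: Sequence[bool], failure_threshold: int = 3) -> bool:
    """True once `failure_threshold` consecutive probe results are failures."""
    consecutive = 0
    for ok in results:
        consecutive = 0 if ok else consecutive + 1
        if consecutive >= failure_threshold:
            return True
    return False


if __name__ == "__main__":
    health_port, server = start_healthz_server()  # plays the role of the 1935 listener
    probed_port = unused_port()                    # plays the role of 1936: nothing bound there
    print(http_probe(health_port))                 # True
    print(http_probe(probed_port))                 # False (connection refused)
    print(should_restart([http_probe(probed_port) for _ in range(3)]))  # True
    server.shutdown()
    server.server_close()
```

With period=10s and #failure=3, three refused probes in a row are enough for the liveness check to mark the container unhealthy, which matches the "Killing ... is unhealthy" event roughly 30s after start.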
Deployed another router; here is the oc logs output:

[root@dma-master-nfs-1 ~]# oc get po
NAME                      READY     STATUS    RESTARTS   AGE
docker-registry-4-rv7x9   1/1       Running   0          1h
router-1-deploy           0/1       Error     0          1h
router-2-deploy           0/1       Error     0          1h
router-3-deploy           0/1       Error     0          57m
router-4-deploy           0/1       Error     0          44m
router-5-deploy           0/1       Error     0          33m
router-6-6nk5j            0/1       Running   1          45s
router-6-deploy           1/1       Running   0          50s
[root@dma-master-nfs-1 ~]# oc logs router-6-6nk5j
I0427 16:03:42.617428 1 template.go:246] Starting template router (v3.6.173.0.96)
I0427 16:03:42.647580 1 metrics.go:43] Router health and metrics port listening at 0.0.0.0:1935
I0427 16:03:42.844955 1 router.go:554] Router reloaded:
 - Checking http://localhost:80 ...
 - Health check ok : 0 retry attempt(s).
I0427 16:03:42.845010 1 router.go:240] Router is including routes in all namespaces
I0427 16:03:43.047871 1 router.go:554] Router reloaded:
 - Checking http://localhost:80 ...
 - Health check ok : 0 retry attempt(s).
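This deployment again logs the health and metrics listener on port 1935. To spot the same mismatch quickly in future runs, the port the router reports can be compared against the ports the probes target. Below is a hypothetical Python helper that does this from saved `oc logs` and `oc describe po` text. The regular expressions are assumptions written against the exact lines quoted in this report, and other router versions may word these lines differently.

```python
import re
from typing import Dict, List, Optional

# Patterns written against the exact log/describe lines quoted in this report;
# other router versions may word these lines differently.
LISTEN_RE = re.compile(r"Router health and metrics port listening at [^:\s]*:(\d+)")
PROBE_RE = re.compile(r"(Liveness|Readiness): http-get http://[^:/\s]+:(\d+)/healthz")


def listen_port(log_text: str) -> Optional[int]:
    """Port from the 'health and metrics port listening at' log line, if present."""
    m = LISTEN_RE.search(log_text)
    return int(m.group(1)) if m else None


def probe_ports(describe_text: str) -> Dict[str, int]:
    """Map 'Liveness'/'Readiness' to the port each HTTP probe targets."""
    return {kind: int(port) for kind, port in PROBE_RE.findall(describe_text)}


def mismatched_probes(log_text: str, describe_text: str) -> List[str]:
    """Probes whose port differs from the port the router says it listens on.

    If the log has no listener line, every probe is reported as mismatched.
    """
    port = listen_port(log_text)
    return sorted(kind for kind, p in probe_ports(describe_text).items() if p != port)


if __name__ == "__main__":
    log = "I0427 16:03:42.647580 1 metrics.go:43] Router health and metrics port listening at 0.0.0.0:1935"
    describe = (
        "Liveness: http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3\n"
        "Readiness: http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3"
    )
    print(listen_port(log))                  # 1935
    print(probe_ports(describe))             # {'Liveness': 1936, 'Readiness': 1936}
    print(mismatched_probes(log, describe))  # ['Liveness', 'Readiness']
```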
After removing '::1 localhost localhost.localdomain localhost6 localhost6.localdomain6' from /etc/hosts, restarting the master and node, and redeploying the router, I still get the same error. My OS is RHEL 7.5; is that related?

// node:
[root@dma-node-registry-router-1 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4

// master node:
[root@dma-master-nfs-1 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4

// rsh into the router container and check /etc/hosts:
[root@dma-master-nfs-1 ~]# oc rsh router-9-p6nkp
sh-4.2$ cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
sh-4.2$ exit
exit
[root@dma-master-nfs-1 ~]# oc get po
NAME                      READY     STATUS    RESTARTS   AGE
docker-registry-4-rv7x9   1/1       Running   0          2h
router-1-deploy           0/1       Error     0          2h
router-2-deploy           0/1       Error     0          1h
router-3-deploy           0/1       Error     0          1h
router-4-deploy           0/1       Error     0          1h
router-5-deploy           0/1       Error     0          56m
router-9-deploy           1/1       Running   0          56s
router-9-p6nkp            0/1       Running   1          51s
[root@dma-master-nfs-1 ~]# oc describe po router-9-p6nkp
Name: router-9-p6nkp
Namespace: default
Security Policy: hostnetwork
Node: dma-master-nfs-1/10.1.2.4
Start Time: Fri, 27 Apr 2018 16:26:01 +0000
Labels: deployment=router-9
        deploymentconfig=router
        router=router
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"default","name":"router-9","uid":"a965057d-4a37-11e8-bc04-000d3a948cd7...
             openshift.io/deployment-config.latest-version=9
             openshift.io/deployment-config.name=router
             openshift.io/deployment.name=router-9
             openshift.io/scc=hostnetwork
Status: Running
IP: 10.1.2.4
Controllers: ReplicationController/router-9
Containers:
  router:
    Container ID: docker://4fb7175cdf64ff10b06dd5c70a05f53f16f31a43d52d14262cf4856816002a4d
    Image: registry.access.redhat.com/openshift3/ose-haproxy-router:v3.6.173.0.96
    Image ID: docker-pullable://registry.access.redhat.com/openshift3/ose-haproxy-router@sha256:d08bfa25f21c74a21d0cd14b96c538a8dc8cec0a69b6df1eecac2bdb0b6d8d44
    Ports: 80/TCP, 443/TCP, 1936/TCP, 1935/TCP
    State: Running
      Started: Fri, 27 Apr 2018 16:26:43 +0000
    Last State: Terminated
      Reason: Error
      Exit Code: 2
      Started: Fri, 27 Apr 2018 16:26:05 +0000
      Finished: Fri, 27 Apr 2018 16:26:41 +0000
    Ready: False
    Restart Count: 1
    Requests:
      cpu: 100m
      memory: 256Mi
    Liveness: http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Readiness: http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      DEFAULT_CERTIFICATE_DIR: /etc/pki/tls/private
      ROUTER_CIPHERS:
      ROUTER_EXTERNAL_HOST_HOSTNAME:
      ROUTER_EXTERNAL_HOST_HTTPS_VSERVER:
      ROUTER_EXTERNAL_HOST_HTTP_VSERVER:
      ROUTER_EXTERNAL_HOST_INSECURE: false
      ROUTER_EXTERNAL_HOST_INTERNAL_ADDRESS:
      ROUTER_EXTERNAL_HOST_PARTITION_PATH:
      ROUTER_EXTERNAL_HOST_PASSWORD:
      ROUTER_EXTERNAL_HOST_PRIVKEY: /etc/secret-volume/router.pem
      ROUTER_EXTERNAL_HOST_USERNAME:
      ROUTER_EXTERNAL_HOST_VXLAN_GW_CIDR:
      ROUTER_LISTEN_ADDR: 0.0.0.0:1935
      ROUTER_METRICS_TYPE: haproxy
      ROUTER_SERVICE_HTTPS_PORT: 443
      ROUTER_SERVICE_HTTP_PORT: 80
      ROUTER_SERVICE_NAME: router
      ROUTER_SERVICE_NAMESPACE: default
      ROUTER_SUBDOMAIN:
      STATS_PASSWORD: EloFJXtMXO
      STATS_PORT: 1936
      STATS_USERNAME: admin
    Mounts:
      /etc/pki/tls/private from server-certificate (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from router-token-jtk8f (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  server-certificate:
    Type: Secret (a volume populated by a Secret)
    SecretName: router-certs
    Optional: false
  router-token-jtk8f:
    Type: Secret (a volume populated by a Secret)
    SecretName: router-token-jtk8f
    Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: <none>
Events:
  FirstSeen LastSeen Count From SubObjectPath Type Reason Message
  --------- -------- ----- ---- ------------- -------- ------ -------
  1m 1m 1 default-scheduler Normal Scheduled Successfully assigned router-9-p6nkp to dma-master-nfs-1
  59s 59s 1 kubelet, dma-master-nfs-1 spec.containers{router} Normal Created Created container with id 33f8f902bd45c509614dbe5daa9464749baed4f14b11c53caa5ab755cbb28a64
  59s 59s 1 kubelet, dma-master-nfs-1 spec.containers{router} Normal Started Started container with id 33f8f902bd45c509614dbe5daa9464749baed4f14b11c53caa5ab755cbb28a64
  1m 23s 2 kubelet, dma-master-nfs-1 spec.containers{router} Normal Pulled Container image "registry.access.redhat.com/openshift3/ose-haproxy-router:v3.6.173.0.96" already present on machine
  23s 23s 1 kubelet, dma-master-nfs-1 spec.containers{router} Normal Killing Killing container with id docker://33f8f902bd45c509614dbe5daa9464749baed4f14b11c53caa5ab755cbb28a64:pod "router-9-p6nkp_default(ac5e9e29-4a37-11e8-bc04-000d3a948cd7)" container "router" is unhealthy, it will be killed and re-created.
  22s 22s 1 kubelet, dma-master-nfs-1 spec.containers{router} Normal Created Created container with id 4fb7175cdf64ff10b06dd5c70a05f53f16f31a43d52d14262cf4856816002a4d
  21s 21s 1 kubelet, dma-master-nfs-1 spec.containers{router} Normal Started Started container with id 4fb7175cdf64ff10b06dd5c70a05f53f16f31a43d52d14262cf4856816002a4d
  43s 3s 4 kubelet, dma-master-nfs-1 spec.containers{router} Warning Unhealthy Liveness probe failed: Get http://localhost:1936/healthz: dial tcp 127.0.0.1:1936: getsockopt: connection refused
  43s 3s 4 kubelet, dma-master-nfs-1 spec.containers{router} Warning Unhealthy Readiness probe failed: Get http://localhost:1936/healthz: dial tcp 127.0.0.1:1936: getsockopt: connection refused
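Removing the ::1 entry only changed which loopback address "localhost" resolved to: the first run was refused at [::1]:1936 and this run at 127.0.0.1:1936. Since both addresses refused the connection, the evidence suggests nothing was listening on 1936 on either one, so the /etc/hosts edit would not be expected to help. The hypothetical Python helper below makes that check explicit. It resolves a host name and attempts a TCP connect to every address returned, using only the standard `socket.getaddrinfo` and `socket.connect`. On the router host the call of interest would be `try_each_address("localhost", 1936)`.

```python
import socket
from typing import Dict, Optional


def try_each_address(host: str, port: int, timeout: float = 1.0) -> Dict[str, Optional[str]]:
    """Resolve `host` and attempt a TCP connect to every address it returns.

    Maps each address to None on success, or to the exception class name
    (for example 'ConnectionRefusedError') on failure.
    """
    results: Dict[str, Optional[str]] = {}
    for family, socktype, proto, _, sockaddr in socket.getaddrinfo(host, port, type=socket.SOCK_STREAM):
        with socket.socket(family, socktype, proto) as s:
            s.settimeout(timeout)
            try:
                s.connect(sockaddr)
                results[sockaddr[0]] = None
            except OSError as exc:
                results[sockaddr[0]] = type(exc).__name__
    return results


if __name__ == "__main__":
    # Output depends on the host's /etc/hosts and on what is listening on 1936.
    print(try_each_address("localhost", 1936))
```

If every address maps to a connection error, the problem is on the listening side rather than in name resolution.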
The problem was that they had built the wrong version. v3.6.112 was built, but v3.6.173.0.113-1 is the latest.
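The fix comes down to which build is newer. The router logs in this report show v3.6.173.0.112 and v3.6.173.0.96, while the expected latest is v3.6.173.0.113-1. Below is a small, hypothetical Python helper that compares such strings component by component. It assumes they follow the shape seen here (an optional leading "v", dot-separated integers, and an optional "-release" suffix). It is not RPM's version-comparison algorithm.

```python
import re
from typing import Tuple

VersionKey = Tuple[Tuple[int, ...], int]


def version_key(version: str) -> VersionKey:
    """Parse 'v3.6.173.0.113-1' into ((3, 6, 173, 0, 113), 1).

    A missing '-release' suffix counts as release 0.
    """
    m = re.fullmatch(r"v?(\d+(?:\.\d+)*)(?:-(\d+))?", version.strip())
    if m is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = tuple(int(p) for p in m.group(1).split("."))
    release = int(m.group(2)) if m.group(2) is not None else 0
    return parts, release


def is_older(running: str, expected: str) -> bool:
    """True if `running` sorts before `expected` when compared numerically."""
    return version_key(running) < version_key(expected)


if __name__ == "__main__":
    expected = "v3.6.173.0.113-1"
    for running in ("v3.6.112", "v3.6.173.0.112", "v3.6.173.0.96"):
        print(running, is_older(running, expected))  # True for all three
```

Numeric comparison matters here. Compared as plain strings, "v3.6.173.0.96" sorts after "v3.6.173.0.113-1" because "9" > "1". Versions of unequal length compare as tuples, so "3.6" sorts before "3.6.0".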