Created attachment 1283832 [details]
customer reproducer with only 1 instance

Description of problem:
When liveness and readiness probes are defined on the deployment config and the application is scaled up, the application is restarted multiple times because the readiness check fails.

Version-Release number of selected component (if applicable):
OCP 3.5.5.8

How reproducible:
Customer can reproduce it frequently

Steps to Reproduce:
1. Create a project and a new application using jboss-webserver-3/webserver30-tomcat8-openshift without any source
2. Configure liveness and readiness probes with the following parameters (an equivalent oc set probe sketch follows under Additional info):

   livenessProbe:
     httpGet:
       path: /docs
       port: 8080
       scheme: HTTP
     initialDelaySeconds: 300
     timeoutSeconds: 30
     periodSeconds: 10
     successThreshold: 1
     failureThreshold: 5

3. Scale up to 10 instances
4. Curl'ing pods directly to the same path as the probe succeeds
5. The pods eventually become ready, but only after several restarts caused by the probes reporting failures

Actual results:
Even with one replica it sometimes takes 4 container restarts to get it right (see attachment):

Mon May 29 08:59:28 CEST 2017
NAME              READY     STATUS    RESTARTS   AGE       IP            NODE
tomcat-16-vpc7n   0/1       Running   1          10m       10.1.17.129   osclu1-azuur-05.uz.kuleuven.ac.be

Mon May 29 09:09:29 CEST 2017
NAME              READY     STATUS    RESTARTS   AGE       IP            NODE
tomcat-16-vpc7n   0/1       Running   3          20m       10.1.17.129   osclu1-azuur-05.uz.kuleuven.ac.be

Mon May 29 09:19:30 CEST 2017
NAME              READY     STATUS    RESTARTS   AGE       IP            NODE
tomcat-16-vpc7n   1/1       Running   4          30m       10.1.17.129   osclu1-azuur-05.uz.kuleuven.ac.be

Expected results:
I would expect the pods to become ready without restarts if the server is started and serving requests.

Additional info:
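The same probes can also be applied from the CLI; a minimal sketch, assuming the DeploymentConfig created in step 1 is named tomcat and that the readiness probe uses the same parameters as the liveness probe (both assumptions, not confirmed above):

# Hypothetical DC name "tomcat"; parameters mirror the livenessProbe spec above
oc set probe dc/tomcat --liveness --readiness \
  --get-url=http://:8080/docs \
  --initial-delay-seconds=300 \
  --timeout-seconds=30 \
  --period-seconds=10 \
  --success-threshold=1 \
  --failure-threshold=5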
Are all 10 scaled instances running on the same node?

We recently fixed a bug in kubernetes 1.7, and backported the fix to OCP 3.6, where failureThresholds were not being reset across pod restarts. I am wondering if that is the source of the restarts > 1 in the referenced scenario.

See: https://github.com/openshift/origin/pull/14332
(In reply to Derek Carr from comment #1)
> Are all 10 scaled instances running on the same node?

Not necessarily; it has been seen even with one instance.

> We recently fixed a bug in kubernetes 1.7, and backported the fix to OCP
> 3.6, where failureThresholds were not being reset across pod restarts. I am
> wondering if that is the source of the restarts > 1 in the referenced
> scenario.
>
> See: https://github.com/openshift/origin/pull/14332

I don't think it is related to this issue. In the "Actual results" section you can see how the restart count grew every 10 minutes, meaning the pod kept being restarted. I will try to get the events, the journal from the nodes the pods were deployed on, and any other information you find relevant.
Created attachment 1284145 [details]
journal from azuur05

Created attachment 1284146 [details]
journal from pew05

Created attachment 1284147 [details]
namespace events
Created attachment 1284148 [details]
pods dump

$ for pod in `oc get po | grep -o "tomcat-[0-9a-z\-]*"`; do
    echo --POD $pod
    echo ---- POD DESCRIPTION
    oc get po $pod -o yaml
    echo ---- POD LOGS
    oc logs $pod
    echo ---- PODS ACCESS.log
    oc exec $pod -- cat /opt/webserver/logs/localhost_access_log.2017-06-01.txt
  done > tomcat_pods.dump
Pay attention to tomcat-16-mv9sj (the last one).

**** POD: tomcat-16-7hd54 (Started OK, is ready and processed curl requests)

...
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: 2017-06-01T08:58:49Z
  hostIP: 10.254.250.55
  phase: Running
  podIP: 10.1.17.66
  startTime: 2017-06-01T08:58:44Z

12m  12m  1  tomcat-16-7hd54  Pod  Normal  Scheduled  {default-scheduler }  Successfully assigned tomcat-16-7hd54 to osclu1-azuur-05.uz.kuleuven.ac.be
12m  12m  1  tomcat-16-7hd54  Pod  spec.containers{tomcat}  Normal  Pulling  {kubelet osclu1-azuur-05.uz.kuleuven.ac.be}  pulling image "172.30.241.23:5000/test-tomcat/tomcat@sha256:2754206e231365d5f0ae59107ccfcd677cf8769b524618051323b448b05c732e"
12m  12m  1  tomcat-16-7hd54  Pod  spec.containers{tomcat}  Normal  Pulled  {kubelet osclu1-azuur-05.uz.kuleuven.ac.be}  Successfully pulled image "172.30.241.23:5000/test-tomcat/tomcat@sha256:2754206e231365d5f0ae59107ccfcd677cf8769b524618051323b448b05c732e"
12m  12m  1  tomcat-16-7hd54  Pod  spec.containers{tomcat}  Normal  Created  {kubelet osclu1-azuur-05.uz.kuleuven.ac.be}  Created container with docker id b92065aa1408; Security:[seccomp=unconfined]
12m  12m  1  tomcat-16-7hd54  Pod  spec.containers{tomcat}  Normal  Started  {kubelet osclu1-azuur-05.uz.kuleuven.ac.be}  Started container with docker id b92065aa1408

Processed the first request at:
10.1.4.1 - - [01/Jun/2017:09:01:28 +0000] "GET / HTTP/1.1" 404 -
10.1.4.1 - - [01/Jun/2017:09:03:19 +0000] "GET / HTTP/1.1" 404 -
10.1.17.1 - - [01/Jun/2017:09:04:09 +0000] "GET /docs HTTP/1.1" 302 -

**** POD: tomcat-16-fszzw (Started OK, is ready and processed curl requests)

...
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: 2017-06-01T08:58:47Z
  hostIP: 10.254.185.49
  phase: Running
  podIP: 10.1.11.156
  startTime: 2017-06-01T08:58:44Z

12m  12m  1  tomcat-16-fszzw  Pod  Normal  Scheduled  {default-scheduler }  Successfully assigned tomcat-16-fszzw to osclu1-pew-05.uz.kuleuven.ac.be
12m  12m  1  tomcat-16-fszzw  Pod  spec.containers{tomcat}  Normal  Pulling  {kubelet osclu1-pew-05.uz.kuleuven.ac.be}  pulling image "172.30.241.23:5000/test-tomcat/tomcat@sha256:2754206e231365d5f0ae59107ccfcd677cf8769b524618051323b448b05c732e"
12m  12m  1  tomcat-16-fszzw  Pod  spec.containers{tomcat}  Normal  Pulled  {kubelet osclu1-pew-05.uz.kuleuven.ac.be}  Successfully pulled image "172.30.241.23:5000/test-tomcat/tomcat@sha256:2754206e231365d5f0ae59107ccfcd677cf8769b524618051323b448b05c732e"
12m  12m  1  tomcat-16-fszzw  Pod  spec.containers{tomcat}  Normal  Created  {kubelet osclu1-pew-05.uz.kuleuven.ac.be}  Created container with docker id 6dec2b636981; Security:[seccomp=unconfined]
12m  12m  1  tomcat-16-fszzw  Pod  spec.containers{tomcat}  Normal  Started  {kubelet osclu1-pew-05.uz.kuleuven.ac.be}  Started container with docker id 6dec2b636981

Processed the first request at:
10.1.4.1 - - [01/Jun/2017:09:01:15 +0000] "GET / HTTP/1.1" 404 -
10.1.4.1 - - [01/Jun/2017:09:03:13 +0000] "GET / HTTP/1.1" 404 -
10.1.11.1 - - [01/Jun/2017:09:03:54 +0000] "GET /docs HTTP/1.1" 302 -

**** POD: tomcat-16-f9h76 (Started OK, is ready and processed curl requests)

...
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: 2017-06-01T08:58:48Z
  hostIP: 10.254.185.49
  phase: Running
  podIP: 10.1.11.157
  startTime: 2017-06-01T08:58:44Z

12m  12m  1  tomcat-16-f9h76  Pod  Normal  Scheduled  {default-scheduler }  Successfully assigned tomcat-16-f9h76 to osclu1-pew-05.uz.kuleuven.ac.be
12m  12m  1  tomcat-16-f9h76  Pod  spec.containers{tomcat}  Normal  Pulling  {kubelet osclu1-pew-05.uz.kuleuven.ac.be}  pulling image "172.30.241.23:5000/test-tomcat/tomcat@sha256:2754206e231365d5f0ae59107ccfcd677cf8769b524618051323b448b05c732e"
12m  12m  1  tomcat-16-f9h76  Pod  spec.containers{tomcat}  Normal  Pulled  {kubelet osclu1-pew-05.uz.kuleuven.ac.be}  Successfully pulled image "172.30.241.23:5000/test-tomcat/tomcat@sha256:2754206e231365d5f0ae59107ccfcd677cf8769b524618051323b448b05c732e"
12m  12m  1  tomcat-16-f9h76  Pod  spec.containers{tomcat}  Normal  Created  {kubelet osclu1-pew-05.uz.kuleuven.ac.be}  Created container with docker id b52c9dda5bb7; Security:[seccomp=unconfined]
12m  12m  1  tomcat-16-f9h76  Pod  spec.containers{tomcat}  Normal  Started  {kubelet osclu1-pew-05.uz.kuleuven.ac.be}  Started container with docker id b52c9dda5bb7

Processed the first requests at:
10.1.4.1 - - [01/Jun/2017:09:01:21 +0000] "GET / HTTP/1.1" 404 -
10.1.4.1 - - [01/Jun/2017:09:03:16 +0000] "GET / HTTP/1.1" 404 -
10.1.11.1 - - [01/Jun/2017:09:03:54 +0000] "GET /docs HTTP/1.1" 302 -

**** POD: tomcat-16-mv9sj (Required 1 restart, is NOT ready and processed curl requests)

...
    ready: false
    restartCount: 0
    state:
      running:
        startedAt: 2017-06-01T08:58:49Z
  hostIP: 10.254.250.55
  phase: Running
  podIP: 10.1.17.68
  startTime: 2017-06-01T08:58:44Z

12m  12m  1  tomcat-16-mv9sj  Pod  Normal  Scheduled  {default-scheduler }  Successfully assigned tomcat-16-mv9sj to osclu1-azuur-05.uz.kuleuven.ac.be
4m   12m  2  tomcat-16-mv9sj  Pod  spec.containers{tomcat}  Normal  Pulling  {kubelet osclu1-azuur-05.uz.kuleuven.ac.be}  pulling image "172.30.241.23:5000/test-tomcat/tomcat@sha256:2754206e231365d5f0ae59107ccfcd677cf8769b524618051323b448b05c732e"
4m   12m  2  tomcat-16-mv9sj  Pod  spec.containers{tomcat}  Normal  Pulled  {kubelet osclu1-azuur-05.uz.kuleuven.ac.be}  Successfully pulled image "172.30.241.23:5000/test-tomcat/tomcat@sha256:2754206e231365d5f0ae59107ccfcd677cf8769b524618051323b448b05c732e"
12m  12m  1  tomcat-16-mv9sj  Pod  spec.containers{tomcat}  Normal  Created  {kubelet osclu1-azuur-05.uz.kuleuven.ac.be}  Created container with docker id 1698b56bdbe0; Security:[seccomp=unconfined]
12m  12m  1  tomcat-16-mv9sj  Pod  spec.containers{tomcat}  Normal  Started  {kubelet osclu1-azuur-05.uz.kuleuven.ac.be}  Started container with docker id 1698b56bdbe0
4m   6m   5  tomcat-16-mv9sj  Pod  spec.containers{tomcat}  Warning  Unhealthy  {kubelet osclu1-azuur-05.uz.kuleuven.ac.be}  Liveness probe failed: Get http://10.1.17.68:8080/docs: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
4m   6m   5  tomcat-16-mv9sj  Pod  spec.containers{tomcat}  Warning  Unhealthy  {kubelet osclu1-azuur-05.uz.kuleuven.ac.be}  Readiness probe failed: Get http://10.1.17.68:8080/docs: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
4m   4m   1  tomcat-16-mv9sj  Pod  spec.containers{tomcat}  Normal  Killing  {kubelet osclu1-azuur-05.uz.kuleuven.ac.be}  Killing container with docker id 1698b56bdbe0: pod "tomcat-16-mv9sj_test-tomcat(84649f7a-46a8-11e7-ac29-005056ab1e93)" container "tomcat" is unhealthy, it will be killed and re-created.
4m   4m   1  tomcat-16-mv9sj  Pod  spec.containers{tomcat}  Normal  Created  {kubelet osclu1-azuur-05.uz.kuleuven.ac.be}  Created container with docker id 84508fbfedb0; Security:[seccomp=unconfined]
4m   4m   1  tomcat-16-mv9sj  Pod  spec.containers{tomcat}  Normal  Started  {kubelet osclu1-azuur-05.uz.kuleuven.ac.be}  Started container with docker id 84508fbfedb0

Processed the first request at (I couldn't find any request to /docs here):
10.1.4.1 - - [01/Jun/2017:09:01:33 +0000] "GET / HTTP/1.1" 404 -
10.1.4.1 - - [01/Jun/2017:09:03:24 +0000] "GET / HTTP/1.1" 404 -
10.1.4.1 - - [01/Jun/2017:09:05:19 +0000] "GET / HTTP/1.1" 404 -
Analysis given the previous comment:

1. Pod tomcat-16-mv9sj is created at 08:58:44
   -- startTime: 2017-06-01T08:58:44Z
2. The tomcat server is started at 08:59:14
   -- 2017-06-01 08:59:14,685 [main] INFO org.apache.catalina.startup.Catalina- Server startup in 2943 ms
3. The first request is received at 09:01:33
   -- 10.1.4.1 - - [01/Jun/2017:09:01:33 +0000] "GET / HTTP/1.1" 404 -
4. About 6 minutes after the pod was created (approx. 09:04), the probe fails by timeout and the pod is killed
   -- Liveness probe failed: Get http://10.1.17.68:8080/docs: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
5. About 8 minutes after the pod was created (approx. 09:06), a new container is created. It might or might not have failed after 6 more minutes; the logs don't show it.
Looking at the probe:

  initialDelaySeconds: 300
  timeoutSeconds: 30
  periodSeconds: 10
  successThreshold: 1
  failureThreshold: 5

The time between container restarts, if the probe URI is timing out, is 300 + 30 * 5 + 10 * 4 = 490s = 8m10s. That is initialDelaySeconds + timeoutSeconds * failureThreshold + periodSeconds * (failureThreshold - 1); see the sanity-check sketch after this comment.

> 5. About 8 minutes after the pod was created (approx. 09:06), a new
> container is created.

That aligns with what you are reporting. Due to the bug Derek mentioned, I would expect to see a restart every 330 seconds (5m30s) after a single probe failure on the restarted pod. If the application is ready by the first probe, then the container appears as Ready.

Looking at the report:

Mon May 29 08:59:28 CEST 2017
NAME              READY     STATUS    RESTARTS   AGE       IP            NODE
tomcat-16-vpc7n   0/1       Running   1          10m       10.1.17.129   osclu1-azuur-05.uz.kuleuven.ac.be

Mon May 29 09:09:29 CEST 2017
NAME              READY     STATUS    RESTARTS   AGE       IP            NODE
tomcat-16-vpc7n   0/1       Running   3          20m       10.1.17.129   osclu1-azuur-05.uz.kuleuven.ac.be

Mon May 29 09:19:30 CEST 2017
NAME              READY     STATUS    RESTARTS   AGE       IP            NODE
tomcat-16-vpc7n   1/1       Running   4          30m       10.1.17.129   osclu1-azuur-05.uz.kuleuven.ac.be

Timeline:

@0m      the container is started
@8m10s   the container restarts due to 5 probe failures
@10m     get shows 1 restart and 0/1 Ready
@13m40s  the container restarts due to 1 probe failure
@19m10s  the container restarts due to 1 probe failure
@20m     get shows 3 restarts and 0/1 Ready
@24m40s  the container restarts due to 1 probe failure
@29m40s  the container probe succeeds
@30m     get shows 4 restarts and 1/1 Ready

tl;dr The results you are reporting are the expected results once the bug referenced in comment 1 is taken into account.
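A quick sanity check of that arithmetic; a minimal shell sketch with the probe parameters hard-coded from the spec above:

# Probe parameters from the reported deployment config
DELAY=300; TIMEOUT=30; PERIOD=10; FAIL=5

# First restart: initial delay, then failureThreshold probes that each hang
# for timeoutSeconds, spaced periodSeconds apart
echo $(( DELAY + TIMEOUT * FAIL + PERIOD * (FAIL - 1) ))  # 490 (8m10s)

# Later restarts while the failure counter is not reset (the bug from
# comment 1): a single failed probe after the initial delay is enough
echo $(( DELAY + TIMEOUT ))  # 330 (5m30s)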
*** This bug has been marked as a duplicate of bug 1455056 ***
The question here is why the probe fails and the pod is restarted even though the server is capable of processing requests, regardless of the restart count.
Sorry Ruben, I misunderstood what the issue was.

If the openshift node process cannot contact the application on its pod IP address, then yes, the probe will always fail. This looks like a networking issue.

However, I do see you mentioned:

> 4. Curl'ing pods directly to the same path as the probe succeeds

Is this confirming that a manual curl from the node to the pod works at the same time the probe is failing? Could there be an intermittent network issue?

Is this a containerized install?
(In reply to Seth Jennings from comment #14)
> Sorry Ruben, I misunderstood what the issue was.
>
> If the openshift node process cannot contact the application on its pod IP
> address, then yes, the probe will always fail. This looks like a networking
> issue.
>
> However, I do see you mentioned:
>
> > 4. Curl'ing pods directly to the same path as the probe succeeds
>
> Is this confirming that a manual curl from the node to the pod works at the
> same time the probe is failing? Could there be an intermittent network
> issue?

See the networking issue the installation is also experiencing: https://bugzilla.redhat.com/show_bug.cgi?id=1454948

> Is this a containerized install?

No, it is not, but it is partially deployed on Azure and OpenStack.
One reproducer is a containerized install, the other is not.
Reported case also on OCP 3.4.1-24. Updated the version label accordingly.
At the moment of a liveness failure, can you collect `ip neigh` on the host and `ip a` inside the container that is failing to respond?
I'd also love to see a tcpdump inside and outside the container in question during a (couple of?) liveness failures. Something like:

tcpdump -i any -nn port 8080

If you don't have tcpdump inside the container, you can actually grab the same thing from outside if you know the PID of a process in the right container. You should be able to get that from docker ps and docker inspect (see the sketch below). Then on the host you can run:

nsenter -t $PID -n -- tcpdump -i any -nn port 8080
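A minimal sketch of that PID lookup, assuming the docker CLI is available on the node and that the container name contains "tomcat" (the name filter is an assumption):

# Container ID of the first matching tomcat container (name filter assumed)
CID=$(docker ps --filter name=tomcat --format '{{.ID}}' | head -n1)

# PID of the container's init process, as reported by docker
PID=$(docker inspect --format '{{.State.Pid}}' "$CID")

# Run the capture from inside the container's network namespace
nsenter -t "$PID" -n -- tcpdump -i any -nn port 8080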
Also, at the moment of failure, from the host the container is running on, can you do:

tcpdump -n -i tun0 -w capture.cap host 192.168.1.1

(but replace that IP with the IP of the pod that is failing to respond)

Then please get me the capture.cap file.
Created attachment 1288095 [details]
nsenter.tcpdump.port.8080.cap

Created attachment 1288096 [details]
tcpdump.port.8080.cap

Created attachment 1288097 [details]
ipneigh.out

Created attachment 1288098 [details]
capture.cap
[root@osclu1-cross-01 ~]# oc get pods -o wide
NAME              READY     STATUS      RESTARTS   AGE       IP            NODE
tomcat-20-f9d7v   0/1       Running     0          30m       10.1.11.34    osclu1-pew-05.uz.kuleuven.ac.be
tomcat-20-fmnqt   1/1       Running     0          30m       10.1.17.88    osclu1-azuur-05.uz.kuleuven.ac.be
tomcat-3-build    0/1       Completed   0          7d        10.1.11.128   osclu1-pew-05.uz.kuleuven.ac.be

ip neigh > /tmp/ipneigh.out
nsenter -t 15455 -n -- ip a > /tmp/ipa.out
tcpdump -i any -nn port 8080 -w /tmp/tcpdump.port.8080
nsenter -t 15455 -n -- tcpdump -i any -nn port 8080 -w /tmp/nsenter.tcpdump.port.8080
tcpdump -n -i tun0 -w capture.cap host 10.1.11.34

I let the dumps run for about 30 seconds. The checks are done every 10 seconds.
Can we please get the /tmp/ipa.out attachment? That has the critical piece of information in it.
Created attachment 1288158 [details]
ipa.out
From ipneigh.out (the node's ARP cache entry for the failing pod):

10.1.11.34 dev tun0 lladdr de:0e:f4:db:09:7c REACHABLE

From ipa.out (inside the container):

3: eth0@if2140: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP
    link/ether 62:fd:49:57:02:9b brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.1.11.34/24 scope global eth0
Maybe I'm kicking in an open door, but the MAC addresses are not the same. Is that why the checks are not working?
markv: Precisely. https://github.com/openshift/origin/pull/14685 will address the symptom. I'm still not sure why the ARP cache gets into this state, though.
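For anyone reproducing this, a minimal sketch of the comparison, reusing the pod IP (10.1.11.34) and container PID (15455) captured in comment 26:

# MAC the node's ARP cache holds for the pod IP
ip neigh show 10.1.11.34

# Actual MAC of eth0 inside the container's network namespace
nsenter -t 15455 -n -- ip a show eth0 | grep link/ether

# If the lladdr from the first command differs from the link/ether in the
# second, probe packets from the node are addressed to a stale MAC and never
# reach the container, so the probe times out.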
Ben, can I dup this to 1451854 then?
*** Bug 1461505 has been marked as a duplicate of this bug. ***
*** This bug has been marked as a duplicate of bug 1451854 ***