Description of problem:
In our INT environment, a single pod is restarting repeatedly due to failed liveness and readiness probes. An identical pod is running on a different node without issues. The issue was present in my last two installs of OpenShift in INT, as well as on our router in PROD. In PROD, the only way to keep the router pods online was to remove the liveness/readiness probes entirely.

[root@dev-preview-int-master-d41bf ~]# oc get pods -o wide
NAME                        READY     STATUS             RESTARTS   AGE       IP            NODE
docker-registry-17-deploy   1/1       Running            0          8m        10.1.3.2      ip-172-31-7-74.ec2.internal
docker-registry-17-ru1jj    1/1       Running            0          7m        10.1.3.3      ip-172-31-7-74.ec2.internal
docker-registry-17-tyfe1    0/1       CrashLoopBackOff   6          7m        10.1.5.5      ip-172-31-7-73.ec2.internal
router-12-7u1n3             1/1       Running            0          6m        172.31.7.73   ip-172-31-7-73.ec2.internal
router-12-ux2z6             1/1       Running            0          6m        172.31.7.74   ip-172-31-7-74.ec2.internal

LASTSEEN               FIRSTSEEN              COUNT   NAME                       KIND   SUBOBJECT                   TYPE      REASON      SOURCE                                  MESSAGE
2016-08-25T17:18:09Z   2016-08-25T17:13:19Z   15      docker-registry-17-tyfe1   Pod    spec.containers{registry}   Warning   Unhealthy   {kubelet ip-172-31-7-73.ec2.internal}   Readiness probe failed: Get https://10.1.5.5:5000/healthz: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

Version-Release number of selected component (if applicable):

How reproducible:
Unknown, but it's present in the Online INT and PROD environments.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
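For reference, a sketch of how to reproduce the kubelet's check by hand and of the last-resort probe-removal workaround mentioned above. The pod IP and dc name are taken from the output in this report; `oc set probe` may not be available on older (3.2-era) clients, in which case the probe stanzas would have to be deleted via `oc edit` instead:

```shell
# From the node hosting the failing pod, issue the same request the kubelet
# makes (pod IP/port from the 'oc get pods -o wide' output above). A hang here
# followed by a timeout matches the "Client.Timeout exceeded" event message.
curl -kv --max-time 5 https://10.1.5.5:5000/healthz

# Last-resort workaround used in PROD: strip both probes from the router dc.
# On clients without 'oc set probe', run 'oc edit dc/router' and delete the
# livenessProbe/readinessProbe stanzas by hand.
oc set probe dc/router --remove --liveness --readiness
```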
PROD is running oc version v3.2.1.13-5-gddf7d17.
INT is running oc version v3.3.0.25+d2ac65e-dirty, but also had the issue on 3.3.0.24.
All INT nodes have 'net.ipv4.ip_forward = 1'.
*** This bug has been marked as a duplicate of bug 1329399 ***