Bug 1370249

Summary: readiness probe failed
Product: OpenShift Container Platform
Reporter: Stefanie Forrester <dakini>
Component: Networking
Assignee: Ben Bennett <bbennett>
Status: CLOSED DUPLICATE
QA Contact: Meng Bo <bmeng>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.3.0
CC: aos-bugs, dakini, eparis, jtanenba
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-08-26 15:20:46 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Stefanie Forrester 2016-08-25 17:48:26 UTC
Description of problem:

In our INT environment, a single pod is restarting repeatedly because its liveness and readiness probes are failing. An identical pod is running on a different node without issues.

The issue was present in my last two installs of OpenShift in INT, as well as on our router in PROD. In PROD, the only way to keep the router pods online was to remove the liveness/readiness probes entirely.


[root@dev-preview-int-master-d41bf ~]# oc get pods -o wide
NAME                        READY     STATUS             RESTARTS   AGE       IP            NODE
docker-registry-17-deploy   1/1       Running            0          8m        10.1.3.2      ip-172-31-7-74.ec2.internal
docker-registry-17-ru1jj    1/1       Running            0          7m        10.1.3.3      ip-172-31-7-74.ec2.internal
docker-registry-17-tyfe1    0/1       CrashLoopBackOff   6          7m        10.1.5.5      ip-172-31-7-73.ec2.internal
router-12-7u1n3             1/1       Running            0          6m        172.31.7.73   ip-172-31-7-73.ec2.internal
router-12-ux2z6             1/1       Running            0          6m        172.31.7.74   ip-172-31-7-74.ec2.internal


LASTSEEN               FIRSTSEEN              COUNT     NAME                       KIND      SUBOBJECT                   TYPE      REASON      SOURCE                                  MESSAGE
2016-08-25T17:18:09Z   2016-08-25T17:13:19Z   15        docker-registry-17-tyfe1   Pod       spec.containers{registry}   Warning   Unhealthy   {kubelet ip-172-31-7-73.ec2.internal}   Readiness probe failed: Get https://10.1.5.5:5000/healthz: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
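
The failing check in the event above can be approximated by hand to see whether the timeout reproduces outside the kubelet. The following is only a minimal sketch, not the kubelet's actual implementation: the function name `probe` is hypothetical, and the 1-second default timeout, the skipped certificate verification, and the 200-399 success range are assumptions meant to mirror how an HTTP probe is commonly evaluated.

```python
import ssl
import urllib.error
import urllib.request


def probe(url, timeout=1.0):
    """Issue one GET and return (ok, detail), roughly like an HTTP probe.

    Assumptions (not taken from the report): a 1 second default timeout,
    no TLS certificate verification, and success for status 200-399.
    """
    # The pod's certificate will not match its IP address, so verification
    # is disabled for this manual check only.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            return 200 <= resp.status < 400, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        return False, f"HTTP {exc.code}"
    except OSError as exc:
        # Covers URLError, connection refused, and socket timeouts.
        return False, f"{type(exc).__name__}: {exc}"
```

Calling `probe("https://10.1.5.5:5000/healthz")` from the node that hosts the failing pod (ip-172-31-7-73.ec2.internal) and again from the other node would show whether the timeout depends on where the request originates.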
Version-Release number of selected component (if applicable):


How reproducible:
Unknown, but it's present in the Online INT and PROD environments.

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Stefanie Forrester 2016-08-25 17:51:48 UTC
PROD is running oc version v3.2.1.13-5-gddf7d17
INT is running oc version v3.3.0.25+d2ac65e-dirty; the issue also occurred on 3.3.0.24.

All INT nodes have 'net.ipv4.ip_forward = 1'.

Comment 5 Ben Bennett 2016-08-26 15:20:46 UTC

*** This bug has been marked as a duplicate of bug 1329399 ***