Bug 1370249 - readiness probe failed
Summary: readiness probe failed
Keywords:
Status: CLOSED DUPLICATE of bug 1329399
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Ben Bennett
QA Contact: Meng Bo
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-08-25 17:48 UTC by Stefanie Forrester
Modified: 2016-09-06 14:55 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-08-26 15:20:46 UTC
Target Upstream Version:
Embargoed:


Description Stefanie Forrester 2016-08-25 17:48:26 UTC
Description of problem:

In our INT environment, a single pod is restarting repeatedly because its liveness and readiness probes are failing. An identical pod on a different node runs without issues.

The issue was present in my last two installs of OpenShift in INT, as well as on our router in PROD. In PROD, the only way to keep the router pods online was to remove the liveness/readiness probes entirely.


[root@dev-preview-int-master-d41bf ~]# oc get pods -o wide
NAME                        READY     STATUS             RESTARTS   AGE       IP            NODE
docker-registry-17-deploy   1/1       Running            0          8m        10.1.3.2      ip-172-31-7-74.ec2.internal
docker-registry-17-ru1jj    1/1       Running            0          7m        10.1.3.3      ip-172-31-7-74.ec2.internal
docker-registry-17-tyfe1    0/1       CrashLoopBackOff   6          7m        10.1.5.5      ip-172-31-7-73.ec2.internal
router-12-7u1n3             1/1       Running            0          6m        172.31.7.73   ip-172-31-7-73.ec2.internal
router-12-ux2z6             1/1       Running            0          6m        172.31.7.74   ip-172-31-7-74.ec2.internal


LASTSEEN               FIRSTSEEN              COUNT     NAME                       KIND      SUBOBJECT                   TYPE      REASON      SOURCE                                  MESSAGE
2016-08-25T17:18:09Z   2016-08-25T17:13:19Z   15        docker-registry-17-tyfe1   Pod       spec.containers{registry}   Warning   Unhealthy   {kubelet ip-172-31-7-73.ec2.internal}   Readiness probe failed: Get https://10.1.5.5:5000/healthz: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
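
The failing request in the event above can be approximated by hand from the affected node, which may help separate a probe misconfiguration from a networking problem. The following is only a rough sketch, not the kubelet's actual implementation: the function names `is_success` and `probe` are hypothetical, the 1-second timeout is an assumed value (this report does not show the pod's configured timeout), and certificate verification is skipped on the assumption that the registry serves a certificate the node does not trust.

```python
import ssl
import urllib.error
import urllib.request


def is_success(status: int) -> bool:
    # Kubernetes HTTP probes treat any status code in [200, 400) as healthy.
    return 200 <= status < 400


def probe(url: str, timeout: float = 1.0):
    """Issue one HTTP(S) GET and return (ok, detail), roughly like an httpGet probe."""
    # Assumption: skip certificate verification, since the endpoint is addressed by pod IP.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            return is_success(resp.status), f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return is_success(exc.code), f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # Connection refused, unreachable network, or timeout.
        return False, f"{type(exc).__name__}: {exc}"
```

Calling `probe("https://10.1.5.5:5000/healthz")` on ip-172-31-7-73.ec2.internal and again on ip-172-31-7-74.ec2.internal would show whether the timeout is specific to the node that hosts the failing pod.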

Version-Release number of selected component (if applicable):


How reproducible:
Unknown, but it's present in the Online INT and PROD environments.

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Stefanie Forrester 2016-08-25 17:51:48 UTC
PROD is running oc version v3.2.1.13-5-gddf7d17
INT is running oc version v3.3.0.25+d2ac65e-dirty, but also had the issue on 3.3.0.24

All INT nodes have 'net.ipv4.ip_forward = 1'.

Comment 5 Ben Bennett 2016-08-26 15:20:46 UTC

*** This bug has been marked as a duplicate of bug 1329399 ***

