Bug 1678100

Summary: Readiness probe failed for hawkular metrics
Product: OpenShift Container Platform Reporter: Priyanka Kanthale <pkanthal>
Component: Hawkular Assignee: Jan Martiska <jmartisk>
Status: CLOSED ERRATA QA Contact: Junqi Zhao <juzhao>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 3.11.0 CC: aos-bugs, jmartisk, pkanthal, rvargasp
Target Milestone: ---   
Target Release: 3.11.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-10-18 01:34:36 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Priyanka Kanthale 2019-02-18 06:18:47 UTC
Description of problem:
In a fresh 3.11.43 cluster installation, the hawkular-metrics pod does not come up and goes into CrashLoopBackOff.

# oc get pods
NAME                            READY     STATUS             RESTARTS   AGE
hawkular-cassandra-1-r6ckd      1/1       Running            1          6d
hawkular-metrics-94hxj          0/1       CrashLoopBackOff   2803       6d
hawkular-metrics-schema-d65rc   0/1       Completed          0          6d
heapster-l6wcg                  0/1       Running            1064       6d


Events:
  Type     Reason     Age                   From                         Message
  ----     ------     ----                  ----                         -------
  Normal   Killing    1h (x2801 over 6d)    kubelet, xyz  Killing container with id docker://hawkular-metrics:Container failed liveness probe.. Container will be killed and recreated.
  Warning  Unhealthy  34m (x15716 over 6d)  kubelet, xyz Readiness probe failed: Failed to access the status endpoint : <urlopen error [Errno 111] Connection refused>. This may be due to Hawkular Metrics not being ready yet. Will try again.
  Warning  Unhealthy  15m (x8463 over 6d)   kubelet, xyz  Liveness probe failed: Failed to access the status endpoint : <urlopen error [Errno 111] Connection refused>.
Traceback (most recent call last):
  File "/opt/hawkular/scripts/hawkular-metrics-liveness.py", line 48, in <module>
    if int(uptime) < int(timeout):


# oc -n openshift-infra get job
NAME                      DESIRED   SUCCESSFUL   AGE
hawkular-metrics-schema   1         1            6d

# oc get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS    CLAIM                                 STORAGECLASS              REASON    AGE
pvc-e33d4076-14d8-11e9-8146-005056a108a0   25G        RWO            Delete           Bound     openshift-infra/metrics-cassandra-1   glusterfs-storage-block             6d

# oc get pvc
NAME                  STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS              AGE
metrics-cassandra-1   Bound     pvc-e33d4076-14d8-11e9-8146-005056a108a0   25G        RWO            glusterfs-storage-block   6d

Workaround: Increase the readiness probe timing (for example, its initial delay or timeout).
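
A minimal sketch of how that workaround could be applied, assuming the pod is managed by a replication controller named hawkular-metrics in openshift-infra (an assumption based on the pod name, not stated in this report). It builds a strategic-merge patch that changes only the readiness probe timing fields; the values 300 and 10 are illustrative, not recommended settings.

```python
import json


def build_probe_patch(container, initial_delay_seconds, timeout_seconds):
    """Return a strategic-merge patch that adjusts only readiness probe timing."""
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            # Containers are merged by name, so the rest of the
                            # container spec (including the probe command) is kept.
                            "name": container,
                            "readinessProbe": {
                                "initialDelaySeconds": initial_delay_seconds,
                                "timeoutSeconds": timeout_seconds,
                            },
                        }
                    ]
                }
            }
        }
    }


# Container name taken from the events above; timing values are hypothetical.
patch = build_probe_patch("hawkular-metrics", 300, 10)
patch_json = json.dumps(patch)
print(patch_json)
```

The printed JSON could then be passed to something like `oc patch rc hawkular-metrics -n openshift-infra -p '<json>'`. A change to a replication controller template only affects newly created pods, so the existing pod would likely need to be deleted and recreated for the new values to take effect.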

Comment 5 Jan Martiska 2019-02-20 07:30:15 UTC
PR containing this enhancement: https://github.com/openshift/openshift-ansible/pull/11216

Comment 14 errata-xmlrpc 2019-10-18 01:34:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3139