Bug 1678100 - Readiness probe failed for hawkular metrics
Summary: Readiness probe failed for hawkular metrics
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Hawkular
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 3.11.z
Assignee: Jan Martiska
QA Contact: Junqi Zhao
Depends On:
Reported: 2019-02-18 06:18 UTC by Priyanka Kanthale
Modified: 2020-03-02 02:34 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2019-10-18 01:34:36 UTC
Target Upstream Version:


System                   ID              Priority  Status  Summary  Last Updated
Red Hat Product Errata   RHBA-2019:3139  None      None    None     2019-10-18 01:34:58 UTC

Description Priyanka Kanthale 2019-02-18 06:18:47 UTC
Description of problem:
In a fresh 3.11.43 cluster installation, the hawkular-metrics pod does not come up and goes into CrashLoopBackOff.

# oc get pods
NAME                            READY     STATUS             RESTARTS   AGE
hawkular-cassandra-1-r6ckd      1/1       Running            1          6d
hawkular-metrics-94hxj          0/1       CrashLoopBackOff   2803       6d
hawkular-metrics-schema-d65rc   0/1       Completed          0          6d
heapster-l6wcg                  0/1       Running            1064       6d
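The pod listing above can be triaged mechanically. The following is a minimal Python sketch, assuming the plain `oc get pods` column layout shown (NAME, READY, STATUS, RESTARTS, AGE); `not_ready_pods` is a hypothetical helper, not an `oc` feature. Completed pods, such as the finished schema job, are skipped because 0/1 is expected for them.

```python
"""Flag pods in plain `oc get pods` output whose containers are not all ready.

Hypothetical helper; assumes the NAME READY STATUS RESTARTS AGE layout above.
"""


def not_ready_pods(table: str) -> list[tuple[str, str, int]]:
    """Return (name, status, restarts) for non-Completed pods with READY x/y, x < y."""
    lines = [line for line in table.strip().splitlines() if line.strip()]
    results = []
    for line in lines[1:]:  # skip the header row
        name, ready, status, restarts, _age = line.split()[:5]
        ready_count, total = (int(n) for n in ready.split("/"))
        # Completed pods (e.g. finished Jobs) legitimately report 0/1.
        if status != "Completed" and ready_count < total:
            results.append((name, status, int(restarts)))
    return results


SAMPLE = """\
NAME                            READY     STATUS             RESTARTS   AGE
hawkular-cassandra-1-r6ckd      1/1       Running            1          6d
hawkular-metrics-94hxj          0/1       CrashLoopBackOff   2803       6d
hawkular-metrics-schema-d65rc   0/1       Completed          0          6d
heapster-l6wcg                  0/1       Running            1064       6d
"""

if __name__ == "__main__":
    for name, status, restarts in not_ready_pods(SAMPLE):
        print(f"{name}: {status}, {restarts} restarts")
```

Run against the listing above, it flags hawkular-metrics-94hxj (CrashLoopBackOff, 2803 restarts) and heapster-l6wcg (Running but 0/1, 1064 restarts).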

  Type     Reason     Age                   From                         Message
  ----     ------     ----                  ----                         -------
  Normal   Killing    1h (x2801 over 6d)    kubelet, xyz  Killing container with id docker://hawkular-metrics:Container failed liveness probe.. Container will be killed and recreated.
  Warning  Unhealthy  34m (x15716 over 6d)  kubelet, xyz Readiness probe failed: Failed to access the status endpoint : <urlopen error [Errno 111] Connection refused>. This may be due to Hawkular Metrics not being ready yet. Will try again.
  Warning  Unhealthy  15m (x8463 over 6d)   kubelet, xyz  Liveness probe failed: Failed to access the status endpoint : <urlopen error [Errno 111] Connection refused>.
Traceback (most recent call last):
  File "/opt/hawkular/scripts/hawkular-metrics-liveness.py", line 48, in <module>
    if int(uptime) < int(timeout):
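Both probe messages read like a Python script that requests a status URL with `urllib` and, for liveness, tolerates failures while the container is younger than some timeout (the `if int(uptime) < int(timeout):` line). The script itself is not included here and the traceback is cut off before the exception, so the sketch below is only a hypothetical illustration of that pattern, not the shipped `hawkular-metrics-liveness.py`. The names `check_status` and `liveness_exit_code`, the command-line arguments, and the exit-code convention are all assumptions.

```python
"""Hypothetical sketch of a status-endpoint probe with a startup grace period.

Not the shipped /opt/hawkular/scripts/hawkular-metrics-liveness.py; the
function names, arguments, and exit codes are illustrative assumptions.
"""
import sys
import urllib.error
import urllib.request


def check_status(url: str, timeout_s: float = 5.0) -> tuple[bool, str]:
    """Do one GET against the status endpoint and return (ok, message)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200, f"HTTP {resp.status}"
    except urllib.error.URLError as err:
        # On Linux a refused TCP connection prints as
        # "<urlopen error [Errno 111] Connection refused>".
        return False, f"Failed to access the status endpoint : {err}"
    except OSError as err:  # e.g. read timeouts not wrapped in URLError
        return False, f"Failed to access the status endpoint : {err}"


def liveness_exit_code(endpoint_ok: bool, uptime_s: float, grace_s: float) -> int:
    """Return 0 (healthy) or 1 (restart the container).

    An unreachable endpoint is tolerated while the container is younger than
    the grace period, since the server may still be starting up.
    """
    if endpoint_ok:
        return 0
    return 0 if uptime_s < grace_s else 1


# Hypothetical usage: probe.py <status-url> <uptime-seconds> <grace-seconds>
if __name__ == "__main__" and len(sys.argv) == 4:
    # float() accepts both "120" and "120.5"; int() would reject the latter.
    url, uptime, grace = sys.argv[1], float(sys.argv[2]), float(sys.argv[3])
    ok, message = check_status(url)
    print(message)
    sys.exit(liveness_exit_code(ok, uptime, grace))
```

Under this pattern, a server that takes longer to open its port than the grace period is restarted before it ever becomes ready, which would be consistent with the repeated "Connection refused" failures and restart count above.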

# oc -n openshift-infra get job
NAME                      DESIRED   SUCCESSFUL   AGE
hawkular-metrics-schema   1         1            6d

# oc get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS    CLAIM                                 STORAGECLASS              REASON    AGE
pvc-e33d4076-14d8-11e9-8146-005056a108a0   25G        RWO            Delete           Bound     openshift-infra/metrics-cassandra-1   glusterfs-storage-block             6d

# oc get pvc
NAME                  STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS              AGE
metrics-cassandra-1   Bound     pvc-e33d4076-14d8-11e9-8146-005056a108a0   25G        RWO            glusterfs-storage-block   6d

Workaround: increase the readiness probe's timing values so that it allows more time before reporting failure.
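One way to give the probe more time is to patch the pod template's `readinessProbe`. The following is a minimal Python sketch that builds such a strategic-merge patch. It assumes the pods are owned by a ReplicationController named `hawkular-metrics` in `openshift-infra` and that the container is named `hawkular-metrics` (the container name appears in the kubelet event above). The timing values are placeholders, not tested recommendations, and `probe_patch` is a hypothetical helper.

```python
"""Build a strategic-merge patch that relaxes one container's readinessProbe.

Sketch under assumptions: RC "hawkular-metrics" in openshift-infra, container
"hawkular-metrics"; the numbers are illustrative placeholders.
"""
import json


def probe_patch(container: str, initial_delay_s: int, timeout_s: int,
                failure_threshold: int) -> dict:
    """Return a patch body that raises readinessProbe timings for one container."""
    probe = {
        "initialDelaySeconds": initial_delay_s,
        "timeoutSeconds": timeout_s,
        "failureThreshold": failure_threshold,
    }
    return {
        "spec": {
            "template": {
                "spec": {
                    # Strategic merge matches containers by "name", so only
                    # this container's probe fields are changed.
                    "containers": [{"name": container, "readinessProbe": probe}]
                }
            }
        }
    }


if __name__ == "__main__":
    patch = probe_patch("hawkular-metrics", initial_delay_s=120, timeout_s=10,
                        failure_threshold=10)
    print(json.dumps(patch))
```

The printed JSON could be passed to `oc patch rc hawkular-metrics -n openshift-infra -p '<json>'`. A ReplicationController does not update pods that are already running when its template changes, so the crashing pod would have to be deleted to pick up the new probe. Because the events show that the liveness probe is what restarts the container, the same fields under `livenessProbe` may need raising as well.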

Comment 5 Jan Martiska 2019-02-20 07:30:15 UTC
PR containing this enhancement: https://github.com/openshift/openshift-ansible/pull/11216

Comment 14 errata-xmlrpc 2019-10-18 01:34:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, see RHBA-2019:3139.

If the solution does not work for you, open a new bug report.

