Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1551474

Summary:

[3.7]hawkular metrics pod failed at liveness check, pod can not be started up

Product:

OpenShift Container Platform

Reporter:

Junqi Zhao <juzhao>

Component:

Hawkular

Assignee:

Ruben Vargas Palma <rvargasp>

Status:

CLOSED DEFERRED

QA Contact:

Junqi Zhao <juzhao>

Severity:

high

Docs Contact:

Priority:

high

Version:

3.7.0

CC:

aos-bugs, jlee, jrosenta, juzhao, rvargasp, suchaudh

Target Milestone:

---

Keywords:

Regression

Target Release:

3.7.z

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

Fixed In Version:

Doc Type:

If docs needed, set a value

Doc Text:

Story Points:

---

Clone Of:

Clones:

1567827 1613130 (view as bug list)

Environment:

Last Closed:

2019-11-20 18:49:57 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

Bug Blocks:

1567827, 1613130

Attachments:

Description	Flags
metrics pods log	none

Description Junqi Zhao 2018-03-05 09:15:46 UTC

Description of problem:
Deployed metrics with the latest available images; the hawkular-metrics pod fails its liveness check and cannot start up:
metrics-hawkular-metrics/images/v3.7.37-1
metrics-cassandra/images/v3.7.36-1
metrics-heapster/images/v3.7.36-1

Note: Retrying with metrics-hawkular-metrics-v3.7.36-1 does not reproduce this issue.

# oc get po
NAME                         READY     STATUS             RESTARTS   AGE
hawkular-cassandra-1-6wpjn   1/1       Running            0          16m
hawkular-metrics-6xxrs       0/1       CrashLoopBackOff   8          16m
heapster-sjzt5               0/1       Running            1          16m
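The READY column is what separates healthy pods from unhealthy ones here. STATUS alone reports `Running` for heapster even though it has 0 of 1 containers ready. The following is a minimal sketch that parses the table above and flags every pod whose ready count is below its container count. The function name `unready_pods` is hypothetical, and the sketch assumes the default whitespace-aligned `oc get po` layout (NAME, READY, STATUS, RESTARTS, AGE).

```python
from typing import List, Tuple


def unready_pods(table: str) -> List[Tuple[str, str, str]]:
    """Return (name, ready, status) for every pod with fewer ready containers than desired.

    Assumes the default column layout of `oc get po`: NAME READY STATUS RESTARTS AGE.
    """
    rows = [line for line in table.splitlines() if line.strip()]
    if rows and rows[0].startswith("NAME"):
        rows = rows[1:]  # drop the header row
    flagged = []
    for row in rows:
        name, ready, status = row.split()[:3]
        # READY is "<ready containers>/<total containers>", e.g. "0/1"
        ready_count, total = (int(n) for n in ready.split("/"))
        if ready_count < total:
            flagged.append((name, ready, status))
    return flagged


SAMPLE = """\
NAME                         READY     STATUS             RESTARTS   AGE
hawkular-cassandra-1-6wpjn   1/1       Running            0          16m
hawkular-metrics-6xxrs       0/1       CrashLoopBackOff   8          16m
heapster-sjzt5               0/1       Running            1          16m
"""

for name, ready, status in unready_pods(SAMPLE):
    print(f"{name}: {ready} ({status})")
# prints:
# hawkular-metrics-6xxrs: 0/1 (CrashLoopBackOff)
# heapster-sjzt5: 0/1 (Running)
```

For anything beyond a quick check, the output of `oc get po -o json` is a sturdier input than the column-aligned table.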

# oc describe po hawkular-metrics-6xxrs
***************************************snipped**********************************
  16m		15m		4	kubelet, 172.16.120.80	spec.containers{hawkular-metrics}	Warning		Unhealthy		Liveness probe failed: Failed to access the status endpoint : <urlopen error [Errno 111] Connection refused>.
Traceback (most recent call last):
  File "/opt/hawkular/scripts/hawkular-metrics-liveness.py", line 48, in <module>
    if int(uptime) < int(timeout):
ValueError: invalid literal for int() with base 10: ''

  16m	15m	4	kubelet, 172.16.120.80	spec.containers{hawkular-metrics}	Warning	Unhealthy	Readiness probe failed: Failed to access the status endpoint : <urlopen error [Errno 111] Connection refused>. This may be due to Hawkular Metrics not being ready yet. Will try again.

  15m	15m	3	kubelet, 172.16.120.80	spec.containers{hawkular-metrics}	Normal	Pulled	Container image "brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/metrics-hawkular-metrics:v3.7" already present on machine
  15m	1m	64	kubelet, 172.16.120.80	spec.containers{hawkular-metrics}	Warning	BackOff	Back-off restarting failed container
***************************************snipped**********************************
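The traceback shows the failure mode. After the status endpoint refuses the connection, the liveness script appears to fall back to comparing the container uptime against a timeout at line 48, but `uptime` is an empty string. `int('')` raises `ValueError`, so the probe exits with a traceback instead of a deliberate result. Below is a minimal sketch of a defensive version of that comparison. The names `parse_seconds` and `within_startup_grace` are hypothetical. The sketch assumes both values arrive as strings of seconds, and it treats an unreadable uptime as "still starting" as an assumption. This is not the fix shipped in later images.

```python
from typing import Optional


def parse_seconds(value: str) -> Optional[int]:
    """Parse a seconds value such as '42'; return None if it is empty or non-numeric."""
    try:
        return int(value.strip())
    except ValueError:
        return None


def within_startup_grace(uptime: str, timeout: str) -> bool:
    """Return True while the container is still inside its startup window."""
    limit = parse_seconds(timeout)
    if limit is None:
        # A bad timeout is a configuration error, so fail loudly.
        raise ValueError(f"invalid timeout: {timeout!r}")
    up = parse_seconds(uptime)
    if up is None:
        # Assumption: an unreadable uptime (e.g. '') means "unknown", so give the
        # container the benefit of the doubt instead of crashing like int('') does.
        return True
    return up < limit


print(within_startup_grace("", "180"))     # True, instead of ValueError
print(within_startup_grace("300", "180"))  # False: past the startup window
```

The trade-off is that a container whose uptime can never be read would be kept alive indefinitely. Returning `False` for an unknown uptime would instead fail the probe cleanly, with a message rather than a traceback.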


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Deploy metrics 3.7 via Ansible.

Actual results:
The hawkular-metrics pod fails its liveness check and cannot start up.

Expected results:
All pods should be healthy

Additional info:

Comment 1 Junqi Zhao 2018-03-05 09:16:38 UTC

This blocks metrics installation and testing of other features.

Comment 2 John Sanda 2018-03-05 14:37:00 UTC

Please provide logs, the output of `oc get pods -o yaml`, and `oc get pods --all-namespaces | wc -l`.

A very common cause of liveness probe failures is heap pressure. GC logs are written to /opt/eap/standalone/log. You can try to capture any GC log files with `oc cp <hawkular-metrics-pod>:/opt/eap/standalone/log hawkular-metrics-log`. That directory is lost on container restart, so you may or may not be able to get GC log files.
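The diagnostics requested above can be gathered in one pass. The following is a minimal sketch that wraps the same `oc` commands with Python's `subprocess`. The helper names `diagnostic_commands`, `collect`, and `pod_line_count` are hypothetical. The `openshift-infra` namespace is an assumption taken from the `oc get po -n openshift-infra` output later in this bug. `collect` requires `oc` on the PATH and a logged-in session.

```python
import subprocess
from typing import Dict, List


def diagnostic_commands(pod: str, namespace: str = "openshift-infra",
                        dest: str = "hawkular-metrics-log") -> Dict[str, List[str]]:
    """Build the requested oc commands as argument lists (no shell involved)."""
    return {
        "pods_yaml": ["oc", "get", "pods", "-n", namespace, "-o", "yaml"],
        "all_pods": ["oc", "get", "pods", "--all-namespaces"],
        "logs": ["oc", "logs", pod, "-n", namespace],
        # The GC log directory is lost on container restart, so this copy may fail.
        "gc_logs": ["oc", "cp", "-n", namespace,
                    f"{pod}:/opt/eap/standalone/log", dest],
    }


def collect(pod: str, namespace: str = "openshift-infra") -> Dict[str, str]:
    """Run each command, keeping stdout on success or stderr on failure."""
    results = {}
    for key, cmd in diagnostic_commands(pod, namespace).items():
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results[key] = proc.stdout if proc.returncode == 0 else proc.stderr
    return results


def pod_line_count(all_pods_output: str) -> int:
    """Mirror `| wc -l`: count output lines, including the header row."""
    return len(all_pods_output.splitlines())
```

Running each command separately means a failed `oc cp` (for example, after a restart wiped the log directory) does not prevent the pod YAML and logs from being captured.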

Comment 4 Junqi Zhao 2018-03-05 15:46:45 UTC

Created attachment 1404375 [details]
metrics pods log

Comment 8 Junqi Zhao 2018-03-09 08:59:33 UTC

Tested with metrics-hawkular-metrics-v3.7.36-2; the issue does not occur.

Images:
metrics-cassandra-v3.7.37-1
metrics-hawkular-metrics-v3.7.36-2
metrics-heapster-v3.7.37-1


# openshift version
openshift v3.7.36
kubernetes v1.7.6+a08f5eeb62
etcd 3.2.8


# oc get po -n openshift-infra
NAME                         READY     STATUS    RESTARTS   AGE
hawkular-cassandra-1-vql6d   1/1       Running   0          27m
hawkular-metrics-lgt4m       1/1       Running   0          27m
heapster-l6z7c               1/1       Running   0          27m

Comment 9 Junqi Zhao 2018-04-03 07:44:33 UTC

Tested with metrics-hawkular-metrics-v3.7.42-2; the issue does not occur.

Images:
metrics-hawkular-metrics/images/v3.7.42-2
metrics-cassandra/images/v3.7.42-2
metrics-heapster/images/v3.7.42-2

Comment 21 giriraj rajawat 2018-08-02 08:13:57 UTC

Team, can we have an update on this? A customer is facing the issue.
Let us know if you need more information from the customer's end.

Thanks,
Giriraj Rajawat

Comment 22 John Sanda 2018-08-06 19:31:36 UTC

Joel, did updating the image resolve the problem?

Comment 23 John Sanda 2018-08-06 19:34:00 UTC

I am resetting the version to 3.7 since that is the version for which the problem was reported.

Giriraj, can you please open a separate ticket (or clone this one)? Thanks.

Comment 27 Stephen Cuppett 2019-11-20 18:49:57 UTC

OCP 3.6-3.10 is no longer on full support [1]. Marking CLOSED DEFERRED. If you have a customer case with a support exception or have reproduced on 3.11+, please reopen and include those details. When reopening, please set the Target Release to the appropriate version where needed.

[1]: https://access.redhat.com/support/policy/updates/openshift

Comment 28 Red Hat Bugzilla 2023-09-15 00:06:45 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days