Bug 1816500 - Readiness and Liveness probes are failing for the application pods
Summary: Readiness and Liveness probes are failing for the application pods
Keywords:
Status: CLOSED DUPLICATE of bug 1812004
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 3.11.0
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 4.5.0
Assignee: Sergiusz Urbaniak
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-03-24 07:04 UTC by manisha
Modified: 2023-09-07 22:31 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-26 13:48:10 UTC
Target Upstream Version:
Embargoed:



Description manisha 2020-03-24 07:04:33 UTC
Description of problem: The customer is facing an issue where readiness and liveness probes for application pods fail when the pods are given resources sized from the Grafana dashboard statistics. The Prometheus query behind the dashboard, 'namespace_pod_name_container_name:container_cpu_usage_seconds_total:sum_rate', is computed over a 5-minute range, so it does not reflect instantaneous CPU usage. The customer was therefore asked to check CPU and memory usage using 'oc adm top pods'. According to that output, a pod uses 2m CPU, so the customer assigned 50-100m to this pod; however, the health checks continue to fail for the pod. We checked that the issue is neither node specific nor application specific.

Also, when the CPU and memory limits are increased, the pods work fine, though the project events keep logging readiness and liveness probe failures.


Actual results: Application pods are failing their readiness and liveness probes.

Comment 4 Pawel Krupa 2020-03-26 13:48:10 UTC
Before OpenShift 4.5, the data shown by `oc adm top pods` came from a query that was not instant; its output was smoothed over a 5m window. This was fixed in https://bugzilla.redhat.com/show_bug.cgi?id=1812004, so it is no longer the case.
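
As a rough illustration of the smoothing effect, here is a minimal Python sketch comparing a rate averaged over 5 minutes with the rate over the most recent interval of a cumulative CPU counter. The 15 s sample interval, the usage figures, and the helper names `build_counter` and `rate` are assumptions made for this example; they are not taken from the customer's cluster or from the actual Prometheus recording rule.

```python
# Hypothetical illustration: how a 5-minute average hides a short CPU burst.
STEP = 15  # assumed seconds between samples of the cumulative counter


def build_counter(usage_per_step):
    """Turn per-interval CPU usage (in cores) into a cumulative seconds counter."""
    total, counter = 0.0, [0.0]
    for cores in usage_per_step:
        total += cores * STEP
        counter.append(total)
    return counter


def rate(counter, window_steps):
    """Average cores used over the last `window_steps` intervals."""
    delta = counter[-1] - counter[-1 - window_steps]
    return delta / (window_steps * STEP)


# Assumed workload: 19 idle intervals at 0.002 cores (2m),
# followed by one 15 s burst at 0.5 cores (500m).
usage = [0.002] * 19 + [0.5]
counter = build_counter(usage)

five_min = rate(counter, 20)  # 20 * 15 s = 5 minutes
latest = rate(counter, 1)     # most recent interval only

print(f"5m average: {five_min * 1000:.0f}m")
print(f"last 15s:   {latest * 1000:.0f}m")
```

With these made-up numbers the 5-minute average comes out at roughly 27m while the most recent interval is at 500m, which shows how a smoothed figure can sit far below a short-lived peak.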

The fix won't be backported, as it is not critical for cluster operations.

*** This bug has been marked as a duplicate of bug 1812004 ***

