Bug 1643948

Summary: Cluster console doesn't display the real value of Crashlooping Pods (it displays 0)
Product: OpenShift Container Platform Reporter: Alberto Gonzalez de Dios <algonzal>
Component: Management Console Assignee: Samuel Padgett <spadgett>
Status: CLOSED ERRATA QA Contact: Yadan Pei <yapei>
Severity: low Docs Contact:
Priority: unspecified    
Version: 3.11.0 CC: aos-bugs, jokerman, mmccomas, rsandu, smunilla, spadgett, yapei
Target Milestone: ---   
Target Release: 3.11.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Previously, the cluster console in OpenShift 3.11 always showed the value "0" for the Crashlooping Pods count on the cluster status page, even when there were crashlooping pods. The problem has been fixed, and the count now accurately reflects the number of crashlooping pods in the selected projects.
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-11-20 03:11:52 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Cluster console Crashlooping Pod counter none

Description Alberto Gonzalez de Dios 2018-10-29 14:17:20 UTC
Created attachment 1498583
Cluster console Crashlooping Pod counter

Description of problem: 
The Crashlooping Pods number in the Cluster console doesn't display the real value. It displays "0" instead of the actual number of pods in the "CrashLoopBackOff" state.


Version-Release number of selected component (if applicable):
OpenShift 3.11


How reproducible:
Create a new app, restart a pod many times so that it enters the CrashLoopBackOff state, and check the OpenShift Cluster Console. Instead of showing a Crashlooping Pods value of "1", it always displays "0".


Steps to Reproduce:
1. Create a new project test:
oc new-project test
2. Create a new test app:
oc new-app https://github.com/openshift/sti-ruby.git --context-dir=2.0/test/puma-test-app
3. Get POD Container ID:
docker ps -a | grep ruby | grep ose-pod | grep Up
4. Kill POD Container ID with SIGTERM (I used SIGHUP):
docker kill --signal=SIGHUP CONTAINER-ID
5. Repeat 3 and 4 until POD status changes to "CrashLoopBackOff"
watch -n 5 'docker kill --signal=SIGHUP $(docker ps -a | grep ruby | grep ose-pod | grep Up | awk "{print \$1}")'
oc get pods | grep Crash
6. Check Cluster Console (make sure Project is the new one, "test")
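The `oc get pods | grep Crash` check in step 5 can be turned into a number to compare directly against the console's Crashlooping Pods counter. The following is a minimal sketch, not part of the original report: the function name `count_crashlooping` is hypothetical, and it assumes the default `oc get pods` column layout (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Hypothetical helper: count pods whose STATUS column is CrashLoopBackOff.
# Reads `oc get pods` style output on stdin and skips the header line.
# Assumes the default column order: NAME READY STATUS RESTARTS AGE.
count_crashlooping() {
  awk 'NR > 1 && $3 == "CrashLoopBackOff" { n++ } END { print n + 0 }'
}

# Usage against a live cluster (assumes the "test" project from step 1):
#   oc get pods -n test | count_crashlooping
```

With one crashlooping pod in the project, this should print the same value that the console is expected to show.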


Actual results:
Crashlooping Pods number in Cluster Console remains as "0" instead of "1"


Expected results:
Crashlooping Pods number in Cluster Console should be "1"

Comment 1 Samuel Padgett 2018-10-30 11:16:51 UTC
Fixed by https://github.com/openshift/console/pull/716

Comment 5 Yadan Pei 2018-11-05 06:30:00 UTC
1. create dummy pods

2. check status on cluster console, Pods page and Home -> Status page

Crashlooping Pods are NOT showing on Status page, recording in attachment

Comment 6 Yadan Pei 2018-11-05 06:31:04 UTC
apiVersion: v1
kind: Pod
metadata:
  name: dummy-pod
spec:
  containers:
    - name: dummy-pod
      image: ubuntu
  restartPolicy: Always

Comment 9 Yadan Pei 2018-11-05 06:44:06 UTC
Verified the bug on OpenShift v3.11.38

Comment 10 Samuel Padgett 2018-11-05 13:43:54 UTC
(In reply to Yadan Pei from comment #5)
> 1. create dummy pods
> 
> 2. check status on cluster console, Pods page and Home -> Status page
> 
> Crashlooping Pods are NOT showing on Status page, recording in attachment

We're querying Prometheus for pods with 5 container restarts within the last 5 minutes. It might take a few minutes to update (as you've found).
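For illustration only, a query of the kind described above could look like the sketch below. This is an assumption, not the expression the console actually uses: the helper name `crashloop_query` is hypothetical, and the metric `kube_pod_container_status_restarts_total` (from kube-state-metrics) and the `>= 5` over `5m` thresholds are chosen to match the description in this comment.

```shell
# Hypothetical helper: print a PromQL expression that counts pods in a
# namespace with at least `min_restarts` container restarts over `window`.
# ASSUMPTION: the metric name and expression shape are illustrative only;
# the query used by the console may differ.
crashloop_query() {
  namespace=$1
  min_restarts=${2:-5}
  window=${3:-5m}
  printf 'count(sum by (pod) (increase(kube_pod_container_status_restarts_total{namespace="%s"}[%s])) >= %s)\n' \
    "$namespace" "$window" "$min_restarts"
}

# Usage: print the expression for the "test" project.
#   crashloop_query test
```

Because the expression looks at restarts over a time window, a pod that has only just started crashlooping will not be counted until enough restarts have accumulated. Note also that a PromQL `count()` over no matching series returns an empty result rather than 0.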

Comment 11 Yadan Pei 2018-11-06 02:14:56 UTC
Thanks for the info Sam

Comment 14 errata-xmlrpc 2018-11-20 03:11:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3537