Bug 1925179

Summary: MG fix [continuation from bug 1893619]: Do not attempt creating helper pod if storagecluster/cephcluster already deleted
Product: [Red Hat Storage] Red Hat OpenShift Container Storage
Reporter: Neha Berry <nberry>
Component: must-gather
Assignee: RAJAT SINGH <rajasing>
Status: CLOSED ERRATA
QA Contact: Neha Berry <nberry>
Severity: medium
Priority: unspecified
Version: 4.7
CC: muagarwa, ocs-bugs, rajasing, sabose
Target Release: OCS 4.7.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version: 4.7.0-728.ci
Doc Type: No Doc Update
Type: Bug
Last Closed: 2021-05-19 09:19:00 UTC
Attachments: terminal output

Description Neha Berry 2021-02-04 14:33:57 UTC
Created attachment 1755077 [details]
terminal output

Description of problem (please be as detailed as possible and provide log
snippets):
==============

This bug continues the fixes requested in Bug 1893619#c9, specifically item (b):

b) No need to attempt creation of helper pod, as storagecluster/cephcluster is not present

Currently, even though the storagecluster is deleted, must-gather still attempts to create the helper pod, waits up to 300s for it to become ready, and then finally deletes it (see the terminal logs below).

Pod stuck in ContainerCreating since the rook-ceph-mon-endpoints configmap is already deleted
================
pod/must-gather-7564k-helper                        0/1     ContainerCreating   0          5m7s

pod describe
==================

$ oc describe  pod/must-gather-7564k-helper -n openshift-storage
Name:         must-gather-7564k-helper
Namespace:    openshift-storage
  Warning  FailedMount  55s (x11 over 7m7s)  kubelet            MountVolume.SetUp failed for volume "mon-endpoint-volume" : configmap "rook-ceph-mon-endpoints" not found


From terminal logs
======================
[must-gather-7564k] POD waiting for helper pod and debug pod for 0 seconds
[must-gather-7564k] POD waiting for the ip-10-0-132-174us-east-2computeinternal-debug pod to be in ready state
[must-gather-7564k] POD waiting for helper pod and debug pod for 3 seconds


[must-gather-7564k] POD collecting dump of noobaa-operator-77464f9777-7nf9f pod from openshift-storage
[must-gather-7564k] POD Skipping ceph collection as Storage Cluster is not present
[must-gather-7564k] POD pod "must-gather-7564k-helper" deleted




Discussion = https://chat.google.com/room/AAAAREGEba8/nh97ybwG23o

Version of all relevant components (if applicable):
=====================================================
OCS = 4.7.0-250.ci
OCP = 4.7.0-0.nightly-2021-02-04-031352

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
=========================================================================
No

Is there any workaround available to the best of your knowledge?
===================================================
Not sure

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
=================================================
3

Is this issue reproducible?
================================
Yes

Can this issue be reproduced from the UI?
=======================================
NA

If this is a regression, please provide more details to justify this:
=============================================================
NO

Steps to Reproduce:
=======================
Two scenarios can lead to this issue:

>> Scenario 1) Install the OCS operator and initiate must-gather

a) Operator Hub -> Install OCS operator
b) When the operator pods are up and the CSV is in Succeeded state, initiate must-gather:

$ oc adm must-gather --image=quay.io/rhceph-dev/ocs-must-gather:latest-4.7

Note: Do not install the storage cluster (see the check below).
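
An illustrative pre-check to confirm no storage cluster exists before running must-gather (the exact "No resources found" wording may vary by oc version):

$ oc get storagecluster -n openshift-storage
No resources found in openshift-storage namespace.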

>> Scenario 2) Trigger uninstall of OCS by deleting the storage cluster; initiate must-gather

a) Delete the storagecluster in a running OCS cluster (follow the Uninstall docs):

$ oc delete storagecluster --all -n openshift-storage --wait=true --timeout=5m

b) Once the storagecluster and the dependent cephcluster are deleted (a quick verification is sketched below), initiate must-gather:

$ oc adm must-gather --image=quay.io/rhceph-dev/ocs-must-gather:latest-4.7
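
An illustrative way to verify that both resources are gone before running must-gather:

$ oc get storagecluster,cephcluster -n openshift-storage

Both queries should return nothing once the uninstall has completed.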


Actual results:
=====================
The helper pod still tries to come up, but fails because the storagecluster is already deleted.

Expected results:
=====================
With the storagecluster deleted, must-gather should not attempt to bring up the helper pod. A minimal guard sketch follows.
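
A minimal sketch of the expected guard, assuming the collection script is bash and shells out to oc; the resource and namespace names are taken from this report, and the actual must-gather script may structure the check differently:

# Hypothetical pre-check, not the actual must-gather code: skip
# helper pod creation when neither a StorageCluster nor a CephCluster
# exists in the openshift-storage namespace.
if [ -z "$(oc get storagecluster -n openshift-storage --no-headers 2>/dev/null)" ] && \
   [ -z "$(oc get cephcluster -n openshift-storage --no-headers 2>/dev/null)" ]; then
    echo "Skipping ceph collection as Storage Cluster is not present"
else
    # ... create the helper pod and collect ceph command output as usual ...
    :
fi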


Additional info:
=======================

$ oc describe  pod/must-gather-7564k-helper -n openshift-storage
Name:         must-gather-7564k-helper
Namespace:    openshift-storage
Priority:     0
Node:         ip-10-0-132-174.us-east-2.compute.internal/10.0.132.174
Start Time:   Thu, 04 Feb 2021 17:06:05 +0530
Labels:       must-gather-helper-pod=
Annotations:  openshift.io/scc: rook-ceph
Status:       Pending
IP:           
IPs:          <none>
Containers:
  must-gather-helper:
    Container ID:  
    Image:         quay.io/rhceph-dev/rook-ceph@sha256:9ad045ea253aa7e0938307a182196759bd3f3dc29edd2e8cee8073ba3f7c2040
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Command:
      /tini
    Args:
      -g
      --
      /usr/local/bin/toolbox.sh
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:
      ROOK_CEPH_USERNAME:  <set to the key 'ceph-username' in secret 'rook-ceph-mon'>  Optional: false
      ROOK_CEPH_SECRET:    <set to the key 'ceph-secret' in secret 'rook-ceph-mon'>    Optional: false
    Mounts:
      /dev from dev (rw)
      /etc/rook from mon-endpoint-volume (rw)
      /lib/modules from libmodules (rw)
      /sys/bus from sysbus (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-5nqfv (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  dev:
    Type:          HostPath (bare host directory volume)
    Path:          /dev
    HostPathType:  
  sysbus:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/bus
    HostPathType:  
  libmodules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:  
  mon-endpoint-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      rook-ceph-mon-endpoints
    Optional:  false
  default-token-5nqfv:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-5nqfv
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason       Age                  From               Message
  ----     ------       ----                 ----               -------
  Normal   Scheduled    7m7s                 default-scheduler  Successfully assigned openshift-storage/must-gather-7564k-helper to ip-10-0-132-174.us-east-2.compute.internal
  Warning  FailedMount  5m4s                 kubelet            Unable to attach or mount volumes: unmounted volumes=[mon-endpoint-volume], unattached volumes=[mon-endpoint-volume default-token-5nqfv dev sysbus libmodules]: timed out waiting for the condition
  Warning  FailedMount  58s (x2 over 3m1s)   kubelet            Unable to attach or mount volumes: unmounted volumes=[mon-endpoint-volume], unattached volumes=[dev sysbus libmodules mon-endpoint-volume default-token-5nqfv]: timed out waiting for the condition
  Warning  FailedMount  55s (x11 over 7m7s)  kubelet            MountVolume.SetUp failed for volume "mon-endpoint-volume" : configmap "rook-ceph-mon-endpoints" not found

Comment 7 errata-xmlrpc 2021-05-19 09:19:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2041