Bug 1925179 - MG fix [continuation from bug 1893619]: Do not attempt creating helper pod if storagecluster/cephcluster already deleted
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: must-gather
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: OCS 4.7.0
Assignee: RAJAT SINGH
QA Contact: Neha Berry
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-02-04 14:33 UTC by Neha Berry
Modified: 2021-05-19 09:19 UTC (History)
CC List: 4 users

Fixed In Version: 4.7.0-728.ci
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-19 09:19:00 UTC
Embargoed:


Attachments (Terms of Use)
terminal output (152.94 KB, text/plain)
2021-02-04 14:33 UTC, Neha Berry


Links
System ID Private Priority Status Summary Last Updated
Github openshift ocs-operator pull 1044 0 None open must-gather:do not create helper pod if storagecluster is deleted 2021-02-14 10:46:06 UTC
Github openshift ocs-operator pull 1077 0 None open Bug 1925179: [release-4.7] must-gather:do not create helper pod if storagecluster is deleted 2021-02-15 14:01:28 UTC
Red Hat Product Errata RHSA-2021:2041 0 None None None 2021-05-19 09:19:45 UTC

Description Neha Berry 2021-02-04 14:33:57 UTC
Created attachment 1755077 [details]
terminal output

Description of problem (please be detailed as possible and provide log
snippets):
==============

This bug is a continuation of the fixes requested in Bug 1893619#c9.

b) No need to attempt creation of helper pod, as storagecluster/cephcluster is not present

Currently, even though the storagecluster is deleted, must-gather still attempts to create a helper pod, waits for up to 300s, and then finally gives up and deletes it.

POD status stuck since rook-ceph-mon-endpoints already deleted
================
pod/must-gather-7564k-helper                        0/1     ContainerCreating   0          5m7s

pod describe
==================

$ oc describe  pod/must-gather-7564k-helper -n openshift-storage
Name:         must-gather-7564k-helper
Namespace:    openshift-storage
  Warning  FailedMount  55s (x11 over 7m7s)  kubelet            MountVolume.SetUp failed for volume "mon-endpoint-volume" : configmap "rook-ceph-mon-endpoints" not found


From terminal logs
======================
[must-gather-7564k] POD waiting for helper pod and debug pod for 0 seconds
[must-gather-7564k] POD waiting for the ip-10-0-132-174us-east-2computeinternal-debug pod to be in ready state
[must-gather-7564k] POD waiting for helper pod and debug pod for 3 seconds


[must-gather-7564k] POD collecting dump of noobaa-operator-77464f9777-7nf9f pod from openshift-storage
[must-gather-7564k] POD Skipping ceph collection as Storage Cluster is not present
[must-gather-7564k] POD pod "must-gather-7564k-helper" deleted
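The repeating "waiting for helper pod and debug pod for N seconds" lines come from a polling loop in the gather script. A minimal sketch of such a loop, assuming the 3-second interval visible in the log and the 300s cap mentioned above; `pods_ready` and the exact messages are illustrative, not the actual ocs-operator script:

```shell
# Sketch of a poll-until-ready loop of the kind producing the log lines
# above (3s interval and 300s cap inferred from the report; pods_ready
# is a placeholder for the real readiness check).
wait_for_pods() {
    local timeout="$1" interval=3 elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if pods_ready; then
            return 0
        fi
        echo "POD waiting for helper pod and debug pod for $elapsed seconds"
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 1  # timed out; caller falls through to cleanup
}
```

On timeout the caller would proceed to cleanup and delete the helper pod, which matches the final `pod "must-gather-7564k-helper" deleted` log line.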




Discussion = https://chat.google.com/room/AAAAREGEba8/nh97ybwG23o

Version of all relevant components (if applicable):
=====================================================
OCS = 4.7.0-250.ci
OCP = 4.7.0-0.nightly-2021-02-04-031352

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
=========================================================================
No

Is there any workaround available to the best of your knowledge?
===================================================
Not sure

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
=================================================
3

Is this issue reproducible?
================================
Yes

Can this issue be reproduced from the UI?
=======================================
NA

If this is a regression, please provide more details to justify this:
=============================================================
NO

Steps to Reproduce:
=======================
2 scenarios can lead to this issue

>> Scenario 1) Installed OCS operator and initiated must-gather
a) Operator Hub -> Install OCS operator
b) When the operator pods are up and the CSV is in Succeeded state, initiate must-gather

oc adm must-gather --image=quay.io/rhceph-dev/ocs-must-gather:latest-4.7

Note: Do not install storage cluster

>> Scenario 2) Triggered Uninstall of OCS by deleting storage cluster ; initiate must-gather 

a) Delete storagecluster in a running OCS cluster (follow Uninstall docs)

$ oc delete storagecluster --all -n openshift-storage --wait=true --timeout=5m

b) Once storagecluster and dependent ceph cluster are deleted, initiate must-gather

$ oc adm must-gather --image=quay.io/rhceph-dev/ocs-must-gather:latest-4.7


Actual results:
=====================
The helper pod still tries to come up, but it fails because the storagecluster is already deleted.

Expected results:
=====================
With the storagecluster deleted, must-gather should not attempt to bring up the helper pod.
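A hypothetical sketch of the guard the fix adds (function names and messages are illustrative, not the actual ocs-operator gather script): the ceph collection step only attempts the helper pod when a StorageCluster still exists in the namespace.

```shell
# Illustrative guard: skip helper pod creation when no StorageCluster
# exists (names and messages are assumptions, not the shipped script).
ns="openshift-storage"

storagecluster_present() {
    # --no-headers prints nothing when no StorageCluster exists
    [ -n "$(oc get storagecluster -n "$ns" --no-headers 2>/dev/null)" ]
}

collect_ceph() {
    if storagecluster_present; then
        echo "creating helper pod"
        # helper pod creation would go here
    else
        echo "Skipping ceph collection as Storage Cluster is not present"
    fi
}
```

Note that the terminal log above shows the "Skipping ceph collection" message already exists; the fix is to perform this check before, rather than after, the 300s helper-pod wait.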


Additional info:
=======================

$ oc describe  pod/must-gather-7564k-helper -n openshift-storage
Name:         must-gather-7564k-helper
Namespace:    openshift-storage
Priority:     0
Node:         ip-10-0-132-174.us-east-2.compute.internal/10.0.132.174
Start Time:   Thu, 04 Feb 2021 17:06:05 +0530
Labels:       must-gather-helper-pod=
Annotations:  openshift.io/scc: rook-ceph
Status:       Pending
IP:           
IPs:          <none>
Containers:
  must-gather-helper:
    Container ID:  
    Image:         quay.io/rhceph-dev/rook-ceph@sha256:9ad045ea253aa7e0938307a182196759bd3f3dc29edd2e8cee8073ba3f7c2040
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Command:
      /tini
    Args:
      -g
      --
      /usr/local/bin/toolbox.sh
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:
      ROOK_CEPH_USERNAME:  <set to the key 'ceph-username' in secret 'rook-ceph-mon'>  Optional: false
      ROOK_CEPH_SECRET:    <set to the key 'ceph-secret' in secret 'rook-ceph-mon'>    Optional: false
    Mounts:
      /dev from dev (rw)
      /etc/rook from mon-endpoint-volume (rw)
      /lib/modules from libmodules (rw)
      /sys/bus from sysbus (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-5nqfv (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  dev:
    Type:          HostPath (bare host directory volume)
    Path:          /dev
    HostPathType:  
  sysbus:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/bus
    HostPathType:  
  libmodules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:  
  mon-endpoint-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      rook-ceph-mon-endpoints
    Optional:  false
  default-token-5nqfv:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-5nqfv
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason       Age                  From               Message
  ----     ------       ----                 ----               -------
  Normal   Scheduled    7m7s                 default-scheduler  Successfully assigned openshift-storage/must-gather-7564k-helper to ip-10-0-132-174.us-east-2.compute.internal
  Warning  FailedMount  5m4s                 kubelet            Unable to attach or mount volumes: unmounted volumes=[mon-endpoint-volume], unattached volumes=[mon-endpoint-volume default-token-5nqfv dev sysbus libmodules]: timed out waiting for the condition
  Warning  FailedMount  58s (x2 over 3m1s)   kubelet            Unable to attach or mount volumes: unmounted volumes=[mon-endpoint-volume], unattached volumes=[dev sysbus libmodules mon-endpoint-volume default-token-5nqfv]: timed out waiting for the condition
  Warning  FailedMount  55s (x11 over 7m7s)  kubelet            MountVolume.SetUp failed for volume "mon-endpoint-volume" : configmap "rook-ceph-mon-endpoints" not found
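The FailedMount events trace back to the helper pod's mon-endpoint-volume, which is backed by the rook-ceph-mon-endpoints ConfigMap; that ConfigMap is removed along with the CephCluster, so the pod can never mount it. A small precondition check of the kind the fix could use (the wrapper function name is illustrative):

```shell
# The helper pod can only start if this ConfigMap still exists; it is
# deleted together with the CephCluster, hence the FailedMount events.
mon_endpoints_present() {
    oc get configmap rook-ceph-mon-endpoints -n openshift-storage \
        >/dev/null 2>&1
}

if mon_endpoints_present; then
    echo "mon endpoints available"
else
    echo "mon endpoints missing; helper pod would fail to mount"
fi
```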

Comment 7 errata-xmlrpc 2021-05-19 09:19:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2041

