Created attachment 1725682 [details]
terminal-log

Description of problem (please be detailed as possible and provide log snippets):
---------------------------------------------------------------------------
Even when the storage cluster is already deleted, must-gather tries to collect ceph outputs, in both internal and external mode (it is not supposed to attempt this at all after the fix for Bug 1845976).

Related BZ for helper pod retry & attempt at collecting ceph outputs - Bug 1893611

Special mention
======================
As per the Bug 1845976 fix, in external mode must-gather skips the attempt to collect ceph outputs via the toolbox and does not bring up a must-gather-helper pod for that purpose. But in corner cases, when the storage cluster (and hence the cephcluster) is already deleted (e.g. when uninstall has been initiated and is half-way through) and one tries to collect must-gather, the helper pod tries to come up (but fails) in order to collect ceph outputs, which is not possible for external mode.

>> Also, when the storagecluster is already deleted (even for internal mode), the following errors are seen in the must-gather collection output - https://bugzilla.redhat.com/show_bug.cgi?id=1893611#c3

Version of all relevant components (if applicable):
---------------------------------------
Tested in OCS 4.6; ocs-operator.v4.6.0-144.ci and ocs-operator.v4.6.0-147.ci

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
--------------------------------------------------
No, but it throws some errors while collecting must-gather.

Is there any workaround available to the best of your knowledge?
-------------------------------------------------------
Not sure

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
-----------------------------------------------
3

Can this issue be reproduced?
-------------------------------
Yes

Can this issue be reproduced from the UI?
---------------------------------------
NA

If this is a regression, please provide more details to justify this:
-------------------------------------------------
No

Steps to Reproduce:
-----------------------
1. Delete the storagecluster, especially in an external mode cluster.
2. Start must-gather:
   oc adm must-gather --image=quay.io/rhceph-dev/ocs-must-gather:latest-4.6 |tee terminal-must-gather
3. Check that, in the absence of the storagecluster, it tries to bring up a helper pod and collect ceph outputs (not possible).
4. Same behavior is seen in an internal mode cluster too (logs: https://bugzilla.redhat.com/show_bug.cgi?id=1893611#c2)

Actual results:
-------------------
With the storage cluster deleted, must-gather has no way to know whether it was an external cluster, hence it tries to collect ceph outputs via the helper toolbox pod (which obviously fails to come up).

Expected results:
-----------------------
Internal mode: if the storage cluster is deleted, no attempt should be made to collect ceph outputs, as the cephcluster is already deleted.
External mode: the behavior should be the same as above + external mode should skip creating the helper pod (as it does when the storagecluster exists and the "external" status of the cluster is known).

Additional info:
-------------------------
Before deletion of storagecluster

Wed Oct 28 13:21:08 UTC 2020
--------------
========CSV ======
NAME                         DISPLAY                       VERSION        REPLACES   PHASE
ocs-operator.v4.6.0-144.ci   OpenShift Container Storage   4.6.0-144.ci              Succeeded
--------------
=======PODS ======
NAME                                            READY   STATUS        RESTARTS   AGE   IP             NODE        NOMINATED NODE   READINESS GATES
csi-cephfsplugin-nbqmv                          3/3     Running       0          21m   10.1.160.161   compute-1   <none>           <none>
csi-cephfsplugin-provisioner-56455449bd-4ck6g   6/6     Running       0          21m   10.129.2.66    compute-2   <none>           <none>
csi-cephfsplugin-provisioner-56455449bd-6kbq4   6/6     Running       0          21m   10.131.0.185   compute-1   <none>           <none>
csi-cephfsplugin-s4v9w                          3/3     Running       0          21m   10.1.160.165   compute-0   <none>           <none>
csi-cephfsplugin-zmm4l                          3/3     Running       0          21m   10.1.160.180   compute-2   <none>           <none>
csi-rbdplugin-8dxmp                             3/3     Running       0          21m   10.1.160.165   compute-0   <none>           <none>
csi-rbdplugin-8jw6k                             3/3     Running       0          21m   10.1.160.161   compute-1   <none>           <none>
csi-rbdplugin-mtdwc                             3/3     Running       0          21m   10.1.160.180   compute-2   <none>           <none>
csi-rbdplugin-provisioner-586fc6cfc-6bzxb       6/6     Running       0          21m   10.131.0.184   compute-1   <none>           <none>
csi-rbdplugin-provisioner-586fc6cfc-8b2xl       6/6     Running       0          21m   10.128.2.46    compute-0   <none>           <none>
noobaa-core-0                                   1/1     Terminating   0          21m   10.128.2.47    compute-0   <none>           <none>
noobaa-endpoint-6799cdd795-stzwq                1/1     Terminating   0          20m   10.128.2.48    compute-0   <none>           <none>
noobaa-operator-f7789cf94-wp74l                 1/1     Running       0          23h   10.131.1.213   compute-1   <none>           <none>
ocs-metrics-exporter-576f474c87-9r7bv           1/1     Running       0          23h   10.129.3.104   compute-2   <none>           <none>
ocs-operator-686fd84dd7-6l45s                   1/1     Running       0          23h   10.129.3.102   compute-2   <none>           <none>
rook-ceph-operator-7558fcf89c-wmjr4             1/1     Running       0          23h   10.129.3.103   compute-2   <none>           <none>
--------------
======= PVC ==========
--------------
======= storagecluster ==========
NAME                          AGE   PHASE      EXTERNAL   CREATED AT             VERSION
ocs-external-storagecluster   21m   Deleting   true       2020-10-28T12:59:46Z   4.6.0
--------------
======= cephcluster ==========
NAME                                      DATADIRHOSTPATH   MONCOUNT   AGE   PHASE      MESSAGE               HEALTH
ocs-external-storagecluster-cephcluster                                21m   Deleting   Cluster is deleting   HEALTH_OK
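The expected behavior above essentially boils down to two guards before any ceph output collection. Below is a minimal sketch of that logic, assuming the openshift-storage namespace and the StorageCluster field spec.externalStorage.enable that marks external mode; it is illustrative only and not the actual ocs-must-gather gather script.

#!/usr/bin/env bash
ns="openshift-storage"

# Guard 1: if no CephCluster exists (e.g. uninstall in progress), there is
# nothing to collect via a toolbox/helper pod.
if [ -z "$(oc get cephcluster -n "${ns}" --no-headers 2>/dev/null)" ]; then
    echo "No CephCluster found in ${ns}; skipping ceph command collection"
    exit 0
fi

# Guard 2: if the cluster is external, skip the helper pod entirely.
external=$(oc get storagecluster -n "${ns}" \
    -o jsonpath='{.items[0].spec.externalStorage.enable}' 2>/dev/null)
if [ "${external}" = "true" ]; then
    echo "External mode detected; skipping must-gather-helper pod and ceph outputs"
    exit 0
fi

# Internal mode with a live CephCluster: proceed with helper pod / ceph output collection.
echo "Proceeding with ceph command collection via helper pod"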
@pulkit could you also check why we get the errors listed here - https://bugzilla.redhat.com/show_bug.cgi?id=1893611#c3

esp. about cephobjectstoreUsers when the storagecluster does not exist (not dependent on the helper pod)

[must-gather-gn4xm] POD collecting dump cephobjectstoreusers
[must-gather-gn4xm] POD error: error executing jsonpath "{range .items[*]}{@.metadata.name}{'\\n'}{end}": Error executing template: not in range, nothing to end. Printing more information for debugging the template:
[must-gather-gn4xm] POD template was:
[must-gather-gn4xm] POD 		{range .items[*]}{@.metadata.name}{'\n'}{end}
[must-gather-gn4xm] POD object given to jsonpath engine was:
[must-gather-gn4xm] POD 		map[string]interface {}{"apiVersion":"v1", "items":[]interface {}{}, "kind":"List", "metadata":map[string]interface {}{"resourceVersion":"", "selfLink":""}}
[must-gather-gn4xm] POD
[must-gather-gn4xm] POD
(In reply to Neha Berry from comment #3)
> @pulkit could you also check why we get errors listed here -
> https://bugzilla.redhat.com/show_bug.cgi?id=1893611#c3
>
> esp. about cephobjectstoreUsers when the storagecluster does not exist (not
> dependent on helper pod)
>
>
>
> [must-gather-gn4xm] POD collecting dump cephobjectstoreusers
> [must-gather-gn4xm] POD error: error executing jsonpath "{range
> .items[*]}{@.metadata.name}{'\\n'}{end}": Error executing template: not in
> range, nothing to end. Printing more information for debugging the template:
> [must-gather-gn4xm] POD template was:
> [must-gather-gn4xm] POD 		{range
> .items[*]}{@.metadata.name}{'\n'}{end}
> [must-gather-gn4xm] POD object given to jsonpath engine was:
> [must-gather-gn4xm] POD 		map[string]interface
> {}{"apiVersion":"v1", "items":[]interface {}{}, "kind":"List",
> "metadata":map[string]interface {}{"resourceVersion":"", "selfLink":""}}
> [must-gather-gn4xm] POD
> [must-gather-gn4xm] POD

Bug raised - https://bugzilla.redhat.com/show_bug.cgi?id=1893619
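For reference, "not in range, nothing to end" is what oc/kubectl jsonpath prints when a {range .items[*]}...{end} template is run against an empty List, which is exactly the case once the storagecluster and its cephobjectstoreusers are gone. A minimal, hypothetical guard a gather script could use to avoid that error (illustrative only; not the actual must-gather code):

#!/usr/bin/env bash
ns="openshift-storage"

# List cephobjectstoreusers by resource/name; empty output means nothing to dump.
users=$(oc get cephobjectstoreusers -n "${ns}" -o name 2>/dev/null)

if [ -z "${users}" ]; then
    echo "No cephobjectstoreusers found in ${ns}; skipping dump"
else
    for u in ${users}; do
        # u looks like cephobjectstoreuser.ceph.rook.io/<name>
        oc get "${u}" -n "${ns}" -o yaml > "$(basename "${u}").yaml"
    done
fi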
PR Link : https://github.com/openshift/ocs-operator/pull/968
@nberry I have added better logging commands, and now MG will not try to collect logs when an external cluster is not present. It would be great if you could test this PR from your end as well. Thanks
Hi Rajat,

Is it OK if we test this fix once the bug is ON_QA? If the PR looks good, feel free to move the bug to MODIFIED and ON_QA once it is part of a build.

Adding qa_ack.
Sure!
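Once the fix is in a build, a rough verification flow could look like the following (same image tag as in the steps to reproduce; the grep is just an illustration of what to look for in the terminal log):

oc delete storagecluster --all -n openshift-storage --wait=false
oc adm must-gather --image=quay.io/rhceph-dev/ocs-must-gather:latest-4.6 | tee terminal-must-gather
# Expect: no must-gather-helper pod attempts and no ceph command collection errors in the output.
grep -i "helper" terminal-must-gather || echo "no helper pod activity logged"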
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2041