Created attachment 1760405 [details]
Terminal log

Description of problem (please be detailed as possible and provide log snippets):
=====================================================================
This bug is raised to report the issues discussed in chat thread [1]. @pulkit, please add any further enhancements you deem fit for this bug (as discussed in the chats).

Issues
---------

1. If the MG is collected when the StorageCluster is not yet created or is already deleted (node labels also removed), debug pod and helper pod creation is skipped. Hence these processes do not run in the background and no PIDs are generated. But we still see the following incomplete log message, where the PIDs are of course missing:

[must-gather-zr45d] POD not creating helper pod since storagecluster is not present
>>[must-gather-zr45d] POD waiting for  to terminate

Since no instance name/PID exists, we should rather skip printing this line altogether.

In a normal OCS cluster, it looks like this:

[must-gather-vbvzz] POD pod/must-gather-vbvzz-helper labeled
[must-gather-vbvzz] POD waiting for 103 104 106 107 to terminate

2. If no StorageCluster is created, do we really need the attempts at collecting the following NooBaa-related resources? (If yes, ignore this comment.)

collecting dump of noobaa
Wrote inspect data to must-gather/noobaa.
collecting dump of backingstore
Wrote inspect data to must-gather/noobaa.
collecting dump of bucketclass
Wrote inspect data to must-gather/noobaa.

[1] - https://chat.google.com/room/AAAAREGEba8/B5FNcAjENMY

Version of all relevant components (if applicable):
======================================================
OCS 4.7, all versions

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
================================================================
No

Is there any workaround available to the best of your knowledge?
=============================================
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
===================================================
1

Is this issue reproducible?
===============================
Always

Can this issue be reproduced from the UI?
======================================
NA

If this is a regression, please provide more details to justify this:
==============================================================
Not sure. @pulkit can confirm.

Steps to Reproduce:
======================
1. Install the OCS operator. Do not install a StorageCluster.
2. Initiate a must-gather collection:
   oc adm must-gather --image=quay.io/rhceph-dev/ocs-must-gather:latest-4.7 | tee terminal-must-gather

A similar observation was seen in another reproducer:

1. Install the OCS operator and create a StorageCluster.
2. Delete the StorageCluster and follow the uninstall steps (remove OCS completely, along with the OCS node label).

Expected results:
======================
If no PIDs are created, the message should not be printed.

Additional info:
=======================
--------------
======== CSV ======
NAME                         DISPLAY                       VERSION        REPLACES   PHASE
ocs-operator.v4.7.0-280.ci   OpenShift Container Storage   4.7.0-280.ci              Succeeded
--------------
======= PODS ======
NAME                                  READY   STATUS    RESTARTS   AGE   IP             NODE        NOMINATED NODE   READINESS GATES
noobaa-operator-5f6c776566-2tdfs      1/1     Running   0          47s   10.131.0.54    compute-2   <none>           <none>
ocs-metrics-exporter-79db8f64-vr97x   1/1     Running   0          47s   10.131.2.244   compute-1   <none>           <none>
ocs-operator-6dbf6f8c97-75x6k         1/1     Running   0          47s   10.128.4.214   compute-4   <none>           <none>
rook-ceph-operator-79dfd4d7d6-vlznh   1/1     Running   0          47s   10.130.2.146   compute-5   <none>           <none>
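The fix asked for under "Expected results" amounts to guarding the wait message on a non-empty PID list. A minimal bash sketch of that guard, assuming a hypothetical helper named `wait_for_pids` (the real must-gather script and its variable names may differ):

```shell
# Hypothetical sketch: only print "waiting for <pids> to terminate" when
# background collector processes were actually spawned.
wait_for_pids() {
    local pids="$*"
    if [ -z "${pids// /}" ]; then
        # No debug/helper pods were started (e.g. no StorageCluster), so
        # there is nothing to wait for and no message to print.
        return 0
    fi
    echo "waiting for ${pids} to terminate"
    wait ${pids}
}

# Demo: one background job, so the message is printed with its PID.
sleep 0.1 & wait_for_pids "$!"
```

Called with an empty list, the helper returns silently instead of emitting the truncated "waiting for  to terminate" line.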
(In reply to Neha Berry from comment #0)
> 1. If the MG is collected when the StorageCluster is not yet created or is
> already deleted (node labels also removed), debug pod and helper pod
> creation is skipped. Hence these processes do not run in the background and
> no PIDs are generated. But we still see the following incomplete log
> message, where the PIDs are of course missing:
>
> [must-gather-zr45d] POD not creating helper pod since storagecluster is not
> present
> >>[must-gather-zr45d] POD waiting for  to terminate
>
> Since no instance name/PID exists, we should rather skip printing this
> altogether.

OK, so the "waiting ... to terminate" message is coming from the debug pods, which get created irrespective of whether a StorageCluster is present or not. We can skip creating the debug pods if the StorageCluster is not present.

> 2. If no StorageCluster is created, do we really need the attempts at
> collecting the following NooBaa-related resources? (If yes, ignore this
> comment.)

If I understand it correctly, we still want to collect namespace resources and NooBaa resources irrespective of the Ceph collection. Correct me if I am wrong here @

> collecting dump of noobaa
> Wrote inspect data to must-gather/noobaa.
> collecting dump of backingstore
> Wrote inspect data to must-gather/noobaa.
> collecting dump of bucketclass
> Wrote inspect data to must-gather/noobaa.
> [...]

pkundra
This is the PR: https://github.com/openshift/ocs-operator/pull/1109
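Presumably the fix gates debug pod creation on StorageCluster presence, as discussed above. A hedged sketch of that kind of guard; `storagecluster_exists` and `start_debug_pods` are illustrative names, not the actual code from the PR:

```shell
# Hypothetical sketch of skipping debug pod creation when no StorageCluster
# exists. Function names are invented for illustration.
storagecluster_exists() {
    # Non-empty output from `oc get` means at least one StorageCluster CR
    # is present in the namespace.
    [ -n "$(oc get storagecluster -n openshift-storage --no-headers 2>/dev/null)" ]
}

start_debug_pods() {
    # Placeholder for the real logic that spawns `oc debug node/...` pods
    # in the background and records their PIDs.
    echo "starting debug pods"
}

if storagecluster_exists; then
    start_debug_pods
else
    echo "not creating debug pods since storagecluster is not present"
fi
```

With this shape, the later "waiting for ... to terminate" step never sees an empty PID list in the no-StorageCluster case, because the pods were never spawned.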
PR for skipping the NooBaa collection when the StorageCluster is not present: https://github.com/openshift/ocs-operator/pull/1132
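The NooBaa dumps can presumably be gated the same way. A minimal sketch under the same assumptions; `collect_noobaa_dumps` is an invented name, and the real gather script's inspect invocation may differ:

```shell
# Hypothetical sketch of skipping the noobaa/backingstore/bucketclass dumps
# when no StorageCluster exists. Function name and commands are illustrative.
collect_noobaa_dumps() {
    if [ -z "$(oc get storagecluster -n openshift-storage --no-headers 2>/dev/null)" ]; then
        echo "skipping noobaa collection since storagecluster is not present"
        return 0
    fi
    for resource in noobaa backingstore bucketclass; do
        echo "collecting dump of ${resource}"
        # Assumed destination path, mirroring the "Wrote inspect data to
        # must-gather/noobaa." lines in the report.
        oc adm inspect -n openshift-storage "${resource}" --dest-dir=must-gather/noobaa
    done
}

collect_noobaa_dumps
```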
Created attachment 1790126 [details]
terminal output

@rajasing, sorry I missed my needinfo.

Tested with ocs-4.8.0-416.ci, and it seems that both issues reported in comment #0 are still not fixed.

1. If the MG is collected when the StorageCluster is not yet created or is already deleted (node labels also removed), debug pod and helper pod creation is skipped. Hence these processes do not run in the background and no PIDs are generated. But we still see the following incomplete log message, where the PIDs are of course missing:

[must-gather-zr45d] POD not creating helper pod since storagecluster is not present
>>[must-gather-zr45d] POD waiting for  to terminate

Since no instance name/PID exists, we should rather skip printing this altogether.

2. If no StorageCluster is created, do we really need the attempts at collecting the following NooBaa-related resources?

collecting dump of noobaa
Wrote inspect data to must-gather/noobaa.
collecting dump of backingstore
Wrote inspect data to must-gather/noobaa.
collecting dump of bucketclass
Wrote inspect data to must-gather/noobaa.

The NooBaa operator pod is the only available pod until we create a StorageCluster, so only this makes sense:

[must-gather-5vbjl] POD 2021-06-11T07:46:23.820021735Z collecting dump of noobaa-operator-866c7c65d4-st76g pod from openshift-storage

If NooBaa is not yet installed, I am not sure whether collecting the backingstore, bucketclass, NooBaa dump, NooBaa status, and OBC list makes sense. Even RGW is not yet up, so we won't even have RGW-based OBCs and bucketclasses.
Fri Jun 11 08:09:34 AM UTC 2021
--------------
======== CSV ======
NAME                         DISPLAY                       VERSION        REPLACES   PHASE
ocs-operator.v4.8.0-416.ci   OpenShift Container Storage   4.8.0-416.ci              Succeeded
--------------
======= PODS ======
NAME                                    READY   STATUS    RESTARTS   AGE   IP             NODE        NOMINATED NODE   READINESS GATES
noobaa-operator-866c7c65d4-st76g        1/1     Running   0          31m   10.131.1.139   compute-0   <none>           <none>
ocs-metrics-exporter-6dffc4d6bb-pwzd9   1/1     Running   0          31m   10.129.2.16    compute-2   <none>           <none>
ocs-operator-768678d7ff-86w2h           1/1     Running   0          31m   10.128.2.14    compute-1   <none>           <none>
rook-ceph-operator-7c655dfbdb-6tthb     1/1     Running   0          31m   10.129.2.15    compute-2   <none>           <none>
--------------
======= PVC ==========
No resources found in openshift-storage namespace.
--------------
======= storagecluster ==========
No resources found in openshift-storage namespace.
--------------
======= cephcluster ==========
No resources found in openshift-storage namespace.
======= backingstore ==========
No resources found in openshift-storage namespace.
======= PV ====
No resources found
======= bucketclass ==========
No resources found in openshift-storage namespace.
======= obc ==========
No resources found
Discussed offline; this is not a blocker for 4.8.
Rewant, please sync up with Neha once and check whether anything is still pending here.
As discussed, a new bug has already been created for issue #2: https://bugzilla.redhat.com/show_bug.cgi?id=2015408
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Data Foundation 4.9.0 enhancement, security, and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:5086