Environment info
NooBaa Version: 5.9.0-20210722
ODF version: 4.9.0-214.ci
Platform: OpenShift 4.9
Actual behavior
Files are of 0 size and have no content inside
Expected behavior
Files shouldn't be of 0 size
Steps to reproduce
Execute oc adm must-gather --image=quay.io/rhceph-dev/ocs-must-gather:latest-4.9
Go to /root/must-gather.local.458190093719636697/quay-io-rhceph-dev-ocs-must-gather-sha256-2cfd1466b584aba7c3d94a6d2397267443416fe7db8a7fc132e4137ed640ca8f/noobaa/raw_output
More information - Screenshots / Logs / Other output
[root.cp.fyre.ibm.com raw_output]# ll
total 23524
-rw-r--r-- 1 root root 2957 Nov 23 21:49 BackingStoreList_crs.yaml
-rw-r--r-- 1 root root 2592 Nov 23 21:49 BucketClassList_crs.yaml
-rw-r--r-- 1 root root 0 Nov 23 21:49 db-noobaa-db-pg-0-pvc-describe.txt
-rw-r--r-- 1 root root 2663 Nov 23 21:49 NamespaceStoreList_crs.yaml
-rw-r--r-- 1 root root 0 Nov 23 21:49 noobaa-core-0-core.log
-rw-r--r-- 1 root root 0 Nov 23 21:49 noobaa-core-0-core-previous.log
-rw-r--r-- 1 root root 0 Nov 23 21:49 noobaa-core-0-pod-describe.txt
-rw-r--r-- 1 root root 1543 Nov 23 21:49 noobaa-db-pg-0-db.log
-rw-r--r-- 1 root root 1543 Nov 23 21:49 noobaa-db-pg-0-db-previous.log
-rw-r--r-- 1 root root 8898 Nov 23 21:49 noobaa-db-pg-0-initialize-database.log
-rw-r--r-- 1 root root 221 Nov 23 21:49 noobaa-db-pg-0-init.log
-rw-r--r-- 1 root root 0 Nov 23 21:49 noobaa-db-pg-0-pod-describe.txt
-rw-r--r-- 1 root root 12576413 Nov 23 21:49 noobaa-default-backing-store-noobaa-pod-76ea4a0d-noobaa-agent.log
-rw-r--r-- 1 root root 0 Nov 23 21:49 noobaa-default-backing-store-noobaa-pod-76ea4a0d-pod-describe.txt
-rw-r--r-- 1 root root 0 Nov 23 21:49 noobaa-default-backing-store-noobaa-pvc-76ea4a0d-pvc-describe.txt
-rw-r--r-- 1 root root 742051 Nov 23 21:49 noobaa_diagnostics_1637732969.tar.gz
-rw-r--r-- 1 root root 25472 Nov 23 21:49 noobaa-endpoint-5554c7f8cb-6r9lh-endpoint.log
-rw-r--r-- 1 root root 0 Nov 23 21:49 noobaa-endpoint-5554c7f8cb-6r9lh-pod-describe.txt
-rw-r--r-- 1 root root 0 Nov 23 21:49 noobaa-endpoint-scc-describe.txt
-rw-r--r-- 1 root root 6349 Nov 23 21:49 NooBaaList_crs.yaml
-rw-r--r-- 1 root root 10680815 Nov 23 21:49 noobaa-operator-67c57b5464-68mlz-noobaa-operator.log
-rw-r--r-- 1 root root 0 Nov 23 21:49 noobaa-operator-67c57b5464-68mlz-pod-describe.txt
-rw-r--r-- 1 root root 0 Nov 23 21:49 noobaa-scc-describe.txt
-rw-r--r-- 1 root root 15 Nov 23 21:49 obc_list
-rw-r--r-- 1 root root 1897 Nov 23 21:49 status
Opened on OCP instead of ODF, changing components
Can you attach must-gather logs for the cluster and on which env are you running the cluster?
Created attachment 1844112. Attaching the logs.
Environment info is mentioned in the ticket at the time of filing it. Let me know if more details are required
Moving to 4.10 and cloning to 4.9.z
Nothing can be done in must-gather, as this issue is on the noobaa side. The files with size 0 were added in recent releases, i.e. noobaa 5.9. We most likely need to open a bug or issue against noobaa. In that case I would like to close this bug, Ketan. Let us know if that's fine with you.
I'm a bit confused here. Some of the files are pod logs, which you probably get with 'oc logs <POD>'. Does that command return nothing, or does it return something while the file is written empty? How is that noobaa? Other files are describe output for certain entities, for example db-noobaa-db-pg-0-pvc-describe.txt, which is an oc describe command on a PVC. How is that a noobaa problem?
@ypadia invoking the same commands outside of the MG run prints the expected output.
@nbecker and @lmaunda true. I was checking it with the previous version of noobaa. Working on it with noobaa5.9 again and will update. Sorry for the confusion caused.
In the must-gather automation test, we verify that the file size is bigger than zero. https://github.com/red-hat-storage/ocs-ci/blob/master/ocs_ci/ocs/must_gather/must_gather.py#L80 What platform did you work on? AWS/Vmware/Azure/IBM/GCP?
@oviner After going through the logs and testing everything, I noticed that this is due to a cluster issue. The cluster I tested on didn't have the `rook-ceph-mon-endpoints` configmap, so the must-gather-helper pods were not created. And as the pods are not up, we get these txt files empty. I couldn't get a fresh cluster to test on. I am trying to debug why the config map is not created. I am not sure about the cluster on which Ketan ran the must-gather.
@oded do we do that for each file in the package? That is good! I believe it was deployed on BM, but I might be mistaken. The two options are either a VM or a virtualization lab that SpectrumScale uses for bringing up clusters. @yati good find so far
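For anyone who wants to run the same kind of size check by hand on a collected must-gather directory, here is a minimal Python sketch. It is not the ocs-ci implementation linked above; the function name `find_empty_files` is made up for illustration, and it simply walks a directory tree and reports zero-byte files.

```python
from pathlib import Path


def find_empty_files(root):
    """Return sorted relative paths of zero-byte regular files under root.

    Hypothetical helper for illustration; not taken from ocs-ci.
    """
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.stat().st_size == 0
    )
```

Calling `find_empty_files("noobaa/raw_output")` on an extracted must-gather would list entries such as the `*-describe.txt` files if they were written empty.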
Thanks @Nimrod. I am still working to find the root cause with proof. I will update the bug once I find it.
After all the verification, Rewant and I have come to the following conclusion:
1. The issue is with the cluster. `rook-ceph-mon-endpoints` is not created in the cluster whose details you shared, and because of this the must-gather-helper pod doesn't come up.
2. As the pods are not up, the contents are not copied into the txt files.
We need to check the following before running the must-gather:
`oc get configmap -n openshift-storage`
The output of the above command should contain `rook-ceph-mon-endpoints`. Please make sure you have the mon pods up and running. Otherwise there is no use in running the must-gather.
Here is the output gathered from my cluster:
```
[yatipadia@192 raw_output]$ ll
total 6148
-rw-r--r--. 1 yatipadia yatipadia 3244 Dec 2 18:13 BackingStoreList_crs.yaml
-rw-r--r--. 1 yatipadia yatipadia 2588 Dec 2 18:13 BucketClassList_crs.yaml
-rw-r--r--. 1 yatipadia yatipadia 615 Dec 2 18:13 db-noobaa-db-pg-0-pvc-describe.txt
-rw-r--r--. 1 yatipadia yatipadia 73 Dec 2 18:13 NamespaceStoreList_crs.yaml
-rw-r--r--. 1 yatipadia yatipadia 728101 Dec 2 18:13 noobaa-core-0-core.log
-rw-r--r--. 1 yatipadia yatipadia 5496 Dec 2 18:13 noobaa-core-0-pod-describe.txt
-rw-r--r--. 1 yatipadia yatipadia 1468 Dec 2 18:13 noobaa-db-pg-0-db.log
-rw-r--r--. 1 yatipadia yatipadia 8898 Dec 2 18:13 noobaa-db-pg-0-initialize-database.log
-rw-r--r--. 1 yatipadia yatipadia 347 Dec 2 18:13 noobaa-db-pg-0-init.log
-rw-r--r--. 1 yatipadia yatipadia 6011 Dec 2 18:13 noobaa-db-pg-0-pod-describe.txt
-rw-r--r--. 1 yatipadia yatipadia 257885 Dec 2 18:13 noobaa_diagnostics_1638448999.tar.gz
-rw-r--r--. 1 yatipadia yatipadia 489700 Dec 2 18:13 noobaa-endpoint-7844c7c8b6-2zjqj-endpoint.log
-rw-r--r--. 1 yatipadia yatipadia 5218 Dec 2 18:13 noobaa-endpoint-7844c7c8b6-2zjqj-pod-describe.txt
-rw-r--r--. 1 yatipadia yatipadia 1969 Dec 2 18:13 noobaa-endpoint-scc-describe.txt
-rw-r--r--. 1 yatipadia yatipadia 8497 Dec 2 18:13 NooBaaList_crs.yaml
-rw-r--r--. 1 yatipadia yatipadia 4718180 Dec 2 18:13 noobaa-operator-7bc85f88b7-9mgp9-noobaa-operator.log
-rw-r--r--. 1 yatipadia yatipadia 4571 Dec 2 18:13 noobaa-operator-7bc85f88b7-9mgp9-pod-describe.txt
-rw-r--r--. 1 yatipadia yatipadia 1951 Dec 2 18:13 noobaa-scc-describe.txt
-rw-r--r--. 1 yatipadia yatipadia 15 Dec 2 18:13 obc_list
-rw-r--r--. 1 yatipadia yatipadia 1810 Dec 2 18:13 status
```
Also attaching our must-gather for your reference. Please try this out and let us know if your issue is resolved.
Would the rook-ceph-mon-endpoints be there when we deploy MCG only in ODF 4.9? Up to 4.9, everything was up. But starting with 4.9 we can choose the components we deploy, and we support MCG only. Can that be the reason for the problem?
Yes that can be the reason. Even the MCG cluster I used had the same issue.
@ketan.khurana are you fine with the above discussion and can we close this bug as we don't have any fix here? The only thing we need to do is, check that the mon pods are up and the mon-endpoint config map is present. Without that running must-gather won't be successful.
Wait, we can't close this. In 4.9, MCG only is a supported and valid option. Must-gather must work even if a customer deployed MCG only.
In that case we will see what we can do and will update the bug accordingly.
I understand from the latest comments that a fix needs to be provided. Let me know if there is something I should check.
Ketan, can you still check comment 16 and see if the mon pods are up? Until the mon pods are up we can't run must-gather.
I do not have access to the previous setup, and with ODF GA'ed we're unable to collect noobaa logs. Can we have an ETA for this bug?
Yes, I will be working on this soon.
I cross-checked `oc get configmap` on the second cluster and can confirm that the `rook-ceph-mon-endpoints` configmap is not present.
[root@api raw_output]# oc get configmap
NAME DATA AGE
4fd470de.openshift.io 0 30d
ab76f4c9.openshift.io 0 30d
kube-root-ca.crt 1 30d
noobaa-config 3 30d
noobaa-operator-lock 0 30d
noobaa-postgres-config 1 30d
noobaa-postgres-initdb-sh 1 30d
odf-operator-manager-config 23 30d
openshift-service-ca.crt 1 30d
rook-ceph-operator-config 3 30d
Which component is responsible for bringing up this rook-ceph-mon-endpoints configmap? @ypadia @muagarwa @nbecker
Yes Ketan, you will not find the rook-ceph-mon-endpoints configmap as it's an MCG-only cluster. Also, after further verification, I see that it's not required for noobaa, and I see the two as different issues. I am also now unable to reproduce the issue, so we are not sure if it is environment-specific. I will check the must-gather attached by you and update the bug.
We have an RCA, clearing the needinfo
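The configmap presence check discussed here can also be scripted. The following is only a sketch in Python that parses the plain tabular output of `oc get configmap` as pasted in this comment; the function name `configmap_names` is an assumption for illustration, and it does not call the cluster itself.

```python
def configmap_names(table_text):
    """Extract the NAME column from `oc get configmap` tabular output.

    Hypothetical helper; it only parses text that was already captured.
    """
    lines = [ln for ln in table_text.splitlines() if ln.strip()]
    # Skip the header row (NAME DATA AGE) when it is present.
    if lines and lines[0].split()[0] == "NAME":
        lines = lines[1:]
    return [ln.split()[0] for ln in lines]
```

With the output above as input, `"rook-ceph-mon-endpoints" in configmap_names(output)` evaluates to `False`, which matches what was observed on this MCG-only cluster.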
Hi, can you check my test procedure?
1. Deploy MCG_ONLY cluster [OCP4.10+ODF4.10.0] [the platform is not relevant]
2. Run MG command:
oc adm must-gather --image=quay.io/rhceph-dev/ocs-must-gather:latest-4.10
3. Extract (Unzip) Tar Gz File [/noobaa/raw_output/noobaa_diagnostics_1642337281.tar.gz]
tar -xf noobaa_diagnostics_1642337281.tar.gz -C ./extract
4. Verify the file size is bigger than zero
$ ll
total 8268
-rw-r--r--. 1 odedviner odedviner 2934 Jan 16 14:48 BackingStoreList_crs.yaml
-rw-r--r--. 1 odedviner odedviner 2586 Jan 16 14:48 BucketClassList_crs.yaml
-rw-r--r--. 1 odedviner odedviner 0 Jan 16 14:48 db-noobaa-db-pg-0-pvc-describe.txt
-rw-r--r--. 1 odedviner odedviner 72 Jan 16 14:48 NamespaceStoreList_crs.yaml
-rw-r--r--. 1 odedviner odedviner 71 Jan 16 14:48 NooBaaAccountList_crs.yaml
-rw-r--r--. 1 odedviner odedviner 2444846 Jan 16 14:48 noobaa-core-0-core.log
-rw-r--r--. 1 odedviner odedviner 0 Jan 16 14:48 noobaa-core-0-pod-describe.txt
-rw-r--r--. 1 odedviner odedviner 1468 Jan 16 14:48 noobaa-db-pg-0-db.log
-rw-r--r--. 1 odedviner odedviner 8898 Jan 16 14:48 noobaa-db-pg-0-initialize-database.log
-rw-r--r--. 1 odedviner odedviner 347 Jan 16 14:48 noobaa-db-pg-0-init.log
-rw-r--r--. 1 odedviner odedviner 0 Jan 16 14:48 noobaa-db-pg-0-pod-describe.txt
-rw-r--r--. 1 odedviner odedviner 70533 Jan 16 14:48 noobaa-endpoint-7565dbcb9c-25jpz-endpoint.log
-rw-r--r--. 1 odedviner odedviner 0 Jan 16 14:48 noobaa-endpoint-7565dbcb9c-25jpz-pod-describe.txt
-rw-r--r--. 1 odedviner odedviner 0 Jan 16 14:48 noobaa-endpoint-scc-describe.txt
-rw-r--r--. 1 odedviner odedviner 9060 Jan 16 14:48 NooBaaList_crs.yaml
-rw-r--r--. 1 odedviner odedviner 5896496 Jan 16 14:48 noobaa-operator-68d6c55545-c6mws-noobaa-operator.log
-rw-r--r--. 1 odedviner odedviner 0 Jan 16 14:48 noobaa-operator-68d6c55545-c6mws-pod-describe.txt
-rw-r--r--. 1 odedviner odedviner 0 Jan 16 14:48 noobaa-scc-describe.txt
Some Notes:
a. Do you have a list of the expected file names in the noobaa_diagnostics dir?
b. In OCS-CI, we don't extract noobaa_diagnostics, so we can't test the content of this dir. Do you think we need to add this?
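Regarding note (b), one possible way to check the archive contents without unpacking it to disk is sketched below in Python. This is not existing OCS-CI code; `empty_members` is a hypothetical helper that reads the member sizes recorded in the tar headers of a `noobaa_diagnostics_*.tar.gz` file.

```python
import tarfile


def empty_members(archive_path):
    """Return sorted names of zero-size regular files inside a tar archive.

    Hypothetical helper; sizes come from the tar headers, nothing is extracted.
    """
    # "r:*" lets tarfile detect the compression (gzip here) transparently.
    with tarfile.open(archive_path, "r:*") as tar:
        return sorted(
            member.name
            for member in tar.getmembers()
            if member.isfile() and member.size == 0
        )
```

A test could then assert that `empty_members(path)` returns an empty list for the diagnostics archive.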
Some files in the noobaa_diagnostics dir are empty
Setup:
OCP Version:4.10.0-0.nightly-2022-01-24-020644
ODF Version:full_version=4.10.0-115
Test Process:
1. Check noobaa status:
NAME TYPE TARGET-BUCKET PHASE AGE
noobaa-default-backing-store s3-compatible nb.1643048815168.apps.oviner8-jan24.qe.rh-ocs.com Ready 1h19m3s
#------------------#
#- Bucket Classes -#
#------------------#
NAME PLACEMENT NAMESPACE-POLICY PHASE AGE
noobaa-default-bucket-class {"tiers":[{"backingStores":["noobaa-default-backing-store"]}]} null Ready 1h19m3s
2. Run MG command:
$ oc adm must-gather --image=quay.io/rhceph-dev/ocs-must-gather:latest-4.10
3. Extract (Unzip) Tar Gz File [/noobaa/raw_output/noobaa_diagnostics.tar.gz]
$ tar -xf noobaa_diagnostics_1643051282.tar.gz -C ./extract
4. Verify the file size is bigger than zero
$ ll
total 5776
-rw-r--r--. 1 odedviner odedviner 3249 Jan 24 21:08 BackingStoreList_crs.yaml
-rw-r--r--. 1 odedviner odedviner 2588 Jan 24 21:08 BucketClassList_crs.yaml
-rw-r--r--. 1 odedviner odedviner 0 Jan 24 21:08 db-noobaa-db-pg-0-pvc-describe.txt
-rw-r--r--. 1 odedviner odedviner 73 Jan 24 21:08 NamespaceStoreList_crs.yaml
-rw-r--r--. 1 odedviner odedviner 72 Jan 24 21:08 NooBaaAccountList_crs.yaml
-rw-r--r--. 1 odedviner odedviner 1627199 Jan 24 21:08 noobaa-core-0-core.log
-rw-r--r--. 1 odedviner odedviner 0 Jan 24 21:08 noobaa-core-0-pod-describe.txt
-rw-r--r--. 1 odedviner odedviner 1468 Jan 24 21:08 noobaa-db-pg-0-db.log
-rw-r--r--. 1 odedviner odedviner 8898 Jan 24 21:08 noobaa-db-pg-0-initialize-database.log
-rw-r--r--. 1 odedviner odedviner 347 Jan 24 21:08 noobaa-db-pg-0-init.log
-rw-r--r--. 1 odedviner odedviner 0 Jan 24 21:08 noobaa-db-pg-0-pod-describe.txt
-rw-r--r--. 1 odedviner odedviner 31174 Jan 24 21:08 noobaa-endpoint-68bf4dcdc8-xnlp4-endpoint.log
-rw-r--r--. 1 odedviner odedviner 0 Jan 24 21:08 noobaa-endpoint-68bf4dcdc8-xnlp4-pod-describe.txt
-rw-r--r--. 1 odedviner odedviner 0 Jan 24 21:08 noobaa-endpoint-scc-describe.txt
-rw-r--r--. 1 odedviner odedviner 8799 Jan 24 21:08 NooBaaList_crs.yaml
-rw-r--r--. 1 odedviner odedviner 4198458 Jan 24 21:08 noobaa-operator-69ddc5868c-g6wxj-noobaa-operator.log
-rw-r--r--. 1 odedviner odedviner 0 Jan 24 21:08 noobaa-operator-69ddc5868c-g6wxj-pod-describe.txt
-rw-r--r--. 1 odedviner odedviner 0 Jan 24 21:08 noobaa-scc-describe.txt
5. Check mg logs:
Collecting dump of obc list
time="2022-01-24T19:08:02Z" level=info msg="Running collection of diagnostics"
time="2022-01-24T19:08:02Z" level=info msg="❌ Could not find kubectl, will try to use oc instead, error: exec: \"kubectl\": executable file not found in $PATH\n"
time="2022-01-24T19:08:02Z" level=info msg="✅ oc exists - will use it for diagnostics\n"
time="2022-01-24T19:08:03Z" level=info msg="Collecting pod logs"
time="2022-01-24T19:08:03Z" level=info msg="❌ cannot describe pod noobaa-core-0 in namespace openshift-storage: exec: \"kubectl\": executable file not found in $PATH"
time="2022-01-24T19:08:03Z" level=info msg="❌ cannot describe pod noobaa-db-pg-0 in namespace openshift-storage: exec: \"kubectl\": executable file not found in $PATH"
time="2022-01-24T19:08:03Z" level=info msg="❌ cannot describe pod noobaa-endpoint-68bf4dcdc8-xnlp4 in namespace openshift-storage: exec: \"kubectl\": executable file not found in $PATH"
time="2022-01-24T19:08:03Z" level=info msg="❌ cannot describe pod noobaa-operator-69ddc5868c-g6wxj in namespace openshift-storage: exec: \"kubectl\": executable file not found in $PATH"
time="2022-01-24T19:08:03Z" level=info msg="Collecting PV logs"
time="2022-01-24T19:08:03Z" level=info msg="Collecting PVC logs"
time="2022-01-24T19:08:03Z" level=info msg="❌ cannot describe pvc db-noobaa-db-pg-0 in namespace openshift-storage: exec: \"kubectl\": executable file not found in $PATH"
time="2022-01-24T19:08:03Z" level=info msg="Collecting SCC logs"
time="2022-01-24T19:08:03Z" level=info msg="✅ (Optional) Exists: \"noobaa\"\n"
time="2022-01-24T19:08:03Z" level=info msg="❌ cannot describe scc noobaa in namespace openshift-storage: exec: \"kubectl\": executable file not found in $PATH"
time="2022-01-24T19:08:03Z" level=info msg="✅ (Optional) Exists: \"noobaa-endpoint\"\n"
time="2022-01-24T19:08:03Z" level=info msg="❌ cannot describe scc noobaa-endpoint in namespace openshift-storage: exec: \"kubectl\": executable file not found in $PATH"
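The mg log above shows that every empty describe file corresponds to a "cannot describe" message caused by the missing `kubectl` binary. As a rough aid for matching the two, here is a small Python sketch that pulls the failed describe targets out of such a log. The function name `failed_describes` and the `SAMPLE` text are illustrative assumptions; the message format is taken only from the lines pasted in this comment.

```python
import re

# Matches e.g. "cannot describe pod noobaa-core-0 in namespace openshift-storage:"
_DESCRIBE_FAILURE = re.compile(r"cannot describe (\S+) (\S+) in namespace (\S+?):")


def failed_describes(log_text):
    """Return (kind, name, namespace) tuples for each failed describe call."""
    return _DESCRIBE_FAILURE.findall(log_text)


# Three lines copied from the log in this comment, used as sample input.
SAMPLE = r'''
time="2022-01-24T19:08:03Z" level=info msg="❌ cannot describe pod noobaa-core-0 in namespace openshift-storage: exec: \"kubectl\": executable file not found in $PATH"
time="2022-01-24T19:08:03Z" level=info msg="❌ cannot describe pvc db-noobaa-db-pg-0 in namespace openshift-storage: exec: \"kubectl\": executable file not found in $PATH"
time="2022-01-24T19:08:03Z" level=info msg="❌ cannot describe scc noobaa in namespace openshift-storage: exec: \"kubectl\": executable file not found in $PATH"
'''

if __name__ == "__main__":
    for kind, name, namespace in failed_describes(SAMPLE):
        print(f"{kind}/{name} ({namespace})")
```

Each reported kind and name lines up with one of the zero-size `*-describe.txt` files in the listing, e.g. `pod` / `noobaa-core-0` with `noobaa-core-0-pod-describe.txt`.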
Bug fixed, none of the files in the noobaa_diagnostics dir are empty
Setup:
OCP version:4.10.0-0.nightly-2022-01-29-215708
ODF Version:4.10.0-128
Provider:Vmware
Test Process:
1. Run MG command:
$ oc adm must-gather --image=quay.io/rhceph-dev/ocs-must-gather:latest-4.10
2. Extract (Unzip) Tar Gz File [/noobaa/raw_output/noobaa_diagnostics.tar.gz]
$ tar -xf noobaa_diagnostics_1643051282.tar.gz -C ./extract
3. Verify the file size is bigger than zero
$ ll
total 14540
-rw-r--r--. 1 odedviner odedviner 3246 Jan 30 14:00 BackingStoreList_crs.yaml
-rw-r--r--. 1 odedviner odedviner 2587 Jan 30 14:00 BucketClassList_crs.yaml
-rw-r--r--. 1 odedviner odedviner 1797 Jan 30 14:00 db-noobaa-db-pg-0-pvc-describe.txt
-rw-r--r--. 1 odedviner odedviner 73 Jan 30 14:00 NamespaceStoreList_crs.yaml
-rw-r--r--. 1 odedviner odedviner 72 Jan 30 14:00 NooBaaAccountList_crs.yaml
-rw-r--r--. 1 odedviner odedviner 3868193 Jan 30 14:00 noobaa-core-0-core.log
-rw-r--r--. 1 odedviner odedviner 6520 Jan 30 14:00 noobaa-core-0-pod-describe.txt
-rw-r--r--. 1 odedviner odedviner 1468 Jan 30 14:00 noobaa-db-pg-0-db.log
-rw-r--r--. 1 odedviner odedviner 8898 Jan 30 14:00 noobaa-db-pg-0-initialize-database.log
-rw-r--r--. 1 odedviner odedviner 347 Jan 30 14:00 noobaa-db-pg-0-init.log
-rw-r--r--. 1 odedviner odedviner 8003 Jan 30 14:00 noobaa-db-pg-0-pod-describe.txt
-rw-r--r--. 1 odedviner odedviner 50250 Jan 30 14:00 noobaa-endpoint-6574b57c69-hpjb9-endpoint.log
-rw-r--r--. 1 odedviner odedviner 6297 Jan 30 14:00 noobaa-endpoint-6574b57c69-hpjb9-pod-describe.txt
-rw-r--r--. 1 odedviner odedviner 1096 Jan 30 14:00 noobaa-endpoint-scc-describe.txt
-rw-r--r--. 1 odedviner odedviner 8196 Jan 30 14:00 NooBaaList_crs.yaml
-rw-r--r--. 1 odedviner odedviner 10868233 Jan 30 14:00 noobaa-operator-845959847-xpcks-noobaa-operator.log
-rw-r--r--. 1 odedviner odedviner 6273 Jan 30 14:00 noobaa-operator-845959847-xpcks-pod-describe.txt
-rw-r--r--. 1 odedviner odedviner 1085 Jan 30 14:00 noobaa-scc-describe.txt
MG dir: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j-034ai3c33-s/j-034ai3c33-s_20220126T021925/logs/failed_testcase_ocs_logs_1643167244/test_multiple_pvc_creation_deletion_scale%5bReadWriteMany-CephBlockPool%5d_ocs_logs/ocs_must_gather/
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.10.0 RPM security,enhancement&bugfix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:1361