Bug 1958618

Summary: Must-gather, upgrading OCP/OCS version (4.6 -> 4.7), "ceph_fs_status" file does not exist
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation Reporter: Oded <oviner>
Component: must-gather Assignee: Rewant <resoni>
Status: CLOSED WORKSFORME QA Contact: Raz Tamir <ratamir>
Severity: unspecified Docs Contact:
Priority: high    
Version: 4.7 CC: akgunjal, godas, muagarwa, nberry, ocs-bugs, odf-bz-bot, resoni, sabose
Target Milestone: --- Flags: godas: needinfo? (oviner)
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-10-06 13:07:45 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Oded 2021-05-09 08:34:24 UTC
Description of problem (please be detailed as possible and provide log
snippets):
After upgrading the OCP version (4.6 -> 4.7), the "ceph_fs_status" and "ceph_fs_status_--format_json-pretty" files do not exist in the must-gather directory.


Version of all relevant components (if applicable):
OCP version: 4.7.0-0.nightly-2021-05-05-092347
OCS version: 4.7.0-383.ci

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?


Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Deploy a cluster with OCP 4.6 + OCS 4.7
2. Upgrade the OCP version (4.6 -> 4.7)
3. Collect must-gather
$ oc adm must-gather --image=quay.io/rhceph-dev/ocs-must-gather:latest-4.7
4. Files do not exist: ceph_fs_status, ceph_fs_status_--format_json-pretty

Logs:
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j014ai3c33-uo/j014ai3c33-uo_20210507T114154/logs/failed_testcase_ocs_logs_1620399610/test_must_gather%5bCEPH%5d_ocs_logs/ocs_must_gather/quay-io-rhceph-dev-ocs-must-gather-sha256-464408ac1e1fafdd1d4476e5fd510a7ad5b8fe29780517483023e0c985b2c159/

Actual results:
Files do not exist: ceph_fs_status, ceph_fs_status_--format_json-pretty

Expected results:
ceph_fs_status and ceph_fs_status_--format_json-pretty files exist
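
The check in step 4 could be scripted roughly as follows. This is only a minimal sketch, not the ocs-ci test itself: the function name find_missing_files is hypothetical, and it assumes the expected files may sit at any depth under the collected must-gather directory.

```python
import os

# File names taken from this report; extend the tuple as needed.
EXPECTED_FILES = ("ceph_fs_status", "ceph_fs_status_--format_json-pretty")


def find_missing_files(must_gather_dir, expected=EXPECTED_FILES):
    """Return the expected file names not found anywhere under must_gather_dir.

    Hypothetical helper: it matches on base file name only, so the exact
    sub-directory layout of the must-gather output does not matter.
    """
    found = set()
    for _root, _dirs, files in os.walk(must_gather_dir):
        found.update(files)
    return sorted(name for name in expected if name not in found)
```

An empty list means both files were collected; a non-empty list names the files that are missing, as in the actual results above.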

Additional info:

Comment 8 akgunjal@in.ibm.com 2021-07-06 08:01:11 UTC
Adding a comment: we also see this issue in our CI test runs. We did not upgrade OCP in our case, but we see the same issue. Details are pasted below.

The issue is with the file “ceph_status” --  https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/view/Tier1/job/qe-trigger-ibmcloud-managed-1az-rhel-3w-tier1/38/testReport/tests.manage.z_cluster.test_must_gather/TestMustGather/test_must_gather_CEPH_/

Comment 9 akgunjal@in.ibm.com 2021-07-06 08:03:23 UTC
Sometimes random ceph files are not included in the must-gather directory.

Random files --> in one run the file ceph-X does not exist, and in another run the file ceph-Y does not exist.

Comment 10 Sahina Bose 2021-07-07 13:47:05 UTC
(In reply to akgunjal@in.ibm.com from comment #9)
> Sometimes random ceph files are not included in must gather directory.
> 
> Random files -->once the file ceph-X does not exist and in another run the
> file ceph-Y does not exist.

Can you provide the must-gather image version used in the run and the OCS version?

Comment 12 akgunjal@in.ibm.com 2021-07-13 10:47:16 UTC
This is the command used.

oc adm must-gather --image=registry.redhat.io/ocs4/ocs-must-gather-rhel8:latest
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j038icm1r3-t1/j038icm1r3-t1_20210701T141227/logs/ocs-ci-logs-1625159012/tests/manage/z_cluster/test_must_gather.py/TestMustGather/test_must_gather-CEPH/logs


OCS Version is 4.7

Comment 14 Sahina Bose 2021-08-03 06:22:17 UTC
Gobinda, can someone check this? (This bug is one of the causes of failed tests on the IBM ROKS platform.)

Comment 15 Rewant 2021-08-05 07:38:30 UTC
I ran 3 instances of must-gather on the cluster on the IBM ROKS platform, and all 3 ran successfully. On comparing the ceph files, I found that no random files were missing; all 3 runs had the same files.

https://drive.google.com/drive/folders/15wd45z8uLj1sYnn86o7oYXoVFwucP3Zx?usp=sharing
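
A comparison like this could be automated along the following lines. This is a rough sketch, not the procedure actually used here: the helper names collect_file_names and diff_runs are hypothetical, and it assumes every run directory passed in has the same internal layout, so that relative paths are comparable across runs.

```python
import os


def collect_file_names(run_dir):
    """Return the set of file paths under run_dir, relative to run_dir."""
    names = set()
    for root, _dirs, files in os.walk(run_dir):
        for name in files:
            names.add(os.path.relpath(os.path.join(root, name), run_dir))
    return names


def diff_runs(run_dirs):
    """Map each run directory to the files seen in another run but absent from it."""
    per_run = {run: collect_file_names(run) for run in run_dirs}
    # Union of every file seen in any run (empty set if no runs were given).
    all_files = set().union(*per_run.values())
    return {run: sorted(all_files - names) for run, names in per_run.items()}
```

If every list in the result is empty, all runs produced the same set of files; otherwise each list names what that run is missing relative to the others.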

Comment 16 Rewant 2021-08-05 07:44:30 UTC
(In reply to Rewant from comment #15)
> I ran 3 instances of must-gather on the cluster from IBM ROKS platform, the
> must-gathers run successfully, on comparing the ceph files, I found that
> there were no random files missing, all 3 had the same files.
> 
> https://drive.google.com/drive/folders/
> 15wd45z8uLj1sYnn86o7oYXoVFwucP3Zx?usp=sharing

I created the must-gather image for these runs from master.

Comment 17 Gobinda Das 2021-08-05 08:58:53 UTC
@Oded Are there any specific files we need to check? The latest must-gather image created from master does not have any issue; it collects all the ceph resource files.