Description of problem (please be as detailed as possible and provide log snippets):
----------------------------------------------
must-gather doesn't collect pod logs for most of the pods.

OpenShift Client version:
Client Version: 4.3.0-0.nightly-2020-03-04-235307

OpenShift Installer version:
/home/jenkins/bin/openshift-install 4.3.0-0.nightly-2020-03-04-235307

must-gather version:
>> must-gather command: quay.io/rhceph-dev/ocs-must-gather:latest-4.3
oc --kubeconfig /home/jenkins/current-cluster-dir/openshift-cluster-dir/auth/kubeconfig adm must-gather --image=quay.io/rhceph-dev/ocs-must-gather:latest-4.3 --dest-dir=/home/jenkins/current-cluster-

Sample must-gather: [1]
[1] - http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jnk-ai3c33-t4a/jnk-ai3c33-t4a_20200305T184613/logs/failed_testcase_ocs_logs_1583438366/test_nodes_restart%5bFalse%5d_ocs_logs/ocs_must_gather/quay-io-rhceph-dev-ocs-must-gather-sha256-0849ca3016ed9e69e0e1255a3bd95b55f2ef2f1d2e8107a7c3d2ee7ecef206fd/ceph/namespaces/openshift-storage/pods/

E.g. mon pod logs are missing here: [2]
[2] - http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jnk-ai3c33-t4a/jnk-ai3c33-t4a_20200305T184613/logs/failed_testcase_ocs_logs_1583438366/test_nodes_restart%5bFalse%5d_ocs_logs/ocs_must_gather/quay-io-rhceph-dev-ocs-must-gather-sha256-0849ca3016ed9e69e0e1255a3bd95b55f2ef2f1d2e8107a7c3d2ee7ecef206fd/ceph/namespaces/openshift-storage/pods/rook-ceph-mon-a-577857fdc4-nxc2s/

Version of all relevant components (if applicable):
---------------------------------------------------------
OCS operator: v4.3.0-363.ci
Ceph Version: 14.2.4-125.el8cp (db63624068590e593c47150c7574d08c1ec0d3e4) nautilus (stable)
OCP version: Cluster Version 4.3.0-0.nightly-2020-03-04-235307

Does this issue impact your ability to continue to work with the product (please explain in detail what the user impact is)?
----------------------------------------------------------
Yes

Is there any workaround available to the best of your knowledge?
------------------------------------------------------
Not sure

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
------------------------------------------------------------------------
2

Is this issue reproducible?
---------------------------
Yes

Can this issue be reproduced from the UI?
------------------
No

If this is a regression, please provide more details to justify this:
----------------------------------------------
Yes

Steps to Reproduce:
1. Create an OCS 4.3 cluster
2. Initiate ocs must-gather with the latest 4.3 image
3.
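As a quick way to spot the problem in a gathered archive (a sketch only; the directory layout and the MG_DIR path below are assumptions based on the sample linked in [1]), one can list the per-pod directories and flag any that contain no log files:

# Flag pod directories in the must-gather output that contain no *.log files.
# MG_DIR and the layout under it are assumptions based on the sample in [1].
MG_DIR=/path/to/must-gather-output
for d in "$MG_DIR"/*/ceph/namespaces/openshift-storage/pods/*/; do
    if ! find "$d" -name '*.log' | grep -q .; then
        echo "no logs collected for pod: $(basename "$d")"
    fi
done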
Actual results:

Expected results:

Additional info:
-----------------------
Run id of ocs-ci: ocs-ci results for OCS4-3-Downstream-OCP4-3-AWS-IPI-3AZ-RHCOS-3M-3W-tier4a (BUILD ID: v4.3.0-363.ci RUN ID: 1583438366)

Full console logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jnk-ai3c33-t4a/jnk-ai3c33-t4a_20200305T184613/logs/ocs-ci-logs-1583438366/tests/manage/pv_services/test_pvc_disruptive.py/TestPVCDisruption/test_pvc_disruptive-CephFileSystem-create_pod-cephfsplugin_provisioner/logs

>> Run information
BUILD_ID: 5188
BUILD_NUMBER: 5188
BUILD_TAG: jenkins-qe-deploy-ocs-cluster-5188
BUILD_URL: https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/qe-deploy-ocs-cluster/5188/
Ceph Version: 14.2.4-125.el8cp (db63624068590e593c47150c7574d08c1ec0d3e4) nautilus (stable)
Cluster Version: 4.3.0-0.nightly-2020-03-04-235307
EXECUTOR_NUMBER: 0
GIT_BRANCH: origin/master
GIT_COMMIT: 7b9c3c07134ff30119d4bd309207e84e4cc587ca
GIT_URL: ${JOBS_REPOSITORY}
JENKINS_URL: https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/
JOB_NAME: qe-deploy-ocs-cluster
NODE_NAME: temp-slave-jnk-ai3c33-t4a-11
OCS operator: v4.3.0-363.ci
Test Run Name: OCS4-3-Downstream-OCP4-3-AWS-IPI-3AZ-RHCOS-3M-3W-tier4a
WORKSPACE: /home/jenkins/workspace/qe-deploy-ocs-cluster
ceph: rhceph@sha256:1ec55227084f058c468df5cfff2cd55623668a72ec742af3e8b1c05b52d44d0a
cephfsplugin: 494e491818dcdcc030c0dbbf7f7f45a019a51abf7b34d17e9b06b82c38d9e17d
noobaa-operator: mcg-operator@sha256:69d9765917749a67fee08e8b96b83c9dfe8d77c4dcaf57aeee91379945448cb1
noobaa_core: mcg-core@sha256:d5cbbb7fd95a5975472b2be0079108877861b90aadc05f1f5616db9a4cc072df
noobaa_db: mongodb-36-rhel7@sha256:ad5dc22e6115adc0d875f6d2eb44b2ba594d07330a600e67bf3de49e02cab5b0
rbdplugin: 494e491818dcdcc030c0dbbf7f7f45a019a51abf7b34d17e9b06b82c38d9e17d
rook-ceph-operator: rook-ceph@sha256:17ebccae08ea7cc5be84f92878cbee520f911a7dc00ee448ecc8717a3b9cf228
rook_ceph: rook-ceph@sha256:17ebccae08ea7cc5be84f92878cbee520f911a7dc00ee448ecc8717a3b9cf228
rook_csi_attacher: ose-csi-external-attacher@sha256:3edf4d6f5d40233f453611be2796f8ad1a158ee489a3de7c7ceb0feb1f3f6771
rook_csi_ceph: cephcsi@sha256:494e491818dcdcc030c0dbbf7f7f45a019a51abf7b34d17e9b06b82c38d9e17d
rook_csi_provisioner: ose-csi-external-provisioner-rhel7@sha256:0fe73520b88fee7ad003d0d534bf8372dc2c1eb3068cabdb49c32fcd7927a66e
rook_csi_registrar: ose-csi-driver-registrar@sha256:685f523b3118f0686abc117330d1554c20d4893864283c54d1183ca6cbb7e6f2
It's an issue with the base image that we use for oc. The backport fix is not yet merged: https://github.com/openshift/oc/pull/290
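If it helps to confirm which oc build is baked into a given must-gather image (a sketch only; the /usr/bin/oc entrypoint path inside the image is an assumption), the image's oc binary can be invoked directly:

# Print the client version of the oc binary bundled in the must-gather image
# (the /usr/bin/oc path inside the image is an assumption).
podman run --rm --entrypoint /usr/bin/oc quay.io/rhceph-dev/ocs-must-gather:latest-4.3 version --client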
Raising severity to urgent, as this bug is preventing QE from doing proper automation analysis of failures.
@Ashish - If I understand correctly, this should be an OCP bug rather than an OCS bug (with OCS tracking OCP here). While waiting for OCP to merge the backport, is there anything we could do in OCS itself in the meantime to work around the issue?
Well, it would have been difficult to work around this issue. Luckily the PR https://github.com/openshift/oc/pull/290 has been merged, so moving this to ON_QA. The fix is available in the ocs-must-gather image quay.io/rhceph-dev/ocs-must-gather:4.3-19.0e07206e.release_4.3
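To verify on a live cluster (a sketch only; the dest-dir and the output path pattern are assumptions for illustration), one could re-run must-gather with the fixed image and confirm that mon pod logs are now collected:

# Re-run must-gather with the fixed image and look for collected mon pod logs
# (dest-dir and the path pattern below are assumptions for illustration).
oc adm must-gather --image=quay.io/rhceph-dev/ocs-must-gather:4.3-19.0e07206e.release_4.3 --dest-dir=/tmp/ocs-must-gather
find /tmp/ocs-must-gather -path '*openshift-storage/pods/rook-ceph-mon*' -name '*.log'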
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:1437