Bug 1810939

Summary: [Tracking #1810959] must-gather with latest-4.3 doesn't collect openshift-storage pod logs
Product: [Red Hat Storage] Red Hat OpenShift Container Storage
Reporter: Neha Berry <nberry>
Component: must-gather
Assignee: Ashish Ranjan <aranjan>
Status: CLOSED ERRATA
QA Contact: Aviad Polak <apolak>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 4.3
CC: aranjan, ebenahar, madam, ocs-bugs, sabose
Target Milestone: ---
Keywords: Automation, Regression, TestBlocker
Target Release: OCS 4.3.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-04-14 09:46:00 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1810959
Bug Blocks:

Description Neha Berry 2020-03-06 09:25:12 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
----------------------------------------------
must-gather doesn't collect pod logs for most of the pods.

OpenShift Client version: Client Version: 4.3.0-0.nightly-2020-03-04-235307

OpenShift Installer version: /home/jenkins/bin/openshift-install 4.3.0-0.nightly-2020-03-04-235307

must-gather version: 
>> must-gather command : quay.io/rhceph-dev/ocs-must-gather:latest-4.3

oc --kubeconfig /home/jenkins/current-cluster-dir/openshift-cluster-dir/auth/kubeconfig adm must-gather --image=quay.io/rhceph-dev/ocs-must-gather:latest-4.3 --dest-dir=/home/jenkins/current-cluster-


Sample must-gather : [1]

[1] - http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jnk-ai3c33-t4a/jnk-ai3c33-t4a_20200305T184613/logs/failed_testcase_ocs_logs_1583438366/test_nodes_restart%5bFalse%5d_ocs_logs/ocs_must_gather/quay-io-rhceph-dev-ocs-must-gather-sha256-0849ca3016ed9e69e0e1255a3bd95b55f2ef2f1d2e8107a7c3d2ee7ecef206fd/ceph/namespaces/openshift-storage/pods/


E.g. mon pod logs are missing here [2]

[2]- http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jnk-ai3c33-t4a/jnk-ai3c33-t4a_20200305T184613/logs/failed_testcase_ocs_logs_1583438366/test_nodes_restart%5bFalse%5d_ocs_logs/ocs_must_gather/quay-io-rhceph-dev-ocs-must-gather-sha256-0849ca3016ed9e69e0e1255a3bd95b55f2ef2f1d2e8107a7c3d2ee7ecef206fd/ceph/namespaces/openshift-storage/pods/rook-ceph-mon-a-577857fdc4-nxc2s/



Version of all relevant components (if applicable):
---------------------------------------------------------

OCS operator	v4.3.0-363.ci

Ceph Version	14.2.4-125.el8cp (db63624068590e593c47150c7574d08c1ec0d3e4) nautilus (stable)


OCP version: Cluster Version	4.3.0-0.nightly-2020-03-04-235307

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
----------------------------------------------------------

Yes

Is there any workaround available to the best of your knowledge?
------------------------------------------------------
Not sure

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
------------------------------------------------------------------------
2

Is this issue reproducible?
---------------------------
yes

Can this issue be reproduced from the UI?
------------------
No


If this is a regression, please provide more details to justify this:
----------------------------------------------
Yes


Steps to Reproduce:
1. Create an OCS 4.3 cluster
2. Initiate ocs must-gather with the latest 4.3 image
3. Check the collected pod logs under ceph/namespaces/openshift-storage/pods/ in the destination directory
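
The verification in step 3 can be scripted. The sketch below is a hypothetical helper (`check_pod_logs` is not from the report) that scans a must-gather dump for pod directories containing no `*.log` files; the directory layout is assumed from the sample dump in [1].

```shell
# Hypothetical helper: report pod directories in a must-gather dump that
# contain no *.log files at any depth. Prints each offender and returns
# the number of pods with missing logs (0 = every pod has logs).
check_pod_logs() {
  pods_dir=$1
  missing=0
  for pod in "$pods_dir"/*/; do
    [ -d "$pod" ] || continue
    if ! find "$pod" -name '*.log' 2>/dev/null | grep -q .; then
      echo "no logs collected for $(basename "$pod")"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Usage against the layout seen in [1] (path is an assumption):
# check_pod_logs "<dest-dir>/<image-dir>/ceph/namespaces/openshift-storage/pods"
```

Against the broken latest-4.3 image this should flag most of the openshift-storage pods; after the fix it should return 0.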


Actual results:
must-gather does not collect logs for most of the openshift-storage pods (e.g. the mon pod logs in [2] are missing).

Expected results:
must-gather collects logs for all pods in the openshift-storage namespace.

Additional info:
-----------------------
Run id of ocs-ci : ocs-ci results for OCS4-3-Downstream-OCP4-3-AWS-IPI-3AZ-RHCOS-3M-3W-tier4a (BUILD ID: v4.3.0-363.ci RUN ID: 1583438366)


Full console logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jnk-ai3c33-t4a/jnk-ai3c33-t4a_20200305T184613/logs/ocs-ci-logs-1583438366/tests/manage/pv_services/test_pvc_disruptive.py/TestPVCDisruption/test_pvc_disruptive-CephFileSystem-create_pod-cephfsplugin_provisioner/logs


>> Run information

BUILD_ID	5188
BUILD_NUMBER	5188
BUILD_TAG	jenkins-qe-deploy-ocs-cluster-5188
BUILD_URL	https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/qe-deploy-ocs-cluster/5188/
Ceph Version	14.2.4-125.el8cp (db63624068590e593c47150c7574d08c1ec0d3e4) nautilus (stable)
Cluster Version	4.3.0-0.nightly-2020-03-04-235307
EXECUTOR_NUMBER	0
GIT_BRANCH	origin/master
GIT_COMMIT	7b9c3c07134ff30119d4bd309207e84e4cc587ca
GIT_URL	${JOBS_REPOSITORY}
JENKINS_URL	https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/
JOB_NAME	qe-deploy-ocs-cluster
NODE_NAME	temp-slave-jnk-ai3c33-t4a-11
OCS operator	v4.3.0-363.ci
Test Run Name	OCS4-3-Downstream-OCP4-3-AWS-IPI-3AZ-RHCOS-3M-3W-tier4a
WORKSPACE	/home/jenkins/workspace/qe-deploy-ocs-cluster
ceph	rhceph@sha256:1ec55227084f058c468df5cfff2cd55623668a72ec742af3e8b1c05b52d44d0a
cephfsplugin	494e491818dcdcc030c0dbbf7f7f45a019a51abf7b34d17e9b06b82c38d9e17d
noobaa-operator	mcg-operator@sha256:69d9765917749a67fee08e8b96b83c9dfe8d77c4dcaf57aeee91379945448cb1
noobaa_cor	mcg-core@sha256:d5cbbb7fd95a5975472b2be0079108877861b90aadc05f1f5616db9a4cc072df
noobaa_db	mongodb-36-rhel7@sha256:ad5dc22e6115adc0d875f6d2eb44b2ba594d07330a600e67bf3de49e02cab5b0
rbdplugin	494e491818dcdcc030c0dbbf7f7f45a019a51abf7b34d17e9b06b82c38d9e17d
rook-ceph-operator	rook-ceph@sha256:17ebccae08ea7cc5be84f92878cbee520f911a7dc00ee448ecc8717a3b9cf228
rook_ceph	rook-ceph@sha256:17ebccae08ea7cc5be84f92878cbee520f911a7dc00ee448ecc8717a3b9cf228
rook_csi_attacher	ose-csi-external-attacher@sha256:3edf4d6f5d40233f453611be2796f8ad1a158ee489a3de7c7ceb0feb1f3f6771
rook_csi_ceph	cephcsi@sha256:494e491818dcdcc030c0dbbf7f7f45a019a51abf7b34d17e9b06b82c38d9e17d
rook_csi_provisioner	ose-csi-external-provisioner-rhel7@sha256:0fe73520b88fee7ad003d0d534bf8372dc2c1eb3068cabdb49c32fcd7927a66e
rook_csi_registrar	ose-csi-driver-registrar@sha256:685f523b3118f0686abc117330d1554c20d4893864283c54d1183ca6cbb7e6f2

Comment 4 Ashish Ranjan 2020-03-06 09:39:29 UTC
It's an issue with the base image that we use for oc. The backport fix has not been merged yet: https://github.com/openshift/oc/pull/290

Comment 5 Raz Tamir 2020-03-10 07:34:16 UTC
Raising severity to urgent, as this bug is preventing QE from doing proper automated analysis of failures

Comment 6 Michael Adam 2020-03-10 07:56:30 UTC
@Ashish

- If I understand correctly, this should be an OCP bug rather than an OCS bug (OCS tracking OCP here).
- While waiting for OCP to merge the backport, is there anything we could do in OCS itself in the meantime to work around the issue?
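
For triage while must-gather was broken, pod logs could still be pulled manually. The sketch below is not from the thread; it is a hypothetical workaround using plain `oc get pods` and `oc logs` (the `collect_storage_logs` name and output layout are assumptions).

```shell
# Hypothetical workaround: dump logs for every pod in a namespace with
# plain `oc logs`, bypassing the broken must-gather image.
collect_storage_logs() {
  ns=$1
  outdir=$2
  mkdir -p "$outdir"
  for pod in $(oc get pods -n "$ns" -o name); do
    name=${pod#pod/}
    # --all-containers fetches every container of the pod in one call
    oc logs -n "$ns" --all-containers "$name" > "$outdir/$name.log" 2>&1 \
      || echo "failed to fetch logs for $name"
  done
}

# collect_storage_logs openshift-storage /tmp/openshift-storage-pod-logs
```

This only covers current container logs, not the resource YAMLs and Ceph command output that must-gather also collects, so it is a stopgap for log analysis only.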

Comment 7 Ashish Ranjan 2020-03-11 20:41:16 UTC
Well, it would have been difficult to work around this issue. Luckily the PR https://github.com/openshift/oc/pull/290 got merged, so I am moving this to ON_QA. The fix is available in the ocs-must-gather image quay.io/rhceph-dev/ocs-must-gather:4.3-19.0e07206e.release_4.3

Comment 11 errata-xmlrpc 2020-04-14 09:46:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1437