Description of problem (please be as detailed as possible and provide log snippets):
----------------------------------------------
must-gather doesn't collect pod logs for most of the pods.

OpenShift Client version:
Client Version: 4.3.0-0.nightly-2020-03-04-235307

OpenShift Installer version:
/home/jenkins/bin/openshift-install 4.3.0-0.nightly-2020-03-04-235307

must-gather version:
>> must-gather command: quay.io/rhceph-dev/ocs-must-gather:latest-4.3
oc --kubeconfig /home/jenkins/current-cluster-dir/openshift-cluster-dir/auth/kubeconfig adm must-gather --image=quay.io/rhceph-dev/ocs-must-gather:latest-4.3 --dest-dir=/home/jenkins/current-cluster-

Sample must-gather: [1]
[1] - http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jnk-ai3c33-t4a/jnk-ai3c33-t4a_20200305T184613/logs/failed_testcase_ocs_logs_1583438366/test_nodes_restart%5bFalse%5d_ocs_logs/ocs_must_gather/quay-io-rhceph-dev-ocs-must-gather-sha256-0849ca3016ed9e69e0e1255a3bd95b55f2ef2f1d2e8107a7c3d2ee7ecef206fd/ceph/namespaces/openshift-storage/pods/

E.g. mon pod logs are missing here: [2]
[2] - http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jnk-ai3c33-t4a/jnk-ai3c33-t4a_20200305T184613/logs/failed_testcase_ocs_logs_1583438366/test_nodes_restart%5bFalse%5d_ocs_logs/ocs_must_gather/quay-io-rhceph-dev-ocs-must-gather-sha256-0849ca3016ed9e69e0e1255a3bd95b55f2ef2f1d2e8107a7c3d2ee7ecef206fd/ceph/namespaces/openshift-storage/pods/rook-ceph-mon-a-577857fdc4-nxc2s/

Version of all relevant components (if applicable):
---------------------------------------------------------
OCS operator: v4.3.0-363.ci
Ceph Version: 14.2.4-125.el8cp (db63624068590e593c47150c7574d08c1ec0d3e4) nautilus (stable)
OCP version: Cluster Version 4.3.0-0.nightly-2020-03-04-235307

Does this issue impact your ability to continue to work with the product (please explain in detail what the user impact is)?
----------------------------------------------------------
Yes

Is there any workaround available to the best of your knowledge?
------------------------------------------------------
Not sure

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
------------------------------------------------------------------------
2

Is this issue reproducible?
---------------------------
Yes

Can this issue be reproduced from the UI?
------------------
No

If this is a regression, please provide more details to justify this:
----------------------------------------------
Yes

Steps to Reproduce:
1. Create an OCS 4.3 cluster
2. Initiate ocs must-gather with the latest 4.3 image
3.
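As a quick way to spot the problem in a gathered archive (a sketch only; the directory layout and the MG_DIR path below are assumptions based on the sample linked in [1]), one can list the per-pod directories and flag any that contain no log files:

# Flag pod directories in the must-gather output that contain no *.log files.
# MG_DIR and the layout under it are assumptions based on the sample in [1].
MG_DIR=/path/to/must-gather-output
for d in "$MG_DIR"/*/ceph/namespaces/openshift-storage/pods/*/; do
    if ! find "$d" -name '*.log' | grep -q .; then
        echo "no logs collected for pod: $(basename "$d")"
    fi
done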
Actual results:

Expected results:

Additional info:
-----------------------
Run id of ocs-ci: ocs-ci results for OCS4-3-Downstream-OCP4-3-AWS-IPI-3AZ-RHCOS-3M-3W-tier4a (BUILD ID: v4.3.0-363.ci RUN ID: 1583438366)

Full console logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jnk-ai3c33-t4a/jnk-ai3c33-t4a_20200305T184613/logs/ocs-ci-logs-1583438366/tests/manage/pv_services/test_pvc_disruptive.py/TestPVCDisruption/test_pvc_disruptive-CephFileSystem-create_pod-cephfsplugin_provisioner/logs

>> Run information
BUILD_ID: 5188
BUILD_NUMBER: 5188
BUILD_TAG: jenkins-qe-deploy-ocs-cluster-5188
BUILD_URL: https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/qe-deploy-ocs-cluster/5188/
Ceph Version: 14.2.4-125.el8cp (db63624068590e593c47150c7574d08c1ec0d3e4) nautilus (stable)
Cluster Version: 4.3.0-0.nightly-2020-03-04-235307
EXECUTOR_NUMBER: 0
GIT_BRANCH: origin/master
GIT_COMMIT: 7b9c3c07134ff30119d4bd309207e84e4cc587ca
GIT_URL: ${JOBS_REPOSITORY}
JENKINS_URL: https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/
JOB_NAME: qe-deploy-ocs-cluster
NODE_NAME: temp-slave-jnk-ai3c33-t4a-11
OCS operator: v4.3.0-363.ci
Test Run Name: OCS4-3-Downstream-OCP4-3-AWS-IPI-3AZ-RHCOS-3M-3W-tier4a
WORKSPACE: /home/jenkins/workspace/qe-deploy-ocs-cluster
ceph: rhceph@sha256:1ec55227084f058c468df5cfff2cd55623668a72ec742af3e8b1c05b52d44d0a
cephfsplugin: 494e491818dcdcc030c0dbbf7f7f45a019a51abf7b34d17e9b06b82c38d9e17d
noobaa-operator: mcg-operator@sha256:69d9765917749a67fee08e8b96b83c9dfe8d77c4dcaf57aeee91379945448cb1
noobaa_core: mcg-core@sha256:d5cbbb7fd95a5975472b2be0079108877861b90aadc05f1f5616db9a4cc072df
noobaa_db: mongodb-36-rhel7@sha256:ad5dc22e6115adc0d875f6d2eb44b2ba594d07330a600e67bf3de49e02cab5b0
rbdplugin: 494e491818dcdcc030c0dbbf7f7f45a019a51abf7b34d17e9b06b82c38d9e17d
rook-ceph-operator: rook-ceph@sha256:17ebccae08ea7cc5be84f92878cbee520f911a7dc00ee448ecc8717a3b9cf228
rook_ceph: rook-ceph@sha256:17ebccae08ea7cc5be84f92878cbee520f911a7dc00ee448ecc8717a3b9cf228
rook_csi_attacher: ose-csi-external-attacher@sha256:3edf4d6f5d40233f453611be2796f8ad1a158ee489a3de7c7ceb0feb1f3f6771
rook_csi_ceph: cephcsi@sha256:494e491818dcdcc030c0dbbf7f7f45a019a51abf7b34d17e9b06b82c38d9e17d
rook_csi_provisioner: ose-csi-external-provisioner-rhel7@sha256:0fe73520b88fee7ad003d0d534bf8372dc2c1eb3068cabdb49c32fcd7927a66e
rook_csi_registrar: ose-csi-driver-registrar@sha256:685f523b3118f0686abc117330d1554c20d4893864283c54d1183ca6cbb7e6f2
It's an issue with the base image that we use for oc. The backport fix is not yet merged: https://github.com/openshift/oc/pull/290
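If it helps to confirm which oc build is baked into a given must-gather image (a sketch only; the /usr/bin/oc entrypoint path inside the image is an assumption), the image's oc binary can be invoked directly:

# Print the client version of the oc binary bundled in the must-gather image
# (the /usr/bin/oc path inside the image is an assumption).
podman run --rm --entrypoint /usr/bin/oc quay.io/rhceph-dev/ocs-must-gather:latest-4.3 version --client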
Raising severity to urgent, as this bug is preventing QE from doing proper automation analysis of failures.
@Ashish - If I understand correctly, this should be an OCP bug rather than an OCS bug (with OCS tracking OCP here). While waiting for OCP to merge the backport, is there anything we could do in OCS itself in the meantime to work around the issue?
Well, it would have been difficult to work around this issue. Luckily the PR https://github.com/openshift/oc/pull/290 has been merged, so moving this to ON_QA. The fix is available in the ocs-must-gather image quay.io/rhceph-dev/ocs-must-gather:4.3-19.0e07206e.release_4.3
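To verify on a live cluster (a sketch only; the dest-dir and the output path pattern are assumptions for illustration), one could re-run must-gather with the fixed image and confirm that mon pod logs are now collected:

# Re-run must-gather with the fixed image and look for collected mon pod logs
# (dest-dir and the path pattern below are assumptions for illustration).
oc adm must-gather --image=quay.io/rhceph-dev/ocs-must-gather:4.3-19.0e07206e.release_4.3 --dest-dir=/tmp/ocs-must-gather
find /tmp/ocs-must-gather -path '*openshift-storage/pods/rook-ceph-mon*' -name '*.log'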
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:1437