Description of problem (please be as detailed as possible and provide log snippets):

External mode deployment failed with the error:

E ocs_ci.ocs.exceptions.CommandFailed: Error during execution of command: oc -n openshift-storage get backingstore noobaa-default-backing-store -n openshift-storage -o yaml.
E Error is Error from server (NotFound): backingstores.noobaa.io "noobaa-default-backing-store" not found

ocs_ci/utility/utils.py:511: CommandFailed

Logs - http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-external/jijoy-external_20211014T061725/logs/failed_testcase_ocs_logs_1634192701/deployment_ocs_logs/

==================================================================

Version of all relevant components (if applicable):
ODF 4.9.0-189.ci
OCP 4.9.0-0.nightly-2021-10-13-170616

Does this issue impact your ability to continue to work with the product (please explain in detail what the user impact is)?
External mode (Ceph) installation failed.

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Can this issue be reproduced?
Yes

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:
External mode deployment was working in 4.8.

Steps to Reproduce:
1. Install an external mode cluster.
2. Verify the backingstore noobaa-default-backing-store.

Actual results:
Error from server (NotFound): backingstores.noobaa.io "noobaa-default-backing-store" not found

Expected results:
The backingstore "noobaa-default-backing-store" should be present.

Additional info:
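For reference, the failing check from step 2 can be reproduced outside ocs-ci with a small standalone script like the sketch below. This is illustrative only and not part of ocs-ci; it assumes the oc CLI is on PATH and a kubeconfig pointing at the affected cluster.

import subprocess

# Single-shot check, equivalent to the command that failed in the deployment logs.
CMD = [
    "oc", "-n", "openshift-storage",
    "get", "backingstore", "noobaa-default-backing-store", "-o", "yaml",
]

result = subprocess.run(CMD, capture_output=True, text=True)
if result.returncode != 0:
    # On the affected clusters this prints:
    # Error from server (NotFound): backingstores.noobaa.io "noobaa-default-backing-store" not found
    print(result.stderr.strip())
else:
    print("backingstore found")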
Providing dev_ack based on Nimrod's comment. Will move it to ON_QA once we have a build on Monday; it can be reopened if the issue persists.
Moving it to ON_QA, please retest with the latest build.
@Shylesh FYI: testing was done using ocs-ci after adding a retry (based on comment #7) to check for the presence of noobaa-default-backing-store. Installation was successful.

Tested in versions:
ODF 4.9.0-195.ci
OCP 4.9.0-0.nightly-2021-10-22-102153
I see the other job failed here: https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/2330/
Logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j-003vu1ce33-t4a/j-003vu1ce33-t4a_20211124T090832/logs/failed_testcase_ocs_logs_1637745258/deployment_ocs_logs/
Re-triggering here, with Jilju's retry fix (https://github.com/red-hat-storage/ocs-ci/pull/5132) included: https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/job/qe-trigger-vsphere-upi-1az-rhcos-external-3m-3w-tier4a/4/console
Another occurrence here: https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/2352/console
Logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j-041vu1ce33-t1/j-041vu1ce33-t1_20211124T145639/logs/failed_testcase_ocs_logs_1637766268/test_deployment_ocs_logs/
From what I see, we don't hit this problem on other combinations, only on external mode deployment. Can someone from the NooBaa team please take a second look at why it happens only here? Moving back to ASSIGNED.
Hi Petr. In both occurrences I see the same issue Romy mentioned: the backing store is created a few seconds after the test tries to get the backing store list. I do not see any retries. Am I missing anything?
As for why it only happens on external mode deployments: I assume there are timing differences between external and internal modes. Since the test is not very resilient to timing issues and expects the backingstore to exist at a specific point in time, it fails.
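For illustration, a more resilient check could poll for the resource instead of querying it once. The sketch below is not the actual ocs-ci change; the 300 s timeout and 10 s interval are assumptions chosen only for the example.

import subprocess
import time

CMD = [
    "oc", "-n", "openshift-storage",
    "get", "backingstore", "noobaa-default-backing-store", "-o", "name",
]

def wait_for_default_backingstore(timeout=300, interval=10):
    """Poll until the default backingstore exists or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        result = subprocess.run(CMD, capture_output=True, text=True)
        if result.returncode == 0:
            return True
        time.sleep(interval)
    return False

if not wait_for_default_backingstore():
    raise RuntimeError("noobaa-default-backing-store did not appear in time")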
I created one more fix for ocs-ci: https://github.com/red-hat-storage/ocs-ci/pull/5187
As Danny mentioned, this is probably down to how the checks are performed in ocs-ci; the check happens too fast in the case of external mode. I hope that after the PR is merged we will not see this issue anymore. I think we can close this as NOT A BUG.