Bug 2014026

Summary: [External Mode]Error backingstores.noobaa.io "noobaa-default-backing-store" not found
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Jilju Joy <jijoy>
Component: Multi-Cloud Object Gateway
Assignee: Nimrod Becker <nbecker>
Status: CLOSED NOTABUG
QA Contact: shylesh <shmohan>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.9
CC: dzaken, ebenahar, etamir, madam, muagarwa, nbecker, ocs-bugs, odf-bz-bot, pbalogh, rayalon, rcyriac, shmohan, sostapov
Target Milestone: ---
Keywords: Automation, Regression, TestBlocker
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: v4.9.0-192.ci
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-12-06 13:57:38 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Jilju Joy 2021-10-14 10:04:33 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
External mode deployment failed with the error:

E           ocs_ci.ocs.exceptions.CommandFailed: Error during execution of command: oc -n openshift-storage get backingstore noobaa-default-backing-store -n openshift-storage -o yaml.
E           Error is Error from server (NotFound): backingstores.noobaa.io "noobaa-default-backing-store" not found

ocs_ci/utility/utils.py:511: CommandFailed
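
For triage, a failure like the one above can be split into "resource not created yet" versus a genuine command failure by inspecting stderr. The following is a minimal sketch, assuming `oc` reports a missing resource with the `Error from server (NotFound)` prefix seen in the log. The helper names `is_not_found` and `backingstore_exists` are hypothetical and are not part of ocs-ci.

```python
import subprocess

# Prefix taken from the error text in this report.
NOT_FOUND_MARKER = "Error from server (NotFound)"


def is_not_found(stderr: str) -> bool:
    """Return True if stderr looks like a Kubernetes NotFound response."""
    return NOT_FOUND_MARKER in stderr


def backingstore_exists(name: str, namespace: str = "openshift-storage") -> bool:
    """Hypothetical helper: True if the backingstore exists, False if NotFound.

    Any other failure is raised so it is not mistaken for a timing issue.
    """
    result = subprocess.run(
        ["oc", "-n", namespace, "get", "backingstore", name, "-o", "name"],
        capture_output=True,
        text=True,
    )
    if result.returncode == 0:
        return True
    if is_not_found(result.stderr):
        return False
    raise RuntimeError(f"oc get backingstore failed: {result.stderr.strip()}")
```

Treating only NotFound as "not there yet" keeps authentication or connectivity errors visible instead of hiding them behind a wait.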


logs - http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-external/jijoy-external_20211014T061725/logs/failed_testcase_ocs_logs_1634192701/deployment_ocs_logs/

==================================================================

Version of all relevant components (if applicable):
ODF 4.9.0-189.ci
OCP 4.9.0-0.nightly-2021-10-13-170616


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
External mode (Ceph) installation failed.

Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Yes

Can this issue reproduce from the UI?


If this is a regression, please provide more details to justify this:
External mode deployment was working in 4.8

Steps to Reproduce:
1. Install an external mode cluster
2. Verify that the backingstore noobaa-default-backing-store exists

Actual results:
(NotFound): backingstores.noobaa.io "noobaa-default-backing-store" not found

Expected results:
backingstore "noobaa-default-backing-store" should be present

Additional info:

Comment 4 Mudit Agarwal 2021-10-17 14:44:52 UTC
Providing dev_ack based on Nimrod's comment. Will move it to ON_QA once we have a build on Monday; it can be reopened if the issue persists.

Comment 8 Mudit Agarwal 2021-10-18 14:25:42 UTC
Moving it to ON_QA, please retest with the latest build.

Comment 10 Jilju Joy 2021-10-26 12:15:07 UTC
@Shylesh
FYI

Testing was done using ocs-ci after adding a retry (based on comment #7) when checking for the presence of noobaa-default-backing-store. Installation was successful.
Tested in version:
ODF 4.9.0-195.ci
OCP 4.9.0-0.nightly-2021-10-22-102153
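
A retry of this kind could look roughly like the sketch below. It is not the actual ocs-ci change; the function name `wait_until`, its default timeout and interval, and the injectable `sleep` and `clock` parameters are all assumptions made for illustration.

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    timeout: float = 300.0,
    interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll check() until it returns True or the timeout expires.

    Returns True as soon as check() succeeds and False if the deadline
    passes first. check() is always called at least once.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Here `check` would be any callable that returns True once `oc get backingstore noobaa-default-backing-store` succeeds, so a backingstore that appears a few seconds late no longer fails the deployment check.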

Comment 12 Petr Balogh 2021-11-25 16:07:34 UTC
Another occurrence here: https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/2352/console
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j-041vu1ce33-t1/j-041vu1ce33-t1_20211124T145639/logs/failed_testcase_ocs_logs_1637766268/test_deployment_ocs_logs/

From what I see, this problem does not occur on other combinations, only on external mode deployments.
Can someone from the NooBaa team please take a second look at why it happens only here?

Moving back to Assigned

Comment 13 Danny 2021-11-28 15:35:39 UTC
Hi Petr.

In both occurrences, I see the same issue Romy mentioned: the backing store is created a few seconds after the test tries to get the backing stores list. I do not see any retries.
Am I missing anything?

Comment 14 Danny 2021-11-28 15:44:00 UTC
As to why it only happens on external mode deployments: I assume there are timing differences between external and internal modes. The test is not very resilient to timing issues and expects the backingstore to exist at a specific time, so it fails.

Comment 15 Petr Balogh 2021-12-06 13:57:38 UTC
I created one more fix for ocs-ci:
https://github.com/red-hat-storage/ocs-ci/pull/5187

As Danny mentioned, the cause is probably how the checks are performed in ocs-ci: in external mode the check happens too fast.
I hope that once the PR is merged we will not see this issue anymore.

I think we can close this as NOTABUG.