Description of problem (please be as detailed as possible and provide log snippets):

External mode deployment failed with the error:

E ocs_ci.ocs.exceptions.CommandFailed: Error during execution of command: oc -n openshift-storage get backingstore noobaa-default-backing-store -n openshift-storage -o yaml.
E Error is Error from server (NotFound): backingstores.noobaa.io "noobaa-default-backing-store" not found

ocs_ci/utility/utils.py:511: CommandFailed

Logs - http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-external/jijoy-external_20211014T061725/logs/failed_testcase_ocs_logs_1634192701/deployment_ocs_logs/

==================================================================

Version of all relevant components (if applicable):
ODF 4.9.0-189.ci
OCP 4.9.0-0.nightly-2021-10-13-170616

Does this issue impact your ability to continue to work with the product (please explain in detail what the user impact is)?
External mode (Ceph) installation failed.

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Can this issue be reproduced?
Yes

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:
External mode deployment was working in 4.8.

Steps to Reproduce:
1. Install an external mode cluster.
2. Verify the backingstore noobaa-default-backing-store.

Actual results:
Error from server (NotFound): backingstores.noobaa.io "noobaa-default-backing-store" not found

Expected results:
The backingstore "noobaa-default-backing-store" should be present.

Additional info:
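For reference, the failing check from step 2 can be reproduced outside ocs-ci with a small standalone script like the sketch below. This is illustrative only and not part of ocs-ci; it assumes the oc CLI is on PATH and a kubeconfig pointing at the affected cluster.

import subprocess

# Single-shot check, equivalent to the command that failed in the deployment logs.
CMD = [
    "oc", "-n", "openshift-storage",
    "get", "backingstore", "noobaa-default-backing-store", "-o", "yaml",
]

result = subprocess.run(CMD, capture_output=True, text=True)
if result.returncode != 0:
    # On the affected clusters this prints:
    # Error from server (NotFound): backingstores.noobaa.io "noobaa-default-backing-store" not found
    print(result.stderr.strip())
else:
    print("backingstore found")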
Providing dev_ack based on Nimrod's comment. Will move it to ON_QA once we have a build on Monday; it can be reopened if the issue persists.
Moving it to ON_QA, please retest with the latest build.
@Shylesh FYI: testing was done using ocs-ci after adding a retry (based on comment #7) to check for the presence of noobaa-default-backing-store. Installation was successful.

Tested in versions:
ODF 4.9.0-195.ci
OCP 4.9.0-0.nightly-2021-10-22-102153
I see the other job failed here: https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/2330/
Logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j-003vu1ce33-t4a/j-003vu1ce33-t4a_20211124T090832/logs/failed_testcase_ocs_logs_1637745258/deployment_ocs_logs/
Re-triggering here, with Jilju's retry fix (https://github.com/red-hat-storage/ocs-ci/pull/5132) included: https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/job/qe-trigger-vsphere-upi-1az-rhcos-external-3m-3w-tier4a/4/console
Another occurrence here: https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/2352/console
Logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j-041vu1ce33-t1/j-041vu1ce33-t1_20211124T145639/logs/failed_testcase_ocs_logs_1637766268/test_deployment_ocs_logs/
From what I see, we don't hit this problem on other combinations, only on external mode deployment. Can someone from the NooBaa team please take a second look at why it happens only here? Moving back to ASSIGNED.
Hi Petr. In both occurrences I see the same issue Romy mentioned: the backing store is created a few seconds after the test tries to get the backing store list. I do not see any retries. Am I missing anything?
As for why it only happens on external mode deployments: I assume there are timing differences between external and internal modes. Since the test is not very resilient to timing issues and expects the backingstore to exist at a specific point in time, it fails.
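For illustration, a more resilient check could poll for the resource instead of querying it once. The sketch below is not the actual ocs-ci change; the 300 s timeout and 10 s interval are assumptions chosen only for the example.

import subprocess
import time

CMD = [
    "oc", "-n", "openshift-storage",
    "get", "backingstore", "noobaa-default-backing-store", "-o", "name",
]

def wait_for_default_backingstore(timeout=300, interval=10):
    """Poll until the default backingstore exists or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        result = subprocess.run(CMD, capture_output=True, text=True)
        if result.returncode == 0:
            return True
        time.sleep(interval)
    return False

if not wait_for_default_backingstore():
    raise RuntimeError("noobaa-default-backing-store did not appear in time")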
I created one more fix for ocs-ci: https://github.com/red-hat-storage/ocs-ci/pull/5187
As Danny mentioned, this is probably down to how the checks are performed in ocs-ci; the check happens too fast in the case of external mode. I hope that after the PR is merged we will not see this issue anymore. I think we can close this as NOT A BUG.