Bug 2264039 - [4.14 Clone] Noobaa stuck configuring when deploying on IBM Cloud(IPI) with COS-backed backingstore
Summary: [4.14 Clone] Noobaa stuck configuring when deploying on IBM Cloud(IPI) with COS-backed backingstore
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: Multi-Cloud Object Gateway
Version: 4.15
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ODF 4.14.7
Assignee: Nimrod Becker
QA Contact: Sagi Hirshfeld
URL:
Whiteboard:
Depends On: 2255557
Blocks:
 
Reported: 2024-02-13 14:19 UTC by Danny
Modified: 2024-05-28 15:39 UTC (History)
13 users

Fixed In Version: 4.14.7-1
Doc Type: Bug Fix
Doc Text:
Previously, when OpenShift Data Foundation was deployed on IBM installer-provisioned infrastructure with a COS-backed Multicloud Object Gateway backingstore, the cluster would become stuck in the `Configuring` state. The NooBaa operator's lookup of the IBM region label failed whenever an inspected node lacked the label, even though only the worker nodes carry it. As a result, the default backingstore could not be created, and the Multicloud Object Gateway and OpenShift Data Foundation deployments failed. With this fix, the label handling in the NooBaa operator tolerates unlabeled nodes, so Multicloud Object Gateway and OpenShift Data Foundation deploy successfully.
Clone Of: 2255557
Environment:
Last Closed: 2024-05-28 15:39:00 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github noobaa noobaa-operator pull 1279 0 None Merged Fix the label-based IBM cluster region lookup 2024-04-16 09:14:07 UTC
Github noobaa noobaa-operator pull 1345 0 None Merged [Backport into 5.14] Fix the label-based IBM cluster region lookup (#1279) 2024-04-16 09:14:07 UTC
Red Hat Product Errata RHBA-2024:3443 0 None None None 2024-05-28 15:39:03 UTC

Description Danny 2024-02-13 14:19:03 UTC
+++ This bug was initially created as a clone of Bug #2255557 +++

Description of problem (please be as detailed as possible and provide log
snippets):

When trying to deploy ODF to an IBM IPI cluster using a COS-backed backingstore, Noobaa becomes stuck in the Configuring state. 

-----------------
  phase: Configuring
  readme: "\n\n\tNooBaa operator is still working to reconcile this system.\n\tCheck
    out the system status.phase, status.conditions, and events with:\n\n\t\tkubectl
    -n openshift-storage describe noobaa\n\t\tkubectl -n openshift-storage get noobaa
    -o yaml\n\t\tkubectl -n openshift-storage get events --sort-by=metadata.creationTimestamp\n\n\tYou
    can wait for a specific condition with:\n\n\t\tkubectl -n openshift-storage wait
    noobaa/noobaa --for condition=available --timeout -1s\n\n\tNooBaa Core Version:
    \    master-20230920\n\tNooBaa Operator Version: 5.15.0\n"
  services:
-----------------
  - lastHeartbeatTime: "2023-12-18T10:39:32Z"
    lastTransitionTime: "2023-12-18T10:30:28Z"
    message: |-
      RequestError: send request failed
      caused by: Put "https://s3.direct..cloud-object-storage.appdomain.cloud/nb.1702895972648.apps.jnk-pr9072b6235.ibmcloud2.qe.rh-ocs.com": dial tcp: lookup s3.direct..cloud-object-storage.appdomain.cloud: no such host
    reason: TemporaryError
-----------------
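The double dot in the endpoint above (`s3.direct..cloud-object-storage.appdomain.cloud`) is the visible symptom: the region segment of the COS hostname is empty because the region lookup failed. A minimal sketch of how an empty region produces exactly that host, using the hostname pattern taken from the error message (the function name is illustrative, not the operator's actual code):

```python
def cos_host(region: str) -> str:
    # Hostname pattern taken from the RequestError above.
    return f"s3.direct.{region}.cloud-object-storage.appdomain.cloud"

print(cos_host("us-south"))
print(cos_host(""))  # empty region -> the double-dot host from the error
```

With an empty region the resulting hostname has no valid DNS entry, which is why the operator log shows "no such host".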


Version of all relevant components (if applicable):
Server Version: 4.15.0-0.nightly-2023-12-19-033450


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes, Noobaa cannot be installed in this scenario (IBM IPI with a COS-backed backingstore).


Is there any workaround available to the best of your knowledge?
No


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2


Can this issue reproducible?
yes


Can this issue reproduce from the UI?
unknown


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Deploy OCP to IBM Cloud (IPI)
2. Label the worker nodes with region information:
'oc label node <worker name> ibm-cloud.kubernetes.io/region=<REGION>'; in our case <REGION> is us-south.
3. Deploy ODF, creating the Secret via YAML as described here:
https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.9/html-single/deploying_and_managing_openshift_data_foundation_using_google_cloud/index#creating-an-IBM-COS-backed-backingstore_rhodf
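Step 2 labels only the worker nodes, so it can be useful to check which nodes actually carry the region label (this is what the must-gather inspection in the comments below does by hand). A minimal sketch that parses `oc get nodes --show-labels`-style output; the sample node names are hypothetical:

```python
LABEL = "ibm-cloud.kubernetes.io/region"

def nodes_missing_region(show_labels_output):
    """Return names of nodes whose label set lacks the IBM region label,
    given `oc get nodes --show-labels` output."""
    missing = []
    for line in show_labels_output.strip().splitlines()[1:]:  # skip header row
        fields = line.split()
        name, labels = fields[0], fields[-1]
        if not any(l.startswith(LABEL + "=") for l in labels.split(",")):
            missing.append(name)
    return missing

sample = """\
NAME       STATUS   ROLES    AGE   VERSION   LABELS
master-0   Ready    master   1d    v1.28.3   node-role.kubernetes.io/master=
worker-0   Ready    worker   1d    v1.28.3   ibm-cloud.kubernetes.io/region=us-south,node-role.kubernetes.io/worker=
"""
print(nodes_missing_region(sample))  # -> ['master-0']
```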


Actual results:
Noobaa is stuck in the Configuring phase.


Expected results:
Noobaa creation is successful, deployment succeeds.


Additional info:

https://cloud.ibm.com/docs/containers?topic=containers-storage_cos_install

https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.9/html-single/deploying_and_managing_openshift_data_foundation_using_google_cloud/index#creating-an-IBM-COS-backed-backingstore_rhodf

noobaa operator logs: https://url.corp.redhat.com/d7c188f

noobaa.yaml: https://url.corp.redhat.com/a9c0127

full ocs must gather: https://url.corp.redhat.com/b9c7f09

--- Additional comment from RHEL Program Management on 2023-12-21 22:49:06 IST ---

This bug previously had no release flag set. The release flag 'odf-4.15.0' has now been set to '?', so the bug is being proposed for the ODF 4.15.0 release. Note that any of the 3 Acks (pm_ack, devel_ack, qa_ack) set while the release flag was missing have been reset, since Acks must be set against a release flag.

--- Additional comment from Ben Eli on 2024-01-02 16:19:06 IST ---

The node labels do not seem to be present - 
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jnk-pr9072b6235/jnk-pr9072b6235_20231218T081756/logs/deployment_1702894821/jnk-pr9072b6235/ocs_must_gather/c2f9bca16b1fcd4caaeb21cefc9c0f5835b8c6d068b152724d7c54e451b83af3/cluster-scoped-resources/oc_output/desc_nodes

--- Additional comment from Petr Balogh on 2024-01-02 16:47:57 IST ---

I am not sure how Coady's PR validated this, but when I tried manually before the shutdown, I collected a must-gather here:
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/pbalogh-cos/pbalogh-cos_20231221T102303/logs/deployment_1703160725/

Validation job was triggered here:
https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/32237/console

I see labels on all 3 worker nodes:
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/pbalogh-cos/pbalogh-cos_20231221T102303/logs/deployment_1703160725/pbalogh-cos/ocs_must_gather/21ea0203971406f15ee064be07fc5878d185202baf8e79078a0893b8db582ef5/cluster-scoped-resources/oc_output/get_nodes_-o_wide_--show-labels

ibm-cloud.kubernetes.io/region=us-south

Labeling I did manually right after OCP deployment like:
$ oc label node worker-X-node-name ibm-cloud.kubernetes.io/region=us-south

I will let Coady check why his verification job doesn't have the labels, but when I tried manually I also didn't get it working.

--- Additional comment from Coady LaCroix on 2024-01-03 00:25:11 IST ---

We were able to get a successful deployment by adding the region label to all of the cluster nodes. Applying to the workers only was insufficient.

https://url.corp.redhat.com/8b2fc5d

Logs are linked to in the description of the job if necessary.
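This matches the failure mode in the description: before the fix, the operator effectively required the region label on every node it inspected, while the fixed lookup (PR 1279) only needs some node to carry it. A minimal Python sketch contrasting the two behaviors; the function names and structure are illustrative only, not the operator's actual (Go) code:

```python
LABEL = "ibm-cloud.kubernetes.io/region"

def region_strict(nodes):
    """Pre-fix behavior (sketch): fail if any inspected node lacks the label."""
    regions = []
    for labels in nodes:
        if LABEL not in labels:
            raise LookupError("node without region label")
        regions.append(labels[LABEL])
    return regions[0]

def region_tolerant(nodes):
    """Fixed behavior (sketch): take the region from the first node that has it."""
    for labels in nodes:
        if LABEL in labels:
            return labels[LABEL]
    raise LookupError("no node carries the region label")

nodes = [
    {},                     # e.g. an unlabeled master node
    {LABEL: "us-south"},    # a labeled worker node
]
print(region_tolerant(nodes))  # succeeds with workers-only labeling
```

With this node set, `region_strict` raises on the unlabeled node (labeling all nodes was the pre-fix workaround), while `region_tolerant` returns the region from the labeled worker.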

--- Additional comment from RHEL Program Management on 2024-01-09 11:37:35 IST ---

This BZ is being approved for the ODF 4.15.0 release, upon receipt of the 3 ACKs (PM, Devel, QA) for the release flag 'odf-4.15.0'.

--- Additional comment from RHEL Program Management on 2024-01-09 11:37:35 IST ---

Since this bug has been approved for the ODF 4.15.0 release, through release flag 'odf-4.15.0+', the Target Release is being set to 'ODF 4.15.0'.

--- Additional comment from Coady LaCroix on 2024-01-11 23:11:03 IST ---

Verified the deployment was successful after only labeling the worker nodes.

Jenkins: https://url.corp.redhat.com/7db156b

Verified on Server Version: 4.15.0-0.nightly-2024-01-10-101042

--- Additional comment from errata-xmlrpc on 2024-02-06 16:49:43 IST ---

This bug has been added to advisory RHBA-2023:118688 by Boris Ranto (branto)

--- Additional comment from Mudit Agarwal on 2024-02-07 11:13:02 IST ---



--- Additional comment from Eran Tamir on 2024-02-11 09:48:46 IST ---

please backport to 4.14

Comment 9 errata-xmlrpc 2024-05-28 15:39:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Data Foundation 4.14.7 Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:3443

