Bug 2182644 - [IBM Z] MDR policy creation fails unless the ocs-operator pod is restarted on the managed clusters
Summary: [IBM Z] MDR policy creation fails unless the ocs-operator pod is restarted on the managed clusters
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ocs-operator
Version: 4.13
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ODF 4.13.0
Assignee: umanga
QA Contact: krishnaram Karthick
URL:
Whiteboard:
Depends On: 2185188
Blocks: 2189864
Reported: 2023-03-29 07:57 UTC by Sravika
Modified: 2023-08-09 17:00 UTC
CC: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 2189864
Environment:
Last Closed: 2023-06-21 15:25:01 UTC
Embargoed:


Attachments
DRpolicy_creation_ss (236.81 KB, image/png) - 2023-03-29 07:57 UTC, Sravika
ocs-operator-log-on-managed-cluster (11.87 KB, text/plain) - 2023-03-29 08:01 UTC, Sravika


Links
Github red-hat-storage/ocs-operator pull 2038 (open): Bug 2182644: [release-4.13] fix clusterclaim reconcile failure - 2023-04-26 10:53:10 UTC
Red Hat Product Errata RHBA-2023:3742 - 2023-06-21 15:25:26 UTC

Description Sravika 2023-03-29 07:57:53 UTC
Created attachment 1954358 [details]
DRpolicy_creation_ss

Description of problem (please be as detailed as possible and provide log
snippets):
Metro Disaster Recovery (MDR) policy creation fails unless the ocs-operator pod is restarted on the managed clusters. All clusters (hub and managed) run the same ODF version, v4.13.0-110.stable.

Version of all relevant components (if applicable):
openshift-install: 4.13.0-rc.0
ODF: v4.13.0-110.stable
odr-hub-operator: v4.13.0-110.stable
odf-multicluster-orchestrator: v4.13.0-110.stable
ACM: 2.7.2

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?
Yes, restart the ocs-operator pod on the managed clusters.
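For reference, one way to do this is shown below; this is a sketch assuming the default openshift-storage namespace and the standard ocs-operator Deployment name, so adjust if your installation differs:

# restart the ocs-operator pod by rolling its Deployment
oc -n openshift-storage rollout restart deployment/ocs-operator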

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?
Yes, reproduced every time (5 times)

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Create a Metro DR setup with a hub cluster and two managed clusters
2. Install ODF on the managed clusters and connect them to external storage
3. Install the ODF Multicluster Orchestrator on the hub cluster
4. Configure SSL access across the clusters
5. Create a Disaster Recovery policy on the hub cluster (an illustrative manifest follows below)
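For illustration, creating the policy from the CLI rather than the UI would use a DRPolicy manifest roughly like the one below. This is a sketch only: the policy name and managed cluster names are placeholders, and schedulingInterval: 0m is assumed to represent the synchronous (Metro DR) case.

apiVersion: ramendr.openshift.io/v1alpha1
kind: DRPolicy
metadata:
  name: example-mdr-policy          # placeholder name
spec:
  drClusters:                       # placeholder managed cluster names
    - managed-cluster-1
    - managed-cluster-2
  schedulingInterval: 0m            # 0m = synchronous replication (Metro DR)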


Actual results:
DR policy creation reports an unsupported ODF version on the managed clusters, although the ODF version is supported and identical on all three clusters.
Disaster Recovery policy creation fails unless the ocs-operator pod is restarted on the managed clusters.

Attaching the screenshot of the DR policy creation failure.

Expected results:
Disaster Recovery policy creation should succeed without restarting the ocs-operator pod on the managed clusters.

Additional info:

Comment 2 Sravika 2023-03-29 08:01:46 UTC
Created attachment 1954359 [details]
ocs-operator-log-on-managed-cluster

Comment 3 umanga 2023-04-05 05:36:44 UTC
Looking at the log, this does not look like a DR issue.
It's a bug somewhere in ocs-operator. We should try to recreate it without DR to isolate the issue.

The logs show the StorageCluster has this namespace/name: `"Request.Namespace":"openshift-storage","Request.Name":"ocs-external-storagecluster"`
But ocs-operator is looking for `"msg":"No StorageCluster resource.","Request.Namespace":"openshift-storage","Request.Name":"ocsinit"`

Moving it to ocs-operator for RCA.
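
For anyone hitting this, the mismatch can be checked directly on a managed cluster. This is an illustrative sketch that assumes the default openshift-storage namespace and an external-mode install:

# the StorageCluster that actually exists (expected: ocs-external-storagecluster)
oc -n openshift-storage get storagecluster

# what ocs-operator is reconciling, according to its log
oc -n openshift-storage logs deployment/ocs-operator | grep -i storagecluster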

Comment 4 umanga 2023-04-10 10:58:23 UTC
This might have occurred because of another bug: https://bugzilla.redhat.com/show_bug.cgi?id=2185188.
Once that is fixed, we need to check if this issue still exists.

Comment 5 umanga 2023-04-18 11:06:17 UTC
Please verify the issue with the "4.13.0-166" build.

Comment 6 Mudit Agarwal 2023-04-24 08:42:42 UTC
Sravika, please reopen if this still exists.

Comment 7 Sravika 2023-04-25 09:41:07 UTC
@uchapaga @muagarwa: Currently I don't have a 4.13 environment; I will verify the BZ once I move to 4.13 verification later.

Comment 8 umanga 2023-04-26 08:44:31 UTC
Reopening the issue, as we were able to reproduce it on the latest builds.

Comment 9 umanga 2023-04-26 08:46:13 UTC
We need to fix this in 4.13 as it blocks DR workflows on clusters with existing ODF deployments.

Comment 14 Abdul Kandathil (IBM) 2023-05-05 14:35:08 UTC
After applying the CatalogSource below, I am able to install the MCO operator from OperatorHub on the IBM Z platform.

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  labels:
    ocs-operator-internal: 'true'
  name: redhat-operators            # replaces the default redhat-operators catalog
  namespace: openshift-marketplace
spec:
  displayName: Openshift Container Storage
  icon:
    base64data: ''
    mediatype: ''
  image: quay.io/rhceph-dev/ocs-registry:latest-stable-4.13   # internal 4.13 build catalog
  priority: 100
  publisher: Red Hat
  sourceType: grpc
  updateStrategy:
    registryPoll:
      interval: 15m                 # poll the registry for catalog updates every 15 minutes
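
A sketch of applying and verifying it (the file name is hypothetical):

# save the manifest above as catalogsource.yaml, then:
oc apply -f catalogsource.yaml
oc -n openshift-marketplace get catalogsource redhat-operators
oc -n openshift-marketplace get pods | grep redhat-operators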

Comment 15 Sravika 2023-05-23 10:22:38 UTC
With the latest ODF build 4.13.0-203.stable, MDR policy creation succeeds without restarting the ocs-operator pods on the managed clusters.
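
For completeness, the policy state can be confirmed on the hub cluster with something like the following (illustrative; <policy-name> is a placeholder and this assumes the DRPolicy API reports validation in status.conditions):

oc get drpolicy
oc get drpolicy <policy-name> -o jsonpath='{.status.conditions}'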

Comment 17 errata-xmlrpc 2023-06-21 15:25:01 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Data Foundation 4.13.0 enhancement and bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:3742

