Bug 2302507

Summary: Backingstore Stuck "Connecting" post ODF v4.15 Upgrade - INVALID_SCHEMA_REPLY SERVER system_api#/methods/read_system
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation Reporter: Craig Wayman <crwayman>
Component: Multi-Cloud Object Gateway Assignee: Danny <dzaken>
Status: CLOSED ERRATA QA Contact: Mahesh Shetty <mashetty>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 4.15 CC: dzaken, edonnell, kramdoss, lmauda, mashetty, nbecker, nravinas, odf-bz-bot, shirshfe
Target Milestone: ---   
Target Release: ODF 4.17.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 4.17.0-77 Doc Type: Bug Fix
Doc Text:
NooBaa backingstore no longer stuck in `Connecting` post upgrade
Previously, the NooBaa backingstore blocked the upgrade because it remained in the `Connecting` phase, leaving the storagecluster.yaml in phase `Progressing`. This issue has been fixed, and the upgrade progresses as expected.
Story Points: ---
Clone Of:
: 2303414 Environment:
Last Closed: 2024-10-30 14:29:44 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2281703, 2303414    

Description Craig Wayman 2024-08-02 12:26:46 UTC
Description of problem (please be detailed as possible and provide log snippets):

  It looks like this is similar to Bug 2270090. This is the customer's production cluster and their Object Storage/NooBaa has been down with the NooBaa Backingstore stuck in Phase "Connecting" since Aug 1st.  

  The customer recently upgraded from ODF v4.14 to ODF v4.15.5. Once the upgrade was completed, MCG was down. The customer found this solution: https://access.redhat.com/articles/7079321 and applied it.

  After applying the solution, the Auto option failed but the Manual option succeeded. However, even though the manual upgrade (option #2) succeeded, the customer was still facing the following error: "Code=INVALID_SCHEMA_REPLY Message=INVALID_SCHEMA_REPLY SERVER system_api#/methods/read_system".

  After troubleshooting, MCG remains down. I had the customer upload their db dump and NooBaa logs, which can be found in supportshell under /cases/03893341/.

  Is there a known workaround or solution to get MCG back online? I am not seeing it in the 2270090 bug. Any help would be appreciated. I will post some snippets of the error below and in the private notes.


time="2024-08-01T12:26:41Z" level=info msg="UpdateStatus: Done" bucketclass=openshift-storage/noobaa-default-bucket-class
time="2024-08-01T12:26:42Z" level=error msg="⚠️  RPC: system.read_system() Response Error: Code=INVALID_SCHEMA_REPLY Message=INVALID_SCHEMA_REPLY SERVER system_api#/methods/read_system"
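
For anyone triaging similar logs, here is a minimal sketch of how the failing RPC could be picked out of the operator log. It assumes the lines follow the `key=value` layout of the two lines quoted above; `parse_logfmt` and `rpc_error_code` are hypothetical helper names for illustration only, not part of NooBaa or any Red Hat tooling.

```python
import re
import shlex

# The two log lines quoted in this report, copied verbatim.
LOG_LINES = [
    'time="2024-08-01T12:26:41Z" level=info msg="UpdateStatus: Done" '
    'bucketclass=openshift-storage/noobaa-default-bucket-class',
    'time="2024-08-01T12:26:42Z" level=error msg="⚠️  RPC: system.read_system() '
    'Response Error: Code=INVALID_SCHEMA_REPLY Message=INVALID_SCHEMA_REPLY '
    'SERVER system_api#/methods/read_system"',
]


def parse_logfmt(line):
    """Split a line of key=value pairs (values may be double-quoted) into a dict."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that are not key=value pairs
            fields[key] = value
    return fields


def rpc_error_code(fields):
    """Return the RPC error code from an error-level entry, or None."""
    if fields.get("level") != "error":
        return None
    match = re.search(r"Code=(\S+)", fields.get("msg", ""))
    return match.group(1) if match else None


for line in LOG_LINES:
    fields = parse_logfmt(line)
    code = rpc_error_code(fields)
    if code:
        print(fields["time"], code)
```

Running this against a full operator log would list the timestamps at which `system.read_system()` returned an error code, which may help correlate the failures with the upgrade window.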

  I also noticed, when the customer was backing up secrets, that the noobaa-root-master-key secret doesn't exist, which is normal for v4.15, but neither the noobaa-root-master-key-backend nor the noobaa-root-master-key-volume secret exists. Just an observation.


Version of all relevant components (if applicable):

OCP:

NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.15.22   True        False         6d19h   Cluster version is 4.15.22


ODF:
NAME                                          DISPLAY                                    VERSION                REPLACES                                       PHASE
mcg-operator.v4.15.5-rhodf                    NooBaa Operator                            4.15.5-rhodf           mcg-operator.v4.14.9-rhodf                     Succeeded
metallb-operator.v4.15.0-202407221237         MetalLB Operator                           4.15.0-202407221237    metallb-operator.v4.15.0-202407191406          Succeeded
ocs-operator.v4.15.5-rhodf                    OpenShift Container Storage                4.15.5-rhodf           ocs-operator.v4.14.9-rhodf                     Succeeded
odf-csi-addons-operator.v4.15.5-rhodf         CSI Addons                                 4.15.5-rhodf           odf-csi-addons-operator.v4.14.9-rhodf          Succeeded
odf-operator.v4.15.5-rhodf                    OpenShift Data Foundation                  4.15.5-rhodf           odf-operator.v4.14.9-rhodf                     Succeeded


Ceph:
{
    "mon": {
        "ceph version 17.2.6-216.el9cp (2787f204195806caaa97db2b363a263d5a9fa156) quincy (stable)": 3
    },
    "mgr": {
        "ceph version 17.2.6-216.el9cp (2787f204195806caaa97db2b363a263d5a9fa156) quincy (stable)": 2
    },
    "osd": {
        "ceph version 17.2.6-216.el9cp (2787f204195806caaa97db2b363a263d5a9fa156) quincy (stable)": 120
    },
    "mds": {
        "ceph version 17.2.6-216.el9cp (2787f204195806caaa97db2b363a263d5a9fa156) quincy (stable)": 2
    },
    "rgw": {
        "ceph version 17.2.6-216.el9cp (2787f204195806caaa97db2b363a263d5a9fa156) quincy (stable)": 1
    },
    "overall": {
        "ceph version 17.2.6-216.el9cp (2787f204195806caaa97db2b363a263d5a9fa156) quincy (stable)": 128
    }
}
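
As a quick sanity check on the JSON above (which appears to be `ceph versions` output), the sketch below confirms that every daemon type reports a single Ceph version and that the per-daemon counts add up to the "overall" figure. The helper name `mixed_version_daemons` is hypothetical; the data is copied from this report.

```python
import json

# Version string and counts copied from the report above.
CEPH_VERSION = (
    "ceph version 17.2.6-216.el9cp "
    "(2787f204195806caaa97db2b363a263d5a9fa156) quincy (stable)"
)
CEPH_VERSIONS_JSON = json.dumps({
    "mon": {CEPH_VERSION: 3},
    "mgr": {CEPH_VERSION: 2},
    "osd": {CEPH_VERSION: 120},
    "mds": {CEPH_VERSION: 2},
    "rgw": {CEPH_VERSION: 1},
    "overall": {CEPH_VERSION: 128},
})


def mixed_version_daemons(versions):
    """Return the sections that report more than one Ceph version."""
    return {
        daemon: sorted(by_version)
        for daemon, by_version in versions.items()
        if len(by_version) > 1
    }


versions = json.loads(CEPH_VERSIONS_JSON)
mixed = mixed_version_daemons(versions)

# Sum the daemons of every type except the "overall" roll-up.
daemon_total = sum(
    sum(counts.values())
    for name, counts in versions.items()
    if name != "overall"
)

print("mixed versions:", mixed or "none")
print("daemons counted:", daemon_total)
```

For the data in this report the check finds no mixed versions and counts 128 daemons, matching the "overall" entry, so the Ceph daemons themselves appear to be on a consistent build.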



Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

Production Down.


Is there any workaround available to the best of your knowledge?

No


Regards,


Craig Wayman
TSE Red Hat OpenShift Data Foundations (ODF) 
Customer Experience and Engagement, NA

Comment 28 errata-xmlrpc 2024-10-30 14:29:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.17.0 Security, Enhancement, & Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:8676

Comment 29 Red Hat Bugzilla 2025-02-28 04:25:24 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days