Description of problem (please be as detailed as possible and provide log snippets):

This looks similar to Bug 2270090. This is the customer's production cluster, and their Object Storage/NooBaa has been down, with the NooBaa BackingStore stuck in phase "Connecting", since Aug 1st.

The customer recently upgraded from ODF v4.14 to ODF v4.15.5. Once the upgrade completed, MCG was down. The customer found this solution: https://access.redhat.com/articles/7079321 and applied it. The Auto option failed, but the Manual option succeeded. However, even though the option #2 manual upgrade succeeded, the customer was still facing the following error:

  Error: Code=INVALID_SCHEMA_REPLY Message=INVALID_SCHEMA_REPLY SERVER system_api#/methods/read_system

After further troubleshooting, MCG remains down. I had the customer upload their db dump and NooBaa logs, which can be found in supportshell under /cases/03893341/.

Is there a known workaround or solution to get MCG back online? I am not seeing one in Bug 2270090. Any help would be appreciated. Snippets of the error are below and in the private notes:

  time="2024-08-01T12:26:41Z" level=info msg="UpdateStatus: Done" bucketclass=openshift-storage/noobaa-default-bucket-class
  time="2024-08-01T12:26:42Z" level=error msg="⚠️ RPC: system.read_system() Response Error: Code=INVALID_SCHEMA_REPLY Message=INVALID_SCHEMA_REPLY SERVER system_api#/methods/read_system"

I also noticed, while the customer was backing up secrets, that the noobaa-root-master-key secret doesn't exist, which is expected in v4.15; but neither the noobaa-root-master-key-backend nor the noobaa-root-master-key-volume secret exists either. Just an observation.
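For reference, a minimal shell sketch of the checks described above (BackingStore phase, the missing noobaa-root-master-* secrets, and the read_system RPC error in the operator log). This is an assumption-laden sketch, not a fix: the openshift-storage namespace and the noobaa-operator deployment name are assumed from a default ODF install, and it must be run from a host where oc is logged in to the affected cluster.

```shell
#!/bin/sh
# Diagnostic sketch (assumes a default ODF install in the openshift-storage
# namespace; resource names are assumptions, adjust for the actual cluster).
NS=openshift-storage

# Guard so the script degrades gracefully where oc is unavailable.
if ! command -v oc >/dev/null 2>&1; then
    echo "oc CLI not found; run these checks from a host with cluster access"
    exit 0
fi

# 1. BackingStore phase -- stuck in "Connecting" per the report.
oc get backingstore -n "$NS"

# 2. Which noobaa-root-master-* secrets exist (none were found here).
oc get secrets -n "$NS" | grep noobaa-root-master \
    || echo "no noobaa-root-master-* secrets present"

# 3. Recent INVALID_SCHEMA_REPLY occurrences in the operator log.
oc logs -n "$NS" deploy/noobaa-operator --tail=2000 \
    | grep INVALID_SCHEMA_REPLY | tail -n 5
```

The guard at the top means the script exits cleanly on a workstation without oc rather than erroring mid-run.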
Version of all relevant components (if applicable):

OCP:
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.15.22   True        False         6d19h   Cluster version is 4.15.22

ODF:
NAME                                    DISPLAY                       VERSION               REPLACES                                PHASE
mcg-operator.v4.15.5-rhodf              NooBaa Operator               4.15.5-rhodf          mcg-operator.v4.14.9-rhodf              Succeeded
metallb-operator.v4.15.0-202407221237   MetalLB Operator              4.15.0-202407221237   metallb-operator.v4.15.0-202407191406   Succeeded
ocs-operator.v4.15.5-rhodf              OpenShift Container Storage   4.15.5-rhodf          ocs-operator.v4.14.9-rhodf              Succeeded
odf-csi-addons-operator.v4.15.5-rhodf   CSI Addons                    4.15.5-rhodf          odf-csi-addons-operator.v4.14.9-rhodf   Succeeded
odf-operator.v4.15.5-rhodf              OpenShift Data Foundation     4.15.5-rhodf          odf-operator.v4.14.9-rhodf              Succeeded

Ceph:
{
    "mon": {
        "ceph version 17.2.6-216.el9cp (2787f204195806caaa97db2b363a263d5a9fa156) quincy (stable)": 3
    },
    "mgr": {
        "ceph version 17.2.6-216.el9cp (2787f204195806caaa97db2b363a263d5a9fa156) quincy (stable)": 2
    },
    "osd": {
        "ceph version 17.2.6-216.el9cp (2787f204195806caaa97db2b363a263d5a9fa156) quincy (stable)": 120
    },
    "mds": {
        "ceph version 17.2.6-216.el9cp (2787f204195806caaa97db2b363a263d5a9fa156) quincy (stable)": 2
    },
    "rgw": {
        "ceph version 17.2.6-216.el9cp (2787f204195806caaa97db2b363a263d5a9fa156) quincy (stable)": 1
    },
    "overall": {
        "ceph version 17.2.6-216.el9cp (2787f204195806caaa97db2b363a263d5a9fa156) quincy (stable)": 128
    }
}

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Production down.

Is there any workaround available to the best of your knowledge?
No.

Regards,

Craig Wayman, TSE
Red Hat OpenShift Data Foundation (ODF)
Customer Experience and Engagement, NA
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.17.0 Security, Enhancement, & Bug Fix Update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:8676
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days