Bug 2264014 - ocs-storagecluster is in progressing state due to noobaa in configuring state
Summary: ocs-storagecluster is in progressing state due to noobaa in configuring state
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: Multi-Cloud Object Gateway
Version: 4.15
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Nimrod Becker
QA Contact: krishnaram Karthick
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2024-02-13 11:15 UTC by Vijay Avuthu
Modified: 2024-06-24 13:44 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-04-01 06:08:50 UTC
Embargoed:


Links
System  ID                               Private  Priority  Status  Summary                                                       Last Updated
Github  noobaa noobaa-operator pull 1292  0        None      Merged  Moving COSI temporarily out of the code                       2024-02-18 08:58:32 UTC
Github  noobaa noobaa-operator pull 1301  0        None      Merged  [Backport into 5.15] Moving COSI temporarily out of the code  2024-02-18 08:58:33 UTC

Description Vijay Avuthu 2024-02-13 11:15:15 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

deployment type: VSPHERE UPI 1AZ RHCOS VSAN 3M 3W

ocs-storagecluster is in progressing state due to noobaa in configuring state

Version of all relevant components (if applicable):
ocs-registry:4.15.0-139


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
yes

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Not sure, but it has been seen a couple of times

Can this issue reproduce from the UI?
not tried


If this is a regression, please provide more details to justify this:
Yes

Steps to Reproduce:
1. Install ODF using ocs-ci
2. Check the storagecluster status


Actual results:

Status:
  Conditions:
    Last Heartbeat Time:   2024-02-13T06:03:24Z
    Last Transition Time:  2024-02-13T06:03:24Z
    Message:               Version check successful
    Reason:                VersionMatched
    Status:                False
    Type:                  VersionMismatch
    Last Heartbeat Time:   2024-02-13T06:11:26Z
    Last Transition Time:  2024-02-13T06:08:25Z
    Message:               Reconcile completed successfully
    Reason:                ReconcileCompleted
    Status:                True
    Type:                  ReconcileComplete
    Last Heartbeat Time:   2024-02-13T06:03:24Z
    Last Transition Time:  2024-02-13T06:03:24Z
    Message:               Initializing StorageCluster
    Reason:                Init
    Status:                False
    Type:                  Available
    Last Heartbeat Time:   2024-02-13T06:11:26Z
    Last Transition Time:  2024-02-13T06:03:24Z
    Message:               Waiting on Nooba instance to finish initialization
    Reason:                NoobaaInitializing
    Status:                True
    Type:                  Progressing
    Last Heartbeat Time:   2024-02-13T06:03:24Z
    Last Transition Time:  2024-02-13T06:03:24Z
    Message:               Initializing StorageCluster
    Reason:                Init
    Status:                False
    Type:                  Degraded
    Last Heartbeat Time:   2024-02-13T06:03:24Z
    Last Transition Time:  2024-02-13T06:03:24Z
    Message:               Initializing StorageCluster
    Reason:                Init
    Status:                Unknown
    Type:                  Upgradeable

  Node Topologies:
    Labels:
      kubernetes.io/hostname:
        compute-0
        compute-1
        compute-2
      topology.rook.io/rack:
        rack0
        rack1
        rack2
  Phase:  Progressing

Expected results:

storagecluster should be in Ready state

Additional info:

noobaa operator log:

2024-02-13T06:11:06.798830918Z time="2024-02-13T06:11:06Z" level=error msg="⚠️  RPC: account.read_account() Response Error: Code=UNAUTHORIZED Message=not anonymous method read_account"
2024-02-13T06:11:06.798830918Z time="2024-02-13T06:11:06Z" level=error msg="ReconcileObject: Error Secret  cannot read admin account info, error: not anonymous method read_account" sys=openshift-storage/noobaa
2024-02-13T06:11:06.798847658Z time="2024-02-13T06:11:06Z" level=info msg="SetPhase: temporary error during phase \"Configuring\"" sys=openshift-storage/noobaa
2024-02-13T06:11:06.798880778Z time="2024-02-13T06:11:06Z" level=warning msg="⏳ Temporary Error: cannot read admin account info, error: not anonymous method read_account" sys=openshift-storage/noobaa
2024-02-13T06:11:06.815377257Z time="2024-02-13T06:11:06Z" level=info msg="Update event detected for noobaa (openshift-storage), queuing Reconcile"
2024-02-13T06:11:06.819064005Z time="2024-02-13T06:11:06Z" level=info msg="UpdateStatus: Done generation 1" sys=openshift-storage/noobaa
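
For triage, the operator log lines above can be reduced to their key=value fields. The following is a minimal sketch, assuming the lines keep the time=/level=/msg=/sys= layout shown here; the helper names parse_line and count_problems are hypothetical and not part of any NooBaa tooling.

```python
import re
from collections import Counter

# A field is key=value; the value is either a double-quoted string
# (which may contain \" escapes) or a bare token without whitespace.
FIELD_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_line(line):
    """Return the key=value fields of one operator log line as a dict."""
    fields = {}
    for key, raw in FIELD_RE.findall(line):
        if raw.startswith('"'):
            # Drop the surrounding quotes and unescape embedded quotes.
            raw = raw[1:-1].replace('\\"', '"')
        fields[key] = raw
    return fields


def count_problems(lines):
    """Count distinct messages logged at level error or warning."""
    counts = Counter()
    for line in lines:
        fields = parse_line(line)
        if fields.get("level") in ("error", "warning"):
            counts[fields.get("msg", "")] += 1
    return counts


# Sample lines copied from the log above (leading collector timestamp omitted).
sample = [
    'time="2024-02-13T06:11:06Z" level=error msg="ReconcileObject: Error Secret  cannot read admin account info, error: not anonymous method read_account" sys=openshift-storage/noobaa',
    'time="2024-02-13T06:11:06Z" level=info msg="SetPhase: temporary error during phase \\"Configuring\\"" sys=openshift-storage/noobaa',
    'time="2024-02-13T06:11:06Z" level=info msg="UpdateStatus: Done generation 1" sys=openshift-storage/noobaa',
]

for message, hits in count_problems(sample).items():
    print(hits, message)
```

Counting by message makes it easy to see whether the same read_account failure repeats on every reconcile or whether other errors are mixed in.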

job:  https://url.corp.redhat.com/c3bbc60
must gather: https://url.corp.redhat.com/4abcb30

Comment 9 Vijay Avuthu 2024-03-04 05:36:18 UTC
Update:
==========
Seen the same issue on an AWS IPI 3AZ RHCOS 3M 3W 3I cluster

build: ocs-registry:4.15.0-150


2024-03-01 17:24:42  11:54:42 - MainThread - ocs_ci.utility.utils - INFO  - Executing command: oc --kubeconfig /home/jenkins/current-cluster-dir/openshift-cluster-dir/auth/kubeconfig -n openshift-storage get StorageCluster ocs-storagecluster -n openshift-storage -o yaml
2024-03-01 17:24:42  11:54:42 - MainThread - ocs_ci.ocs.ocp - INFO  - Resource ocs-storagecluster is in phase: Progressing!

> storagecluster status

Status:
  Conditions:
    Last Heartbeat Time:   2024-03-01T11:04:00Z
    Last Transition Time:  2024-03-01T11:04:00Z
    Message:               Version check successful
    Reason:                VersionMatched
    Status:                False
    Type:                  VersionMismatch
    Last Heartbeat Time:   2024-03-01T11:10:40Z
    Last Transition Time:  2024-03-01T11:06:55Z
    Message:               Reconcile completed successfully
    Reason:                ReconcileCompleted
    Status:                True
    Type:                  ReconcileComplete
    Last Heartbeat Time:   2024-03-01T11:04:00Z
    Last Transition Time:  2024-03-01T11:04:00Z
    Message:               Initializing StorageCluster
    Reason:                Init
    Status:                False
    Type:                  Available
    Last Heartbeat Time:   2024-03-01T11:10:40Z
    Last Transition Time:  2024-03-01T11:04:00Z
    Message:               Waiting on Nooba instance to finish initialization
    Reason:                NoobaaInitializing
    Status:                True
    Type:                  Progressing
    Last Heartbeat Time:   2024-03-01T11:04:00Z
    Last Transition Time:  2024-03-01T11:04:00Z
    Message:               Initializing StorageCluster
    Reason:                Init
    Status:                False
    Type:                  Degraded
    Last Heartbeat Time:   2024-03-01T11:07:50Z
    Last Transition Time:  2024-03-01T11:06:45Z
    Message:               CephCluster is creating: Processing OSD 2 on PVC "ocs-deviceset-0-data-0gg8z9"
    Reason:                ClusterStateCreating
    Status:                False
    Type:                  Upgradeable
  Current Mon Count:       3
  Failure Domain:          zone
  Failure Domain Key:      topology.kubernetes.io/zone
  Failure Domain Values:
    us-east-2a
    us-east-2b
    us-east-2c
  Images:
    Ceph:
      Actual Image:   registry.redhat.io/rhceph/rhceph-6-rhel9@sha256:9dbd051cfcdb334aad33a536cc115ae1954edaea5f8cb5943ad615f1b41b0226
      Desired Image:  registry.redhat.io/rhceph/rhceph-6-rhel9@sha256:9dbd051cfcdb334aad33a536cc115ae1954edaea5f8cb5943ad615f1b41b0226
    Noobaa Core:
      Actual Image:   registry.redhat.io/odf4/mcg-core-rhel9@sha256:79ca4ebf33fc91115fa5d5aa79c08c81c3df7df4f302b85ce6e8f8eba9d9e1bc
      Desired Image:  registry.redhat.io/odf4/mcg-core-rhel9@sha256:79ca4ebf33fc91115fa5d5aa79c08c81c3df7df4f302b85ce6e8f8eba9d9e1bc
    Noobaa DB:
      Actual Image:   registry.redhat.io/rhel9/postgresql-15@sha256:10e53e191e567248a514a7344c6d78432640aedbc1fa1f7b0364d3b88f8bde2c
      Desired Image:  registry.redhat.io/rhel9/postgresql-15@sha256:10e53e191e567248a514a7344c6d78432640aedbc1fa1f7b0364d3b88f8bde2c
  Kms Server Connection:
  Node Topologies:
    Labels:
      kubernetes.io/hostname:
        ip-10-0-3-214.us-east-2.compute.internal
        ip-10-0-54-251.us-east-2.compute.internal
        ip-10-0-81-15.us-east-2.compute.internal
      topology.kubernetes.io/region:
        us-east-2
      topology.kubernetes.io/zone:
        us-east-2a
        us-east-2b
        us-east-2c
  Phase:  Progressing

> noobaa-operator log

time="2024-03-01T11:13:18Z" level=error msg="⚠️  RPC: account.read_account() Response Error: Code=UNAUTHORIZED Message=not anonymous method read_account"
time="2024-03-01T11:13:18Z" level=error msg="ReconcileObject: Error Secret  cannot read admin account info, error: not anonymous method read_account" sys=openshift-storage/noobaa
time="2024-03-01T11:13:18Z" level=info msg="SetPhase: temporary error during phase \"Configuring\"" sys=openshift-storage/noobaa
time="2024-03-01T11:13:18Z" level=warning msg="⏳ Temporary Error: cannot read admin account info, error: not anonymous method read_account" sys=openshift-storage/noobaa
time="2024-03-01T11:13:18Z" level=info msg="UpdateStatus: Done generation 1" sys=openshift-storage/noobaa
time="2024-03-01T11:13:18Z" level=info msg="Update event detected for noobaa (openshift-storage), queuing Reconcile"
time="2024-03-01T11:13:21Z" level=info msg="Start NooBaa system Reconcile ..." sys=openshift-storage/noobaa


job: https://url.corp.redhat.com/2deec32
must gather: https://url.corp.redhat.com/17e9fbb

Comment 10 Malay Kumar parida 2024-03-04 11:45:56 UTC
We have seen the same error as far back as Dec 30, 2023 on build v4.15.0-99; https://bugzilla.redhat.com/show_bug.cgi?id=2256410 was created for it. We investigated a bit, but the issue later stopped reproducing, so that BZ had to be closed. Apart from that, I have seen this error intermittently in the upstream ocs-operator repo's bundle-e2e-aws test, and also during my own testing. AFAIK it is not reliably reproducible, and a reinstall sometimes fixes the problem, seemingly at random. So, to conclude, this is not a recent problem.

Comment 19 Vijay Avuthu 2024-04-01 04:46:43 UTC
Re-opening, as we have reproduced the issue and a live cluster is available with build v4.16.0-57

> Platform IBM Cloud

 $ oc get csv
NAME                                        DISPLAY                            VERSION            REPLACES   PHASE
mcg-operator.v4.16.0-57.stable              NooBaa Operator                    4.16.0-57.stable              Succeeded
ocs-client-operator.v4.16.0-57.stable       OpenShift Data Foundation Client   4.16.0-57.stable              Succeeded
ocs-operator.v4.16.0-57.stable              OpenShift Container Storage        4.16.0-57.stable              Succeeded
odf-csi-addons-operator.v4.16.0-57.stable   CSI Addons                         4.16.0-57.stable              Succeeded
odf-operator.v4.16.0-57.stable              OpenShift Data Foundation          4.16.0-57.stable              Succeeded
odf-prometheus-operator.v4.16.0-57.stable   Prometheus Operator                4.16.0-57.stable              Succeeded
rook-ceph-operator.v4.16.0-57.stable        Rook-Ceph                          4.16.0-57.stable              Succeeded


> storage cluster status

status:
  conditions:
  - lastHeartbeatTime: "2024-03-29T13:03:26Z"
    lastTransitionTime: "2024-03-29T13:03:26Z"
    message: Version check successful
    reason: VersionMatched
    status: "False"
    type: VersionMismatch
  - lastHeartbeatTime: "2024-04-01T04:36:57Z"
    lastTransitionTime: "2024-04-01T04:00:10Z"
    message: Reconcile completed successfully
    reason: ReconcileCompleted
    status: "True"
    type: ReconcileComplete
  - lastHeartbeatTime: "2024-03-29T13:03:26Z"
    lastTransitionTime: "2024-03-29T13:03:26Z"
    message: Initializing StorageCluster
    reason: Init
    status: "False"
    type: Available
  - lastHeartbeatTime: "2024-04-01T04:36:57Z"
    lastTransitionTime: "2024-03-29T13:03:26Z"
    message: Waiting on Nooba instance to finish initialization
    reason: NoobaaInitializing
    status: "True"
    type: Progressing
  - lastHeartbeatTime: "2024-03-29T13:03:26Z"
    lastTransitionTime: "2024-03-29T13:03:26Z"
    message: Initializing StorageCluster
    reason: Init
    status: "False"
    type: Degraded
  - lastHeartbeatTime: "2024-03-29T13:03:26Z"
    lastTransitionTime: "2024-03-29T13:03:26Z"
    message: Initializing StorageCluster
    reason: Init
    status: Unknown
    type: Upgradeable
  currentMonCount: 3
  failureDomain: zone
  failureDomainKey: topology.kubernetes.io/zone
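
To pick out which conditions keep the StorageCluster from becoming Ready, the list above can be compared against the values a healthy cluster would report. This is a minimal sketch: the EXPECTED table (Available=True, Progressing=False, Degraded=False) is an assumption based on the usual operator condition convention, not something stated in this bug, and the function name unhealthy is hypothetical.

```python
# Assumed healthy value for each condition type (an assumption, see above).
EXPECTED = {"Available": "True", "Progressing": "False", "Degraded": "False"}


def unhealthy(conditions):
    """Return (type, reason, message) for conditions that differ from EXPECTED."""
    return [
        (c["type"], c.get("reason", ""), c.get("message", ""))
        for c in conditions
        if c["type"] in EXPECTED and c["status"] != EXPECTED[c["type"]]
    ]


# Conditions copied from the storage cluster status above (timestamps omitted).
conditions = [
    {"type": "VersionMismatch", "status": "False", "reason": "VersionMatched",
     "message": "Version check successful"},
    {"type": "ReconcileComplete", "status": "True", "reason": "ReconcileCompleted",
     "message": "Reconcile completed successfully"},
    {"type": "Available", "status": "False", "reason": "Init",
     "message": "Initializing StorageCluster"},
    {"type": "Progressing", "status": "True", "reason": "NoobaaInitializing",
     "message": "Waiting on Nooba instance to finish initialization"},
    {"type": "Degraded", "status": "False", "reason": "Init",
     "message": "Initializing StorageCluster"},
    {"type": "Upgradeable", "status": "Unknown", "reason": "Init",
     "message": "Initializing StorageCluster"},
]

for cond_type, reason, message in unhealthy(conditions):
    print(f"{cond_type}: {reason} ({message})")
```

On this status, only Available and Progressing deviate, and the Progressing reason points at NooBaa initialization rather than at Ceph.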


> noobaa status

status:
  accounts:
    admin:
      secretRef:
        name: noobaa-admin
        namespace: openshift-storage
  actualImage: registry.redhat.io/odf4/mcg-core-rhel9@sha256:cd796909c5641bd1cec4135856b72e082cbf840d0a5f35bbbc675e38e7812a7a
  conditions:
  - lastHeartbeatTime: "2024-04-01T04:37:13Z"
    lastTransitionTime: "2024-03-29T13:06:59Z"
    message: 'cannot read admin account info, error: not anonymous method read_account'
    reason: TemporaryError
    status: "False"
    type: Available
  - lastHeartbeatTime: "2024-04-01T04:37:13Z"
    lastTransitionTime: "2024-03-29T13:06:59Z"
    message: 'cannot read admin account info, error: not anonymous method read_account'
    reason: TemporaryError
    status: "True"
    type: Progressing
  - lastHeartbeatTime: "2024-04-01T04:37:13Z"
    lastTransitionTime: "2024-03-29T13:06:59Z"
    message: 'cannot read admin account info, error: not anonymous method read_account'
    reason: TemporaryError
    status: "False"
    type: Degraded
  - lastHeartbeatTime: "2024-04-01T04:37:13Z"
    lastTransitionTime: "2024-03-29T13:06:59Z"
    message: 'cannot read admin account info, error: not anonymous method read_account'
    reason: TemporaryError
    status: "False"
    type: Upgradeable
  - lastHeartbeatTime: "2024-04-01T04:37:13Z"
    lastTransitionTime: "2024-03-29T13:06:59Z"
    status: k8s
    type: KMS-Type
  - lastHeartbeatTime: "2024-04-01T04:37:13Z"
    lastTransitionTime: "2024-03-29T13:07:00Z"
    status: Sync
    type: KMS-Status
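
The timestamps in the noobaa status above show how long the "temporary" error has persisted. As a minimal sketch, assuming the timestamps keep the UTC layout shown here (the helper name stuck_for is hypothetical), the gap between the last transition and the latest heartbeat can be computed as:

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # timestamp layout used in the status above


def stuck_for(condition):
    """Time between the last transition and the latest heartbeat."""
    start = datetime.strptime(condition["lastTransitionTime"], TS_FORMAT)
    seen = datetime.strptime(condition["lastHeartbeatTime"], TS_FORMAT)
    return seen - start


# Progressing condition copied from the noobaa status above.
progressing = {
    "type": "Progressing",
    "status": "True",
    "reason": "TemporaryError",
    "lastTransitionTime": "2024-03-29T13:06:59Z",
    "lastHeartbeatTime": "2024-04-01T04:37:13Z",
}

print(stuck_for(progressing))  # prints "2 days, 15:30:14"
```

A TemporaryError that has not transitioned for more than two days suggests the reconcile loop is not recovering on its own.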

> noobaa operator log

time="2024-04-01T04:41:44Z" level=error msg="⚠️  RPC: account.read_account() Response Error: Code=UNAUTHORIZED Message=not anonymous method read_account"
time="2024-04-01T04:41:44Z" level=error msg="ReconcileObject: Error Secret  cannot read admin account info, error: not anonymous method read_account" sys=openshift-storage/noobaa
time="2024-04-01T04:41:44Z" level=info msg="SetPhase: temporary error during phase \"Configuring\"" sys=openshift-storage/noobaa
time="2024-04-01T04:41:44Z" level=warning msg="⏳ Temporary Error: cannot read admin account info, error: not anonymous method read_account" sys=openshift-storage/noobaa
time="2024-04-01T04:41:44Z" level=info msg="Update event detected for noobaa (openshift-storage), queuing Reconcile"
time="2024-04-01T04:41:44Z" level=info msg="UpdateStatus: Done generation 1" sys=openshift-storage/noobaa

> The cluster has been left in the same state for live debugging

