2256410 – Storagecluster is stuck in Progressing state in latest build v4.15.0-99

Summary: Storagecluster is stuck in Progressing state in latest build v4.15.0-99

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat OpenShift Data Foundation
Classification:	Red Hat Storage
Component:	ocs-operator
Sub Component:
Version:	4.15
Hardware:	ppc64le
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Malay Kumar parida
QA Contact:	Elad
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2024-01-02 06:36 UTC by narayanspg
Modified:	2024-02-11 10:57 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2024-01-22 06:01:32 UTC
Embargoed:

Description narayanspg 2024-01-02 06:36:17 UTC

Description of problem (please be as detailed as possible and provide log
snippets):
The StorageCluster is stuck in the Progressing state on the latest build, v4.15.0-99.
Previous builds worked fine; build 99 does not.
Describing the StorageCluster shows the following error: "CephCluster error: failed to create cluster: failed to start ceph osds: failed to update/create OSDs: context canceled"

Version of all relevant components (if applicable):

[root@nara4-2edb-bastion-0 ~]# oc get clusterversion
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.15.0-ec.3   True        False         23h     Cluster version is 4.15.0-ec.3
[root@nara4-2edb-bastion-0 ~]#
[root@nara4-2edb-bastion-0 ~]#
[root@nara4-2edb-bastion-0 ~]# oc get storagecluster
NAME                 AGE   PHASE         EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   33m   Progressing              2024-01-02T06:00:15Z   4.15.0

[root@nara4-2edb-bastion-0 ~]# oc get csv
NAME                                        DISPLAY                       VERSION            REPLACES   PHASE
mcg-operator.v4.15.0-99.stable              NooBaa Operator               4.15.0-99.stable              Succeeded
ocs-operator.v4.15.0-99.stable              OpenShift Container Storage   4.15.0-99.stable              Succeeded
odf-csi-addons-operator.v4.15.0-99.stable   CSI Addons                    4.15.0-99.stable              Succeeded
odf-operator.v4.15.0-99.stable              OpenShift Data Foundation     4.15.0-99.stable              Succeeded


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes. We are not able to continue feature testing on new builds.

Is there any workaround available to the best of your knowledge?
NO

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Create an OCP 4.15 cluster.
2. Install the LSO and ODF operators, then create a LocalVolume.
3. Create a StorageSystem. The StorageCluster gets stuck in the Progressing state.


Actual results:
The StorageCluster is stuck in the Progressing state.

Expected results:
The StorageCluster reaches the Ready state.

Additional info:

[root@nara4-2edb-bastion-0 ~]# oc get pods
NAME                                                              READY   STATUS      RESTARTS        AGE
csi-addons-controller-manager-5f59bb6fc4-nmdmq                    2/2     Running     0               10m
csi-cephfsplugin-5gzhx                                            2/2     Running     0               8m37s
csi-cephfsplugin-6xzzr                                            2/2     Running     1 (7m59s ago)   8m37s
csi-cephfsplugin-provisioner-ff8bb6b44-ptqgs                      6/6     Running     2 (7m58s ago)   8m37s
csi-cephfsplugin-provisioner-ff8bb6b44-xrtrj                      6/6     Running     4 (7m54s ago)   8m37s
csi-cephfsplugin-tjqkq                                            2/2     Running     1 (8m ago)      8m37s
csi-rbdplugin-6s9h5                                               3/3     Running     0               8m37s
csi-rbdplugin-provisioner-567b58b8ff-6vbxx                        6/6     Running     4 (7m54s ago)   8m37s
csi-rbdplugin-provisioner-567b58b8ff-rp7tv                        6/6     Running     0               8m37s
csi-rbdplugin-tckq8                                               3/3     Running     1 (8m ago)      8m37s
csi-rbdplugin-vk458                                               3/3     Running     1 (7m59s ago)   8m37s
noobaa-core-0                                                     1/1     Running     0               4m47s
noobaa-db-pg-0                                                    1/1     Running     0               5m45s
noobaa-operator-5b5bd9b87c-n6npf                                  2/2     Running     0               10m
ocs-metrics-exporter-65d789b85f-9mb6d                             1/1     Running     0               5m55s
ocs-operator-67dfc4b997-75876                                     1/1     Running     0               10m
odf-console-bb57b6f6-jkwrj                                        1/1     Running     0               11m
odf-operator-controller-manager-7bdbc5c7fd-76tkc                  2/2     Running     0               11m
rook-ceph-crashcollector-worker-0-7447cfc595-rfxbl                1/1     Running     0               6m14s
rook-ceph-crashcollector-worker-1-7946896c88-k5vzn                1/1     Running     0               5m59s
rook-ceph-crashcollector-worker-2-6d7b7d78f7-kdx75                1/1     Running     0               6m18s
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-6698c6f4ncvdr   2/2     Running     0               6m18s
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-59fd874dfh64v   2/2     Running     0               6m15s
rook-ceph-mgr-a-969f78995-zmg4l                                   3/3     Running     0               7m24s
rook-ceph-mgr-b-54c8b967f6-cfc95                                  3/3     Running     0               7m23s
rook-ceph-mon-a-f7dbd6cc-xnrwd                                    2/2     Running     0               8m12s
rook-ceph-mon-b-79b77d47bf-hck2f                                  2/2     Running     0               7m47s
rook-ceph-mon-c-6ccbfcbf7-dk4z4                                   2/2     Running     0               7m36s
rook-ceph-operator-7c45cd9474-p5mfp                               1/1     Running     0               8m37s
rook-ceph-osd-0-5c497554c4-tb9ks                                  2/2     Running     0               6m49s
rook-ceph-osd-1-6857ccd444-ln4vb                                  2/2     Running     0               6m49s
rook-ceph-osd-2-5cc5c6779c-l27vs                                  2/2     Running     0               6m47s
rook-ceph-osd-prepare-4ae47a7430335c087c9140b4de7e3ba9-5hfs8      0/1     Completed   0               7m
rook-ceph-osd-prepare-e7fded9a680cffaca41872ffa7197819-xnd5x      0/1     Completed   0               7m1s
rook-ceph-osd-prepare-f3bb1584f0bc543cb4524d67ded2ec19-ls9pt      0/1     Completed   0               7m1s
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-f96ccf8fzlv7   2/2     Running     0               5m59s
[root@nara4-2edb-bastion-0 ~]#

Comment 3 Malay Kumar parida 2024-01-04 06:35:03 UTC

After going through the must-gather and a live call with Naraynswami, I found that the NooBaa CR is stuck in the Configuring phase.
The CephCluster is Ready and all related pods are up and running.
Because the NooBaa CR is stuck in the Configuring phase, the StorageCluster never becomes Ready.
The condition message on the NooBaa CR says "cannot read admin account info, error: not anonymous method read_account".
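
For anyone triaging a similar case, the check described above (finding which condition keeps the StorageCluster in Progressing) can be sketched in Python. This is only an illustration: the helper name progressing_reason is hypothetical and not part of any ODF tooling, and the condition values are copied, trimmed, from the StorageCluster status in this report.

```python
from typing import Optional

def progressing_reason(conditions: list[dict]) -> Optional[tuple[str, str]]:
    """Return (reason, message) of the active Progressing condition, or None."""
    for cond in conditions:
        if cond.get("type") == "Progressing" and cond.get("status") == "True":
            return cond.get("reason", ""), cond.get("message", "")
    return None

# Trimmed copy of status.conditions from the StorageCluster in this report.
storagecluster_conditions = [
    {
        "type": "ReconcileComplete",
        "status": "True",
        "reason": "ReconcileCompleted",
        "message": "Reconcile completed successfully",
    },
    {
        "type": "Progressing",
        "status": "True",
        "reason": "NoobaaInitializing",
        "message": "Waiting on Nooba instance to finish initialization",
    },
]

print(progressing_reason(storagecluster_conditions))
```

The reason NoobaaInitializing points at the NooBaa CR rather than at Ceph, which matches the finding in this comment.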

Comment 4 Malay Kumar parida 2024-01-04 06:36:02 UTC

[root@nara4-2edb-bastion-0 ~]# oc get storagecluster
NAME                 AGE   PHASE         EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   2d    Progressing              2024-01-02T06:00:15Z   4.15.0
[root@nara4-2edb-bastion-0 ~]#
[root@nara4-2edb-bastion-0 ~]# oc get storagecluster -o yaml
apiVersion: v1
items:
- apiVersion: ocs.openshift.io/v1
  kind: StorageCluster
  metadata:
    annotations:
      cluster.ocs.openshift.io/local-devices: "true"
      uninstall.ocs.openshift.io/cleanup-policy: delete
      uninstall.ocs.openshift.io/mode: graceful
    creationTimestamp: "2024-01-02T06:00:15Z"
    finalizers:
    - storagecluster.ocs.openshift.io
    generation: 2
    name: ocs-storagecluster
    namespace: openshift-storage
    ownerReferences:
    - apiVersion: odf.openshift.io/v1alpha1
      kind: StorageSystem
      name: ocs-storagecluster-storagesystem
      uid: b23143df-26bb-40b1-a51c-f4d21a336014
    resourceVersion: "2355179"
    uid: c14f50a4-8f42-4c0a-b2a6-bbe8dbc95ea6
  spec:
    arbiter: {}
    encryption:
      kms: {}
    externalStorage: {}
    flexibleScaling: true
    managedResources:
      cephBlockPools:
        defaultStorageClass: true
      cephCluster: {}
      cephConfig: {}
      cephDashboard: {}
      cephFilesystems: {}
      cephNonResilientPools: {}
      cephObjectStoreUsers: {}
      cephObjectStores: {}
      cephRBDMirror:
        daemonCount: 1
      cephToolbox: {}
    mirroring: {}
    monDataDirHostPath: /var/lib/rook
    network:
      connections:
        encryption: {}
      multiClusterService: {}
    nodeTopologies: {}
    resourceProfile: balanced
    storageDeviceSets:
    - config: {}
      count: 3
      dataPVCTemplate:
        metadata: {}
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: "1"
          storageClassName: localblock
          volumeMode: Block
        status: {}
      name: ocs-deviceset-localblock
      placement: {}
      preparePlacement: {}
      replica: 1
      resources: {}
  status:
    conditions:
    - lastHeartbeatTime: "2024-01-02T06:00:16Z"
      lastTransitionTime: "2024-01-02T06:00:16Z"
      message: Version check successful
      reason: VersionMatched
      status: "False"
      type: VersionMismatch
    - lastHeartbeatTime: "2024-01-04T06:22:38Z"
      lastTransitionTime: "2024-01-04T05:22:32Z"
      message: Reconcile completed successfully
      reason: ReconcileCompleted
      status: "True"
      type: ReconcileComplete
    - lastHeartbeatTime: "2024-01-02T06:02:59Z"
      lastTransitionTime: "2024-01-02T06:00:16Z"
      message: 'CephCluster error: failed to create cluster: failed to start ceph
        osds: failed to update/create OSDs: context canceled'
      reason: ClusterStateError
      status: "False"
      type: Available
    - lastHeartbeatTime: "2024-01-04T06:22:38Z"
      lastTransitionTime: "2024-01-02T06:00:16Z"
      message: Waiting on Nooba instance to finish initialization
      reason: NoobaaInitializing
      status: "True"
      type: Progressing
    - lastHeartbeatTime: "2024-01-02T06:02:59Z"
      lastTransitionTime: "2024-01-02T06:02:59Z"
      message: 'CephCluster error: failed to create cluster: failed to start ceph
        osds: failed to update/create OSDs: context canceled'
      reason: ClusterStateError
      status: "True"
      type: Degraded
    - lastHeartbeatTime: "2024-01-02T06:03:54Z"
      lastTransitionTime: "2024-01-02T06:02:57Z"
      message: 'CephCluster is creating: Processing OSD 2 on PVC "ocs-deviceset-localblock-0-data-09kwfn"'
      reason: ClusterStateCreating
      status: "False"
      type: Upgradeable
    failureDomain: host
    failureDomainKey: kubernetes.io/hostname
    failureDomainValues:
    - worker-0
    - worker-1
    - worker-2
    images:
      ceph:
        actualImage: registry.redhat.io/rhceph/rhceph-6-rhel9@sha256:1b1870ca13fc52d3c1a6c603e471e230a90cba94baaef9cf56c02b6c7dac35ca
        desiredImage: registry.redhat.io/rhceph/rhceph-6-rhel9@sha256:1b1870ca13fc52d3c1a6c603e471e230a90cba94baaef9cf56c02b6c7dac35ca
      noobaaCore:
        actualImage: registry.redhat.io/odf4/mcg-core-rhel9@sha256:e84250acc66b169d54f64872df683033a74a71c5757808aff1a98e3fffc18a54
        desiredImage: registry.redhat.io/odf4/mcg-core-rhel9@sha256:e84250acc66b169d54f64872df683033a74a71c5757808aff1a98e3fffc18a54
      noobaaDB:
        actualImage: registry.redhat.io/rhel8/postgresql-12@sha256:cd5b8cb243a0b233a08bdf807df7bc6192a18e1dc322789d6d2e064e9721d8f0
        desiredImage: registry.redhat.io/rhel8/postgresql-12@sha256:cd5b8cb243a0b233a08bdf807df7bc6192a18e1dc322789d6d2e064e9721d8f0
    kmsServerConnection: {}
    lastAppliedResourceProfile: balanced
    nodeTopologies:
      labels:
        kubernetes.io/hostname:
        - worker-0
        - worker-1
        - worker-2
    phase: Progressing
    relatedObjects:
    - apiVersion: ceph.rook.io/v1
      kind: CephCluster
      name: ocs-storagecluster-cephcluster
      namespace: openshift-storage
      resourceVersion: "2354536"
      uid: c03b3258-84d5-4ff8-a7ba-459aef7ce42b
    - apiVersion: noobaa.io/v1alpha1
      kind: NooBaa
      name: noobaa
      namespace: openshift-storage
      resourceVersion: "2355174"
      uid: 815db159-46dd-4136-8636-f035e0139ed8
    version: 4.15.0
kind: List
metadata:
  resourceVersion: ""
[root@nara4-2edb-bastion-0 ~]# oc get pods
NAME                                                              READY   STATUS      RESTARTS     AGE
csi-addons-controller-manager-7865c8f5f4-nvzcf                    2/2     Running     0            44h
csi-cephfsplugin-5gzhx                                            2/2     Running     0            2d
csi-cephfsplugin-6xzzr                                            2/2     Running     1 (2d ago)   2d
csi-cephfsplugin-provisioner-ff8bb6b44-ptqgs                      6/6     Running     2 (2d ago)   2d
csi-cephfsplugin-provisioner-ff8bb6b44-xrtrj                      6/6     Running     4 (2d ago)   2d
csi-cephfsplugin-tjqkq                                            2/2     Running     1 (2d ago)   2d
csi-rbdplugin-6s9h5                                               3/3     Running     0            2d
csi-rbdplugin-provisioner-567b58b8ff-6vbxx                        6/6     Running     4 (2d ago)   2d
csi-rbdplugin-provisioner-567b58b8ff-rp7tv                        6/6     Running     0            2d
csi-rbdplugin-tckq8                                               3/3     Running     1 (2d ago)   2d
csi-rbdplugin-vk458                                               3/3     Running     1 (2d ago)   2d
noobaa-core-0                                                     1/1     Running     0            2d
noobaa-db-pg-0                                                    1/1     Running     0            2d
noobaa-operator-65b7c5fcbd-qx6nt                                  2/2     Running     0            44h
ocs-metrics-exporter-65d789b85f-9mb6d                             1/1     Running     0            2d
ocs-operator-67dfc4b997-75876                                     1/1     Running     0            2d
odf-console-bb57b6f6-jkwrj                                        1/1     Running     0            2d
odf-operator-controller-manager-7bdbc5c7fd-76tkc                  2/2     Running     0            2d
rook-ceph-crashcollector-worker-0-7447cfc595-rfxbl                1/1     Running     0            2d
rook-ceph-crashcollector-worker-1-7946896c88-k5vzn                1/1     Running     0            2d
rook-ceph-crashcollector-worker-2-6d7b7d78f7-kdx75                1/1     Running     0            2d
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-6698c6f4ncvdr   2/2     Running     0            2d
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-59fd874dfh64v   2/2     Running     0            2d
rook-ceph-mgr-a-969f78995-zmg4l                                   3/3     Running     0            2d
rook-ceph-mgr-b-54c8b967f6-cfc95                                  3/3     Running     0            2d
rook-ceph-mon-a-f7dbd6cc-xnrwd                                    2/2     Running     0            2d
rook-ceph-mon-b-79b77d47bf-hck2f                                  2/2     Running     0            2d
rook-ceph-mon-c-6ccbfcbf7-dk4z4                                   2/2     Running     0            2d
rook-ceph-operator-7c45cd9474-p5mfp                               1/1     Running     0            2d
rook-ceph-osd-0-5c497554c4-tb9ks                                  2/2     Running     0            2d
rook-ceph-osd-1-6857ccd444-ln4vb                                  2/2     Running     0            2d
rook-ceph-osd-2-5cc5c6779c-l27vs                                  2/2     Running     0            2d
rook-ceph-osd-prepare-4ae47a7430335c087c9140b4de7e3ba9-5hfs8      0/1     Completed   0            2d
rook-ceph-osd-prepare-e7fded9a680cffaca41872ffa7197819-xnd5x      0/1     Completed   0            2d
rook-ceph-osd-prepare-f3bb1584f0bc543cb4524d67ded2ec19-ls9pt      0/1     Completed   0            2d
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-f96ccf8fzlv7   2/2     Running     0            2d
[root@nara4-2edb-bastion-0 ~]#
[root@nara4-2edb-bastion-0 ~]# oc get cephcluster
NAME                             DATADIRHOSTPATH   MONCOUNT   AGE   PHASE   MESSAGE                        HEALTH      EXTERNAL   FSID
ocs-storagecluster-cephcluster   /var/lib/rook     3          2d    Ready   Cluster created successfully   HEALTH_OK              8dab246a-923c-47b9-88c9-4b6c7935da57
[root@nara4-2edb-bastion-0 ~]#
[root@nara4-2edb-bastion-0 ~]# oc get noobaa
NAME     S3-ENDPOINTS   STS-ENDPOINTS   IMAGE                                                                                                            PHASE         AGE
noobaa                                  registry.redhat.io/odf4/mcg-core-rhel9@sha256:e84250acc66b169d54f64872df683033a74a71c5757808aff1a98e3fffc18a54   Configuring   2d
[root@nara4-2edb-bastion-0 ~]# oc get noobaa -o yaml
apiVersion: v1
items:
- apiVersion: noobaa.io/v1alpha1
  kind: NooBaa
  metadata:
    creationTimestamp: "2024-01-02T06:03:06Z"
    finalizers:
    - noobaa.io/graceful_finalizer
    generation: 1
    labels:
      app: noobaa
    name: noobaa
    namespace: openshift-storage
    ownerReferences:
    - apiVersion: ocs.openshift.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: StorageCluster
      name: ocs-storagecluster
      uid: c14f50a4-8f42-4c0a-b2a6-bbe8dbc95ea6
    resourceVersion: "2355451"
    uid: 815db159-46dd-4136-8636-f035e0139ed8
  spec:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: cluster.ocs.openshift.io/openshift-storage
              operator: Exists
    autoscaler:
      autoscalerType: hpav2
      prometheusNamespace: openshift-monitoring
    cleanupPolicy: {}
    coreResources:
      limits:
        cpu: 999m
        memory: 4Gi
      requests:
        cpu: 999m
        memory: 4Gi
    dbImage: registry.redhat.io/rhel8/postgresql-12@sha256:cd5b8cb243a0b233a08bdf807df7bc6192a18e1dc322789d6d2e064e9721d8f0
    dbResources:
      limits:
        cpu: 500m
        memory: 4Gi
      requests:
        cpu: 500m
        memory: 4Gi
    dbStorageClass: ocs-storagecluster-ceph-rbd
    dbType: postgres
    dbVolumeResources:
      requests:
        storage: 50Gi
    endpoints:
      maxCount: 2
      minCount: 1
      resources:
        limits:
          cpu: 999m
          memory: 2Gi
        requests:
          cpu: 999m
          memory: 2Gi
    image: registry.redhat.io/odf4/mcg-core-rhel9@sha256:e84250acc66b169d54f64872df683033a74a71c5757808aff1a98e3fffc18a54
    labels:
      monitoring: {}
    loadBalancerSourceSubnets: {}
    pvPoolDefaultStorageClass: ocs-storagecluster-ceph-rbd
    security:
      kms: {}
    tolerations:
    - effect: NoSchedule
      key: node.ocs.openshift.io/storage
      operator: Equal
      value: "true"
  status:
    accounts:
      admin:
        secretRef:
          name: noobaa-admin
          namespace: openshift-storage
    actualImage: registry.redhat.io/odf4/mcg-core-rhel9@sha256:e84250acc66b169d54f64872df683033a74a71c5757808aff1a98e3fffc18a54
    conditions:
    - lastHeartbeatTime: "2024-01-04T06:23:04Z"
      lastTransitionTime: "2024-01-02T06:03:06Z"
      message: 'cannot read admin account info, error: not anonymous method read_account'
      reason: TemporaryError
      status: "False"
      type: Available
    - lastHeartbeatTime: "2024-01-04T06:23:04Z"
      lastTransitionTime: "2024-01-02T06:03:06Z"
      message: 'cannot read admin account info, error: not anonymous method read_account'
      reason: TemporaryError
      status: "True"
      type: Progressing
    - lastHeartbeatTime: "2024-01-04T06:23:04Z"
      lastTransitionTime: "2024-01-02T06:03:06Z"
      message: 'cannot read admin account info, error: not anonymous method read_account'
      reason: TemporaryError
      status: "False"
      type: Degraded
    - lastHeartbeatTime: "2024-01-04T06:23:04Z"
      lastTransitionTime: "2024-01-02T06:03:06Z"
      message: 'cannot read admin account info, error: not anonymous method read_account'
      reason: TemporaryError
      status: "False"
      type: Upgradeable
    - lastHeartbeatTime: "2024-01-04T06:23:04Z"
      lastTransitionTime: "2024-01-02T06:03:07Z"
      status: k8s
      type: KMS-Type
    - lastHeartbeatTime: "2024-01-04T06:23:04Z"
      lastTransitionTime: "2024-01-02T06:03:08Z"
      status: Sync
      type: KMS-Status
    observedGeneration: 1
    phase: Configuring
    readme: "\n\n\tNooBaa operator is still working to reconcile this system.\n\tCheck
      out the system status.phase, status.conditions, and events with:\n\n\t\tkubectl
      -n openshift-storage describe noobaa\n\t\tkubectl -n openshift-storage get noobaa
      -o yaml\n\t\tkubectl -n openshift-storage get events --sort-by=metadata.creationTimestamp\n\n\tYou
      can wait for a specific condition with:\n\n\t\tkubectl -n openshift-storage
      wait noobaa/noobaa --for condition=available --timeout -1s\n\n\tNooBaa Core
      Version:     master-20230920\n\tNooBaa Operator Version: 5.15.0\n"
    services:
      serviceMgmt:
        externalDNS:
        - https://noobaa-mgmt-openshift-storage.apps.nara4-2edb.redhat.com:443
        internalDNS:
        - https://noobaa-mgmt.openshift-storage.svc:443
        internalIP:
        - https://172.30.220.62:443
        nodePorts:
        - https://10.20.187.252:0
        podPorts:
        - https://10.131.0.41:8443
      serviceS3:
        externalDNS:
        - https://s3-openshift-storage.apps.nara4-2edb.redhat.com:443
        internalDNS:
        - https://s3.openshift-storage.svc:443
        internalIP:
        - https://172.30.58.186:443
      serviceSts:
        externalDNS:
        - https://sts-openshift-storage.apps.nara4-2edb.redhat.com:443
        internalDNS:
        - https://sts.openshift-storage.svc:443
        internalIP:
        - https://172.30.142.245:443
    upgradePhase: NoUpgrade
kind: List
metadata:
  resourceVersion: ""
[root@nara4-2edb-bastion-0 ~]#
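
One thing worth noting in the StorageCluster dump above: the Available and Degraded conditions that carry the "context canceled" CephCluster error have heartbeats from 2024-01-02, while Progressing and ReconcileComplete were refreshed on 2024-01-04. A rough Python sketch for spotting such lagging conditions follows. The function name stale_condition_types and the one-hour threshold are assumptions made for illustration; the timestamps are copied from the dump.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

def stale_condition_types(conditions, max_lag=timedelta(hours=1)):
    """Return condition types whose heartbeat lags the newest heartbeat by more than max_lag."""
    beats = {
        c["type"]: datetime.strptime(c["lastHeartbeatTime"], TIME_FORMAT)
        for c in conditions
    }
    if not beats:
        return []
    newest = max(beats.values())
    return [ctype for ctype, ts in beats.items() if newest - ts > max_lag]

# lastHeartbeatTime values from the StorageCluster status in this report.
storagecluster_heartbeats = [
    {"type": "VersionMismatch", "lastHeartbeatTime": "2024-01-02T06:00:16Z"},
    {"type": "ReconcileComplete", "lastHeartbeatTime": "2024-01-04T06:22:38Z"},
    {"type": "Available", "lastHeartbeatTime": "2024-01-02T06:02:59Z"},
    {"type": "Progressing", "lastHeartbeatTime": "2024-01-04T06:22:38Z"},
    {"type": "Degraded", "lastHeartbeatTime": "2024-01-02T06:02:59Z"},
    {"type": "Upgradeable", "lastHeartbeatTime": "2024-01-02T06:03:54Z"},
]

print(stale_condition_types(storagecluster_heartbeats))
```

A lagging heartbeat does not prove a condition is wrong, but it suggests the message may describe an earlier state. Here that fits with oc get cephcluster reporting Ready and HEALTH_OK.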

Comment 8 Malay Kumar parida 2024-01-11 11:35:36 UTC

As per discussion with Naranyanaswami & the comment above, closing the BZ.

Comment 9 Sagi Hirshfeld 2024-01-16 12:22:21 UTC

Reopening due to a possible reproduction on ODF build 104. The findings are detailed in the follow-up comment.

Comment 11 avdhoot 2024-01-17 05:15:37 UTC

Initially, the ocs-operator pod was in the Error state on the live cluster mentioned above.

➜  clust2 oc get pods -n openshift-storage
NAME                                               READY   STATUS             RESTARTS        AGE
csi-addons-controller-manager-855544975d-pbc84     2/2     Running            88 (65m ago)    3d18h
csi-cephfsplugin-9fvdw                             2/2     Running            1 (3d19h ago)   3d19h
csi-cephfsplugin-provisioner-f486cc4c8-6gnwp       6/6     Running            2 (3d19h ago)   3d19h
csi-cephfsplugin-provisioner-f486cc4c8-q845x       6/6     Running            4 (3d19h ago)   3d19h
csi-cephfsplugin-sr4wm                             2/2     Running            1 (3d19h ago)   3d19h
csi-cephfsplugin-wf5dl                             2/2     Running            0               3d19h
csi-rbdplugin-7mmq4                                3/3     Running            1 (3d19h ago)   3d19h
csi-rbdplugin-kfmf7                                3/3     Running            0               3d19h
csi-rbdplugin-provisioner-84cd9d7bb7-556p2         6/6     Running            2 (3d19h ago)   3d19h
csi-rbdplugin-provisioner-84cd9d7bb7-sx7z5         6/6     Running            5 (3d19h ago)   3d19h
csi-rbdplugin-w5xfl                                3/3     Running            1 (3d19h ago)   3d19h
maintenance-agent-755ccdbb47-gsbvt                 0/1     CrashLoopBackOff   732 (61s ago)   3d15h
noobaa-core-0                                      1/1     Running            0               3d19h
noobaa-db-pg-0                                     1/1     Running            0               3d19h
noobaa-operator-568c8d7bdc-kwcxr                   2/2     Running            17 (83m ago)    3d19h
ocs-metrics-exporter-67846dc54b-qzgww              1/1     Running            0               3d19h
ocs-operator-bd767766f-svl5j                       0/1     Error              101 (47m ago)   3d19h
odf-console-5fdb76657d-h46t8                       1/1     Running            0               3d19h
odf-operator-controller-manager-689c57969b-62284   2/2     Running            78 (47m ago)    3d19h
rook-ceph-operator-55c564df6b-4xjbc                1/1     Running            0               3d19h
token-exchange-agent-6c4f658fcb-8zltf              1/1     Running            0
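
A listing like the one above can be scanned for problem pods with a small script. The sketch below is illustrative only: the function name flag_pods and the restart threshold of 10 are assumptions, and the sample rows are a subset copied from this listing. It assumes the default oc get pods column layout (NAME, READY, STATUS, RESTARTS, AGE).

```python
def flag_pods(table: str, restart_threshold: int = 10) -> list[tuple[str, str, int]]:
    """Return (name, status, restarts) for pods that look unhealthy."""
    flagged = []
    for line in table.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue  # skip incomplete rows
        name, ready, status = fields[0], fields[1], fields[2]
        restarts = int(fields[3])  # "88 (65m ago)" splits into "88", "(65m", "ago)"
        have, want = (int(n) for n in ready.split("/"))
        not_ready = status != "Completed" and have < want
        bad_status = status not in ("Running", "Completed")
        if bad_status or not_ready or restarts >= restart_threshold:
            flagged.append((name, status, restarts))
    return flagged

# Subset of the pod listing in this comment.
sample = """
NAME                                               READY   STATUS             RESTARTS        AGE
csi-addons-controller-manager-855544975d-pbc84     2/2     Running            88 (65m ago)    3d18h
maintenance-agent-755ccdbb47-gsbvt                 0/1     CrashLoopBackOff   732 (61s ago)   3d15h
noobaa-core-0                                      1/1     Running            0               3d19h
ocs-operator-bd767766f-svl5j                       0/1     Error              101 (47m ago)   3d19h
rook-ceph-operator-55c564df6b-4xjbc                1/1     Running            0               3d19h
"""

for pod in flag_pods(sample):
    print(pod)
```

On this subset the script flags the csi-addons controller (high restart count), the maintenance-agent pod (CrashLoopBackOff), and the ocs-operator pod (Error).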

Comment 12 Malay Kumar parida 2024-01-18 02:56:18 UTC

I tried to check whether the cluster is still there, but it is gone now. A few questions while I look at the must-gather:
. Is this always reproducible with the said build?
. Which platform is this on? Initially, the bug was reported from an IBM Power cluster.
. Naraynaswami reported that they hit the issue on build #104 but not with the latest builds. Can you try once with the latest build and let me know if it still happens?

Comment 14 Malay Kumar parida 2024-01-20 16:05:31 UTC

I see that the build at the above link has succeeded. I assume some intermittent problem with the builds between 99 and 104 was causing the issue.
