Description of problem (please be as detailed as possible and provide log snippets):
Ceph health goes into Warning state after patching the storagecluster for replica-1 in ODF 4.15 on an IBM Power cluster.

Version of all relevant components (if applicable):
OCP: 4.15
ODF: 4.15.0-120

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Is this issue reproducible?
Yes

Can this issue reproduce from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Create an OCP 4.15 cluster. Install the LSO 4.15 and ODF 4.15 operators.
2. Create a localvolume with one disk and create a storagesystem.
3. Update the localvolume with an additional disk, which creates 3 new PVs.
4. Patch the storagecluster using the following command:

oc patch storagecluster ocs-storagecluster -n openshift-storage --type json --patch '[{ "op": "replace", "path": "/spec/managedResources/cephNonResilientPools/enable", "value": true }]'

Actual results:
Ceph health is in Warning state.

Expected results:
Ceph health should be in OK state.

Additional info:
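For anyone reproducing this, a quick way to confirm the patch took effect and to check Ceph health afterwards (a minimal sketch; it assumes the rook-ceph toolbox is enabled, as on this cluster, and that the toolbox deployment carries the usual app=rook-ceph-tools label):

# Should print "true" once the patch has been applied
oc get storagecluster ocs-storagecluster -n openshift-storage \
  -o jsonpath='{.spec.managedResources.cephNonResilientPools.enable}{"\n"}'

# Check Ceph health from the toolbox pod
TOOLS_POD=$(oc get pod -n openshift-storage -l app=rook-ceph-tools -o name)
oc rsh -n openshift-storage $TOOLS_POD ceph health detail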
CSV:

[root@rdr-odfpool-bastion-0 ~]# oc get csv -A
NAMESPACE                              NAME                                          DISPLAY                       VERSION               REPLACES   PHASE
openshift-local-storage                local-storage-operator.v4.15.0-202311280332   Local Storage                 4.15.0-202311280332              Succeeded
openshift-operator-lifecycle-manager   packageserver                                 Package Server                0.0.1-snapshot                   Succeeded
openshift-storage                      mcg-operator.v4.15.0-120.stable               NooBaa Operator               4.15.0-120.stable                Succeeded
openshift-storage                      ocs-operator.v4.15.0-120.stable               OpenShift Container Storage   4.15.0-120.stable                Succeeded
openshift-storage                      odf-csi-addons-operator.v4.15.0-120.stable    CSI Addons                    4.15.0-120.stable                Succeeded
openshift-storage                      odf-operator.v4.15.0-120.stable               OpenShift Data Foundation     4.15.0-120.stable                Succeeded

Pods:

[root@rdr-odfpool-bastion-0 ~]# oc get pods
NAME                                                              READY   STATUS      RESTARTS       AGE
csi-addons-controller-manager-58d5498995-8wlsp                    2/2     Running     15 (10h ago)   5d22h
csi-cephfsplugin-92wts                                            2/2     Running     0              4d3h
csi-cephfsplugin-jnpdd                                            2/2     Running     0              4d3h
csi-cephfsplugin-provisioner-747587df87-bb86z                     6/6     Running     0              4d3h
csi-cephfsplugin-provisioner-747587df87-nkzns                     6/6     Running     0              4d3h
csi-cephfsplugin-w426w                                            2/2     Running     1 (4d3h ago)   4d3h
csi-rbdplugin-lxc7q                                               3/3     Running     0              4d2h
csi-rbdplugin-provisioner-7b7c74c7dd-fms6z                        6/6     Running     0              4d2h
csi-rbdplugin-provisioner-7b7c74c7dd-x8dbc                        6/6     Running     0              4d2h
csi-rbdplugin-rx8tb                                               3/3     Running     0              4d2h
csi-rbdplugin-z78mb                                               3/3     Running     0              4d2h
noobaa-core-0                                                     1/1     Running     0              4d3h
noobaa-db-pg-0                                                    1/1     Running     0              4d3h
noobaa-endpoint-6485c65647-n9btr                                  1/1     Running     0              4d3h
noobaa-operator-6d7d5b477-mn95n                                   2/2     Running     0              5d22h
ocs-metrics-exporter-67dc65cbcb-fp56t                             1/1     Running     0              4d3h
ocs-operator-5cb4f78cb6-l9t97                                     1/1     Running     13 (8h ago)    5d22h
odf-console-6b58b9fdd7-9stzt                                      1/1     Running     0              5d22h
odf-operator-controller-manager-7857965fbc-j2246                  2/2     Running     13 (8h ago)    5d22h
rook-ceph-crashcollector-worker-0-5f8d4944-46mj9                  1/1     Running     0              4d3h
rook-ceph-crashcollector-worker-1-6bbfd975f9-sklt6                1/1     Running     0              4d3h
rook-ceph-crashcollector-worker-2-5c4f9ddbfd-w6r7x                1/1     Running     0              4d3h
rook-ceph-exporter-worker-0-bdb959b6d-5nbqp                       1/1     Running     0              4d3h
rook-ceph-exporter-worker-1-576fd75979-d56wz                      1/1     Running     0              4d3h
rook-ceph-exporter-worker-2-d94f78766-t8cqs                       1/1     Running     0              4d3h
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-7db974dbj5bft   2/2     Running     0              4d3h
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-6bbcbb958vrf5   2/2     Running     0              4d3h
rook-ceph-mgr-a-6f77cff446-5dphm                                  3/3     Running     0              4d3h
rook-ceph-mgr-b-6cb75f8f57-ddvfr                                  3/3     Running     0              4d3h
rook-ceph-mon-a-b84b6c548-qdd7g                                   2/2     Running     0              4d3h
rook-ceph-mon-b-5694c6cb74-bfsng                                  2/2     Running     0              4d3h
rook-ceph-mon-c-57c6dc4b8c-4lbx9                                  2/2     Running     0              4d3h
rook-ceph-operator-8679b956f6-qnqcr                               1/1     Running     0              4d2h
rook-ceph-osd-0-d8b5d68b9-wcwcv                                   2/2     Running     0              4d3h
rook-ceph-osd-1-dc5f84dc5-n4rhk                                   2/2     Running     0              4d3h
rook-ceph-osd-2-597c7495b6-zn9xp                                  2/2     Running     0              4d3h
rook-ceph-osd-3-6fcbf6997b-thbmz                                  2/2     Running     0              4d2h
rook-ceph-osd-4-57c898cb94-mdmjx                                  2/2     Running     0              4d2h
rook-ceph-osd-5-84bdc44c49-g6s2c                                  2/2     Running     0              4d2h
rook-ceph-osd-prepare-3ab83856ecd22df4394b63644f3d0dae-4f8fv      0/1     Completed   0              4d3h
rook-ceph-osd-prepare-c1fc0f829c0fe6db017d50cbe304d1b7-qlzgc      0/1     Completed   0              4d3h
rook-ceph-osd-prepare-db2d67412944f23ce104c640e62a289a-f6rwd      0/1     Completed   0              4d3h
rook-ceph-osd-prepare-worker-0-data-0ppwll-x4t59                  0/1     Completed   0              4d2h
rook-ceph-osd-prepare-worker-1-data-0wt9rm-wvx4j                  0/1     Completed   0              4d2h
rook-ceph-osd-prepare-worker-2-data-047mm9-gvdh7                  0/1     Completed   0              4d2h
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-55d87b5jr9fj   2/2     Running     0              4d3h
rook-ceph-tools-746d95679-z84c2                                   1/1     Running     0              4d3h
ux-backend-server-b7f97d97b-xlfhd                                 2/2     Running     0              5d22h

PVC and PV:
-----------

[root@rdr-odfpool-bastion-0 ~]# oc get pvc
NAME                                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
db-noobaa-db-pg-0                        Bound    pvc-3c2d5675-0a5d-4283-aafb-d0c26dec1032   50Gi       RWO            ocs-storagecluster-ceph-rbd   4d3h
ocs-deviceset-localblock-0-data-07g2nk   Bound    local-pv-3efe86e9                          500Gi      RWO            localblock                    4d3h
ocs-deviceset-localblock-0-data-1s6js6   Bound    local-pv-da8faa7a                          500Gi      RWO            localblock                    4d3h
ocs-deviceset-localblock-0-data-2p9n4h   Bound    local-pv-603ff5c5                          500Gi      RWO            localblock                    4d3h
worker-0-data-0ppwll                     Bound    local-pv-9603076d                          500Gi      RWO            localblock                    4d2h
worker-1-data-0wt9rm                     Bound    local-pv-c03b4c4b                          500Gi      RWO            localblock                    4d2h
worker-2-data-047mm9                     Bound    local-pv-95556960                          500Gi      RWO            localblock                    4d2h

[root@rdr-odfpool-bastion-0 ~]# oc get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                                       STORAGECLASS                                REASON   AGE
local-pv-3efe86e9                          500Gi      RWO            Delete           Bound    openshift-storage/ocs-deviceset-localblock-0-data-07g2nk   localblock                                           5d8h
local-pv-603ff5c5                          500Gi      RWO            Delete           Bound    openshift-storage/ocs-deviceset-localblock-0-data-2p9n4h   localblock                                           5d8h
local-pv-95556960                          500Gi      RWO            Delete           Bound    openshift-storage/worker-2-data-047mm9                      localblock                                           4d2h
local-pv-9603076d                          500Gi      RWO            Delete           Bound    openshift-storage/worker-0-data-0ppwll                      localblock                                           4d2h
local-pv-c03b4c4b                          500Gi      RWO            Delete           Bound    openshift-storage/worker-1-data-0wt9rm                      localblock                                           4d2h
local-pv-da8faa7a                          500Gi      RWO            Delete           Bound    openshift-storage/ocs-deviceset-localblock-0-data-1s6js6   localblock                                           5d8h
pvc-3c2d5675-0a5d-4283-aafb-d0c26dec1032   50Gi       RWO            Delete           Bound    openshift-storage/db-noobaa-db-pg-0                         ocs-storagecluster-ceph-rbd                          4d3h
pvc-b804b41e-4671-4db2-992a-ae4b04ea7121   1Gi        RWO            Delete           Bound    test/non-resilient-rbd-pvc                                  ocs-storagecluster-ceph-non-resilient-rbd            3d7h

[root@rdr-odfpool-bastion-0 ~]# oc get sc
NAME                                        PROVISIONER                             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
localblock                                  kubernetes.io/no-provisioner            Delete          WaitForFirstConsumer   false                  5d8h
ocs-storagecluster-ceph-non-resilient-rbd   openshift-storage.rbd.csi.ceph.com      Delete          WaitForFirstConsumer   true                   4d2h
ocs-storagecluster-ceph-rbd                 openshift-storage.rbd.csi.ceph.com      Delete          Immediate              true                   4d3h
ocs-storagecluster-ceph-rgw                 openshift-storage.ceph.rook.io/bucket   Delete          Immediate              false                  4d3h
ocs-storagecluster-cephfs                   openshift-storage.cephfs.csi.ceph.com   Delete          Immediate              true                   4d3h
openshift-storage.noobaa.io                 openshift-storage.noobaa.io/obc         Delete          Immediate              false                  4d3h

Storagecluster yaml:

[root@rdr-odfpool-bastion-0 ~]# oc get storagecluster -o yaml
apiVersion: v1
items:
- apiVersion: ocs.openshift.io/v1
  kind: StorageCluster
  metadata:
    annotations:
      cluster.ocs.openshift.io/local-devices: "true"
      uninstall.ocs.openshift.io/cleanup-policy: delete
      uninstall.ocs.openshift.io/mode: graceful
    creationTimestamp: "2024-01-20T12:34:37Z"
    finalizers:
    - storagecluster.ocs.openshift.io
    generation: 4
    name: ocs-storagecluster
    namespace: openshift-storage
    ownerReferences:
    - apiVersion: odf.openshift.io/v1alpha1
      kind: StorageSystem
      name: ocs-storagecluster-storagesystem
      uid: 62f2cd5f-d4ac-4907-b275-4b26f5a7def0
    resourceVersion: "4611540"
    uid: ec252bc9-1608-41d1-93fe-c28d59774d5d
  spec:
    arbiter: {}
    enableCephTools: true
    encryption:
      kms: {}
    externalStorage: {}
    flexibleScaling: true
    managedResources:
      cephBlockPools: {}
      cephCluster: {}
      cephConfig: {}
      cephDashboard: {}
      cephFilesystems: {}
      cephNonResilientPools:
        enable: true
      cephObjectStoreUsers: {}
      cephObjectStores: {}
      cephRBDMirror:
        daemonCount: 1
      cephToolbox: {}
    mirroring: {}
    monDataDirHostPath: /var/lib/rook
    network:
      connections:
        encryption: {}
      multiClusterService: {}
    nodeTopologies: {}
    resourceProfile: balanced
    storageDeviceSets:
    - config: {}
      count: 3
      dataPVCTemplate:
        metadata: {}
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: "1"
          storageClassName: localblock
          volumeMode: Block
        status: {}
      name: ocs-deviceset-localblock
      placement: {}
      preparePlacement: {}
      replica: 1
      resources: {}
  status:
    conditions:
    - lastHeartbeatTime: "2024-01-20T12:34:39Z"
      lastTransitionTime: "2024-01-20T12:34:39Z"
      message: Version check successful
      reason: VersionMatched
      status: "False"
      type: VersionMismatch
    - lastHeartbeatTime: "2024-01-24T16:01:11Z"
      lastTransitionTime: "2024-01-24T10:59:45Z"
      message: Reconcile completed successfully
      reason: ReconcileCompleted
      status: "True"
      type: ReconcileComplete
    - lastHeartbeatTime: "2024-01-24T16:01:11Z"
      lastTransitionTime: "2024-01-20T12:39:37Z"
      message: Reconcile completed successfully
      reason: ReconcileCompleted
      status: "True"
      type: Available
    - lastHeartbeatTime: "2024-01-24T16:01:11Z"
      lastTransitionTime: "2024-01-20T13:47:42Z"
      message: Reconcile completed successfully
      reason: ReconcileCompleted
      status: "False"
      type: Progressing
    - lastHeartbeatTime: "2024-01-24T16:01:11Z"
      lastTransitionTime: "2024-01-20T12:39:37Z"
      message: Reconcile completed successfully
      reason: ReconcileCompleted
      status: "False"
      type: Degraded
    - lastHeartbeatTime: "2024-01-24T16:01:11Z"
      lastTransitionTime: "2024-01-20T13:47:42Z"
      message: Reconcile completed successfully
      reason: ReconcileCompleted
      status: "True"
      type: Upgradeable
    currentMonCount: 3
    failureDomain: host
    failureDomainKey: kubernetes.io/hostname
    failureDomainValues:
    - worker-0
    - worker-1
    - worker-2
    images:
      ceph:
        actualImage: registry.redhat.io/rhceph/rhceph-6-rhel9@sha256:9049ccf79a0e009682e30677f493b27263c2d9401958005de733a19506705775
        desiredImage: registry.redhat.io/rhceph/rhceph-6-rhel9@sha256:9049ccf79a0e009682e30677f493b27263c2d9401958005de733a19506705775
      noobaaCore:
        actualImage: registry.redhat.io/odf4/mcg-core-rhel9@sha256:41c509b225b92cdf088bda5a0fe538a8b2106a09713277158b71d2a5b9ae694f
        desiredImage: registry.redhat.io/odf4/mcg-core-rhel9@sha256:41c509b225b92cdf088bda5a0fe538a8b2106a09713277158b71d2a5b9ae694f
      noobaaDB:
        actualImage: registry.redhat.io/rhel9/postgresql-15@sha256:12afe2b0205a4aa24623f04d318d21f91393e4c70cf03a5f6720339e06d78293
        desiredImage: registry.redhat.io/rhel9/postgresql-15@sha256:12afe2b0205a4aa24623f04d318d21f91393e4c70cf03a5f6720339e06d78293
    kmsServerConnection: {}
    lastAppliedResourceProfile: balanced
    nodeTopologies:
      labels:
        kubernetes.io/hostname:
        - worker-0
        - worker-1
        - worker-2
    phase: Ready
    relatedObjects:
    - apiVersion: ceph.rook.io/v1
      kind: CephCluster
      name: ocs-storagecluster-cephcluster
      namespace: openshift-storage
      resourceVersion: "4611282"
      uid: 41f09d90-5f41-44f6-b361-6fac57336dd1
    - apiVersion: noobaa.io/v1alpha1
      kind: NooBaa
      name: noobaa
      namespace: openshift-storage
      resourceVersion: "4611533"
      uid: 18cb4ced-a6c5-4eca-8bf6-7a502b50d006
    version: 4.15.0
kind: List
metadata:
  resourceVersion: ""

Ceph health:

[root@rdr-odfpool-bastion-0 ~]# oc rsh rook-ceph-tools-746d95679-z84c2
sh-5.1$ ceph -s
  cluster:
    id:     b44987ec-bf85-4ce8-9fd1-98f94f0abb6b
    health: HEALTH_WARN
            3 pool(s) have no replicas configured

  services:
    mon: 3 daemons, quorum a,b,c (age 2h)
    mgr: a(active, since 3d), standbys: b
    mds: 1/1 daemons up, 1 hot standby
    osd: 6 osds: 6 up (since 3d), 6 in (since 3d)
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   15 pools, 188 pgs
    objects: 568 objects, 405 MiB
    usage:   3.0 GiB used, 2.9 TiB / 2.9 TiB avail
    pgs:     188 active+clean

  io:
    client: 1.4 KiB/s rd, 6.0 KiB/s wr, 2 op/s rd, 0 op/s wr
The warning is about pools having no redundancy, which is expected with replica-1. So when enabling replica-1 we set a value to suppress this warning:

mon_warn_on_pool_no_redundancy = false

This works when replica-1 is enabled from the beginning, but it seems the value is not being applied when the storagecluster is patched afterwards to enable the replica-1 feature.
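To see which value the running mons are actually using, as opposed to what is written in the override ConfigMap, the option can be queried from the toolbox pod; a small sketch using standard Ceph commands:

# Value stored in the mon centralized config database (if set there at all)
ceph config get mon mon_warn_on_pool_no_redundancy

# Effective runtime value for a specific mon daemon; if the override only
# exists in ceph.conf, this stays at the default (true) until that mon restarts
ceph config show mon.a mon_warn_on_pool_no_redundancy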
With ODF build 4.15.0-143.stable I am still seeing the same warning message.

[root@rdr-rhcs-bastion-0 ~]# oc rsh rook-ceph-tools-55584dc469-z76fm
sh-5.1$ ceph df
--- RAW STORAGE ---
CLASS         SIZE    AVAIL     USED  RAW USED  %RAW USED
ssd        1.5 TiB  1.5 TiB  435 MiB   435 MiB       0.03
worker-0   500 GiB  500 GiB  103 MiB   103 MiB       0.02
worker-1   500 GiB  500 GiB   96 MiB    96 MiB       0.02
worker-2   500 GiB  500 GiB   75 MiB    75 MiB       0.01
TOTAL      2.9 TiB  2.9 TiB  710 MiB   710 MiB       0.02

--- POOLS ---
POOL                                                    ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
ocs-storagecluster-cephblockpool                         1   32  124 MiB       85  372 MiB   0.01    850 GiB
.mgr                                                     2    1  961 KiB        2  2.8 MiB      0    850 GiB
.rgw.root                                                3    8  5.8 KiB       16  180 KiB      0    850 GiB
ocs-storagecluster-cephobjectstore.rgw.buckets.index     4    8      0 B       11      0 B      0    850 GiB
ocs-storagecluster-cephobjectstore.rgw.log               5    8   25 KiB      308  1.9 MiB      0    850 GiB
ocs-storagecluster-cephobjectstore.rgw.buckets.non-ec    6    8      0 B        0      0 B      0    850 GiB
ocs-storagecluster-cephobjectstore.rgw.otp               7    8      0 B        0      0 B      0    850 GiB
ocs-storagecluster-cephobjectstore.rgw.meta              8    8  2.8 KiB       14  144 KiB      0    850 GiB
ocs-storagecluster-cephobjectstore.rgw.control           9    8      0 B        8      0 B      0    850 GiB
ocs-storagecluster-cephfilesystem-metadata              10   16  8.7 MiB       26   26 MiB      0    850 GiB
ocs-storagecluster-cephobjectstore.rgw.buckets.data     11   32    1 KiB        1   12 KiB      0    850 GiB
ocs-storagecluster-cephfilesystem-data0                 12   32      0 B        0      0 B      0    850 GiB
ocs-storagecluster-cephblockpool-worker-0               13    1     19 B        1    4 KiB      0    425 GiB
ocs-storagecluster-cephblockpool-worker-1               14    1     19 B        1    4 KiB      0    425 GiB
ocs-storagecluster-cephblockpool-worker-2               15    1     19 B        1    4 KiB      0    425 GiB

sh-5.1$ ceph osd pool ls
ocs-storagecluster-cephblockpool
.mgr
.rgw.root
ocs-storagecluster-cephobjectstore.rgw.buckets.index
ocs-storagecluster-cephobjectstore.rgw.log
ocs-storagecluster-cephobjectstore.rgw.buckets.non-ec
ocs-storagecluster-cephobjectstore.rgw.otp
ocs-storagecluster-cephobjectstore.rgw.meta
ocs-storagecluster-cephobjectstore.rgw.control
ocs-storagecluster-cephfilesystem-metadata
ocs-storagecluster-cephobjectstore.rgw.buckets.data
ocs-storagecluster-cephfilesystem-data0
ocs-storagecluster-cephblockpool-worker-0
ocs-storagecluster-cephblockpool-worker-1
ocs-storagecluster-cephblockpool-worker-2

sh-5.1$ ceph osd pool ls detail
pool 1 'ocs-storagecluster-cephblockpool' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 132 lfor 0/0/40 flags hashpspool,selfmanaged_snaps stripe_width 0 target_size_ratio 0.49 application rbd
pool 2 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 12 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr
pool 3 '.rgw.root' replicated size 3 min_size 2 crush_rule 10 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 134 flags hashpspool stripe_width 0 pg_num_min 8 application rook-ceph-rgw
pool 4 'ocs-storagecluster-cephobjectstore.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 13 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 133 flags hashpspool stripe_width 0 pg_num_min 8 application rook-ceph-rgw
pool 5 'ocs-storagecluster-cephobjectstore.rgw.log' replicated size 3 min_size 2 crush_rule 12 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 132 flags hashpspool stripe_width 0 pg_num_min 8 application rook-ceph-rgw
pool 6 'ocs-storagecluster-cephobjectstore.rgw.buckets.non-ec' replicated size 3 min_size 2 crush_rule 17 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 133 flags hashpspool stripe_width 0 pg_num_min 8 application rook-ceph-rgw
pool 7 'ocs-storagecluster-cephobjectstore.rgw.otp' replicated size 3 min_size 2 crush_rule 15 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 134 flags hashpspool stripe_width 0 pg_num_min 8 application rook-ceph-rgw
pool 8 'ocs-storagecluster-cephobjectstore.rgw.meta' replicated size 3 min_size 2 crush_rule 16 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 133 flags hashpspool stripe_width 0 pg_num_min 8 application rook-ceph-rgw
pool 9 'ocs-storagecluster-cephobjectstore.rgw.control' replicated size 3 min_size 2 crush_rule 14 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 133 flags hashpspool stripe_width 0 pg_num_min 8 application rook-ceph-rgw
pool 10 'ocs-storagecluster-cephfilesystem-metadata' replicated size 3 min_size 2 crush_rule 18 object_hash rjenkins pg_num 16 pgp_num 16 autoscale_mode on last_change 136 lfor 0/0/40 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs
pool 11 'ocs-storagecluster-cephobjectstore.rgw.buckets.data' replicated size 3 min_size 2 crush_rule 22 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 135 lfor 0/0/42 flags hashpspool stripe_width 0 target_size_ratio 0.49 application rook-ceph-rgw
pool 12 'ocs-storagecluster-cephfilesystem-data0' replicated size 3 min_size 2 crush_rule 21 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 137 lfor 0/0/42 flags hashpspool stripe_width 0 target_size_ratio 0.49 application cephfs
pool 13 'ocs-storagecluster-cephblockpool-worker-0' replicated size 1 min_size 1 crush_rule 24 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 188 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
pool 14 'ocs-storagecluster-cephblockpool-worker-1' replicated size 1 min_size 1 crush_rule 26 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 162 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
pool 15 'ocs-storagecluster-cephblockpool-worker-2' replicated size 1 min_size 1 crush_rule 28 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 172 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd

sh-5.1$ ceph osd tree
ID  CLASS     WEIGHT   TYPE NAME          STATUS  REWEIGHT  PRI-AFF
-1            2.92978  root default
-7            0.97659      host worker-0
 2  ssd       0.48830          osd.2          up   1.00000  1.00000
 3  worker-0  0.48830          osd.3          up   1.00000  1.00000
-3            0.97659      host worker-1
 0  ssd       0.48830          osd.0          up   1.00000  1.00000
 4  worker-1  0.48830          osd.4          up   1.00000  1.00000
-5            0.97659      host worker-2
 1  ssd       0.48830          osd.1          up   1.00000  1.00000
 5  worker-2  0.48830          osd.5          up   1.00000  1.00000

sh-5.1$ ceph health
HEALTH_WARN 3 pool(s) have no replicas configured

sh-5.1$ ceph -s
  cluster:
    id:     fc753d71-9791-43ff-9f43-67e1ef84c32c
    health: HEALTH_WARN
            3 pool(s) have no replicas configured

  services:
    mon: 3 daemons, quorum b,c,d (age 95m)
    mgr: a(active, since 6h), standbys: b
    mds: 1/1 daemons up, 1 hot standby
    osd: 6 osds: 6 up (since 12m), 6 in (since 13m)
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   15 pools, 172 pgs
    objects: 474 objects, 199 MiB
    usage:   716 MiB used, 2.9 TiB / 2.9 TiB avail
    pgs:     172 active+clean

  io:
    client: 1023 B/s rd, 1023 B/s wr, 1 op/s rd, 0 op/s wr

pods:

[root@rdr-rhcs-bastion-0 ~]# oc get pods
NAME                                                              READY   STATUS      RESTARTS        AGE
csi-addons-controller-manager-75c885b44c-pw2d7                    2/2     Running     0               4h43m
csi-cephfsplugin-7z8ww                                            2/2     Running     0               6h44m
csi-cephfsplugin-bzzj5                                            2/2     Running     1 (6h44m ago)   6h44m
csi-cephfsplugin-m6778                                            2/2     Running     0               85m
csi-cephfsplugin-provisioner-7bfdcdd855-n692v                     6/6     Running     0               4h43m
csi-cephfsplugin-provisioner-7bfdcdd855-v957v                     6/6     Running     0               6h44m
csi-rbdplugin-867wr                                               3/3     Running     0               3m48s
csi-rbdplugin-provisioner-d5c8c7cc4-pwqcg                         6/6     Running     0               3m48s
csi-rbdplugin-provisioner-d5c8c7cc4-wd4sw                         6/6     Running     0               3m48s
csi-rbdplugin-s7xlp                                               3/3     Running     0               3m42s
csi-rbdplugin-shqz6                                               3/3     Running     0               3m45s
noobaa-core-0                                                     1/1     Running     0               6h40m
noobaa-db-pg-0                                                    1/1     Running     0               6h41m
noobaa-endpoint-7b8bff5fd4-95ht9                                  1/1     Running     0               2m52s
noobaa-operator-6d5c65dc7d-c6szh                                  1/1     Running     0               7h28m
ocs-metrics-exporter-5d54875b4c-h2t9f                             1/1     Running     0               6h41m
ocs-operator-765f85d7fc-xw7d9                                     1/1     Running     0               7h28m
odf-console-66fff9846-ljs7j                                       1/1     Running     0               7h28m
odf-operator-controller-manager-86f9787f9-t4zmb                   2/2     Running     0               7h28m
rook-ceph-crashcollector-worker-0-77cff6b86c-c92gz                1/1     Running     0               6h42m
rook-ceph-crashcollector-worker-1-7559dc47dd-944dj                1/1     Running     0               6h42m
rook-ceph-crashcollector-worker-2-7d87957c55-l8fcp                1/1     Running     0               85m
rook-ceph-exporter-worker-0-544c48b7b8-tch9l                      1/1     Running     0               6h41m
rook-ceph-exporter-worker-1-79d446fcdc-clvgq                      1/1     Running     0               6h42m
rook-ceph-exporter-worker-2-66bb46f844-mxmz4                      1/1     Running     0               85m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-849755f49qxt6   2/2     Running     8 (4h19m ago)   6h42m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-5b887978klhls   2/2     Running     8 (4h19m ago)   4h43m
rook-ceph-mgr-a-7cd4fcb6cc-78pp2                                  3/3     Running     0               6h43m
rook-ceph-mgr-b-75cb574fcd-8kqfj                                  3/3     Running     0               4h43m
rook-ceph-mon-b-6fdcdd944d-tgvgr                                  2/2     Running     0               6h43m
rook-ceph-mon-c-58db847b4d-b5ltj                                  2/2     Running     0               6h43m
rook-ceph-mon-d-56f99d4b58-9vchj                                  2/2     Running     0               85m
rook-ceph-operator-64d86f55fc-b9sfk                               1/1     Running     0               3m56s
rook-ceph-osd-0-569b444b46-ht7w6                                  2/2     Running     0               6h42m
rook-ceph-osd-1-5b7f86586d-v76bz                                  2/2     Running     0               4h43m
rook-ceph-osd-2-5676fc5d55-6dmhj                                  2/2     Running     0               6h42m
rook-ceph-osd-3-5c68b4c658-n2t8h                                  2/2     Running     0               2m52s
rook-ceph-osd-4-78b4b4fd74-dktp8                                  2/2     Running     0               2m51s
rook-ceph-osd-5-77c7b58f8f-s9qmj                                  2/2     Running     0               2m50s
rook-ceph-osd-prepare-0721f34325c9b3d7c7ac6da4f641f80c-g7fmz      0/1     Completed   0               6h43m
rook-ceph-osd-prepare-d99b2443ee3d5b4d4acdb8773b4acc55-5b25p      0/1     Completed   0               6h43m
rook-ceph-osd-prepare-worker-0-data-0qrzq2-plxpj                  0/1     Completed   0               3m4s
rook-ceph-osd-prepare-worker-1-data-0tzdxh-cw6d8                  0/1     Completed   0               3m4s
rook-ceph-osd-prepare-worker-2-data-0gnv62-pg9vs                  0/1     Completed   0               3m3s
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-fbdf74d54w9t   2/2     Running     0               6h42m
rook-ceph-tools-55584dc469-z76fm                                  1/1     Running     0               4h43m
ux-backend-server-fc45c47-9dsd8                                   2/2     Running     0               7h28m

SC, PVC, PV, Cephblockpool:

[root@rdr-rhcs-bastion-0 ~]# oc get sc
NAME                                        PROVISIONER                             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
localblock                                  kubernetes.io/no-provisioner            Delete          WaitForFirstConsumer   false                  7h7m
ocs-storagecluster-ceph-non-resilient-rbd   openshift-storage.rbd.csi.ceph.com      Delete          WaitForFirstConsumer   true                   1m
ocs-storagecluster-ceph-rbd (default)       openshift-storage.rbd.csi.ceph.com      Delete          Immediate              true                   6h58m
ocs-storagecluster-ceph-rgw                 openshift-storage.ceph.rook.io/bucket   Delete          Immediate              false                  7h1m
ocs-storagecluster-cephfs                   openshift-storage.cephfs.csi.ceph.com   Delete          Immediate              true                   6h58m
openshift-storage.noobaa.io                 openshift-storage.noobaa.io/obc         Delete          Immediate              false                  6h56m

[root@rdr-rhcs-bastion-0 ~]# oc get pvc
NAME                                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
db-noobaa-db-pg-0                        Bound    pvc-f938ca4c-b062-4c91-adab-74fcfe91f7d0   50Gi       RWO            ocs-storagecluster-ceph-rbd   6h38m
ocs-deviceset-localblock-0-data-0tm2tx   Bound    local-pv-2eddb94                           500Gi      RWO            localblock                    6h40m
ocs-deviceset-localblock-0-data-1bwnz9   Bound    local-pv-191c364e                          500Gi      RWO            localblock                    6h40m
ocs-deviceset-localblock-0-data-24m7hl   Bound    local-pv-1ed0eb36                          500Gi      RWO            localblock                    6h40m
worker-0-data-0qrzq2                     Bound    local-pv-a3a590cd                          500Gi      RWO            localblock                    8s
worker-1-data-0tzdxh                     Bound    local-pv-a6199fcf                          500Gi      RWO            localblock                    8s
worker-2-data-0gnv62                     Bound    local-pv-e58bded7                          500Gi      RWO            localblock                    8s

[root@rdr-rhcs-bastion-0 ~]# oc get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                                       STORAGECLASS                  REASON   AGE
local-pv-191c364e                          500Gi      RWO            Delete           Bound    openshift-storage/ocs-deviceset-localblock-0-data-1bwnz9   localblock                             6h48m
local-pv-1ed0eb36                          500Gi      RWO            Delete           Bound    openshift-storage/ocs-deviceset-localblock-0-data-24m7hl   localblock                             6h48m
local-pv-2eddb94                           500Gi      RWO            Delete           Bound    openshift-storage/ocs-deviceset-localblock-0-data-0tm2tx   localblock                             6h48m
local-pv-a3a590cd                          500Gi      RWO            Delete           Bound    openshift-storage/worker-0-data-0qrzq2                      localblock                             3m16s
local-pv-a6199fcf                          500Gi      RWO            Delete           Bound    openshift-storage/worker-1-data-0tzdxh                      localblock                             3m16s
local-pv-e58bded7                          500Gi      RWO            Delete           Bound    openshift-storage/worker-2-data-0gnv62                      localblock                             3m16s
pvc-f938ca4c-b062-4c91-adab-74fcfe91f7d0   50Gi       RWO            Delete           Bound    openshift-storage/db-noobaa-db-pg-0                         ocs-storagecluster-ceph-rbd            6h38m

[root@rdr-rhcs-bastion-0 ~]# oc get cephblockpools
NAME                                        PHASE
ocs-storagecluster-cephblockpool            Ready
ocs-storagecluster-cephblockpool-worker-0   Ready
ocs-storagecluster-cephblockpool-worker-1   Ready
ocs-storagecluster-cephblockpool-worker-2   Ready
Can you please share the output of: oc get cm rook-config-override -o yaml
[root@rdr-rhcs-bastion-0 ~]# oc get cm rook-config-override -o yaml
apiVersion: v1
data:
  config: |
    [global]
    bdev_flock_retry = 20
    mon_osd_full_ratio = .85
    mon_osd_backfillfull_ratio = .8
    mon_osd_nearfull_ratio = .75
    mon_max_pg_per_osd = 600
    mon_pg_warn_max_object_skew = 0
    mon_data_avail_warn = 15
    bluestore_prefer_deferred_size_hdd = 0
    mon_warn_on_pool_no_redundancy = false
    [osd]
    osd_memory_target_cgroup_limit_ratio = 0.8
kind: ConfigMap
metadata:
  creationTimestamp: "2024-02-19T12:20:26Z"
  name: rook-config-override
  namespace: openshift-storage
  ownerReferences:
  - apiVersion: ocs.openshift.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: StorageCluster
    name: ocs-storagecluster
    uid: bb0c3885-edc0-480a-a3ee-c2ebe5d6fd6e
  resourceVersion: "3310331"
  uid: 38f2e840-4596-4e5d-9e24-001b16bfa29c
Hi Travis, as you can see, despite the mon_warn_on_pool_no_redundancy = false value being present in the CM, we are still seeing the warning about no redundancy. Can you please take a look?
The CM is only applied to the mons when the mons restart, and when the feature is enabled we obviously don't want to restart the mons just to apply this setting. I'd suggest we always suppress this warning for all ODF clusters, since we only expect users to use replica-1 pools when they have been configured properly through the non-resilient feature.
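Until a fix lands, one possible interim workaround (not the shipped fix, just standard Ceph administration) would be to set the option through the mon centralized config database from the toolbox pod, which takes effect on the running mons without a restart:

# Applies at runtime; no mon restart needed
ceph config set mon mon_warn_on_pool_no_redundancy false
ceph health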
Moving it to ASSIGNED state as it FAILED_QA.
Raised a follow-up fix as per https://bugzilla.redhat.com/show_bug.cgi?id=2260131#c12. This time I have tested it myself and can't see the health warning anymore, so it should be good.
Re-tested replica-1 with ODF build 4.15.0-150 and I can't see the warning in my cluster; Ceph health shows HEALTH_OK.

[root@rdr-replica-bastion-0 ~]# oc get pods -n openshift-storage | grep osd
rook-ceph-osd-0-69f99cbb47-2s95x                               2/2     Running     0          119m
rook-ceph-osd-1-897cf8687-r6qb9                                2/2     Running     0          119m
rook-ceph-osd-2-864ccff67-xp4qf                                2/2     Running     0          119m
rook-ceph-osd-3-9dfd577f7-vbj6g                                2/2     Running     0          35m
rook-ceph-osd-4-8667cdf5cb-m4fmb                               2/2     Running     0          35m
rook-ceph-osd-5-5c48784794-4kkrj                               2/2     Running     0          35m
rook-ceph-osd-prepare-1c083b3e5a996b47de7615107ffa6d71-k7pmj   0/1     Completed   0          119m
rook-ceph-osd-prepare-c04ab62c9f6cb3ea614ed610d70f056d-d9762   0/1     Completed   0          119m
rook-ceph-osd-prepare-cf050d3c7600d9718336045378c2c4fd-tnszq   0/1     Completed   0          119m
rook-ceph-osd-prepare-worker-0-data-07gxk5-6twbd               0/1     Completed   0          35m
rook-ceph-osd-prepare-worker-1-data-0r9qjx-h662d               0/1     Completed   0          35m
rook-ceph-osd-prepare-worker-2-data-044vhw-nq6pd               0/1     Completed   0          35m

[root@rdr-replica-bastion-0 ~]# oc get cephblockpools
NAME                                        PHASE
ocs-storagecluster-cephblockpool            Ready
ocs-storagecluster-cephblockpool-worker-0   Ready
ocs-storagecluster-cephblockpool-worker-1   Ready
ocs-storagecluster-cephblockpool-worker-2   Ready

[root@rdr-replica-bastion-0 ~]# oc rsh rook-ceph-tools-dbddf8896-jt9kn
sh-5.1$ ceph health
HEALTH_OK

sh-5.1$ ceph -s
  cluster:
    id:     af365cd2-27f2-49ea-a47f-8a185a4adc15
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 2h)
    mgr: a(active, since 2h), standbys: b
    mds: 1/1 daemons up, 1 hot standby
    osd: 6 osds: 6 up (since 37m), 6 in (since 38m)
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   15 pools, 172 pgs
    objects: 461 objects, 162 MiB
    usage:   472 MiB used, 2.9 TiB / 2.9 TiB avail
    pgs:     172 active+clean

  io:
    client: 1.2 KiB/s rd, 1.7 KiB/s wr, 2 op/s rd, 0 op/s wr

sh-5.1$ ceph osd pool ls
.mgr
ocs-storagecluster-cephblockpool
ocs-storagecluster-cephobjectstore.rgw.otp
ocs-storagecluster-cephobjectstore.rgw.buckets.index
.rgw.root
ocs-storagecluster-cephobjectstore.rgw.log
ocs-storagecluster-cephobjectstore.rgw.control
ocs-storagecluster-cephobjectstore.rgw.meta
ocs-storagecluster-cephobjectstore.rgw.buckets.non-ec
ocs-storagecluster-cephfilesystem-metadata
ocs-storagecluster-cephobjectstore.rgw.buckets.data
ocs-storagecluster-cephfilesystem-data0
ocs-storagecluster-cephblockpool-worker-0
ocs-storagecluster-cephblockpool-worker-1
ocs-storagecluster-cephblockpool-worker-2
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.15.0 security, enhancement, & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:1383