Bug 2260131 - Ceph health goes into Warning state after patching the StorageCluster for replica-1 in ODF 4.15 on IBM Power cluster
Summary: Ceph health goes into Warning state after patching the StorageCluster for replica-1 in ODF 4.15 on IBM Power cluster
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ocs-operator
Version: 4.15
Hardware: ppc64le
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ODF 4.15.0
Assignee: Malay Kumar parida
QA Contact: Aviad Polak
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2024-01-24 15:58 UTC by Aaruni Aggarwal
Modified: 2024-03-19 15:32 UTC
CC List: 4 users

Fixed In Version: 4.15.0-149
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-03-19 15:32:16 UTC
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github red-hat-storage ocs-operator pull 2425 0 None open Bug 2260131: [release-4.15] Refactor the ensureCreated func for rook-config-override configmap 2024-01-25 11:08:02 UTC
Github red-hat-storage ocs-operator pull 2473 0 None open Add mon_warn_on_pool_no_redundancy option to rook-config-override always 2024-02-21 08:04:13 UTC
Github red-hat-storage ocs-operator pull 2474 0 None open Bug 2260131: [release-4.15] Add mon_warn_on_pool_no_redundancy option to rook-config-override always 2024-02-21 09:28:34 UTC
Red Hat Product Errata RHSA-2024:1383 0 None None None 2024-03-19 15:32:20 UTC

Description Aaruni Aggarwal 2024-01-24 15:58:28 UTC
Description of problem (please be as detailed as possible and provide log snippets):

Ceph health goes into Warning state after patching the StorageCluster to enable replica-1 in ODF 4.15 on an IBM Power cluster.

Version of all relevant components (if applicable):

OCP: 4.15
ODF: 4.15.0-120

Does this issue impact your ability to continue to work with the product
(please explain in detail what the user impact is)?


Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Create an OCP 4.15 cluster. Install LSO 4.15 and ODF 4.15.
2. Create a LocalVolume with one disk and create a StorageSystem.
3. Update the LocalVolume with an additional disk, which creates 3 new PVs.
4. Patch the StorageCluster using the following command (see the verification sketch after it):

oc patch storagecluster ocs-storagecluster -n openshift-storage --type json --patch '[{ "op": "replace", "path": "/spec/managedResources/cephNonResilientPools/enable", "value": true }]'
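
A quick way to confirm the patch took effect, before looking at Ceph health, is to check for the replica-1 resources it creates (a minimal verification sketch; the resource names are the ones that appear in the outputs later in this report):

# The feature flag should now be true on the StorageCluster:
oc -n openshift-storage get storagecluster ocs-storagecluster -o jsonpath='{.spec.managedResources.cephNonResilientPools.enable}{"\n"}'

# One replica-1 CephBlockPool per failure domain (worker-0/1/2) should appear:
oc -n openshift-storage get cephblockpools

# Along with the non-resilient RBD StorageClass:
oc get sc ocs-storagecluster-ceph-non-resilient-rbd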


Actual results:
Ceph health goes to HEALTH_WARN with "3 pool(s) have no replicas configured".

Expected results:
Ceph health should be in OK state.

Additional info:

Comment 2 Aaruni Aggarwal 2024-01-24 16:02:05 UTC
CSV: 


[root@rdr-odfpool-bastion-0 ~]# oc get csv -A
NAMESPACE                              NAME                                          DISPLAY                       VERSION               REPLACES   PHASE
openshift-local-storage                local-storage-operator.v4.15.0-202311280332   Local Storage                 4.15.0-202311280332              Succeeded
openshift-operator-lifecycle-manager   packageserver                                 Package Server                0.0.1-snapshot                   Succeeded
openshift-storage                      mcg-operator.v4.15.0-120.stable               NooBaa Operator               4.15.0-120.stable                Succeeded
openshift-storage                      ocs-operator.v4.15.0-120.stable               OpenShift Container Storage   4.15.0-120.stable                Succeeded
openshift-storage                      odf-csi-addons-operator.v4.15.0-120.stable    CSI Addons                    4.15.0-120.stable                Succeeded
openshift-storage                      odf-operator.v4.15.0-120.stable               OpenShift Data Foundation     4.15.0-120.stable                Succeeded
[root@rdr-odfpool-bastion-0 ~]# 

Pods: 

[root@rdr-odfpool-bastion-0 ~]# oc get pods
NAME                                                              READY   STATUS      RESTARTS       AGE
csi-addons-controller-manager-58d5498995-8wlsp                    2/2     Running     15 (10h ago)   5d22h
csi-cephfsplugin-92wts                                            2/2     Running     0              4d3h
csi-cephfsplugin-jnpdd                                            2/2     Running     0              4d3h
csi-cephfsplugin-provisioner-747587df87-bb86z                     6/6     Running     0              4d3h
csi-cephfsplugin-provisioner-747587df87-nkzns                     6/6     Running     0              4d3h
csi-cephfsplugin-w426w                                            2/2     Running     1 (4d3h ago)   4d3h
csi-rbdplugin-lxc7q                                               3/3     Running     0              4d2h
csi-rbdplugin-provisioner-7b7c74c7dd-fms6z                        6/6     Running     0              4d2h
csi-rbdplugin-provisioner-7b7c74c7dd-x8dbc                        6/6     Running     0              4d2h
csi-rbdplugin-rx8tb                                               3/3     Running     0              4d2h
csi-rbdplugin-z78mb                                               3/3     Running     0              4d2h
noobaa-core-0                                                     1/1     Running     0              4d3h
noobaa-db-pg-0                                                    1/1     Running     0              4d3h
noobaa-endpoint-6485c65647-n9btr                                  1/1     Running     0              4d3h
noobaa-operator-6d7d5b477-mn95n                                   2/2     Running     0              5d22h
ocs-metrics-exporter-67dc65cbcb-fp56t                             1/1     Running     0              4d3h
ocs-operator-5cb4f78cb6-l9t97                                     1/1     Running     13 (8h ago)    5d22h
odf-console-6b58b9fdd7-9stzt                                      1/1     Running     0              5d22h
odf-operator-controller-manager-7857965fbc-j2246                  2/2     Running     13 (8h ago)    5d22h
rook-ceph-crashcollector-worker-0-5f8d4944-46mj9                  1/1     Running     0              4d3h
rook-ceph-crashcollector-worker-1-6bbfd975f9-sklt6                1/1     Running     0              4d3h
rook-ceph-crashcollector-worker-2-5c4f9ddbfd-w6r7x                1/1     Running     0              4d3h
rook-ceph-exporter-worker-0-bdb959b6d-5nbqp                       1/1     Running     0              4d3h
rook-ceph-exporter-worker-1-576fd75979-d56wz                      1/1     Running     0              4d3h
rook-ceph-exporter-worker-2-d94f78766-t8cqs                       1/1     Running     0              4d3h
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-7db974dbj5bft   2/2     Running     0              4d3h
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-6bbcbb958vrf5   2/2     Running     0              4d3h
rook-ceph-mgr-a-6f77cff446-5dphm                                  3/3     Running     0              4d3h
rook-ceph-mgr-b-6cb75f8f57-ddvfr                                  3/3     Running     0              4d3h
rook-ceph-mon-a-b84b6c548-qdd7g                                   2/2     Running     0              4d3h
rook-ceph-mon-b-5694c6cb74-bfsng                                  2/2     Running     0              4d3h
rook-ceph-mon-c-57c6dc4b8c-4lbx9                                  2/2     Running     0              4d3h
rook-ceph-operator-8679b956f6-qnqcr                               1/1     Running     0              4d2h
rook-ceph-osd-0-d8b5d68b9-wcwcv                                   2/2     Running     0              4d3h
rook-ceph-osd-1-dc5f84dc5-n4rhk                                   2/2     Running     0              4d3h
rook-ceph-osd-2-597c7495b6-zn9xp                                  2/2     Running     0              4d3h
rook-ceph-osd-3-6fcbf6997b-thbmz                                  2/2     Running     0              4d2h
rook-ceph-osd-4-57c898cb94-mdmjx                                  2/2     Running     0              4d2h
rook-ceph-osd-5-84bdc44c49-g6s2c                                  2/2     Running     0              4d2h
rook-ceph-osd-prepare-3ab83856ecd22df4394b63644f3d0dae-4f8fv      0/1     Completed   0              4d3h
rook-ceph-osd-prepare-c1fc0f829c0fe6db017d50cbe304d1b7-qlzgc      0/1     Completed   0              4d3h
rook-ceph-osd-prepare-db2d67412944f23ce104c640e62a289a-f6rwd      0/1     Completed   0              4d3h
rook-ceph-osd-prepare-worker-0-data-0ppwll-x4t59                  0/1     Completed   0              4d2h
rook-ceph-osd-prepare-worker-1-data-0wt9rm-wvx4j                  0/1     Completed   0              4d2h
rook-ceph-osd-prepare-worker-2-data-047mm9-gvdh7                  0/1     Completed   0              4d2h
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-55d87b5jr9fj   2/2     Running     0              4d3h
rook-ceph-tools-746d95679-z84c2                                   1/1     Running     0              4d3h
ux-backend-server-b7f97d97b-xlfhd                                 2/2     Running     0              5d22h
[root@rdr-odfpool-bastion-0 ~]# 


PVC and PV: 
--------

[root@rdr-odfpool-bastion-0 ~]# oc get pvc
NAME                                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
db-noobaa-db-pg-0                        Bound    pvc-3c2d5675-0a5d-4283-aafb-d0c26dec1032   50Gi       RWO            ocs-storagecluster-ceph-rbd   4d3h
ocs-deviceset-localblock-0-data-07g2nk   Bound    local-pv-3efe86e9                          500Gi      RWO            localblock                    4d3h
ocs-deviceset-localblock-0-data-1s6js6   Bound    local-pv-da8faa7a                          500Gi      RWO            localblock                    4d3h
ocs-deviceset-localblock-0-data-2p9n4h   Bound    local-pv-603ff5c5                          500Gi      RWO            localblock                    4d3h
worker-0-data-0ppwll                     Bound    local-pv-9603076d                          500Gi      RWO            localblock                    4d2h
worker-1-data-0wt9rm                     Bound    local-pv-c03b4c4b                          500Gi      RWO            localblock                    4d2h
worker-2-data-047mm9                     Bound    local-pv-95556960                          500Gi      RWO            localblock                    4d2h

[root@rdr-odfpool-bastion-0 ~]# oc get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                                      STORAGECLASS                                REASON   AGE
local-pv-3efe86e9                          500Gi      RWO            Delete           Bound    openshift-storage/ocs-deviceset-localblock-0-data-07g2nk   localblock                                           5d8h
local-pv-603ff5c5                          500Gi      RWO            Delete           Bound    openshift-storage/ocs-deviceset-localblock-0-data-2p9n4h   localblock                                           5d8h
local-pv-95556960                          500Gi      RWO            Delete           Bound    openshift-storage/worker-2-data-047mm9                     localblock                                           4d2h
local-pv-9603076d                          500Gi      RWO            Delete           Bound    openshift-storage/worker-0-data-0ppwll                     localblock                                           4d2h
local-pv-c03b4c4b                          500Gi      RWO            Delete           Bound    openshift-storage/worker-1-data-0wt9rm                     localblock                                           4d2h
local-pv-da8faa7a                          500Gi      RWO            Delete           Bound    openshift-storage/ocs-deviceset-localblock-0-data-1s6js6   localblock                                           5d8h
pvc-3c2d5675-0a5d-4283-aafb-d0c26dec1032   50Gi       RWO            Delete           Bound    openshift-storage/db-noobaa-db-pg-0                        ocs-storagecluster-ceph-rbd                          4d3h
pvc-b804b41e-4671-4db2-992a-ae4b04ea7121   1Gi        RWO            Delete           Bound    test/non-resilient-rbd-pvc                                 ocs-storagecluster-ceph-non-resilient-rbd            3d7h
 
[root@rdr-odfpool-bastion-0 ~]# oc get sc
NAME                                        PROVISIONER                             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
localblock                                  kubernetes.io/no-provisioner            Delete          WaitForFirstConsumer   false                  5d8h
ocs-storagecluster-ceph-non-resilient-rbd   openshift-storage.rbd.csi.ceph.com      Delete          WaitForFirstConsumer   true                   4d2h
ocs-storagecluster-ceph-rbd                 openshift-storage.rbd.csi.ceph.com      Delete          Immediate              true                   4d3h
ocs-storagecluster-ceph-rgw                 openshift-storage.ceph.rook.io/bucket   Delete          Immediate              false                  4d3h
ocs-storagecluster-cephfs                   openshift-storage.cephfs.csi.ceph.com   Delete          Immediate              true                   4d3h
openshift-storage.noobaa.io                 openshift-storage.noobaa.io/obc         Delete          Immediate              false                  4d3h


Storagecluster yaml: 

[root@rdr-odfpool-bastion-0 ~]# oc get storagecluster -o yaml
apiVersion: v1
items:
- apiVersion: ocs.openshift.io/v1
  kind: StorageCluster
  metadata:
    annotations:
      cluster.ocs.openshift.io/local-devices: "true"
      uninstall.ocs.openshift.io/cleanup-policy: delete
      uninstall.ocs.openshift.io/mode: graceful
    creationTimestamp: "2024-01-20T12:34:37Z"
    finalizers:
    - storagecluster.ocs.openshift.io
    generation: 4
    name: ocs-storagecluster
    namespace: openshift-storage
    ownerReferences:
    - apiVersion: odf.openshift.io/v1alpha1
      kind: StorageSystem
      name: ocs-storagecluster-storagesystem
      uid: 62f2cd5f-d4ac-4907-b275-4b26f5a7def0
    resourceVersion: "4611540"
    uid: ec252bc9-1608-41d1-93fe-c28d59774d5d
  spec:
    arbiter: {}
    enableCephTools: true
    encryption:
      kms: {}
    externalStorage: {}
    flexibleScaling: true
    managedResources:
      cephBlockPools: {}
      cephCluster: {}
      cephConfig: {}
      cephDashboard: {}
      cephFilesystems: {}
      cephNonResilientPools:
        enable: true
      cephObjectStoreUsers: {}
      cephObjectStores: {}
      cephRBDMirror:
        daemonCount: 1
      cephToolbox: {}
    mirroring: {}
    monDataDirHostPath: /var/lib/rook
    network:
      connections:
        encryption: {}
      multiClusterService: {}
    nodeTopologies: {}
    resourceProfile: balanced
    storageDeviceSets:
    - config: {}
      count: 3
      dataPVCTemplate:
        metadata: {}
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: "1"
          storageClassName: localblock
          volumeMode: Block
        status: {}
      name: ocs-deviceset-localblock
      placement: {}
      preparePlacement: {}
      replica: 1
      resources: {}
  status:
    conditions:
    - lastHeartbeatTime: "2024-01-20T12:34:39Z"
      lastTransitionTime: "2024-01-20T12:34:39Z"
      message: Version check successful
      reason: VersionMatched
      status: "False"
      type: VersionMismatch
    - lastHeartbeatTime: "2024-01-24T16:01:11Z"
      lastTransitionTime: "2024-01-24T10:59:45Z"
      message: Reconcile completed successfully
      reason: ReconcileCompleted
      status: "True"
      type: ReconcileComplete
    - lastHeartbeatTime: "2024-01-24T16:01:11Z"
      lastTransitionTime: "2024-01-20T12:39:37Z"
      message: Reconcile completed successfully
      reason: ReconcileCompleted
      status: "True"
      type: Available
    - lastHeartbeatTime: "2024-01-24T16:01:11Z"
      lastTransitionTime: "2024-01-20T13:47:42Z"
      message: Reconcile completed successfully
      reason: ReconcileCompleted
      status: "False"
      type: Progressing
    - lastHeartbeatTime: "2024-01-24T16:01:11Z"
      lastTransitionTime: "2024-01-20T12:39:37Z"
      message: Reconcile completed successfully
      reason: ReconcileCompleted
      status: "False"
      type: Degraded
    - lastHeartbeatTime: "2024-01-24T16:01:11Z"
      lastTransitionTime: "2024-01-20T13:47:42Z"
      message: Reconcile completed successfully
      reason: ReconcileCompleted
      status: "True"
      type: Upgradeable
    currentMonCount: 3
    failureDomain: host
    failureDomainKey: kubernetes.io/hostname
    failureDomainValues:
    - worker-0
    - worker-1
    - worker-2
    images:
      ceph:
        actualImage: registry.redhat.io/rhceph/rhceph-6-rhel9@sha256:9049ccf79a0e009682e30677f493b27263c2d9401958005de733a19506705775
        desiredImage: registry.redhat.io/rhceph/rhceph-6-rhel9@sha256:9049ccf79a0e009682e30677f493b27263c2d9401958005de733a19506705775
      noobaaCore:
        actualImage: registry.redhat.io/odf4/mcg-core-rhel9@sha256:41c509b225b92cdf088bda5a0fe538a8b2106a09713277158b71d2a5b9ae694f
        desiredImage: registry.redhat.io/odf4/mcg-core-rhel9@sha256:41c509b225b92cdf088bda5a0fe538a8b2106a09713277158b71d2a5b9ae694f
      noobaaDB:
        actualImage: registry.redhat.io/rhel9/postgresql-15@sha256:12afe2b0205a4aa24623f04d318d21f91393e4c70cf03a5f6720339e06d78293
        desiredImage: registry.redhat.io/rhel9/postgresql-15@sha256:12afe2b0205a4aa24623f04d318d21f91393e4c70cf03a5f6720339e06d78293
    kmsServerConnection: {}
    lastAppliedResourceProfile: balanced
    nodeTopologies:
      labels:
        kubernetes.io/hostname:
        - worker-0
        - worker-1
        - worker-2
    phase: Ready
    relatedObjects:
    - apiVersion: ceph.rook.io/v1
      kind: CephCluster
      name: ocs-storagecluster-cephcluster
      namespace: openshift-storage
      resourceVersion: "4611282"
      uid: 41f09d90-5f41-44f6-b361-6fac57336dd1
    - apiVersion: noobaa.io/v1alpha1
      kind: NooBaa
      name: noobaa
      namespace: openshift-storage
      resourceVersion: "4611533"
      uid: 18cb4ced-a6c5-4eca-8bf6-7a502b50d006
    version: 4.15.0
kind: List
metadata:
  resourceVersion: ""


Ceph health:

[root@rdr-odfpool-bastion-0 ~]# oc rsh rook-ceph-tools-746d95679-z84c2
sh-5.1$ 
sh-5.1$ ceph -s
  cluster:
    id:     b44987ec-bf85-4ce8-9fd1-98f94f0abb6b
    health: HEALTH_WARN
            3 pool(s) have no replicas configured
 
  services:
    mon: 3 daemons, quorum a,b,c (age 2h)
    mgr: a(active, since 3d), standbys: b
    mds: 1/1 daemons up, 1 hot standby
    osd: 6 osds: 6 up (since 3d), 6 in (since 3d)
    rgw: 1 daemon active (1 hosts, 1 zones)
 
  data:
    volumes: 1/1 healthy
    pools:   15 pools, 188 pgs
    objects: 568 objects, 405 MiB
    usage:   3.0 GiB used, 2.9 TiB / 2.9 TiB avail
    pgs:     188 active+clean
 
  io:
    client:   1.4 KiB/s rd, 6.0 KiB/s wr, 2 op/s rd, 0 op/s wr
 
sh-5.1$

Comment 3 Malay Kumar parida 2024-01-25 05:28:21 UTC
The warning is about pools having no redundancy (which is expected with replica-1). So when replica-1 is enabled, we set mon_warn_on_pool_no_redundancy = false to suppress this warning.

It works when replica-1 is enabled from the beginning, but it seems the value is not being applied properly when the StorageCluster is patched afterwards to enable the replica-1 feature.
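
To check whether the option actually reaches the mons after patching, the rook-config-override ConfigMap can be compared with what a mon is actually running with (a minimal check sketch; run the ceph commands inside the rook-ceph-tools pod, and note that the mon ID used here, mon.a, will differ per cluster):

# The override that ocs-operator writes for rook to merge into ceph.conf:
oc -n openshift-storage get cm rook-config-override -o yaml

# The value a running mon daemon reports, and the current health detail:
ceph config show mon.a mon_warn_on_pool_no_redundancy
ceph health detail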

Comment 8 Aaruni Aggarwal 2024-02-19 19:41:40 UTC
With ODF build 4.15.0-143.stable, I am still seeing the same warning message.

[root@rdr-rhcs-bastion-0 ~]# oc rsh rook-ceph-tools-55584dc469-z76fm

sh-5.1$ ceph df
--- RAW STORAGE ---
CLASS        SIZE    AVAIL     USED  RAW USED  %RAW USED
ssd       1.5 TiB  1.5 TiB  435 MiB   435 MiB       0.03
worker-0  500 GiB  500 GiB  103 MiB   103 MiB       0.02
worker-1  500 GiB  500 GiB   96 MiB    96 MiB       0.02
worker-2  500 GiB  500 GiB   75 MiB    75 MiB       0.01
TOTAL     2.9 TiB  2.9 TiB  710 MiB   710 MiB       0.02

--- POOLS ---
POOL                                                   ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
ocs-storagecluster-cephblockpool                        1   32  124 MiB       85  372 MiB   0.01    850 GiB
.mgr                                                    2    1  961 KiB        2  2.8 MiB      0    850 GiB
.rgw.root                                               3    8  5.8 KiB       16  180 KiB      0    850 GiB
ocs-storagecluster-cephobjectstore.rgw.buckets.index    4    8      0 B       11      0 B      0    850 GiB
ocs-storagecluster-cephobjectstore.rgw.log              5    8   25 KiB      308  1.9 MiB      0    850 GiB
ocs-storagecluster-cephobjectstore.rgw.buckets.non-ec   6    8      0 B        0      0 B      0    850 GiB
ocs-storagecluster-cephobjectstore.rgw.otp              7    8      0 B        0      0 B      0    850 GiB
ocs-storagecluster-cephobjectstore.rgw.meta             8    8  2.8 KiB       14  144 KiB      0    850 GiB
ocs-storagecluster-cephobjectstore.rgw.control          9    8      0 B        8      0 B      0    850 GiB
ocs-storagecluster-cephfilesystem-metadata             10   16  8.7 MiB       26   26 MiB      0    850 GiB
ocs-storagecluster-cephobjectstore.rgw.buckets.data    11   32    1 KiB        1   12 KiB      0    850 GiB
ocs-storagecluster-cephfilesystem-data0                12   32      0 B        0      0 B      0    850 GiB
ocs-storagecluster-cephblockpool-worker-0              13    1     19 B        1    4 KiB      0    425 GiB
ocs-storagecluster-cephblockpool-worker-1              14    1     19 B        1    4 KiB      0    425 GiB
ocs-storagecluster-cephblockpool-worker-2              15    1     19 B        1    4 KiB      0    425 GiB
sh-5.1$
sh-5.1$
sh-5.1$ ceph osd pool ls
ocs-storagecluster-cephblockpool
.mgr
.rgw.root
ocs-storagecluster-cephobjectstore.rgw.buckets.index
ocs-storagecluster-cephobjectstore.rgw.log
ocs-storagecluster-cephobjectstore.rgw.buckets.non-ec
ocs-storagecluster-cephobjectstore.rgw.otp
ocs-storagecluster-cephobjectstore.rgw.meta
ocs-storagecluster-cephobjectstore.rgw.control
ocs-storagecluster-cephfilesystem-metadata
ocs-storagecluster-cephobjectstore.rgw.buckets.data
ocs-storagecluster-cephfilesystem-data0
ocs-storagecluster-cephblockpool-worker-0
ocs-storagecluster-cephblockpool-worker-1
ocs-storagecluster-cephblockpool-worker-2
sh-5.1$
sh-5.1$ ceph osd pool ls detail
pool 1 'ocs-storagecluster-cephblockpool' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 132 lfor 0/0/40 flags hashpspool,selfmanaged_snaps stripe_width 0 target_size_ratio 0.49 application rbd
pool 2 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 12 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr
pool 3 '.rgw.root' replicated size 3 min_size 2 crush_rule 10 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 134 flags hashpspool stripe_width 0 pg_num_min 8 application rook-ceph-rgw
pool 4 'ocs-storagecluster-cephobjectstore.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 13 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 133 flags hashpspool stripe_width 0 pg_num_min 8 application rook-ceph-rgw
pool 5 'ocs-storagecluster-cephobjectstore.rgw.log' replicated size 3 min_size 2 crush_rule 12 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 132 flags hashpspool stripe_width 0 pg_num_min 8 application rook-ceph-rgw
pool 6 'ocs-storagecluster-cephobjectstore.rgw.buckets.non-ec' replicated size 3 min_size 2 crush_rule 17 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 133 flags hashpspool stripe_width 0 pg_num_min 8 application rook-ceph-rgw
pool 7 'ocs-storagecluster-cephobjectstore.rgw.otp' replicated size 3 min_size 2 crush_rule 15 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 134 flags hashpspool stripe_width 0 pg_num_min 8 application rook-ceph-rgw
pool 8 'ocs-storagecluster-cephobjectstore.rgw.meta' replicated size 3 min_size 2 crush_rule 16 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 133 flags hashpspool stripe_width 0 pg_num_min 8 application rook-ceph-rgw
pool 9 'ocs-storagecluster-cephobjectstore.rgw.control' replicated size 3 min_size 2 crush_rule 14 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 133 flags hashpspool stripe_width 0 pg_num_min 8 application rook-ceph-rgw
pool 10 'ocs-storagecluster-cephfilesystem-metadata' replicated size 3 min_size 2 crush_rule 18 object_hash rjenkins pg_num 16 pgp_num 16 autoscale_mode on last_change 136 lfor 0/0/40 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs
pool 11 'ocs-storagecluster-cephobjectstore.rgw.buckets.data' replicated size 3 min_size 2 crush_rule 22 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 135 lfor 0/0/42 flags hashpspool stripe_width 0 target_size_ratio 0.49 application rook-ceph-rgw
pool 12 'ocs-storagecluster-cephfilesystem-data0' replicated size 3 min_size 2 crush_rule 21 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 137 lfor 0/0/42 flags hashpspool stripe_width 0 target_size_ratio 0.49 application cephfs
pool 13 'ocs-storagecluster-cephblockpool-worker-0' replicated size 1 min_size 1 crush_rule 24 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 188 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
pool 14 'ocs-storagecluster-cephblockpool-worker-1' replicated size 1 min_size 1 crush_rule 26 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 162 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
pool 15 'ocs-storagecluster-cephblockpool-worker-2' replicated size 1 min_size 1 crush_rule 28 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 172 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd

sh-5.1$
sh-5.1$
sh-5.1$ ceph osd tree
ID  CLASS     WEIGHT   TYPE NAME          STATUS  REWEIGHT  PRI-AFF
-1            2.92978  root default
-7            0.97659      host worker-0
 2       ssd  0.48830          osd.2          up   1.00000  1.00000
 3  worker-0  0.48830          osd.3          up   1.00000  1.00000
-3            0.97659      host worker-1
 0       ssd  0.48830          osd.0          up   1.00000  1.00000
 4  worker-1  0.48830          osd.4          up   1.00000  1.00000
-5            0.97659      host worker-2
 1       ssd  0.48830          osd.1          up   1.00000  1.00000
 5  worker-2  0.48830          osd.5          up   1.00000  1.00000
sh-5.1$
sh-5.1$
sh-5.1$ ceph health
HEALTH_WARN 3 pool(s) have no replicas configured
sh-5.1$
sh-5.1$ ceph -s
  cluster:
    id:     fc753d71-9791-43ff-9f43-67e1ef84c32c
    health: HEALTH_WARN
            3 pool(s) have no replicas configured

  services:
    mon: 3 daemons, quorum b,c,d (age 95m)
    mgr: a(active, since 6h), standbys: b
    mds: 1/1 daemons up, 1 hot standby
    osd: 6 osds: 6 up (since 12m), 6 in (since 13m)
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   15 pools, 172 pgs
    objects: 474 objects, 199 MiB
    usage:   716 MiB used, 2.9 TiB / 2.9 TiB avail
    pgs:     172 active+clean

  io:
    client:   1023 B/s rd, 1023 B/s wr, 1 op/s rd, 0 op/s wr

sh-5.1$


pods: 

[root@rdr-rhcs-bastion-0 ~]# oc get pods
NAME                                                              READY   STATUS      RESTARTS        AGE
csi-addons-controller-manager-75c885b44c-pw2d7                    2/2     Running     0               4h43m
csi-cephfsplugin-7z8ww                                            2/2     Running     0               6h44m
csi-cephfsplugin-bzzj5                                            2/2     Running     1 (6h44m ago)   6h44m
csi-cephfsplugin-m6778                                            2/2     Running     0               85m
csi-cephfsplugin-provisioner-7bfdcdd855-n692v                     6/6     Running     0               4h43m
csi-cephfsplugin-provisioner-7bfdcdd855-v957v                     6/6     Running     0               6h44m
csi-rbdplugin-867wr                                               3/3     Running     0               3m48s
csi-rbdplugin-provisioner-d5c8c7cc4-pwqcg                         6/6     Running     0               3m48s
csi-rbdplugin-provisioner-d5c8c7cc4-wd4sw                         6/6     Running     0               3m48s
csi-rbdplugin-s7xlp                                               3/3     Running     0               3m42s
csi-rbdplugin-shqz6                                               3/3     Running     0               3m45s
noobaa-core-0                                                     1/1     Running     0               6h40m
noobaa-db-pg-0                                                    1/1     Running     0               6h41m
noobaa-endpoint-7b8bff5fd4-95ht9                                  1/1     Running     0               2m52s
noobaa-operator-6d5c65dc7d-c6szh                                  1/1     Running     0               7h28m
ocs-metrics-exporter-5d54875b4c-h2t9f                             1/1     Running     0               6h41m
ocs-operator-765f85d7fc-xw7d9                                     1/1     Running     0               7h28m
odf-console-66fff9846-ljs7j                                       1/1     Running     0               7h28m
odf-operator-controller-manager-86f9787f9-t4zmb                   2/2     Running     0               7h28m
rook-ceph-crashcollector-worker-0-77cff6b86c-c92gz                1/1     Running     0               6h42m
rook-ceph-crashcollector-worker-1-7559dc47dd-944dj                1/1     Running     0               6h42m
rook-ceph-crashcollector-worker-2-7d87957c55-l8fcp                1/1     Running     0               85m
rook-ceph-exporter-worker-0-544c48b7b8-tch9l                      1/1     Running     0               6h41m
rook-ceph-exporter-worker-1-79d446fcdc-clvgq                      1/1     Running     0               6h42m
rook-ceph-exporter-worker-2-66bb46f844-mxmz4                      1/1     Running     0               85m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-849755f49qxt6   2/2     Running     8 (4h19m ago)   6h42m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-5b887978klhls   2/2     Running     8 (4h19m ago)   4h43m
rook-ceph-mgr-a-7cd4fcb6cc-78pp2                                  3/3     Running     0               6h43m
rook-ceph-mgr-b-75cb574fcd-8kqfj                                  3/3     Running     0               4h43m
rook-ceph-mon-b-6fdcdd944d-tgvgr                                  2/2     Running     0               6h43m
rook-ceph-mon-c-58db847b4d-b5ltj                                  2/2     Running     0               6h43m
rook-ceph-mon-d-56f99d4b58-9vchj                                  2/2     Running     0               85m
rook-ceph-operator-64d86f55fc-b9sfk                               1/1     Running     0               3m56s
rook-ceph-osd-0-569b444b46-ht7w6                                  2/2     Running     0               6h42m
rook-ceph-osd-1-5b7f86586d-v76bz                                  2/2     Running     0               4h43m
rook-ceph-osd-2-5676fc5d55-6dmhj                                  2/2     Running     0               6h42m
rook-ceph-osd-3-5c68b4c658-n2t8h                                  2/2     Running     0               2m52s
rook-ceph-osd-4-78b4b4fd74-dktp8                                  2/2     Running     0               2m51s
rook-ceph-osd-5-77c7b58f8f-s9qmj                                  2/2     Running     0               2m50s
rook-ceph-osd-prepare-0721f34325c9b3d7c7ac6da4f641f80c-g7fmz      0/1     Completed   0               6h43m
rook-ceph-osd-prepare-d99b2443ee3d5b4d4acdb8773b4acc55-5b25p      0/1     Completed   0               6h43m
rook-ceph-osd-prepare-worker-0-data-0qrzq2-plxpj                  0/1     Completed   0               3m4s
rook-ceph-osd-prepare-worker-1-data-0tzdxh-cw6d8                  0/1     Completed   0               3m4s
rook-ceph-osd-prepare-worker-2-data-0gnv62-pg9vs                  0/1     Completed   0               3m3s
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-fbdf74d54w9t   2/2     Running     0               6h42m
rook-ceph-tools-55584dc469-z76fm                                  1/1     Running     0               4h43m
ux-backend-server-fc45c47-9dsd8                                   2/2     Running     0               7h28m

SC, PVC, PV, CephBlockPool:

[root@rdr-rhcs-bastion-0 ~]# oc get sc
NAME                                        PROVISIONER                             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
localblock                                  kubernetes.io/no-provisioner            Delete          WaitForFirstConsumer   false                  7h7m
ocs-storagecluster-ceph-non-resilient-rbd   openshift-storage.rbd.csi.ceph.com      Delete          WaitForFirstConsumer   true                   1m
ocs-storagecluster-ceph-rbd (default)       openshift-storage.rbd.csi.ceph.com      Delete          Immediate              true                   6h58m
ocs-storagecluster-ceph-rgw                 openshift-storage.ceph.rook.io/bucket   Delete          Immediate              false                  7h1m
ocs-storagecluster-cephfs                   openshift-storage.cephfs.csi.ceph.com   Delete          Immediate              true                   6h58m
openshift-storage.noobaa.io                 openshift-storage.noobaa.io/obc         Delete          Immediate              false                  6h56m

[root@rdr-rhcs-bastion-0 ~]# oc get pvc
NAME                                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
db-noobaa-db-pg-0                        Bound    pvc-f938ca4c-b062-4c91-adab-74fcfe91f7d0   50Gi       RWO            ocs-storagecluster-ceph-rbd   6h38m
ocs-deviceset-localblock-0-data-0tm2tx   Bound    local-pv-2eddb94                           500Gi      RWO            localblock                    6h40m
ocs-deviceset-localblock-0-data-1bwnz9   Bound    local-pv-191c364e                          500Gi      RWO            localblock                    6h40m
ocs-deviceset-localblock-0-data-24m7hl   Bound    local-pv-1ed0eb36                          500Gi      RWO            localblock                    6h40m
worker-0-data-0qrzq2                     Bound    local-pv-a3a590cd                          500Gi      RWO            localblock                    8s
worker-1-data-0tzdxh                     Bound    local-pv-a6199fcf                          500Gi      RWO            localblock                    8s
worker-2-data-0gnv62                     Bound    local-pv-e58bded7                          500Gi      RWO            localblock                    8s
[root@rdr-rhcs-bastion-0 ~]#
[root@rdr-rhcs-bastion-0 ~]#
[root@rdr-rhcs-bastion-0 ~]# oc get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                                      STORAGECLASS                  REASON   AGE
local-pv-191c364e                          500Gi      RWO            Delete           Bound    openshift-storage/ocs-deviceset-localblock-0-data-1bwnz9   localblock                             6h48m
local-pv-1ed0eb36                          500Gi      RWO            Delete           Bound    openshift-storage/ocs-deviceset-localblock-0-data-24m7hl   localblock                             6h48m
local-pv-2eddb94                           500Gi      RWO            Delete           Bound    openshift-storage/ocs-deviceset-localblock-0-data-0tm2tx   localblock                             6h48m
local-pv-a3a590cd                          500Gi      RWO            Delete           Bound    openshift-storage/worker-0-data-0qrzq2                     localblock                             3m16s
local-pv-a6199fcf                          500Gi      RWO            Delete           Bound    openshift-storage/worker-1-data-0tzdxh                     localblock                             3m16s
local-pv-e58bded7                          500Gi      RWO            Delete           Bound    openshift-storage/worker-2-data-0gnv62                     localblock                             3m16s
pvc-f938ca4c-b062-4c91-adab-74fcfe91f7d0   50Gi       RWO            Delete           Bound    openshift-storage/db-noobaa-db-pg-0                        ocs-storagecluster-ceph-rbd            6h38m

[root@rdr-rhcs-bastion-0 ~]# oc get cephblockpools
NAME                                        PHASE
ocs-storagecluster-cephblockpool            Ready
ocs-storagecluster-cephblockpool-worker-0   Ready
ocs-storagecluster-cephblockpool-worker-1   Ready
ocs-storagecluster-cephblockpool-worker-2   Ready

Comment 9 Malay Kumar parida 2024-02-20 04:02:27 UTC
Can you please share the output of 
oc get cm rook-config-override -o yaml

Comment 10 Aaruni Aggarwal 2024-02-20 05:53:59 UTC
[root@rdr-rhcs-bastion-0 ~]# oc get cm rook-config-override -o yaml
apiVersion: v1
data:
  config: |
    [global]
    bdev_flock_retry                   = 20
    mon_osd_full_ratio                 = .85
    mon_osd_backfillfull_ratio         = .8
    mon_osd_nearfull_ratio             = .75
    mon_max_pg_per_osd                 = 600
    mon_pg_warn_max_object_skew        = 0
    mon_data_avail_warn                = 15
    bluestore_prefer_deferred_size_hdd = 0
    mon_warn_on_pool_no_redundancy     = false

    [osd]
    osd_memory_target_cgroup_limit_ratio = 0.8
kind: ConfigMap
metadata:
  creationTimestamp: "2024-02-19T12:20:26Z"
  name: rook-config-override
  namespace: openshift-storage
  ownerReferences:
  - apiVersion: ocs.openshift.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: StorageCluster
    name: ocs-storagecluster
    uid: bb0c3885-edc0-480a-a3ee-c2ebe5d6fd6e
  resourceVersion: "3310331"
  uid: 38f2e840-4596-4e5d-9e24-001b16bfa29c

Comment 11 Malay Kumar parida 2024-02-20 06:20:07 UTC
Hi Travis, as you can see, despite the mon_warn_on_pool_no_redundancy = false value being present in the ConfigMap, we are still seeing the warning about no redundancy. Can you please take a look?

Comment 12 Travis Nielsen 2024-02-20 20:01:23 UTC
The CM is only applied to the mons when they restart, and when the feature is enabled we obviously don't want to restart the mons just to apply this setting.

I'd suggest we always suppress this warning for all ODF clusters, since in any case we only expect users to use replica-1 pools when they have been configured properly through the non-resilient feature.
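
For illustration (a manual workaround sketch only, not the ocs-operator fix being proposed here): the option can also be pushed through the mon config store from the rook-ceph-tools pod, which takes effect without restarting the mons:

# Set the option centrally so the running mons pick it up immediately:
ceph config set mon mon_warn_on_pool_no_redundancy false
ceph health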

Comment 13 Aaruni Aggarwal 2024-02-21 08:02:05 UTC
Moving it to ASSIGNED state as it FAILED_QA

Comment 14 Malay Kumar parida 2024-02-21 08:04:14 UTC
Raised a follow-up fix as per https://bugzilla.redhat.com/show_bug.cgi?id=2260131#c12. This time I have tested it myself and can't see the health warning anymore, so it should be good.

Comment 16 Aaruni Aggarwal 2024-03-07 17:14:06 UTC
Re-tested replica-1 with ODF build 4.15.0-150, and I can't see the warning in my cluster. Ceph health shows HEALTH_OK.

[root@rdr-replica-bastion-0 ~]# oc get pods -n openshift-storage |grep osd
rook-ceph-osd-0-69f99cbb47-2s95x                                  2/2     Running     0          119m
rook-ceph-osd-1-897cf8687-r6qb9                                   2/2     Running     0          119m
rook-ceph-osd-2-864ccff67-xp4qf                                   2/2     Running     0          119m
rook-ceph-osd-3-9dfd577f7-vbj6g                                   2/2     Running     0          35m
rook-ceph-osd-4-8667cdf5cb-m4fmb                                  2/2     Running     0          35m
rook-ceph-osd-5-5c48784794-4kkrj                                  2/2     Running     0          35m
rook-ceph-osd-prepare-1c083b3e5a996b47de7615107ffa6d71-k7pmj      0/1     Completed   0          119m
rook-ceph-osd-prepare-c04ab62c9f6cb3ea614ed610d70f056d-d9762      0/1     Completed   0          119m
rook-ceph-osd-prepare-cf050d3c7600d9718336045378c2c4fd-tnszq      0/1     Completed   0          119m
rook-ceph-osd-prepare-worker-0-data-07gxk5-6twbd                  0/1     Completed   0          35m
rook-ceph-osd-prepare-worker-1-data-0r9qjx-h662d                  0/1     Completed   0          35m
rook-ceph-osd-prepare-worker-2-data-044vhw-nq6pd                  0/1     Completed   0          35m

[root@rdr-replica-bastion-0 ~]# oc get cephblockpools
NAME                                        PHASE
ocs-storagecluster-cephblockpool            Ready
ocs-storagecluster-cephblockpool-worker-0   Ready
ocs-storagecluster-cephblockpool-worker-1   Ready
ocs-storagecluster-cephblockpool-worker-2   Ready

[root@rdr-replica-bastion-0 ~]# oc rsh rook-ceph-tools-dbddf8896-jt9kn
sh-5.1$
sh-5.1$ ceph health
HEALTH_OK
sh-5.1$
sh-5.1$ ceph -s
  cluster:
    id:     af365cd2-27f2-49ea-a47f-8a185a4adc15
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 2h)
    mgr: a(active, since 2h), standbys: b
    mds: 1/1 daemons up, 1 hot standby
    osd: 6 osds: 6 up (since 37m), 6 in (since 38m)
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   15 pools, 172 pgs
    objects: 461 objects, 162 MiB
    usage:   472 MiB used, 2.9 TiB / 2.9 TiB avail
    pgs:     172 active+clean

  io:
    client:   1.2 KiB/s rd, 1.7 KiB/s wr, 2 op/s rd, 0 op/s wr


sh-5.1$ ceph osd pool ls
.mgr
ocs-storagecluster-cephblockpool
ocs-storagecluster-cephobjectstore.rgw.otp
ocs-storagecluster-cephobjectstore.rgw.buckets.index
.rgw.root
ocs-storagecluster-cephobjectstore.rgw.log
ocs-storagecluster-cephobjectstore.rgw.control
ocs-storagecluster-cephobjectstore.rgw.meta
ocs-storagecluster-cephobjectstore.rgw.buckets.non-ec
ocs-storagecluster-cephfilesystem-metadata
ocs-storagecluster-cephobjectstore.rgw.buckets.data
ocs-storagecluster-cephfilesystem-data0
ocs-storagecluster-cephblockpool-worker-0
ocs-storagecluster-cephblockpool-worker-1
ocs-storagecluster-cephblockpool-worker-2

Comment 17 errata-xmlrpc 2024-03-19 15:32:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.15.0 security, enhancement, & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:1383

