Bug 2260131
| Summary: | Health is going in Warning state after patching the storagecluster for replica-1 in ODF4.15 on IBM Power cluster | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Aaruni Aggarwal <aaaggarw> |
| Component: | ocs-operator | Assignee: | Malay Kumar parida <mparida> |
| Status: | CLOSED ERRATA | QA Contact: | Aviad Polak <apolak> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.15 | CC: | mparida, nberry, odf-bz-bot, tnielsen |
| Target Milestone: | --- | | |
| Target Release: | ODF 4.15.0 | | |
| Hardware: | ppc64le | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | 4.15.0-149 | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2024-03-19 15:32:16 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
CSV:
[root@rdr-odfpool-bastion-0 ~]# oc get csv -A
NAMESPACE NAME DISPLAY VERSION REPLACES PHASE
openshift-local-storage local-storage-operator.v4.15.0-202311280332 Local Storage 4.15.0-202311280332 Succeeded
openshift-operator-lifecycle-manager packageserver Package Server 0.0.1-snapshot Succeeded
openshift-storage mcg-operator.v4.15.0-120.stable NooBaa Operator 4.15.0-120.stable Succeeded
openshift-storage ocs-operator.v4.15.0-120.stable OpenShift Container Storage 4.15.0-120.stable Succeeded
openshift-storage odf-csi-addons-operator.v4.15.0-120.stable CSI Addons 4.15.0-120.stable Succeeded
openshift-storage odf-operator.v4.15.0-120.stable OpenShift Data Foundation 4.15.0-120.stable Succeeded
[root@rdr-odfpool-bastion-0 ~]#
Pods:
[root@rdr-odfpool-bastion-0 ~]# oc get pods
NAME READY STATUS RESTARTS AGE
csi-addons-controller-manager-58d5498995-8wlsp 2/2 Running 15 (10h ago) 5d22h
csi-cephfsplugin-92wts 2/2 Running 0 4d3h
csi-cephfsplugin-jnpdd 2/2 Running 0 4d3h
csi-cephfsplugin-provisioner-747587df87-bb86z 6/6 Running 0 4d3h
csi-cephfsplugin-provisioner-747587df87-nkzns 6/6 Running 0 4d3h
csi-cephfsplugin-w426w 2/2 Running 1 (4d3h ago) 4d3h
csi-rbdplugin-lxc7q 3/3 Running 0 4d2h
csi-rbdplugin-provisioner-7b7c74c7dd-fms6z 6/6 Running 0 4d2h
csi-rbdplugin-provisioner-7b7c74c7dd-x8dbc 6/6 Running 0 4d2h
csi-rbdplugin-rx8tb 3/3 Running 0 4d2h
csi-rbdplugin-z78mb 3/3 Running 0 4d2h
noobaa-core-0 1/1 Running 0 4d3h
noobaa-db-pg-0 1/1 Running 0 4d3h
noobaa-endpoint-6485c65647-n9btr 1/1 Running 0 4d3h
noobaa-operator-6d7d5b477-mn95n 2/2 Running 0 5d22h
ocs-metrics-exporter-67dc65cbcb-fp56t 1/1 Running 0 4d3h
ocs-operator-5cb4f78cb6-l9t97 1/1 Running 13 (8h ago) 5d22h
odf-console-6b58b9fdd7-9stzt 1/1 Running 0 5d22h
odf-operator-controller-manager-7857965fbc-j2246 2/2 Running 13 (8h ago) 5d22h
rook-ceph-crashcollector-worker-0-5f8d4944-46mj9 1/1 Running 0 4d3h
rook-ceph-crashcollector-worker-1-6bbfd975f9-sklt6 1/1 Running 0 4d3h
rook-ceph-crashcollector-worker-2-5c4f9ddbfd-w6r7x 1/1 Running 0 4d3h
rook-ceph-exporter-worker-0-bdb959b6d-5nbqp 1/1 Running 0 4d3h
rook-ceph-exporter-worker-1-576fd75979-d56wz 1/1 Running 0 4d3h
rook-ceph-exporter-worker-2-d94f78766-t8cqs 1/1 Running 0 4d3h
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-7db974dbj5bft 2/2 Running 0 4d3h
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-6bbcbb958vrf5 2/2 Running 0 4d3h
rook-ceph-mgr-a-6f77cff446-5dphm 3/3 Running 0 4d3h
rook-ceph-mgr-b-6cb75f8f57-ddvfr 3/3 Running 0 4d3h
rook-ceph-mon-a-b84b6c548-qdd7g 2/2 Running 0 4d3h
rook-ceph-mon-b-5694c6cb74-bfsng 2/2 Running 0 4d3h
rook-ceph-mon-c-57c6dc4b8c-4lbx9 2/2 Running 0 4d3h
rook-ceph-operator-8679b956f6-qnqcr 1/1 Running 0 4d2h
rook-ceph-osd-0-d8b5d68b9-wcwcv 2/2 Running 0 4d3h
rook-ceph-osd-1-dc5f84dc5-n4rhk 2/2 Running 0 4d3h
rook-ceph-osd-2-597c7495b6-zn9xp 2/2 Running 0 4d3h
rook-ceph-osd-3-6fcbf6997b-thbmz 2/2 Running 0 4d2h
rook-ceph-osd-4-57c898cb94-mdmjx 2/2 Running 0 4d2h
rook-ceph-osd-5-84bdc44c49-g6s2c 2/2 Running 0 4d2h
rook-ceph-osd-prepare-3ab83856ecd22df4394b63644f3d0dae-4f8fv 0/1 Completed 0 4d3h
rook-ceph-osd-prepare-c1fc0f829c0fe6db017d50cbe304d1b7-qlzgc 0/1 Completed 0 4d3h
rook-ceph-osd-prepare-db2d67412944f23ce104c640e62a289a-f6rwd 0/1 Completed 0 4d3h
rook-ceph-osd-prepare-worker-0-data-0ppwll-x4t59 0/1 Completed 0 4d2h
rook-ceph-osd-prepare-worker-1-data-0wt9rm-wvx4j 0/1 Completed 0 4d2h
rook-ceph-osd-prepare-worker-2-data-047mm9-gvdh7 0/1 Completed 0 4d2h
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-55d87b5jr9fj 2/2 Running 0 4d3h
rook-ceph-tools-746d95679-z84c2 1/1 Running 0 4d3h
ux-backend-server-b7f97d97b-xlfhd 2/2 Running 0 5d22h
[root@rdr-odfpool-bastion-0 ~]#
PVC and PV:
--------
[root@rdr-odfpool-bastion-0 ~]# oc get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
db-noobaa-db-pg-0 Bound pvc-3c2d5675-0a5d-4283-aafb-d0c26dec1032 50Gi RWO ocs-storagecluster-ceph-rbd 4d3h
ocs-deviceset-localblock-0-data-07g2nk Bound local-pv-3efe86e9 500Gi RWO localblock 4d3h
ocs-deviceset-localblock-0-data-1s6js6 Bound local-pv-da8faa7a 500Gi RWO localblock 4d3h
ocs-deviceset-localblock-0-data-2p9n4h Bound local-pv-603ff5c5 500Gi RWO localblock 4d3h
worker-0-data-0ppwll Bound local-pv-9603076d 500Gi RWO localblock 4d2h
worker-1-data-0wt9rm Bound local-pv-c03b4c4b 500Gi RWO localblock 4d2h
worker-2-data-047mm9 Bound local-pv-95556960 500Gi RWO localblock 4d2h
[root@rdr-odfpool-bastion-0 ~]# oc get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
local-pv-3efe86e9 500Gi RWO Delete Bound openshift-storage/ocs-deviceset-localblock-0-data-07g2nk localblock 5d8h
local-pv-603ff5c5 500Gi RWO Delete Bound openshift-storage/ocs-deviceset-localblock-0-data-2p9n4h localblock 5d8h
local-pv-95556960 500Gi RWO Delete Bound openshift-storage/worker-2-data-047mm9 localblock 4d2h
local-pv-9603076d 500Gi RWO Delete Bound openshift-storage/worker-0-data-0ppwll localblock 4d2h
local-pv-c03b4c4b 500Gi RWO Delete Bound openshift-storage/worker-1-data-0wt9rm localblock 4d2h
local-pv-da8faa7a 500Gi RWO Delete Bound openshift-storage/ocs-deviceset-localblock-0-data-1s6js6 localblock 5d8h
pvc-3c2d5675-0a5d-4283-aafb-d0c26dec1032 50Gi RWO Delete Bound openshift-storage/db-noobaa-db-pg-0 ocs-storagecluster-ceph-rbd 4d3h
pvc-b804b41e-4671-4db2-992a-ae4b04ea7121 1Gi RWO Delete Bound test/non-resilient-rbd-pvc ocs-storagecluster-ceph-non-resilient-rbd 3d7h
[root@rdr-odfpool-bastion-0 ~]# oc get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
localblock kubernetes.io/no-provisioner Delete WaitForFirstConsumer false 5d8h
ocs-storagecluster-ceph-non-resilient-rbd openshift-storage.rbd.csi.ceph.com Delete WaitForFirstConsumer true 4d2h
ocs-storagecluster-ceph-rbd openshift-storage.rbd.csi.ceph.com Delete Immediate true 4d3h
ocs-storagecluster-ceph-rgw openshift-storage.ceph.rook.io/bucket Delete Immediate false 4d3h
ocs-storagecluster-cephfs openshift-storage.cephfs.csi.ceph.com Delete Immediate true 4d3h
openshift-storage.noobaa.io openshift-storage.noobaa.io/obc Delete Immediate false 4d3h
StorageCluster YAML:
[root@rdr-odfpool-bastion-0 ~]# oc get storagecluster -o yaml
apiVersion: v1
items:
- apiVersion: ocs.openshift.io/v1
kind: StorageCluster
metadata:
annotations:
cluster.ocs.openshift.io/local-devices: "true"
uninstall.ocs.openshift.io/cleanup-policy: delete
uninstall.ocs.openshift.io/mode: graceful
creationTimestamp: "2024-01-20T12:34:37Z"
finalizers:
- storagecluster.ocs.openshift.io
generation: 4
name: ocs-storagecluster
namespace: openshift-storage
ownerReferences:
- apiVersion: odf.openshift.io/v1alpha1
kind: StorageSystem
name: ocs-storagecluster-storagesystem
uid: 62f2cd5f-d4ac-4907-b275-4b26f5a7def0
resourceVersion: "4611540"
uid: ec252bc9-1608-41d1-93fe-c28d59774d5d
spec:
arbiter: {}
enableCephTools: true
encryption:
kms: {}
externalStorage: {}
flexibleScaling: true
managedResources:
cephBlockPools: {}
cephCluster: {}
cephConfig: {}
cephDashboard: {}
cephFilesystems: {}
cephNonResilientPools:
enable: true
cephObjectStoreUsers: {}
cephObjectStores: {}
cephRBDMirror:
daemonCount: 1
cephToolbox: {}
mirroring: {}
monDataDirHostPath: /var/lib/rook
network:
connections:
encryption: {}
multiClusterService: {}
nodeTopologies: {}
resourceProfile: balanced
storageDeviceSets:
- config: {}
count: 3
dataPVCTemplate:
metadata: {}
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: "1"
storageClassName: localblock
volumeMode: Block
status: {}
name: ocs-deviceset-localblock
placement: {}
preparePlacement: {}
replica: 1
resources: {}
status:
conditions:
- lastHeartbeatTime: "2024-01-20T12:34:39Z"
lastTransitionTime: "2024-01-20T12:34:39Z"
message: Version check successful
reason: VersionMatched
status: "False"
type: VersionMismatch
- lastHeartbeatTime: "2024-01-24T16:01:11Z"
lastTransitionTime: "2024-01-24T10:59:45Z"
message: Reconcile completed successfully
reason: ReconcileCompleted
status: "True"
type: ReconcileComplete
- lastHeartbeatTime: "2024-01-24T16:01:11Z"
lastTransitionTime: "2024-01-20T12:39:37Z"
message: Reconcile completed successfully
reason: ReconcileCompleted
status: "True"
type: Available
- lastHeartbeatTime: "2024-01-24T16:01:11Z"
lastTransitionTime: "2024-01-20T13:47:42Z"
message: Reconcile completed successfully
reason: ReconcileCompleted
status: "False"
type: Progressing
- lastHeartbeatTime: "2024-01-24T16:01:11Z"
lastTransitionTime: "2024-01-20T12:39:37Z"
message: Reconcile completed successfully
reason: ReconcileCompleted
status: "False"
type: Degraded
- lastHeartbeatTime: "2024-01-24T16:01:11Z"
lastTransitionTime: "2024-01-20T13:47:42Z"
message: Reconcile completed successfully
reason: ReconcileCompleted
status: "True"
type: Upgradeable
currentMonCount: 3
failureDomain: host
failureDomainKey: kubernetes.io/hostname
failureDomainValues:
- worker-0
- worker-1
- worker-2
images:
ceph:
actualImage: registry.redhat.io/rhceph/rhceph-6-rhel9@sha256:9049ccf79a0e009682e30677f493b27263c2d9401958005de733a19506705775
desiredImage: registry.redhat.io/rhceph/rhceph-6-rhel9@sha256:9049ccf79a0e009682e30677f493b27263c2d9401958005de733a19506705775
noobaaCore:
actualImage: registry.redhat.io/odf4/mcg-core-rhel9@sha256:41c509b225b92cdf088bda5a0fe538a8b2106a09713277158b71d2a5b9ae694f
desiredImage: registry.redhat.io/odf4/mcg-core-rhel9@sha256:41c509b225b92cdf088bda5a0fe538a8b2106a09713277158b71d2a5b9ae694f
noobaaDB:
actualImage: registry.redhat.io/rhel9/postgresql-15@sha256:12afe2b0205a4aa24623f04d318d21f91393e4c70cf03a5f6720339e06d78293
desiredImage: registry.redhat.io/rhel9/postgresql-15@sha256:12afe2b0205a4aa24623f04d318d21f91393e4c70cf03a5f6720339e06d78293
kmsServerConnection: {}
lastAppliedResourceProfile: balanced
nodeTopologies:
labels:
kubernetes.io/hostname:
- worker-0
- worker-1
- worker-2
phase: Ready
relatedObjects:
- apiVersion: ceph.rook.io/v1
kind: CephCluster
name: ocs-storagecluster-cephcluster
namespace: openshift-storage
resourceVersion: "4611282"
uid: 41f09d90-5f41-44f6-b361-6fac57336dd1
- apiVersion: noobaa.io/v1alpha1
kind: NooBaa
name: noobaa
namespace: openshift-storage
resourceVersion: "4611533"
uid: 18cb4ced-a6c5-4eca-8bf6-7a502b50d006
version: 4.15.0
kind: List
metadata:
resourceVersion: ""
Ceph health:
[root@rdr-odfpool-bastion-0 ~]# oc rsh rook-ceph-tools-746d95679-z84c2
sh-5.1$
sh-5.1$ ceph -s
cluster:
id: b44987ec-bf85-4ce8-9fd1-98f94f0abb6b
health: HEALTH_WARN
3 pool(s) have no replicas configured
services:
mon: 3 daemons, quorum a,b,c (age 2h)
mgr: a(active, since 3d), standbys: b
mds: 1/1 daemons up, 1 hot standby
osd: 6 osds: 6 up (since 3d), 6 in (since 3d)
rgw: 1 daemon active (1 hosts, 1 zones)
data:
volumes: 1/1 healthy
pools: 15 pools, 188 pgs
objects: 568 objects, 405 MiB
usage: 3.0 GiB used, 2.9 TiB / 2.9 TiB avail
pgs: 188 active+clean
io:
client: 1.4 KiB/s rd, 6.0 KiB/s wr, 2 op/s rd, 0 op/s wr
sh-5.1$
The warning is about pools having no redundancy, which is expected with replica-1. So when replica-1 is enabled, we set mon_warn_on_pool_no_redundancy = false to suppress this warning. This works when replica-1 is enabled from the beginning, but the value does not seem to be applied properly when the storagecluster is patched afterwards to enable the replica-1 feature.

With ODF build 4.15.0-143.stable, I am still seeing the same warning message.
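One way to confirm whether the override actually reached the running monitors is to query the configuration from the toolbox pod; a minimal check (a sketch only — mon.a is assumed from the default Rook mon naming). ceph config get reports what is stored in the centralized config database (rook-config-override writes ceph.conf, not this store), while ceph config show reports the value the running daemon is actually using:

sh-5.1$ ceph config get mon mon_warn_on_pool_no_redundancy
sh-5.1$ ceph config show mon.a mon_warn_on_pool_no_redundancy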
[root@rdr-rhcs-bastion-0 ~]# oc rsh rook-ceph-tools-55584dc469-z76fm
sh-5.1$ ceph df
--- RAW STORAGE ---
CLASS SIZE AVAIL USED RAW USED %RAW USED
ssd 1.5 TiB 1.5 TiB 435 MiB 435 MiB 0.03
worker-0 500 GiB 500 GiB 103 MiB 103 MiB 0.02
worker-1 500 GiB 500 GiB 96 MiB 96 MiB 0.02
worker-2 500 GiB 500 GiB 75 MiB 75 MiB 0.01
TOTAL 2.9 TiB 2.9 TiB 710 MiB 710 MiB 0.02
--- POOLS ---
POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
ocs-storagecluster-cephblockpool 1 32 124 MiB 85 372 MiB 0.01 850 GiB
.mgr 2 1 961 KiB 2 2.8 MiB 0 850 GiB
.rgw.root 3 8 5.8 KiB 16 180 KiB 0 850 GiB
ocs-storagecluster-cephobjectstore.rgw.buckets.index 4 8 0 B 11 0 B 0 850 GiB
ocs-storagecluster-cephobjectstore.rgw.log 5 8 25 KiB 308 1.9 MiB 0 850 GiB
ocs-storagecluster-cephobjectstore.rgw.buckets.non-ec 6 8 0 B 0 0 B 0 850 GiB
ocs-storagecluster-cephobjectstore.rgw.otp 7 8 0 B 0 0 B 0 850 GiB
ocs-storagecluster-cephobjectstore.rgw.meta 8 8 2.8 KiB 14 144 KiB 0 850 GiB
ocs-storagecluster-cephobjectstore.rgw.control 9 8 0 B 8 0 B 0 850 GiB
ocs-storagecluster-cephfilesystem-metadata 10 16 8.7 MiB 26 26 MiB 0 850 GiB
ocs-storagecluster-cephobjectstore.rgw.buckets.data 11 32 1 KiB 1 12 KiB 0 850 GiB
ocs-storagecluster-cephfilesystem-data0 12 32 0 B 0 0 B 0 850 GiB
ocs-storagecluster-cephblockpool-worker-0 13 1 19 B 1 4 KiB 0 425 GiB
ocs-storagecluster-cephblockpool-worker-1 14 1 19 B 1 4 KiB 0 425 GiB
ocs-storagecluster-cephblockpool-worker-2 15 1 19 B 1 4 KiB 0 425 GiB
sh-5.1$
sh-5.1$
sh-5.1$ ceph osd pool ls
ocs-storagecluster-cephblockpool
.mgr
.rgw.root
ocs-storagecluster-cephobjectstore.rgw.buckets.index
ocs-storagecluster-cephobjectstore.rgw.log
ocs-storagecluster-cephobjectstore.rgw.buckets.non-ec
ocs-storagecluster-cephobjectstore.rgw.otp
ocs-storagecluster-cephobjectstore.rgw.meta
ocs-storagecluster-cephobjectstore.rgw.control
ocs-storagecluster-cephfilesystem-metadata
ocs-storagecluster-cephobjectstore.rgw.buckets.data
ocs-storagecluster-cephfilesystem-data0
ocs-storagecluster-cephblockpool-worker-0
ocs-storagecluster-cephblockpool-worker-1
ocs-storagecluster-cephblockpool-worker-2
sh-5.1$
sh-5.1$ ceph osd pool ls detail
pool 1 'ocs-storagecluster-cephblockpool' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 132 lfor 0/0/40 flags hashpspool,selfmanaged_snaps stripe_width 0 target_size_ratio 0.49 application rbd
pool 2 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 12 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr
pool 3 '.rgw.root' replicated size 3 min_size 2 crush_rule 10 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 134 flags hashpspool stripe_width 0 pg_num_min 8 application rook-ceph-rgw
pool 4 'ocs-storagecluster-cephobjectstore.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 13 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 133 flags hashpspool stripe_width 0 pg_num_min 8 application rook-ceph-rgw
pool 5 'ocs-storagecluster-cephobjectstore.rgw.log' replicated size 3 min_size 2 crush_rule 12 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 132 flags hashpspool stripe_width 0 pg_num_min 8 application rook-ceph-rgw
pool 6 'ocs-storagecluster-cephobjectstore.rgw.buckets.non-ec' replicated size 3 min_size 2 crush_rule 17 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 133 flags hashpspool stripe_width 0 pg_num_min 8 application rook-ceph-rgw
pool 7 'ocs-storagecluster-cephobjectstore.rgw.otp' replicated size 3 min_size 2 crush_rule 15 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 134 flags hashpspool stripe_width 0 pg_num_min 8 application rook-ceph-rgw
pool 8 'ocs-storagecluster-cephobjectstore.rgw.meta' replicated size 3 min_size 2 crush_rule 16 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 133 flags hashpspool stripe_width 0 pg_num_min 8 application rook-ceph-rgw
pool 9 'ocs-storagecluster-cephobjectstore.rgw.control' replicated size 3 min_size 2 crush_rule 14 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 133 flags hashpspool stripe_width 0 pg_num_min 8 application rook-ceph-rgw
pool 10 'ocs-storagecluster-cephfilesystem-metadata' replicated size 3 min_size 2 crush_rule 18 object_hash rjenkins pg_num 16 pgp_num 16 autoscale_mode on last_change 136 lfor 0/0/40 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs
pool 11 'ocs-storagecluster-cephobjectstore.rgw.buckets.data' replicated size 3 min_size 2 crush_rule 22 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 135 lfor 0/0/42 flags hashpspool stripe_width 0 target_size_ratio 0.49 application rook-ceph-rgw
pool 12 'ocs-storagecluster-cephfilesystem-data0' replicated size 3 min_size 2 crush_rule 21 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 137 lfor 0/0/42 flags hashpspool stripe_width 0 target_size_ratio 0.49 application cephfs
pool 13 'ocs-storagecluster-cephblockpool-worker-0' replicated size 1 min_size 1 crush_rule 24 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 188 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
pool 14 'ocs-storagecluster-cephblockpool-worker-1' replicated size 1 min_size 1 crush_rule 26 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 162 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
pool 15 'ocs-storagecluster-cephblockpool-worker-2' replicated size 1 min_size 1 crush_rule 28 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 172 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
sh-5.1$
sh-5.1$
sh-5.1$ ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 2.92978 root default
-7 0.97659 host worker-0
2 ssd 0.48830 osd.2 up 1.00000 1.00000
3 worker-0 0.48830 osd.3 up 1.00000 1.00000
-3 0.97659 host worker-1
0 ssd 0.48830 osd.0 up 1.00000 1.00000
4 worker-1 0.48830 osd.4 up 1.00000 1.00000
-5 0.97659 host worker-2
1 ssd 0.48830 osd.1 up 1.00000 1.00000
5 worker-2 0.48830 osd.5 up 1.00000 1.00000
sh-5.1$
sh-5.1$
sh-5.1$ ceph health
HEALTH_WARN 3 pool(s) have no replicas configured
sh-5.1$
sh-5.1$ ceph -s
cluster:
id: fc753d71-9791-43ff-9f43-67e1ef84c32c
health: HEALTH_WARN
3 pool(s) have no replicas configured
services:
mon: 3 daemons, quorum b,c,d (age 95m)
mgr: a(active, since 6h), standbys: b
mds: 1/1 daemons up, 1 hot standby
osd: 6 osds: 6 up (since 12m), 6 in (since 13m)
rgw: 1 daemon active (1 hosts, 1 zones)
data:
volumes: 1/1 healthy
pools: 15 pools, 172 pgs
objects: 474 objects, 199 MiB
usage: 716 MiB used, 2.9 TiB / 2.9 TiB avail
pgs: 172 active+clean
io:
client: 1023 B/s rd, 1023 B/s wr, 1 op/s rd, 0 op/s wr
sh-5.1$
Pods:
[root@rdr-rhcs-bastion-0 ~]# oc get pods
NAME READY STATUS RESTARTS AGE
csi-addons-controller-manager-75c885b44c-pw2d7 2/2 Running 0 4h43m
csi-cephfsplugin-7z8ww 2/2 Running 0 6h44m
csi-cephfsplugin-bzzj5 2/2 Running 1 (6h44m ago) 6h44m
csi-cephfsplugin-m6778 2/2 Running 0 85m
csi-cephfsplugin-provisioner-7bfdcdd855-n692v 6/6 Running 0 4h43m
csi-cephfsplugin-provisioner-7bfdcdd855-v957v 6/6 Running 0 6h44m
csi-rbdplugin-867wr 3/3 Running 0 3m48s
csi-rbdplugin-provisioner-d5c8c7cc4-pwqcg 6/6 Running 0 3m48s
csi-rbdplugin-provisioner-d5c8c7cc4-wd4sw 6/6 Running 0 3m48s
csi-rbdplugin-s7xlp 3/3 Running 0 3m42s
csi-rbdplugin-shqz6 3/3 Running 0 3m45s
noobaa-core-0 1/1 Running 0 6h40m
noobaa-db-pg-0 1/1 Running 0 6h41m
noobaa-endpoint-7b8bff5fd4-95ht9 1/1 Running 0 2m52s
noobaa-operator-6d5c65dc7d-c6szh 1/1 Running 0 7h28m
ocs-metrics-exporter-5d54875b4c-h2t9f 1/1 Running 0 6h41m
ocs-operator-765f85d7fc-xw7d9 1/1 Running 0 7h28m
odf-console-66fff9846-ljs7j 1/1 Running 0 7h28m
odf-operator-controller-manager-86f9787f9-t4zmb 2/2 Running 0 7h28m
rook-ceph-crashcollector-worker-0-77cff6b86c-c92gz 1/1 Running 0 6h42m
rook-ceph-crashcollector-worker-1-7559dc47dd-944dj 1/1 Running 0 6h42m
rook-ceph-crashcollector-worker-2-7d87957c55-l8fcp 1/1 Running 0 85m
rook-ceph-exporter-worker-0-544c48b7b8-tch9l 1/1 Running 0 6h41m
rook-ceph-exporter-worker-1-79d446fcdc-clvgq 1/1 Running 0 6h42m
rook-ceph-exporter-worker-2-66bb46f844-mxmz4 1/1 Running 0 85m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-849755f49qxt6 2/2 Running 8 (4h19m ago) 6h42m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-5b887978klhls 2/2 Running 8 (4h19m ago) 4h43m
rook-ceph-mgr-a-7cd4fcb6cc-78pp2 3/3 Running 0 6h43m
rook-ceph-mgr-b-75cb574fcd-8kqfj 3/3 Running 0 4h43m
rook-ceph-mon-b-6fdcdd944d-tgvgr 2/2 Running 0 6h43m
rook-ceph-mon-c-58db847b4d-b5ltj 2/2 Running 0 6h43m
rook-ceph-mon-d-56f99d4b58-9vchj 2/2 Running 0 85m
rook-ceph-operator-64d86f55fc-b9sfk 1/1 Running 0 3m56s
rook-ceph-osd-0-569b444b46-ht7w6 2/2 Running 0 6h42m
rook-ceph-osd-1-5b7f86586d-v76bz 2/2 Running 0 4h43m
rook-ceph-osd-2-5676fc5d55-6dmhj 2/2 Running 0 6h42m
rook-ceph-osd-3-5c68b4c658-n2t8h 2/2 Running 0 2m52s
rook-ceph-osd-4-78b4b4fd74-dktp8 2/2 Running 0 2m51s
rook-ceph-osd-5-77c7b58f8f-s9qmj 2/2 Running 0 2m50s
rook-ceph-osd-prepare-0721f34325c9b3d7c7ac6da4f641f80c-g7fmz 0/1 Completed 0 6h43m
rook-ceph-osd-prepare-d99b2443ee3d5b4d4acdb8773b4acc55-5b25p 0/1 Completed 0 6h43m
rook-ceph-osd-prepare-worker-0-data-0qrzq2-plxpj 0/1 Completed 0 3m4s
rook-ceph-osd-prepare-worker-1-data-0tzdxh-cw6d8 0/1 Completed 0 3m4s
rook-ceph-osd-prepare-worker-2-data-0gnv62-pg9vs 0/1 Completed 0 3m3s
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-fbdf74d54w9t 2/2 Running 0 6h42m
rook-ceph-tools-55584dc469-z76fm 1/1 Running 0 4h43m
ux-backend-server-fc45c47-9dsd8 2/2 Running 0 7h28m
SC, PVC, PV, CephBlockPool:
[root@rdr-rhcs-bastion-0 ~]# oc get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
localblock kubernetes.io/no-provisioner Delete WaitForFirstConsumer false 7h7m
ocs-storagecluster-ceph-non-resilient-rbd openshift-storage.rbd.csi.ceph.com Delete WaitForFirstConsumer true 1m
ocs-storagecluster-ceph-rbd (default) openshift-storage.rbd.csi.ceph.com Delete Immediate true 6h58m
ocs-storagecluster-ceph-rgw openshift-storage.ceph.rook.io/bucket Delete Immediate false 7h1m
ocs-storagecluster-cephfs openshift-storage.cephfs.csi.ceph.com Delete Immediate true 6h58m
openshift-storage.noobaa.io openshift-storage.noobaa.io/obc Delete Immediate false 6h56m
[root@rdr-rhcs-bastion-0 ~]# oc get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
db-noobaa-db-pg-0 Bound pvc-f938ca4c-b062-4c91-adab-74fcfe91f7d0 50Gi RWO ocs-storagecluster-ceph-rbd 6h38m
ocs-deviceset-localblock-0-data-0tm2tx Bound local-pv-2eddb94 500Gi RWO localblock 6h40m
ocs-deviceset-localblock-0-data-1bwnz9 Bound local-pv-191c364e 500Gi RWO localblock 6h40m
ocs-deviceset-localblock-0-data-24m7hl Bound local-pv-1ed0eb36 500Gi RWO localblock 6h40m
worker-0-data-0qrzq2 Bound local-pv-a3a590cd 500Gi RWO localblock 8s
worker-1-data-0tzdxh Bound local-pv-a6199fcf 500Gi RWO localblock 8s
worker-2-data-0gnv62 Bound local-pv-e58bded7 500Gi RWO localblock 8s
[root@rdr-rhcs-bastion-0 ~]#
[root@rdr-rhcs-bastion-0 ~]#
[root@rdr-rhcs-bastion-0 ~]# oc get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
local-pv-191c364e 500Gi RWO Delete Bound openshift-storage/ocs-deviceset-localblock-0-data-1bwnz9 localblock 6h48m
local-pv-1ed0eb36 500Gi RWO Delete Bound openshift-storage/ocs-deviceset-localblock-0-data-24m7hl localblock 6h48m
local-pv-2eddb94 500Gi RWO Delete Bound openshift-storage/ocs-deviceset-localblock-0-data-0tm2tx localblock 6h48m
local-pv-a3a590cd 500Gi RWO Delete Bound openshift-storage/worker-0-data-0qrzq2 localblock 3m16s
local-pv-a6199fcf 500Gi RWO Delete Bound openshift-storage/worker-1-data-0tzdxh localblock 3m16s
local-pv-e58bded7 500Gi RWO Delete Bound openshift-storage/worker-2-data-0gnv62 localblock 3m16s
pvc-f938ca4c-b062-4c91-adab-74fcfe91f7d0 50Gi RWO Delete Bound openshift-storage/db-noobaa-db-pg-0 ocs-storagecluster-ceph-rbd 6h38m
[root@rdr-rhcs-bastion-0 ~]# oc get cephblockpools
NAME PHASE
ocs-storagecluster-cephblockpool Ready
ocs-storagecluster-cephblockpool-worker-0 Ready
ocs-storagecluster-cephblockpool-worker-1 Ready
ocs-storagecluster-cephblockpool-worker-2 Ready
Can you please share the output of oc get cm rook-config-override -o yaml?

[root@rdr-rhcs-bastion-0 ~]# oc get cm rook-config-override -o yaml
apiVersion: v1
data:
config: |
[global]
bdev_flock_retry = 20
mon_osd_full_ratio = .85
mon_osd_backfillfull_ratio = .8
mon_osd_nearfull_ratio = .75
mon_max_pg_per_osd = 600
mon_pg_warn_max_object_skew = 0
mon_data_avail_warn = 15
bluestore_prefer_deferred_size_hdd = 0
mon_warn_on_pool_no_redundancy = false
[osd]
osd_memory_target_cgroup_limit_ratio = 0.8
kind: ConfigMap
metadata:
creationTimestamp: "2024-02-19T12:20:26Z"
name: rook-config-override
namespace: openshift-storage
ownerReferences:
- apiVersion: ocs.openshift.io/v1
blockOwnerDeletion: true
controller: true
kind: StorageCluster
name: ocs-storagecluster
uid: bb0c3885-edc0-480a-a3ee-c2ebe5d6fd6e
resourceVersion: "3310331"
uid: 38f2e840-4596-4e5d-9e24-001b16bfa29c
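Settings in rook-config-override are written into ceph.conf, so daemons normally pick them up only when they restart. If the warning had to be silenced on an already-running cluster without restarting the mons, the centralized config store could be used instead; a workaround sketch only (run from the toolbox pod), not the shipped fix:

sh-5.1$ ceph config set mon mon_warn_on_pool_no_redundancy false
sh-5.1$ ceph health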
Hi Travis, as you can see, despite the mon_warn_on_pool_no_redundancy = false value being present in the ConfigMap, we are still seeing the warning about no redundancy. Can you please take a look?

The ConfigMap is only applied to the mons when the mons restart. But when the feature is enabled, we obviously don't want to restart the mons just to apply this setting. I'd suggest we always suppress this warning for all ODF clusters, since we only expect users to use replica-1 pools when they have been configured properly through the non-resilient feature.

Moving it to ASSIGNED state as it FAILED_QA.

Raised a follow-up fix as per https://bugzilla.redhat.com/show_bug.cgi?id=2260131#c12. This time I have tested it myself; I can't see the health warning anymore, so it should be good.

Re-tested replica-1 with ODF build 4.15.0-150, and I can't see the warning in my cluster. Ceph health shows HEALTH_OK:
[root@rdr-replica-bastion-0 ~]# oc get pods -n openshift-storage |grep osd
rook-ceph-osd-0-69f99cbb47-2s95x 2/2 Running 0 119m
rook-ceph-osd-1-897cf8687-r6qb9 2/2 Running 0 119m
rook-ceph-osd-2-864ccff67-xp4qf 2/2 Running 0 119m
rook-ceph-osd-3-9dfd577f7-vbj6g 2/2 Running 0 35m
rook-ceph-osd-4-8667cdf5cb-m4fmb 2/2 Running 0 35m
rook-ceph-osd-5-5c48784794-4kkrj 2/2 Running 0 35m
rook-ceph-osd-prepare-1c083b3e5a996b47de7615107ffa6d71-k7pmj 0/1 Completed 0 119m
rook-ceph-osd-prepare-c04ab62c9f6cb3ea614ed610d70f056d-d9762 0/1 Completed 0 119m
rook-ceph-osd-prepare-cf050d3c7600d9718336045378c2c4fd-tnszq 0/1 Completed 0 119m
rook-ceph-osd-prepare-worker-0-data-07gxk5-6twbd 0/1 Completed 0 35m
rook-ceph-osd-prepare-worker-1-data-0r9qjx-h662d 0/1 Completed 0 35m
rook-ceph-osd-prepare-worker-2-data-044vhw-nq6pd 0/1 Completed 0 35m
[root@rdr-replica-bastion-0 ~]# oc get cephblockpools
NAME PHASE
ocs-storagecluster-cephblockpool Ready
ocs-storagecluster-cephblockpool-worker-0 Ready
ocs-storagecluster-cephblockpool-worker-1 Ready
ocs-storagecluster-cephblockpool-worker-2 Ready
[root@rdr-replica-bastion-0 ~]# oc rsh rook-ceph-tools-dbddf8896-jt9kn
sh-5.1$
sh-5.1$ ceph health
HEALTH_OK
sh-5.1$
sh-5.1$ ceph -s
cluster:
id: af365cd2-27f2-49ea-a47f-8a185a4adc15
health: HEALTH_OK
services:
mon: 3 daemons, quorum a,b,c (age 2h)
mgr: a(active, since 2h), standbys: b
mds: 1/1 daemons up, 1 hot standby
osd: 6 osds: 6 up (since 37m), 6 in (since 38m)
rgw: 1 daemon active (1 hosts, 1 zones)
data:
volumes: 1/1 healthy
pools: 15 pools, 172 pgs
objects: 461 objects, 162 MiB
usage: 472 MiB used, 2.9 TiB / 2.9 TiB avail
pgs: 172 active+clean
io:
client: 1.2 KiB/s rd, 1.7 KiB/s wr, 2 op/s rd, 0 op/s wr
sh-5.1$ ceph osd pool ls
.mgr
ocs-storagecluster-cephblockpool
ocs-storagecluster-cephobjectstore.rgw.otp
ocs-storagecluster-cephobjectstore.rgw.buckets.index
.rgw.root
ocs-storagecluster-cephobjectstore.rgw.log
ocs-storagecluster-cephobjectstore.rgw.control
ocs-storagecluster-cephobjectstore.rgw.meta
ocs-storagecluster-cephobjectstore.rgw.buckets.non-ec
ocs-storagecluster-cephfilesystem-metadata
ocs-storagecluster-cephobjectstore.rgw.buckets.data
ocs-storagecluster-cephfilesystem-data0
ocs-storagecluster-cephblockpool-worker-0
ocs-storagecluster-cephblockpool-worker-1
ocs-storagecluster-cephblockpool-worker-2
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.15.0 security, enhancement, & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:1383
Description of problem (please be detailed as possible and provide log snippets):

Health is going in Warning state after patching the storagecluster for replica-1 in ODF 4.15 on IBM Power cluster.

Version of all relevant components (if applicable):
OCP: 4.15
ODF: 4.15.0-120

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Create an OCP 4.15 cluster. Install LSO 4.15 and ODF 4.15.
2. Create a localvolume with one disk and create a storagesystem.
3. Update the localvolume with an additional disk, which creates 3 new PVs.
4. Patch the storagecluster using the following command:
oc patch storagecluster ocs-storagecluster -n openshift-storage --type json --patch '[{ "op": "replace", "path": "/spec/managedResources/cephNonResilientPools/enable", "value": true }]'

Actual results:
Ceph health is in Warning state.

Expected results:
Ceph health should be in OK state.

Additional info:
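For completeness, a minimal end-to-end check of the reproducer above; a sketch assuming the default openshift-storage namespace and that the toolbox is enabled (enableCephTools: true in the StorageCluster):

# Enable the non-resilient (replica-1) pools on an existing cluster:
oc patch storagecluster ocs-storagecluster -n openshift-storage --type json --patch '[{ "op": "replace", "path": "/spec/managedResources/cephNonResilientPools/enable", "value": true }]'

# The per-host replica-1 pools should appear and reach the Ready phase:
oc get cephblockpools -n openshift-storage

# With the fix, ceph health should report HEALTH_OK rather than the no-redundancy warning:
oc -n openshift-storage exec deploy/rook-ceph-tools -- ceph health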