Created attachment 1739692 [details]
Attachment contains the rook operator logs

Description of problem (please be as detailed as possible and provide log snippets):
When installing OCS on an IBM VPC cluster, the RGW pods fail to show up in the openshift-storage namespace and the StorageCluster gets stuck in the Progressing phase.

Version of all relevant components (if applicable):
OCP: 4.5.18
OCS: 4.5.2

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes, OCS is not successfully installed.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Can this issue be reproduced?
Yes

Can this issue be reproduced from the UI?
No

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Install OCS 4.5 on an IBM VPC cluster running OCP 4.5.

Actual results:
RGW pods are not found.

Expected results:
RGW pods need to be running.

Additional info:

Snippet of the rook operator log:

2020-12-13 09:54:09.861163 I | cephclient: setting pool property "pg_num_min" to "8" on pool "ocs-storagecluster-cephobjectstore.rgw.buckets.index"
2020-12-13 09:54:10.894275 I | cephclient: setting pool property "pg_num_min" to "8" on pool "ocs-storagecluster-cephobjectstore.rgw.buckets.non-ec"
2020-12-13 09:54:11.925748 I | cephclient: setting pool property "pg_num_min" to "8" on pool ".rgw.root"
2020-12-13 09:54:12.159494 I | op-mon: parsing mon endpoints: b=172.21.208.140:6789,c=172.21.220.182:6789,a=172.21.93.55:6789
2020-12-13 09:54:12.159572 I | ceph-object-store-user-controller: CephObjectStore "ocs-storagecluster-cephobjectstore" found
2020-12-13 09:54:13.962642 E | ceph-object-controller: failed to reconcile failed to create object store deployments: failed to create object pools: failed to create data pool: failed to create pool ocs-storagecluster-cephobjectstore.rgw.buckets.data for object store ocs-storagecluster-cephobjectstore.: failed to create replicated pool ocs-storagecluster-cephobjectstore.rgw.buckets.data. Error ERANGE: pg_num 32 size 3 would mean 816 total pgs, which exceeds max 750 (mon_max_pg_per_osd 250 * num_in_osds 3) : exit status 34

Ceph status:

  cluster:
    id:     8d52a259-29a5-4220-aa22-9d031aa542d2
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 7h)
    mgr: a(active, since 2d)
    mds: ocs-storagecluster-cephfilesystem:1 {0=ocs-storagecluster-cephfilesystem-b=up:active} 1 up:standby-replay
    osd: 3 osds: 3 up (since 7h), 3 in (since 2d)

  task status:
    scrub status:
      mds.ocs-storagecluster-cephfilesystem-a: idle
      mds.ocs-storagecluster-cephfilesystem-b: idle

  data:
    pools:   9 pools, 240 pgs
    objects: 380 objects, 1.1 GiB
    usage:   6.7 GiB used, 143 GiB / 150 GiB avail
    pgs:     240 active+clean

  io:
    client: 1.2 KiB/s rd, 7.3 KiB/s wr, 2 op/s rd, 0 op/s wr

Available pools with PGs:
  ocs-storagecluster-cephblockpool                        128
  ocs-storagecluster-cephfilesystem-metadata               32
  ocs-storagecluster-cephobjectstore.rgw.control            8
  ocs-storagecluster-cephfilesystem-data0                   32
  ocs-storagecluster-cephobjectstore.rgw.meta                8
  ocs-storagecluster-cephobjectstore.rgw.log                 8
  ocs-storagecluster-cephobjectstore.rgw.buckets.index       8
  ocs-storagecluster-cephobjectstore.rgw.buckets.non-ec      8
  .rgw.root                                                  8
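A worked breakdown of the ERANGE numbers above, derived only from the figures in this report:

  existing pools:      240 PGs x replica size 3              = 720 PG replicas
  new data pool:        32 PGs x replica size 3              =  96 PG replicas
  total after create:                                          816 PG replicas
  limit:               mon_max_pg_per_osd 250 x 3 in OSDs    = 750 PG replicas

816 > 750, so the mon rejects creation of ocs-storagecluster-cephobjectstore.rgw.buckets.data with ERANGE. The block pool at 128 PGs accounts for most of the existing 240 PGs.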
Have you also tried with OCS 4.6?
To summarize the discussion with Josh: the issue is that the autoscaler unexpectedly scales the block pool up to 128 PGs. We then hit the PG limit and the object store cannot complete its initialization.

Josh Durgin, 12:26 PM
  My guess is there was a delay between the rbd pool creation and the rgw pool creation, so the autoscaler acted on just the rbd pool; the rgw pools were created later, with minimum sizes. If rbd and cephfs metadata are the only pools, autoscaling to 0.49 should result in 128 PGs for rbd. Possibly in earlier tests the rbd pool was created when there were 0 or 1 OSDs, resulting in the minimum (32 PGs) for it.

Travis Nielsen, 12:30 PM
  The delay for rgw pool creation isn't typically more than a minute or two. Perhaps the delay was larger than usual for this cluster? It's just surprising that we haven't seen this before.

Josh Durgin, 12:31 PM
  Agreed, I'm surprised this is the first time we're hitting this.

There is always a delay during pool creation while OCS is being set up, so the question remains why this is happening on the IBM cluster when we haven't seen this behavior before.
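If it helps to confirm the autoscaler behavior on an affected cluster, the per-pool PG targets can be inspected from the toolbox (a diagnostic sketch only; it assumes the rook-ceph-tools deployment is present in openshift-storage):

  # Show each pool's current pg_num, target ratio, and the autoscaler's target as the pg_autoscaler sees them
  $ oc -n openshift-storage rsh deploy/rook-ceph-tools ceph osd pool autoscale-status

  # Current pg_num of the block pool that was scaled to 128 PGs
  $ oc -n openshift-storage rsh deploy/rook-ceph-tools ceph osd pool get ocs-storagecluster-cephblockpool pg_num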
Is there a way to recover from this scenario for this cluster?
(In reply to Yaniv Kaul from comment #2)
> Have you also tried with OCS 4.6?

IBM ROKS has tried with both OCS 4.5 and OCS 4.6; this is the first time we are hitting the issue. It happened on one customer cluster with OCS 4.5, so it is not clear whether it can be reproduced (based on Travis' comment).
The IBM team has reported that they hit this issue again with OCS 4.6 as well.
*** Bug 1900910 has been marked as a duplicate of this bug. ***
Some further information on the issue on IBM ROKS:

- Seen on multiple clusters in the EU region when 2 or more zones are used. (A single-zone cluster deployed successfully, as did a cluster with worker nodes from the eu-de1 and eu-de2 zones; however, deployment failed when using nodes from eu-de1 and eu-de3.)
- Seen with both OCS 4.5 and OCS 4.6.
Per recommendation from Josh, we should just increase the limit of PGs per OSD.

This override needs to be set in the OCS operator along with the other ceph overrides:
https://github.com/openshift/ocs-operator/blob/287d69621ee400034119bb39b769d79b26dd1e5b/controllers/storagecluster/reconcile.go#L45-L49

Increasing the setting to 280 (the default is 250) would get us over the limit, but perhaps we should round up to 300 to give some buffer.
@Josh Any concerns with this default in OCS?

mon_max_pg_per_osd = 300
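For a cluster that is already stuck, a manual override along these lines should unblock the object store reconcile until the operator-side default lands (a workaround sketch only, not the operator fix; it assumes the rook-ceph-tools toolbox deployment and the standard app=rook-ceph-operator label):

  # Raise the per-OSD PG limit on the running cluster
  $ oc -n openshift-storage rsh deploy/rook-ceph-tools ceph config set global mon_max_pg_per_osd 300

  # Nudge rook to retry the CephObjectStore reconcile
  $ oc -n openshift-storage delete pod -l app=rook-ceph-operator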
(In reply to Travis Nielsen from comment #12)
> Per recommendation from Josh, we should just increase the limit of PGs per OSD.
>
> This override needs to be set in the OCS operator along with the other ceph overrides:
> https://github.com/openshift/ocs-operator/blob/287d69621ee400034119bb39b769d79b26dd1e5b/controllers/storagecluster/reconcile.go#L45-L49
>
> Increasing the setting to 280 (the default is 250) would get us over the limit, but perhaps we should round up to 300 to give some buffer.
> @Josh Any concerns with this default in OCS?
>
> mon_max_pg_per_osd = 300

No concerns from me.
This was verified on OCS 4.6 here:

For OCS 4.7 I am running a tier1 execution here:
https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/176/

The problem is that on IBM Cloud I can only install OCP 4.5, so that is an unsupported deployment, and I am not sure whether installing OCS 4.7 on top of OCP 4.5 will succeed. OCP 4.6 should be available in about 2 weeks.
Just checking the 4.7 cluster on OCP 4.5, and I see these pods running:

$ oc get csv -n openshift-storage
NAME                         DISPLAY                       VERSION        REPLACES   PHASE
ocs-operator.v4.7.0-250.ci   OpenShift Container Storage   4.7.0-250.ci              Succeeded

$ oc get pod -n openshift-storage
NAME                                                              READY   STATUS      RESTARTS   AGE
10243128100-debug                                                 1/1     Running     0          3m51s
10243128101-debug                                                 1/1     Running     0          3m52s
1024312899-debug                                                  1/1     Running     0          3m52s
csi-cephfsplugin-2qlkd                                            3/3     Running     0          153m
csi-cephfsplugin-8xcdx                                            3/3     Running     0          153m
csi-cephfsplugin-provisioner-697dfb4d67-5xtck                     6/6     Running     0          153m
csi-cephfsplugin-provisioner-697dfb4d67-zn7fh                     6/6     Running     0          153m
csi-cephfsplugin-qlj7j                                            3/3     Running     0          153m
csi-rbdplugin-55gmx                                               3/3     Running     0          153m
csi-rbdplugin-provisioner-79488647bb-kd4xv                        6/6     Running     0          153m
csi-rbdplugin-provisioner-79488647bb-xnd9k                        6/6     Running     0          153m
csi-rbdplugin-q8q74                                               3/3     Running     0          153m
csi-rbdplugin-rb8gm                                               3/3     Running     0          153m
must-gather-8hjrx-helper                                          1/1     Running     0          3m52s
noobaa-core-0                                                     1/1     Running     0          140m
noobaa-db-pg-0                                                    1/1     Running     0          140m
noobaa-endpoint-798ff969bd-mrj7q                                  1/1     Running     0          61m
noobaa-endpoint-798ff969bd-qg5zv                                  1/1     Running     0          138m
noobaa-operator-cc5cb6f5-6l4zb                                    1/1     Running     0          154m
ocs-metrics-exporter-76bff567d9-4fkzb                             1/1     Running     0          154m
ocs-operator-7997c9657d-hvsj4                                     1/1     Running     0          154m
pv-backingstore-9c16562b79ee4cb48711705c-noobaa-pod-c238c73c      1/1     Running     0          6m45s
rook-ceph-crashcollector-10.243.128.100-d9f9944d5-d25s8           1/1     Running     0          147m
rook-ceph-crashcollector-10.243.128.101-7c89b58844-m2764          1/1     Running     0          152m
rook-ceph-crashcollector-10.243.128.99-68fd459776-pstjv           1/1     Running     0          145m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-7cf9cf47vbvrr   2/2     Running     0          139m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-597c7b58j7lc7   2/2     Running     0          139m
rook-ceph-mgr-a-78b645f4fd-7l8kw                                  2/2     Running     0          144m
rook-ceph-mon-a-7f4c4586d7-fmr9l                                  2/2     Running     0          152m
rook-ceph-mon-b-5664dc84cc-vxk5c                                  2/2     Running     0          147m
rook-ceph-mon-c-f5c59d475-zgdnm                                   2/2     Running     0          145m
rook-ceph-operator-8446c87b68-bxj5l                               1/1     Running     0          154m
rook-ceph-osd-0-9b9887f7f-6bjp2                                   2/2     Running     0          140m
rook-ceph-osd-1-77b687965f-zcvns                                  2/2     Running     0          140m
rook-ceph-osd-2-c9675cc5c-6qnsf                                   2/2     Running     0          140m
rook-ceph-osd-prepare-ocs-deviceset-0-data-0j7gr8-wtk5r           0/1     Completed   0          144m
rook-ceph-osd-prepare-ocs-deviceset-1-data-092pvq-vc85h           0/1     Completed   0          144m
rook-ceph-osd-prepare-ocs-deviceset-2-data-0ng7b2-w2bzr           0/1     Completed   0          144m
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-64bbc5dkxj6k   2/2     Running     0          139m
rook-ceph-tools-7dcc6577d9-k6glg                                  1/1     Running     0          139m

I see only one RGW pod in 4.7 instead of two as in 4.6, but this was changed intentionally between the versions, so I am marking this as verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2041