Bug 1908414

Summary: [GSS][VMWare][ROKS] rgw pods are not showing up in OCS 4.5 - due to pg_limit issue
Product: [Red Hat Storage] Red Hat OpenShift Container Storage
Reporter: Kesavan <kvellalo>
Component: ocs-operator
Assignee: Jose A. Rivera <jarrpa>
Status: CLOSED ERRATA
QA Contact: Petr Balogh <pbalogh>
Severity: high
Priority: high
Docs Contact:
Version: 4.5
CC: akgunjal, assingh, bkunal, ebenahar, gsitlani, jdurgin, jthottan, madam, mmanjuna, muagarwa, nberry, ocs-bugs, owasserm, pdhange, sabose, sostapov, tdesala, tnielsen
Target Milestone: ---
Keywords: AutomationBackLog
Target Release: OCS 4.7.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: 4.7.0-701.ci
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Cloned to: 1914979 (view as bug list)
Environment:
Last Closed: 2021-05-19 09:17:08 UTC
Type: Bug
Bug Blocks: 1914979    
Attachments:
Attachment 1739692 - Attachment contains the rook operator logs (flags: none)

Description Kesavan 2020-12-16 16:15:03 UTC
Created attachment 1739692 [details]
Attachment contains the rook operator logs

Description of problem (please be as detailed as possible and provide log
snippets):
When installing OCS on an IBM VPC cluster, the RGW pods fail to show up in the
openshift-storage namespace and the StorageCluster gets stuck in the Progressing phase.

Version of all relevant components (if applicable):

OCP : 4.5.18
OCS : 4.5.2

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes, OCS does not install successfully.

Is there any workaround available to the best of your knowledge?
no

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
yes

Can this issue be reproduced from the UI?
No

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Install OCS 4.5 on an IBM VPC cluster running OCP 4.5.


Actual results:
RGW pods are not found

Expected results:
RGW pods need to be running.

Additional info:

Snippet of rook operator log:
2020-12-13 09:54:09.861163 I | cephclient: setting pool property "pg_num_min" to "8" on pool "ocs-storagecluster-cephobjectstore.rgw.buckets.index"
2020-12-13 09:54:10.894275 I | cephclient: setting pool property "pg_num_min" to "8" on pool "ocs-storagecluster-cephobjectstore.rgw.buckets.non-ec"
2020-12-13 09:54:11.925748 I | cephclient: setting pool property "pg_num_min" to "8" on pool ".rgw.root"
2020-12-13 09:54:12.159494 I | op-mon: parsing mon endpoints: b=172.21.208.140:6789,c=172.21.220.182:6789,a=172.21.93.55:6789
2020-12-13 09:54:12.159572 I | ceph-object-store-user-controller: CephObjectStore "ocs-storagecluster-cephobjectstore" found
2020-12-13 09:54:13.962642 E | ceph-object-controller: failed to reconcile failed to create object store deployments: failed to create object pools: failed to create data pool: failed to create pool ocs-storagecluster-cephobjectstore.rgw.buckets.data for object store ocs-storagecluster-cephobjectstore.: failed to create replicated pool ocs-storagecluster-cephobjectstore.rgw.buckets.data. Error ERANGE:  pg_num 32 size 3 would mean 816 total pgs, which exceeds max 750 (mon_max_pg_per_osd 250 * num_in_osds 3)
: exit status 34


Ceph status:
ceph status
  cluster:
    id:     8d52a259-29a5-4220-aa22-9d031aa542d2
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum a,b,c (age 7h)
    mgr: a(active, since 2d)
    mds: ocs-storagecluster-cephfilesystem:1 {0=ocs-storagecluster-cephfilesystem-b=up:active} 1 up:standby-replay
    osd: 3 osds: 3 up (since 7h), 3 in (since 2d)
 
  task status:
    scrub status:
        mds.ocs-storagecluster-cephfilesystem-a: idle
        mds.ocs-storagecluster-cephfilesystem-b: idle
 
  data:
    pools:   9 pools, 240 pgs
    objects: 380 objects, 1.1 GiB
    usage:   6.7 GiB used, 143 GiB / 150 GiB avail
    pgs:     240 active+clean
 
  io:
    client:   1.2 KiB/s rd, 7.3 KiB/s wr, 2 op/s rd, 0 op/s wr


Available Pools with PGs:
ocs-storagecluster-cephblockpool                         128
ocs-storagecluster-cephfilesystem-metadata               32
ocs-storagecluster-cephobjectstore.rgw.control           8
ocs-storagecluster-cephfilesystem-data0                  32
ocs-storagecluster-cephobjectstore.rgw.meta              8
ocs-storagecluster-cephobjectstore.rgw.log               8
ocs-storagecluster-cephobjectstore.rgw.buckets.index     8
ocs-storagecluster-cephobjectstore.rgw.buckets.non-ec    8
.rgw.root                                                8
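
For context, here is my own back-of-the-envelope check of the failure (as I understand it, the mon rejects a pool creation when the projected number of PG replicas would exceed mon_max_pg_per_osd * num_in_osds):

  existing pools:   240 PGs x replica size 3           = 720 PG replicas
  new data pool:    pg_num 32 x replica size 3         =  96 PG replicas
  projected total:  720 + 96                           = 816
  allowed maximum:  mon_max_pg_per_osd 250 x 3 in OSDs = 750

816 > 750, hence the ERANGE (exit status 34) seen in the operator log above.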

Comment 2 Yaniv Kaul 2020-12-16 17:52:48 UTC
Have you tried also with OCS 4.6?

Comment 3 Travis Nielsen 2020-12-16 22:02:03 UTC
To summarize the discussion with Josh, the issue is that the autoscaler is scaling the block pool up to 128 PGs unexpectedly. Then we hit the PG limit and the object store cannot complete its initialization.

Josh Durgin, 12:26 PM
my guess is there was a delay between rbd pool creation and rgw pools being created
so the autoscaler acted on just the rbd pool
then later rgw pools were created, with minimum sizes
if rbd and cephfs metadata are the only pools, autoscaling to 0.49 should result in 128 pgs for rbd
possibly in earlier tests the rbd pool was created when there were 0 or 1 osd, resulting in the minimum (32 pgs) for it

Travis Nielsen, 12:30 PM
The delay for rgw pool creation isn't typically more than a minute or two. Perhaps the delay was larger than usual for this cluster? It's just surprising that we haven't seen this before.

Josh Durgin, 12:31 PM
agreed, I'm surprised this is the first time we're hitting this



There is always a delay during pool creation when OCS is being set up, so the question remains why this happened on the IBM cluster when we haven't seen this behavior before.
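
For anyone looking at a cluster in this state, the autoscaler's view of the pools can be checked from the rook-ceph toolbox with standard Ceph commands (output columns vary slightly between Ceph releases):

  ceph osd pool autoscale-status
  ceph osd pool get ocs-storagecluster-cephblockpool pg_num
  ceph config get mon mon_max_pg_per_osd

The first command shows the current and proposed pg_num per pool, which should make it clear whether the autoscaler grew the block pool before the RGW pools were created.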

Comment 4 Sahina Bose 2020-12-17 06:08:31 UTC
Is there a way to recover from this scenario for this cluster?
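
One possible manual recovery, assuming the diagnosis above is right (my suggestion only, not something confirmed on the affected cluster), would be to raise the per-OSD PG limit from the rook-ceph toolbox so the remaining RGW pool can be created:

  ceph config set global mon_max_pg_per_osd 300

The rook operator should then be able to create ocs-storagecluster-cephobjectstore.rgw.buckets.data on its next reconcile. The operator-level fix discussed below makes this the default.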

Comment 5 Sahina Bose 2020-12-17 06:10:24 UTC
(In reply to Yaniv Kaul from comment #2)
> Have you tried also with OCS 4.6?

IBM ROKS has tried both OCS 4.5 and OCS 4.6; this is the first time we are hitting the issue. It happened on one customer cluster with OCS 4.5,
so I am not sure whether it can be reproduced (based on Travis' comment).

Comment 6 Sahina Bose 2020-12-23 08:00:37 UTC
The IBM team has reported that they have hit this issue again with OCS 4.6 as well.

Comment 8 Travis Nielsen 2021-01-04 23:13:16 UTC
*** Bug 1900910 has been marked as a duplicate of this bug. ***

Comment 9 Sahina Bose 2021-01-06 06:44:32 UTC
Some further information on the issue on IBM ROKS:
- Seen on multiple clusters in the EU region when 2 or more zones are used (a single-zone cluster deployed successfully, as did a cluster with worker nodes from the eu-de1 and eu-de2 zones; however, it failed when using nodes from eu-de1 and eu-de3)
- Seen with both OCS 4.5 and OCS 4.6

Comment 12 Travis Nielsen 2021-01-11 20:52:31 UTC
Per recommendation from Josh, we should just increase the limit of PGs per OSD. 

This override needs to be set in the OCS operator along with other ceph overrides:
https://github.com/openshift/ocs-operator/blob/287d69621ee400034119bb39b769d79b26dd1e5b/controllers/storagecluster/reconcile.go#L45-L49

Increasing the setting to 280 (default is 250) would get us over the limit, but perhaps we should round off to 300 to give some buffer. 
@Josh Any concerns with this default in OCS?

mon_max_pg_per_osd = 300
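
As a rough sketch, the override would end up as an entry in the ceph.conf content that the OCS operator writes into the rook-config-override ConfigMap (the exact Go constant lives in the reconcile.go lines linked above; this is only an illustration of the resulting configuration):

  [global]
  mon_max_pg_per_osd = 300

Rook applies this override to the Ceph daemons, so new deployments start with the raised limit.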

Comment 13 Josh Durgin 2021-01-11 21:22:04 UTC
(In reply to Travis Nielsen from comment #12)
> Per recommendation from Josh, we should just increase the limit of PGs per
> OSD. 
> 
> This override needs to be set in the OCS operator along with other ceph
> overrides:
> https://github.com/openshift/ocs-operator/blob/
> 287d69621ee400034119bb39b769d79b26dd1e5b/controllers/storagecluster/
> reconcile.go#L45-L49
> 
> Increasing the setting to 280 (default is 250) would get us over the limit,
> but perhaps we should round off to 300 to give some buffer. 
> @Josh Any concerns with this default in OCS?
> 
> mon_max_pg_per_osd = 300

No concerns from me.

Comment 16 Petr Balogh 2021-02-03 13:10:57 UTC
This was verified on OCS 4.6 here:

For OCS 4.7 I am running tier1 execution here:
https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/176/

The problem is that on IBM Cloud I can only install OCP 4.5, which is an unsupported deployment, and I am not sure whether installing OCS 4.7 on top of OCP 4.5 will succeed.

OCP 4.6 should be available in about two weeks.

Comment 17 Petr Balogh 2021-02-03 16:49:15 UTC
Just checked the 4.7 cluster on OCP 4.5 and I see these pods running:

$ oc get csv -n openshift-storage
NAME                         DISPLAY                       VERSION        REPLACES   PHASE
ocs-operator.v4.7.0-250.ci   OpenShift Container Storage   4.7.0-250.ci              Succeeded


$ oc get pod -n openshift-storage
NAME                                                              READY   STATUS      RESTARTS   AGE
10243128100-debug                                                 1/1     Running     0          3m51s
10243128101-debug                                                 1/1     Running     0          3m52s
1024312899-debug                                                  1/1     Running     0          3m52s
csi-cephfsplugin-2qlkd                                            3/3     Running     0          153m
csi-cephfsplugin-8xcdx                                            3/3     Running     0          153m
csi-cephfsplugin-provisioner-697dfb4d67-5xtck                     6/6     Running     0          153m
csi-cephfsplugin-provisioner-697dfb4d67-zn7fh                     6/6     Running     0          153m
csi-cephfsplugin-qlj7j                                            3/3     Running     0          153m
csi-rbdplugin-55gmx                                               3/3     Running     0          153m
csi-rbdplugin-provisioner-79488647bb-kd4xv                        6/6     Running     0          153m
csi-rbdplugin-provisioner-79488647bb-xnd9k                        6/6     Running     0          153m
csi-rbdplugin-q8q74                                               3/3     Running     0          153m
csi-rbdplugin-rb8gm                                               3/3     Running     0          153m
must-gather-8hjrx-helper                                          1/1     Running     0          3m52s
noobaa-core-0                                                     1/1     Running     0          140m
noobaa-db-pg-0                                                    1/1     Running     0          140m
noobaa-endpoint-798ff969bd-mrj7q                                  1/1     Running     0          61m
noobaa-endpoint-798ff969bd-qg5zv                                  1/1     Running     0          138m
noobaa-operator-cc5cb6f5-6l4zb                                    1/1     Running     0          154m
ocs-metrics-exporter-76bff567d9-4fkzb                             1/1     Running     0          154m
ocs-operator-7997c9657d-hvsj4                                     1/1     Running     0          154m
pv-backingstore-9c16562b79ee4cb48711705c-noobaa-pod-c238c73c      1/1     Running     0          6m45s
rook-ceph-crashcollector-10.243.128.100-d9f9944d5-d25s8           1/1     Running     0          147m
rook-ceph-crashcollector-10.243.128.101-7c89b58844-m2764          1/1     Running     0          152m
rook-ceph-crashcollector-10.243.128.99-68fd459776-pstjv           1/1     Running     0          145m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-7cf9cf47vbvrr   2/2     Running     0          139m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-597c7b58j7lc7   2/2     Running     0          139m
rook-ceph-mgr-a-78b645f4fd-7l8kw                                  2/2     Running     0          144m
rook-ceph-mon-a-7f4c4586d7-fmr9l                                  2/2     Running     0          152m
rook-ceph-mon-b-5664dc84cc-vxk5c                                  2/2     Running     0          147m
rook-ceph-mon-c-f5c59d475-zgdnm                                   2/2     Running     0          145m
rook-ceph-operator-8446c87b68-bxj5l                               1/1     Running     0          154m
rook-ceph-osd-0-9b9887f7f-6bjp2                                   2/2     Running     0          140m
rook-ceph-osd-1-77b687965f-zcvns                                  2/2     Running     0          140m
rook-ceph-osd-2-c9675cc5c-6qnsf                                   2/2     Running     0          140m
rook-ceph-osd-prepare-ocs-deviceset-0-data-0j7gr8-wtk5r           0/1     Completed   0          144m
rook-ceph-osd-prepare-ocs-deviceset-1-data-092pvq-vc85h           0/1     Completed   0          144m
rook-ceph-osd-prepare-ocs-deviceset-2-data-0ng7b2-w2bzr           0/1     Completed   0          144m
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-64bbc5dkxj6k   2/2     Running     0          139m
rook-ceph-tools-7dcc6577d9-k6glg                                  1/1     Running     0          139m

I see only one RGW pod in 4.7 instead of two as in 4.6, but the RGW pod count was changed between the versions, so I am marking this as verified.
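
As a side note, a quicker way to confirm the RGW pods than scanning the whole pod list is to filter by the usual rook label (assuming the label has not changed between releases):

  oc -n openshift-storage get pods -l app=rook-ceph-rgw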

Comment 20 errata-xmlrpc 2021-05-19 09:17:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2041