1914979 – [GSS][VMWare][ROKS] rgw pods are not showing up in OCS 4.5 - due to pg_limit issue


Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenShift Container Storage
Classification:	Red Hat Storage
Component:	ocs-operator
Sub Component:
Version:	4.5
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	OCS 4.6.2
Assignee:	Jose A. Rivera
QA Contact:	Petr Balogh
Docs Contact:
URL:
Whiteboard:
Depends On:	1908414
Blocks:

Reported:	2021-01-11 16:10 UTC by Bipin Kunal
Modified:	2022-11-21 03:07 UTC
CC List:	20 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Previously, a race condition with the Red Hat Ceph Storage PG autoscaler caused the creation of 128 PGs instead of the default 32. On small clusters this ran into the limit of PGs per OSD, so RADOS Object Gateway (RGW) pods would fail to come up. With this update, the limit of PGs per OSD is 300 rather than 250. This allows the additional pools to be created in small clusters, avoiding the RGW pod failures.
Clone Of:	1908414
Environment:
Last Closed:	2021-02-01 13:18:34 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift ocs-operator pull 989	0	None	closed	Adjust mon_max_pg_per_osd for Rook-Ceph	2021-02-15 09:14:58 UTC
Red Hat Product Errata	RHBA-2021:0305	0	None	None	None	2021-02-01 13:18:49 UTC

Comment 3 Travis Nielsen 2021-01-11 21:00:04 UTC

Moving to the OCS operator to apply the ceph setting override for PGs. See this comment for details: https://bugzilla.redhat.com/show_bug.cgi?id=1908414#c12
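
Editor's illustration of why raising the limit helps. This is a minimal sketch, assuming the monitor rejects a new pool when the projected total of PG replicas exceeds mon_max_pg_per_osd multiplied by the number of OSDs. The function names and every number below are hypothetical and are not taken from the affected cluster.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers, not part of Ceph, Rook, or ocs-operator.

# pg_replicas_per_osd TOTAL_PG_REPLICAS NUM_OSDS
# Prints the average number of PG replicas per OSD, rounded up.
pg_replicas_per_osd() {
  local total=$1 osds=$2
  echo $(( (total + osds - 1) / osds ))
}

# pool_fits EXISTING_PG_REPLICAS NEW_PG_NUM NEW_POOL_SIZE NUM_OSDS LIMIT
# Returns 0 when the projected PG replicas stay within LIMIT * NUM_OSDS.
pool_fits() {
  local existing=$1 pg_num=$2 size=$3 osds=$4 limit=$5
  local projected=$(( existing + pg_num * size ))
  (( projected <= limit * osds ))
}

# Hypothetical small cluster: 3 OSDs, 720 PG replicas already placed,
# and one more pool with 32 PGs at replica size 3 still to be created.
for limit in 250 300; do
  if pool_fits 720 32 3 3 "$limit"; then
    echo "limit $limit: pool fits"
  else
    echo "limit $limit: pool rejected"
  fi
done
```

With these made-up numbers the projected load is 816 PG replicas, or 272 per OSD, which is above 250 but below 300. That is the kind of gap the override discussed in this bug is meant to close.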

Comment 9 Petr Balogh 2021-01-27 14:19:08 UTC

Connected to one of the clusters where we ran the latest tier4a execution:
https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/qe-deploy-ocs-cluster/16709/

with the 4.6.2 RC build, and I see that the RGW pods are present:
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-6895cbdp24b7   1/1     Running     0          26h
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-b-ccc44cdxcdqc   1/1     Running     0          26h


$ oc get pod -n openshift-storage
NAME                                                              READY   STATUS      RESTARTS   AGE
1024312878-debug                                                  0/1     Completed   0          154m
1024312879-debug                                                  0/1     Completed   0          154m
1024312880-debug                                                  0/1     Completed   0          154m
csi-cephfsplugin-provisioner-6c49c688b7-42sxb                     6/6     Running     0          3h42m
csi-cephfsplugin-provisioner-6c49c688b7-tc2nj                     6/6     Running     0          26h
csi-cephfsplugin-q9vps                                            3/3     Running     0          26h
csi-cephfsplugin-w9jrj                                            3/3     Running     0          26h
csi-cephfsplugin-zt9vj                                            3/3     Running     0          4h1m
csi-rbdplugin-provisioner-d7c77f88d-nsk8v                         6/6     Running     0          3h52m
csi-rbdplugin-provisioner-d7c77f88d-wrjzp                         6/6     Running     0          26h
csi-rbdplugin-ps7rw                                               3/3     Running     0          26h
csi-rbdplugin-wk5zr                                               3/3     Running     0          26h
csi-rbdplugin-xwlm9                                               3/3     Running     0          4h10m
noobaa-core-0                                                     1/1     Running     0          26h
noobaa-db-0                                                       1/1     Running     0          26h
noobaa-endpoint-79bd7f7dfd-x4hnd                                  1/1     Running     0          26h
noobaa-operator-6ddc85d449-b9k5m                                  1/1     Running     0          26h
ocs-metrics-exporter-56f646bc5d-knd9t                             1/1     Running     0          26h
ocs-operator-78bb659978-6nsjw                                     1/1     Running     0          26h
rook-ceph-crashcollector-10.243.128.78-5f69844d7b-bpknw           1/1     Running     0          26h
rook-ceph-crashcollector-10.243.128.79-84b97b7455-z697c           1/1     Running     0          26h
rook-ceph-crashcollector-10.243.128.80-d5c44779-rdlml             1/1     Running     0          26h
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-6985d89dpszdl   1/1     Running     5          26h
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-7f9db97bgx5d9   1/1     Running     0          26h
rook-ceph-mgr-a-75cffd7ff-64tvn                                   1/1     Running     0          4h31m
rook-ceph-mon-a-698575bc89-29b5b                                  1/1     Running     10         26h
rook-ceph-mon-b-7d7cf694cd-84r8z                                  1/1     Running     0          26h
rook-ceph-mon-c-855b7cfdcc-p24z6                                  1/1     Running     0          26h
rook-ceph-operator-5988f7dcff-m6hhn                               1/1     Running     0          26h
rook-ceph-osd-0-59cc65bdf8-lsxg9                                  1/1     Running     0          4h21m
rook-ceph-osd-1-547fdb9dcf-xtfn2                                  1/1     Running     0          26h
rook-ceph-osd-2-7b446cf45-pp9xd                                   1/1     Running     0          26h
rook-ceph-osd-prepare-ocs-deviceset-0-data-0-7nfr6-vxszb          0/1     Completed   0          26h
rook-ceph-osd-prepare-ocs-deviceset-1-data-0-rq6ff-p54m5          0/1     Completed   0          26h
rook-ceph-osd-prepare-ocs-deviceset-2-data-0-xh2bh-nz767          0/1     Completed   0          26h
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-6895cbdp24b7   1/1     Running     0          26h
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-b-ccc44cdxcdqc   1/1     Running     0          26h
rook-ceph-tools-d87986957-sph5q                                   1/1     Running     0          26h

$ oc get csv
NAME                         DISPLAY                       VERSION        REPLACES   PHASE
ocs-operator.v4.6.2-233.ci   OpenShift Container Storage   4.6.2-233.ci              Succeeded

$ oc version
Client Version: 4.5.0-0.nightly-2020-12-05-205859
Server Version: 4.5.24
Kubernetes Version: v1.18.3+fa69cae

@akgunjal.com will provide more info after the IBM Cloud team's testing, and then we can move this to VERIFIED. Based on what I see above, it looks OK.
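
Editor's illustration of the RGW check performed in this comment. This is a minimal sketch, assuming the default table layout of `oc get pod` (NAME, READY, STATUS, ...). The function name count_running_rgw is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: count_running_rgw is a hypothetical helper, not an oc subcommand.

# Reads `oc get pod` table output on stdin and prints how many
# rook-ceph-rgw pods have STATUS Running with all containers ready.
count_running_rgw() {
  awk '$1 ~ /^rook-ceph-rgw-/ && $3 == "Running" {
         split($2, ready, "/")          # READY column, e.g. "1/1"
         if (ready[1] == ready[2]) n++
       }
       END { print n + 0 }'
}

# Usage (assumes a logged-in oc session):
#   oc get pod -n openshift-storage | count_running_rgw
```

A result of 0 on an affected cluster would correspond to the symptom in the bug title, where the RGW pods never show up.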

Comment 11 akgunjal@in.ibm.com 2021-01-28 09:41:35 UTC

@petr: We have verified this fix in the EU region and it works fine.

Comment 15 errata-xmlrpc 2021-02-01 13:18:34 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Container Storage 4.6.2 container bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0305
