Bug 1914979

Summary: [GSS][VMWare][ROKS] rgw pods are not showing up in OCS 4.5 - due to pg_limit issue
Product: [Red Hat Storage] Red Hat OpenShift Container Storage
Reporter: Bipin Kunal <bkunal>
Component: ocs-operator
Assignee: Jose A. Rivera <jarrpa>
Status: CLOSED ERRATA
QA Contact: Petr Balogh <pbalogh>
Severity: high
Priority: high
Version: 4.5
CC: akgunjal, assingh, bkunal, ebenahar, edonnell, gsitlani, jarrpa, jthottan, kvellalo, madam, mmanjuna, muagarwa, nberry, ocs-bugs, owasserm, pbalogh, pdhange, sabose, sostapov, tnielsen
Keywords: AutomationBackLog, ZStream
Target Release: OCS 4.6.2
Hardware: Unspecified   
OS: Unspecified   
Doc Type: Bug Fix
Doc Text:
Previously, a race condition with the Red Hat Ceph Storage PG autoscaler caused 128 PGs to be created instead of the default 32. The extra PGs pushed the cluster against the per-OSD PG limit, so RADOS Object Gateway (RGW) pods failed to come up. With this update, the limit of PGs per OSD is raised from 250 to 300. This allows additional pools to be created in small clusters, avoiding the RGW pod failures.
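The PG-limit arithmetic behind this fix can be sketched in Python. As an assumption based on general Ceph monitor behavior (not stated in this bug), pool creation is refused when the projected number of PG replicas, the sum of pg_num * size over all pools, would exceed mon_max_pg_per_osd * the number of "in" OSDs. The sketch is simplified, and the pool names, PG counts, and function name are hypothetical, chosen only to show how one pool inflated from 32 to 128 PGs can block a later pool at a limit of 250 but not at 300.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Pool:
    name: str
    pg_num: int
    size: int  # replica count: each PG is stored this many times


def check_new_pool(existing: List[Pool], new: Pool, num_in_osds: int,
                   mon_max_pg_per_osd: int) -> Tuple[bool, int, int]:
    """Simplified (hypothetical) model of the monitor's pool-creation PG check.

    Each pool contributes pg_num * size PG replicas; the cluster-wide budget
    is mon_max_pg_per_osd replicas per "in" OSD.
    """
    projected = sum(p.pg_num * p.size for p in existing) + new.pg_num * new.size
    budget = mon_max_pg_per_osd * num_in_osds
    return projected <= budget, projected, budget


# Hypothetical 3-OSD cluster with 3x replication. "pool-a" stands in for a pool
# that the autoscaler race left at 128 PGs instead of 32.
existing = [
    Pool("pool-a", 128, 3),
    Pool("pool-b", 32, 3),
    Pool("pool-c", 32, 3),
    Pool("pool-d", 32, 3),
]
rgw_pool = Pool("hypothetical-rgw-pool", 32, 3)

for limit in (250, 300):
    ok, projected, budget = check_new_pool(existing, rgw_pool, 3, limit)
    print(f"limit={limit}: projected={projected}, budget={budget}, allowed={ok}")
# limit=250: projected=768, budget=750, allowed=False
# limit=300: projected=768, budget=900, allowed=True
```

With three OSDs and 3x replication, every PG replica lands on every OSD, so in this model the limit effectively caps the total pg_num across all pools at the per-OSD limit itself.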
Clone Of: 1908414
Last Closed: 2021-02-01 13:18:34 UTC
Bug Depends On: 1908414    

Comment 3 Travis Nielsen 2021-01-11 21:00:04 UTC
Moving to the OCS operator to apply the Ceph setting override for PGs. See this comment for details: https://bugzilla.redhat.com/show_bug.cgi?id=1908414#c12
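In Rook, Ceph setting overrides are conventionally supplied through a ConfigMap named rook-config-override whose `config` key holds ceph.conf-style text. The Python sketch below builds such a manifest, assuming the override meant here is `mon_max_pg_per_osd = 300` in the `[global]` section (the value comes from the Doc Text) and the openshift-storage namespace; the exact content the OCS operator writes is not shown in this bug, and the helper names are hypothetical. Because the operator may reconcile this ConfigMap, treat the output as an illustration of the format, not as a manual workaround.

```python
import configparser
import io
import json


def render_ceph_override(settings: dict) -> str:
    """Render settings as a ceph.conf-style [global] section."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names exactly as written
    parser["global"] = {key: str(value) for key, value in settings.items()}
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()


def rook_config_override(settings: dict, namespace: str = "openshift-storage") -> dict:
    """Build a ConfigMap manifest that carries Ceph overrides for Rook."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": "rook-config-override", "namespace": namespace},
        "data": {"config": render_ceph_override(settings)},
    }


manifest = rook_config_override({"mon_max_pg_per_osd": 300})
print(json.dumps(manifest, indent=2))
```

Generating the INI text with configparser keeps the section header and `key = value` layout consistent if more overrides are added later.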

Comment 9 Petr Balogh 2021-01-27 14:19:08 UTC
Connected to one of the clusters where we ran the latest tier4a execution with the 4.6.2 RC build:
https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/qe-deploy-ocs-cluster/16709/

The RGW pods are present:
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-6895cbdp24b7   1/1     Running     0          26h
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-b-ccc44cdxcdqc   1/1     Running     0          26h


$ oc get pod -n openshift-storage
NAME                                                              READY   STATUS      RESTARTS   AGE
1024312878-debug                                                  0/1     Completed   0          154m
1024312879-debug                                                  0/1     Completed   0          154m
1024312880-debug                                                  0/1     Completed   0          154m
csi-cephfsplugin-provisioner-6c49c688b7-42sxb                     6/6     Running     0          3h42m
csi-cephfsplugin-provisioner-6c49c688b7-tc2nj                     6/6     Running     0          26h
csi-cephfsplugin-q9vps                                            3/3     Running     0          26h
csi-cephfsplugin-w9jrj                                            3/3     Running     0          26h
csi-cephfsplugin-zt9vj                                            3/3     Running     0          4h1m
csi-rbdplugin-provisioner-d7c77f88d-nsk8v                         6/6     Running     0          3h52m
csi-rbdplugin-provisioner-d7c77f88d-wrjzp                         6/6     Running     0          26h
csi-rbdplugin-ps7rw                                               3/3     Running     0          26h
csi-rbdplugin-wk5zr                                               3/3     Running     0          26h
csi-rbdplugin-xwlm9                                               3/3     Running     0          4h10m
noobaa-core-0                                                     1/1     Running     0          26h
noobaa-db-0                                                       1/1     Running     0          26h
noobaa-endpoint-79bd7f7dfd-x4hnd                                  1/1     Running     0          26h
noobaa-operator-6ddc85d449-b9k5m                                  1/1     Running     0          26h
ocs-metrics-exporter-56f646bc5d-knd9t                             1/1     Running     0          26h
ocs-operator-78bb659978-6nsjw                                     1/1     Running     0          26h
rook-ceph-crashcollector-10.243.128.78-5f69844d7b-bpknw           1/1     Running     0          26h
rook-ceph-crashcollector-10.243.128.79-84b97b7455-z697c           1/1     Running     0          26h
rook-ceph-crashcollector-10.243.128.80-d5c44779-rdlml             1/1     Running     0          26h
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-6985d89dpszdl   1/1     Running     5          26h
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-7f9db97bgx5d9   1/1     Running     0          26h
rook-ceph-mgr-a-75cffd7ff-64tvn                                   1/1     Running     0          4h31m
rook-ceph-mon-a-698575bc89-29b5b                                  1/1     Running     10         26h
rook-ceph-mon-b-7d7cf694cd-84r8z                                  1/1     Running     0          26h
rook-ceph-mon-c-855b7cfdcc-p24z6                                  1/1     Running     0          26h
rook-ceph-operator-5988f7dcff-m6hhn                               1/1     Running     0          26h
rook-ceph-osd-0-59cc65bdf8-lsxg9                                  1/1     Running     0          4h21m
rook-ceph-osd-1-547fdb9dcf-xtfn2                                  1/1     Running     0          26h
rook-ceph-osd-2-7b446cf45-pp9xd                                   1/1     Running     0          26h
rook-ceph-osd-prepare-ocs-deviceset-0-data-0-7nfr6-vxszb          0/1     Completed   0          26h
rook-ceph-osd-prepare-ocs-deviceset-1-data-0-rq6ff-p54m5          0/1     Completed   0          26h
rook-ceph-osd-prepare-ocs-deviceset-2-data-0-xh2bh-nz767          0/1     Completed   0          26h
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-6895cbdp24b7   1/1     Running     0          26h
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-b-ccc44cdxcdqc   1/1     Running     0          26h
rook-ceph-tools-d87986957-sph5q                                   1/1     Running     0          26h

$ oc get csv
NAME                         DISPLAY                       VERSION        REPLACES   PHASE
ocs-operator.v4.6.2-233.ci   OpenShift Container Storage   4.6.2-233.ci              Succeeded

$ oc version
Client Version: 4.5.0-0.nightly-2020-12-05-205859
Server Version: 4.5.24
Kubernetes Version: v1.18.3+fa69cae

@akgunjal.com will provide more information after the IBM Cloud team finishes testing, and then we can move this bug to VERIFIED. Based on what I see above, it looks OK.
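The check done by eye in this comment (RGW pods present, Running, and fully ready) could be scripted. A minimal Python sketch, assuming the default `oc get pod` column layout (NAME READY STATUS RESTARTS AGE) and the `rook-ceph-rgw-` name prefix seen in this listing; the function name is hypothetical.

```python
from typing import List


def ready_rgw_pods(pod_listing: str) -> List[str]:
    """Return rook-ceph-rgw pods that are Running with all containers ready."""
    ready = []
    for line in pod_listing.splitlines():
        fields = line.split()
        # Expected columns: NAME READY STATUS RESTARTS AGE
        if len(fields) < 3 or not fields[0].startswith("rook-ceph-rgw-"):
            continue
        name, ready_col, status = fields[0], fields[1], fields[2]
        up, _, total = ready_col.partition("/")  # e.g. "1/1" -> "1", "1"
        if status == "Running" and up == total:
            ready.append(name)
    return ready


sample = """\
NAME                                                              READY   STATUS      RESTARTS   AGE
rook-ceph-osd-2-7b446cf45-pp9xd                                   1/1     Running     0          26h
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-6895cbdp24b7   1/1     Running     0          26h
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-b-ccc44cdxcdqc   1/1     Running     0          26h
"""

print(ready_rgw_pods(sample))
```

In practice the input would be the stdout of `oc get pod -n openshift-storage`; parsing `-o json` output instead would be sturdier than splitting columns.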

Comment 11 akgunjal@in.ibm.com 2021-01-28 09:41:35 UTC
@petr: We have verified this fix in the EU region, and it works fine.

Comment 15 errata-xmlrpc 2021-02-01 13:18:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Container Storage 4.6.2 container bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0305