Bug 1914979 - [GSS][VMWare][ROKS] rgw pods are not showing up in OCS 4.5 - due to pg_limit issue
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: ocs-operator
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: OCS 4.6.2
Assignee: Jose A. Rivera
QA Contact: Petr Balogh
URL:
Whiteboard:
Depends On: 1908414
Blocks:
Reported: 2021-01-11 16:10 UTC by Bipin Kunal
Modified: 2022-11-21 03:07 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, a race condition with the Red Hat Ceph Storage PG autoscaler caused the creation of 128 PGs instead of the default 32. As a result, RADOS Object Gateway (RGW) pods would fail to come up. With this update, the limit of PGs per OSD is 300 rather than 250. This allows additional pools to be created in small clusters, avoiding the RGW pod failures.
Clone Of: 1908414
Environment:
Last Closed: 2021-02-01 13:18:34 UTC
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift ocs-operator pull 989 | 0 | None | closed | Adjust mon_max_pg_per_osd for Rook-Ceph | 2021-02-15 09:14:58 UTC
Red Hat Product Errata | RHBA-2021:0305 | 0 | None | None | None | 2021-02-01 13:18:49 UTC

Comment 3 Travis Nielsen 2021-01-11 21:00:04 UTC
Moving to the OCS operator to apply the ceph setting override for PGs. See this comment for details: https://bugzilla.redhat.com/show_bug.cgi?id=1908414#c12
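
As a rough illustration of why the per-OSD PG limit matters on a small cluster, here is a minimal sketch. It assumes the monitor compares the total number of PG replicas across all pools, divided by the OSD count, against mon_max_pg_per_osd. The pool names, PG counts, and helper functions are hypothetical and are not taken from this bug.

```python
# Sketch only: pool names and PG counts are hypothetical, chosen to
# illustrate the 250 vs. 300 limit mentioned in this bug.

def projected_pgs_per_osd(pools, num_osds):
    """Average number of PG replicas per OSD: sum(pg_num * size) / OSDs."""
    total_replicas = sum(pg_num * size for pg_num, size in pools.values())
    return total_replicas / num_osds


def fits_limit(pools, num_osds, limit):
    """True if the projected PGs per OSD do not exceed the limit."""
    return projected_pgs_per_osd(pools, num_osds) <= limit


# Hypothetical 3-OSD cluster with replica size 3 on every pool.
# One pool ended up with 128 PGs instead of 32.
existing = {
    "pool-a": (128, 3),
    "pool-b": (32, 3),
    "pool-c": (32, 3),
    "pool-d": (32, 3),
}

# Adding one more 32-PG pool.
with_new_pool = dict(existing, **{"pool-e": (32, 3)})

print(projected_pgs_per_osd(with_new_pool, num_osds=3))
print(fits_limit(with_new_pool, num_osds=3, limit=250))
print(fits_limit(with_new_pool, num_osds=3, limit=300))
```

With these made-up numbers, the extra pool pushes the cluster past a limit of 250 but stays under 300, which is the kind of situation the override is meant to cover.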

Comment 9 Petr Balogh 2021-01-27 14:19:08 UTC
Connected to one of the clusters where we ran the latest tier4a execution:
https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/qe-deploy-ocs-cluster/16709/

With the 4.6.2 RC build, I see that the RGW pods are present:
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-6895cbdp24b7   1/1     Running     0          26h
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-b-ccc44cdxcdqc   1/1     Running     0          26h


$ oc get pod -n openshift-storage
NAME                                                              READY   STATUS      RESTARTS   AGE
1024312878-debug                                                  0/1     Completed   0          154m
1024312879-debug                                                  0/1     Completed   0          154m
1024312880-debug                                                  0/1     Completed   0          154m
csi-cephfsplugin-provisioner-6c49c688b7-42sxb                     6/6     Running     0          3h42m
csi-cephfsplugin-provisioner-6c49c688b7-tc2nj                     6/6     Running     0          26h
csi-cephfsplugin-q9vps                                            3/3     Running     0          26h
csi-cephfsplugin-w9jrj                                            3/3     Running     0          26h
csi-cephfsplugin-zt9vj                                            3/3     Running     0          4h1m
csi-rbdplugin-provisioner-d7c77f88d-nsk8v                         6/6     Running     0          3h52m
csi-rbdplugin-provisioner-d7c77f88d-wrjzp                         6/6     Running     0          26h
csi-rbdplugin-ps7rw                                               3/3     Running     0          26h
csi-rbdplugin-wk5zr                                               3/3     Running     0          26h
csi-rbdplugin-xwlm9                                               3/3     Running     0          4h10m
noobaa-core-0                                                     1/1     Running     0          26h
noobaa-db-0                                                       1/1     Running     0          26h
noobaa-endpoint-79bd7f7dfd-x4hnd                                  1/1     Running     0          26h
noobaa-operator-6ddc85d449-b9k5m                                  1/1     Running     0          26h
ocs-metrics-exporter-56f646bc5d-knd9t                             1/1     Running     0          26h
ocs-operator-78bb659978-6nsjw                                     1/1     Running     0          26h
rook-ceph-crashcollector-10.243.128.78-5f69844d7b-bpknw           1/1     Running     0          26h
rook-ceph-crashcollector-10.243.128.79-84b97b7455-z697c           1/1     Running     0          26h
rook-ceph-crashcollector-10.243.128.80-d5c44779-rdlml             1/1     Running     0          26h
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-6985d89dpszdl   1/1     Running     5          26h
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-7f9db97bgx5d9   1/1     Running     0          26h
rook-ceph-mgr-a-75cffd7ff-64tvn                                   1/1     Running     0          4h31m
rook-ceph-mon-a-698575bc89-29b5b                                  1/1     Running     10         26h
rook-ceph-mon-b-7d7cf694cd-84r8z                                  1/1     Running     0          26h
rook-ceph-mon-c-855b7cfdcc-p24z6                                  1/1     Running     0          26h
rook-ceph-operator-5988f7dcff-m6hhn                               1/1     Running     0          26h
rook-ceph-osd-0-59cc65bdf8-lsxg9                                  1/1     Running     0          4h21m
rook-ceph-osd-1-547fdb9dcf-xtfn2                                  1/1     Running     0          26h
rook-ceph-osd-2-7b446cf45-pp9xd                                   1/1     Running     0          26h
rook-ceph-osd-prepare-ocs-deviceset-0-data-0-7nfr6-vxszb          0/1     Completed   0          26h
rook-ceph-osd-prepare-ocs-deviceset-1-data-0-rq6ff-p54m5          0/1     Completed   0          26h
rook-ceph-osd-prepare-ocs-deviceset-2-data-0-xh2bh-nz767          0/1     Completed   0          26h
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-6895cbdp24b7   1/1     Running     0          26h
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-b-ccc44cdxcdqc   1/1     Running     0          26h
rook-ceph-tools-d87986957-sph5q                                   1/1     Running     0          26h

$ oc get csv
NAME                         DISPLAY                       VERSION        REPLACES   PHASE
ocs-operator.v4.6.2-233.ci   OpenShift Container Storage   4.6.2-233.ci              Succeeded

$ oc version
Client Version: 4.5.0-0.nightly-2020-12-05-205859
Server Version: 4.5.24
Kubernetes Version: v1.18.3+fa69cae

@akgunjal.com will provide more info after the IBM Cloud team finishes testing, and then we can move this to verified. Based on what I see above, it looks OK.

Comment 11 akgunjal@in.ibm.com 2021-01-28 09:41:35 UTC
@petr: We have verified this fix in the EU region and it works fine.

Comment 15 errata-xmlrpc 2021-02-01 13:18:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Container Storage 4.6.2 container bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0305

