Bug 1731148

Summary: multisite pg_num on site2 pools should use site1/source values
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: John Harrigan <jharriga>
Component: Ceph-Ansible
Assignee: Ali Maredia <amaredia>
Status: CLOSED ERRATA
QA Contact: Vasishta <vashastr>
Severity: low
Priority: low
Version: 4.0
CC: amaredia, aschoen, assingh, ceph-eng-bugs, ceph-qe-bugs, gabrioux, gmeno, hyelloji, nthomas, tserlin, vumrao
Target Milestone: rc
Flags: hyelloji: needinfo-
Target Release: 4.1
Hardware: Unspecified
OS: Unspecified
Fixed In Version: ceph-ansible-4.0.15-1.el8, ceph-ansible-4.0.15-1.el7
Doc Type: If docs needed, set a value
Type: Bug
Last Closed: 2020-05-19 17:30:41 UTC
Bug Blocks: 1727980    
Attachments:
pgcalcl-2019-07-18.png: Ceph PGCalc sample (flags: none)

Description John Harrigan 2019-07-18 13:04:07 UTC
Description of problem:
Using RHCS 3.2z2 and ceph-ansible to deploy a multisite configuration, the pools on
site2 are created with the default pg_num values. This can result in very poor
performance.

The pools on site2 should instead inherit the pg_num values from the corresponding
existing pools on site1.

Version-Release number of selected component (if applicable):
ceph-ansible.noarch      3.2.15-1.el7cp

How reproducible:
Always

Steps to Reproduce:
1. deploy site1
2. create pools on site1 with pg_num values as suggested by the Ceph PG calculator (https://access.redhat.com/labsinfo/cephpgc); see the CLI sketch after these steps
3. deploy site2
4. edit all.yaml for multisite values
5. run ceph-ansible
6. view pg_num values on site1 and site2
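
For reference, step 2 roughly corresponds to the following CLI commands (a minimal
sketch only; the pool names and pg_num/pgp_num values are the ones that appear in the
listing and calculator output below and should be adjusted for the actual cluster):

# pre-create the site1 RGW bucket pools with explicit pg_num/pgp_num values
# instead of letting RGW create them with defaults
ceph osd pool create default.rgw.buckets.data 1024 1024
ceph osd pool create default.rgw.buckets.index 128 128
ceph osd pool create default.rgw.buckets.extra 64 64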

Actual results:
root@f18-h14-000-r620:~
# for i in `rados lspools` ; do echo -ne $i"\t" ; ceph osd pool get $i pg_num ; done
default.rgw.users.keys pg_num: 64
default.rgw.data.root pg_num: 64
.rgw.root pg_num: 64
default.rgw.control pg_num: 64
default.rgw.gc pg_num: 64
default.rgw.buckets.data pg_num: 1024
default.rgw.buckets.index pg_num: 128
default.rgw.buckets.extra pg_num: 64
default.rgw.log pg_num: 64
default.rgw.meta pg_num: 64
default.rgw.intent-log pg_num: 64
default.rgw.usage pg_num: 64
default.rgw.users pg_num: 64
default.rgw.users.email pg_num: 64
default.rgw.users.swift pg_num: 64
default.rgw.users.uid pg_num: 64
site2.rgw.meta pg_num: 8
site2.rgw.log pg_num: 8
site2.rgw.control pg_num: 8
site2.rgw.buckets.index pg_num: 8
site2.rgw.buckets.data pg_num: 8

Expected results:
The pg_num values of the site2.rgw pools match those of the corresponding default.rgw pools on site1.

Additional info:
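A possible manual workaround (a sketch only; it assumes both zones' pools are visible
from the same client, as in the listing above, and that pg_num only needs to be
increased) is to copy each default.rgw pool's pg_num onto its site2 counterpart after
the ceph-ansible run:

# copy pg_num from each default.rgw pool to the matching site2.rgw pool
for pool in $(rados lspools | grep '^site2\.rgw\.'); do
    src="default.${pool#site2.}"   # e.g. site2.rgw.log -> default.rgw.log
    pgs=$(ceph osd pool get "$src" pg_num | awk '{print $2}')
    echo "setting $pool to pg_num=$pgs (from $src)"
    ceph osd pool set "$pool" pg_num "$pgs"
    # on RHCS 3.x (Luminous) pgp_num must be raised separately as well,
    # and may only take effect once the new PGs finish creating
    ceph osd pool set "$pool" pgp_num "$pgs"
done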

Comment 1 John Harrigan 2019-07-18 16:00:13 UTC
Added attachment 'pgcalc2019-07-18.png'
This is the output from the PG calculator at https://ceph.com/pgcalc/
I selected "Ceph Use Case Selector: Rados Gateway Only - Jewel or later"
All other values are defaults, including "Size", "OSD #" and "Target PGs per OSD"

There are three distinct pg_num values here:
* default.rgw.buckets.data = 4096
* default.rgw.buckets.index = 128
* default.rgw.*             = 64
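
Translated to CLI commands for a zone's RGW pools, those three tiers would look
roughly like this (a sketch only; the zone name and the pool list are assumptions
based on the listing in the description):

# the calculator's three tiers applied when pre-creating a zone's RGW pools
zone=site2    # hypothetical zone name; use the zone being deployed
ceph osd pool create ${zone}.rgw.buckets.data 4096 4096
ceph osd pool create ${zone}.rgw.buckets.index 128 128
for p in control meta log; do
    ceph osd pool create ${zone}.rgw.$p 64 64
done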

Comment 2 John Harrigan 2019-07-18 16:01:44 UTC
Created attachment 1591827 [details]
pgcalcl-2019-07-18.png Ceph PGCalc sample

Comment 3 Giridhar Ramaraju 2019-08-05 13:11:57 UTC
Updating the QA Contact to Hemant. Hemant will reroute it to the appropriate QE Associate.

Regards,
Giri

Comment 13 errata-xmlrpc 2020-05-19 17:30:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:2231