Bug 1366577

Summary: Wrong calculation of PGs per OSD leads to cluster in HEALTH_WARN state with explanation "too many PGs per OSD (768 > max 300)"
Product: [Red Hat Storage] Red Hat Storage Console Reporter: Daniel Horák <dahorak>
Component: Ceph Assignee: Shubhendu Tripathi <shtripat>
Ceph sub component: configuration QA Contact: Daniel Horák <dahorak>
Status: CLOSED ERRATA Docs Contact:
Severity: unspecified    
Priority: unspecified CC: asriram, ceph-docs, ceph-eng-bugs, dzafman, hnallurv, jowilkin, kchai, kchidamb, kdreyer, ldachary, ltrilety, mbukatov, mkudlej, nthomas, rghatvis, shtripat, sjust, vsarmila
Version: 2   
Target Milestone: ---   
Target Release: 2   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: rhscon-core-0.0.44-1.el7scon.x86_64, rhscon-ui-0.0.58-1.el7scon.noarch Doc Type: Bug Fix
Doc Text:
Previously, the automatic PG calculation logic computed the PG count on a per-pool basis instead of at the cluster level, based on the number of OSDs in the cluster. Because each pool creation added a large number of PGs, the cluster reported a health warning. With this update, the automatic calculation of PGs is disabled. The administrator needs to manually provide the PG values per OSD by using the PG calculator tool from Ceph to ensure the cluster remains in a healthy state.
Story Points: ---
Clone Of: 1362403 Environment:
Last Closed: 2016-10-19 15:21:41 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1362403    
Bug Blocks: 1357777    

Comment 2 Shubhendu Tripathi 2016-08-22 06:33:08 UTC
Hi John,

So if I understand correctly, the final values for PGs should be as below:

OSDs                    PGs
============================
<=5                     128
>5 && <=10              512
>10 && <=20             1024
>20 && <=30             2048
>30 && <=40             3072
>40 && <=50             4096
>50                     Use the formula to calculate PGs

Is this understanding correct and should we go ahead and implement this?
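
The tier table in this comment can be sketched as a small lookup. This is only an illustration, not the rhscon-core implementation: the names `PG_TIERS` and `suggested_pg_count` are hypothetical, and because the formula for more than 50 OSDs is not given in this bug, the sketch returns None for that case instead of guessing.

```python
from typing import Optional

# (maximum OSD count, PG count) pairs copied from the table in comment 2.
PG_TIERS = [(5, 128), (10, 512), (20, 1024), (30, 2048), (40, 3072), (50, 4096)]


def suggested_pg_count(osd_count: int) -> Optional[int]:
    """Return the tiered PG count, or None when the tier table does not apply."""
    if osd_count < 1:
        raise ValueError("osd_count must be at least 1")
    for max_osds, pg_count in PG_TIERS:
        if osd_count <= max_osds:
            return pg_count
    # More than 50 OSDs: the comment says to use "the formula", which is
    # not spelled out in this bug, so no value is invented here.
    return None
```

For example, `suggested_pg_count(11)` falls in the ">10 && <=20" tier and gives 1024.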

Comment 3 Shubhendu Tripathi 2016-08-25 05:14:25 UTC
Sam, can you please confirm whether the understanding in comment#2 is correct?

Comment 4 Samuel Just 2016-08-26 15:16:19 UTC
                   pgs/osd
osds    pgs    size 3  size 2
5       128     76      51
6       512     256     170
10      512     153     102
11      1024    279     186
20      1024    153     102
21      2048    292     195
30      2048    204     136
31      3072    297     198
40      3072    230     153
41      4096    299     199
50      4096    245     163

With replication set to 2, these numbers work pretty well (between 100 and 200).  With replication set to 3, it's a little on the high side, but still between 150 and 300.  That's probably ok if we need to use the same guideline for both.  300 pgs/osd is on the high side, but not horrible.
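
The pgs/osd figures in this comment appear to follow from a simple calculation, sketched below under the assumption that every PG is stored `replica_size` times and the copies are spread evenly across all OSDs. The helper name `pgs_per_osd` is hypothetical. The result is truncated to an integer, which matches the rounding used in the table.

```python
def pgs_per_osd(pg_count: int, replica_size: int, osd_count: int) -> int:
    """PG copies each OSD holds, assuming an even spread across OSDs."""
    return pg_count * replica_size // osd_count


# (osds, pgs) rows taken from the table in this comment.
ROWS = [(5, 128), (6, 512), (10, 512), (11, 1024), (20, 1024), (21, 2048),
        (30, 2048), (31, 3072), (40, 3072), (41, 4096), (50, 4096)]

for osds, pgs in ROWS:
    print(osds, pgs, pgs_per_osd(pgs, 2, osds), pgs_per_osd(pgs, 3, osds))
```

With 41 OSDs and 4096 PGs this gives 199 PGs per OSD at replication 2 and 299 at replication 3, the worst case in the table.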

Comment 5 Shubhendu Tripathi 2016-08-26 15:58:18 UTC
So it looks like the logic in comment#2 is good. I will go ahead with this.
Thanks, Sam, for the explanation.

Comment 6 Shubhendu Tripathi 2016-09-08 08:38:37 UTC
As confirmed in comment#5, the following changes will be made for this issue:

1. Update backend logic to calculate the pg nums as per comment#2
2. Update UI slider logic to adhere to logic in comment#2

Comment 7 Karnan 2016-09-30 08:42:54 UTC
Fixed as per comments 2, 3 in https://bugzilla.redhat.com/show_bug.cgi?id=1375538

Comment 9 Daniel Horák 2016-10-03 13:33:07 UTC
What is the final resolution of this Bug?
According to comment 5, there should be some update to the logic for automatic PG calculation, but according to comment 7 (pointing to Bug 1375538 comments 2 and 3), the automatic calculation should be completely removed.

Comment 10 Shubhendu Tripathi 2016-10-04 03:58:17 UTC
Daniel, as per the latest discussions with Michael Kidd, we will no longer auto-calculate PG nums. Instead, the UI provides a text box where the admin enters the value. The UI will also provide a link to the PG Calc tool.

Refer to https://bugzilla.redhat.com/show_bug.cgi?id=1375538 for more details on this decision.

Comment 11 Daniel Horák 2016-10-04 06:16:58 UTC
I'm verifying this bug: as per comment 10, the PG calculation logic was completely removed and there is just a box for setting the number of PGs manually. Additional details related to the change will be verified in bug 1375538.

Red Hat Enterprise Linux Server release 7.3 (Maipo)

rhscon-ceph-0.0.43-1.el7scon.x86_64
rhscon-core-0.0.45-1.el7scon.x86_64
rhscon-core-selinux-0.0.45-1.el7scon.noarch
rhscon-ui-0.0.59-1.el7scon.noarch

>> VERIFIED

Comment 13 Shubhendu Tripathi 2016-10-17 12:17:53 UTC
Doc-text looks good.

Comment 14 errata-xmlrpc 2016-10-19 15:21:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:2082