Bug 1366577
Summary: | Wrong calculation of PGs per OSD leads to cluster in HEALTH_WARN state with explanation "too many PGs per OSD (768 > max 300)" | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Storage Console | Reporter: | Daniel Horák <dahorak> |
Component: | Ceph | Assignee: | Shubhendu Tripathi <shtripat> |
Ceph sub component: | configuration | QA Contact: | Daniel Horák <dahorak> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | unspecified | ||
Priority: | unspecified | CC: | asriram, ceph-docs, ceph-eng-bugs, dzafman, hnallurv, jowilkin, kchai, kchidamb, kdreyer, ldachary, ltrilety, mbukatov, mkudlej, nthomas, rghatvis, shtripat, sjust, vsarmila |
Version: | 2 | ||
Target Milestone: | --- | ||
Target Release: | 2 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | rhscon-core-0.0.44-1.el7scon.x86_64, rhscon-ui-0.0.58-1.el7scon.noarch | Doc Type: | Bug Fix |
Doc Text: |
Previously, the automatic PG calculation logic computed PG counts on a per-pool basis instead of at the cluster level based on the number of OSDs in the cluster. This incorrect calculation triggered a cluster health warning because a large number of PGs was created with each new pool.
With this update, the automatic calculation of PGs is disabled. The administrator must manually provide the PG values per OSD by using the PG calculator tool from Ceph to ensure the cluster remains in a healthy state.
|
Story Points: | --- |
Clone Of: | 1362403 | Environment: | |
Last Closed: | 2016-10-19 15:21:41 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1362403 | ||
Bug Blocks: | 1357777 |
Sam, can you please confirm whether the understanding in comment#2 above is correct?

| OSDs | PGs | PGs/OSD (size 3) | PGs/OSD (size 2) |
|---|---|---|---|
| 5 | 128 | 76 | 51 |
| 6 | 512 | 256 | 170 |
| 10 | 512 | 153 | 102 |
| 11 | 1024 | 279 | 186 |
| 20 | 1024 | 153 | 102 |
| 21 | 2048 | 292 | 195 |
| 30 | 2048 | 204 | 136 |
| 31 | 3072 | 297 | 198 |
| 40 | 3072 | 230 | 153 |
| 41 | 4096 | 299 | 199 |
| 50 | 4096 | 245 | 163 |

With replication set to 2, these numbers work pretty well (between 100 and 200). With replication set to 3, it's a little on the high side, but still between 150 and 300. That's probably OK if we need to use the same guideline for both. 300 PGs/OSD is on the high side, but not horrible.

So the logic in comment#2 looks good. I would go ahead with this.

Thanks, Sam, for the explanation. As confirmed in comment#5, the changes below will be made for this issue:

1. Update the backend logic to calculate the PG numbers as per comment#2.
2. Update the UI slider logic to adhere to the logic in comment#2.

Fixed as per comments 2 and 3 in https://bugzilla.redhat.com/show_bug.cgi?id=1375538

What is the final resolution of this bug? According to comment 5, there should be some update to the logic for automatic PG calculation, but according to comment 7 (pointing to Bug 1375538 comments 2 and 3), the automatic calculation should be completely removed.

Daniel, as per the latest discussions with Michael Kidd, we will no longer auto-calculate PG numbers; instead, a text box is provided in the UI for the admin to enter the value. A link to the PG Calc tool will also be provided in the UI. Refer to https://bugzilla.redhat.com/show_bug.cgi?id=1375538 for more details on this decision.

I'm verifying this bug: as per comment 10, the PG calculation logic was completely removed and there is just a box to set the number of PGs manually. Additional details related to the change will be verified in bug 1375538.

Red Hat Enterprise Linux Server release 7.3 (Maipo)
rhscon-ceph-0.0.43-1.el7scon.x86_64
rhscon-core-0.0.45-1.el7scon.x86_64
rhscon-core-selinux-0.0.45-1.el7scon.noarch
rhscon-ui-0.0.59-1.el7scon.noarch

>> VERIFIED

Doc-text looks good.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:2082
Hi John,

So if I understand correctly, the final values for PGs should be as below:

| OSDs | PGs |
|---|---|
| <=5 | 128 |
| >5 && <=10 | 512 |
| >10 && <=20 | 1024 |
| >20 && <=30 | 2048 |
| >30 && <=40 | 3072 |
| >40 && <=50 | 4096 |
| >50 | Use the formula to calculate PGs |

Is this understanding correct, and should we go ahead and implement this?
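
For reference, the thresholds proposed above, and the per-OSD figures quoted earlier in this thread, can be reproduced with a short Python sketch. This is only an illustration and not the rhscon implementation: the names `PROPOSED_PGS`, `proposed_pg_count` and `pgs_per_osd` are hypothetical, and the sketch assumes PGs per OSD is `pgs * size / osds` truncated to an integer, which matches the quoted numbers. The formula for clusters with more than 50 OSDs is not spelled out in this comment, so the sketch returns `None` for that case instead of guessing.

```python
from typing import Optional

# (inclusive upper OSD bound, proposed PG count), taken from the table above.
PROPOSED_PGS = [(5, 128), (10, 512), (20, 1024), (30, 2048), (40, 3072), (50, 4096)]


def proposed_pg_count(osds: int) -> Optional[int]:
    """Return the proposed PG count, or None for clusters with more than 50 OSDs."""
    if osds < 1:
        raise ValueError("osds must be a positive integer")
    for upper_bound, pgs in PROPOSED_PGS:
        if osds <= upper_bound:
            return pgs
    # Above 50 OSDs the proposal defers to "the formula", which is not given here.
    return None


def pgs_per_osd(pgs: int, osds: int, size: int) -> int:
    """PG copies hosted per OSD (assumed: pgs * size / osds, truncated)."""
    return pgs * size // osds


# Print the per-OSD load at a few boundary cluster sizes for both replica counts.
for osds in (5, 6, 10, 11, 50):
    pgs = proposed_pg_count(osds)
    print(osds, pgs, pgs_per_osd(pgs, osds, 3), pgs_per_osd(pgs, osds, 2))
```

The worst cases sit just above each threshold (6, 11, 21, 31 and 41 OSDs), where the PG count jumps while the OSD count has barely grown.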