Bug 1375538
Summary: | PG count for pool creation is hard set and calculated in a wrong way | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat Storage Console | Reporter: | Martin Bukatovic <mbukatov> |
Component: | core | Assignee: | Shubhendu Tripathi <shtripat> |
core sub component: | provisioning | QA Contact: | Martin Bukatovic <mbukatov> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | high | ||
Priority: | unspecified | CC: | japplewh, julim, kchidamb, linuxkidd, nthomas, rghatvis, shtripat, vsarmila |
Version: | 2 | ||
Target Milestone: | --- | ||
Target Release: | 2 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | rhscon-ceph-0.0.43-1.el7scon.x86_64, rhscon-ui-0.0.60-1.el7scon.noarch | Doc Type: | Bug Fix |
Doc Text: |
Previously, the automatic PG calculation logic caused problems because it calculated PGs on a per-pool basis instead of at the cluster level, based on the number of OSDs in the cluster, even though PGs should be shared across the pools in the cluster. This incorrect PG calculation triggered cluster health warnings due to the large number of PGs being created during each pool creation.
With this update, the automatic calculation of PGs is disabled. The administrator needs to manually provide the PG values per OSD by using the PG calculator tool from Ceph to ensure that the cluster remains in a healthy state.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2016-10-19 15:22:28 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1357777 |
Description
Martin Bukatovic
2016-09-13 10:49:34 UTC
Rewriting pgcalc in the async timeframe is not tenable, so we should expose the default of 0 PGs in an editable form for the user to adjust.

To summarize, below are the changes that would be done:

1. Provide a text box in the UI for entering the pg num while creating a pool (with the default value set to zero).
2. Add a check to validate negative values provided for pg num.
3. Add a link to the pgcalc tool next to pg num, with a help icon saying "Be aware that pg count per pool is critical. Please visit the pg calc tool to better understand what value should be used."
4. During the expand-cluster flow using new OSD nodes, show a warning stating that "With expansion of the cluster with OSDs, the cluster coming to a non-usable state is very possible, as it involves movement of data across placement groups."
5. Add a checkbox for the admin to accept the expansion; only allow the expansion to be submitted from the UI screen if it is selected.
6. In the backend, don't calculate the pg num automatically; always expect the value to be passed from the API.

@Michael/Ju, I need your help to frame the warning messages in steps 3 and 4. Kindly provide your inputs.

For item 2, also validate non-zero values.

My suggestions on the warning texts below:

3. "Be aware that the PG count per pool value is critical for cluster performance and stability. Please visit the Ceph PGs per Pool Calc tool to better understand what value should be used."
4. "Ceph cluster expansion requires data movement between OSDs and can cause significant client IO performance impact if proper adjustments are not made. Please contact Red Hat support for help with the recommended changes."

@Ju, can you ack this please?
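Items 2 and 6 above amount to a simple backend check: reject any request whose pg num is not an explicit positive integer, rather than computing a default. A minimal sketch of such a check, assuming a hypothetical helper name `validate_pg_num` (this is illustrative only, not taken from the rhscon sources):

```python
def validate_pg_num(pg_num):
    """Reject missing, zero, or negative pg_num values.

    Per items 2 and 6 above, the backend no longer computes a default:
    the API caller must pass an explicit, positive value.
    (Hypothetical helper; not the actual rhscon code.)
    """
    # bool is a subclass of int in Python, so exclude it explicitly
    if not isinstance(pg_num, int) or isinstance(pg_num, bool) or pg_num <= 0:
        raise ValueError(
            "pg_num must be a positive integer; "
            "use the Ceph PG calculator tool to choose a suitable value"
        )
    return pg_num
```

The UI-side check in item 2 would mirror this, disabling the submit button until the field holds a positive number.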
Checking with packages (on a RHEL 7.3 based RHSCon 2.0 server machine):

rhscon-ceph-0.0.43-1.el7scon.x86_64
rhscon-ui-0.0.59-1.el7scon.noarch
rhscon-core-0.0.45-1.el7scon.x86_64
rhscon-core-selinux-0.0.45-1.el7scon.noarch

Following the reproducer from the description of this BZ, I see the following issues:

1) On the "Add Object Storage" page, the explanation of the importance of PG number calculation is present (as proposed in comment 3), but a direct html link to the pgcalc tool is missing. Based on the description of the bug and the proposal in comment 2, I would expect the link to the PG calc tool to be there.

2) The form on the "Add Object Storage" page doesn't check for a zero value in the PG field. It's possible to submit a request with a zero PG number, which fails in the end, but the console doesn't directly show any error. The form should display a warning for a zero value in the same way as for a negative number, and should not allow clicking the next button to submit such an invalid request.

Looking at your original description, especially these properties of the PG number:
> The per-pool calculations should be rounded to a power of 2, not the overall
> cluster value. It's unclear which is intended in the slide deck, but the
> per-pool value is what's important.
>
> Per pool PG count ( pg_num * size ) should not be allowed to be less than the
> OSD count in the cluster as this would limit performance of that pool.
I'm wondering if it would make sense for the form on the "Add Object Storage" page
to reject a PG value that doesn't meet these requirements, in a similar way to how
it rejects negative values and how it should reject a zero value.
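For illustration, the two quoted rules could be checked in the form's backend roughly as follows. The helper name `check_pg_rules` and its signature are hypothetical, not part of the console code:

```python
def check_pg_rules(pg_num, pool_size, osd_count):
    """Sketch of the two checks quoted above (hypothetical helper):

      * the per-pool pg_num should be rounded to a power of 2, and
      * pg_num * size should not fall below the cluster's OSD count,
        as that would limit the pool's performance.

    Returns a list of warning strings; an empty list means pg_num passes.
    """
    warnings = []
    # A positive integer n is a power of 2 iff n & (n - 1) == 0.
    if pg_num <= 0 or (pg_num & (pg_num - 1)) != 0:
        warnings.append("pg_num should be rounded to a power of 2")
    if pg_num * pool_size < osd_count:
        warnings.append(
            "pg_num * size (%d) is below the OSD count (%d), which "
            "would limit the pool's performance"
            % (pg_num * pool_size, osd_count)
        )
    return warnings
```

For example, on a 40-OSD cluster with a replicated pool of size 3, pg_num=128 passes both checks, while pg_num=8 fails the OSD-count rule (8 * 3 = 24 < 40).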
Martin,

While it would be great to have rules around the PG value, that would entail adding more logic and confirming it's implemented properly before the async update, which doesn't seem realistic. So for this async update, simply removing the default enforcement, allowing a manual specification of the PG count, and linking to the PG calc tool is as good as I believe we can get.

Ultimately, we would have enforcement of the pg calc tool values and provide a means for the end user to override by acknowledging that, if they change the value, non-optimal behavior may be experienced (wording tbd).

Michael,

We can stop suggesting a PG value by keeping the field empty, and validate user input to reject negative and zero values. Also, as you suggest, we can add a small warning message. Can you reply back with the exact warning message?

(In reply to Shubhendu Tripathi from comment #2)
> 4. While expand cluster flow using new OSD nodes, show a warning to mention
> that "With expansion of cluster with OSD, cluster coming to non usable state
> would be very much possible as it involves movement of data across placement
> groups"
> 5. Add a checkbox to accept the expansion from admin, and if selected then
> only allow expansion submit from UI screen

Just for the sake of keeping things organized, those items are covered in BZ 1375972, not this one.

(In reply to Michael J. Kidd from comment #7)
> While it would be great to have rules around the PG value, that would
> entail adding more logic and confirming it's implemented properly before the
> async update which doesn't seem realistic. So for this async update, simply
> removing the default enforcement, allowing a manual specification of PG
> count and linking to the PG calc tool is as good as I believe we can get.
>
> Ultimately, we would have enforcement of the pg calc tool values and provide
> a means for the end user to override by acknowledging if they change the
> value, non-optimal behavior may be experienced (wording tbd).
So it's not reasonable to add any additional checks. Thanks for the clarification.

Karnan: See Comment #3. The message is already in the test build I was given access to, but it was missing the link to the PG Calc tool. I provided that feedback via email on the request to check the current message state.

Added a link to the pgcalc tool in the warning message. Also added validation to the pg number input.

Checking with packages (on a RHEL 7.3 based RHSCon 2.0 server machine):

rhscon-core-selinux-0.0.45-1.el7scon.noarch
rhscon-ui-0.0.60-1.el7scon.noarch
rhscon-ceph-0.0.43-1.el7scon.x86_64
rhscon-core-0.0.45-1.el7scon.x86_64

and I see that:

* the note now includes a link to https://access.redhat.com/labs/cephpgc/
* there is no default value for the "Placement Groups" field
* for a zero value of PG, an error message is displayed and the "next" button is disabled

Doc-text looks good.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:2082