Description of problem
PG calculation as currently implemented by RHSC 2.0 is wrong (details below).
The worst case scenario is that a ceph cluster ends up in a non-recoverable
state. In other words, there is a *risk of data loss* because of this issue.
The only scenario in which the current implementation works correctly is one
pool per cluster, which is not a very likely use case.
See the pgcalc tool, which provides proper guidance on how to configure the
PG count for a ceph pool, at http://ceph.com/pgcalc/ or
https://access.redhat.com/labs/cephpgc/
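For reference, the pgcalc guidance roughly follows a simple heuristic; here is a minimal sketch in Python (the function name, the 100-PGs-per-OSD target, and the even-data-split assumption are mine for illustration; the real tool also weighs each pool's expected share of the data):

```python
def suggested_pg_count(osd_count, pool_count=1, target_pgs_per_osd=100,
                       replica_size=3):
    """Rough per-pool PG estimate in the spirit of the pgcalc heuristic.

    Assumes data is spread evenly across pools; the actual pgcalc tool
    additionally weighs each pool by its expected percentage of data.
    """
    raw = (osd_count * target_pgs_per_osd) / float(replica_size * pool_count)
    # Round up to the next power of two, per pool.
    pg_num = 1
    while pg_num < raw:
        pg_num *= 2
    return pg_num
```

For example, a single replicated pool (size 3) on 10 OSDs would land at 512 PGs, while splitting the same cluster across four pools would suggest 128 PGs per pool; this is exactly why a fixed per-pool default breaks down as soon as more than one pool exists.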
On RHSC 2.0 server machine:
On Ceph 2.0 storage machines:
Steps to Reproduce
1. Install RHSC 2.0 following the documentation.
2. Accept few nodes for the ceph cluster.
3. Create new ceph cluster named 'alpha'.
4. Go to Storage -> Pools -> Add Storage to start the wizard for ceph pool setup.
5. Going through the wizard, select cluster 'alpha' and object storage.
6. Stop on the "Add Object Storage" page of the wizard and check the number of
placement groups (PGs).
On the "Add Object Storage" page, the PG number is:
* pre-calculated by the console itself (and the calculation is wrong)
* not possible for the user to change
Either (the quick fix):
* there is no default predefined value for PG count to start with
* the user can edit the value
* a link to the pgcalc tool provides proper guidance (the downstream version is
available at https://access.redhat.com/labs/cephpgc/)
Or (the proper fix):
The console implements the same logic as the pgcalc tool, providing the same
guidance and functionality to the user.
Adding comments from Michael Kidd here:
This issue allows the customer to create many pools with the same PG count
(which doesn't take into account how many pools there are, or how much data
will exist in them), and get into a state of too many PGs per OSD.
This is especially critical since you cannot reduce the PG count after the pool
is created... Instead, a new pool with the proper values must be created,
and all data migrated (a usually painful process).
The per-pool calculations should be rounded to a power of 2, not the overall
cluster value. It's unclear which is intended in the slide deck, but the
per-pool value is what's important.
Per pool PG count ( pg_num * size ) should not be allowed to be less than the
OSD count in the cluster as this would limit performance of that pool.
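The two per-pool constraints above can be checked mechanically; a minimal sketch (the function name and the warning wording are mine, not from the console code):

```python
def check_pool_pg_num(pg_num, size, osd_count):
    """Check the two per-pool constraints described above (sketch).

    - pg_num should be a power of two (per pool, not cluster-wide)
    - pg_num * size should not be below the cluster's OSD count
    Returns a list of warning strings (empty if both constraints hold).
    """
    warnings = []
    # A positive integer n is a power of two iff n & (n - 1) == 0.
    if pg_num <= 0 or (pg_num & (pg_num - 1)) != 0:
        warnings.append("pg_num %d is not a power of two" % pg_num)
    if pg_num * size < osd_count:
        warnings.append(
            "pg_num * size = %d is below the OSD count (%d); "
            "this pool could not use every OSD" % (pg_num * size, osd_count))
    return warnings
```

So, for instance, pg_num=128 with size 3 on a 40-OSD cluster passes both checks, while pg_num=100 with size 3 on a 400-OSD cluster trips both (not a power of two, and 300 < 400).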
Rewriting pgcalc in the async timeframe is not tenable, so we should expose the
default of 0 PGs in an editable form for the user to adjust.
To summarize, below are the changes which would be done:
1. Provide a text box in the UI to enter the PG number while creating a pool (with the default value set to zero).
2. Add a check to validate against negative values provided for the PG number.
3. Add a link to the pgcalc tool next to the PG number field, with a help icon saying "Be aware that pg count per pool is critical. please visit pg calc tool to better understand what value should be used".
4. During the expand cluster flow using new OSD nodes, show a warning to mention that "With expansion of cluster with OSD, cluster coming to non usable state would be very much possible as it involves movement of data across placement groups".
5. Add a checkbox for the admin to accept the expansion, and only allow the expansion to be submitted from the UI screen if it is selected.
6. In the backend, don't calculate the PG number automatically; always expect the value to be passed via the API.
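Items 1, 2, and 6 amount to a small input-validation routine; a hedged sketch of the idea (the function name and messages are mine, the actual console implements this in its own UI/API layer):

```python
def validate_pg_num_input(raw):
    """Validate the PG-number text field: no computed default, so the
    value must be an explicit positive integer supplied by the user.

    Returns (pg_num, None) on success or (None, error_message) on failure.
    """
    try:
        pg_num = int(raw)
    except (TypeError, ValueError):
        return None, "Placement group count must be a number"
    if pg_num <= 0:
        # Rejects negative values (item 2) and, per the follow-up in
        # comment 4, zero as well.
        return None, "Placement group count must be greater than zero"
    return pg_num, None
```

The key design point is that there is no fallback: when validation fails, the request is rejected rather than silently substituting a console-computed PG count.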
@Michael/Ju, I need your help to frame the warning messages in steps 3 and 4. Kindly provide your inputs.
For item 2, also validate that the value is non-zero.
My suggestions on warning texts below:
3. "Be aware that the PG count per pool value is critical for cluster performance and stability. Please visit the Ceph PGs per Pool Calc tool to better understand what value should be used."
4. "Ceph cluster expansion requires data movement between OSDs and can cause significant client IO performance impact if proper adjustments are not made. Please contact Red Hat support for help with the recommended changes."
@Ju, can you ack this please?
Checking with packages (on a RHEL 7.3 based, RHSCon 2.0 server machine):
Following the reproducer from the description of this BZ, I see the following:
1) On the "Add Object Storage" page, the explanation of the importance of the
PG number calculation is present (as proposed in comment 3), but a direct HTML
link to the pgcalc tool is missing.
Based on the description of the bug and the proposal in comment 2, I would
expect the link to the PG calc tool to be there.
2) The form on the "Add Object Storage" page doesn't check for a zero value in
the PG field. It's possible to submit a request with a zero PG number, which
would fail in the end, but the console doesn't directly show any error.
The form should both display a warning for a zero value, in the same way as for
a negative number, and not allow clicking the next button to submit such an
invalid request.
Looking at your original description, especially these properties of PG number:
> The per-pool calculations should be rounded to a power of 2, not the overall
> cluster value. It's unclear which is intended in the slide deck, but the
> per-pool value is what's important.
> Per pool PG count ( pg_num * size ) should not be allowed to be less than the
> OSD count in the cluster as this would limit performance of that pool.
I'm wondering if it would make sense for the form on the "Add Object Storage"
page to reject a PG value which doesn't meet these requirements, in a similar
way to how it rejects negative values and how it should reject a zero value.
While it would be great to have rules around the PG value, that would entail adding more logic and confirming it's implemented properly before the async update which doesn't seem realistic. So for this async update, simply removing the default enforcement, allowing a manual specification of PG count and linking to the PG calc tool is as good as I believe we can get.
Ultimately, we would have enforcement of the PG calc tool values and provide a
means for the end user to override, acknowledging that if they change the
value, non-optimal behavior may be experienced (wording TBD).
We can stop suggesting a PG value by keeping the field empty, and validate user
input to reject negative and zero values. Also, as you suggest, we can add a
small warning message.
Can you reply back with the exact warning message?
(In reply to Shubhendu Tripathi from comment #2)
> 4. While expand cluster flow using new OSD nodes, show a warning to mention
> that "With expansion of cluster with OSD, cluster coming to non usable state
> would be very much possible as it involves movement of data across placement
> groups"
> 5. Add a checkbox to accept the expansion from admin, and if selected then
> only allow expansion submit from UI screen
Just for the sake of keeping things organized, those items are covered in
BZ 1375972, not this one.
(In reply to Michael J. Kidd from comment #7)
> While it would be great to have rules around the PG value, that would
> entail adding more logic and confirming it's implemented properly before the
> async update which doesn't seem realistic. So for this async update, simply
> removing the default enforcement, allowing a manual specification of PG
> count and linking to the PG calc tool is as good as I believe we can get.
> Ultimately, we would have enforcement of the pg calc tool values and provide
> a means for the end user to override by acknowledging if they change the
> value, non-optimal behavior may be experienced (wording tbd).
So it's not reasonable to add any additional checks. Thanks for the clarification.
Karnan: See Comment #3.
The message is already in the test build I was given access to, but was missing the link to the PG Calc tool. I provided that feedback via email on the request to check the current message state.
Added link to pgcalc tool in the warning message. Also added validation to the pg number input.
Checking with packages (on a RHEL 7.3 based, RHSCon 2.0 server machine):
and I see that:
* the note now includes a link to https://access.redhat.com/labs/cephpgc/
* there is no default value for the "Placement Groups" field
* for a zero PG value, an error message is displayed and the "next" button is disabled
Doc-text looks good.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.