Previously, no warning message was displayed during the cluster expansion flow. With this update, a warning message informs users about the implications of cluster expansion. Users can expand the cluster only after they select the "I understand the risk" checkbox.
Description Martin Bukatovic
2016-09-14 11:38:46 UTC
Description of problem
======================
When a new storage machine is added to a Ceph cluster managed by RHSCon 2.0, the
console automatically adds the OSDs (which are going to be created there) into
the cluster without any warning or direct admin intervention.
While adding a new storage node is a manual process (the admin needs to tell
the console that a particular node should be added into the cluster), there is
no warning about the implications of the process.
Adding a new OSD triggers a reallocation process (data is moved across the
entire cluster) which has a significant performance impact.
Version-Release
===============
On RHSC 2.0 server machine:
rhscon-core-selinux-0.0.41-1.el7scon.noarch
rhscon-core-0.0.41-1.el7scon.x86_64
rhscon-ceph-0.0.40-1.el7scon.x86_64
rhscon-ui-0.0.53-1.el7scon.noarch
ceph-installer-1.0.15-1.el7scon.noarch
ceph-ansible-1.0.5-32.el7scon.noarch
On Ceph 2.0 storage machines:
rhscon-core-selinux-0.0.41-1.el7scon.noarch
rhscon-agent-0.0.18-1.el7scon.noarch
How reproducible
================
100 %
Steps to Reproduce
==================
1. Install RHSC 2.0 following the documentation.
2. Accept a few nodes for the Ceph cluster.
3. Create a new Ceph cluster named 'alpha',
while leaving a few more accepted nodes unused.
4. Create an object storage pool named 'alpha_pool' (standard type).
5. Load some data into 'alpha_pool' so that you use about 50% of raw storage
space, e.g.:
~~~
# dd if=/dev/zero of=zero.data bs=1M count=1024
# for i in {1..6}; do \
> rados --cluster alpha put -p alpha_pool test_object_0${i} zero.data
> done
# ceph --cluster alpha df
~~~
6. Check the status of the cluster and list OSDs via both the CLI and the web console:
~~~
# ceph --cluster alpha -s
# ceph --cluster alpha osd tree
~~~
While in the console, go to the "OSD" tab of the 'alpha' cluster page and note the
utilization shown there.
7. Add one new machine (already accepted in step #2) into the cluster 'alpha'.
Go to "Clusters", click on the menu for the 'alpha' cluster and select "Expand",
then on the "Expand Cluster: alpha" page select one machine.
8. Again, check the status of the cluster and list OSDs via both the CLI and
the web console:
~~~
# ceph --cluster alpha -s
# ceph --cluster alpha osd tree
~~~
While in the console, go to the "OSD" tab of the 'alpha' cluster page and note the
utilization shown there.
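To make the comparison between steps 6 and 8 easier, the output of the same commands can be saved to files and diffed afterwards. The following is only a minimal sketch: the function name `snapshot_cluster`, the output file names, and the `CEPH_BIN` override are hypothetical and not part of RHSCon or Ceph; only the `ceph --cluster ... -s`, `osd tree` and `df` invocations come from the steps above.

```shell
#!/bin/bash
# Hypothetical helper: save the cluster state so that the output from
# step 6 (before expansion) and step 8 (after expansion) can be compared.
# CEPH_BIN is an assumed override for trying the helper without a cluster.
snapshot_cluster() {
    local cluster="$1"       # cluster name, e.g. alpha
    local label="$2"         # file name prefix, e.g. before / after
    local outdir="${3:-.}"   # target directory, defaults to the current one
    local ceph="${CEPH_BIN:-ceph}"

    "$ceph" --cluster "$cluster" -s       > "$outdir/${label}-status.txt"
    "$ceph" --cluster "$cluster" osd tree > "$outdir/${label}-osd-tree.txt"
    "$ceph" --cluster "$cluster" df       > "$outdir/${label}-df.txt"
}
```

For example, run `snapshot_cluster alpha before` in step 6 and `snapshot_cluster alpha after` in step 8, then compare the results with `diff -u before-osd-tree.txt after-osd-tree.txt`.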
Actual results
==============
During step #7 (adding a storage machine), I added one new machine into the
cluster, starting "Expand Cluster".
No warning is shown during the whole operation, which is completely automatic:
the admin is neither warned nor able to explicitly start or cancel the operation.
Note that even adding a single new OSD could move a significant amount of data
across the cluster.
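As a rough illustration of the scale involved, assume all OSDs have equal CRUSH weight and data is spread evenly. The new OSDs should then end up holding about n/(N+n) of the stored data, where N is the number of existing OSDs and n the number of new ones. The sketch below computes that figure. The function name `estimate_moved_gb` is hypothetical, and the result is an idealized estimate under the stated assumptions, not a measurement. Actual data movement can be higher.

```shell
#!/bin/bash
# Hypothetical back-of-the-envelope estimate, assuming equal OSD weights
# and evenly distributed data: the share of data that ends up on the new
# OSDs is roughly new / (old + new).
estimate_moved_gb() {
    local used_gb="$1"       # data currently stored, in GB
    local old_osds="$2"      # number of OSDs before the expansion
    local new_osds="${3:-1}" # number of OSDs being added
    awk -v u="$used_gb" -v o="$old_osds" -v n="$new_osds" \
        'BEGIN { printf "%.1f\n", u * n / (o + n) }'
}
```

For example, `estimate_moved_gb 100 9 1` prints `10.0`: with 100 GB stored on 9 OSDs, adding a tenth OSD would be expected to pull about 10 GB onto it.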
Expected results
================
Before starting the "Expand Cluster" task, the console should show a warning notice
to the admin so that the admin:
* understands the implications of the operation they are about to start
* can cancel the operation at that point if they don't like the
implications
Michael Kidd suggests a notice like this:
> Ceph cluster expansion can cause significant client IO performance impact.
> Please consider adjusting the backfill and recovery throttles (link or
> checkbox to do so provided) and/or contacting Red Hat support for further
> guidance.
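For reference, the backfill and recovery throttles mentioned in the suggested notice can be lowered at runtime with `ceph tell ... injectargs`. The sketch below only prints such a command instead of running it. The function name `throttle_cmd` is hypothetical, and the default value of 1 for both options is an illustrative assumption, not a recommendation from this report.

```shell
#!/bin/bash
# Hypothetical helper: print (not run) a command that lowers backfill and
# recovery concurrency on all OSDs of the given cluster.
# The default values of 1 are illustrative assumptions only.
throttle_cmd() {
    local cluster="$1"         # cluster name, e.g. alpha
    local backfills="${2:-1}"  # value for osd_max_backfills
    local recovery="${3:-1}"   # value for osd_recovery_max_active
    printf "ceph --cluster %s tell 'osd.*' injectargs '--osd-max-backfills %s --osd-recovery-max-active %s'\n" \
        "$cluster" "$backfills" "$recovery"
}
```

Running `throttle_cmd alpha` prints the command for review. Values injected this way apply only to the running OSD daemons and are not persisted in the configuration file.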
Tested on Red Hat Enterprise Linux Server release 7.3 (Maipo) with the following versions:
ceph-ansible-1.0.5-34.el7scon.noarch
ceph-installer-1.0.15-2.el7scon.noarch
rhscon-ceph-0.0.43-1.el7scon.x86_64
rhscon-core-0.0.45-1.el7scon.x86_64
rhscon-core-selinux-0.0.45-1.el7scon.noarch
rhscon-ui-0.0.59-1.el7scon.noarch
Now there is the following warning note in the Expand Cluster wizard:
~~~~~~~~~~~~~~~~
Ceph cluster expansion requires data movement between OSDs and can cause significant client IO performance impact if proper adjustments are not made. Please contact Red Hat support for help with the recommended changes.
~~~~~~~~~~~~~~~~
and it is necessary to check "I understand the risk" checkbox to be able to continue.
>> VERIFIED
Comment 5 Shubhendu Tripathi
2016-10-17 13:32:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2016:2082