Bug 1375972 - when cluster is expanded (new machine added), console doesn't warn admin about implications of associated recovery operation
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Storage Console
Classification: Red Hat
Component: UI
Version: 2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 2
Assignee: Karnan
QA Contact: Daniel Horák
URL:
Whiteboard:
Depends On:
Blocks: Console-2-Async
 
Reported: 2016-09-14 11:38 UTC by Martin Bukatovic
Modified: 2016-10-19 15:22 UTC
CC List: 7 users

Fixed In Version: rhscon-core-0.0.44-1.el7scon.x86_64, rhscon-ui-0.0.58-1.el7scon.noarch
Doc Type: Bug Fix
Doc Text:
Previously, a warning message was not displayed during the cluster expansion flow. With this update, a warning message is displayed which warns users about the implications of cluster expansion. Users can expand the cluster only if they check the "I understand the risk" checkbox.
Clone Of:
Environment:
Last Closed: 2016-10-19 15:22:33 UTC
Target Upstream Version:




Links
Red Hat Bugzilla 1375899 (unspecified, CLOSED): when new disk devices are added into storage nodes, RHSC creates new OSD(s) and adds them into cluster automatically wit... (last updated 2021-02-22 00:41:40 UTC)
Red Hat Product Errata RHSA-2016:2082 (normal, SHIPPED_LIVE): Moderate: Red Hat Storage Console 2 security and bug fix update (last updated 2017-04-18 19:29:02 UTC)

Internal Links: 1375899

Description Martin Bukatovic 2016-09-14 11:38:46 UTC
Description of problem
======================

When a new storage machine is added to a RHSCon 2.0 managed Ceph cluster, the
console will automatically add the OSDs (which are going to be created there)
into the cluster without any warning or direct admin intervention.

While adding a new storage node is a manual process (the admin needs to tell
the console that a particular node should be added into the cluster), there is
no warning about the implications of the process.
 
Adding a new OSD triggers a rebalancing process (data is moved across the
entire cluster), which has a significant performance impact.
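
The rebalancing is visible with standard Ceph commands; for example (a
supplementary sketch, using the 'alpha' cluster name from the reproduction
steps below):

~~~
# ceph --cluster alpha -w
# ceph --cluster alpha osd pool stats
~~~

The first command watches cluster events as they happen, the second shows
per-pool client and recovery IO rates.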

Version-Release
===============

On the RHSC 2.0 server machine:

rhscon-core-selinux-0.0.41-1.el7scon.noarch
rhscon-core-0.0.41-1.el7scon.x86_64
rhscon-ceph-0.0.40-1.el7scon.x86_64
rhscon-ui-0.0.53-1.el7scon.noarch
ceph-installer-1.0.15-1.el7scon.noarch
ceph-ansible-1.0.5-32.el7scon.noarch

On Ceph 2.0 storage machines:

rhscon-core-selinux-0.0.41-1.el7scon.noarch
rhscon-agent-0.0.18-1.el7scon.noarch

How reproducible
================

100 %

Steps to Reproduce
==================

1. Install RHSC 2.0 following the documentation.

2. Accept a few nodes for the Ceph cluster.

3. Create a new Ceph cluster named 'alpha',
   while leaving a few more accepted nodes unused.

4. Create an object storage pool named 'alpha_pool' (standard type).

5. Load some data into 'alpha_pool' so that about 50% of the raw storage
   space is used, e.g.:

   ~~~
   # dd if=/dev/zero of=zero.data bs=1M count=1024
   # for i in {1..6}; do \
   > rados --cluster alpha put -p alpha_pool test_object_0${i} zero.data
   > done
   # ceph --cluster alpha df
   ~~~

6. Check the status of the cluster and list the OSDs, both via the CLI and the web console:

   ~~~
   # ceph --cluster alpha -s
   # ceph --cluster alpha osd tree
   ~~~

   In the console, go to the "OSD" tab of the 'alpha' cluster page and note
   the utilization shown there.

7. Add one new machine (already accepted in step #2) into the cluster 'alpha'.

   Go to "Clusters", click on the menu for the 'alpha' cluster and select "Expand",
   then on the "Expand Cluster: alpha" page select one machine.

8. Again, check the status of the cluster and list the OSDs, both via the CLI
   and the web console:

   ~~~
   # ceph --cluster alpha -s
   # ceph --cluster alpha osd tree
   ~~~

   In the console, go to the "OSD" tab of the 'alpha' cluster page and note
   the utilization shown there.

Actual results
==============

During step #7 (adding a storage machine), I added one new machine into the
cluster, starting the "Expand Cluster" task.

No warning is shown during the whole operation, which is completely automatic;
the admin is neither warned nor able to confirm or cancel the operation.

Note that even adding a single new OSD can move a significant amount of data
across the cluster.
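
How much data is still being moved can be checked with standard Ceph commands
(a supplementary sketch, not part of the original steps):

~~~
# ceph --cluster alpha pg stat
# ceph --cluster alpha -s
~~~

While recovery and backfill are running, the status output typically reports
the share of degraded/misplaced objects.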

Expected results
================

Before starting the "Expand Cluster" task, the console should show a warning
notice to the admin so that the admin:

 * understands the implications of the operation they are about to start
 * can cancel the operation at that point if they don't accept the
   implications

Michael Kidd suggests a notice like this:

> Ceph cluster expansion can cause significant client IO performance impact.
> Please consider adjusting the backfill and recovery throttles (link or
> checkbox to do so provided) and/or contacting Red Hat support for further
> guidance.
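
For reference, the backfill and recovery throttles mentioned in the notice can
be lowered at runtime via the standard Ceph injectargs mechanism; a minimal
sketch with illustrative (not recommended) values:

~~~
# ceph --cluster alpha tell osd.* injectargs \
>     '--osd_max_backfills 1 --osd_recovery_max_active 1 --osd_recovery_op_priority 1'
~~~

Lowering these values slows the rebalance in favour of client IO; the right
values depend on the cluster and workload.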

Comment 1 Karnan 2016-09-30 08:44:07 UTC
Fixed as per comment 3, point 4, in https://bugzilla.redhat.com/show_bug.cgi?id=1375538

Comment 3 Daniel Horák 2016-10-05 07:52:36 UTC
Tested on Red Hat Enterprise Linux Server release 7.3 (Maipo) with the following versions:
  ceph-ansible-1.0.5-34.el7scon.noarch
  ceph-installer-1.0.15-2.el7scon.noarch
  rhscon-ceph-0.0.43-1.el7scon.x86_64
  rhscon-core-0.0.45-1.el7scon.x86_64
  rhscon-core-selinux-0.0.45-1.el7scon.noarch
  rhscon-ui-0.0.59-1.el7scon.noarch

Now the following warning note is shown in the Expand Cluster wizard:
  ~~~~~~~~~~~~~~~~
  Ceph cluster expansion requires data movement between OSDs and can cause significant client IO performance impact if proper adjustments are not made. Please contact Red Hat support for help with the recommended changes.
  ~~~~~~~~~~~~~~~~
and it is necessary to check the "I understand the risk" checkbox to continue.

>> VERIFIED

Comment 5 Shubhendu Tripathi 2016-10-17 13:32:26 UTC
doc-text looks good

Comment 6 errata-xmlrpc 2016-10-19 15:22:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:2082

