Bug 1375972

Summary: when cluster is expanded (new machine added), console doesn't warn admin about implications of associated recovery operation
Product: Red Hat Storage Console Reporter: Martin Bukatovic <mbukatov>
Component: UI    Assignee: Karnan <kchidamb>
Status: CLOSED ERRATA QA Contact: Daniel Horák <dahorak>
Severity: high Docs Contact:
Priority: unspecified    
Version: 2    CC: dahorak, kchidamb, nthomas, rghatvis, sankarshan, shtripat, vsarmila
Target Milestone: ---   
Target Release: 2   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: rhscon-core-0.0.44-1.el7scon.x86_64, rhscon-ui-0.0.58-1.el7scon.noarch Doc Type: Bug Fix
Doc Text:
Previously, no warning message was displayed during the cluster expansion flow. With this update, a warning message is displayed which warns users about the implications of cluster expansion. Users can expand the cluster only if they check the "I understand the risk" checkbox.
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-10-19 15:22:33 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1357777    

Description Martin Bukatovic 2016-09-14 11:38:46 UTC
Description of problem
======================

When a new storage machine is added to a RHSCon 2.0 managed Ceph cluster, the
console automatically adds the OSDs created on that machine into the cluster
without any warning or direct admin intervention.

While adding a new storage node is a manual process (the admin needs to tell
the console that a particular node should be added to the cluster), there is
no warning about the implications of this process.
 
Adding a new OSD triggers a rebalancing process (data is moved across the
entire cluster), which has a significant performance impact.
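
The data movement can be observed from the CLI; as a minimal sketch (assuming
the cluster name 'alpha' used in the steps below):

~~~
# ceph --cluster alpha -w
# ceph --cluster alpha osd pool stats
~~~

The first command follows cluster events (including recovery/backfill
progress), the second prints per-pool client and recovery IO rates.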

Version-Release
===============

On RHSC 2.0 server machine:

rhscon-core-selinux-0.0.41-1.el7scon.noarch
rhscon-core-0.0.41-1.el7scon.x86_64
rhscon-ceph-0.0.40-1.el7scon.x86_64
rhscon-ui-0.0.53-1.el7scon.noarch
ceph-installer-1.0.15-1.el7scon.noarch
ceph-ansible-1.0.5-32.el7scon.noarch

On Ceph 2.0 storage machines:

rhscon-core-selinux-0.0.41-1.el7scon.noarch
rhscon-agent-0.0.18-1.el7scon.noarch

How reproducible
================

100 %

Steps to Reproduce
==================

1. Install RHSC 2.0 following the documentation.

2. Accept a few nodes for the Ceph cluster.

3. Create a new Ceph cluster named 'alpha',
   while leaving a few more accepted nodes unused.

4. Create an object storage pool named 'alpha_pool' (standard type).

5. Load some data into 'alpha_pool' so that about 50% of the raw storage
   space is used, e.g.:

   ~~~
   # dd if=/dev/zero of=zero.data bs=1M count=1024
   # for i in {1..6}; do \
   > rados --cluster alpha put -p alpha_pool test_object_0${i} zero.data
   > done
   # ceph --cluster alpha df
   ~~~
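
   Optionally, a similar fill can be produced with "rados bench" (a sketch
   only; the runtime needed to reach ~50% utilization depends on the
   cluster's raw capacity):

   ~~~
   # rados --cluster alpha -p alpha_pool bench 60 write --no-cleanup
   ~~~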

6. Check the status of the cluster and list the OSDs, both via the CLI and
   the web console:

   ~~~
   # ceph --cluster alpha -s
   # ceph --cluster alpha osd tree
   ~~~

   While in the console, go to the "OSD" tab of the 'alpha' cluster page and
   note the utilization shown there.
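
   For a per-OSD utilization breakdown to compare against the console's
   "OSD" tab, the following can also be recorded (optional; assuming the
   "ceph osd df" subcommand is available in the deployed Ceph version):

   ~~~
   # ceph --cluster alpha osd df
   ~~~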

7. Add one new machine (already accepted in step #2) into the cluster 'alpha'.

   Go to "Clusters", click on menu for the 'alpha' cluster and select "Expand",
   then one the "Expand Cluster: alpha" page select one machine.

8. Again, check the status of the cluster and list the OSDs, both via the
   CLI and the web console:

   ~~~
   # ceph --cluster alpha -s
   # ceph --cluster alpha osd tree
   ~~~

   While in the console, go to the "OSD" tab of the 'alpha' cluster page and
   note the utilization shown there.
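
   To make the before/after comparison explicit, the tree outputs can be
   saved to files and diffed (plain shell; 'osd_tree.before' is a
   hypothetical file saved the same way during step #6):

   ~~~
   # ceph --cluster alpha osd tree > osd_tree.after
   # diff osd_tree.before osd_tree.after   # osd_tree.before saved in step #6
   ~~~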

Actual results
==============

During step #7 (adding a storage machine), I added one new machine to the
cluster, starting the "Expand Cluster" task.

No warning is shown during the whole operation, which is completely automatic:
the admin is neither warned nor able to explicitly start or cancel the
operation.

Note that even adding a single new OSD can move a significant amount of data
across the cluster.

Expected results
================

Before starting the "Expand Cluster" task, the console should show a warning
notice to the admin so that the admin:

 * understands the implications of the operation they are about to start
 * can cancel the operation at that point if they do not like the
   implications

Michael Kidd suggests a notice like this:

> Ceph cluster expansion can cause significant client IO performance impact.
> Please consider adjusting the backfill and recovery throttles (link or
> checkbox to do so provided) and/or contacting Red Hat support for further
> guidance.
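
For reference, the "backfill and recovery throttles" mentioned in the notice
correspond to the osd_max_backfills and osd_recovery_max_active OSD options.
A minimal sketch of lowering them at runtime (the values are illustrative,
not an official recommendation):

~~~
# ceph --cluster alpha tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
~~~

To persist such values across OSD restarts, they can also be set in the
[osd] section of the cluster configuration file (/etc/ceph/alpha.conf for a
cluster named 'alpha').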

Comment 1 Karnan 2016-09-30 08:44:07 UTC
Fixed as per comment 3, point 4, in https://bugzilla.redhat.com/show_bug.cgi?id=1375538

Comment 3 Daniel Horák 2016-10-05 07:52:36 UTC
Tested on Red Hat Enterprise Linux Server release 7.3 (Maipo) with the following versions:
  ceph-ansible-1.0.5-34.el7scon.noarch
  ceph-installer-1.0.15-2.el7scon.noarch
  rhscon-ceph-0.0.43-1.el7scon.x86_64
  rhscon-core-0.0.45-1.el7scon.x86_64
  rhscon-core-selinux-0.0.45-1.el7scon.noarch
  rhscon-ui-0.0.59-1.el7scon.noarch

Now there is the following warning note in the Expand Cluster wizard:
  ~~~~~~~~~~~~~~~~
  Ceph cluster expansion requires data movement between OSDs and can cause significant client IO performance impact if proper adjustments are not made. Please contact Red Hat support for help with the recommended changes.
  ~~~~~~~~~~~~~~~~
and it is necessary to check the "I understand the risk" checkbox to be able to continue.

>> VERIFIED

Comment 5 Shubhendu Tripathi 2016-10-17 13:32:26 UTC
doc-text looks good

Comment 6 errata-xmlrpc 2016-10-19 15:22:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:2082