Description of problem
======================
When a new storage machine is added to a RHSCon 2.0 managed Ceph cluster, the console automatically adds the OSDs (which are going to be created there) into the cluster without any warning or direct admin intervention. While adding a new storage node is a manual process (the admin needs to tell the console that a particular node should be added into the cluster), there is no warning about the implications of the process. Adding new OSDs triggers rebalancing (data is moved across the entire cluster), which has a significant performance impact.

Version-Release
===============
On the RHSC 2.0 server machine:

rhscon-core-selinux-0.0.41-1.el7scon.noarch
rhscon-core-0.0.41-1.el7scon.x86_64
rhscon-ceph-0.0.40-1.el7scon.x86_64
rhscon-ui-0.0.53-1.el7scon.noarch
ceph-installer-1.0.15-1.el7scon.noarch
ceph-ansible-1.0.5-32.el7scon.noarch

On the Ceph 2.0 storage machines:

rhscon-core-selinux-0.0.41-1.el7scon.noarch
rhscon-agent-0.0.18-1.el7scon.noarch

How reproducible
================
100 %

Steps to Reproduce
==================
1. Install RHSC 2.0 following the documentation.
2. Accept a few nodes for the Ceph cluster.
3. Create a new Ceph cluster named 'alpha', while leaving a few more accepted nodes unused.
4. Create an object storage pool named 'alpha_pool' (standard type).
5. Load some data into 'alpha_pool' so that about 50% of the raw storage space is used, e.g.:

~~~
# dd if=/dev/zero of=zero.data bs=1M count=1024
# for i in {1..6}; do \
> rados --cluster alpha put -p alpha_pool test_object_0${i} zero.data
> done
# ceph --cluster alpha df
~~~

6. Check the status of the cluster and list the OSDs, both via the cli and the console web UI:

~~~
# ceph --cluster alpha -s
# ceph --cluster alpha osd tree
~~~

While in the console, go to the "OSD" tab of the 'alpha' cluster page and note the utilization shown there.

7. Add one new machine (already accepted in step #2) into the cluster 'alpha': go to "Clusters", click on the menu for the 'alpha' cluster and select "Expand", then on the "Expand Cluster: alpha" page select one machine.
8. Again, check the status of the cluster and list the OSDs, both via the cli and the console web UI:

~~~
# ceph --cluster alpha -s
# ceph --cluster alpha osd tree
~~~

While in the console, go to the "OSD" tab of the 'alpha' cluster page and note the utilization shown there.

Actual results
==============
During step #7 I added one new machine into the cluster, which started the "Expand Cluster" task. No warning is shown and the whole operation is completely automatic: the admin is neither warned about the implications nor given a chance to confirm or cancel the operation. Note that even adding a single new OSD can move a significant amount of data across the cluster.

Expected results
================
Before starting the "Expand Cluster" task, the console should show a warning notice to the admin so that the admin:

* understands the implications of the operation they are about to start
* can cancel the operation at that point if they don't like the implications

Michael Kidd suggests a notice like this:

> Ceph cluster expansion can cause significant client IO performance impact.
> Please consider adjusting the backfill and recovery throttles (link or
> checkbox to do so provided) and/or contacting Red Hat support for further
> guidance.
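As an illustration of what "adjusting the backfill and recovery throttles" mentioned in the suggested notice could look like, here is a sketch (not part of the original report): the options used are standard Ceph OSD tunables, and the values are examples only, not settings recommended by Red Hat support.

~~~
# Illustrative only: lower recovery/backfill concurrency on all OSDs of the
# 'alpha' cluster before expanding it, so rebalancing competes less with
# client IO. The values shown are examples, not a recommendation.
ceph --cluster alpha tell osd.* injectargs \
    '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'

# injectargs changes are runtime-only; to keep them across OSD restarts, add
# the corresponding settings to the [osd] section of the cluster config
# (/etc/ceph/alpha.conf in this setup).
~~~

Once the expansion has finished and `ceph --cluster alpha -s` no longer reports backfill activity, the throttles can be raised back to their previous values in the same way.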
Fixed as per comment 3, point 4, in https://bugzilla.redhat.com/show_bug.cgi?id=1375538
Tested on Red Hat Enterprise Linux Server release 7.3 (Maipo) with the following versions:

ceph-ansible-1.0.5-34.el7scon.noarch
ceph-installer-1.0.15-2.el7scon.noarch
rhscon-ceph-0.0.43-1.el7scon.x86_64
rhscon-core-0.0.45-1.el7scon.x86_64
rhscon-core-selinux-0.0.45-1.el7scon.noarch
rhscon-ui-0.0.59-1.el7scon.noarch

The Expand Cluster wizard now shows the following warning note:

~~~~~~~~~~~~~~~~
Ceph cluster expansion requires data movement between OSDs and can cause
significant client IO performance impact if proper adjustments are not made.
Please contact Red Hat support for help with the recommended changes.
~~~~~~~~~~~~~~~~

and it is necessary to check the "I understand the risk" checkbox to be able to continue.

>> VERIFIED
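For completeness, one quick way to confirm the installed package versions listed above on the console and storage machines (assuming rpm-based hosts, as in this report):

~~~
# rpm -qa | grep -E '^(rhscon|ceph-installer|ceph-ansible)' | sort
~~~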
doc-text looks good
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2016:2082