Bug 1375899 - when new disk devices are added into storage nodes, RHSC creates new OSD(s) and adds them into cluster automatically without any admin intervention
Status: CLOSED WONTFIX
Product: Red Hat Storage Console
Classification: Red Hat
Component: core
Version: 2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Nishanth Thomas
QA Contact: sds-qe-bugs
Reported: 2016-09-14 08:18 UTC by Martin Bukatovic
Modified: 2020-08-13 08:36 UTC (History)
CC: 4 users

Last Closed: 2018-11-19 05:42:07 UTC
Flags: tpetr: needinfo? (nthomas)


Attachments
screenshot 1: OSDs tab (after adding few new OSDs) (46.76 KB, image/png)
2016-09-14 08:19 UTC, Martin Bukatovic


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1375972 0 unspecified CLOSED when cluster is expanded (new machine added), console doesn't warn admin about implications of associated recovery opera... 2021-02-22 00:41:40 UTC
Red Hat Knowledge Base (Solution) 3125451 0 None None None 2017-07-25 11:24:38 UTC

Internal Links: 1375972

Description Martin Bukatovic 2016-09-14 08:18:02 UTC
Description of problem
======================

When additional disk devices are added to machines of an RHSC 2.0 managed Ceph
cluster, the console automatically creates OSDs there and adds them into the
cluster without any warning or admin intervention.

All critical operations (such as adding a new OSD into the cluster) should be
admin driven. The problem in this case is that adding a new OSD triggers
a rebalancing process (data is moved across the entire cluster), which has
a significant performance impact.

Version-Release
===============

On RHSC 2.0 server machine:

rhscon-core-selinux-0.0.41-1.el7scon.noarch
rhscon-core-0.0.41-1.el7scon.x86_64
rhscon-ceph-0.0.40-1.el7scon.x86_64
rhscon-ui-0.0.53-1.el7scon.noarch
ceph-installer-1.0.15-1.el7scon.noarch
ceph-ansible-1.0.5-32.el7scon.noarch

On Ceph 2.0 storage machines:

rhscon-core-selinux-0.0.41-1.el7scon.noarch
rhscon-agent-0.0.18-1.el7scon.noarch

How reproducible
================

100 %

Steps to Reproduce
==================

1. Install RHSC 2.0 following the documentation.

2. Accept a few nodes for the Ceph cluster.

3. Create new ceph cluster named 'alpha'.

4. Create object storage pool named 'alpha_pool' (standard type).

5. Load some data into 'alpha_pool' so that about 50% of the raw storage
   space is used, e.g.:

   ~~~
   # dd if=/dev/zero of=zero.data bs=1M count=1024
   # for i in {1..6}; do \
   > rados --cluster alpha put -p alpha_pool test_object_0${i} zero.data
   > done
   # ceph --cluster alpha df
   ~~~

6. Check the status of the cluster and list OSDs both via cli and console web:

   ~~~
   # ceph --cluster alpha -s
   # ceph --cluster alpha osd tree
   ~~~

   While in the console, go to the "OSD" tab of the 'alpha' cluster page and
   note the utilization shown there.

7. Add one (or more) disk devices into every machine of the cluster 'alpha'.
   Note that for a single OSD, you need to add 2 devices (one for the journal
   and one for the data).

8. Again, check the status of the cluster and list OSDs both via cli and
   console web:

   ~~~
   # ceph --cluster alpha -s
   # ceph --cluster alpha osd tree
   ~~~

   While in the console, go to the "OSD" tab of the 'alpha' cluster page and
   note the utilization shown there.

Actual results
==============

During step 7 (adding storage disks), I added 2 new devices to each node so
that a new OSD could be added. After a while, the new OSD appeared in the UI
and recovery started (moving the cluster into HEALTH_WARN state):

~~~
# ceph --cluster alpha -s
    cluster bbfc3be5-5bf9-4c70-a0b5-fd98d06366bd
     health HEALTH_WARN
            5 pgs degraded
            1 pgs stuck unclean
            recovery 8/12 objects degraded (66.667%)
     monmap e3: 3 mons at {dhcp-126-79=10.34.126.79:6789/0,dhcp-126-80=10.34.126.80:6789/0,dhcp-126-81=10.34.126.81:6789/0}
            election epoch 10, quorum 0,1,2 dhcp-126-79,dhcp-126-80,dhcp-126-81
     osdmap e29: 3 osds: 3 up, 3 in
            flags sortbitwise
      pgmap v313: 128 pgs, 1 pools, 6144 MB data, 6 objects
            13307 MB used, 17379 MB / 30686 MB avail
            8/12 objects degraded (66.667%)
                 123 active+clean
                   5 active+degraded
~~~

When the operation finished, I could see:

~~~
# ceph --cluster alpha -s
    cluster bbfc3be5-5bf9-4c70-a0b5-fd98d06366bd
     health HEALTH_OK
     monmap e3: 3 mons at {dhcp-126-79=10.34.126.79:6789/0,dhcp-126-80=10.34.126.80:6789/0,dhcp-126-81=10.34.126.81:6789/0}
            election epoch 10, quorum 0,1,2 dhcp-126-79,dhcp-126-80,dhcp-126-81
     osdmap e29: 3 osds: 3 up, 3 in
            flags sortbitwise
      pgmap v362: 128 pgs, 1 pools, 6144 MB data, 6 objects
            12395 MB used, 18291 MB / 30686 MB avail
                 128 active+clean
~~~

No warning is shown during the operation, which is completely automatic;
the admin is neither notified about it nor able to start or cancel it.

Note that even adding a single new OSD can move a significant amount of data
across the cluster (as can be seen in screenshot 1).

Expected results
================

Adding new disk devices should not trigger any automatic action; new OSDs must
not be added into the cluster without the admin's knowledge.

The admin should be notified about new devices instead, as Michael Kidd notes:

> Before being allowed to proceed with this
> expansion, the user should need to check a box to acknowledge their
> understanding of the potential performance impact, and to contact
> Support if they'd like to discuss performance impact mitigation.

Michael suggests a notice like this:

> "Ceph cluster expansion can cause significant client IO performance impact.
> Please consider adjusting the backfill and recovery throttles (link or
> checkbox to do so provided) and/or contacting Red Hat support for further
> guidance."

The article below covers this topic and should be linked as part of that
notice: https://access.redhat.com/articles/1292733
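For reference, the "backfill and recovery throttles" mentioned in the suggested notice can be adjusted at runtime with `ceph tell osd.* injectargs`. The sketch below only assembles and prints the command (a dry run, since it needs a live cluster); the option names are Jewel-era Ceph settings and the values shown are illustrative, so verify both against the deployed Ceph version:

```shell
#!/bin/sh
# Dry-run sketch: the command an admin could run to reduce the client-IO
# impact of rebalancing after an OSD is added. Option names
# (osd_max_backfills, osd_recovery_max_active, osd_recovery_op_priority)
# are Jewel-era settings -- verify against your Ceph version before use.
CLUSTER=alpha   # cluster name from the reproduction steps

THROTTLE_CMD="ceph --cluster $CLUSTER tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1 --osd_recovery_op_priority 1'"

# Print the command instead of executing it (dry run):
echo "$THROTTLE_CMD"
```

The Red Hat article linked above discusses which values are appropriate for a given cluster.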

Comment 1 Martin Bukatovic 2016-09-14 08:19:44 UTC
Created attachment 1200755 [details]
screenshot 1: OSDs tab (after adding few new OSDs)

Comment 3 Tomas Petr 2017-07-24 10:29:41 UTC
Hello,
I am experiencing the same issue: auto-expansion picks up new disks and causes trouble, because the changes then have to be reverted manually.
-----
rhscon-ceph-0.0.43-1.el7scon.x86_64 
rhscon-core-0.0.45-1.el7scon.x86_64 
rhscon-core-selinux-0.0.45-1.el7scon.noarch 
rhscon-ui-0.0.60-1.el7scon.noarch    
ansible-2.2.1.0-1.el7.noarch   
ceph-ansible-2.1.9-1.el7scon.noarch
kernel-3.10.0-514.2.2.el7.x86_64 

rhscon-agent-0.0.19-1.el7scon.noarch 
rhscon-core-selinux-0.0.45-1.el7scon.noarch
-----

Issue 1:
1x SSD disk and 5x HDD were attached to an already running Ceph Storage node.
 - On the SSD, 10 Ceph journal partitions were created even though only 5 of them are used.
Partial output of ceph-disk list with the newly attached disks:
# ceph-disk list
....
/dev/sdm :
 /dev/sdm1 ceph data, active, cluster ceph, osd.108, journal /dev/sdr6
/dev/sdn :
 /dev/sdn1 ceph data, active, cluster ceph, osd.109, journal /dev/sdr7
/dev/sdo :
 /dev/sdo1 ceph data, active, cluster ceph, osd.110, journal /dev/sdr8
/dev/sdp :
 /dev/sdp1 ceph data, active, cluster ceph, osd.111, journal /dev/sdr9
/dev/sdq :
 /dev/sdq1 ceph data, active, cluster ceph, osd.107, journal /dev/sdr5
/dev/sdr :
 /dev/sdr1 ceph journal
 /dev/sdr2 ceph journal
 /dev/sdr3 ceph journal
 /dev/sdr4 ceph journal
 /dev/sdr6 ceph journal, for /dev/sdm1
 /dev/sdr7 ceph journal, for /dev/sdn1
 /dev/sdr8 ceph journal, for /dev/sdo1
 /dev/sdr9 ceph journal, for /dev/sdp1
 /dev/sdr5 ceph journal, for /dev/sdq1

----
Issue 2:
Only 2 new 300 GB SSD disks, intended to be used as journal devices, were attached to a second already running Ceph Storage node. The Storage Console created one journal partition on the first SSD and a Ceph OSD data partition on the second SSD.

This is not the desired behavior.

----
Question 1:

In BZ#1342969 (https://bugzilla.redhat.com/show_bug.cgi?id=1342969, https://review.gerrithub.io/#/c/294928/1/provider/import_cluster.go),
auto-expansion was disabled when OSDs with co-located journals were detected.

Would it be possible to disable auto-expansion by default in all cases?
For example, an option to enable auto-expansion manually during cluster import/creation, with it otherwise being disabled?

----
Question 2:
Related to https://review.gerrithub.io/#/c/294928/1/provider/import_cluster.go :
In MongoDB, would manually changing the autoexpand flag to false be enough and work? Or are there other dependencies?
db.storage_clusters.update({"autoexpand":true},{$set:{"autoexpand":false}})

----
Question 3:
If I want to prevent the Storage Console from taking action when new disks are attached (we want to use a different tool for adding new OSDs), is disabling and masking all the Skyring-related services enough?

Thanks, Tomas
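The disable-and-mask approach from Question 3 could look like the sketch below. The unit names are assumptions (skyring on the console node; salt-minion is listed as a guess for the agent transport on storage nodes, since RHSC 2.0 agents communicate over salt) -- verify the actual units with `systemctl list-units` on each host. The script is a dry run that only assembles and prints the commands:

```shell
#!/bin/sh
# Dry-run sketch for Question 3: stop the Storage Console from acting on
# newly attached disks by disabling and masking its services.
# Unit names are ASSUMPTIONS (verify with `systemctl list-units`):
#   skyring      - console server (hypothetical unit name)
#   salt-minion  - agent transport on storage nodes (hypothetical)
CMDS=""
for unit in skyring salt-minion; do
  # Accumulate one disable and one mask command per unit:
  CMDS="${CMDS}systemctl disable --now $unit
systemctl mask $unit
"
done

# Print the commands instead of executing them (dry run):
printf '%s' "$CMDS"
```

Masking (as opposed to just disabling) also prevents the units from being started again as a dependency of another service.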

Comment 4 Tomas Petr 2017-07-25 11:34:42 UTC
(In reply to Tomas Petr from comment #3)
> Question 2.:
> related to the
> https://review.gerrithub.io/#/c/294928/1/provider/import_cluster.go :
> In mongodb, would be  manually change autoexpand flag to false enough and
> working? Or are there any other dependencies? 
> db.storage_clusters.update({"autoexpand":true},{$set:{"autoexpand":false}})
> 

I got an answer from Darshan on Question 2 in a private email thread, so I am adding the steps to disable auto-expand, plus outputs from my test environment:

There is an option to change it with API calls, either via a REST client in the browser or using curl.
- The username and password are the same as for the Storage Console dashboard login.
- The cluster ID can be obtained either from the "ceph -s" output of the specific Ceph cluster, or from "https://FQDN-skyring-server:10443/api/v1/clusters" in a browser, which lists all Ceph clusters in the Storage Console (details in the Diagnostic Steps part).
- "disableautoexpand":true - disables auto-expand
- "disableautoexpand":false - enables auto-expand; this will not work if a co-located journal is detected, even if the value "disableautoexpand":false is set

# curl --cacert <path-to-skyring-certificate> -X POST --data '{"username":"<user>","password":"<password>"}' https://<FQDN-skyring-server>:10443/api/v1/auth/login -i
# curl --cacert <path-to-skyring-certificate> -X PATCH --data '{"disableautoexpand":true}' https://<FQDN-skyring-server>:10443/api/v1/clusters/<ceph-clusterid/uuid> -b session-key=<cookie-session-key-returned-in-previous-step> -i

# EXAMPLE to disable:
# curl --cacert /etc/pki/tls/skyring.crt -X POST --data '{"username":"admin","password":"admin"}' https://rhscon.subman:10443/api/v1/auth/login -i
    HTTP/1.1 200 OK
    Set-Cookie: session-key=MTUwMDk2NjAwNnxHd3dBR0RVNU56WmxZemMyTkRRMU0yRXlNRGhrWlRrd00ySXhOdz09fKRAopAblhzl3Iw1tPeNOveaii7XFiWYcM9FI1DkvYth; Path=/; Expires=Tue, 01 Aug 2017 07:00:06 GMT; Max-Age=604800
    Date: Tue, 25 Jul 2017 07:00:06 GMT
    Content-Length: 24
    Content-Type: text/plain; charset=utf-8
# curl --cacert /etc/pki/tls/skyring.crt -X PATCH --data '{"disableautoexpand":true}' https://rhscon.subman:10443/api/v1/clusters/a93b8b8c-a7fe-4103-8434-6a490f641a66 -b session-key=MTUwMDk2NjAwNnxHd3dBR0RVNU56WmxZemMyTkRRMU0yRXlNRGhrWlRrd00ySXhOdz09fKRAopAblhzl3Iw1tPeNOveaii7XFiWYcM9FI1DkvYth -i
    HTTP/1.1 200 OK
    Date: Tue, 25 Jul 2017 07:08:52 GMT
    Content-Length: 0
    Content-Type: text/plain; charset=utf-8

Disable auto-expand by changing the record in MongoDB directly.
Log in to MongoDB on the rhscon node; the admin password is by default in the /etc/skyring/skyring.conf file:

# mongo 127.0.0.1:27017/skyring -u admin -p <passwd>
use skyring
show collections
db.storage_clusters.find()
db.storage_clusters.find().forEach(printjson)
db.storage_clusters.find({},{autoexpand:1})
// disable auto-expand
db.storage_clusters.update({"autoexpand":true},{$set:{"autoexpand":false}})
exit

Comment 5 Shubhendu Tripathi 2018-11-19 05:42:07 UTC
This product is EOL now

