Description of problem
======================

When additional disk devices are added to machines of a RHSC 2.0 managed Ceph
cluster, the console automatically creates OSDs on them and adds them into the
cluster without any warning or admin intervention.

All critical operations (such as adding a new OSD into the cluster) should be
admin driven. In this case the problem is that adding a new OSD triggers a
reallocation process (data is moved across the entire cluster), which has a
significant performance impact.

Version-Release
===============

On the RHSC 2.0 server machine:

rhscon-core-selinux-0.0.41-1.el7scon.noarch
rhscon-core-0.0.41-1.el7scon.x86_64
rhscon-ceph-0.0.40-1.el7scon.x86_64
rhscon-ui-0.0.53-1.el7scon.noarch
ceph-installer-1.0.15-1.el7scon.noarch
ceph-ansible-1.0.5-32.el7scon.noarch

On the Ceph 2.0 storage machines:

rhscon-core-selinux-0.0.41-1.el7scon.noarch
rhscon-agent-0.0.18-1.el7scon.noarch

How reproducible
================

100 %

Steps to Reproduce
==================

1. Install RHSC 2.0 following the documentation.
2. Accept a few nodes for the Ceph cluster.
3. Create a new Ceph cluster named 'alpha'.
4. Create an object storage pool named 'alpha_pool' (standard type).
5. Load some data into 'alpha_pool' so that about 50% of the raw storage space
   is used, e.g.:

~~~
# dd if=/dev/zero of=zero.data bs=1M count=1024
# for i in {1..6}; do \
>   rados --cluster alpha put -p alpha_pool test_object_0${i} zero.data
> done
# ceph --cluster alpha df
~~~

6. Check the status of the cluster and list the OSDs, both via the CLI and the
   console web UI:

~~~
# ceph --cluster alpha -s
# ceph --cluster alpha osd tree
~~~

   While in the console, go to the "OSD" tab of the 'alpha' cluster page and
   note the utilization shown there.

7. Add one (or more) disk devices to every machine of cluster 'alpha'. Note
   that for a single OSD you need to add 2 devices (one for the journal and
   one for the storage itself).
8. Again, check the status of the cluster and list the OSDs, both via the CLI
   and the console web UI:

~~~
# ceph --cluster alpha -s
# ceph --cluster alpha osd tree
~~~

   While in the console, go to the "OSD" tab of the 'alpha' cluster page and
   note the utilization shown there.

Actual results
==============

During step #7 (adding storage disks), I added 2 new devices to each node so
that a new OSD could be added.
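(Not part of the original reproduction steps: the data movement triggered by
the automatic expansion can also be followed from the CLI while it runs. The
commands below are only a sketch; the polling interval is arbitrary.)

~~~
# ceph --cluster alpha -w                   # stream cluster status/log events
# watch -n 5 'ceph --cluster alpha -s'      # or poll the summary every 5 seconds
# ceph --cluster alpha osd df               # per-OSD utilization
~~~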
After a while, the new OSD appeared in the UI and the recovery started (moving
the cluster into HEALTH_WARN state):

~~~
# ceph --cluster alpha -s
    cluster bbfc3be5-5bf9-4c70-a0b5-fd98d06366bd
     health HEALTH_WARN
            5 pgs degraded
            1 pgs stuck unclean
            recovery 8/12 objects degraded (66.667%)
     monmap e3: 3 mons at {dhcp-126-79=10.34.126.79:6789/0,dhcp-126-80=10.34.126.80:6789/0,dhcp-126-81=10.34.126.81:6789/0}
            election epoch 10, quorum 0,1,2 dhcp-126-79,dhcp-126-80,dhcp-126-81
     osdmap e29: 3 osds: 3 up, 3 in
            flags sortbitwise
      pgmap v313: 128 pgs, 1 pools, 6144 MB data, 6 objects
            13307 MB used, 17379 MB / 30686 MB avail
            8/12 objects degraded (66.667%)
                 123 active+clean
                   5 active+degraded
~~~

When the operation finished, I could see:

~~~
# ceph --cluster alpha -s
    cluster bbfc3be5-5bf9-4c70-a0b5-fd98d06366bd
     health HEALTH_OK
     monmap e3: 3 mons at {dhcp-126-79=10.34.126.79:6789/0,dhcp-126-80=10.34.126.80:6789/0,dhcp-126-81=10.34.126.81:6789/0}
            election epoch 10, quorum 0,1,2 dhcp-126-79,dhcp-126-80,dhcp-126-81
     osdmap e29: 3 osds: 3 up, 3 in
            flags sortbitwise
      pgmap v362: 128 pgs, 1 pools, 6144 MB data, 6 objects
            12395 MB used, 18291 MB / 30686 MB avail
                 128 active+clean
~~~

No warning is shown during the whole operation, which is completely automatic;
the admin is neither warned nor able to start or cancel the operation. Note
that even adding a single new OSD can move a significant amount of data across
the cluster (as can be seen in screenshot #1).

Expected results
================

Adding new disk devices does not trigger any automatic action and new OSDs are
not added into the cluster without the admin's knowledge. The admin should be
notified about new devices instead, as Michael Kidd notes:

> Before being allowed to proceed with this
> expansion, the user should need to check a box to acknowledge their
> understanding of the potential performance impact, and to contact
> Support if they'd like to discuss performance impact mitigation.

Michael suggests a notice like this:

> "Ceph cluster expansion can cause significant client IO performance impact.
> Please consider adjusting the backfill and recovery throttles (link or
> checkbox to do so provided) and/or contacting Red Hat support for further
> guidance."

The article below covers this topic and should be linked as part of that
notice:

https://access.redhat.com/articles/1292733
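For illustration, the backfill/recovery throttle adjustment mentioned in the
suggested notice could look roughly like this on a running cluster. This is
only a sketch, not something the console does; the values are illustrative and
should be tuned per environment:

~~~
# ceph --cluster alpha tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'
~~~

Injected values apply only to the running OSD daemons; to make them persistent
across restarts they would also need to be set in the [osd] section of
ceph.conf.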
Created attachment 1200755 [details]
screenshot 1: OSDs tab (after adding a few new OSDs)
Hello,

I am experiencing the same issue: the auto-expansion is picking up new disks
and causes trouble, because these changes then have to be reverted manually.

-----
rhscon-ceph-0.0.43-1.el7scon.x86_64
rhscon-core-0.0.45-1.el7scon.x86_64
rhscon-core-selinux-0.0.45-1.el7scon.noarch
rhscon-ui-0.0.60-1.el7scon.noarch
ansible-2.2.1.0-1.el7.noarch
ceph-ansible-2.1.9-1.el7scon.noarch
kernel-3.10.0-514.2.2.el7.x86_64
rhscon-agent-0.0.19-1.el7scon.noarch
rhscon-core-selinux-0.0.45-1.el7scon.noarch
-----

Issue 1: 1x SSD disk and 5x HDD were attached to an already running Ceph
storage node.
- 10 Ceph journal partitions were created on the SSD disk even though only 5
  of them are used.

Partial output of ceph-disk list with the newly attached disks:

# ceph-disk list
....
/dev/sdm :
 /dev/sdm1 ceph data, active, cluster ceph, osd.108, journal /dev/sdr6
/dev/sdn :
 /dev/sdn1 ceph data, active, cluster ceph, osd.109, journal /dev/sdr7
/dev/sdo :
 /dev/sdo1 ceph data, active, cluster ceph, osd.110, journal /dev/sdr8
/dev/sdp :
 /dev/sdp1 ceph data, active, cluster ceph, osd.111, journal /dev/sdr9
/dev/sdq :
 /dev/sdq1 ceph data, active, cluster ceph, osd.107, journal /dev/sdr5
/dev/sdr :
 /dev/sdr1 ceph journal
 /dev/sdr2 ceph journal
 /dev/sdr3 ceph journal
 /dev/sdr4 ceph journal
 /dev/sdr6 ceph journal, for /dev/sdm1
 /dev/sdr7 ceph journal, for /dev/sdn1
 /dev/sdr8 ceph journal, for /dev/sdo1
 /dev/sdr9 ceph journal, for /dev/sdp1
 /dev/sdr5 ceph journal, for /dev/sdq1
----

Issue 2: Only 2 new 300GB SSD disks, intended to be used as journal holders,
were attached to a second already running Ceph storage node. The Storage
Console created a journal partition on the first SSD and a Ceph OSD data
partition on the second SSD. This is not the desired behavior.

----

Question 1: In BZ#1342969 (https://bugzilla.redhat.com/show_bug.cgi?id=1342969),
the change https://review.gerrithub.io/#/c/294928/1/provider/import_cluster.go
disabled auto-expansion when OSDs with co-located journals were detected.
Would it be possible to disable auto-expansion by default in every case? For
example, have an option to allow auto-expansion manually during cluster
import/creation, and otherwise keep it disabled.

----

Question 2 (related to
https://review.gerrithub.io/#/c/294928/1/provider/import_cluster.go): In
mongodb, would manually changing the autoexpand flag to false be enough and
work, or are there any other dependencies?

db.storage_clusters.update({"autoexpand":true},{$set:{"autoexpand":false}})

----

Question 3: If I want to prevent the Storage Console from taking any action
when new disks are attached, because we want to use a different tool for
adding new OSDs, is disabling and masking all the Skyring-related services
enough? (A rough sketch follows after this comment.)

Thanks,
Tomas
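Regarding Question 3, a rough sketch of what stopping the console-driven
actions could look like. The service names below are assumptions (skyring on
the console node, salt-minion as the agent transport on the storage nodes),
and masking them stops all console-driven management, not only auto-expansion:

~~~
# On the Storage Console node (service name assumed to be 'skyring'):
# systemctl stop skyring
# systemctl mask skyring

# On each storage node (the console is assumed to drive the nodes via salt):
# systemctl stop salt-minion
# systemctl mask salt-minion
~~~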
(In reply to Tomas Petr from comment #3)
> Question 2.:
> related to the
> https://review.gerrithub.io/#/c/294928/1/provider/import_cluster.go :
> In mongodb, would be manually change autoexpand flag to false enough and
> working? Or are there any other dependencies?
> db.storage_clusters.update({"autoexpand":true},{$set:{"autoexpand":false}})

I got an answer from Darshan on Question 2 in a private email thread, so I am
adding the steps to disable auto-expand plus outputs from my test environment.

There is an option to change it with API calls, either via a REST client in a
browser or using curl.
- The username and password are the same as for the Storage Console dashboard
  login.
- The clusterid can be obtained either from the "ceph -s" output of the
  specific Ceph cluster or from
  "https://FQDN-skyring-server:10443/api/v1/clusters" in a browser, which
  shows all Ceph clusters in the Storage Console (details in the Diagnostic
  steps part).
- "disableautoexpand":true  - disables auto-expand
- "disableautoexpand":false - enables auto-expand; it will not work if a
  co-located journal is detected, even if the value
  "disableautoexpand":false is set

# curl --cacert <path-to-skyring-certificate> -X POST --data '{"username":"<user>","password":"<password>"}' https://<FQDN-skyring-server>:10443/api/v1/auth/login -i

# curl --cacert <path-to-skyring-certificate> -X PATCH --data '{"disableautoexpand":true}' https://<FQDN-skyring-server>:10443/api/v1/clusters/<ceph-clusterid/uuid> -b session-key=<cookie-session-key-returned-in-previous-step> -i

EXAMPLE to disable:

# curl --cacert /etc/pki/tls/skyring.crt -X POST --data '{"username":"admin","password":"admin"}' https://rhscon.subman:10443/api/v1/auth/login -i
HTTP/1.1 200 OK
Set-Cookie: session-key=MTUwMDk2NjAwNnxHd3dBR0RVNU56WmxZemMyTkRRMU0yRXlNRGhrWlRrd00ySXhOdz09fKRAopAblhzl3Iw1tPeNOveaii7XFiWYcM9FI1DkvYth; Path=/; Expires=Tue, 01 Aug 2017 07:00:06 GMT; Max-Age=604800
Date: Tue, 25 Jul 2017 07:00:06 GMT
Content-Length: 24
Content-Type: text/plain; charset=utf-8

# curl --cacert /etc/pki/tls/skyring.crt -X PATCH --data '{"disableautoexpand":true}' https://rhscon.subman:10443/api/v1/clusters/a93b8b8c-a7fe-4103-8434-6a490f641a66 -b session-key=MTUwMDk2NjAwNnxHd3dBR0RVNU56WmxZemMyTkRRMU0yRXlNRGhrWlRrd00ySXhOdz09fKRAopAblhzl3Iw1tPeNOveaii7XFiWYcM9FI1DkvYth -i
HTTP/1.1 200 OK
Date: Tue, 25 Jul 2017 07:08:52 GMT
Content-Length: 0
Content-Type: text/plain; charset=utf-8

Disable auto-expand by changing the record in mongodb directly:

Log in to mongodb on the rhscon node; the admin password is by default in the
/etc/skyring/skyring.conf file.

# mongo 127.0.0.1:27017/skyring -u admin -p <passwd>
use skyring
show collections
db.storage_clusters.find()
db.storage_clusters.find().forEach(printjson)
db.storage_clusters.find({},{autoexpand:1})
# disable auto-expand
db.storage_clusters.update({"autoexpand":true},{$set:{"autoexpand":false}})
exit
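For convenience, the two curl calls above can be chained so that the session
cookie is captured automatically. This is only a sketch; the certificate path,
hostname, credentials, and cluster UUID are the placeholder values from the
example above:

~~~
#!/bin/bash
# Sketch: log in, capture the session cookie, then disable auto-expansion.
CRT=/etc/pki/tls/skyring.crt
HOST=rhscon.subman
CLUSTER=a93b8b8c-a7fe-4103-8434-6a490f641a66

# Extract the session-key cookie from the Set-Cookie header of the login reply.
COOKIE=$(curl -s -i --cacert "$CRT" -X POST \
           --data '{"username":"admin","password":"admin"}' \
           "https://$HOST:10443/api/v1/auth/login" |
         sed -n 's/^Set-Cookie: \(session-key=[^;]*\);.*/\1/p')

# PATCH the cluster record to turn auto-expansion off.
curl -i --cacert "$CRT" -X PATCH --data '{"disableautoexpand":true}' \
     -b "$COOKIE" "https://$HOST:10443/api/v1/clusters/$CLUSTER"
~~~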
This product is EOL now
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days