Bug 1375252 - [Docs] Document how to turn a Director managed Ceph deployment into an unmanaged deployment
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: documentation
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kim Nylander
QA Contact: Yogev Rabl
Docs Contact: Derek
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-09-12 14:40 UTC by Giulio Fidente
Modified: 2019-07-25 15:15 UTC
CC List: 15 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-09-27 11:48:47 UTC
Target Upstream Version:



Description Giulio Fidente 2016-09-12 14:40:32 UTC
Description of problem:
This process could be followed by those who deployed Ceph/Hammer with a version of Director older than 9 and want to preserve Hammer when upgrading to OSP 10.

By default, any Director-managed deployment of Ceph will instead be upgraded to the Jewel release.

Comment 3 Giulio Fidente 2016-09-21 16:38:17 UTC
The pre-existing cluster fsid, mon_host, and client key can be collected from a working deployment by logging in to a controller and running the following commands:

$ sudo grep fsid /etc/ceph/ceph.conf
fsid = 0334862a-7f60-11e6-bfd0-52540029bf6b

$ sudo grep mon_host /etc/ceph/ceph.conf 
mon_host = 10.35.140.35

$ sudo ceph auth get client.admin --format json|jq .[0].key
"AQDGf+FXAAAAABAAsScHVT+OWkcL749oYoyzyQ=="


These values should be fed in via an environment file, together with puppet-ceph-external.yaml, to perform the configuration update.

For this to succeed, though, the pre-existing cephstorage-X nodes *must* be removed from the Heat stack. I am still investigating whether this is possible without reinstalling them one by one.

Comment 4 Giulio Fidente 2016-09-28 10:10:36 UTC
To remove the OSD (ceph-storage) nodes from the Heat stack without actually deleting them, the process I have tested is:

1. match the nova instances (nova list) with the ironic nodes (ironic node-list) to gather the ironic UUIDs of the nodes hosting the ceph-storage nova instances, as sketched below
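
  For example (a sketch only; the grep pattern below assumes the default
  overcloud node naming), the instance IDs that nova reports for the
  ceph-storage nodes can be cross-referenced against the "Instance UUID"
  column printed by ironic:

  $ nova list | grep cephstorage
  $ ironic node-list

  The ironic UUID to use in the next steps is the one on the row whose
  Instance UUID matches the ID of a ceph-storage instance.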


2. put the ironic nodes hosting ceph-storage instances into maintenance mode:

  $ ironic node-set-maintenance c9f2a019-61e5-4812-b0f8-5afbbce16ba1 true


3. delete the ironic nodes:

  $ ironic node-delete c9f2a019-61e5-4812-b0f8-5afbbce16ba1


4. update the existing stack using:

  --ceph-storage-scale 0

and adding:

  -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-ceph-external.yaml -e ceph-external.yaml

where ceph-external.yaml is the file created with data from comment #3, for example:

  parameter_defaults:
    CephClusterFSID: '114a91cc-855b-11e6-9cf4-5254006225bd'
    CephClientKey: 'AQB1iOtXAAAAABAAc7oNuJRDRPI30P5pVTGmnQ=='
    CephExternalMonHost: '192.0.2.15'
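
Putting it together, the stack update would look roughly like this (a sketch only; keep whatever environment files, flavors and scale values were used for the original deployment):

  $ openstack overcloud deploy --templates \
      [your original environment files and options] \
      --ceph-storage-scale 0 \
      -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-ceph-external.yaml \
      -e ceph-external.yaml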

Comment 7 Giulio Fidente 2016-10-20 22:41:25 UTC
Yogev, one more step is needed after the process in comment #4 is finished: prevent the ceph-mon package from being upgraded. This can happen when OSP is upgraded from 9 to 10, since ceph-mon is still installed on the same nodes. More specifically, I think version-locking ceph-mon is the safest approach, by running the following on each controller node:

# yum install yum-versionlock
# yum versionlock ceph-mon
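
To confirm the lock is in place (note: on some RHEL releases the plugin package is named yum-plugin-versionlock rather than yum-versionlock):

# yum versionlock list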

Comment 14 Yogev Rabl 2017-09-03 13:52:05 UTC
Shouldn't we test it with removing the ceph mons from the controllers?

Comment 15 Ken Holden 2017-09-04 18:12:20 UTC
I have tested this procedure against OSP 8 (including the removal of Ceph mons from the overcloud controllers) with repeated success. I was able to deploy new OpenStack instances with Ceph ephemeral and block-volume-based storage both before and after the removal.


1. Deployed an OSP 8 baremetal overcloud consisting of 3 controllers, 1 compute, and 3 Ceph storage nodes (2 SSDs and 5 OSDs each). The controllers were configured as monitors, per the OSP 8 Director deployment of Ceph:
openstack overcloud deploy --templates \
--ntp-server 192.168.1.250 \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/templates/network-environment.yaml \
-e /home/stack/templates/storage-environment.yaml \
--control-flavor control \
--compute-flavor compute \
--ceph-storage-flavor ceph-storage \
--control-scale 3 \
--compute-scale 1 \
--ceph-storage-scale 3

2. Performed an overcloud stack update to bring the OSP 8 deployment to the latest RPMs as of today:
openstack overcloud update stack overcloud -i \
--templates \
-e /usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry-puppet.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/templates/network-environment.yaml \
-e /home/stack/templates/storage-environment.yaml


3. Added a new baremetal server and configured it as a new Ceph monitor.
4. Removed the monitor role from the 3 controllers so that only the new server was acting as Ceph monitor (a sketch of this step follows below).
5. Removed the 3 Ceph storage nodes from Ironic on the undercloud.
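
A minimal sketch of step 4 on one controller (this assumes the mon id is the controller's short hostname, as in a default Director deployment; on Hammer the daemon is stopped through the ceph sysvinit script rather than systemd):

  # systemctl stop ceph-mon@$(hostname -s)
  # ceph mon remove $(hostname -s)

Afterwards drop the controller from mon_initial_members / mon_host in ceph.conf on all nodes and redistribute the file.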

6. Re-deployed, changing --ceph-storage-scale to 0 and including the YAML files needed to point the overcloud at the external Ceph cluster. This completed successfully:
openstack overcloud deploy --templates \
--ntp-server 192.168.1.250 \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/templates/network-environment.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/puppet-ceph-external.yaml \
-e /home/stack/templates/ceph-external.yaml \
--control-flavor control \
--compute-flavor compute \
--ceph-storage-flavor ceph-storage \
--control-scale 3 \
--compute-scale 1 \
--ceph-storage-scale 0

2017-09-01 16:50:37 [overcloud]: UPDATE_COMPLETE  Stack UPDATE completed successfully
Stack overcloud UPDATE_COMPLETE
Overcloud Endpoint: http://192.168.1.50:5000/v2.0
Overcloud Deployed

Controllers and computes are pointing to the new monitor (I copied the new ceph.conf from the new monitor to all servers in the cluster before running the second deploy).

[root@overcloud-controller-2 heat-admin]# ceph -s
    cluster 1da393e4-8ea9-11e7-9f0e-5254000947bc
     health HEALTH_OK
     monmap e6: 1 mons at {ceph-cloudbox4=172.16.6.120:6789/0}
            election epoch 21, quorum 0 ceph-cloudbox4
     osdmap e44: 15 osds: 15 up, 15 in
      pgmap v311: 448 pgs, 4 pools, 16824 MB data, 4212 objects
            34238 MB used, 991 GB / 1024 GB avail
                 448 active+clean

I was able to deploy an image and an instance to the cluster while pointing at the external Ceph cluster (which was previously under Director control).
[root@overcloud-compute-0 heat-admin]# grep -R -i 172.16.6.120 /etc/*    # 172.16.6.120 = IP of the new Ceph monitor
/etc/ceph/ceph.conf:mon_host = 172.16.6.120
/etc/hosts:172.16.6.120	cloudbox4.lab.lan	cloudbox4
/etc/libvirt/qemu/instance-00000002.xml:        <host name='172.16.6.120' port='6789'/>
/etc/puppet/hieradata/ceph_cluster.yaml:ceph_mon_host: 172.16.6.120
/etc/puppet/hieradata/ceph_cluster.yaml:ceph_mon_host_v6: 172.16.6.120

[root@overcloud-controller-2 heat-admin]# grep -R -i 172.16.6.120 /etc/*    # 172.16.6.120 = IP of the new Ceph monitor
/etc/ceph/ceph.conf:mon_host = 172.16.6.120
/etc/hosts:172.16.6.120	cloudbox4.lab.lan	cloudbox4
/etc/puppet/hieradata/ceph_cluster.yaml:ceph_mon_host: 172.16.6.120
/etc/puppet/hieradata/ceph_cluster.yaml:ceph_mon_host_v6: 172.16.6.120


I then performed a stack update to ensure that Director was only touching the existing compute and controllers. Ceph was not touched by Director:


openstack overcloud update stack overcloud -i \
--templates \
-e /usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry-puppet.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/templates/network-environment.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/puppet-ceph-external.yaml \
-e /home/stack/templates/ceph-external.yaml

starting package update on stack overcloud
IN_PROGRESS
IN_PROGRESS
IN_PROGRESS
WAITING
on_breakpoint: [u'overcloud-controller-2', u'overcloud-compute-0', u'overcloud-controller-0', u'overcloud-controller-1']
Breakpoint reached, continue? Regexp or Enter=proceed, no=cancel update, C-c=quit interactive mode: 
removing breakpoint on overcloud-controller-1
IN_PROGRESS
""" truncated output """"
""" truncated output """"
COMPLETE
update finished with status COMPLETE

Comment 16 Yogev Rabl 2017-09-05 13:41:19 UTC
I have also verified the procedure. It works well, and the results are as expected.

