| Summary: | [Docs] Document how to turn a Director managed Ceph deployment into an unmanaged deployment | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Giulio Fidente <gfidente> |
| Component: | documentation | Assignee: | Kim Nylander <knylande> |
| Status: | CLOSED CANTFIX | QA Contact: | Yogev Rabl <yrabl> |
| Severity: | unspecified | Docs Contact: | Derek <dcadzow> |
| Priority: | unspecified | | |
| Version: | 10.0 (Newton) | CC: | dbecker, gcharot, gfidente, jefbrown, jliberma, jomurphy, jschluet, kholden, knylande, mburns, morazi, pneedle, rhel-osp-director-maint, srevivo, tvignaud |
| Target Milestone: | --- | Keywords: | Documentation, ZStream |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-09-27 11:48:47 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description
Giulio Fidente
2016-09-12 14:40:32 UTC
The pre-existing cluster fsid, mon_host and client key can be collected from a working deployment by logging on a controller and giving the following commands:

$ sudo grep fsid /etc/ceph/ceph.conf
fsid = 0334862a-7f60-11e6-bfd0-52540029bf6b

$ sudo grep mon_host /etc/ceph/ceph.conf
mon_host = 10.35.140.35

$ sudo ceph auth get client.admin --format json | jq .[0].key
"AQDGf+FXAAAAABAAsScHVT+OWkcL749oYoyzyQ=="

These should be fed in via an environment file, together with puppet-ceph-external.yaml, to perform the configuration update. For it to succeed though, the pre-existing cephstorage-X nodes *must* be removed from the heat stack. I am still investigating whether this is possible without reinstalling them one by one.

To remove the OSD (ceph-storage) nodes from the Heat stack without actually deleting them, the process I have tested is:

1. Match the nova instances (nova list) with the ironic nodes (ironic node-list) to gather the ironic UUIDs of the nodes hosting the nova ceph-storage instances.

2. Set the ironic nodes hosting ceph-storage instances into maintenance state:
   $ ironic node-set-maintenance c9f2a019-61e5-4812-b0f8-5afbbce16ba1 true

3. Delete the ironic nodes:
   $ ironic node-delete c9f2a019-61e5-4812-b0f8-5afbbce16ba1

4. Update the existing stack using --ceph-storage-scale 0 and adding:
   -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-ceph-external.yaml
   -e ceph-external.yaml

where ceph-external.yaml is the file created with the data from comment #3, for example:

parameter_defaults:
  CephClusterFSID: '114a91cc-855b-11e6-9cf4-5254006225bd'
  CephClientKey: 'AQB1iOtXAAAAABAAc7oNuJRDRPI30P5pVTGmnQ=='
  CephExternalMonHost: '192.0.2.15'
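Collecting these values and generating the environment file can also be scripted. The following is only a minimal sketch of that idea: the controller address 192.0.2.20 and the output filename are hypothetical, and it assumes SSH access as heat-admin plus jq being available, exactly as in the commands above.

# Sketch: gather the Ceph cluster identifiers from a running controller
# and write the ceph-external.yaml file used in step 4 above.
CONTROLLER=192.0.2.20   # hypothetical address; take the real one from 'nova list'

FSID=$(ssh heat-admin@$CONTROLLER "sudo grep fsid /etc/ceph/ceph.conf" | awk '{print $3}')
MON_HOST=$(ssh heat-admin@$CONTROLLER "sudo grep mon_host /etc/ceph/ceph.conf" | awk '{print $3}')
KEY=$(ssh heat-admin@$CONTROLLER "sudo ceph auth get client.admin --format json" | jq -r '.[0].key')

cat > ceph-external.yaml <<EOF
parameter_defaults:
  CephClusterFSID: '$FSID'
  CephClientKey: '$KEY'
  CephExternalMonHost: '$MON_HOST'
EOF

The resulting file matches the example above and is passed with -e ceph-external.yaml during the stack update.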
Yogev, one more step is needed after the process in comment #4 is finished: prevent the ceph-mon package from being upgraded, as this might happen when OSP is upgraded from 9 to 10, given that ceph-mon is still installed on the same nodes. More specifically, I think version-locking ceph-mon is the safest approach, running on each controller node:

# yum install yum-versionlock
# yum versionlock ceph-mon

Shouldn't we test it with removing the ceph mons from the controllers?

I have tested this procedure against OSP 8 (including the removal of Ceph mons from the overcloud controllers) with repeated success. I was able to deploy new OpenStack instances with Ceph ephemeral and block volume based storage both before and after the removal.

1. Deployed an OSP 8 baremetal overcloud consisting of 3 controllers, 1 compute and 3 Ceph storage nodes (2 SSDs and 5 OSDs each). The controllers were configured as monitors, per the OSP 8 Director deployment of Ceph:

openstack overcloud deploy --templates \
  --ntp-server 192.168.1.250 \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e /home/stack/templates/network-environment.yaml \
  -e /home/stack/templates/storage-environment.yaml \
  --control-flavor control \
  --compute-flavor compute \
  --ceph-storage-flavor ceph-storage \
  --control-scale 3 \
  --compute-scale 1 \
  --ceph-storage-scale 3

2. Performed an overcloud stack update to bring the OSP 8 deployment to the latest RPMs as of today:

openstack overcloud update stack overcloud -i \
  --templates \
  -e /usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry-puppet.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e /home/stack/templates/network-environment.yaml \
  -e /home/stack/templates/storage-environment.yaml

3. Added a new baremetal server and configured it as a new Ceph monitor.

4. Removed the monitor roles from the 3 controllers so that only the new server was acting as Ceph monitor.

5. Removed the 3 Ceph storage nodes from ironic on the undercloud.

6. Re-deployed, changing --ceph-storage-scale to 0 and sourcing the yaml files necessary to point the overcloud to an external Ceph cluster. This just completed successfully:

openstack overcloud deploy --templates \
  --ntp-server 192.168.1.250 \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e /home/stack/templates/network-environment.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-ceph-external.yaml \
  -e /home/stack/templates/ceph-external.yaml \
  --control-flavor control \
  --compute-flavor compute \
  --ceph-storage-flavor ceph-storage \
  --control-scale 3 \
  --compute-scale 1 \
  --ceph-storage-scale 0

2017-09-01 16:50:37 [overcloud]: UPDATE_COMPLETE Stack UPDATE completed successfully
Stack overcloud UPDATE_COMPLETE
Overcloud Endpoint: http://192.168.1.50:5000/v2.0
Overcloud Deployed

The controllers and computes are pointing to the new monitor (I copied the new ceph.conf from the new monitor to all servers in the cluster before running the second deploy):

[root@overcloud-controller-2 heat-admin]# ceph -s
    cluster 1da393e4-8ea9-11e7-9f0e-5254000947bc
     health HEALTH_OK
     monmap e6: 1 mons at {ceph-cloudbox4=172.16.6.120:6789/0}
            election epoch 21, quorum 0 ceph-cloudbox4
     osdmap e44: 15 osds: 15 up, 15 in
      pgmap v311: 448 pgs, 4 pools, 16824 MB data, 4212 objects
            34238 MB used, 991 GB / 1024 GB avail
                 448 active+clean

I was able to deploy an image and an instance to the cluster, now pointing to the external Ceph cluster (which was previously under Director control). Grepping for the IP of the new Ceph monitor (172.16.6.120) on a compute and a controller:

[root@overcloud-compute-0 heat-admin]# grep -R -i 172.16.6.120 /etc/*
/etc/ceph/ceph.conf:mon_host = 172.16.6.120
/etc/hosts:172.16.6.120 cloudbox4.lab.lan cloudbox4
/etc/libvirt/qemu/instance-00000002.xml: <host name='172.16.6.120' port='6789'/>
/etc/puppet/hieradata/ceph_cluster.yaml:ceph_mon_host: 172.16.6.120
/etc/puppet/hieradata/ceph_cluster.yaml:ceph_mon_host_v6: 172.16.6.120

[root@overcloud-controller-2 heat-admin]# grep -R -i 172.16.6.120 /etc/*
/etc/ceph/ceph.conf:mon_host = 172.16.6.120
/etc/hosts:172.16.6.120 cloudbox4.lab.lan cloudbox4
/etc/puppet/hieradata/ceph_cluster.yaml:ceph_mon_host: 172.16.6.120
/etc/puppet/hieradata/ceph_cluster.yaml:ceph_mon_host_v6: 172.16.6.120

I then performed a stack update to ensure that Director was only touching the existing compute and controllers. Ceph was not touched by Director:

openstack overcloud update stack overcloud -i \
  --templates \
  -e /usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry-puppet.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e /home/stack/templates/network-environment.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-ceph-external.yaml \
  -e /home/stack/templates/ceph-external.yaml

starting package update on stack overcloud
IN_PROGRESS
IN_PROGRESS
IN_PROGRESS
WAITING
on_breakpoint: [u'overcloud-controller-2', u'overcloud-compute-0', u'overcloud-controller-0', u'overcloud-controller-1']
Breakpoint reached, continue? Regexp or Enter=proceed, no=cancel update, C-c=quit interactive mode:
removing breakpoint on overcloud-controller-1
IN_PROGRESS
""" truncated output """
""" truncated output """
COMPLETE
update finished with status COMPLETE

I have also verified the procedure. It works well and the results are as expected.
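For a quick post-conversion spot check from the undercloud, something along these lines can be used. It is only a sketch based on the checks performed above: the node addresses are placeholders to be replaced with the IPs from 'nova list', and it assumes SSH access as heat-admin.

# Sketch: confirm each overcloud node now points at the external monitor.
# The IP list is a placeholder; substitute the addresses reported by 'nova list'.
for NODE in 192.0.2.10 192.0.2.11 192.0.2.12; do
    echo "== $NODE =="
    # mon_host should be the external monitor, not one of the controllers
    ssh heat-admin@$NODE "sudo grep mon_host /etc/ceph/ceph.conf"
done

# From any controller, the cluster status should show only the external monitor in the monmap
ssh heat-admin@192.0.2.10 "sudo ceph -s"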