Bug 1464572

Summary: take-over-existing-cluster.yml does not support existing RGW config and overwrites it with a new config if rgw nodes are enabled in /etc/ansible/hosts
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Vikhyat Umrao <vumrao>
Component: Ceph-Ansible Assignee: Sébastien Han <shan>
Status: CLOSED WONTFIX QA Contact: ceph-qe-bugs <ceph-qe-bugs>
Severity: high Docs Contact:
Priority: high    
Version: 3.0 CC: adeza, aschoen, ceph-eng-bugs, flucifre, gmeno, mkudlej, nthomas, sankarshan, seb
Target Milestone: rc   
Target Release: 3.0   
Hardware: x86_64   
OS: All   
Whiteboard:
Fixed In Version: RHEL: ceph-ansible-3.0.0-0.1.rc1.el7cp Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-08-02 11:28:58 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Vikhyat Umrao 2017-06-23 19:25:51 UTC
Description of problem:
take-over-existing-cluster.yml does not support taking over existing RGW nodes and overwrites their existing configuration with a new one if rgw nodes are enabled in /etc/ansible/hosts.


cat /etc/ansible/hosts

[mons]
mon1
mon2

[osds]
osd1
osd2

[rgws]
node1
node2

The existing ceph.conf on the RGW nodes contains sections such as:

[client.rgw.node1]
host = node1
keyring = /var/lib/ceph/radosgw/ceph-rgw.node1/keyring
rgw socket path = /tmp/radosgw-node1.sock
log file = /var/log/ceph/ceph-rgw-node1.log
rgw data = /var/lib/ceph/radosgw/ceph-rgw.node1
rgw frontends = civetweb port=192.168.24.74:8080 num_threads=50


[client.rgw.node2]
host = node2
keyring = /var/lib/ceph/radosgw/ceph-rgw.node2/keyring
rgw socket path = /tmp/radosgw-node2.sock
log file = /var/log/ceph/ceph-rgw-node2.log
rgw data = /var/lib/ceph/radosgw/ceph-rgw.node2
rgw frontends = civetweb port=192.168.24.75:8080 num_threads=50

The original ceph.conf also contains Keystone options and other tunables.
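A possible workaround sketch, assuming the deployment can use ceph-ansible's ceph_conf_overrides variable in group_vars/all.yml (this is an assumption about the environment, not something from this report): carry the existing RGW sections into the Ansible-generated ceph.conf by copying them into the overrides dictionary. The values below are taken from the sections above; the Keystone options and other tunables would need the same treatment.

# group_vars/all.yml -- sketch only, values copied from the existing ceph.conf
ceph_conf_overrides:
  client.rgw.node1:
    host: node1
    keyring: /var/lib/ceph/radosgw/ceph-rgw.node1/keyring
    rgw socket path: /tmp/radosgw-node1.sock
    log file: /var/log/ceph/ceph-rgw-node1.log
    rgw data: /var/lib/ceph/radosgw/ceph-rgw.node1
    rgw frontends: civetweb port=192.168.24.74:8080 num_threads=50
  client.rgw.node2:
    host: node2
    keyring: /var/lib/ceph/radosgw/ceph-rgw.node2/keyring
    rgw socket path: /tmp/radosgw-node2.sock
    log file: /var/log/ceph/ceph-rgw-node2.log
    rgw data: /var/lib/ceph/radosgw/ceph-rgw.node2
    rgw frontends: civetweb port=192.168.24.75:8080 num_threads=50

If the ceph-ansible version in use honours ceph_conf_overrides, the regenerated ceph.conf would keep these RGW options instead of dropping them.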


For more details please check RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1459350

Version-Release number of selected component (if applicable):

Comment 1 Vikhyat Umrao 2017-06-23 19:28:51 UTC
Version-Release number of selected component (if applicable):

$ cat installed-rpms | grep ansible
ansible-2.2.1.0-1.el7.noarch                                
ceph-ansible-2.1.9-1.el7scon.noarch

Comment 2 seb 2017-06-26 11:45:20 UTC
Were the Keystone options configured with ceph-ansible or not?

Comment 3 Vikhyat Umrao 2017-06-26 16:23:43 UTC
(In reply to seb from comment #2)
> Were the Keystone options configured with ceph-ansible or not?

This was an RHCS 1.3.z to 2.y upgrade. Keystone may have been configured manually in RHCS 1.3.z.

Comment 5 seb 2017-06-28 12:33:34 UTC
Well, if the ceph.conf was edited manually outside of Ansible, it is normal that Ansible overwrote it.
If that's the case, this issue should be closed.

Comment 6 Vikhyat Umrao 2017-06-28 13:05:22 UTC
(In reply to seb from comment #5)
> Well, if the ceph.conf was edited manually outside of Ansible, it is normal
> that Ansible overwrote it.
> If that's the case, this issue should be closed.

The issue here is that when we take over a Ceph cluster with Ansible, that cluster was never managed by Ansible before. This is true for 1.3.z, because all 1.3.z clusters are managed by ceph-deploy.

Customers expect that when they take over a cluster with Ansible, the ceph.conf stays the same as it was before. We are able to do that for Monitor and OSD nodes (more details here: https://bugzilla.redhat.com/show_bug.cgi?id=1459350), but not for RGW nodes.

This bug tracks support for RGW nodes when taking over a cluster with Ansible.
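For illustration only, a minimal sketch of what such support could look like (an assumption, not the actual playbook content): a pre-play that fetches the existing ceph.conf from the RGW nodes before the file is regenerated, so its [client.rgw.*] sections can be preserved. The fetch_directory variable and the destination path below are assumptions.

# Hypothetical addition to take-over-existing-cluster.yml -- illustration only
- hosts: rgws
  become: true
  tasks:
    - name: fetch the existing ceph.conf from the RGW nodes
      # copy /etc/ceph/ceph.conf from each RGW node back to the admin node so
      # its [client.rgw.*] sections can be merged into the regenerated file
      fetch:
        src: /etc/ceph/ceph.conf
        dest: "{{ fetch_directory | default('fetch') }}/{{ inventory_hostname }}/etc/ceph/ceph.conf"
        flat: yes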

Comment 8 seb 2017-06-29 09:56:31 UTC
If this BZ is a better version of https://bugzilla.redhat.com/show_bug.cgi?id=1459350, then let's close the other one as a duplicate.

Comment 10 Federico Lucifredi 2017-08-02 11:28:58 UTC
Running take-over-existing-cluster is valid in 2.x: we changed from ceph-deploy (in 1.3) to ceph-ansible in 2.0 and had to provide a way for customers to bring an existing cluster under ceph-ansible control to handle management tasks like adding/removing OSDs.

In 3.0, the assumption is that the cluster is either newly installed or upgraded from 2.x. In both cases, the take-over use case is not applicable in 3.0.