Bug 1357292

Summary: [ceph-ansible] : unable to add mon after upgrade from ceph 1.3 to ceph 2.0 as it generates different keyring
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Rachana Patel <racpatel>
Component: Ceph-Ansible Assignee: Andrew Schoen <aschoen>
Status: CLOSED CURRENTRELEASE QA Contact: ceph-qe-bugs <ceph-qe-bugs>
Severity: urgent Docs Contact: Bara Ancincova <bancinco>
Priority: unspecified    
Version: 3.0 CC: adeza, anharris, aschoen, ceph-eng-bugs, gmeno, hnallurv, kdreyer, nthomas, racpatel, sankarshan
Target Milestone: rc   
Target Release: 3.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: ceph-ansible-1.0.5-30.el7scon Doc Type: Known Issue
Doc Text:
.Ansible fails to add a monitor to an upgraded cluster

An attempt to add a monitor to a cluster by using the Ansible automation application after upgrading the cluster from Red Hat Ceph Storage 1.3 to 2 fails on the following task:

----
TASK: [ceph-mon | collect admin and bootstrap keys]
----

This happens because the original monitor keyring was created with the `mds "allow"` capability while the newly added monitor requires a keyring with the `mds "allow *"` capability. To work around this issue, after installing the `ceph-mon` package, manually copy the administration keyring from an already existing monitor node to the new monitor node:

----
scp /etc/ceph/<cluster_name>.client.admin.keyring <target_host_name>:/etc/ceph
----

For example:

----
# scp /etc/ceph/ceph.client.admin.keyring node4:/etc/ceph
----

Then use Ansible to add the monitor as described in the https://access.redhat.com/documentation/en/red-hat-ceph-storage/2/single/administration-guide#adding_a_monitor_with_ansible[Adding a Monitor with Ansible] section of the https://access.redhat.com/documentation/en/red-hat-ceph-storage/2/single/administration-guide[Administration Guide] for Red Hat Ceph Storage 2.
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-06-29 13:52:50 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1322504, 1383917, 1412948    

Description Rachana Patel 2016-07-17 21:26:46 UTC
Description of problem:
=======================
Upgrade a cluster from Ceph 1.3 to Ceph 2.0 and then add a MON using ceph-ansible. The run hangs at the task:
TASK: [ceph-mon | collect admin and bootstrap keys]


Version-Release number of selected component (if applicable):
=============================================================
ceph-ansible-1.0.5-27.el7scon.noarch
ceph-mon-10.2.2-16.el7cp.x86_64


How reproducible:
=================
always

Steps to Reproduce:
====================
1. Follow the document - https://access.redhat.com/documentation/en/red-hat-ceph-storage/version-1.3/installation-guide-for-red-hat-enterprise-linux/
Create a Ceph cluster with 3 MONs, 3 OSDs, 1 admin/Calamari node, and one RGW node.

2. Upgrade Ceph 1.3 to Ceph 2.0 - follow the document
https://access.qa.redhat.com/documentation/en/red-hat-ceph-storage/2/installation-guide-for-red-hat-enterprise-linux
(chown method)

3. After the upgrade, install the ceph-ansible-1.0.5-27.el7scon version of ceph-ansible. The ceph-ansible files should be installed at /usr/share/ceph-ansible.

4. Copy the sample `group_vars/all.sample` to `group_vars/all`:
`cp /usr/share/ceph-ansible/group_vars/all.sample /usr/share/ceph-ansible/group_vars/all`

5. Set `generate_fsid: false` in `group_vars/all`.
Get your current cluster fsid with `ceph fsid` and set `fsid` accordingly in `group_vars/all`.
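Steps 4 and 5 can be sketched as the following shell commands. This is a minimal illustration, not the exact procedure from the guide: it edits a temporary copy of `group_vars/all` and uses a made-up fsid value; on a real admin node the file is `/usr/share/ceph-ansible/group_vars/all` and the value comes from `ceph fsid`.

```shell
# Sketch only: edit a temp copy of group_vars/all with example content.
ALL=$(mktemp)
printf '%s\n' '#generate_fsid: true' '#fsid: "{{ cluster_uuid.stdout }}"' > "$ALL"

# On a real node: FSID=$(ceph fsid). This value is an example.
FSID=4a158d27-f750-41d5-9e7f-26ce4c9d2d45

# Uncomment and override both settings (the leading '#' is optional in
# the pattern, so already-uncommented lines are handled too).
sed -i "s|^#\{0,1\}generate_fsid:.*|generate_fsid: false|" "$ALL"
sed -i "s|^#\{0,1\}fsid:.*|fsid: ${FSID}|" "$ALL"

cat "$ALL"
```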

6. Modify the ansible inventory at /etc/ansible/hosts to include your ceph hosts. Add monitors under a [mons] section, and OSDs under an [osds] section to identify their roles to Ansible.
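An inventory for step 6 might look like the snippet below. The node names are hypothetical, and the file is written to a temp path here purely for illustration; on the Ansible node it would be `/etc/ansible/hosts`.

```shell
# Hypothetical /etc/ansible/hosts layout: MONs under [mons], OSDs
# under [osds]. Written to a temp file for illustration.
HOSTS=$(mktemp)
cat > "$HOSTS" <<'EOF'
[mons]
node1
node2
node3

[osds]
node4
node5
node6
EOF
cat "$HOSTS"
```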

7. From the Ansible node, you should have passwordless SSH to all nodes in the cluster.

8. From the `/usr/share/ceph-ansible` directory, run the playbook: `ansible-playbook take-over-existing-cluster.yml` (changes were made to remove a syntax error)

9. Now add one more node to the hosts file under the [mons] section and perform all preflight operations on that node.

10. Modify `group_vars/all` and `group_vars/osds` as described in https://access.qa.redhat.com/documentation/en/red-hat-ceph-storage/2/installation-guide-for-ubuntu/#installing_ceph_ansible
(except `fetch_directory` and `fsid`: do not set `fetch_directory`; `fsid` was already set in the previous steps)

11. Run `ansible-playbook site.yml -i /etc/ansible/hosts`


Actual results:
===============
Installation hangs at the task:
TASK: [ceph-mon | collect admin and bootstrap keys]


Expected results:
=================
The MON should be installed successfully.


Additional info:
================
1. All MON nodes that were part of the upgrade have the same value in
/var/lib/ceph/mon/ceph-<ID>/keyring,

while the newly added MON has a different value in that file (ceph-ansible generates a new keyring for it).
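The keyring mismatch can be checked by comparing the `key` field of the monitor keyrings. The sketch below uses two fabricated sample keyrings in temp files; on real nodes the file to compare is `/var/lib/ceph/mon/ceph-<ID>/keyring`, and the key values here are made up.

```shell
# Create two sample monitor keyrings (contents fabricated for the sketch):
# OLD stands in for an upgraded MON, NEW for the freshly added MON.
OLD=$(mktemp); NEW=$(mktemp)
cat > "$OLD" <<'EOF'
[mon.]
    key = AQBexampleupgradedmonkeyvalue01==
    caps mon = "allow *"
EOF
cat > "$NEW" <<'EOF'
[mon.]
    key = AQBexamplenewlygeneratedkey02==
    caps mon = "allow *"
EOF

# Extract just the key field from each keyring and compare.
old_key=$(awk '/key = /{print $3}' "$OLD")
new_key=$(awk '/key = /{print $3}' "$NEW")

if [ "$old_key" != "$new_key" ]; then
    echo "keyring mismatch: the new MON will not join quorum"
fi
```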


2. `ceph -s` or `mon_status` never shows the newly added MON as part of the quorum.

3. Once we overwrite that file with the keyring from another MON, the new MON joins the cluster and quorum.

Comment 2 Ken Dreyer (Red Hat) 2016-07-18 13:33:48 UTC
*** Bug 1357291 has been marked as a duplicate of this bug. ***

Comment 3 Andrew Schoen 2016-07-18 15:13:53 UTC
PR opened upstream: https://github.com/ceph/ceph-ansible/pull/887

Comment 13 Christina Meno 2016-07-22 19:37:08 UTC
Moving out of 2.0 because this only affects adding mons to an upgraded cluster and we have a set of steps to work around it.

Comment 18 Ken Dreyer (Red Hat) 2017-03-03 16:26:47 UTC
The upstream ticket at http://tracker.ceph.com/issues/16255 says the Ceph bug was fixed in Ceph v10.2.4. Would you please retest with the latest ceph-ansible and ceph packages?