Description of problem:
=======================
After upgrading a cluster from Ceph 1.3 to Ceph 2.0, adding a MON with ceph-ansible hangs at the task:
TASK: [ceph-mon | collect admin and bootstrap keys]

Version-Release number of selected component (if applicable):
=============================================================
ceph-ansible-1.0.5-27.el7scon.noarch
ceph-mon-10.2.2-16.el7cp.x86_64

How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Follow the document https://access.redhat.com/documentation/en/red-hat-ceph-storage/version-1.3/installation-guide-for-red-hat-enterprise-linux/ and create a Ceph cluster with 3 MONs, 3 OSDs, 1 admin/Calamari node, and 1 RGW node.
2. Upgrade Ceph 1.3 to Ceph 2.0 following the document https://access.qa.redhat.com/documentation/en/red-hat-ceph-storage/2/installation-guide-for-red-hat-enterprise-linux (chown method).
3. After the upgrade, install the ceph-ansible-1.0.5-27.el7scon version of ceph-ansible. The ceph-ansible files should be installed at /usr/share/ceph-ansible.
4. Copy the sample `group_vars/all.sample` to `group_vars/all`:
   `cp /usr/share/ceph-ansible/group_vars/all.sample /usr/share/ceph-ansible/group_vars/all`
5. Set `generate_fsid: false` in `group_vars/all`. Get your current cluster fsid with `ceph fsid` and set `fsid` accordingly in `group_vars/all`.
6. Modify the Ansible inventory at /etc/ansible/hosts to include your Ceph hosts. Add monitors under a [mons] section and OSDs under an [osds] section to identify their roles to Ansible.
7. Make sure the Ansible node has passwordless SSH access to all nodes in the cluster.
8. From the `/usr/share/ceph-ansible` directory, run the playbook:
   `ansible-playbook take-over-existing-cluster.yml`
   (after making changes to the playbook to remove a syntax error)
9. Add one more node to the hosts file under the [mons] section. Do all preflight operations on that node.
10. Modify group_vars/all and group_vars/osds as described in https://access.qa.redhat.com/documentation/en/red-hat-ceph-storage/2/installation-guide-for-ubuntu/#installing_ceph_ansible (except fetch_directory and fsid: do not set fetch_directory, and fsid was already set in the previous steps).
11. Run `ansible-playbook site.yml -i /etc/ansible/hosts`.

Actual results:
===============
The installation hangs at the task:
TASK: [ceph-mon | collect admin and bootstrap keys]

Expected results:
=================
The MON should be installed successfully.

Additional info:
================
1. All MON nodes that were part of the upgrade have the same contents in /var/lib/ceph/mon/ceph-<ID>/keyring, while the newly added MON has different contents in that file (ceph-ansible generates a new keyring for it).
2. `ceph -s` and `mon_status` never show the newly added MON as part of the quorum.
3. Once we overwrite that file with the file from one of the other MONs, the new MON becomes part of the cluster and the quorum.
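The keyring mismatch described under "Additional info" can be checked mechanically. The following is only a minimal sketch, assuming the contents of each MON's /var/lib/ceph/mon/ceph-<ID>/keyring have already been collected into a dictionary keyed by host name. The function names, host names, and key strings are hypothetical placeholders and are not part of ceph-ansible or Ceph.

```python
import hashlib
from collections import Counter
from typing import Dict, List


def keyring_digest(text: str) -> str:
    """Hash a keyring after trimming whitespace, so indentation differences are ignored."""
    normalized = "\n".join(line.strip() for line in text.strip().splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_mismatched_mons(keyrings: Dict[str, str]) -> List[str]:
    """Return the hosts whose keyring differs from the most common keyring."""
    digests = {host: keyring_digest(body) for host, body in keyrings.items()}
    if not digests:
        return []
    # Assumption: the keyring shared by most MONs is the one the cluster uses.
    majority_digest, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(host for host, d in digests.items() if d != majority_digest)


if __name__ == "__main__":
    # Placeholder keyring bodies; real files hold a base64 key, not these strings.
    upgraded = '[mon.]\n\tkey = PLACEHOLDER-ORIGINAL-KEY\n\tcaps mon = "allow *"\n'
    regenerated = '[mon.]\n\tkey = PLACEHOLDER-NEW-KEY\n\tcaps mon = "allow *"\n'
    sample = {
        "mon1": upgraded,
        "mon2": upgraded,
        "mon3": upgraded,
        "mon4-new": regenerated,
    }
    print(find_mismatched_mons(sample))
```

A host reported by this check is a candidate for the workaround in item 3, that is, replacing its keyring with the one from an existing MON.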
*** Bug 1357291 has been marked as a duplicate of this bug. ***
PR opened upstream: https://github.com/ceph/ceph-ansible/pull/887
Moving out of 2.0 because this only affects adding mons to an upgraded cluster and we have a set of steps to work around it.
The upstream ticket at http://tracker.ceph.com/issues/16255 says the Ceph bug was fixed in Ceph v10.2.4. Would you please retest with the latest ceph-ansible and ceph packages?
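Before retesting, it may help to confirm that the installed ceph-mon build is at or above the upstream fix version. This is a rough sketch only: it compares the upstream x.y.z part of a package string against 10.2.4 and knows nothing about downstream backports, so a lower number does not prove the fix is absent. The function names are hypothetical.

```python
import re
from typing import Tuple

# Upstream release that tracker issue 16255 reports as containing the fix.
FIXED_IN = (10, 2, 4)


def upstream_version(pkg_version: str) -> Tuple[int, ...]:
    """Extract the first x.y.z triple from a package or version string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", pkg_version)
    if match is None:
        raise ValueError("no x.y.z version found in %r" % pkg_version)
    return tuple(int(part) for part in match.groups())


def has_upstream_fix(pkg_version: str) -> bool:
    """True if the upstream version is at least FIXED_IN (ignores backports)."""
    return upstream_version(pkg_version) >= FIXED_IN


if __name__ == "__main__":
    # The package from the original report, taken from the description above.
    print(has_upstream_fix("ceph-mon-10.2.2-16.el7cp.x86_64"))
```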