Bug 1329013

Summary: [ceph-ansible] : Installation is failing to create OSD
Product: [Red Hat Storage] Red Hat Storage Console
Reporter: Rachana Patel <racpatel>
Component: ceph-installer
Assignee: Andrew Schoen <aschoen>
Status: CLOSED ERRATA
QA Contact: Rachana Patel <racpatel>
Severity: urgent
Priority: unspecified
Version: 2
CC: adeza, aschoen, ceph-eng-bugs, hnallurv, kdreyer, nthomas, sankarshan
Target Milestone: ---   
Target Release: 2   
Hardware: x86_64   
OS: Linux   
Fixed In Version: ceph-ansible-1.0.5-4.el7
Doc Type: Bug Fix
Last Closed: 2016-08-23 19:49:34 UTC
Type: Bug
Attachments:
output of ansible (flags: none)

Description Rachana Patel 2016-04-20 22:44:11 UTC
Created attachment 1149214
output of ansible

Description of problem:
=======================
Ceph installation using ceph-ansible fails at the OSD creation task.


Version-Release number of selected component (if applicable):
==============================================================
ceph-ansible-1.0.5-3.el7.noarch
ceph-10.1.1-1.el7cp.x86_64



How reproducible:
================
always

Steps to Reproduce:
==================
1. Prepare the nodes for Ceph installation (follow the prerequisite steps).
2. Run ansible-playbook with the values below:
[root@magna042 ceph-ansible]# ansible-playbook site.yml -vv -i  /etc/ansible/hosts_1  --extra-vars '{"ceph_stable": true, "ceph_origin": "distro", "ceph_stable_rh_storage": true, "monitor_interface": "eno1", "journal_collocation": true, "devices": ["/dev/sdb", "/dev/sdc", "/dev/sdd"], "journal_size": 100, "public_network": "10.8.128.0/21", "cephx": true, "fetch_directory": "~/ceph-ansible-keys", "rbd_client_directories": true}' -u root
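For readability, the same invocation can be assembled programmatically. This is only a minimal Python sketch, assuming the values from the command above; `build_command` is a hypothetical helper for illustration and is not part of ceph-ansible.

```python
import json
import shlex

# Values copied from the --extra-vars payload in the reported command.
extra_vars = {
    "ceph_stable": True,
    "ceph_origin": "distro",
    "ceph_stable_rh_storage": True,
    "monitor_interface": "eno1",
    "journal_collocation": True,
    "devices": ["/dev/sdb", "/dev/sdc", "/dev/sdd"],
    "journal_size": 100,
    "public_network": "10.8.128.0/21",
    "cephx": True,
    "fetch_directory": "~/ceph-ansible-keys",
    "rbd_client_directories": True,
}


def build_command(inventory, variables):
    """Hypothetical helper: return the ansible-playbook argument list."""
    return [
        "ansible-playbook", "site.yml", "-vv",
        "-i", inventory,
        "--extra-vars", json.dumps(variables),
        "-u", "root",
    ]


command = build_command("/etc/ansible/hosts_1", extra_vars)
# shlex.quote keeps the JSON payload intact when pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in command))
```

Serializing the dictionary with `json.dumps` avoids hand-quoting mistakes in the JSON string passed to `--extra-vars`.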


Actual results:
===============

FATAL: all hosts have already failed -- aborting

PLAY RECAP ******************************************************************** 
           to retry, use: --limit @/root/site.retry

magna074                   : ok=68   changed=13   unreachable=0    failed=0   
magna084                   : ok=65   changed=10   unreachable=0    failed=1   
magna085                   : ok=65   changed=10   unreachable=0    failed=1   
magna090                   : ok=65   changed=10   unreachable=0    failed=1 


For the complete output and error, refer to the attachment.
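The failing hosts can be picked out of the PLAY RECAP mechanically. A minimal Python sketch, assuming the recap line layout shown above (other Ansible versions may format it differently); `failed_hosts` is a hypothetical helper name.

```python
import re

# Matches recap lines such as:
#   magna084 : ok=65 changed=10 unreachable=0 failed=1
RECAP_LINE = re.compile(
    r"^(?P<host>\S+)\s*:\s*ok=(?P<ok>\d+)\s+changed=(?P<changed>\d+)"
    r"\s+unreachable=(?P<unreachable>\d+)\s+failed=(?P<failed>\d+)\s*$"
)


def failed_hosts(recap_text):
    """Return hosts whose recap line reports failed or unreachable tasks."""
    hosts = []
    for line in recap_text.splitlines():
        match = RECAP_LINE.match(line.strip())
        if match is None:
            continue  # banner lines, blank lines, retry hints
        if int(match["failed"]) > 0 or int(match["unreachable"]) > 0:
            hosts.append(match["host"])
    return hosts


recap = """\
magna074                   : ok=68   changed=13   unreachable=0    failed=0
magna084                   : ok=65   changed=10   unreachable=0    failed=1
magna085                   : ok=65   changed=10   unreachable=0    failed=1
magna090                   : ok=65   changed=10   unreachable=0    failed=1
"""

print(failed_hosts(recap))  # ['magna084', 'magna085', 'magna090']
```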

Comment 2 Alfredo Deza 2016-04-21 13:35:56 UTC
This looks like the OSD can't communicate with the mon. Following the troubleshooting guide for OSDs (http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/), I ran `ceph health`:

[ubuntu@magna090 ~]$ sudo ceph health
2016-04-21 13:30:44.713595 7f56837ce700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
2016-04-21 13:30:44.713609 7f56837ce700 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication
2016-04-21 13:30:44.713611 7f56837ce700  0 librados: client.admin initialization error (2) No such file or directory
Error connecting to cluster: ObjectNotFound

There is no keyring, and `ls /etc/ceph` confirms it. The next step is to check ceph.conf for the monitor configuration. A well-known issue is a "0.0.0.0" address for the monitor:

[ubuntu@magna090 ~]$ cat /etc/ceph/ceph.conf | grep -B 2 "mon addr"
[mon.magna074]
host = magna074
mon addr = 0.0.0.0

So the issue is that "mon addr" for mon.magna074 in the ceph.conf on magna090 points to an invalid address (0.0.0.0).
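The same check can be scripted across nodes. A minimal Python sketch, assuming ceph.conf is simple enough for the standard-library `configparser` to read (real files may contain constructs it rejects); `invalid_mon_addrs` is a hypothetical helper, not a Ceph tool.

```python
import configparser


def invalid_mon_addrs(conf_text):
    """Return {section: addr} for [mon.*] sections whose mon addr is 0.0.0.0."""
    parser = configparser.ConfigParser(
        strict=False,         # tolerate duplicate sections or options
        interpolation=None,   # treat '%' in values literally
        delimiters=("=",),    # keep ':' available for host:port values
    )
    parser.read_string(conf_text)
    bad = {}
    for section in parser.sections():
        if not section.startswith("mon."):
            continue
        addr = parser.get(section, "mon addr", fallback=None)
        # Compare only the host part, so "0.0.0.0:6789" is flagged too.
        if addr is not None and addr.split(":")[0].strip() == "0.0.0.0":
            bad[section] = addr
    return bad


sample = """\
[mon.magna074]
host = magna074
mon addr = 0.0.0.0
"""

print(invalid_mon_addrs(sample))  # {'mon.magna074': '0.0.0.0'}
```

On a node, the text would come from reading /etc/ceph/ceph.conf; an empty result means no monitor section carries the 0.0.0.0 placeholder.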

Comment 3 Andrew Schoen 2016-04-21 14:16:22 UTC
This PR in ceph-ansible resolves this issue: https://github.com/ceph/ceph-ansible/pull/720

Comment 4 Andrew Schoen 2016-04-21 14:57:26 UTC
The fix was merged into ceph-ansible here: https://github.com/ceph/ceph-ansible/commit/9565418411e4b7fbd02f81677af6e54f78fb15ad

Comment 11 Rachana Patel 2016-07-27 02:01:08 UTC
Verified with build ceph-ansible-1.0.5-31.el7scon.noarch.

Comment 13 errata-xmlrpc 2016-08-23 19:49:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2016:1754