Bug 1444444

Summary: OSD or MON creation and configuration is failing with latest ceph-ansible builds
Product: [Red Hat Storage] Red Hat Storage Console Reporter: Daniel Horák <dahorak>
Component: ceph-ansible Assignee: Sébastien Han <shan>
Status: CLOSED ERRATA QA Contact: Daniel Horák <dahorak>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 2 CC: adeza, aschoen, ceph-eng-bugs, gmeno, hnallurv, kdreyer, nthomas, sankarshan, seb
Target Milestone: ---   
Target Release: 2   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ceph-ansible-2.2.2-1.el7scon Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-06-19 13:17:14 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Daniel Horák 2017-04-21 10:52:46 UTC
Description of problem:
Ceph cluster creation fails with the latest ceph-ansible builds, producing one of the following errors:

OSD creation/configuration failure:
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  RUNNING HANDLER [ceph.ceph-common : restart ceph osds daemon(s)] ***************
  fatal: [osd2.localdomain]: FAILED! => {
      "changed": true, 
      "cmd": [
          "/tmp/restart_osd_daemon.sh"
      ], 
      "delta": "0:20:10.020056", 
      "end": "2017-04-20 22:26:30.616511", 
      "failed": true, 
      "rc": 1, 
      "start": "2017-04-20 22:06:20.596455", 
      "warnings": []
  }
  STDOUT:
  Error with PGs, check config
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
or a monitor creation/configuration failure:
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  2017-04-21 06:19:51.708796 7f7d8c243700  0 -- :/3634752360 >> IP:6789/0 pipe(0x7f7d8805ab00 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f7d8805c730).fault
  fatal: [mon1.localdomain]: FAILED! => {
      "changed": true, 
      "cmd": [
          "/tmp/restart_mon_daemon.sh"
      ], 
      "delta": "0:00:10.042757", 
      "end": "2017-04-21 06:20:00.846509", 
      "failed": true, 
      "rc": 1, 
      "start": "2017-04-21 06:19:50.803752", 
      "warnings": []
  }
  STDOUT:
  Error while restarting mon daemon
  STDERR:
  Job for ceph-mon failed. See "systemctl status ceph-mon" and "journalctl -xe" for details.
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Version-Release number of selected component (if applicable):
  ansible-2.2.2.0-1.el7.noarch
  ceph-ansible-2.2.1-1.el7scon.noarch
  ceph-installer-1.3.0-1.el7scon.noarch
  rhscon-ceph-0.0.43-1.el7scon.x86_64
  rhscon-core-0.0.45-1.el7scon.x86_64
  rhscon-core-selinux-0.0.45-1.el7scon.noarch
  rhscon-ui-0.0.60-1.el7scon.noarch
    
How reproducible:
  It happened on every tested cluster, but not on every OSD/MON.

Steps to Reproduce:
1. Prepare and install machines for a Ceph cluster managed by RHSCON 2 (Skyring).
2. Create Ceph cluster via RHSCON Web UI.
3. Check the "Create Cluster" task.
4. Check ceph-installer tasks.
  $ curl http://rhscon-server.example.com:8181/api/tasks/ | jq .
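Step 4 pipes the ceph-installer task list through `jq .` only to make it readable. For scripted checks, a rough standard-library Python equivalent might look like this sketch. The URL is the example one from step 4; the helper names `fetch_tasks` and `pretty_print_tasks` are hypothetical, and the sketch assumes only that the endpoint returns JSON, without relying on any particular task schema.

```python
import json
import urllib.request

# Example URL from step 4; adjust host and port for the actual RHSCON server (assumption).
TASKS_URL = "http://rhscon-server.example.com:8181/api/tasks/"


def fetch_tasks(url: str = TASKS_URL, timeout: float = 10.0) -> str:
    """Hypothetical helper: GET the task list and return the raw response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def pretty_print_tasks(raw: str) -> str:
    """Hypothetical helper: re-indent a JSON payload, roughly like `jq .`.

    Key order is preserved (no sorting), matching jq's identity filter.
    """
    return json.dumps(json.loads(raw), indent=2)
```

Usage against a live server would be `print(pretty_print_tasks(fetch_tasks()))`. Keeping the network call separate from the formatting keeps the formatter testable offline.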

Actual results:
  The cluster creation task in RHSCon contains one of the following errors:

    Failed to add mon(s) [mon3.localdomain]

  or

    OSD addition failed for [osd1.localdomain:map[/dev/vdc:/dev/vdb] osd2.localdomain:map[/dev/vdd:/dev/vdc]...

  The related ceph-ansible tasks contain the errors quoted in the description above.

Expected results:
  The Ceph cluster is created properly with all expected monitors and OSDs.

Additional info:

Comment 1 seb 2017-04-24 08:40:15 UTC
Currently addressed upstream here: https://github.com/ceph/ceph-ansible/pull/1455
This will be merged today and you will get it from the next package build. Stay tuned.

Comment 2 Christina Meno 2017-04-24 14:22:42 UTC
Merged to master; looking to get this backported to the stable-2.2 branch.

Comment 3 seb 2017-04-24 15:18:18 UTC
Waiting for CI to pass on stable-2.2 with the backport: https://github.com/ceph/ceph-ansible/pull/1467

Comment 4 Ken Dreyer (Red Hat) 2017-04-25 15:19:43 UTC
We need v2.2.2 tagged upstream with this change.

Comment 5 seb 2017-04-25 15:40:29 UTC
Expect a new tag in the next hour.

Comment 8 Daniel Horák 2017-04-27 08:42:29 UTC
Tested and VERIFIED by automatic test suite on:
  calamari-server-1.5.6-1.el7cp.x86_64

Cluster creation works as expected.

>> VERIFIED

Comment 9 Daniel Horák 2017-04-27 08:43:16 UTC
(In reply to Daniel Horák from comment #8)
> Tested and VERIFIED by automatic test suite on:
>   calamari-server-1.5.6-1.el7cp.x86_64

The related package is, of course:
  ceph-ansible-2.2.2-1.el7scon.noarch

> Cluster creation works as expected.
> 
> >> VERIFIED

Comment 11 errata-xmlrpc 2017-06-19 13:17:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1496