Bug 1444444 - OSD or MON creation and configuration is failing with latest ceph-ansible builds
Summary: OSD or MON creation and configuration is failing with latest ceph-ansible builds
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Storage Console
Classification: Red Hat Storage
Component: ceph-ansible
Version: 2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 2
Assignee: Sébastien Han
QA Contact: Daniel Horák
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-04-21 10:52 UTC by Daniel Horák
Modified: 2017-06-19 13:17 UTC

Fixed In Version: ceph-ansible-2.2.2-1.el7scon
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-06-19 13:17:14 UTC
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | ceph ceph-ansible pull 1455 | 0 | None | None | None | 2017-04-21 14:12:03 UTC
Red Hat Product Errata | RHBA-2017:1496 | 0 | normal | SHIPPED_LIVE | ceph-installer, ceph-ansible, and ceph-iscsi-ansible update | 2017-06-19 17:14:02 UTC

Description Daniel Horák 2017-04-21 10:52:46 UTC
Description of problem:
Ceph cluster creation fails with the latest ceph-ansible builds with one of the following errors:

OSD creation/configuration failure:
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  RUNNING HANDLER [ceph.ceph-common : restart ceph osds daemon(s)] ***************
  fatal: [osd2.localdomain]: FAILED! => {
      "changed": true, 
      "cmd": [
          "/tmp/restart_osd_daemon.sh"
      ], 
      "delta": "0:20:10.020056", 
      "end": "2017-04-20 22:26:30.616511", 
      "failed": true, 
      "rc": 1, 
      "start": "2017-04-20 22:06:20.596455", 
      "warnings": []
  }
  STDOUT:
  Error with PGs, check config
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
or Monitor configuration failure:
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  2017-04-21 06:19:51.708796 7f7d8c243700  0 -- :/3634752360 >> IP:6789/0 pipe(0x7f7d8805ab00 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f7d8805c730).fault
  fatal: [mon1.localdomain]: FAILED! => {
      "changed": true, 
      "cmd": [
          "/tmp/restart_mon_daemon.sh"
      ], 
      "delta": "0:00:10.042757", 
      "end": "2017-04-21 06:20:00.846509", 
      "failed": true, 
      "rc": 1, 
      "start": "2017-04-21 06:19:50.803752", 
      "warnings": []
  }
  STDOUT:
  Error while restarting mon daemon
  STDERR:
  Job for ceph-mon failed. See "systemctl status ceph-mon" and "journalctl -xe" for details.
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
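For triage, the two failures above can be compared mechanically. The following is a minimal sketch, assuming the Ansible output keeps the `fatal: [host]: FAILED! => {...}` shape shown in the logs above. The function names `parse_delta` and `parse_failures` are hypothetical helpers, not part of Ansible or ceph-ansible. It extracts the host, command, return code, and run time, which makes the difference between the OSD handler (about 20 minutes) and the MON handler (about 10 seconds) easy to spot.

```python
import json
import re
from datetime import timedelta

# Matches the start of an Ansible failure line such as:
#   fatal: [osd2.localdomain]: FAILED! => {
# The trailing space is consumed so the match ends right at the JSON object.
FATAL_RE = re.compile(r"fatal: \[(?P<host>[^\]]+)\]: FAILED! => ")


def parse_delta(text):
    """Convert an Ansible 'delta' string (H:MM:SS.ffffff) to a timedelta.

    Assumption: the run lasted less than a day, as in the logs above.
    """
    hours, minutes, seconds = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))


def parse_failures(log_text):
    """Yield (host, cmd, rc, duration) for each 'fatal: [...]: FAILED!' entry."""
    decoder = json.JSONDecoder()
    for match in FATAL_RE.finditer(log_text):
        # raw_decode reads exactly one JSON value starting at the given index
        # and ignores whatever follows it (STDOUT/STDERR text, next task, ...).
        result, _ = decoder.raw_decode(log_text, match.end())
        yield (
            match.group("host"),
            result["cmd"],
            result["rc"],
            parse_delta(result["delta"]),
        )


# Abbreviated copy of the monitor failure from the description.
sample_log = """fatal: [mon1.localdomain]: FAILED! => {
    "changed": true,
    "cmd": [
        "/tmp/restart_mon_daemon.sh"
    ],
    "delta": "0:00:10.042757",
    "rc": 1
}
STDOUT:
Error while restarting mon daemon
"""

for host, cmd, rc, duration in parse_failures(sample_log):
    print(host, cmd[0], rc, duration)
```

The sketch reads only the JSON object after `FAILED! =>`; the STDOUT and STDERR sections that follow are left for manual inspection.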

Version-Release number of selected component (if applicable):
  ansible-2.2.2.0-1.el7.noarch
  ceph-ansible-2.2.1-1.el7scon.noarch
  ceph-installer-1.3.0-1.el7scon.noarch
  rhscon-ceph-0.0.43-1.el7scon.x86_64
  rhscon-core-0.0.45-1.el7scon.x86_64
  rhscon-core-selinux-0.0.45-1.el7scon.noarch
  rhscon-ui-0.0.60-1.el7scon.noarch
    
How reproducible:
  It happened on every tested cluster, but not on every OSD/MON.

Steps to Reproduce:
1. Prepare and install machines for Ceph cluster managed by RHSCON 2 (Skyring).
2. Create Ceph cluster via RHSCON Web UI.
3. Check the "Create Cluster" task.
4. Check ceph-installer tasks.
  $ curl http://rhscon-server.example.com:8181/api/tasks/ | jq .
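Step 4 can be narrowed down to the failed tasks only. The following is a minimal sketch, assuming the endpoint returns a JSON list of task objects and that each object carries `ended`, `succeeded`, `endpoint`, and `stderr` fields. These field names are assumptions about the ceph-installer task format and are not confirmed by this report. The helper names and the inline sample data are made up for illustration.

```python
import json


def failed_tasks(tasks):
    """Return the tasks that have finished but did not succeed."""
    return [
        task for task in tasks
        # 'ended' and 'succeeded' are assumed field names (see the note above).
        if task.get("ended") and not task.get("succeeded")
    ]


def summarize(task):
    """Build a one-line summary from the assumed 'endpoint' and 'stderr' fields."""
    stderr = (task.get("stderr") or "").strip()
    last_line = stderr.splitlines()[-1] if stderr else ""
    return "{} {}".format(task.get("endpoint", "?"), last_line).strip()


# Illustrative input only. In practice, save the curl output from step 4 to a
# file and load it with json.load() instead.
sample = json.loads("""
[
  {"endpoint": "/api/mon/configure/", "ended": "2017-04-21 06:20:00",
   "succeeded": false, "stderr": "Job for ceph-mon failed."},
  {"endpoint": "/api/osd/configure/", "ended": null,
   "succeeded": false, "stderr": ""}
]
""")

for task in failed_tasks(sample):
    print(summarize(task))
```

Tasks that are still running (no `ended` value) are skipped on purpose, so a pending task is not reported as a failure.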

Actual results:
  Cluster creation task in RHSCon contains one of the following errors:

    Failed to add mon(s) [mon3.localdomain]

  or

    OSD addition failed for [osd1.localdomain:map[/dev/vdc:/dev/vdb] osd2.localdomain:map[/dev/vdd:/dev/vdc]...

  Related tasks in ceph-ansible contain the errors mentioned above in the description.

Expected results:
  Ceph cluster is properly created with all expected monitors and OSDs.

Additional info:

Comment 1 seb 2017-04-24 08:40:15 UTC
Currently addressed upstream here: https://github.com/ceph/ceph-ansible/pull/1455
This will be merged today and you will get it from the next package build. Stay tuned.

Comment 2 Christina Meno 2017-04-24 14:22:42 UTC
Merged to master, looking to get this backported to the stable-2.2 branch

Comment 3 seb 2017-04-24 15:18:18 UTC
Waiting for CI to pass on stable-2.2 with the backport: https://github.com/ceph/ceph-ansible/pull/1467

Comment 4 Ken Dreyer (Red Hat) 2017-04-25 15:19:43 UTC
We need v2.2.2 tagged upstream with this change.

Comment 5 seb 2017-04-25 15:40:29 UTC
Expect a new tag in the next hour.

Comment 8 Daniel Horák 2017-04-27 08:42:29 UTC
Tested and VERIFIED by automatic test suite on:
  calamari-server-1.5.6-1.el7cp.x86_64

Cluster creation works as expected.

>> VERIFIED

Comment 9 Daniel Horák 2017-04-27 08:43:16 UTC
(In reply to Daniel Horák from comment #8)
> Tested and VERIFIED by automatic test suite on:
>   calamari-server-1.5.6-1.el7cp.x86_64

Related package is of course:
  ceph-ansible-2.2.2-1.el7scon.noarch
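Whether an installed package already carries the fix can be checked against the version named above. The following is a minimal sketch, assuming package strings in the `name-version-release.arch` form used in this report and purely numeric dotted versions. It is not RPM's full version comparison algorithm, and `parse_nvra`, `version_tuple`, and `has_fix` are hypothetical helper names.

```python
def parse_nvra(package):
    """Split 'name-version-release.arch' into (name, version, release, arch)."""
    rest, arch = package.rsplit(".", 1)
    name, version, release = rest.rsplit("-", 2)
    return name, version, release, arch


def version_tuple(version):
    """Turn a purely numeric dotted version such as '2.2.2' into (2, 2, 2)."""
    return tuple(int(part) for part in version.split("."))


def has_fix(package, fixed_version="2.2.2"):
    """True when a ceph-ansible package is at or above the fixed version."""
    name, version, _release, _arch = parse_nvra(package)
    return name == "ceph-ansible" and version_tuple(version) >= version_tuple(fixed_version)


print(has_fix("ceph-ansible-2.2.1-1.el7scon.noarch"))  # False
print(has_fix("ceph-ansible-2.2.2-1.el7scon.noarch"))  # True
```

The release part (`1.el7scon`) is ignored here, which is enough to tell the failing 2.2.1 build from the fixed 2.2.2 build.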

> Cluster creation works as expected.
> 
> >> VERIFIED

Comment 11 errata-xmlrpc 2017-06-19 13:17:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1496

