1335938 – Ceph-installer report success even if the OSDs are not created successfully

Summary: Ceph-installer report success even if the OSDs are not created successfully

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Storage Console
Classification:	Red Hat Storage
Component:	ceph-installer
Sub Component:
Version:	2
Hardware:	x86_64
OS:	Linux
Priority:	urgent
Severity:	urgent
Target Milestone:	---
Target Release:	2
Assignee:	Alfredo Deza
QA Contact:	Daniel Horák
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1335913
Depends On:
Blocks:

Reported:	2016-05-13 15:02 UTC by Nishanth Thomas
Modified:	2016-08-23 19:50 UTC
CC List:	9 users
Fixed In Version:	ceph-ansible-1.0.5-14.el7scon
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-08-23 19:50:38 UTC
Embargoed:

Attachments
"ceph-installer task e1e52f53-3d4b-489e-84c4-fdaa88ad06a9" output (107.45 KB, text/plain) 2016-06-02 12:15 UTC, Daniel Horák	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHEA-2016:1754	0	normal	SHIPPED_LIVE	New packages: Red Hat Storage Console 2.0	2017-04-18 19:09:06 UTC

Description Nishanth Thomas 2016-05-13 15:02:33 UTC

Description of problem:

Ceph-installer reports that the OSDs were created successfully (the task returns success), but the OSDs are not actually created (ceph -s does not list them).

Version-Release number of selected component (if applicable):
http://puddle.ceph.redhat.com/puddles/rhscon/2/2016-04-29.2/RHSCON-2.repo

How reproducible:
not always

Steps to Reproduce:
1. Have a larger number of disks (10) on the node and create OSDs one after another.

Comment 3 Daniel Horák 2016-05-17 12:19:13 UTC

*** Bug 1335913 has been marked as a duplicate of this bug. ***

Comment 4 Alfredo Deza 2016-05-17 14:33:08 UTC

Upstream pull request opened: https://github.com/ceph/ceph-ansible/pull/794

Comment 5 Alfredo Deza 2016-05-18 12:59:59 UTC

Merged upstream. Pushed 52f73f30c5b1e350d4965d4d82c456d2d9c39500 to downstream.

Comment 9 Nishanth Thomas 2016-05-27 10:22:52 UTC

This issue is seen on the latest builds.

Comment 10 Ken Dreyer (Red Hat) 2016-05-31 17:18:39 UTC

Nishanth,

Would you please provide the following information?

* What versions of the products are being used?
* What are the exact steps to reproduce?
* Relevant log output to the issue and products (e.g. ansible output,
ceph-installer task information, /var/log/ceph/* log, systemd log
output from osds/mons)
* If an OSD is related to the issue, we expect the troubleshooting guide at
http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/ to have been consulted.

Comment 11 Nishanth Thomas 2016-06-01 15:26:05 UTC

(In reply to Ken Dreyer (Red Hat) from comment #10)
> Nishanth,
> 
> Would you please provide the following information?
> 
> * What versions of the products are being used?

ceph-ansible-1.0.5-15.el7scon.noarch.rpm           20-May-2016 17:13    108K
ceph-installer-1.0.11-1.el7scon.noarch.rpm         18-May-2016 20:55     75K

> * What are the exact steps to reproduce?

Create a cluster with more than 8 disks per node. Also provide a custom cluster name (TestCluster10).

> * Relevant log output to the issue and products (e.g. ansible output,
> ceph-installer task information, /var/log/ceph/* log, systemd log
> output from osds/mons)

Not available, as the setup has been cleaned up.

> * If an OSD is related to the issue, we expect the troubleshooting guide at
> http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/ to have been consulted.

Comment 12 Nishanth Thomas 2016-06-01 15:28:20 UTC

I tried to reproduce this issue a couple of times today without success, so I am closing this for now and will reopen it if it is seen again.

Comment 13 Daniel Horák 2016-06-02 12:14:51 UTC

It seems I was able to reproduce it again.

Related packages:
  ceph-ansible-1.0.5-18.el7scon.noarch
  ceph-installer-1.0.11-1.el7scon.noarch
  
  ceph-base-10.2.1-12.el7cp.x86_64
  ceph-common-10.2.1-12.el7cp.x86_64
  ceph-osd-10.2.1-12.el7cp.x86_64
  ceph-selinux-10.2.1-12.el7cp.x86_64
  libcephfs1-10.2.1-12.el7cp.x86_64
  python-cephfs-10.2.1-12.el7cp.x86_64

The following output shows that there is no OSD on /dev/vdd (on node1), although there should be one:
# ceph-disk list
  /dev/vda :
   /dev/vda1 other, swap
   /dev/vda2 other, xfs, mounted on /
  /dev/vdb :
   /dev/vdb2 ceph journal, for /dev/vdc1
   /dev/vdb1 ceph journal, for /dev/vde1
  /dev/vdc :
   /dev/vdc1 ceph data, active, cluster TestClusterA, osd.1, journal /dev/vdb2
  /dev/vdd other, unknown
  /dev/vde :
   /dev/vde1 ceph data, active, cluster TestClusterA, osd.0, journal /dev/vdb1
  /dev/vdf other, unknown
  /dev/vdg other, unknown
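The mismatch can also be detected programmatically by parsing the ceph-disk list output above and checking which requested data disks lack a "ceph data" partition. The Python below is a minimal sketch, assuming the Jewel-era text format exactly as printed here (whole-disk lines followed by indented partition lines) and /dev/vdX-style device names; the function names are hypothetical and not part of ceph-disk or ceph-installer.

```python
import re

# Whole-disk header: "/dev/vdc :" (disk with partitions) or "/dev/vdd other, unknown".
# ASSUMPTION: /dev/vdX-style names; a trailing digit marks a partition, not a disk.
DISK_RE = re.compile(r"^\s*(/dev/[a-z]+)(?:\s*:|\s+other\b)")
# Data partition, e.g. "/dev/vdc1 ceph data, active, cluster X, osd.1, journal ..."
DATA_RE = re.compile(r"^\s*/dev/\S+\s+ceph data,.*\bosd\.(\d+)")


def osd_data_by_disk(listing: str) -> dict:
    """Map each whole disk in ceph-disk list output to its OSD id (or None)."""
    result = {}
    current = None
    for line in listing.splitlines():
        disk = DISK_RE.match(line)
        if disk:
            current = disk.group(1)
            result.setdefault(current, None)
            continue
        data = DATA_RE.match(line)
        if data and current is not None:
            # A data partition belongs to the most recent disk header.
            result[current] = int(data.group(1))
    return result


def disks_without_osd(listing: str, requested) -> list:
    """Return the requested data disks that carry no 'ceph data' partition."""
    found = osd_data_by_disk(listing)
    return [d for d in requested if found.get(d) is None]
```

Applied to the listing above, osd_data_by_disk finds osd.1 on /dev/vdc and osd.0 on /dev/vde, and disks_without_osd(listing, ["/dev/vdd"]) returns ["/dev/vdd"], which is the disk the installer task claimed to have configured.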

The related ceph-installer task was submitted as follows:
  2016-06-02T10:47:09.437+02:00 INFO     api.go:174 Configure] admin:670b65a9-fd32-4971-9afd-202ec4481aa6-Started configuration on node: jenkins-usm1-node1.localdomain. TaskId: e1e52f53-3d4b-489e-84c4-fdaa88ad06a9. Request Data: {"cluster_name":"TestClusterA","cluster_network":"172.16.176.0/24","devices":{"/dev/vdd":"/dev/vdb"},"fsid":"50261f74-e019-48bf-a584-af9bdfd60200","host":"jenkins-usm1-node1.localdomain","journal_size":5120,"monitors":[{"address":"172.16.176.83","host":"jenkins-usm1-mon1.localdomain"},{"address":"172.16.176.84","host":"jenkins-usm1-mon2.localdomain"},{"address":"172.16.176.85","host":"jenkins-usm1-mon3.localdomain"}],"public_network":"172.16.176.0/24","redhat_storage":true}. Route: http://localhost:8181/api/osd/configure
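To compare what was requested with what ended up on disk, the request payload can be pulled out of a log line like the one above. This is a minimal Python sketch, assuming the log format shown here (a JSON object that follows "Request Data: " and is itself followed by ". Route: ..."); the helper names are hypothetical.

```python
import json


def request_data(log_line: str) -> dict:
    """Return the JSON object that follows 'Request Data: ' in a USM log line."""
    marker = "Request Data: "
    start = log_line.index(marker) + len(marker)  # raises ValueError if absent
    # raw_decode parses one JSON value starting at `start` and ignores the
    # trailing ". Route: ..." text, which json.loads would reject.
    obj, _end = json.JSONDecoder().raw_decode(log_line, start)
    return obj


def requested_osds(data: dict) -> list:
    """List (data_disk, journal_disk) pairs from the 'devices' mapping."""
    return sorted(data.get("devices", {}).items())
```

For the request above, requested_osds returns a single pair, ("/dev/vdd", "/dev/vdb"): one OSD with data on /dev/vdd and its journal on /dev/vdb, in cluster TestClusterA.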
  
I'll post the ceph-installer task log as an attachment (# ceph-installer task e1e52f53-3d4b-489e-84c4-fdaa88ad06a9).

I'll try to collect more data and post it here. If direct access to the affected machines would help, please let me know.
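The core of this bug is that a task's own success flag cannot be taken at face value. The Python below sketches a stricter check that combines the task status with what is actually present on the node. It assumes the ceph-installer REST API returns task details as JSON at /api/tasks/<id>/ (base URL http://localhost:8181, as in the Route above) with "succeeded" and "ended" fields; that endpoint path, those field names, and the helper names are assumptions, not confirmed by this report. The list of disks that actually hold OSD data would come from ceph-disk list on the node.

```python
import json
import urllib.request


def fetch_task(base_url: str, task_id: str) -> dict:
    # ASSUMPTION: task status is served as JSON at /api/tasks/<id>/.
    with urllib.request.urlopen(f"{base_url}/api/tasks/{task_id}/") as resp:
        return json.load(resp)


def verify_osd_task(task: dict, requested_disks, disks_with_osd) -> list:
    """Return the problems found; an empty list means the OSDs really exist."""
    problems = []
    # ASSUMPTION: "ended" is null while running, "succeeded" is a boolean.
    if task.get("ended") is None:
        problems.append("task has not finished")
    elif not task.get("succeeded"):
        problems.append("task reported failure")
    # Do not trust "succeeded" alone: every requested data disk must hold an OSD.
    missing = sorted(set(requested_disks) - set(disks_with_osd))
    if missing:
        problems.append("no OSD data on: " + ", ".join(missing))
    return problems
```

With a task that reports success for /dev/vdd while ceph-disk list shows OSDs only on /dev/vdc and /dev/vde, verify_osd_task returns ["no OSD data on: /dev/vdd"], surfacing exactly the silent failure described in this bug.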

Comment 14 Daniel Horák 2016-06-02 12:15:35 UTC

Created attachment 1164043
"ceph-installer task e1e52f53-3d4b-489e-84c4-fdaa88ad06a9" output

Comment 18 Daniel Horák 2016-06-02 13:21:29 UTC

The issue described in comment 13 has a different root cause, which is described in the new Bug 1342117.

I'll test this bug according to the original scenario, with data disks that were not "correctly" cleaned.

Comment 19 Daniel Horák 2016-08-02 14:43:59 UTC

Tested with multiple scenarios over the last few weeks; a failed OSD creation task is now properly reported.

Latest testing on USM Server/ceph-installer server (RHEL 7.2):
  ceph-ansible-1.0.5-31.el7scon.noarch
  ceph-installer-1.0.14-1.el7scon.noarch
  rhscon-ceph-0.0.39-1.el7scon.x86_64
  rhscon-core-0.0.39-1.el7scon.x86_64
  rhscon-core-selinux-0.0.39-1.el7scon.noarch
  rhscon-ui-0.0.51-1.el7scon.noarch
  salt-2015.5.5-1.el7.noarch
  salt-master-2015.5.5-1.el7.noarch
  salt-selinux-0.0.39-1.el7scon.noarch

Ceph node (RHEL 7.2):
  ceph-base-10.2.2-32.el7cp.x86_64
  ceph-common-10.2.2-32.el7cp.x86_64
  ceph-osd-10.2.2-32.el7cp.x86_64
  ceph-selinux-10.2.2-32.el7cp.x86_64
  libcephfs1-10.2.2-32.el7cp.x86_64
  python-cephfs-10.2.2-32.el7cp.x86_64
  rhscon-agent-0.0.16-1.el7scon.noarch
  rhscon-core-selinux-0.0.39-1.el7scon.noarch
  salt-2015.5.5-1.el7.noarch
  salt-minion-2015.5.5-1.el7.noarch
  salt-selinux-0.0.39-1.el7scon.noarch

>> VERIFIED

Comment 21 errata-xmlrpc 2016-08-23 19:50:38 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2016:1754
