Bug 1335938
Summary: Ceph-installer reports success even if the OSDs are not created successfully
Product: [Red Hat Storage] Red Hat Storage Console
Component: ceph-installer
Version: 2
Target Release: 2
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
Keywords: Reopened
Reporter: Nishanth Thomas <nthomas>
Assignee: Alfredo Deza <adeza>
QA Contact: Daniel Horák <dahorak>
CC: adeza, aschoen, ceph-eng-bugs, dahorak, kdreyer, mkudlej, nthomas, sankarshan, sds-qe-bugs
Fixed In Version: ceph-ansible-1.0.5-14.el7scon
Doc Type: Bug Fix
Type: Bug
Last Closed: 2016-08-23 19:50:38 UTC
Description (Nishanth Thomas, 2016-05-13 15:02:33 UTC)
*** Bug 1335913 has been marked as a duplicate of this bug. ***

Upstream pull request opened: https://github.com/ceph/ceph-ansible/pull/794

Merged upstream. Pushed 52f73f30c5b1e350d4965d4d82c456d2d9c39500 to downstream.

This issue is seen on the latest builds.

Nishanth, would you please provide the following information?

* What versions of the products are being used?
* What are the exact steps to reproduce?
* Relevant log output related to the issue and products (e.g. ansible output, ceph-installer task information, /var/log/ceph/* logs, systemd log output from OSDs/MONs)
* If an OSD is related to the issue, we expect a look at http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/

(In reply to Ken Dreyer (Red Hat) from comment #10)

> * What versions of the products are being used?

ceph-ansible-1.0.5-15.el7scon.noarch.rpm (20-May-2016 17:13, 108K)
ceph-installer-1.0.11-1.el7scon.noarch.rpm (18-May-2016 20:55, 75K)

> * What are the exact steps to reproduce?

Create a cluster with more than 8 disks per node. Also provide a custom cluster name (TestCluster10).

> * Relevant log output related to the issue and products (e.g. ansible output, ceph-installer task information, /var/log/ceph/* logs, systemd log output from OSDs/MONs)

Not available, as the setup has been cleaned up.

> * If an OSD is related to the issue, we expect a look at http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/

I tried to reproduce this issue a couple of times today, but with no success. So I am closing this for now and will reopen it if the problem is found again.

Seems like I was able to reproduce it again.
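One way to spot the symptom this bug describes is to compare the data devices that were requested with what `ceph-disk list` actually reports on the node. This is only a sketch: the device names and the captured `ceph-disk list` output below are illustrative samples, not taken from a live run.

```shell
#!/bin/sh
# Sample `ceph-disk list` output for illustration; on a real node you would
# capture it with: ceph_disk_out="$(ceph-disk list)"
ceph_disk_out='/dev/vdc :
 /dev/vdc1 ceph data, active, cluster TestClusterA, osd.1, journal /dev/vdb2
/dev/vdd other, unknown
/dev/vde :
 /dev/vde1 ceph data, active, cluster TestClusterA, osd.0, journal /dev/vdb1'

# Hypothetical list of devices that were requested as OSD data disks.
# Warn about any device that never got an active ceph data partition.
for dev in /dev/vdc /dev/vdd /dev/vde; do
    if ! printf '%s\n' "$ceph_disk_out" | grep -q "${dev}1 ceph data, active"; then
        echo "WARNING: no active OSD on ${dev}"
    fi
done
# prints: WARNING: no active OSD on /dev/vdd
```

With the sample output above, only /dev/vdd is flagged, which mirrors the situation reported later in this bug where one requested disk silently ended up with no OSD.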
Related packages:

ceph-ansible-1.0.5-18.el7scon.noarch
ceph-installer-1.0.11-1.el7scon.noarch
ceph-base-10.2.1-12.el7cp.x86_64
ceph-common-10.2.1-12.el7cp.x86_64
ceph-osd-10.2.1-12.el7cp.x86_64
ceph-selinux-10.2.1-12.el7cp.x86_64
libcephfs1-10.2.1-12.el7cp.x86_64
python-cephfs-10.2.1-12.el7cp.x86_64

Here it is visible that there is no OSD on /dev/vdd (on node1), but there should be one:

# ceph-disk list
/dev/vda :
 /dev/vda1 other, swap
 /dev/vda2 other, xfs, mounted on /
/dev/vdb :
 /dev/vdb2 ceph journal, for /dev/vdc1
 /dev/vdb1 ceph journal, for /dev/vde1
/dev/vdc :
 /dev/vdc1 ceph data, active, cluster TestClusterA, osd.1, journal /dev/vdb2
/dev/vdd other, unknown
/dev/vde :
 /dev/vde1 ceph data, active, cluster TestClusterA, osd.0, journal /dev/vdb1
/dev/vdf other, unknown
/dev/vdg other, unknown

The related Ceph installer task was submitted this way:

2016-06-02T10:47:09.437+02:00 INFO api.go:174 Configure] admin:670b65a9-fd32-4971-9afd-202ec4481aa6-Started configuration on node: jenkins-usm1-node1.localdomain. TaskId: e1e52f53-3d4b-489e-84c4-fdaa88ad06a9. Request Data: {"cluster_name":"TestClusterA","cluster_network":"172.16.176.0/24","devices":{"/dev/vdd":"/dev/vdb"},"fsid":"50261f74-e019-48bf-a584-af9bdfd60200","host":"jenkins-usm1-node1.localdomain","journal_size":5120,"monitors":[{"address":"172.16.176.83","host":"jenkins-usm1-mon1.localdomain"},{"address":"172.16.176.84","host":"jenkins-usm1-mon2.localdomain"},{"address":"172.16.176.85","host":"jenkins-usm1-mon3.localdomain"}],"public_network":"172.16.176.0/24","redhat_storage":true}. Route: http://localhost:8181/api/osd/configure

I'll post the ceph-installer task log as an attachment (# ceph-installer task e1e52f53-3d4b-489e-84c4-fdaa88ad06a9). I'll try to collect more data and post it here. If direct access to the affected machines would help, please let me know.

Created attachment 1164043 [details]
"ceph-installer task e1e52f53-3d4b-489e-84c4-fdaa88ad06a9" output
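Besides the `ceph-installer task <id>` CLI shown above, the task record can also be fetched over the installer's REST API and checked for a failed-but-ended state. This is a hedged sketch: the endpoint path and the `ended`/`succeeded` fields follow my understanding of the ceph-installer API, and the JSON below is a hand-written sample rather than a real task record.

```shell
#!/bin/sh
# On a live installer host you would fetch the record with something like:
#   curl -s http://localhost:8181/api/tasks/e1e52f53-3d4b-489e-84c4-fdaa88ad06a9 > task.json
# Here we use an illustrative sample instead of a live API call:
cat > task.json <<'EOF'
{"identifier": "e1e52f53-3d4b-489e-84c4-fdaa88ad06a9", "ended": "2016-06-02T10:49:00", "succeeded": false}
EOF

# A task that has ended without succeeding should be treated as a failure;
# this bug is precisely about that state being reported as success.
if grep -q '"succeeded": false' task.json; then
    echo "task failed"
fi
# prints: task failed
```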
The issue described in comment 13 has a different root cause, described in the new Bug 1342117. I'll test this bug according to the original scenario with data disks that were not "correctly" cleaned. Tested in multiple scenarios over the last weeks; a failed OSD creation task is now properly reported.
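The failure that the installer should propagate is visible in the ansible PLAY RECAP line (the `failed=N` counter) in the task's stdout. A minimal sketch of extracting that counter; the recap line below is an illustrative sample with made-up numbers, not output from a real run.

```shell
#!/bin/sh
# Sample ansible PLAY RECAP line as it might appear in ceph-installer task stdout
recap='jenkins-usm1-node1.localdomain : ok=42 changed=7 unreachable=0 failed=1'

# Pull out the failed-task counter and treat any nonzero value as a failure.
failed=$(printf '%s\n' "$recap" | sed -n 's/.*failed=\([0-9]*\).*/\1/p')
if [ "${failed:-0}" -gt 0 ]; then
    echo "ansible reported ${failed} failed task(s)"
fi
# prints: ansible reported 1 failed task(s)
```

A check along these lines is what distinguishes "the playbook finished" from "the playbook succeeded", which is the distinction this bug was about.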
Latest testing on USM Server/ceph-installer server (RHEL 7.2):
ceph-ansible-1.0.5-31.el7scon.noarch
ceph-installer-1.0.14-1.el7scon.noarch
rhscon-ceph-0.0.39-1.el7scon.x86_64
rhscon-core-0.0.39-1.el7scon.x86_64
rhscon-core-selinux-0.0.39-1.el7scon.noarch
rhscon-ui-0.0.51-1.el7scon.noarch
salt-2015.5.5-1.el7.noarch
salt-master-2015.5.5-1.el7.noarch
salt-selinux-0.0.39-1.el7scon.noarch
Ceph node (RHEL 7.2):
ceph-base-10.2.2-32.el7cp.x86_64
ceph-common-10.2.2-32.el7cp.x86_64
ceph-osd-10.2.2-32.el7cp.x86_64
ceph-selinux-10.2.2-32.el7cp.x86_64
libcephfs1-10.2.2-32.el7cp.x86_64
python-cephfs-10.2.2-32.el7cp.x86_64
rhscon-agent-0.0.16-1.el7scon.noarch
rhscon-core-selinux-0.0.39-1.el7scon.noarch
salt-2015.5.5-1.el7.noarch
salt-minion-2015.5.5-1.el7.noarch
salt-selinux-0.0.39-1.el7scon.noarch
>> VERIFIED
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2016:1754