Description of problem:
Ceph-installer reports that the OSDs are created successfully (the task returns success), but the OSDs are actually not created (ceph -s does not list the OSDs).

Version-Release number of selected component (if applicable):
http://puddle.ceph.redhat.com/puddles/rhscon/2/2016-04-29.2/RHSCON-2.repo

How reproducible:
Not always

Steps to Reproduce:
1. Have a larger number of disks (e.g. 10) on the node and create OSDs one after another.
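To tell whether a "successful" task actually produced an OSD, a quick cross-check on the cluster and on the affected node helps. A minimal sketch (with a non-default cluster name pass --cluster <name>; device states are examples):

# ceph -s              # the OSD count in the osdmap should grow after each task
# ceph osd tree        # the new osd.<id> should appear under the host
# ceph-disk list       # the data partition should show "ceph data, active", not "other, unknown"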
*** Bug 1335913 has been marked as a duplicate of this bug. ***
Upstream pull request opened: https://github.com/ceph/ceph-ansible/pull/794
Merged upstream. Pushed 52f73f30c5b1e350d4965d4d82c456d2d9c39500 to downstream.
This issue is seen on the latest builds.
Nishanth,

Would you please provide the following information?

* What versions of the products are being used?
* What are the exact steps to reproduce?
* Relevant log output for the issue and products (e.g. ansible output, ceph-installer task information, /var/log/ceph/* logs, systemd log output from OSDs/mons)
* If an OSD is related to the issue, we expect a look at http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/
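For reference, the requested output can usually be collected along these lines (a sketch only; the task id, OSD id, and mon hostname are placeholders to be filled in from the actual setup):

# ceph-installer task <task-id>               # on the installer host; <task-id> as returned by the API
# journalctl -u ceph-osd@<id> --no-pager      # systemd log for one OSD on the storage node
# journalctl -u ceph-mon@<hostname> --no-pager
# tail -n 200 /var/log/ceph/*.log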
(In reply to Ken Dreyer (Red Hat) from comment #10)
> Nishanth,
>
> Would you please provide the following information?
>
> * What versions of the products are being used?

ceph-ansible-1.0.5-15.el7scon.noarch.rpm    20-May-2016 17:13    108K
ceph-installer-1.0.11-1.el7scon.noarch.rpm  18-May-2016 20:55    75K

> * What are the exact steps to reproduce?

Create a cluster with more than 8 disks per node. Also provide a custom cluster name (TestCluster10).

> * Relevant log output for the issue and products (e.g. ansible output,
>   ceph-installer task information, /var/log/ceph/* logs, systemd log
>   output from OSDs/mons)

Not available, as the setup has been cleaned up.

> * If an OSD is related to the issue, we expect a look at
>   http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/
I tried to reproduce this issue a couple of times today, but without success. I am closing this for now and will re-open it if it is found again.
It seems I was able to reproduce it again.

Related packages:
ceph-ansible-1.0.5-18.el7scon.noarch
ceph-installer-1.0.11-1.el7scon.noarch
ceph-base-10.2.1-12.el7cp.x86_64
ceph-common-10.2.1-12.el7cp.x86_64
ceph-osd-10.2.1-12.el7cp.x86_64
ceph-selinux-10.2.1-12.el7cp.x86_64
libcephfs1-10.2.1-12.el7cp.x86_64
python-cephfs-10.2.1-12.el7cp.x86_64

Here it is visible that there is no OSD on /dev/vdd (on node1), although there should be one:

# ceph-disk list
/dev/vda :
 /dev/vda1 other, swap
 /dev/vda2 other, xfs, mounted on /
/dev/vdb :
 /dev/vdb2 ceph journal, for /dev/vdc1
 /dev/vdb1 ceph journal, for /dev/vde1
/dev/vdc :
 /dev/vdc1 ceph data, active, cluster TestClusterA, osd.1, journal /dev/vdb2
/dev/vdd other, unknown
/dev/vde :
 /dev/vde1 ceph data, active, cluster TestClusterA, osd.0, journal /dev/vdb1
/dev/vdf other, unknown
/dev/vdg other, unknown

The related ceph-installer task was submitted this way:

2016-06-02T10:47:09.437+02:00 INFO api.go:174 Configure] admin:670b65a9-fd32-4971-9afd-202ec4481aa6-Started configuration on node: jenkins-usm1-node1.localdomain. TaskId: e1e52f53-3d4b-489e-84c4-fdaa88ad06a9. Request Data: {"cluster_name":"TestClusterA","cluster_network":"172.16.176.0/24","devices":{"/dev/vdd":"/dev/vdb"},"fsid":"50261f74-e019-48bf-a584-af9bdfd60200","host":"jenkins-usm1-node1.localdomain","journal_size":5120,"monitors":[{"address":"172.16.176.83","host":"jenkins-usm1-mon1.localdomain"},{"address":"172.16.176.84","host":"jenkins-usm1-mon2.localdomain"},{"address":"172.16.176.85","host":"jenkins-usm1-mon3.localdomain"}],"public_network":"172.16.176.0/24","redhat_storage":true}. Route: http://localhost:8181/api/osd/configure

I'll post the ceph-installer task log as an attachment (# ceph-installer task e1e52f53-3d4b-489e-84c4-fdaa88ad06a9). I'll try to collect more data and post it here; also, if direct access to the affected machines would help, please let me know.
Created attachment 1164043 [details] "ceph-installer task e1e52f53-3d4b-489e-84c4-fdaa88ad06a9" output
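For context, a cross-check along these lines illustrates the mismatch between the reported task result and the cluster state (a sketch; TestClusterA, /dev/vdd and the task id are taken from the output above):

# ceph --cluster TestClusterA osd tree        # only osd.0 and osd.1 are listed, nothing backed by /dev/vdd
# ceph-disk list | grep vdd                   # still reports "other, unknown" instead of "ceph data, active"
# ceph-installer task e1e52f53-3d4b-489e-84c4-fdaa88ad06a9    # the task record, see the attachment above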
The issue described in comment 13 has a different root cause, described in the new Bug 1342117. I'll test this bug according to the original scenario with data disks that were not "correctly" cleaned.
Tested in multiple scenarios over the last weeks; a failed OSD creation task is properly reported.

Latest testing on the USM Server/ceph-installer server (RHEL 7.2):
ceph-ansible-1.0.5-31.el7scon.noarch
ceph-installer-1.0.14-1.el7scon.noarch
rhscon-ceph-0.0.39-1.el7scon.x86_64
rhscon-core-0.0.39-1.el7scon.x86_64
rhscon-core-selinux-0.0.39-1.el7scon.noarch
rhscon-ui-0.0.51-1.el7scon.noarch
salt-2015.5.5-1.el7.noarch
salt-master-2015.5.5-1.el7.noarch
salt-selinux-0.0.39-1.el7scon.noarch

Ceph node (RHEL 7.2):
ceph-base-10.2.2-32.el7cp.x86_64
ceph-common-10.2.2-32.el7cp.x86_64
ceph-osd-10.2.2-32.el7cp.x86_64
ceph-selinux-10.2.2-32.el7cp.x86_64
libcephfs1-10.2.2-32.el7cp.x86_64
python-cephfs-10.2.2-32.el7cp.x86_64
rhscon-agent-0.0.16-1.el7scon.noarch
rhscon-core-selinux-0.0.39-1.el7scon.noarch
salt-2015.5.5-1.el7.noarch
salt-minion-2015.5.5-1.el7.noarch
salt-selinux-0.0.39-1.el7scon.noarch

>> VERIFIED
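For completeness, the check that a failed OSD creation is reported as failed was done against the ceph-installer task record, roughly as follows (a sketch; the /api/tasks/<task-id> endpoint and the "succeeded" field are assumptions inferred from the API route logged in comment 13 and may differ between versions):

# ceph-installer task <task-id>                     # <task-id>: placeholder for the OSD configure task
# curl http://localhost:8181/api/tasks/<task-id>    # assumed endpoint; "succeeded" should be false for a failed run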
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2016:1754