When using a version of ceph-disk which contains the change [1] to enable --runtime for ceph-osd systemd units, directory-backed OSDs do not start on boot. If I modify the same system to use disk-backed OSDs, I do not have this problem.

How to reproduce:
- Deploy a directory-backed OSD
- Observe on first boot, on an OSD node, that OSDs are running but that their target is in /run, not /etc, which will cause them to not start on reboot if they are directory-based [2]
- Reboot the system and observe that the OSDs are not running

Workaround:
Replace ceph-disk's main.py with the version [3] that preceded the --runtime change [1] (and run python -O -m compileall /usr/lib/python2.7/site-packages/ceph_disk/).

Version:
ceph version 10.2.5-37.el7cp (033f137cde8573cfc5a4662b4ed6a63b8a8d1464)
Red Hat Enterprise Linux Server release 7.3 (Maipo)

[root@compute-0 ~]# rpm -qa | grep ceph | sort | uniq
ceph-base-10.2.5-37.el7cp.x86_64
ceph-common-10.2.5-37.el7cp.x86_64
ceph-mds-10.2.5-37.el7cp.x86_64
ceph-mon-10.2.5-37.el7cp.x86_64
ceph-osd-10.2.5-37.el7cp.x86_64
ceph-radosgw-10.2.5-37.el7cp.x86_64
ceph-selinux-10.2.5-37.el7cp.x86_64
collectd-ceph-5.7.0-4.el7ost.x86_64
libcephfs1-10.2.5-37.el7cp.x86_64
puppet-ceph-2.3.0-3.el7ost.noarch
python-cephfs-10.2.5-37.el7cp.x86_64
[root@compute-0 ~]#
[root@compute-0 ~]# grep runtime /usr/lib/python2.7/site-packages/ceph_disk/main.py | wc -l
2
[root@compute-0 ~]# md5sum /usr/lib/python2.7/site-packages/ceph_disk/main.py
ac08de47454124bd3d3d0a23478194cf  /usr/lib/python2.7/site-packages/ceph_disk/main.py
[root@compute-0 ~]#

Note: I encountered this on a deployment done by OSP-Director using the OSP11 puddle.
Footnotes:

[1] https://github.com/ceph/ceph/commit/539385b143feee3905dceaf7a8faaced42f2d3c6

[2]
[root@overcloud-osd-compute-1 ~]# ls /run/systemd/system/ceph-osd.target.wants/
ceph-osd
[root@overcloud-osd-compute-1 ~]# ls /etc/systemd/system/ceph-osd.target.wants/
ls: cannot access /etc/systemd/system/ceph-osd.target.wants/: No such file or directory
[root@overcloud-osd-compute-1 ~]#

[3] https://raw.githubusercontent.com/ceph/ceph/72f0b2aa1eb4b7b2a2222c2847d26f99400a8374/src/ceph-disk/ceph_disk/main.py
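The check in footnote [2] can also be scripted. The following is a minimal sketch, not part of ceph-disk: the function names are hypothetical, and it assumes the standard systemd layout in which runtime enablement lives under /run/systemd/system and persistent enablement under /etc/systemd/system. It lists the units wanted by ceph-osd.target in each location and reports those that exist only under /run.

```python
import os


def wants_locations(target="ceph-osd.target",
                    runtime_root="/run/systemd/system",
                    persistent_root="/etc/systemd/system"):
    """Return the units wanted by `target` under each systemd root.

    The roots are parameters (hypothetical, for illustration) so the
    check can be pointed at a test directory instead of the live system.
    """
    result = {}
    for label, root in (("runtime", runtime_root),
                        ("persistent", persistent_root)):
        wants_dir = os.path.join(root, target + ".wants")
        try:
            result[label] = sorted(os.listdir(wants_dir))
        except OSError:
            # Directory missing: nothing is enabled at this level.
            result[label] = []
    return result


def runtime_only_units(locations):
    """Units enabled under the runtime root but not the persistent one."""
    return sorted(set(locations["runtime"]) - set(locations["persistent"]))


if __name__ == "__main__":
    for unit in runtime_only_units(wants_locations()):
        print("enabled only at runtime: " + unit)
```

On a node showing the state in footnote [2], every entry under /run would be reported, since the /etc directory does not exist.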
Testing a fix at https://github.com/ceph/ceph/pull/14546
Would you be so kind as to verify that applying the patch at https://github.com/ceph/ceph/pull/14546 fixes the problem? It has been tested, but it would be good to have your confirmation before merging it :-)
*** Bug 1439223 has been marked as a duplicate of this bug. ***
I confirm that https://github.com/ceph/ceph/pull/14546 fixes the problem. Thank you.
John, QE would like your help in testing this fix. Can you do that?
I have empirically tested that the patch fixes the bug with the following process:

- Apply the patch [1]
- Deploy a directory-backed OSD
- Observe on first boot, on an OSD node, that OSDs are running and that their target is in /etc not /run (so they _should_ restart on reboot)
- Reboot the system and observe that the OSDs _are_ running

[1] Details on how patch was applied:

pushd /usr/lib/python2.7/site-packages/ceph_disk/
mv main.py /root/ceph-disk-main.py
curl https://raw.githubusercontent.com/ceph/ceph/f425a127b7487d2093c8c943f0bcdec3d673d601/src/ceph-disk/ceph_disk/main.py > main.py
popd
python -O -m compileall /usr/lib/python2.7/site-packages/ceph_disk/
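The behaviour observed above (target in /etc for a directory-backed OSD) is consistent with enabling the unit persistently when the OSD data path is a plain directory, and keeping --runtime only for disk-backed OSDs. A minimal sketch of that decision follows. It is not the code from the pull request: the helper names are hypothetical, the ceph-osd@<id> instance name is assumed, and using os.path.ismount to tell disk-backed from directory-backed OSDs is an assumption made for illustration.

```python
import os


def enable_command(osd_id, data_path_is_mounted):
    """Build the systemctl command used to enable one OSD unit.

    Hypothetical helper: --runtime is kept only when the data path is
    a mounted device; a directory-backed OSD is enabled persistently
    so that the enablement survives a reboot.
    """
    command = ["systemctl", "enable"]
    if data_path_is_mounted:
        command.append("--runtime")
    # Assumed unit naming: one template instance per OSD id.
    command.append("ceph-osd@{0}".format(osd_id))
    return command


def enable_command_for_path(osd_id, data_path):
    """Same as enable_command, deciding via os.path.ismount (assumption)."""
    return enable_command(osd_id, os.path.ismount(data_path))
```

For example, enable_command(0, False) yields a command without --runtime, which systemd records under /etc/systemd/system rather than /run/systemd/system.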
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:1497
*** Bug 1457612 has been marked as a duplicate of this bug. ***