Bug 1442265 - ceph-disk's systemctl enable --runtime change causes directory-backed OSDs to not start on boot
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Ceph-Disk
Version: 2.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 2.3
Assignee: Loic Dachary
QA Contact: John Fulton
Duplicates: 1439223 1457612
Depends On:
Blocks: 1439223
Reported: 2017-04-13 21:06 UTC by John Fulton
Modified: 2017-07-30 14:58 UTC (History)
Last Closed: 2017-06-19 13:31:58 UTC



External Trackers:
- Red Hat Product Errata RHBA-2017:1497 (priority: normal, status: SHIPPED_LIVE): Red Hat Ceph Storage 2.3 bug fix and enhancement update. Last updated 2017-06-19 17:24:11 UTC.
- Ceph Project Bug Tracker 19628 (priority: None, status: None, summary: None). Last updated 2017-04-13 21:22 UTC.

Description John Fulton 2017-04-13 21:06:10 UTC
When using a version of ceph-disk which contains the change [1] to enable --runtime for ceph-osd systemd units, directory-backed OSDs do not start on boot. If I modify the same system to use disk-backed OSDs then I do not have this problem. 
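To illustrate the difference described above, here is a minimal sketch of how the enable command could be chosen per OSD type. It is not the code from ceph-disk or from any patch: the function name enable_command and the directory_backed parameter are hypothetical, and the rule "directory-backed OSDs get a persistent enable" is an assumption drawn from the symptom reported here. The only real pieces are the systemctl enable subcommand, its --runtime flag, and the ceph-osd@<id> unit name.

```python
def enable_command(osd_id, directory_backed):
    """Build the argv for enabling ceph-osd@<osd_id> (hypothetical helper).

    Assumption: a directory-backed OSD needs a persistent enable (symlink
    under /etc), while a disk-backed OSD can keep the --runtime enable
    (symlink under /run, which does not survive a reboot).
    """
    cmd = ["systemctl", "enable"]
    if not directory_backed:
        # --runtime places the enablement under /run instead of /etc.
        cmd.append("--runtime")
    cmd.append("ceph-osd@{0}".format(osd_id))
    return cmd
```

For example, enable_command(2, True) yields a plain "systemctl enable ceph-osd@2", whose symlink would land under /etc and persist across reboots.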

How to reproduce:
- Deploy a directory-backed OSD 
- Observe on first boot, on an OSD node, that OSDs are running
  but that their target is in /run not /etc, which will cause
  them to not start on reboot if they are directory-based [2] 
- Reboot the system and observe that the OSDs are not running
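
The /run versus /etc check in the steps above can be sketched in Python as follows. This is only an illustration: the function name enablement_scope is hypothetical, and the two directories are the ones listed in footnote [2]. It reports whether the unit for a given OSD id is enabled persistently, only for the current boot, or not at all.

```python
import os

# Directories taken from footnote [2] of this report.
ETC_WANTS = "/etc/systemd/system/ceph-osd.target.wants"
RUN_WANTS = "/run/systemd/system/ceph-osd.target.wants"


def enablement_scope(osd_id, etc_wants=ETC_WANTS, run_wants=RUN_WANTS):
    """Return 'persistent', 'runtime', or 'none' for ceph-osd@<osd_id>."""
    unit = "ceph-osd@{0}.service".format(osd_id)
    # lexists() is used because the entries are symlinks and may dangle.
    if os.path.lexists(os.path.join(etc_wants, unit)):
        return "persistent"
    if os.path.lexists(os.path.join(run_wants, unit)):
        return "runtime"
    return "none"
```

On the affected node shown in footnote [2], enablement_scope(2) would be expected to return 'runtime', which is the state that does not survive a reboot for a directory-backed OSD.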

Workaround:
Replace ceph-disk's main.py with the version [3] that preceded the --runtime change [1], then recompile it with: python -O -m compileall /usr/lib/python2.7/site-packages/ceph_disk/

Version:
ceph version 10.2.5-37.el7cp (033f137cde8573cfc5a4662b4ed6a63b8a8d1464)
Red Hat Enterprise Linux Server release 7.3 (Maipo)

[root@compute-0 ~]# rpm -qa | grep ceph | sort | uniq
ceph-base-10.2.5-37.el7cp.x86_64
ceph-common-10.2.5-37.el7cp.x86_64
ceph-mds-10.2.5-37.el7cp.x86_64
ceph-mon-10.2.5-37.el7cp.x86_64
ceph-osd-10.2.5-37.el7cp.x86_64
ceph-radosgw-10.2.5-37.el7cp.x86_64
ceph-selinux-10.2.5-37.el7cp.x86_64
collectd-ceph-5.7.0-4.el7ost.x86_64
libcephfs1-10.2.5-37.el7cp.x86_64
puppet-ceph-2.3.0-3.el7ost.noarch
python-cephfs-10.2.5-37.el7cp.x86_64
[root@compute-0 ~]# 
[root@compute-0 ~]# grep runtime /usr/lib/python2.7/site-packages/ceph_disk/main.py | wc -l
2
[root@compute-0 ~]# md5sum /usr/lib/python2.7/site-packages/ceph_disk/main.py
ac08de47454124bd3d3d0a23478194cf  /usr/lib/python2.7/site-packages/ceph_disk/main.py
[root@compute-0 ~]# 
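
The grep and md5sum checks above can also be done from Python, which may be convenient when comparing several nodes. This is a rough sketch: the function name describe_main_py is hypothetical, and it only mirrors the two commands shown (count of lines containing "runtime", and the MD5 digest of the file).

```python
import hashlib


def describe_main_py(path):
    """Return (lines containing 'runtime', md5 hex digest) for a file."""
    with open(path, "rb") as f:
        data = f.read()
    # Equivalent in spirit to: grep runtime <path> | wc -l
    runtime_lines = sum(1 for line in data.split(b"\n") if b"runtime" in line)
    # Equivalent in spirit to: md5sum <path>
    return runtime_lines, hashlib.md5(data).hexdigest()
```

Calling describe_main_py("/usr/lib/python2.7/site-packages/ceph_disk/main.py") on the node above should reproduce the count and digest shown in the transcript.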

Note: I encountered this on a deployment done by OSP-Director using the OSP11 puddle. 

Footnotes:
[1] https://github.com/ceph/ceph/commit/539385b143feee3905dceaf7a8faaced42f2d3c6

[2] 
[root@overcloud-osd-compute-1 ~]# ls /run/systemd/system/ceph-osd.target.wants/
ceph-osd@2.service
[root@overcloud-osd-compute-1 ~]# ls /etc/systemd/system/ceph-osd.target.wants/
ls: cannot access /etc/systemd/system/ceph-osd.target.wants/: No such file or directory
[root@overcloud-osd-compute-1 ~]# 

[3] https://raw.githubusercontent.com/ceph/ceph/72f0b2aa1eb4b7b2a2222c2847d26f99400a8374/src/ceph-disk/ceph_disk/main.py

Comment 2 Loic Dachary 2017-04-13 22:03:57 UTC
Testing a fix at https://github.com/ceph/ceph/pull/14546

Comment 3 Loic Dachary 2017-04-14 15:46:32 UTC
Would you be so kind as to verify that applying the patch at https://github.com/ceph/ceph/pull/14546 fixes the problem? It has been tested, but it would be good to have your confirmation before merging it :-)

Comment 4 jomurphy 2017-04-17 13:19:18 UTC
*** Bug 1439223 has been marked as a duplicate of this bug. ***

Comment 5 John Fulton 2017-04-17 13:53:06 UTC
I confirm that https://github.com/ceph/ceph/pull/14546 fixes the problem. Thank you.

Comment 7 Ian Colle 2017-04-19 15:24:15 UTC
John, QE would like your help in testing this fix. Can you do that?

Comment 8 John Fulton 2017-04-19 18:53:31 UTC
I have verified that the patch fixes the bug using the
following process:

- Apply the patch [1] 
- Deploy a directory-backed OSD 
- Observe on first boot, on an OSD node, that OSDs are running
  and that their target is in /etc not /run (so they _should_
  restart on reboot)
- Reboot the system and observe that the OSDs _are_ running


[1] Details on how patch was applied:

  pushd /usr/lib/python2.7/site-packages/ceph_disk/
  mv main.py /root/ceph-disk-main.py
  curl https://raw.githubusercontent.com/ceph/ceph/f425a127b7487d2093c8c943f0bcdec3d673d601/src/ceph-disk/ceph_disk/main.py > main.py
  popd 
  python -O -m compileall /usr/lib/python2.7/site-packages/ceph_disk/

Comment 19 errata-xmlrpc 2017-06-19 13:31:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1497

Comment 20 John Fulton 2017-07-02 19:05:55 UTC
*** Bug 1457612 has been marked as a duplicate of this bug. ***
