Bug 1442265

Summary: ceph-disk's systemctl enable --runtime change causes directory-backed OSDs to not start on boot
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: John Fulton <johfulto>
Component: Ceph-Disk Assignee: Loic Dachary <ldachary>
Status: CLOSED ERRATA QA Contact: John Fulton <johfulto>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 2.2 CC: fbaudin, gfidente, gmeno, hnallurv, icolle, jcall, jefbrown, johfulto, kdreyer, sreichar, zgreenbe
Target Milestone: rc   
Target Release: 2.3   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: RHEL: ceph-10.2.7-13.el7cp Ubuntu: ceph_10.2.7-15redhat1xenial Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-06-19 13:31:58 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1439223    

Description John Fulton 2017-04-13 21:06:10 UTC
When using a version of ceph-disk that contains the change [1] to enable ceph-osd systemd units with --runtime, directory-backed OSDs do not start on boot. If I reconfigure the same system to use disk-backed OSDs, the problem does not occur.
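
The distinction at issue is between a runtime enablement (systemctl enable --runtime, whose changes are lost on the next reboot) and a persistent one (plain systemctl enable). A minimal sketch of how the two commands differ follows. The helper name enable_cmd and its "disk"/"directory" argument are hypothetical and are not taken from ceph-disk or from the referenced commit; choosing persistent enablement for directory-backed OSDs is only one possible policy, shown for illustration.

```shell
# Hypothetical helper: print the systemctl command that would enable an OSD.
# $1 = OSD id, $2 = backing type ("disk" or "directory"); both are assumptions.
enable_cmd() {
    osd_id="$1"
    backing="$2"
    if [ "$backing" = "directory" ]; then
        # Persistent: the symlink is created under /etc/systemd/system
        # and survives a reboot.
        echo "systemctl enable ceph-osd@$osd_id"
    else
        # Runtime only: the symlink is created under /run/systemd/system
        # and is lost on the next reboot.
        echo "systemctl enable --runtime ceph-osd@$osd_id"
    fi
}
```

For example, enable_cmd 0 directory prints the persistent form, while enable_cmd 0 disk prints the --runtime form. The function only prints the command; it does not run systemctl.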

How to reproduce:
- Deploy a directory-backed OSD 
- Observe on first boot, on an OSD node, that OSDs are running
  but that their target is in /run not /etc, which will cause
  them to not start on reboot if they are directory-based [2] 
- Reboot the system and observe that the OSDs are not running
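
The second step above can be checked with a small script. This is a sketch, assuming the two ceph-osd.target.wants directories shown in footnote [2]; the function name osd_enablement and its optional root-prefix argument are hypothetical and exist only so the check can be tried against a scratch directory tree.

```shell
# Report where the ceph-osd.target "wants" links live.
# Prints "persistent" if any exist under /etc, "runtime" if they exist
# only under /run, and "none" otherwise.
# $1 = optional path prefix (hypothetical, for testing); empty means the real root.
osd_enablement() {
    root="${1:-}"
    etc_dir="$root/etc/systemd/system/ceph-osd.target.wants"
    run_dir="$root/run/systemd/system/ceph-osd.target.wants"
    if [ -d "$etc_dir" ] && [ -n "$(ls -A "$etc_dir" 2>/dev/null)" ]; then
        echo persistent
    elif [ -d "$run_dir" ] && [ -n "$(ls -A "$run_dir" 2>/dev/null)" ]; then
        echo runtime
    else
        echo none
    fi
}
```

Calling osd_enablement with no argument on an OSD node inspects the real paths. A result of "runtime" on a node with directory-backed OSDs corresponds to the state described in this report.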

Workaround:
Replace ceph-disk's main.py with the version [3] that preceded the --runtime change [1], then recompile the bytecode (python -O -m compileall /usr/lib/python2.7/site-packages/ceph_disk/).

Version:
ceph version 10.2.5-37.el7cp (033f137cde8573cfc5a4662b4ed6a63b8a8d1464)
Red Hat Enterprise Linux Server release 7.3 (Maipo)

[root@compute-0 ~]# rpm -qa | grep ceph | sort | uniq
ceph-base-10.2.5-37.el7cp.x86_64
ceph-common-10.2.5-37.el7cp.x86_64
ceph-mds-10.2.5-37.el7cp.x86_64
ceph-mon-10.2.5-37.el7cp.x86_64
ceph-osd-10.2.5-37.el7cp.x86_64
ceph-radosgw-10.2.5-37.el7cp.x86_64
ceph-selinux-10.2.5-37.el7cp.x86_64
collectd-ceph-5.7.0-4.el7ost.x86_64
libcephfs1-10.2.5-37.el7cp.x86_64
puppet-ceph-2.3.0-3.el7ost.noarch
python-cephfs-10.2.5-37.el7cp.x86_64
[root@compute-0 ~]# 
[root@compute-0 ~]# grep runtime /usr/lib/python2.7/site-packages/ceph_disk/main.py | wc -l
2
[root@compute-0 ~]# md5sum /usr/lib/python2.7/site-packages/ceph_disk/main.py
ac08de47454124bd3d3d0a23478194cf  /usr/lib/python2.7/site-packages/ceph_disk/main.py
[root@compute-0 ~]# 

Note: I encountered this on a deployment done by OSP-Director using the OSP11 puddle. 

Footnotes:
[1] https://github.com/ceph/ceph/commit/539385b143feee3905dceaf7a8faaced42f2d3c6

[2] 
[root@overcloud-osd-compute-1 ~]# ls /run/systemd/system/ceph-osd.target.wants/
ceph-osd
[root@overcloud-osd-compute-1 ~]# ls /etc/systemd/system/ceph-osd.target.wants/
ls: cannot access /etc/systemd/system/ceph-osd.target.wants/: No such file or directory
[root@overcloud-osd-compute-1 ~]# 

[3] https://raw.githubusercontent.com/ceph/ceph/72f0b2aa1eb4b7b2a2222c2847d26f99400a8374/src/ceph-disk/ceph_disk/main.py

Comment 2 Loic Dachary 2017-04-13 22:03:57 UTC
Testing a fix at https://github.com/ceph/ceph/pull/14546

Comment 3 Loic Dachary 2017-04-14 15:46:32 UTC
Would you be so kind as to verify that applying the patch at https://github.com/ceph/ceph/pull/14546 fixes the problem? It has been tested, but it would be good to have your confirmation before merging it :-)

Comment 4 jomurphy 2017-04-17 13:19:18 UTC
*** Bug 1439223 has been marked as a duplicate of this bug. ***

Comment 5 John Fulton 2017-04-17 13:53:06 UTC
I confirm that https://github.com/ceph/ceph/pull/14546 fixes the problem. Thank you

Comment 7 Ian Colle 2017-04-19 15:24:15 UTC
John, QE would like your help in testing this fix. Can you do that?

Comment 8 John Fulton 2017-04-19 18:53:31 UTC
I have empirically tested that the patch fixes the bug with the
following process: 

- Apply the patch [1] 
- Deploy a directory-backed OSD 
- Observe on first boot, on an OSD node, that OSDs are running
  and that their target is in /etc not /run (so they _should_
  restart on reboot)
- Reboot the system and observe that the OSDs _are_ running


[1] Details on how patch was applied:

  pushd /usr/lib/python2.7/site-packages/ceph_disk/
  mv main.py /root/ceph-disk-main.py
  curl https://raw.githubusercontent.com/ceph/ceph/f425a127b7487d2093c8c943f0bcdec3d673d601/src/ceph-disk/ceph_disk/main.py > main.py
  popd 
  python -O -m compileall /usr/lib/python2.7/site-packages/ceph_disk/
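
To confirm that the replacement took effect, the checksum of the installed main.py can be compared with the checksum recorded for the affected file in the description. This is a sketch only: the function name main_py_is_affected and the variable KNOWN_AFFECTED_MD5 are hypothetical, and a mismatch shows only that the file differs from that one affected copy, not that it contains the fix.

```shell
# md5sum of the affected main.py as recorded in the bug description.
KNOWN_AFFECTED_MD5="ac08de47454124bd3d3d0a23478194cf"

# Return success (0) if the given file matches the known affected checksum.
# $1 = path to main.py
main_py_is_affected() {
    file="$1"
    sum=$(md5sum "$file" | awk '{print $1}')
    [ "$sum" = "$KNOWN_AFFECTED_MD5" ]
}
```

Usage, assuming the path used in the steps above: main_py_is_affected /usr/lib/python2.7/site-packages/ceph_disk/main.py && echo "still the affected version".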

Comment 19 errata-xmlrpc 2017-06-19 13:31:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1497

Comment 20 John Fulton 2017-07-02 19:05:55 UTC
*** Bug 1457612 has been marked as a duplicate of this bug. ***