Bug 1413159

Summary: Ceph - Using service command to start OSD in 1.3.x terminates other OSD processes on the node.
Product: Red Hat Ceph Storage Reporter: jquinn <jquinn>
Component: Ceph-Disk Assignee: Kefu Chai <kchai>
Status: CLOSED WONTFIX QA Contact: ceph-qe-bugs <ceph-qe-bugs>
Severity: medium Docs Contact: Erin Donnelly <edonnell>
Priority: high    
Version: 1.3.3 CC: asriram, edonnell, flucifre, hnallurv, jquinn, kchai, kdreyer, mmurthy, nlevine, vumrao
Target Milestone: rc Keywords: Reopened
Target Release: 1.3.4   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: RHEL: ceph-0.94.10-2.el7cp Ubuntu: ceph_0.94.10-3redhat1xenial Doc Type: Known Issue
Doc Text:
.Running `service ceph start osd.x` on a Red Hat Ceph Storage 1.3.x cluster causes the other OSDs on that node to stop
The `service` command uses `systemd` to manage the lifecycle of services, but `ceph` is a `systemd` service that is automatically generated from its `sysv` counterpart. As a result, the generated `systemd` service unit cannot differentiate osd.0 from the other services managed by the `ceph` service, and `systemd` believes that the `ceph` service "exited" when the system reboots. This is why the `ceph` service stops all services before starting a specific OSD instance. After `ceph` stops all services, the status of the `ceph` service is marked `dead`, therefore `service` does not stop it again when the user tries to restart a specific OSD instance. This is why the user is able to start the other OSD instances again after they are killed by the first `service ceph start osd.x`. For more information about this issue, including a workaround, refer to https://access.redhat.com/solutions/2877891.
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-03-16 05:45:50 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1372735, 1561412    

Description jquinn 2017-01-13 18:43:32 UTC
Description of problem: Whether the OSD is running or down, using "service ceph start osd.x" to bring up an OSD kills the other OSDs on the node.


Version-Release number of selected component (if applicable):


How reproducible: Reproduced in 2 lab environments, and a customer hit it in their environment.


Steps to Reproduce:
1. Run "service ceph start osd.x" on a 1.3.x node.
2. Check /var/log/messages and the OSD log files for further information.

Actual results: The output below is from a run where the OSD was already running when I issued the start command; the results are the same when the OSD is down.


[root@dell-per630-8 ~]# ps -ef |grep osd
root      28446      1  0 Jan06 ?        00:00:00 /usr/bin/bash -c ulimit -n 32768; TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=128MB /usr/bin/ceph-osd -i 24 --pid-file /var/run/ceph/osd.24.pid -c /etc/ceph/ceph.conf --cluster ceph -f
root      28450  28446  0 Jan06 ?        01:01:48 /usr/bin/ceph-osd -i 24 --pid-file /var/run/ceph/osd.24.pid -c /etc/ceph/ceph.conf --cluster ceph -f
root      67072      1  0  2016 ?        00:00:00 /usr/bin/bash -c ulimit -n 32768; TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=128MB /usr/bin/ceph-osd -i 25 --pid-file /var/run/ceph/osd.25.pid -c /etc/ceph/ceph.conf --cluster ceph -f
root      67078  67072  0  2016 ?        10:36:53 /usr/bin/ceph-osd -i 25 --pid-file /var/run/ceph/osd.25.pid -c /etc/ceph/ceph.conf --cluster ceph -f
root      67461      1  0  2016 ?        00:00:00 /usr/bin/bash -c ulimit -n 32768; TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=128MB /usr/bin/ceph-osd -i 26 --pid-file /var/run/ceph/osd.26.pid -c /etc/ceph/ceph.conf --cluster ceph -f
root      67464  67461  0  2016 ?        07:40:40 /usr/bin/ceph-osd -i 26 --pid-file /var/run/ceph/osd.26.pid -c /etc/ceph/ceph.conf --cluster ceph -f
root      84728  82818  0 23:56 pts/1    00:00:00 grep --color=auto osd
[root@dell-per630-8 ~]# 
[root@dell-per630-8 ~]# 
[root@dell-per630-8 ~]# 
[root@dell-per630-8 ~]# service ceph start osd.24
=== osd.24 === 
Starting Ceph osd.24 on dell-per630-8...
Running as unit ceph-osd.24.1484332086.476213235.service.
[root@dell-per630-8 ~]# 
[root@dell-per630-8 ~]# 
[root@dell-per630-8 ~]# ps -ef |grep -i osd
root      85650      1  0 23:58 ?        00:00:00 /bin/bash -c ulimit -n 32768; TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=128MB /usr/bin/ceph-osd -i 24 --pid-file /var/run/ceph/osd.24.pid -c /etc/ceph/ceph.conf --cluster ceph -f
root      85653  85650 14 23:58 ?        00:00:01 /usr/bin/ceph-osd -i 24 --pid-file /var/run/ceph/osd.24.pid -c /etc/ceph/ceph.conf --cluster ceph -f
root      85830  82818  0 23:58 pts/1    00:00:00 grep --color=auto -i osd
[root@dell-per630-8 ~]# 
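
The before/after comparison above can be reduced to a list of OSD ids. The following is a minimal sketch, not part of the original report; the function name `running_osd_ids` is hypothetical, and it assumes the `/usr/bin/ceph-osd -i <id>` command-line form shown in the `ps` output.

```shell
#!/usr/bin/env bash
# Print the unique OSD ids found in `ps -ef` style text read from stdin.
# Both the bash wrapper line and the ceph-osd line carry the same
# "/usr/bin/ceph-osd -i <id>" fragment, so duplicates are removed.
running_osd_ids() {
    grep -o '/usr/bin/ceph-osd -i [0-9][0-9]*' | awk '{print $3}' | sort -un
}
```

Running `ps -ef | running_osd_ids` before and after `service ceph start osd.x` and comparing the two lists shows which OSDs disappeared.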



** From /var/log/messages ** 

Jan 13 23:57:38 dell-per630-8 salt-minion: {'jid': '20170113235735552441', 'return': None, 'success': True, 'pid': 85081, 'fun': 'ceph.heartbeat', 'id': 'dell-per630-8.gsslab.pnq2.redhat.com'}
Jan 13 23:57:47 dell-per630-8 salt-minion: {'jid': '20170113235745100750', 'return': None, 'success': True, 'pid': 85129, 'fun': 'ceph.heartbeat', 'id': 'dell-per630-8.gsslab.pnq2.redhat.com'}
Jan 13 23:57:58 dell-per630-8 salt-minion: {'jid': '20170113235755689703', 'return': None, 'success': True, 'pid': 85171, 'fun': 'ceph.heartbeat', 'id': 'dell-per630-8.gsslab.pnq2.redhat.com'}
Jan 13 23:57:58 dell-per630-8 systemd: Stopping LSB: Start Ceph distributed file system daemons at boot time...
Jan 13 23:57:58 dell-per630-8 ceph: === osd.26 ===
Jan 13 23:57:59 dell-per630-8 bash: 2017-01-13 23:57:59.030283 7f6768300700 -1 osd.26 27603 *** Got signal Terminated ***
Jan 13 23:57:59 dell-per630-8 bash: 2017-01-13 23:57:59.523189 7f6768300700 -1 osd.26 27603 shutdown
Jan 13 23:58:00 dell-per630-8 bash: /usr/bin/bash: line 1: 67464 Terminated              TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=128MB /usr/bin/ceph-osd -i 26 --pid-file /var/run/ceph/osd.26.pid -c /etc/ceph/ceph.conf --cluster ceph -f
Jan 13 23:58:00 dell-per630-8 systemd: ceph-osd.26.1478631124.172927518.service: main process exited, code=exited, status=143/n/a
Jan 13 23:58:01 dell-per630-8 ceph: Stopping Ceph osd.26 on dell-per630-8...kill 67464...kill 67464...done
Jan 13 23:58:01 dell-per630-8 ceph: === osd.25 ===
Jan 13 23:58:01 dell-per630-8 bash: 2017-01-13 23:58:01.124628 7f76cc964700 -1 osd.25 27605 *** Got signal Terminated ***
Jan 13 23:58:01 dell-per630-8 systemd: Started Session 198425 of user keystone.
Jan 13 23:58:01 dell-per630-8 systemd: Starting Session 198425 of user keystone.
Jan 13 23:58:01 dell-per630-8 bash: 2017-01-13 23:58:01.825019 7f76cc964700 -1 osd.25 27605 shutdown
Jan 13 23:58:02 dell-per630-8 bash: /usr/bin/bash: line 1: 67078 Terminated              TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=128MB /usr/bin/ceph-osd -i 25 --pid-file /var/run/ceph/osd.25.pid -c /etc/ceph/ceph.conf --cluster ceph -f
Jan 13 23:58:02 dell-per630-8 systemd: ceph-osd.25.1478631121.764289531.service: main process exited, code=exited, status=143/n/a
Jan 13 23:58:03 dell-per630-8 ceph: Stopping Ceph osd.25 on dell-per630-8...kill 67078...kill 67078...done
Jan 13 23:58:03 dell-per630-8 ceph: === osd.24 ===
Jan 13 23:58:03 dell-per630-8 bash: 2017-01-13 23:58:03.222118 7f2dc76b0700 -1 osd.24 27606 *** Got signal Terminated ***
Jan 13 23:58:03 dell-per630-8 bash: 2017-01-13 23:58:03.965102 7f2dc76b0700 -1 osd.24 27607 shutdown
Jan 13 23:58:04 dell-per630-8 bash: /usr/bin/bash: line 1: 28450 Terminated              TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=128MB /usr/bin/ceph-osd -i 24 --pid-file /var/run/ceph/osd.24.pid -c /etc/ceph/ceph.conf --cluster ceph -f
Jan 13 23:58:04 dell-per630-8 systemd: ceph-osd.24.1483716680.952095560.service: main process exited, code=exited, status=143/n/a
Jan 13 23:58:05 dell-per630-8 ceph: Stopping Ceph osd.24 on dell-per630-8...kill 28450...kill 28450...done
Jan 13 23:58:05 dell-per630-8 ceph: === mon.dell-per630-8 ===
Jan 13 23:58:05 dell-per630-8 bash: 2017-01-13 23:58:05.315713 7f6d2283e700 -1 mon.dell-per630-8@0(leader) e5 *** Got Signal Terminated ***
Jan 13 23:58:06 dell-per630-8 ceph: Stopping Ceph mon.dell-per630-8 on dell-per630-8...kill 135846...done
Jan 13 23:58:06 dell-per630-8 systemd: Stopped LSB: Start Ceph distributed file system daemons at boot time.
Jan 13 23:58:06 dell-per630-8 systemd: Started /bin/bash -c ulimit -n 32768; TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=128MB /usr/bin/ceph-osd -i 24 --pid-file /var/run/ceph/osd.24.pid -c /etc/ceph/ceph.conf --cluster ceph -f.
Jan 13 23:58:06 dell-per630-8 systemd: Starting /bin/bash -c ulimit -n 32768; TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=128MB /usr/bin/ceph-osd -i 24 --pid-file /var/run/ceph/osd.24.pid -c /etc/ceph/ceph.conf --cluster ceph -f...
Jan 13 23:58:06 dell-per630-8 bash: starting osd.24 at :/0 osd_data /var/lib/ceph/osd/ceph-24 /var/lib/ceph/osd/ceph-24/journal
Jan 13 23:58:07 dell-per630-8 salt-minion: {'jid': '20170113235805301757', 'return': None, 'success': True, 'pid': 85515, 'fun': 'ceph.heartbeat', 'id': 'dell-per630-8.gsslab.pnq2.redhat.com'}
Jan 13 23:58:08 dell-per630-8 bash: 2017-01-13 23:58:08.412658 7faec862c7c0 -1 osd.24 27607 log_to_monitors {default=true}
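
To see at a glance which daemons the init script stopped, the "Stopping Ceph ..." lines can be pulled out of the messages log. This is a minimal sketch that assumes the log line format shown above; the function name `stopped_daemons` is hypothetical.

```shell
#!/usr/bin/env bash
# Read syslog-style text on stdin and print the name of every daemon that
# the ceph init script reported stopping, in the order it was stopped.
# Only lines tagged "ceph:" are matched, so the systemd
# "Stopping LSB: ..." line is ignored.
stopped_daemons() {
    sed -n 's/.* ceph: Stopping Ceph \([^ ]*\) on .*/\1/p'
}
```

For the excerpt above, `stopped_daemons < /var/log/messages` would list osd.26, osd.25, osd.24 and mon.dell-per630-8, even though only osd.24 was requested.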




*** From the Log for down OSD ***

hing to send, going to standby
2017-01-13 23:56:33.802584 7f76bcde2700  0 -- 10.74.128.30:6813/67078 >> 10.74.128.35:6801/53334 pipe(0x8e66000 sd=217 :6813 s=2 pgs=14811 cs=2684 l=0 c=0x74b4c60).fault with nothing to send, going to standby
2017-01-13 23:57:59.620975 7f76c22c5700  0 -- 10.74.128.30:6813/67078 >> 10.74.128.27:6815/60077 pipe(0x79a4000 sd=232 :6813 s=0 pgs=0 cs=0 l=0 c=0x2f49f860).accept connect_seq 1754 vs existing 1754 state standby
2017-01-13 23:57:59.621313 7f76c22c5700  0 -- 10.74.128.30:6813/67078 >> 10.74.128.27:6815/60077 pipe(0x79a4000 sd=232 :6813 s=0 pgs=0 cs=0 l=0 c=0x2f49f860).accept connect_seq 1755 vs existing 1754 state standby
2017-01-13 23:57:59.623445 7f76edfac700  0 osd.25 27604 crush map has features 301610890952704, adjusting msgr requires for osds
2017-01-13 23:58:00.667142 7f76ef7af700  0 osd.25 27605 crush map has features 301610890952704, adjusting msgr requires for osds
2017-01-13 23:58:01.124628 7f76cc964700 -1 osd.25 27605 *** Got signal Terminated ***
2017-01-13 23:58:01.124667 7f76cc964700  0 osd.25 27605 prepare_to_stop telling mon we are shutting down
2017-01-13 23:58:01.824970 7f76ef7af700  0 osd.25 27605 got_stop_ack starting shutdown
2017-01-13 23:58:01.825005 7f76cc964700  0 osd.25 27605 prepare_to_stop starting shutdown
2017-01-13 23:58:01.825019 7f76cc964700 -1 osd.25 27605 shutdown
2017-01-13 23:58:01.825170 7f76cc964700 20 osd.25 27605  kicking pg 49.dfs1
2017-01-13 23:58:01.825177 7f76cc964700 30 osd.25 pg_epoch: 27605 pg[49.dfs1( v 27603'44 (0'0,27603'44] local-les=27544 n=4 ec=26578 les/c 27544/27546 27543/27543/26931) [12,25,5,19,27,15,10] r=1 lpr=27543 pi=27376-27542/4 luod=0'0 crt=27603'33 active] lock
2017-01-13 23:58:01.825197 7f76cc964700 10 osd.25 pg_epoch: 27605 pg[49.dfs1( v 27603'44 (0'0,27603'44] local-les=27544 n=4 ec=26578 les/c 27544/27546 27543/27543/26931) [12,25,5,19,27,15,10] r=1 lpr=27543 pi=27376-27542/4 luod=0'0 crt=27603'33 active] on_shutdown




Expected results: Only start the OSD in question.


Additional info:

Comment 3 Vikhyat Umrao 2017-01-13 18:48:37 UTC
Changing it to the 1.3.z release, as sysvinit is deprecated in RHCS 2.x (jewel).

Comment 6 Kefu Chai 2017-03-01 11:45:38 UTC
This could be related to how we use the pid file to manage the life cycle of a daemon. I will try to reproduce it locally with more verbose logging.

Comment 20 Kefu Chai 2017-03-13 10:29:17 UTC
I printed the output of "ps aux|grep ceph-osd" at the beginning of "/etc/init.d/ceph", like this:

diff --git a/src/init-ceph.in b/src/init-ceph.in
index 7bcfda4..fb57406 100644
--- a/src/init-ceph.in
+++ b/src/init-ceph.in
@@ -21,6 +21,8 @@ fi
 SYSTEMD_RUN=$(which systemd-run 2>/dev/null)
 grep -qs systemd /proc/1/comm || SYSTEMD_RUN=""
 
+ps aux|grep ceph-osd
+
 # if we start up as ./init-ceph, assume everything else is in the
 # current directory too.
 if [ `dirname $0` = "." ] && [ $PWD != "/etc/init.d" ]; then

and I got:

[root]# ps aux|grep ceph-osd
root       2622  0.0  0.0 115244  1460 ?        Ss   15:44   0:00 /usr/bin/bash -c ulimit -n 32768; TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=128MB /usr/bin/ceph-osd -i 30 --pid-file /var/run/ceph/osd.30.pid -c /etc/ceph/ceph.conf --cluster ceph -f
root       2630  0.0  0.1 911132 18624 ?        Sl   15:44   0:00 /usr/bin/ceph-osd -i 30 --pid-file /var/run/ceph/osd.30.pid -c /etc/ceph/ceph.conf --cluster ceph -f
root       2718  0.0  0.0 115244  1456 ?        Ss   15:44   0:00 /usr/bin/bash -c ulimit -n 32768; TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=128MB /usr/bin/ceph-osd -i 32 --pid-file /var/run/ceph/osd.32.pid -c /etc/ceph/ceph.conf --cluster ceph -f
root       2726  0.0  0.1 911128 18616 ?        Sl   15:44   0:00 /usr/bin/ceph-osd -i 32 --pid-file /var/run/ceph/osd.32.pid -c /etc/ceph/ceph.conf --cluster ceph -f
root       2875  0.0  0.0 115244  1456 ?        Ss   15:44   0:00 /usr/bin/bash -c ulimit -n 32768; TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=128MB /usr/bin/ceph-osd -i 31 --pid-file /var/run/ceph/osd.31.pid -c /etc/ceph/ceph.conf --cluster ceph -f
root       2882  0.0  0.1 911128 22560 ?        Sl   15:44   0:00 /usr/bin/ceph-osd -i 31 --pid-file /var/run/ceph/osd.31.pid -c /etc/ceph/ceph.conf --cluster ceph -f
root       6241  0.0  0.0 112648   960 pts/0    S+   15:53   0:00 grep --color=auto ceph-osd
[root]# service ceph start osd.30
+ '[' -e /lib/lsb/init-functions ']'
+ . /lib/lsb/init-functions
++ which systemd-run
+ SYSTEMD_RUN=/bin/systemd-run
+ grep -qs systemd /proc/1/comm
+ ps aux
+ grep ceph-osd
root       6826  0.0  0.0   9032   656 pts/0    S+   15:54   0:00 grep ceph-osd
++ dirname /etc/init.d/ceph

In other words, all ceph-osd processes are killed before "/etc/init.d/ceph" kicks in.

Comment 21 Kefu Chai 2017-03-13 11:04:55 UTC
In /usr/sbin/service:

if [ -f "${SERVICEDIR}/${SERVICE}" ]; then
   # LSB daemons that dies abnormally in systemd looks alive in systemd's eyes due to RemainAfterExit=yes
   # lets reap them before next start
   if [ "${ACTION}" = "start" ] && \
   systemctl show -p ActiveState ${SERVICE}.service | grep -q '=active$' && \
   systemctl show -p SubState ${SERVICE}.service | grep -q '=exited$' ; then
       /bin/systemctl stop ${SERVICE}.service
   fi

It stops "ceph" if the "ActiveState" is "active" and the "SubState" is "exited".


# systemctl show -p ActiveState -p SubState ceph
ActiveState=active
SubState=exited
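
The condition quoted from /usr/sbin/service can be isolated as a small pure function. This is a sketch for illustration only, not the actual script: the function name `service_start_plan` and its output strings are hypothetical, and the two state values are passed in as arguments rather than read from `systemctl show`.

```shell
#!/usr/bin/env bash
# Mirror the check in /usr/sbin/service: the unit is stopped first only when
# the action is "start" and systemd reports ActiveState=active together with
# SubState=exited.
# Arguments: action, ActiveState, SubState.
# Prints "stop-first" when the whole unit would be stopped, else "no-stop".
service_start_plan() {
    local action=$1 active_state=$2 sub_state=$3
    if [ "$action" = "start" ] \
        && [ "$active_state" = "active" ] \
        && [ "$sub_state" = "exited" ]; then
        echo "stop-first"
    else
        echo "no-stop"
    fi
}
```

With the values shown above, `service_start_plan start active exited` prints "stop-first", which matches the observed behavior: every daemon under the `ceph` unit is stopped before osd.x is started. Once the unit is no longer in the active/exited state, the same call prints "no-stop".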

# systemctl status  ceph -l
● ceph.service - LSB: Start Ceph distributed file system daemons at boot time
   Loaded: loaded (/etc/rc.d/init.d/ceph; bad; vendor preset: disabled)
   Active: active (exited) since Mon 2017-03-13 16:04:19 IST; 24min ago
     Docs: man:systemd-sysv-generator(8)
  Process: 1312 ExecStart=/etc/rc.d/init.d/ceph start (code=exited, status=0/SUCCESS)

Mar 13 16:04:18 dell-per630-9.gsslab.pnq2.redhat.com ceph[1312]: + '[' 0 -eq 0 ']'
Mar 13 16:04:18 dell-per630-9.gsslab.pnq2.redhat.com ceph[1312]: + touch /var/lock/subsys/ceph
Mar 13 16:04:18 dell-per630-9.gsslab.pnq2.redhat.com ceph[1312]: + '[' start = start -a /usr/bin '!=' . ']'
Mar 13 16:04:18 dell-per630-9.gsslab.pnq2.redhat.com ceph[1312]: + '[' '' = '' ']'
Mar 13 16:04:18 dell-per630-9.gsslab.pnq2.redhat.com ceph[1312]: + '[' -x /usr/sbin/ceph-disk ']'
Mar 13 16:04:18 dell-per630-9.gsslab.pnq2.redhat.com ceph[1312]: + ceph-disk activate-all
Mar 13 16:04:19 dell-per630-9.gsslab.pnq2.redhat.com ceph[1312]: ERROR:ceph-disk:Failed to activate
Mar 13 16:04:19 dell-per630-9.gsslab.pnq2.redhat.com ceph[1312]: ceph-disk: Error: another ceph osd.30 already mounted in position (old/different cluster instance?); unmounting ours.
Mar 13 16:04:19 dell-per630-9.gsslab.pnq2.redhat.com ceph[1312]: ceph-disk: Error: One or more partitions failed to activate
Mar 13 16:04:19 dell-per630-9.gsslab.pnq2.redhat.com ceph[1312]: + exit 0

So, when the system boots, "ceph-disk activate-all" fails with the above error, and systemd therefore marks the SubState as "exited".

Comment 22 Kefu Chai 2017-03-13 11:48:04 UTC
Vikhyat, why do we have a 4fbd7e29-9d25-41b8-afd0-062c0ceff05d.ec76bcd2-d53c-40dc-a649-47b0d99a7baf in /dev/disk/by-parttypeuuid/? The name starts with the prefix "4fbd7e29-9d25-41b8-afd0-062c0ceff05d", which is the mark ceph-disk uses to note that "it is a ready-to-use ceph osd partition". That's why we failed to mount it to /var/lib/ceph/osd/ceph-30.

hence ceph-disk failed.

# ceph-disk -v activate-all
DEBUG:ceph-disk:Scanning /dev/disk/by-parttypeuuid
INFO:ceph-disk:Activating /dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.ec76bcd2-d53c-40dc-a649-47b0d99a7baf
INFO:ceph-disk:Running command: /sbin/blkid -p -s TYPE -ovalue -- /dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.ec76bcd2-d53c-40dc-a649-47b0d99a7baf
INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs
DEBUG:ceph-disk:Mounting /dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.ec76bcd2-d53c-40dc-a649-47b0d99a7baf on /var/lib/ceph/tmp/mnt.DWsjYq with options noatime,inode64
INFO:ceph-disk:Running command: /usr/bin/mount -t xfs -o noatime,inode64 -- /dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.ec76bcd2-d53c-40dc-a649-47b0d99a7baf /var/lib/ceph/tmp/mnt.DWsjYq
INFO:ceph-disk:Running command: /usr/sbin/restorecon /var/lib/ceph/tmp/mnt.DWsjYq
DEBUG:ceph-disk:Cluster uuid is 444f54b1-f97f-43d8-85b7-d5a02daac39a
INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
DEBUG:ceph-disk:Cluster name is ceph
DEBUG:ceph-disk:OSD uuid is ec76bcd2-d53c-40dc-a649-47b0d99a7baf
DEBUG:ceph-disk:OSD id is 30
INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup init
DEBUG:ceph-disk:Marking with init system sysvinit
DEBUG:ceph-disk:ceph osd.30 data dir is ready at /var/lib/ceph/tmp/mnt.DWsjYq
INFO:ceph-disk:/var/lib/ceph/osd/ceph-30 is not empty, won't override
ERROR:ceph-disk:Failed to activate
DEBUG:ceph-disk:Unmounting /var/lib/ceph/tmp/mnt.DWsjYq
INFO:ceph-disk:Running command: /bin/umount -- /var/lib/ceph/tmp/mnt.DWsjYq
ceph-disk: Error: another ceph osd.30 already mounted in position (old/different cluster instance?); unmounting ours.
ceph-disk: Error: One or more partitions failed to activate
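
The prefix check described in this comment can be sketched as a name classifier for entries under /dev/disk/by-parttypeuuid. The function name `ceph_partition_kind` and its output labels are hypothetical. The data GUID is the one named in this comment; the journal GUID is included on the assumption that it is the partition type ceph-disk assigns to journal partitions.

```shell
#!/usr/bin/env bash
# Classify a by-parttypeuuid entry by the partition type GUID that prefixes
# its name (the part before the first dot).
# Argument: an entry name or full path.
# Prints "osd-data", "journal", or "other".
ceph_partition_kind() {
    local name
    name=$(basename "$1")
    case "$name" in
        # Ready-to-use ceph OSD data partition.
        4fbd7e29-9d25-41b8-afd0-062c0ceff05d.*) echo "osd-data" ;;
        # Assumed: ceph journal partition type.
        45b0969e-9b03-4f30-b4c6-b4b80ceff106.*) echo "journal" ;;
        *) echo "other" ;;
    esac
}
```

A loop such as `for p in /dev/disk/by-parttypeuuid/*; do echo "$p: $(ceph_partition_kind "$p")"; done` would then show which partitions `ceph-disk activate-all` is likely to try to activate.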

Comment 23 Vikhyat Umrao 2017-03-14 20:29:35 UTC
(In reply to Kefu Chai from comment #22)
> Vikhyat, why do we have a
> 4fbd7e29-9d25-41b8-afd0-062c0ceff05d.ec76bcd2-d53c-40dc-a649-47b0d99a7baf in
> /dev/disk/by-parttypeuuid/, the name is started with the prefix of
> "4fbd7e29-9d25-41b8-afd0-062c0ceff05d", which is a mark used by ceph-disk to
> note that "it is a ready-to-use ceph osd partition". that's why we failed to
> mount it to /var/lib/ceph/osd/ceph-30.
>

Hi Kefu,

Thanks for your input. I am not sure why this prefix is there; from what I checked, it looks like it comes from ceph-disk prepare or create?

It is the same in jewel and hammer.

In Jewel:

# cd /dev/disk/by-parttypeuuid/
[root@kilo1 by-parttypeuuid]# ll
total 0
lrwxrwxrwx 1 root root 10 Feb 23 06:16 45b0969e-9b03-4f30-b4c6-b4b80ceff106.c4d58c1a-acd5-4ba6-89f6-e19b10e6ed5c -> ../../vdc2
lrwxrwxrwx 1 root root 10 Feb 23 06:16 45b0969e-9b03-4f30-b4c6-b4b80ceff106.cbd34227-62a2-41b7-ac28-1e57a8b9d561 -> ../../vdb2
lrwxrwxrwx 1 root root 10 Feb 23 05:14 4fbd7e29-9d25-41b8-afd0-062c0ceff05d.115ab8f0-f8d3-43fa-bc57-fb38ba55bbef -> ../../vdb1
lrwxrwxrwx 1 root root 10 Feb 23 05:14 4fbd7e29-9d25-41b8-afd0-062c0ceff05d.3b236137-dcf9-41f6-9981-b123c96f89ab -> ../../vdc1


# ceph-disk list
/dev/dm-0 other, xfs, mounted on /
/dev/dm-1 swap, swap
/dev/dm-2 other, xfs, mounted on /home
/dev/sr0 other, unknown
/dev/vda :
 /dev/vda2 other, LVM2_member
 /dev/vda1 other, xfs, mounted on /boot
/dev/vdb :
 /dev/vdb2 ceph journal, for /dev/vdb1
 /dev/vdb1 ceph data, active, cluster ceph, osd.1, journal /dev/vdb2
/dev/vdc :
 /dev/vdc2 ceph journal, for /dev/vdc1
 /dev/vdc1 ceph data, active, cluster ceph, osd.3, journal /dev/vdc2


In Hammer:

# cd /dev/disk/by-parttypeuuid/
# ll
total 0
lrwxrwxrwx 1 root root 10 May 23  2016 0fc63daf-8483-4772-8e79-3d69d8477de4.0cff6bf0-596b-453b-ae61-4147717d1047 -> ../../sda3
lrwxrwxrwx 1 root root 10 Oct  3 15:04 45b0969e-9b03-4f30-b4c6-b4b80ceff106.a3fe4d60-ebe2-4829-8d0e-b12d50e089e2 -> ../../sda1
lrwxrwxrwx 1 root root 10 Oct  3 15:04 45b0969e-9b03-4f30-b4c6-b4b80ceff106.ed8e9371-f219-42be-a677-356dbe910097 -> ../../sda2
lrwxrwxrwx 1 root root 10 May 23  2016 4fbd7e29-9d25-41b8-afd0-062c0ceff05d.d5d2a10e-4b48-4e8e-a88f-19cc7b17f5ad -> ../../sdc1


# ceph-disk list
/dev/sda :
 /dev/sda1 ceph journal, for /dev/sdc1
 /dev/sda2 ceph journal
 /dev/sda3 other, xfs, mounted on /var/lib/ceph/osd/ceph-11
/dev/sdb :
 /dev/sdb1 other, xfs, mounted on /boot
 /dev/sdb2 other, LVM2_member
 /dev/sdb3 other, xfs, mounted on /var/lib/ceph/osd/ceph-10
/dev/sdc :
 /dev/sdc1 ceph data, active, cluster ceph, osd.9, journal /dev/sda1
/dev/sr0 other, unknown

This is the symlink to the journal partition for osds.
 
> hence ceph-disk failed.
> 
> # ceph-disk -v activate-all
> DEBUG:ceph-disk:Scanning /dev/disk/by-parttypeuuid
> INFO:ceph-disk:Activating
> /dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.ec76bcd2-d53c-
> 40dc-a649-47b0d99a7baf
> INFO:ceph-disk:Running command: /sbin/blkid -p -s TYPE -ovalue --
> /dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.ec76bcd2-d53c-
> 40dc-a649-47b0d99a7baf
> INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph
> --name=osd. --lookup osd_mount_options_xfs
> INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph
> --name=osd. --lookup osd_fs_mount_options_xfs
> DEBUG:ceph-disk:Mounting
> /dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.ec76bcd2-d53c-
> 40dc-a649-47b0d99a7baf on /var/lib/ceph/tmp/mnt.DWsjYq with options
> noatime,inode64
> INFO:ceph-disk:Running command: /usr/bin/mount -t xfs -o noatime,inode64 --
> /dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.ec76bcd2-d53c-
> 40dc-a649-47b0d99a7baf /var/lib/ceph/tmp/mnt.DWsjYq
> INFO:ceph-disk:Running command: /usr/sbin/restorecon
> /var/lib/ceph/tmp/mnt.DWsjYq
> DEBUG:ceph-disk:Cluster uuid is 444f54b1-f97f-43d8-85b7-d5a02daac39a
> INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph
> --show-config-value=fsid
> DEBUG:ceph-disk:Cluster name is ceph
> DEBUG:ceph-disk:OSD uuid is ec76bcd2-d53c-40dc-a649-47b0d99a7baf
> DEBUG:ceph-disk:OSD id is 30
> INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph
> --name=osd. --lookup init
> DEBUG:ceph-disk:Marking with init system sysvinit
> DEBUG:ceph-disk:ceph osd.30 data dir is ready at /var/lib/ceph/tmp/mnt.DWsjYq
> INFO:ceph-disk:/var/lib/ceph/osd/ceph-30 is not empty, won't override
> ERROR:ceph-disk:Failed to activate
> DEBUG:ceph-disk:Unmounting /var/lib/ceph/tmp/mnt.DWsjYq
> INFO:ceph-disk:Running command: /bin/umount -- /var/lib/ceph/tmp/mnt.DWsjYq
> ceph-disk: Error: another ceph osd.30 already mounted in position
> (old/different cluster instance?); unmounting ours.
> ceph-disk: Error: One or more partitions failed to activate

Right, and this setup is different from the customer environments. The first customer who reported this issue is running encrypted OSDs.

data disks:

/dev/sdc1: UUID="358c9657-c1e7-4eac-9c2f-7b6a8f52bcad" TYPE="crypto_LUKS" PARTLABEL="primary" PARTUUID="bf0c758f-7bb4-4471-a4d0-10ce056cb825" 
/dev/sdd1: UUID="e62a4eea-44d0-467e-9bd5-afaad2e2917f" TYPE="crypto_LUKS" PARTLABEL="primary" PARTUUID="7c1d6551-2233-4b9a-94dc-27f04a9d1e07" 
/dev/sde1: UUID="660c310a-e944-4ed7-9a6b-760077808dbf" TYPE="crypto_LUKS" PARTLABEL="primary" PARTUUID="4e973be4-4c75-4a94-b6bd-d1034c21de7e" 
/dev/sdf1: UUID="9e72deee-1f7b-4065-8a47-9c43fb7b193a" TYPE="crypto_LUKS" PARTLABEL="primary" PARTUUID="bd310495-909c-4c32-8c06-5bd9a859b5f1" 
/dev/sdg1: UUID="f7914ee5-26d9-4dd0-a146-3e715ab09207" TYPE="crypto_LUKS" PARTLABEL="primary" PARTUUID="9608d14d-c484-4214-9c91-b8cc6d8c3e5d" 
/dev/sdh1: UUID="08efc109-2d55-4efe-9c3f-0b4d08ec91d7" TYPE="crypto_LUKS" PARTLABEL="primary" PARTUUID="26fe4375-4f17-4187-92c6-536803b2900a" 
/dev/sdi1: UUID="dac83874-a2fe-449c-9190-4680dc0d3139" TYPE="crypto_LUKS" PARTLABEL="primary" PARTUUID="a9d46267-533f-4605-a2cc-8d225b258819" 
/dev/sdj1: UUID="19a7cf86-31b8-4efc-a83b-3b2525fc4053" TYPE="crypto_LUKS" PARTLABEL="primary" PARTUUID="1bed3f52-d36d-459a-ada6-b37158987782" 
/dev/sdk1: UUID="dbe69cd2-601a-4905-b594-3d855520e90e" TYPE="crypto_LUKS" PARTLABEL="primary" PARTUUID="9a21d91c-b2db-4cf5-b592-f35f1c754999" 
/dev/sdl1: UUID="80109003-cd0f-409c-a9c9-34d1a07ace03" TYPE="crypto_LUKS" PARTLABEL="primary" PARTUUID="17c08c18-8808-4a56-81a0-d2f47ff9739f" 
/dev/sdm1: UUID="3e689239-0949-4986-8f8a-ad860b0d4077" TYPE="crypto_LUKS" PARTLABEL="primary" PARTUUID="8dd0414d-6a5e-4c74-aed0-3cd8d3fca68a" 
/dev/sdn1: UUID="5ff2aa5d-843d-49a6-893a-21d40c1682e7" TYPE="crypto_LUKS" PARTLABEL="primary" PARTUUID="b7e63d5b-d511-4b72-9458-c5d6905d1673" 
/dev/sdp1: UUID="18b8b4c1-8966-4035-b5c5-c3e2ad953914" TYPE="crypto_LUKS" PARTLABEL="primary" PARTUUID="ab375bfc-e5fd-4702-b110-e99405fc3b25" 
/dev/sdo1: UUID="58a4483d-b008-4112-b748-a18d5039579d" TYPE="crypto_LUKS" PARTLABEL="primary" PARTUUID="a23106d8-bf28-4bd6-995b-80c4b28a61d7" 
/dev/sdq1: UUID="c488b4ec-1c5c-46c2-8b0f-613b7c94c197" TYPE="crypto_LUKS" PARTLABEL="primary" PARTUUID="afe7df59-af5e-41d3-b5f8-1937c14621d3" 
/dev/sds1: UUID="442e4e20-7b49-45a3-ac7d-d8c52d4c2112" TYPE="crypto_LUKS" PARTLABEL="primary" PARTUUID="f93e8168-4428-4a24-928c-068a84c5625b" 
/dev/sdr1: UUID="7ec7075e-2886-4b57-85af-b157b591203c" TYPE="crypto_LUKS" PARTLABEL="primary" PARTUUID="80f72448-387a-4a2b-b553-16f844b57283" 
/dev/sdt1: UUID="a84fb5c4-e564-4d9e-81f0-cfea6435d696" TYPE="crypto_LUKS" PARTLABEL="primary" PARTUUID="ca3d9abd-0fb9-45de-a7d8-e734035c2234" 
/dev/sdu1: UUID="778a2285-c60d-4953-99d7-e0a8eea31ecb" TYPE="crypto_LUKS" PARTLABEL="primary" PARTUUID="56539e45-cb93-456d-aa9a-85762b1f821f" 


/dev/mapper/osd-29: UUID="32119356-8d68-4849-93d4-5a0fac953d1a" TYPE="xfs" 
/dev/mapper/osd-93: UUID="806f1422-d889-4a97-a7eb-4aedc9c2d7a4" TYPE="xfs" 
/dev/mapper/osd-148: UUID="bbebafb0-e615-4a09-a1ed-1157869fff77" TYPE="xfs" 
/dev/mapper/osd-85: UUID="b5cb4024-3033-437f-9f5c-7ce890f3d937" TYPE="xfs" 
/dev/mapper/osd-108: UUID="a29c2923-5c33-4e8b-8806-78df58455bda" TYPE="xfs" 
/dev/mapper/osd-99: UUID="4938c6f2-c3f0-4d81-8df9-64f0f8aee98d" TYPE="xfs" 
/dev/mapper/osd-133: UUID="f68ad6b5-3bf2-47ba-9891-479e4052fffc" TYPE="xfs" 
/dev/mapper/osd-20: UUID="b2fb3584-1b51-4fcb-91f6-831f2ec4acdc" TYPE="xfs" 
/dev/mapper/osd-124: UUID="f2d51ce5-0c82-4155-aebf-af5d2e89c65f" TYPE="xfs" 
/dev/mapper/osd-0: UUID="40fa897d-b43b-4d1b-ab0d-c16913838f4d" TYPE="xfs" 
/dev/mapper/osd-77: UUID="dba7c331-2d0a-496b-9d1e-dce41dbe7ffc" TYPE="xfs" 
/dev/mapper/osd-154: UUID="ed4e9b66-bb2e-4142-93b3-309cd0072f5f" TYPE="xfs" 
/dev/mapper/osd-38: UUID="08dbc954-e1b3-40ce-b47a-a461d0091e77" TYPE="xfs" 
/dev/mapper/osd-105: UUID="4ed80f18-cc3a-4f68-ac2b-6f802e50cabf" TYPE="xfs" 
/dev/mapper/osd-128: UUID="8c357f77-92c3-4305-9a14-fe1679f905be" TYPE="xfs" 
/dev/mapper/osd-15: UUID="fe7b0a5c-0c25-4220-b2f1-86524ee2195c" TYPE="xfs" 
/dev/mapper/osd-65: UUID="d0155357-37f4-4a9b-bd09-1ee2293eb672" TYPE="xfs" 
/dev/mapper/osd-141: UUID="01366708-51f4-42b5-8bdf-caedcb72fda1" TYPE="xfs" 
/dev/mapper/osd-51: UUID="0f8536b8-ad61-499d-ba29-e142676dbf4e" TYPE="xfs" 
/dev/mapper/osd-33: UUID="8b0ee655-d23e-4270-91b8-c939df439f6b" TYPE="xfs" 



Journal disks:

/dev/sdv1: PARTLABEL="primary" PARTUUID="aece715b-8b88-40e5-a845-254cd6004e58" 
/dev/sdv2: PARTLABEL="extended" PARTUUID="f76caef2-11a9-4a3f-8b0a-515289b0809f" 
/dev/sdv3: PARTLABEL="extended" PARTUUID="a46fdd4c-cd0a-4fd7-ac6e-76a1a0ce2cf0" 
/dev/sdv4: PARTLABEL="extended" PARTUUID="d04b04b9-02d0-4538-9d69-4f2f66a929d6" 
/dev/sdv5: PARTLABEL="extended" PARTUUID="39d3fb93-28f8-4647-8165-b2f260dcb233" 
/dev/sdw1: PARTLABEL="primary" PARTUUID="63662a7d-b95d-4d92-b83a-ba6a03f3fe02" 
/dev/sdw2: PARTLABEL="extended" PARTUUID="e03a6e2c-e626-4e12-bc49-38e381b0e05f" 
/dev/sdw3: PARTLABEL="extended" PARTUUID="5f1c4814-c20e-40da-a8b0-6edf9a436915" 
/dev/sdw4: PARTLABEL="extended" PARTUUID="7f96a220-a43f-481c-ac95-e7f21caf47e0" 
/dev/sdw5: PARTLABEL="extended" PARTUUID="3c6fee3e-0625-45e1-9c9c-55f587b791da" 
/dev/sdx1: PARTLABEL="primary" PARTUUID="ef29c94b-a447-4d46-a1e2-3f44ddfaff8f" 
/dev/sdx2: PARTLABEL="extended" PARTUUID="92022058-ae86-4d5d-b138-889d2b422ee9" 
/dev/sdx3: PARTLABEL="extended" PARTUUID="dc5104d5-48b6-4ce8-b58e-68204e8a05c0" 
/dev/sdx4: PARTLABEL="extended" PARTUUID="1c9690aa-c05e-44b4-b13e-7c505d40689a" 
/dev/sdx5: PARTLABEL="extended" PARTUUID="bc256f85-7532-46c7-942c-78967ffb973a" 
/dev/sdy1: PARTLABEL="primary" PARTUUID="c90815e6-1bc4-43c9-a6cd-c4120fde7662" 
/dev/sdy2: PARTLABEL="extended" PARTUUID="e37d5c62-c33c-4529-b605-9a5801da05cb" 
/dev/sdy3: PARTLABEL="extended" PARTUUID="2da35649-f523-49dc-b1a3-35b0bb6dded8" 
/dev/sdy4: PARTLABEL="extended" PARTUUID="9ce6452b-3ee9-441c-ace1-796dc4bc52a3" 
/dev/sdy5: PARTLABEL="extended" PARTUUID="2df30063-5873-4e40-9f70-98ec57b5e684" 


# cat sos_commands/block/lsblk 
NAME                 MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                    8:0    0 136.7G  0 disk  
|-sda1                 8:1    0   200M  0 part  /boot/efi
|-sda2                 8:2    0   512M  0 part  /boot
|-sda3                 8:3    0    54G  0 part  
| |-vg00alt-auditvol 253:2    0   256M  0 lvm   
| |-vg00alt-homevol  253:4    0     1G  0 lvm   
| |-vg00alt-rootvol  253:7    0    15G  0 lvm   
| |-vg00alt-tmpvol   253:8    0     5G  0 lvm   
| `-vg00alt-varvol   253:9    0    14G  0 lvm   
`-sda4                 8:4    0    82G  0 part  
  |-vg00-rootvol     253:0    0    15G  0 lvm   /
  |-vg00-swapvol     253:1    0     2G  0 lvm   [SWAP]
  |-vg00-homevol     253:3    0     1G  0 lvm   /home
  |-vg00-tmpvol      253:5    0     5G  0 lvm   /tmp
  |-vg00-auditvol    253:6    0   256M  0 lvm   /var/log/audit
  |-vg00-crashvol    253:10   0    26G  0 lvm   /var/crash
  `-vg00-varvol      253:11   0    14G  0 lvm   /var
sdb                    8:16   0   1.7T  0 disk  
`-sdb1                 8:17   0   1.7T  0 part  
  `-osd-93           253:13   0   1.7T  0 crypt /var/lib/ceph/osd/ceph-93
sdc                    8:32   0   1.7T  0 disk  
`-sdc1                 8:33   0   1.7T  0 part  
  `-osd-99           253:17   0   1.7T  0 crypt /var/lib/ceph/osd/ceph-99
sdd                    8:48   0   1.7T  0 disk  
`-sdd1                 8:49   0   1.7T  0 part  
  `-osd-51           253:30   0   1.7T  0 crypt /var/lib/ceph/osd/ceph-51
sde                    8:64   0   1.7T  0 disk  
`-sde1                 8:65   0   1.7T  0 part  
  `-osd-65           253:28   0   1.7T  0 crypt /var/lib/ceph/osd/ceph-65
sdf                    8:80   0   1.7T  0 disk  
`-sdf1                 8:81   0   1.7T  0 part  
  `-osd-77           253:22   0   1.7T  0 crypt /var/lib/ceph/osd/ceph-77
sdg                    8:96   0   1.7T  0 disk  
`-sdg1                 8:97   0   1.7T  0 part  
  `-osd-85           253:15   0   1.7T  0 crypt /var/lib/ceph/osd/ceph-85
sdh                    8:112  0   1.7T  0 disk  
`-sdh1                 8:113  0   1.7T  0 part  
  `-osd-133          253:18   0   1.7T  0 crypt /var/lib/ceph/osd/ceph-133
sdi                    8:128  0   1.7T  0 disk  
`-sdi1                 8:129  0   1.7T  0 part  
  `-osd-141          253:29   0   1.7T  0 crypt /var/lib/ceph/osd/ceph-141
sdj                    8:144  0   1.7T  0 disk  
`-sdj1                 8:145  0   1.7T  0 part  
  `-osd-148          253:14   0   1.7T  0 crypt /var/lib/ceph/osd/ceph-148
sdk                    8:160  0   1.7T  0 disk  
`-sdk1                 8:161  0   1.7T  0 part  
  `-osd-154          253:23   0   1.7T  0 crypt /var/lib/ceph/osd/ceph-154
sdl                    8:176  0   1.7T  0 disk  
`-sdl1                 8:177  0   1.7T  0 part  
  `-osd-105          253:25   0   1.7T  0 crypt /var/lib/ceph/osd/ceph-105
sdm                    8:192  0   1.7T  0 disk  
`-sdm1                 8:193  0   1.7T  0 part  
  `-osd-108          253:16   0   1.7T  0 crypt /var/lib/ceph/osd/ceph-108
sdn                    8:208  0   1.7T  0 disk  
`-sdn1                 8:209  0   1.7T  0 part  
  `-osd-124          253:20   0   1.7T  0 crypt /var/lib/ceph/osd/ceph-124
sdo                    8:224  0   1.7T  0 disk  
`-sdo1                 8:225  0   1.7T  0 part  
  `-osd-128          253:26   0   1.7T  0 crypt /var/lib/ceph/osd/ceph-128
sdp                    8:240  0   1.7T  0 disk  
`-sdp1                 8:241  0   1.7T  0 part  
  `-osd-20           253:19   0   1.7T  0 crypt /var/lib/ceph/osd/ceph-20
sdq                   65:0    0   1.7T  0 disk  
`-sdq1                65:1    0   1.7T  0 part  
  `-osd-29           253:12   0   1.7T  0 crypt /var/lib/ceph/osd/ceph-29
sdr                   65:16   0   1.7T  0 disk  
`-sdr1                65:17   0   1.7T  0 part  
  `-osd-33           253:31   0   1.7T  0 crypt /var/lib/ceph/osd/ceph-33
sds                   65:32   0   1.7T  0 disk  
`-sds1                65:33   0   1.7T  0 part  
  `-osd-38           253:24   0   1.7T  0 crypt /var/lib/ceph/osd/ceph-38
sdt                   65:48   0   1.7T  0 disk  
`-sdt1                65:49   0   1.7T  0 part  
  `-osd-0            253:21   0   1.7T  0 crypt /var/lib/ceph/osd/ceph-0
sdu                   65:64   0   1.7T  0 disk  
`-sdu1                65:65   0   1.7T  0 part  
  `-osd-15           253:27   0   1.7T  0 crypt /var/lib/ceph/osd/ceph-15
sdv                   65:80   0 372.6G  0 disk  
|-sdv1                65:81   0  74.5G  0 part  
|-sdv2                65:82   0  74.5G  0 part  
|-sdv3                65:83   0  74.5G  0 part  
|-sdv4                65:84   0  74.5G  0 part  
`-sdv5                65:85   0  74.6G  0 part  
sdw                   65:96   0 372.6G  0 disk  
|-sdw1                65:97   0  74.5G  0 part  
|-sdw2                65:98   0  74.5G  0 part  
|-sdw3                65:99   0  74.5G  0 part  
|-sdw4                65:100  0  74.5G  0 part  
`-sdw5                65:101  0  74.6G  0 part  
sdx                   65:112  0 372.6G  0 disk  
|-sdx1                65:113  0  74.5G  0 part  
|-sdx2                65:114  0  74.5G  0 part  
|-sdx3                65:115  0  74.5G  0 part  
|-sdx4                65:116  0  74.5G  0 part  
`-sdx5                65:117  0  74.6G  0 part  
sdy                   65:128  0 372.6G  0 disk  
|-sdy1                65:129  0  74.5G  0 part  
|-sdy2                65:130  0  74.5G  0 part  
|-sdy3                65:131  0  74.5G  0 part  
|-sdy4                65:132  0  74.5G  0 part  
`-sdy5                65:133  0  74.6G  0 part  

- The second customer is using a normal installation:

# cat sos_commands/block/lsblk 
NAME          MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda             8:0    0 111.8G  0 disk 
|-sda1          8:1    0   200M  0 part /boot/efi
|-sda2          8:2    0   500M  0 part /boot
`-sda3          8:3    0 111.1G  0 part 
  |-rhel-swap 253:0    0  11.2G  0 lvm  [SWAP]
  |-rhel-root 253:1    0    50G  0 lvm  /
  `-rhel-home 253:2    0  49.9G  0 lvm  /home
sdb             8:16   0 186.3G  0 disk 
|-sdb1          8:17   0   9.3G  0 part 
|-sdb2          8:18   0   9.3G  0 part 
|-sdb3          8:19   0   9.3G  0 part 
|-sdb4          8:20   0   9.3G  0 part 
`-sdb5          8:21   0   9.3G  0 part 
sdc             8:32   0 186.3G  0 disk 
|-sdc1          8:33   0   9.3G  0 part 
|-sdc2          8:34   0   9.3G  0 part 
|-sdc3          8:35   0   9.3G  0 part 
|-sdc4          8:36   0   9.3G  0 part 
`-sdc5          8:37   0   9.3G  0 part 
sdd             8:48   0   1.8T  0 disk 
`-sdd1          8:49   0   1.8T  0 part /var/lib/ceph/osd/ceph-30
sde             8:64   0   1.8T  0 disk 
`-sde1          8:65   0   1.8T  0 part /var/lib/ceph/osd/ceph-31
sdf             8:80   0   1.8T  0 disk 
`-sdf1          8:81   0   1.8T  0 part /var/lib/ceph/osd/ceph-32
sdg             8:96   0   1.8T  0 disk 
`-sdg1          8:97   0   1.8T  0 part /var/lib/ceph/osd/ceph-33
sdh             8:112  0   1.8T  0 disk 
`-sdh1          8:113  0   1.8T  0 part /var/lib/ceph/osd/ceph-34
sdi             8:128  0   1.8T  0 disk 
`-sdi1          8:129  0   1.8T  0 part /var/lib/ceph/osd/ceph-35
sdj             8:144  0   1.8T  0 disk 
`-sdj1          8:145  0   1.8T  0 part /var/lib/ceph/osd/ceph-36
sdk             8:160  0   1.8T  0 disk 
`-sdk1          8:161  0   1.8T  0 part /var/lib/ceph/osd/ceph-37
sdl             8:176  0   1.8T  0 disk 
`-sdl1          8:177  0   1.8T  0 part /var/lib/ceph/osd/ceph-38
sdm             8:192  0   1.8T  0 disk 
`-sdm1          8:193  0   1.8T  0 part /var/lib/ceph/osd/ceph-39
sdn             8:208  0   7.4G  0 disk 
`-sdn1          8:209  0   7.4G  0 part 
sdo             8:224  0     1G  0 disk 

Journal disk 1:

/dev/sdb1: PARTLABEL="primary" PARTUUID="adb985db-2b10-4311-a6be-b694ca8484ee" 
/dev/sdb2: PARTLABEL="primary" PARTUUID="b6a38b51-f3cc-4f6f-b1cf-52e786110a13" 
/dev/sdb3: PARTLABEL="primary" PARTUUID="3de8106a-56bc-4d69-a268-68f7eb180b22" 
/dev/sdb4: PARTLABEL="primary" PARTUUID="3bbfc0c5-0826-4725-86d1-9310bc8bf42d" 
/dev/sdb5: PARTLABEL="primary" PARTUUID="6a46003b-5e6e-43b4-b352-0cc92bbd7131" 

Journal disk 2:
/dev/sdc1: PARTLABEL="primary" PARTUUID="5a71bc80-4b45-45f3-8888-e68f999ecbec" 
/dev/sdc2: PARTLABEL="primary" PARTUUID="6e9b67e0-dec6-44c2-9650-76f886a1d89f" 
/dev/sdc3: PARTLABEL="primary" PARTUUID="ea054dde-d772-459f-9d66-1c5fd8a879d7" 
/dev/sdc4: PARTLABEL="primary" PARTUUID="c0b5960f-4f71-4122-8b63-4c61cad94ff9" 
/dev/sdc5: PARTLABEL="primary" PARTUUID="6378fbd5-5beb-43b4-80bf-99c53969d56e" 

OSD data disks:
/dev/sdd1: UUID="02b1cb56-efb9-44c2-8fec-a5f6bf5c2f26" TYPE="xfs" PARTLABEL="ceph data" PARTUUID="d790041f-d7ea-4c03-a01f-9801e2c23f6d" 
/dev/sde1: UUID="8152faa6-f8b9-4637-b378-a773be5c0ed2" TYPE="xfs" PARTLABEL="ceph data" PARTUUID="6871b2bb-6d05-4e7f-b9e9-5b95d1c44dd1" 
/dev/sdf1: UUID="4b75f68b-d241-47b1-a14b-1ca6a79af778" TYPE="xfs" PARTLABEL="ceph data" PARTUUID="8d9a27a3-eab8-459a-9e5b-bfb68b2ce789" 
/dev/sdg1: UUID="cd4cc6d8-6cbb-4a24-b4da-8d11b3839c6d" TYPE="xfs" PARTLABEL="ceph data" PARTUUID="357c8863-d51f-459a-88be-22afdfb13cde" 
/dev/sdh1: UUID="9f7a3496-b31c-4ec9-be41-271bb27dd2bc" TYPE="xfs" PARTLABEL="ceph data" PARTUUID="19554bdf-055c-4520-9270-c3e845cf153e" 
/dev/sdi1: UUID="44d875bb-3ad8-4728-aa6e-6b8c6d647a42" TYPE="xfs" PARTLABEL="ceph data" PARTUUID="33a00fd9-3ca1-4e2e-836a-8469486e92fc" 
/dev/sdl1: UUID="8b1926a3-8bf7-4eaf-8d2a-d2430e30563b" TYPE="xfs" PARTLABEL="ceph data" PARTUUID="817c14da-77bf-4625-ba6d-00991d48bf32" 
/dev/sdk1: UUID="61b08eb1-f2b6-4eba-a08f-992b0eabee7b" TYPE="xfs" PARTLABEL="ceph data" PARTUUID="7813779a-c44d-4383-a88e-1ab39f017caa" 
/dev/sdj1: UUID="caf8c7a4-de7c-42a7-a032-d194e299e421" TYPE="xfs" PARTLABEL="ceph data" PARTUUID="b028cbfe-1413-4269-a66b-7da0fdc6625c" 
/dev/sdm1: UUID="0ecf229b-4120-4a85-b36b-0d8b11e293ca" TYPE="xfs" PARTLABEL="ceph data" PARTUUID="2fabe133-6c9e-4aae-b8ff-9073a2a8c81f" 

- Our testbed is nearly identical to the second customer's setup, except that one OSD in our testbed is co-located on the journal disk as an SSD-backed OSD.

Comment 42 Kefu Chai 2018-03-15 08:41:55 UTC
[root@magna077 ubuntu]# pgrep osd -a
4755 /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf --cluster ceph -f
5551 /usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid -c /etc/ceph/ceph.conf --cluster ceph -f
6000 /usr/bin/ceph-osd -i 2 --pid-file /var/run/ceph/osd.2.pid -c /etc/ceph/ceph.conf --cluster ceph -f

# ======= stop osd.0
[root@magna077 ubuntu]# service ceph stop osd.0
=== osd.0 ===
Stopping Ceph osd.0 on magna077...kill 4755...kill 4755...done

# ======= and osd.0 is stopped, osd.1 and osd.2 are still up and running
[root@magna077 ubuntu]# service ceph status
=== osd.2 ===
osd.2: running {"version":"0.94.10-2.el7cp"}
=== osd.0 ===
osd.0: not running.
=== osd.1 ===
osd.1: running {"version":"0.94.10-2.el7cp"}
[root@magna077 ubuntu]# pgrep osd -a
5551 /usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid -c /etc/ceph/ceph.conf --cluster ceph -f
6000 /usr/bin/ceph-osd -i 2 --pid-file /var/run/ceph/osd.2.pid -c /etc/ceph/ceph.conf --cluster ceph -f

# ======= and start osd.0
[root@magna077 ubuntu]# service ceph start osd.0
=== osd.0 ===
ERROR:calamari_osd_location:Error 1 running ceph config-key get:'2018-03-15 08:37:02.402916 7f3d0ebb3700 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication
2018-03-15 08:37:02.402919 7f3d0ebb3700  0 librados: client.admin initialization error (2) No such file or directory
Error connecting to cluster: ObjectNotFound'
create-or-move updated item name 'osd.0' weight 0.9 at location {host=magna077} to crush map
Starting Ceph osd.0 on magna077...
Running as unit ceph-osd.0.1521103018.987203963.service.

# ===== all OSDs are running, including osd.1 and osd.2
[root@magna077 ubuntu]# pgrep osd -a
5551 /usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid -c /etc/ceph/ceph.conf --cluster ceph -f
5937 /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf --cluster ceph -f
6000 /usr/bin/ceph-osd -i 2 --pid-file /var/run/ceph/osd.2.pid -c /etc/ceph/ceph.conf --cluster ceph -f

[root@magna077 ubuntu]# service ceph status
=== osd.2 ===
osd.2: running {"version":"0.94.10-2.el7cp"}
=== osd.0 ===
osd.0: running {"version":"0.94.10-2.el7cp"}
=== osd.1 ===
osd.1: running {"version":"0.94.10-2.el7cp"}

@Manohar could you share with me the steps to reproduce? or am i missing something?
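
The before/after comparison in the transcript above can be automated. The following is a minimal sketch, not part of the original report: the helper names `running_osd_ids` and `lost_osds` are hypothetical, and it assumes the `pgrep osd -a` output format shown above, where each ceph-osd command line carries `-i <id>`.

```shell
#!/bin/sh
# Hypothetical helpers: they only parse saved `pgrep osd -a` output
# and do not touch any running daemon.

# Print the OSD ids found in `pgrep osd -a` output read from stdin,
# one per line, in numeric order.
running_osd_ids() {
    sed -n 's/.*ceph-osd -i \([0-9][0-9]*\) .*/\1/p' | sort -n
}

# Print the ids that appear in the "before" capture ($1) but are
# missing from the "after" capture ($2).
lost_osds() {
    before_ids="$(running_osd_ids < "$1")"
    after_ids="$(running_osd_ids < "$2")"
    for id in $before_ids; do
        # grep -x matches the whole line, so id 1 does not match 15.
        printf '%s\n' "$after_ids" | grep -qx "$id" || printf '%s\n' "$id"
    done
}

# Usage on a real node (not executed here):
#   pgrep osd -a > /tmp/before
#   service ceph start osd.0
#   pgrep osd -a > /tmp/after
#   lost_osds /tmp/before /tmp/after   # empty output: no OSD was killed
```

If `lost_osds` prints anything other than the OSD that was deliberately stopped, the start command terminated other OSD processes on the node.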

Comment 45 Kefu Chai 2018-03-15 14:29:52 UTC
after applying the fix of #1275636, the SubState of ceph.service after rebooting is still "exited".

i think it's related to how systemd understands the lifecycle of a SysV/LSB service. for a non-native systemd service, systemctl cannot know whether it is running or not. that's why the SubState of ceph.service is "exited" once the sysv script exits.

and we are using a single /etc/rc.d/init.d/ceph for managing all ceph services, so we cannot use the "pidfile:"[1] tag to help systemd-sysv-generator generate a PIDFile line in /run/systemd/generator.late/ceph.service.

so i'd suggest continuing to use the workaround documented at https://access.redhat.com/solutions/2877891 or just using the ceph-osd@ service directly, for instance "systemctl start ceph-osd@0".

---

[1] https://www.freedesktop.org/wiki/Software/systemd/Incompatibilities/
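
The suggestion to use the ceph-osd@ service directly could be scripted roughly as follows. This is a sketch under stated assumptions: it assumes a `ceph-osd@` unit template is actually installed on the node, as the comment implies, and the helper names `osd_unit` and `start_osds` and the `SYSTEMCTL` override variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; assumes a ceph-osd@ unit template exists on the node.

# Build the systemd instance name for an OSD id, e.g. 0 -> ceph-osd@0.
osd_unit() {
    printf 'ceph-osd@%s\n' "$1"
}

# Start each OSD id given as an argument through its own unit, so the
# generated ceph.service is never involved.
start_osds() {
    for id in "$@"; do
        # SYSTEMCTL can be set to "echo systemctl" for a dry run.
        ${SYSTEMCTL:-systemctl} start "$(osd_unit "$id")"
    done
}

# Usage on a real node (not executed here):
#   start_osds 0            # same as: systemctl start ceph-osd@0
#   SYSTEMCTL='echo systemctl' start_osds 0 1 2   # print the commands only
```

Because each OSD gets its own unit instance, systemd can track every daemon separately instead of through the single generated ceph.service.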

Comment 47 Kefu Chai 2018-03-16 05:45:50 UTC
the reason why we are able to start other osds after they are killed by "service ceph start osd.0" after reboot is that the SubState is set to "dead" by the first "service ceph start osd.0" command. it calls "/bin/systemctl stop ${SERVICE}.service" to reap the "died" ceph SysV daemons. that command changes the state of ceph.service to:

# systemctl show -p ActiveState -p SubState ceph
ActiveState=inactive
SubState=dead
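
Based on the behaviour described in this bug, the SubState can be checked before running "service ceph start osd.N". The sketch below is illustrative only: the helper names `unit_property` and `start_would_stop_others` are hypothetical, and the rule "exited means the other OSDs are at risk" is taken from the analysis in this report, not from any systemd guarantee.

```shell
#!/bin/sh
# Hypothetical helpers that parse `systemctl show` output from stdin.

# Print the value of one property (default: SubState) from lines
# such as "SubState=dead".
unit_property() {
    prop="${1:-SubState}"
    sed -n "s/^${prop}=//p"
}

# Succeed (exit 0) when ceph.service is in the "exited" sub-state, in
# which "service ceph start osd.N" is expected to stop the other OSDs
# first, according to this report. Fail otherwise, for example when
# the sub-state is "dead".
start_would_stop_others() {
    [ "$(unit_property SubState)" = "exited" ]
}

# Usage on a real node (not executed here):
#   if systemctl show -p ActiveState -p SubState ceph | start_would_stop_others; then
#       echo "ceph.service is 'exited': starting one OSD may stop the others"
#   fi
```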