Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1932495

Summary: [cephadm] 5.0 - Ceph orch stop command is taking the invalid entry i.e OSD.4 instead of valid OSD service name
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Preethi <pnataraj>
Component: Cephadm
Assignee: Juan Miguel Olmo <jolmomar>
Status: CLOSED DUPLICATE
QA Contact: Vasishta <vashastr>
Severity: high
Docs Contact: Karen Norteman <knortema>
Priority: high
Version: 5.0
CC: vereddy
Target Milestone: ---
Keywords: Reopened
Target Release: 5.0
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-03-08 13:08:11 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Preethi 2021-02-24 18:04:03 UTC
Description of problem: 5.0 - The ceph orch stop command accepts an invalid entry (the daemon ID osd.4) instead of requiring a valid OSD service name


Version-Release number of selected component (if applicable):
[root@ceph-adm7 ~]# sudo cephadm version
Using recent ceph image registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest
ceph version 16.0.0-7953.el8cp (aac7c5c7d5f82d2973c366730f65255afd66e515) pacific (dev)


How reproducible:


Steps to Reproduce:
1. Install 5.0 cluster with dashboard enabled
2. Enter to cephadm shell
3. Check ceph status
4. Perform ceph orch stop with an OSD daemon ID (e.g. osd.4) instead of a valid OSD service name
5. Observe the behaviour


Actual results: The command executes successfully but the service is not stopped


Expected results: The command should reject an invalid service name with an error instead of executing silently


Additional info:
10.74.253.36 root/redhat

output:
[ceph: root@ceph-adm7 /]# ceph orch stop osd.4
[ceph: root@ceph-adm7 /]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME           STATUS     REWEIGHT  PRI-AFF
-1         0.37105  root default                                    
-7         0.10738      host ceph-adm7                              
 4    hdd  0.02930          osd.4              up   1.00000  1.00000
11    hdd  0.02930          osd.11             up   1.00000  1.00000
12    hdd  0.02930          osd.12             up   1.00000  1.00000
13    hdd  0.01949          osd.13             up   1.00000  1.00000
-3         0.14648      host ceph-adm8                              
 1    hdd  0.05859          osd.1              up   1.00000  1.00000
 2    hdd  0.02930          osd.2              up   1.00000  1.00000
 6    hdd  0.02930          osd.6              up   1.00000  1.00000
 8    hdd  0.02930          osd.8              up   1.00000  1.00000
-5         0.11719      host ceph-adm9                              
 3    hdd  0.02930          osd.3              up   1.00000  1.00000
 5    hdd  0.02930          osd.5              up   1.00000  1.00000
 7    hdd  0.02930          osd.7              up   1.00000  1.00000
 9    hdd  0.02930          osd.9       destroyed         0  1.00000
 0               0  osd.0                    down         0  1.00000
[ceph: root@ceph-adm7 /]# ceph orch stop ^Cd.4
[ceph: root@ceph-adm7 /]# exit
exit
[root@ceph-adm7 ~]# systemctl status ceph-58149bf2-66ac-11eb-84bf-001a4a000262.service
● ceph-58149bf2-66ac-11eb-84bf-001a4a000262.service - Ceph osd.4 for 58149bf2-66ac-11eb-84bf-001a4a000262
   Loaded: loaded (/etc/systemd/system/ceph-58149bf2-66ac-11eb-84bf-001a4a000262@.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2021-02-24 15:10:25 IST; 23min ago
  Process: 2410351 ExecStart=/bin/bash /var/lib/ceph/58149bf2-66ac-11eb-84bf-001a4a000262/osd.4/unit.run (code=exited, status=0/SUCCESS)
  Process: 2410349 ExecStartPre=/bin/rm -f //run/ceph-58149bf2-66ac-11eb-84bf-001a4a000262.service-pid //run/ceph-58149bf2-66ac-11eb-84bf-001a4a000262.service-cid (code=exited, status=0/SUCCESS)
  Process: 2410315 ExecStartPre=/bin/podman rm ceph-58149bf2-66ac-11eb-84bf-001a4a000262-osd.4 (code=exited, status=1/FAILURE)
 Main PID: 2410645 (conmon)
    Tasks: 2 (limit: 49465)
   Memory: 5.3M
   CGroup: /system.slice/system-ceph\x2d58149bf2\x2d66ac\x2d11eb\x2d84bf\x2d001a4a000262.slice/ceph-58149bf2-66ac-11eb-84bf-001a4a000262.service
           └─2410645 /usr/bin/conmon --api-version 1 -c 63599669225c8d4c2e98cccd3d2a9fba09b10bf5dcb66f92778eda4a9b3e8027 -u 63599669225c8d4c2e98cccd3d2a9fba09b10bf5dcb66f92778eda4a9b3e8027 -r /usr/bin/runc -b />
 
Feb 24 15:10:24 ceph-adm7 bash[2410351]: Running command: /usr/bin/ln -snf /dev/ceph-7b04f4e5-ea07-442b-93f2-6f6d96276108/osd-block-b11b448e-60cd-4bde-b6ea-ac76a5284b9b /var/lib/ceph/osd/ceph-4/block
Feb 24 15:10:24 ceph-adm7 bash[2410351]: Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-4/block
Feb 24 15:10:24 ceph-adm7 bash[2410351]: Running command: /usr/bin/chown -R ceph:ceph /dev/mapper/ceph--7b04f4e5--ea07--442b--93f2--6f6d96276108-osd--block--b11b448e--60cd--4bde--b6ea--ac76a5284b9b
Feb 24 15:10:24 ceph-adm7 bash[2410351]: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-4
Feb 24 15:10:24 ceph-adm7 bash[2410351]: --> ceph-volume lvm activate successful for osd ID: 4
Feb 24 15:10:24 ceph-adm7 bash[2410351]: Error: Failed to evict container: "": Failed to find container "ceph-58149bf2-66ac-11eb-84bf-001a4a000262-osd.4" in state: no container with name or ID ceph-58149bf2-66a>
Feb 24 15:10:24 ceph-adm7 bash[2410351]: Error: no container with ID or name "ceph-58149bf2-66ac-11eb-84bf-001a4a000262-osd.4" found: no such container
Feb 24 15:10:25 ceph-adm7 bash[2410351]: WARNING: The same type, major and minor should not be used for multiple devices.
Feb 24 15:10:25 ceph-adm7 bash[2410351]: 63599669225c8d4c2e98cccd3d2a9fba09b10bf5dcb66f92778eda4a9b3e8027
Feb 24 15:10:25 ceph-adm7 systemd[1]: Started Ceph osd.4 for 58149bf2-66ac-11eb-84bf-001a4a000262.


#ceph orch stop ceph-58149bf2-66ac-11eb-84bf-001a4a000262.service --> executes

Comment 1 Juan Miguel Olmo 2021-03-03 12:17:24 UTC
The "ceph orch stop" command is an orchestrator command that acts upon Ceph services, not directly on systemd services.

If you take a look at the help:
[ceph: root@cephLab2-node-00 /]# ceph orch stop
Invalid command: missing required parameter service_name(<string>)
orch start|stop|restart|redeploy|reconfig <service_name> :  Start, stop, restart, redeploy, or reconfig an entire service (i.e. all daemons)

It requires a service name, the kind of service name that you can obtain with:

=============================================
[ceph: root@cephLab2-node-00 /]# ceph orch ls
NAME                       RUNNING  REFRESHED  AGE   PLACEMENT                                                   IMAGE NAME                                                                                          IMAGE ID      
alertmanager                   1/1  2m ago     20h   count:1                                                     docker.io/prom/alertmanager:v0.20.0                                                                 0881eb8f169f  
....
osd.all-available-devices      3/3  2m ago     19h   <unmanaged>                                                 docker.io/ceph/daemon-base@sha256:8a9d01990f2601ea3111d84a18ba84a9043275cde1e48956a74b5a12e5e850b3  5c2a636e05b4  
....
=============================================

So you can stop the OSD daemons if you execute:
===================================================
[ceph: root@cephLab2-node-00 /]# ceph orch stop osd.all-available-devices <----------------- Because this is the name of the service associated with the osds
Scheduled to stop osd.3 on host 'cephLab2-node-00'
Scheduled to stop osd.0 on host 'cephLab2-node-00'
Scheduled to stop osd.4 on host 'cephLab2-node-01'
===================================================

or start them again by executing:
===================================================
[ceph: root@cephLab2-node-00 /]# ceph orch start osd.all-available-devices
Scheduled to start osd.3 on host 'cephLab2-node-00'
Scheduled to start osd.0 on host 'cephLab2-node-00'
Scheduled to start osd.4 on host 'cephLab2-node-01'
===================================================


What is not going to work is executing:
===================================================
[ceph: root@cephLab2-node-00 /]# ceph orch stop osd.4
Error EINVAL: No daemons exist under service name "osd.4". View currently running services using "ceph orch ls"  <------   We do not have any service called "osd.4" 
===================================================


If you want to stop a specific daemon, you first need to know the daemon's name. For OSDs this is easy: "osd.<osd id>". For any kind of daemon you can get this information using:
===================================================
[ceph: root@cephLab2-node-00 /]# ceph orch ps
NAME                                     HOST              STATUS          REFRESHED  AGE   VERSION                IMAGE NAME                                                                                          IMAGE ID      CONTAINER ID  
alertmanager.cephLab2-node-00            cephLab2-node-00  running (20h)   3m ago     20h   0.20.0                 docker.io/prom/alertmanager:v0.20.0                                                                 0881eb8f169f  1268e1cc5286  
crash.cephLab2-node-00                   cephLab2-node-00  running (20h)   3m ago     20h   17.0.0-1275-g5e197a21  docker.io/ceph/daemon-base@sha256:8a9d01990f2601ea3111d84a18ba84a9043275cde1e48956a74b5a12e5e850b3  5c2a636e05b4  b12af74b2371  
...
mon.cephLab2-node-00                     cephLab2-node-00  running (20h)   3m ago     20h   17.0.0-1275-g5e197a21  docker.io/ceph/daemon-base:latest-master-devel                                                      5c2a636e05b4  4f0e24420187  
node-exporter.cephLab2-node-00           cephLab2-node-00  running (20h)   3m ago     20h   0.18.1                 docker.io/prom/node-exporter:v0.18.1                                                                e5a616e4b9cf  2f55375a5807  
...
osd.0                                    cephLab2-node-00  running (19h)   3m ago     19h   17.0.0-1275-g5e197a21  docker.io/ceph/daemon-base@sha256:8a9d01990f2601ea3111d84a18ba84a9043275cde1e48956a74b5a12e5e850b3  5c2a636e05b4  bd2a04c6d1fb  
osd.1                                    cephLab2-node-00  running (19h)   3m ago     19h   17.0.0-1275-g5e197a21  docker.io/ceph/daemon-base@sha256:8a9d01990f2601ea3111d84a18ba84a9043275cde1e48956a74b5a12e5e850b3  5c2a636e05b4  876950c3935c  
osd.3                                    cephLab2-node-00  running (19h)   3m ago     19h   17.0.0-1275-g5e197a21  docker.io/ceph/daemon-base@sha256:8a9d01990f2601ea3111d84a18ba84a9043275cde1e48956a74b5a12e5e850b3  5c2a636e05b4  e32f68b58417  
osd.4                                    cephLab2-node-01  running (19h)   3m ago     19h   17.0.0-1275-g5e197a21  docker.io/ceph/daemon-base@sha256:8a9d01990f2601ea3111d84a18ba84a9043275cde1e48956a74b5a12e5e850b3  5c2a636e05b4  ec1ccaa61c83  
===================================================


Then you can stop the specific daemon using the orchestrator daemon command:
===================================================
[ceph: root@cephLab2-node-00 /]# ceph orch daemon stop osd.4
Scheduled to stop osd.4 on host 'cephLab2-node-01'
===================================================

and start the daemon again:
===================================================
[ceph: root@cephLab2-node-00 /]# ceph orch daemon start osd.4
Scheduled to start osd.4 on host 'cephLab2-node-01'
===================================================

Comment 2 Preethi 2021-03-05 17:13:30 UTC
@Juan, I agree we should use "ceph orch daemon start/stop" for individual daemons and "ceph orch start/stop <service name>" for services. However, when we issue "ceph orch stop osd.4" it should not execute silently as it does now; it should instead throw a message like the expected output below:


[ceph: root@ceph-adm7 /]# ceph orch stop osd.4
[ceph: root@ceph-adm7 /]# ceph osd tree

expected :
[ceph: root@cephLab2-node-00 /]# ceph orch stop
Invalid command: missing required parameter service_name(<string>)
orch start|stop|restart|redeploy|reconfig <service_name> :  Start, stop, restart, redeploy, or reconfig an entire service (i.e. all daemons)
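
The validation being requested here can be sketched as a small client-side guard that checks the name against the known service list before dispatching. This is purely a hypothetical illustration: orch_stop_guarded is not part of cephadm, and the service list is stubbed here (on a real cluster it would come from "ceph orch ls"):

```shell
# Hypothetical guard illustrating the expected behaviour: refuse to
# dispatch "ceph orch stop <name>" unless <name> is a known service.
# Service list is stubbed; a real cluster would query "ceph orch ls".
known_services="alertmanager osd.all-available-devices"

orch_stop_guarded() {
    name="$1"
    for svc in $known_services; do
        if [ "$svc" = "$name" ]; then
            # Here a real wrapper would run: ceph orch stop "$name"
            echo "Scheduled to stop $name"
            return 0
        fi
    done
    echo "Error EINVAL: No daemons exist under service name \"$name\"." >&2
    return 22   # EINVAL
}
```

With this guard, "orch_stop_guarded osd.4" fails with EINVAL because osd.4 is a daemon name rather than a service name, while "orch_stop_guarded osd.all-available-devices" succeeds.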

Comment 3 Juan Miguel Olmo 2021-03-08 11:21:47 UTC
@Preethi, I do not understand what you mean. You get your expected answer if you try to execute "ceph orch stop" without the name of a service:

[ceph: root@cephLab2-node-00 /]# ceph orch stop
Invalid command: missing required parameter service_name(<string>)
orch start|stop|restart|redeploy|reconfig <service_name> :  Start, stop, restart, redeploy, or reconfig an entire service (i.e. all daemons)
Error EINVAL: invalid command

[ceph: root@cephLab2-node-00 /]# ceph orch stop pepe
Error EINVAL: No daemons exist under service name "pepe". View currently running services using "ceph orch ls"

[ceph: root@cephLab2-node-00 /]# ceph orch stop osd.40
Error EINVAL: No daemons exist under service name "osd.40". View currently running services using "ceph orch ls"

The same happens if I try to execute it with an existing OSD:

[ceph: root@cephLab2-node-00 /]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME                  STATUS  REWEIGHT  PRI-AFF
-1         0.29279  root default                                        
-3         0.14639      host cephLab2-node-00                           
 0    hdd  0.04880          osd.0                  up   1.00000  1.00000
 1    hdd  0.04880          osd.1                  up   1.00000  1.00000
 2    hdd  0.04880          osd.2                  up   1.00000  1.00000
-5         0.14639      host cephLab2-node-01                           
 3    hdd  0.04880          osd.3                  up   1.00000  1.00000
 4    hdd  0.04880          osd.4                  up   1.00000  1.00000
 5    hdd  0.04880          osd.5                  up   1.00000  1.00000

[ceph: root@cephLab2-node-00 /]# ceph orch stop osd.4
Error EINVAL: No daemons exist under service name "osd.4". View currently running services using "ceph orch ls"

Comment 4 Preethi 2021-03-08 12:44:01 UTC
@Juan, I see the issue with the latest compose, i.e. ceph-5.0-rhel-8-containers-candidate-21981-20210302003306. My concern is that the command executes instead of throwing an "invalid service name" error:

[ceph: root@ceph-sangadi-5-0-1614674044401-node1-mon-mgr-installer-node-exp /]# ceph orch stop peeppp
[ceph: root@ceph-sangadi-5-0-1614674044401-node1-mon-mgr-installer-node-exp /]#

[ceph: root@magna021 /]# ceph orch stop osd.4
[ceph: root@magna021 /]# 
[ceph: root@magna021 /]# 

[ceph: root@ceph-sunil1adm-1614692246522-node1-mon-mgr-installer-node-expor /]# ceph orch stop osd.4
[ceph: root@ceph-sunil1adm-1614692246522-node1-mon-mgr-installer-node-expor /]#


Below cluster details:
10.0.211.211 -cephuser/cephuser
magna021 - root/q
10.0.210.149 -cephuser/cephuser

Comment 5 Juan Miguel Olmo 2021-03-08 13:08:11 UTC

*** This bug has been marked as a duplicate of bug 1909628 ***