Bug 1932495
| Summary: | [cephadm] 5.0 - Ceph orch stop command is taking the invalid entry i.e OSD.4 instead of valid OSD service name | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Preethi <pnataraj> |
| Component: | Cephadm | Assignee: | Juan Miguel Olmo <jolmomar> |
| Status: | CLOSED DUPLICATE | QA Contact: | Vasishta <vashastr> |
| Severity: | high | Docs Contact: | Karen Norteman <knortema> |
| Priority: | high | | |
| Version: | 5.0 | CC: | vereddy |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | 5.0 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-03-08 13:08:11 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
The "ceph orch stop" command is an orchestrator command that acts upon Ceph services, not directly on systemd services. If you take a look at the help:

```
[ceph: root@cephLab2-node-00 /]# ceph orch stop
Invalid command: missing required parameter service_name(<string>)
orch start|stop|restart|redeploy|reconfig <service_name> :  Start, stop, restart, redeploy, or reconfig an entire service (i.e. all daemons)
```

It requires a service name, the kind of service name that you can obtain with:

```
[ceph: root@cephLab2-node-00 /]# ceph orch ls
NAME                       RUNNING  REFRESHED  AGE  PLACEMENT    IMAGE NAME                                                                                          IMAGE ID
alertmanager               1/1      2m ago     20h  count:1      docker.io/prom/alertmanager:v0.20.0                                                                 0881eb8f169f
....
osd.all-available-devices  3/3      2m ago     19h  <unmanaged>  docker.io/ceph/daemon-base@sha256:8a9d01990f2601ea3111d84a18ba84a9043275cde1e48956a74b5a12e5e850b3  5c2a636e05b4
....
```

So you can stop the OSD daemons if you execute:

```
[ceph: root@cephLab2-node-00 /]# ceph orch stop osd.all-available-devices   <----- the name of the service associated with the OSDs
Scheduled to stop osd.3 on host 'cephLab2-node-00'
Scheduled to stop osd.0 on host 'cephLab2-node-00'
Scheduled to stop osd.4 on host 'cephLab2-node-01'
```

or start them again by executing:

```
[ceph: root@cephLab2-node-00 /]# ceph orch start osd.all-available-devices
Scheduled to start osd.3 on host 'cephLab2-node-00'
Scheduled to start osd.0 on host 'cephLab2-node-00'
Scheduled to start osd.4 on host 'cephLab2-node-01'
```

But what is not going to work is to execute:

```
[ceph: root@cephLab2-node-00 /]# ceph orch stop osd.4
Error EINVAL: No daemons exist under service name "osd.4".
View currently running services using "ceph orch ls"   <----- we do not have any service called "osd.4"
```

If you want to stop a specific daemon, first you need to know the name of the daemon. In the case of OSDs it is easy: "osd.<osd id>". But for any kind of daemon you can get this information using:

```
[ceph: root@cephLab2-node-00 /]# ceph orch ps
NAME                            HOST              STATUS         REFRESHED  AGE  VERSION                IMAGE NAME                                                                                          IMAGE ID      CONTAINER ID
alertmanager.cephLab2-node-00   cephLab2-node-00  running (20h)  3m ago     20h  0.20.0                 docker.io/prom/alertmanager:v0.20.0                                                                 0881eb8f169f  1268e1cc5286
crash.cephLab2-node-00          cephLab2-node-00  running (20h)  3m ago     20h  17.0.0-1275-g5e197a21  docker.io/ceph/daemon-base@sha256:8a9d01990f2601ea3111d84a18ba84a9043275cde1e48956a74b5a12e5e850b3  5c2a636e05b4  b12af74b2371
...
mon.cephLab2-node-00            cephLab2-node-00  running (20h)  3m ago     20h  17.0.0-1275-g5e197a21  docker.io/ceph/daemon-base:latest-master-devel                                                      5c2a636e05b4  4f0e24420187
node-exporter.cephLab2-node-00  cephLab2-node-00  running (20h)  3m ago     20h  0.18.1                 docker.io/prom/node-exporter:v0.18.1                                                                e5a616e4b9cf  2f55375a5807
...
osd.0                           cephLab2-node-00  running (19h)  3m ago     19h  17.0.0-1275-g5e197a21  docker.io/ceph/daemon-base@sha256:8a9d01990f2601ea3111d84a18ba84a9043275cde1e48956a74b5a12e5e850b3  5c2a636e05b4  bd2a04c6d1fb
osd.1                           cephLab2-node-00  running (19h)  3m ago     19h  17.0.0-1275-g5e197a21  docker.io/ceph/daemon-base@sha256:8a9d01990f2601ea3111d84a18ba84a9043275cde1e48956a74b5a12e5e850b3  5c2a636e05b4  876950c3935c
osd.3                           cephLab2-node-00  running (19h)  3m ago     19h  17.0.0-1275-g5e197a21  docker.io/ceph/daemon-base@sha256:8a9d01990f2601ea3111d84a18ba84a9043275cde1e48956a74b5a12e5e850b3  5c2a636e05b4  e32f68b58417
osd.4                           cephLab2-node-01  running (19h)  3m ago     19h  17.0.0-1275-g5e197a21  docker.io/ceph/daemon-base@sha256:8a9d01990f2601ea3111d84a18ba84a9043275cde1e48956a74b5a12e5e850b3  5c2a636e05b4  ec1ccaa61c83
```

and you can stop the specific daemon using the orchestrator daemon command:

```
[ceph: root@cephLab2-node-00 /]# ceph orch daemon stop osd.4
Scheduled to stop osd.4 on host 'cephLab2-node-01'
```

and start the daemon again:

```
[ceph: root@cephLab2-node-00 /]# ceph orch daemon start osd.4
Scheduled to start osd.4 on host 'cephLab2-node-01'
```

@Juan, I agree we should use "ceph orch daemon start/stop" for individual daemons and "ceph orch start/stop <service name>" for services. However, when we issue "ceph orch stop osd.4" it should not execute; instead it should throw a message like the one below.

Observed:

```
[ceph: root@ceph-adm7 /]# ceph orch stop osd.4
[ceph: root@ceph-adm7 /]# ceph osd tree
```

Expected:

```
[ceph: root@cephLab2-node-00 /]# ceph orch stop
Invalid command: missing required parameter service_name(<string>)
orch start|stop|restart|redeploy|reconfig <service_name> :  Start, stop, restart, redeploy, or reconfig an entire service (i.e. all daemons)
```

@Preethi, I do not understand what you mean. You get your expected answer if you try to execute "ceph orch stop" without the name of a service:

```
[ceph: root@cephLab2-node-00 /]# ceph orch stop
Invalid command: missing required parameter service_name(<string>)
orch start|stop|restart|redeploy|reconfig <service_name> :  Start, stop, restart, redeploy, or reconfig an entire service (i.e. all daemons)
Error EINVAL: invalid command

[ceph: root@cephLab2-node-00 /]# ceph orch stop pepe
Error EINVAL: No daemons exist under service name "pepe".
View currently running services using "ceph orch ls"

[ceph: root@cephLab2-node-00 /]# ceph orch stop osd.40
Error EINVAL: No daemons exist under service name "osd.40".
View currently running services using "ceph orch ls"
```

The same if I try to execute with an existing OSD:

```
[ceph: root@cephLab2-node-00 /]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME                  STATUS  REWEIGHT  PRI-AFF
-1         0.29279  root default
-3         0.14639      host cephLab2-node-00
 0    hdd  0.04880          osd.0                  up  1.00000   1.00000
 1    hdd  0.04880          osd.1                  up  1.00000   1.00000
 2    hdd  0.04880          osd.2                  up  1.00000   1.00000
-5         0.14639      host cephLab2-node-01
 3    hdd  0.04880          osd.3                  up  1.00000   1.00000
 4    hdd  0.04880          osd.4                  up  1.00000   1.00000
 5    hdd  0.04880          osd.5                  up  1.00000   1.00000

[ceph: root@cephLab2-node-00 /]# ceph orch stop osd.4
Error EINVAL: No daemons exist under service name "osd.4".
View currently running services using "ceph orch ls"
```

@Juan, I see the issue with the latest compose, i.e. ceph-5.0-rhel-8-containers-candidate-21981-20210302003306. My concern is that the command executes instead of throwing an "invalid service name" error:

```
[ceph: root@ceph-sangadi-5-0-1614674044401-node1-mon-mgr-installer-node-exp /]# ceph orch stop peeppp
[ceph: root@ceph-sangadi-5-0-1614674044401-node1-mon-mgr-installer-node-exp /]#
[ceph: root@magna021 /]# ceph orch stop osd.4
[ceph: root@magna021 /]#
[ceph: root@ceph-sunil1adm-1614692246522-node1-mon-mgr-installer-node-expor /]# ceph orch stop osd.4
[ceph: root@ceph-sunil1adm-1614692246522-node1-mon-mgr-installer-node-expor /]#
```

Cluster details below:

10.0.211.211 - cephuser/cephuser
magna021 - root/q
10.0.210.149 - cephuser/cephuser

*** This bug has been marked as a duplicate of bug 1909628 ***
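The behaviour requested in this thread is that the argument to "ceph orch stop" be checked against the known service names, so a daemon name like "osd.4" or a typo like "peeppp" is rejected instead of silently accepted. A minimal sketch of that validation, assuming the service and daemon names shown in the transcripts above; the function name and hard-coded name sets are hypothetical, not cephadm source:

```python
# Illustrative sketch only -- not cephadm code. Mimics the EINVAL-style
# error shown earlier in this report for unknown service names.

# Hypothetical inventories, as would come from "ceph orch ls" / "ceph orch ps".
KNOWN_SERVICES = {"alertmanager", "osd.all-available-devices"}
KNOWN_DAEMONS = {"osd.0", "osd.1", "osd.3", "osd.4"}


def validate_service_action(service_name: str) -> str:
    """Return service_name if it names a real service, else raise ValueError."""
    if service_name in KNOWN_SERVICES:
        return service_name
    hint = ""
    if service_name in KNOWN_DAEMONS:
        # The argument is a daemon, not a service: point at the daemon command.
        hint = f' ("{service_name}" is a daemon; use "ceph orch daemon stop {service_name}")'
    raise ValueError(
        f'No daemons exist under service name "{service_name}". '
        f'View currently running services using "ceph orch ls"{hint}'
    )
```

With this check, "osd.all-available-devices" passes through while "osd.4" and "peeppp" raise, matching the expected results described above.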
Description of problem:
5.0 - Ceph orch stop command is taking the invalid entry i.e OSD.4 instead of a valid service name

Version-Release number of selected component (if applicable):

```
[root@ceph-adm7 ~]# sudo cephadm version
Using recent ceph image registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest
ceph version 16.0.0-7953.el8cp (aac7c5c7d5f82d2973c366730f65255afd66e515) pacific (dev)
```

How reproducible:

Steps to Reproduce:
1. Install a 5.0 cluster with the dashboard enabled
2. Enter the cephadm shell
3. Check ceph status
4. Run "ceph orch stop" with an OSD ID instead of a valid OSD service name
5. Observe the behaviour

Actual results:
The command executes, but the service is not stopped.

Expected results:
The command should accept only a valid service name and reject incorrect service names.

Additional info:
10.74.253.36 root/redhat

Output:

```
[ceph: root@ceph-adm7 /]# ceph orch stop osd.4
[ceph: root@ceph-adm7 /]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME           STATUS     REWEIGHT  PRI-AFF
-1         0.37105  root default
-7         0.10738      host ceph-adm7
 4    hdd  0.02930          osd.4               up  1.00000   1.00000
11    hdd  0.02930          osd.11              up  1.00000   1.00000
12    hdd  0.02930          osd.12              up  1.00000   1.00000
13    hdd  0.01949          osd.13              up  1.00000   1.00000
-3         0.14648      host ceph-adm8
 1    hdd  0.05859          osd.1               up  1.00000   1.00000
 2    hdd  0.02930          osd.2               up  1.00000   1.00000
 6    hdd  0.02930          osd.6               up  1.00000   1.00000
 8    hdd  0.02930          osd.8               up  1.00000   1.00000
-5         0.11719      host ceph-adm9
 3    hdd  0.02930          osd.3               up  1.00000   1.00000
 5    hdd  0.02930          osd.5               up  1.00000   1.00000
 7    hdd  0.02930          osd.7               up  1.00000   1.00000
 9    hdd  0.02930          osd.9        destroyed        0   1.00000
 0               0          osd.0             down        0   1.00000

[ceph: root@ceph-adm7 /]# ceph orch stop ^Cd.4
[ceph: root@ceph-adm7 /]# exit
exit
[root@ceph-adm7 ~]# systemctl status ceph-58149bf2-66ac-11eb-84bf-001a4a000262.service
● ceph-58149bf2-66ac-11eb-84bf-001a4a000262.service - Ceph osd.4 for 58149bf2-66ac-11eb-84bf-001a4a000262
   Loaded: loaded (/etc/systemd/system/ceph-58149bf2-66ac-11eb-84bf-001a4a000262@.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2021-02-24 15:10:25 IST; 23min ago
  Process: 2410351 ExecStart=/bin/bash /var/lib/ceph/58149bf2-66ac-11eb-84bf-001a4a000262/osd.4/unit.run (code=exited, status=0/SUCCESS)
  Process: 2410349 ExecStartPre=/bin/rm -f //run/ceph-58149bf2-66ac-11eb-84bf-001a4a000262.service-pid //run/ceph-58149bf2-66ac-11eb-84bf-001a4a000262.service-cid (code=exited, status=0/SUCCESS)
  Process: 2410315 ExecStartPre=/bin/podman rm ceph-58149bf2-66ac-11eb-84bf-001a4a000262-osd.4 (code=exited, status=1/FAILURE)
 Main PID: 2410645 (conmon)
    Tasks: 2 (limit: 49465)
   Memory: 5.3M
   CGroup: /system.slice/system-ceph\x2d58149bf2\x2d66ac\x2d11eb\x2d84bf\x2d001a4a000262.slice/ceph-58149bf2-66ac-11eb-84bf-001a4a000262.service
           └─2410645 /usr/bin/conmon --api-version 1 -c 63599669225c8d4c2e98cccd3d2a9fba09b10bf5dcb66f92778eda4a9b3e8027 -u 63599669225c8d4c2e98cccd3d2a9fba09b10bf5dcb66f92778eda4a9b3e8027 -r /usr/bin/runc -b />

Feb 24 15:10:24 ceph-adm7 bash[2410351]: Running command: /usr/bin/ln -snf /dev/ceph-7b04f4e5-ea07-442b-93f2-6f6d96276108/osd-block-b11b448e-60cd-4bde-b6ea-ac76a5284b9b /var/lib/ceph/osd/ceph-4/block
Feb 24 15:10:24 ceph-adm7 bash[2410351]: Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-4/block
Feb 24 15:10:24 ceph-adm7 bash[2410351]: Running command: /usr/bin/chown -R ceph:ceph /dev/mapper/ceph--7b04f4e5--ea07--442b--93f2--6f6d96276108-osd--block--b11b448e--60cd--4bde--b6ea--ac76a5284b9b
Feb 24 15:10:24 ceph-adm7 bash[2410351]: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-4
Feb 24 15:10:24 ceph-adm7 bash[2410351]: --> ceph-volume lvm activate successful for osd ID: 4
Feb 24 15:10:24 ceph-adm7 bash[2410351]: Error: Failed to evict container: "": Failed to find container "ceph-58149bf2-66ac-11eb-84bf-001a4a000262-osd.4" in state: no container with name or ID ceph-58149bf2-66a>
Feb 24 15:10:24 ceph-adm7 bash[2410351]: Error: no container with ID or name "ceph-58149bf2-66ac-11eb-84bf-001a4a000262-osd.4" found: no such container
Feb 24 15:10:25 ceph-adm7 bash[2410351]: WARNING: The same type, major and minor should not be used for multiple devices.
Feb 24 15:10:25 ceph-adm7 bash[2410351]: 63599669225c8d4c2e98cccd3d2a9fba09b10bf5dcb66f92778eda4a9b3e8027
Feb 24 15:10:25 ceph-adm7 systemd[1]: Started Ceph osd.4 for 58149bf2-66ac-11eb-84bf-001a4a000262.
```

"ceph orch stop ceph-58149bf2-66ac-11eb-84bf-001a4a000262.service" --> also executes
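The log above mixes three different identifiers for the same OSD: the daemon name ("osd.4"), the container name ("ceph-<fsid>-osd.4"), and the systemd unit name. A small sketch of splitting the container names seen in this report back into their parts; the regex and helper name are assumptions inferred from the names in this log, not a cephadm API:

```python
import re

# Illustrative only: container names in the log follow the pattern
# "ceph-<fsid>-<daemon_type>.<daemon_id>", where <fsid> is a 36-character
# UUID. This helper and its regex are hypothetical, based on this report.
CONTAINER_RE = re.compile(
    r"^ceph-(?P<fsid>[0-9a-f-]{36})-(?P<daemon_type>[a-z-]+)\.(?P<daemon_id>.+)$"
)


def parse_container_name(name: str) -> dict:
    """Split a cephadm-style container name into fsid, daemon type, and id."""
    m = CONTAINER_RE.match(name)
    if not m:
        raise ValueError(f"not a cephadm container name: {name}")
    return m.groupdict()
```

Applied to the container name from the log, "ceph-58149bf2-66ac-11eb-84bf-001a4a000262-osd.4" splits into the fsid and the daemon name "osd.4", which is the form "ceph orch daemon stop" expects.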