Description of problem
======================

When I instruct cephadm to perform bootstrap and cluster deployment in a single run (using --apply-spec with a YAML file describing the whole cluster) and make a mistake that affects deployment only, the cephadm run completes the bootstrap, fails on deployment, and then finishes with a zero return code as if nothing went wrong.

Version-Release number of selected component
============================================

ceph-5.2-rhel-8-containers-candidate-30183-20220610110810

How reproducible
================

100%

Steps to Reproduce
==================

1. Use cephadm to run bootstrap and cluster deployment in a single step, set up so that the deployment fails (e.g. a wrong value for the --ssh-private-key or --ssh-public-key option). For example:

```
cephadm bootstrap \
  --ssh-private-key /root/.ssh/foo --ssh-public-key /root/.ssh/foo.pub \
  --mon-ip {{ admin_host_ip_addr }} --apply-spec cluster-spec.yaml
```

   Where the foo ssh key is not valid.

2. Observe the return code of the cephadm command.

Actual results
==============

The cephadm command finishes with a zero return code, even though it completes the bootstrap only. The final summary of the operation doesn't highlight that the deployment failed (the problem is clearly mentioned earlier in the output, it's just not raised again in the final summary).

Expected results
================

The cephadm command finishes with a nonzero return code to indicate an error during deployment. Optionally, the final summary mentions that the deployment failed.

Additional info
===============

Example of cephadm output:

```
Unable to parse /root/cluster-spec.yaml succesfully
Non-zero exit code 22 from /usr/bin/podman run --rm --ipc=host --stop-signal=SIGTERM --authfile=/etc/ceph/podman-auth.json --net=host --entrypoint /usr/bin/ceph --init -e CONTAINER_IMAGE=registry-proxy.engineering.redhat.com/rh-osbs/rhceph:ceph-5.2-rhel-8-containers-candidate-30183-20220610110810 -e NODE_NAME=osd-0 -e CEPH_USE_RANDOM_NONCE=1 -v /var/log/ceph/95745a4e-f2f3-11ec-be2a-0050568f082e:/var/log/ceph:z -v /tmp/ceph-tmpok6chxaw:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmp17pfii7z:/etc/ceph/ceph.conf:z -v /root/cluster-spec.yaml:/tmp/spec.yml:ro registry-proxy.engineering.redhat.com/rh-osbs/rhceph:ceph-5.2-rhel-8-containers-candidate-30183-20220610110810 orch apply -i /tmp/spec.yml
/usr/bin/ceph: stderr Error EINVAL: Failed to connect to osd-1 (10.1.160.73).
/usr/bin/ceph: stderr Please make sure that the host is reachable and accepts connections using the cephadm SSH key
/usr/bin/ceph: stderr
/usr/bin/ceph: stderr To add the cephadm SSH key to the host:
/usr/bin/ceph: stderr > ceph cephadm get-pub-key > ~/ceph.pub
/usr/bin/ceph: stderr > ssh-copy-id -f -i ~/ceph.pub root@10.1.160.73
/usr/bin/ceph: stderr
/usr/bin/ceph: stderr To check that the host is reachable open a new shell with the --no-hosts flag:
/usr/bin/ceph: stderr > cephadm shell --no-hosts
/usr/bin/ceph: stderr
/usr/bin/ceph: stderr Then run the following:
/usr/bin/ceph: stderr > ceph cephadm get-ssh-config > ssh_config
/usr/bin/ceph: stderr > ceph config-key get mgr/cephadm/ssh_identity_key > ~/cephadm_private_key
/usr/bin/ceph: stderr > chmod 0600 ~/cephadm_private_key
/usr/bin/ceph: stderr > ssh -F ssh_config -i ~/cephadm_private_key root@10.1.160.73
Applying /root/cluster-spec.yaml to cluster failed!
Enabling autotune for osd_memory_target
You can access the Ceph CLI as following in case of multi-cluster or non-default config:

	sudo /usr/sbin/cephadm shell --fsid 95745a4e-f2f3-11ec-be2a-0050568f082e -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring

Or, if you are only running a single cluster on this host:

	sudo /usr/sbin/cephadm shell

Please consider enabling telemetry to help improve Ceph:

	ceph telemetry on

For more information see:

	https://docs.ceph.com/en/pacific/mgr/telemetry/

Bootstrap complete.
```

The last message is "Bootstrap complete." and the return code is zero. The failure of the cluster deployment is mentioned only before the final summary:

- Error EINVAL: Failed to connect to osd-1 (10.1.160.73).
- Applying /root/cluster-spec.yaml to cluster failed!
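
Possible workaround until cephadm reports this itself: since the return code can't be trusted here, automation calling cephadm could scan the captured output for the failure message instead. The following is only a rough sketch, not something cephadm provides. The function names `spec_applied_ok` and `bootstrap_checked` are made up for illustration, it needs bash (for `PIPESTATUS`), and it assumes the "Applying ... to cluster failed!" line keeps the wording shown in the output above, which may differ in other versions.

```shell
#!/usr/bin/env bash
# Workaround sketch: derive a meaningful exit status from cephadm bootstrap output.

# Failure marker copied from the output in this report.
# Assumption: the wording is stable; adjust it if your cephadm version differs.
SPEC_FAILURE_PATTERN='Applying .* to cluster failed!'

# Hypothetical helper.
# Returns 0 if the log shows no spec failure, 1 if it does, 2 if the log is unreadable.
spec_applied_ok() {
    local log_file="$1"
    [ -r "$log_file" ] || return 2
    if grep -Eq "$SPEC_FAILURE_PATTERN" "$log_file"; then
        return 1
    fi
    return 0
}

# Hypothetical wrapper: runs "cephadm bootstrap" with the given arguments,
# shows the output live while keeping a copy, and returns nonzero if either
# cephadm itself failed or the spec could not be applied.
bootstrap_checked() {
    local log_file status
    log_file="$(mktemp)"
    cephadm bootstrap "$@" 2>&1 | tee "$log_file"
    status="${PIPESTATUS[0]}"   # exit status of cephadm, not of tee
    if [ "$status" -eq 0 ] && ! spec_applied_ok "$log_file"; then
        echo "cephadm bootstrap returned 0, but applying the spec failed" >&2
        status=1
    fi
    rm -f "$log_file"
    return "$status"
}
```

With this, `bootstrap_checked --mon-ip {{ admin_host_ip_addr }} --apply-spec cluster-spec.yaml` would be called in place of `cephadm bootstrap ...`. Matching on log text is fragile, so this is a stopgap only; the actual fix should be a nonzero return code from cephadm.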