
Bug 2114882

Summary: cephadm exits with zero return code when deployment fails during combined bootstrap/deployment execution
Product: [Red Hat Storage] Red Hat Ceph Storage
Component: Cephadm
Version: 5.2
Severity: low
Priority: unspecified
Status: CLOSED DUPLICATE
Target Release: 6.1
Hardware: Unspecified
OS: Unspecified
Reporter: Martin Bukatovic <mbukatov>
Assignee: Adam King <adking>
QA Contact: Manasa <mgowri>
Docs Contact: Anjana Suparna Sriram <asriram>
CC: cephqe-warriors
Type: Bug
Last Closed: 2022-10-04 02:20:58 UTC

Description Martin Bukatovic 2022-08-03 12:45:09 UTC
Description of problem
======================

When I instruct cephadm to perform bootstrap and cluster deployment in a single
run (using --apply-spec with a YAML file describing the whole cluster), and make
a mistake that affects only the deployment, the cephadm run completes the
bootstrap, fails on the deployment, and then finishes with a zero return code as
if nothing went wrong.

Version-Release number of selected component
============================================

ceph-5.2-rhel-8-containers-candidate-30183-20220610110810

How reproducible
================

100%

Steps to Reproduce
==================

1. Use cephadm to run bootstrap and cluster deployment in a single step, in such
a way that the deployment fails (e.g. a wrong value for the --ssh-private-key or
--ssh-public-key option). For example:

```
cephadm bootstrap \
  --ssh-private-key /root/.ssh/foo --ssh-public-key /root/.ssh/foo.pub \
  --mon-ip {{ admin_host_ip_addr }} --apply-spec cluster-spec.yaml
```

Where the foo SSH key is not valid.

2. Observe the return code of the cephadm command (see the sketch below).
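
For reference, a minimal way to capture the return code right after the bootstrap
call (same placeholder key paths and the `{{ admin_host_ip_addr }}` variable as
in step 1):

```
cephadm bootstrap \
  --ssh-private-key /root/.ssh/foo --ssh-public-key /root/.ssh/foo.pub \
  --mon-ip {{ admin_host_ip_addr }} --apply-spec cluster-spec.yaml
echo "cephadm exit status: $?"  # currently prints 0 even though --apply-spec failed
```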

Actual results
==============

The cephadm command finishes with a zero return code, even though it completed
the bootstrap only. The summary of the operation doesn't highlight that the
deployment failed (that said, the problem is clearly mentioned earlier in the
output; it's just not raised again in the final summary).

Expected results
================

The cephadm command finishes with a nonzero return code to indicate an error
during the deployment.

Optionally, mention that the deployment failed in the final summary.
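
For illustration, this is the kind of wrapper script the current behaviour
silently breaks; the cephadm invocation is the one from the reproducer, the rest
is a hypothetical sketch. With a nonzero return code the error branch would catch
the failed deployment; today it is never taken:

```
# hypothetical automation wrapper around the combined bootstrap/deployment run
if ! cephadm bootstrap \
       --ssh-private-key /root/.ssh/foo --ssh-public-key /root/.ssh/foo.pub \
       --mon-ip {{ admin_host_ip_addr }} --apply-spec cluster-spec.yaml; then
    echo "cephadm bootstrap/deployment failed" >&2
    exit 1
fi
echo "cluster deployed"  # reached even when "Applying ... to cluster failed!" was printed
```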

Additional info
===============

Example of cephadm output:

```
Unable to parse /root/cluster-spec.yaml succesfully
Non-zero exit code 22 from /usr/bin/podman run --rm --ipc=host --stop-signal=SIGTERM --authfile=/etc/ceph/podman-auth.json --net=host --entrypoint /usr/bin/ceph --init -e CONTAINER_IMAGE=registry-proxy.engineering.redhat.com/rh-osbs/rhceph:ceph-5.2-rhel-8-containers-candidate-30183-20220610110810 -e NODE_NAME=osd-0 -e CEPH_USE_RANDOM_NONCE=1 -v /var/log/ceph/95745a4e-f2f3-11ec-be2a-0050568f082e:/var/log/ceph:z -v /tmp/ceph-tmpok6chxaw:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmp17pfii7z:/etc/ceph/ceph.conf:z -v /root/cluster-spec.yaml:/tmp/spec.yml:ro registry-proxy.engineering.redhat.com/rh-osbs/rhceph:ceph-5.2-rhel-8-containers-candidate-30183-20220610110810 orch apply -i /tmp/spec.yml
/usr/bin/ceph: stderr Error EINVAL: Failed to connect to osd-1 (10.1.160.73).
/usr/bin/ceph: stderr Please make sure that the host is reachable and accepts connections using the cephadm SSH key
/usr/bin/ceph: stderr
/usr/bin/ceph: stderr To add the cephadm SSH key to the host:
/usr/bin/ceph: stderr > ceph cephadm get-pub-key > ~/ceph.pub
/usr/bin/ceph: stderr > ssh-copy-id -f -i ~/ceph.pub root@10.1.160.73
/usr/bin/ceph: stderr
/usr/bin/ceph: stderr To check that the host is reachable open a new shell with the --no-hosts flag:
/usr/bin/ceph: stderr > cephadm shell --no-hosts
/usr/bin/ceph: stderr
/usr/bin/ceph: stderr Then run the following:
/usr/bin/ceph: stderr > ceph cephadm get-ssh-config > ssh_config
/usr/bin/ceph: stderr > ceph config-key get mgr/cephadm/ssh_identity_key > ~/cephadm_private_key
/usr/bin/ceph: stderr > chmod 0600 ~/cephadm_private_key
/usr/bin/ceph: stderr > ssh -F ssh_config -i ~/cephadm_private_key root@10.1.160.73
 
Applying /root/cluster-spec.yaml to cluster failed!
 
Enabling autotune for osd_memory_target
You can access the Ceph CLI as following in case of multi-cluster or non-default config:
 
        sudo /usr/sbin/cephadm shell --fsid 95745a4e-f2f3-11ec-be2a-0050568f082e -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring
 
Or, if you are only running a single cluster on this host:
 
        sudo /usr/sbin/cephadm shell
 
Please consider enabling telemetry to help improve Ceph:
 
        ceph telemetry on
 
For more information see:
 
        https://docs.ceph.com/en/pacific/mgr/telemetry/
 
Bootstrap complete.
```

The last message is "Bootstrap complete" and the return code is zero. The failure
of the cluster deployment is mentioned only before the final summary:

- Error EINVAL: Failed to connect to osd-1 (10.1.160.73).
- Applying /root/cluster-spec.yaml to cluster failed!
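
Until the return code is fixed, a possible workaround (just a sketch; the log
file name is arbitrary) is to capture the output and scan it for the failure
message quoted above:

```
# same cephadm bootstrap invocation as in the reproducer, with output captured
cephadm bootstrap \
  --ssh-private-key /root/.ssh/foo --ssh-public-key /root/.ssh/foo.pub \
  --mon-ip {{ admin_host_ip_addr }} --apply-spec cluster-spec.yaml 2>&1 | tee bootstrap.log

# fail explicitly if the deployment part reported a problem
if grep -q "to cluster failed!" bootstrap.log; then
    echo "cephadm --apply-spec deployment failed" >&2
    exit 1
fi
```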