
Bug 2114882

Summary: cephadm exits with zero return code when deployment fails during combined bootstrap/deployment execution
Product: [Red Hat Storage] Red Hat Ceph Storage
Component: Cephadm
Version: 5.2
Severity: low
Priority: unspecified
Status: CLOSED DUPLICATE
Target Release: 6.1
Hardware: Unspecified
OS: Unspecified
Reporter: Martin Bukatovic <mbukatov>
Assignee: Adam King <adking>
QA Contact: Manasa <mgowri>
Docs Contact: Anjana Suparna Sriram <asriram>
CC: cephqe-warriors
Type: Bug
Last Closed: 2022-10-04 02:20:58 UTC

Description Martin Bukatovic 2022-08-03 12:45:09 UTC
Description of problem
======================

When I instruct cephadm to perform bootstrap and cluster deployment in a single
run (using --apply-spec with a YAML file describing the whole cluster), and make
a mistake that affects only the deployment, the cephadm run completes the
bootstrap, fails on the deployment, and then finishes with a zero return code as
if nothing went wrong.

Version-Release number of selected component
============================================

ceph-5.2-rhel-8-containers-candidate-30183-20220610110810

How reproducible
================

100%

Steps to Reproduce
==================

1. Use cephadm to run bootstrap and cluster deployment in a single step, in such
a way that the deployment fails (e.g. a wrong value for the --ssh-private-key or
--ssh-public-key option). For example:

```
cephadm bootstrap \
  --ssh-private-key /root/.ssh/foo --ssh-public-key /root/.ssh/foo.pub \
  --mon-ip {{ admin_host_ip_addr }} --apply-spec cluster-spec.yaml
```

Where the foo SSH key is not valid.

2. Observe the return code of the cephadm command (see the sketch below).
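
For reference, a minimal way to capture the return code right after the bootstrap
call (same placeholder key paths and the `{{ admin_host_ip_addr }}` variable as
in step 1):

```
cephadm bootstrap \
  --ssh-private-key /root/.ssh/foo --ssh-public-key /root/.ssh/foo.pub \
  --mon-ip {{ admin_host_ip_addr }} --apply-spec cluster-spec.yaml
echo "cephadm exit status: $?"  # currently prints 0 even though --apply-spec failed
```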

Actual results
==============

The cephadm command finishes with a zero return code, even though it completed
the bootstrap only. The summary of the operation doesn't highlight that the
deployment failed (that said, the problem is clearly mentioned earlier in the
output; it's just not raised again in the final summary).

Expected results
================

The cephadm command finishes with a nonzero return code to indicate an error
during the deployment.

Optionally, mention that the deployment failed in the final summary.
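
For illustration, this is the kind of wrapper script the current behaviour
silently breaks; the cephadm invocation is the one from the reproducer, the rest
is a hypothetical sketch. With a nonzero return code the error branch would catch
the failed deployment; today it is never taken:

```
# hypothetical automation wrapper around the combined bootstrap/deployment run
if ! cephadm bootstrap \
       --ssh-private-key /root/.ssh/foo --ssh-public-key /root/.ssh/foo.pub \
       --mon-ip {{ admin_host_ip_addr }} --apply-spec cluster-spec.yaml; then
    echo "cephadm bootstrap/deployment failed" >&2
    exit 1
fi
echo "cluster deployed"  # reached even when "Applying ... to cluster failed!" was printed
```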

Additional info
===============

Example of cephadm output:

```
Unable to parse /root/cluster-spec.yaml succesfully
Non-zero exit code 22 from /usr/bin/podman run --rm --ipc=host --stop-signal=SIGTERM --authfile=/etc/ceph/podman-auth.json --net=host --entrypoint /usr/bin/ceph --init -e CONTAINER_IMAGE=registry-proxy.engineering.redhat.com/rh-osbs/rhceph:ceph-5.2-rhel-8-containers-candidate-30183-20220610110810 -e NODE_NAME=osd-0 -e CEPH_USE_RANDOM_NONCE=1 -v /var/log/ceph/95745a4e-f2f3-11ec-be2a-0050568f082e:/var/log/ceph:z -v /tmp/ceph-tmpok6chxaw:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmp17pfii7z:/etc/ceph/ceph.conf:z -v /root/cluster-spec.yaml:/tmp/spec.yml:ro registry-proxy.engineering.redhat.com/rh-osbs/rhceph:ceph-5.2-rhel-8-containers-candidate-30183-20220610110810 orch apply -i /tmp/spec.yml
/usr/bin/ceph: stderr Error EINVAL: Failed to connect to osd-1 (10.1.160.73).
/usr/bin/ceph: stderr Please make sure that the host is reachable and accepts connections using the cephadm SSH key
/usr/bin/ceph: stderr
/usr/bin/ceph: stderr To add the cephadm SSH key to the host:
/usr/bin/ceph: stderr > ceph cephadm get-pub-key > ~/ceph.pub
/usr/bin/ceph: stderr > ssh-copy-id -f -i ~/ceph.pub root@10.1.160.73
/usr/bin/ceph: stderr
/usr/bin/ceph: stderr To check that the host is reachable open a new shell with the --no-hosts flag:
/usr/bin/ceph: stderr > cephadm shell --no-hosts
/usr/bin/ceph: stderr
/usr/bin/ceph: stderr Then run the following:
/usr/bin/ceph: stderr > ceph cephadm get-ssh-config > ssh_config
/usr/bin/ceph: stderr > ceph config-key get mgr/cephadm/ssh_identity_key > ~/cephadm_private_key
/usr/bin/ceph: stderr > chmod 0600 ~/cephadm_private_key
/usr/bin/ceph: stderr > ssh -F ssh_config -i ~/cephadm_private_key root@10.1.160.73
 
Applying /root/cluster-spec.yaml to cluster failed!
 
Enabling autotune for osd_memory_target
You can access the Ceph CLI as following in case of multi-cluster or non-default config:
 
        sudo /usr/sbin/cephadm shell --fsid 95745a4e-f2f3-11ec-be2a-0050568f082e -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring
 
Or, if you are only running a single cluster on this host:
 
        sudo /usr/sbin/cephadm shell
 
Please consider enabling telemetry to help improve Ceph:
 
        ceph telemetry on
 
For more information see:
 
        https://docs.ceph.com/en/pacific/mgr/telemetry/
 
Bootstrap complete.
```

The last message is "Bootstrap complete" and the return code is zero. The failure
of the cluster deployment is mentioned only before the final summary:

- Error EINVAL: Failed to connect to osd-1 (10.1.160.73).
- Applying /root/cluster-spec.yaml to cluster failed!
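
Until the return code is fixed, a possible workaround (just a sketch; the log
file name is arbitrary) is to capture the output and scan it for the failure
message quoted above:

```
# same cephadm bootstrap invocation as in the reproducer, with output captured
cephadm bootstrap \
  --ssh-private-key /root/.ssh/foo --ssh-public-key /root/.ssh/foo.pub \
  --mon-ip {{ admin_host_ip_addr }} --apply-spec cluster-spec.yaml 2>&1 | tee bootstrap.log

# fail explicitly if the deployment part reported a problem
if grep -q "to cluster failed!" bootstrap.log; then
    echo "cephadm --apply-spec deployment failed" >&2
    exit 1
fi
```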