Bug 1876447 - [16.1.0->16.1.1][DCN] Ceph upgrade is failing as ansible is not considering the ceph cluster name
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 4.1
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: z2
Target Release: 4.1
Assignee: Francesco Pantano
QA Contact: Ameena Suhani S H
Docs Contact: Aron Gunn
URL:
Whiteboard:
Depends On:
Blocks: 1760354 1816167
 
Reported: 2020-09-07 08:56 UTC by Khomesh Thakre
Modified: 2023-12-15 19:13 UTC
CC: 19 users

Fixed In Version: ceph-ansible-4.0.30-1.el8cp, ceph-ansible-4.0.30-1.el7cp
Doc Type: Bug Fix
Doc Text:
.The {storage-product} rolling update fails when multiple storage clusters exist
Running the Ceph Ansible `rolling_update.yml` playbook when multiple storage clusters are configured would cause the rolling update to fail because a storage cluster name could not be specified. With this release, the `rolling_update.yml` playbook uses the `--cluster` option to allow for a specific storage cluster name.
Clone Of:
Environment:
Last Closed: 2020-09-30 17:26:56 UTC
Embargoed:
yrabl: automate_bug-


Attachments


Links
GitHub ceph/ceph-ansible pull 5738 (closed): Add --cluster option on ceph require-osd-release command (last updated 2021-02-08 12:31:45 UTC)
Red Hat Issue Tracker RHCEPH-8069 (last updated 2023-12-15 19:13:01 UTC)
Red Hat Product Errata RHBA-2020:4144 (last updated 2020-09-30 17:27:30 UTC)
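The pull request linked above adds a `--cluster` flag to the `require-osd-release` call. As a minimal before/after sketch of the resulting in-container command, assuming the cluster name "central" reported in this bug (the exact task wording in the merged change may differ):

~~~
# Before the fix: no cluster name is passed, so the ceph client defaults to
# /etc/ceph/ceph.conf, which does not exist here, and the task fails
podman exec ceph-mon-rhosp-con1 ceph osd require-osd-release nautilus

# After the fix: the playbook's configured cluster name is passed via --cluster,
# so the client reads /etc/ceph/central.conf instead
podman exec ceph-mon-rhosp-con1 ceph --cluster central osd require-osd-release nautilus
~~~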

Description Khomesh Thakre 2020-09-07 08:56:11 UTC
Description of problem:
During the upgrade of OSP with Ceph from 16.0 to 16.1 in a DCN environment, the Ceph upgrade fails with the error below:

~~~

        "fatal: [rhosp-con1 -> 172.10.10.101]: FAILED! => {\"changed\": true, \"cmd\": [\"podman\", \"exec\", \"ceph-mon-rhosp-con1\", \"ceph\", \"osd\", \"require-osd-release\", \"nautilus\"], \"delta\": \"0:00:00.435547\", \"end\": \"2020-09-06 15:51:37.055340\", \"msg\": \"non-zero return code\", \"rc\": 1, \"start\": \"2020-09-06 15:51:36.619793\", \"stderr\": \"Error initializing cluster client: ObjectNotFound('error calling conf_read_file',)\\nError: non zero exit code: 1: OCI runtime error\", \"stderr_lines\": [\"Error initializing cluster client: ObjectNotFound('error calling conf_read_file',)\", \"Error: non zero exit code: 1: OCI runtime error\"], \"stdout\": \"\", \"stdout_lines\": []}",
        "NO MORE HOSTS LEFT *************************************************************",
        "PLAY RECAP *********************************************************************",
        "localhost                  : ok=1    changed=0    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0   ",
        "rhosp-con1                 : ok=295  changed=30   unreachable=0    failed=1    skipped=491  rescued=0    ignored=1   ",
        "rhosp-con2                 : ok=213  changed=20   unreachable=0    failed=0    skipped=395  rescued=0    ignored=1   ",
        "rhosp-con3                 : ok=213  changed=20   unreachable=0    failed=0    skipped=395  rescued=0    ignored=1   ",
        "rhosp-hci1                 : ok=207  changed=13   unreachable=0    failed=0    skipped=359  rescued=0    ignored=0   ",
        "rhosp-hci2                 : ok=202  changed=12   unreachable=0    failed=0    skipped=348  rescued=0    ignored=0   ",
        "rhosp-hci3                 : ok=203  changed=12   unreachable=0    failed=0    skipped=347  rescued=0    ignored=0   ",
        "rhosp-nfv1                 : ok=105  changed=5    unreachable=0    failed=0    skipped=225  rescued=0    ignored=0   ",
        "rhosp-nfv2                 : ok=105  changed=5    unreachable=0    failed=0    skipped=225  rescued=0    ignored=0   ",
        "Sunday 06 September 2020  15:51:37 +0200 (0:00:00.747)       0:18:25.016 ****** ",
~~~

Ansible is running the wrong command 

~~~
[root@rhosp-con1 ~]# podman exec ceph-mon-rhosp-con1 ceph osd require-osd-release nautilus
Error initializing cluster client: ObjectNotFound('error calling conf_read_file',)
Error: non zero exit code: 1: OCI runtime error
~~~

The correct command is:
~~~
[root@rhosp-con1 ~]# podman exec ceph-mon-rhosp-con1 ceph -c /etc/ceph/central.conf osd require-osd-release nautilus
~~~
This is because the Ceph cluster name is "central".
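A `--cluster central` argument would work equally well here, since setting the cluster name makes the client look for `/etc/ceph/central.conf`. A quick way to confirm which conf file the mon container actually carries, a sketch reusing the container name from the log above:

~~~
# List the conf files inside the mon container; with a custom cluster name this
# shows central.conf rather than the default ceph.conf
podman exec ceph-mon-rhosp-con1 ls /etc/ceph/

# Passing the cluster name and passing the conf path explicitly are equivalent:
#   ceph --cluster central ...   is the same as   ceph -c /etc/ceph/central.conf ...
~~~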


Version-Release number of selected component (if applicable):
Red Hat OpenStack Platform 16.1 Train 

How reproducible:


Steps to Reproduce:
1. Deploy OSP with Ceph on the central site using a custom cluster name such as "central" (see the sketch after this list).
2. Perform an upgrade of the environment.
3. The Ceph upgrade playbook fails.
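A minimal sketch of setting the custom cluster name for step 1 at deploy time, assuming the deployment uses the TripleO `CephClusterName` parameter to name the cluster (the actual environment files for this DCN setup may differ):

~~~
# Hypothetical environment file naming the Ceph cluster "central"
cat > ceph-cluster-name.yaml <<'EOF'
parameter_defaults:
  CephClusterName: central
EOF

# Include it alongside the deployment's existing environment files (placeholder below)
openstack overcloud deploy --templates \
  -e <existing-environment-files> \
  -e ceph-cluster-name.yaml
~~~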

Actual results:
Ansible does not take the custom Ceph cluster name into account.

Expected results:
Ansible should take the custom Ceph cluster name into account.

Additional info:

Comment 20 errata-xmlrpc 2020-09-30 17:26:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 4.1 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4144

