Bug 1876447 - [16.1.0->16.1.1][DCN] Ceph upgrade is failing as ansible is not considering the ceph cluster name
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 4.1
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: z2
Target Release: 4.1
Assignee: Francesco Pantano
QA Contact: Ameena Suhani S H
Docs Contact: Aron Gunn
URL:
Whiteboard:
Depends On:
Blocks: 1760354 1816167
 
Reported: 2020-09-07 08:56 UTC by Khomesh Thakre
Modified: 2023-12-15 19:13 UTC
CC: 19 users

Fixed In Version: ceph-ansible-4.0.30-1.el8cp, ceph-ansible-4.0.30-1.el7cp
Doc Type: Bug Fix
Doc Text:
.The {storage-product} rolling update fails when multiple storage clusters exist
Running the Ceph Ansible `rolling_update.yml` playbook when multiple storage clusters are configured would cause the rolling update to fail because a storage cluster name could not be specified. With this release, the `rolling_update.yml` playbook uses the `--cluster` option to allow for a specific storage cluster name.
Clone Of:
Environment:
Last Closed: 2020-09-30 17:26:56 UTC
Embargoed:
yrabl: automate_bug-


Attachments


Links
GitHub ceph/ceph-ansible pull 5738 (closed): Add --cluster option on ceph require-osd-release command (last updated 2021-02-08 12:31:45 UTC)
Red Hat Issue Tracker RHCEPH-8069 (last updated 2023-12-15 19:13:01 UTC)
Red Hat Product Errata RHBA-2020:4144 (last updated 2020-09-30 17:27:30 UTC)
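The pull request linked above adds a `--cluster` flag to the `require-osd-release` call. As a minimal before/after sketch of the resulting in-container command, assuming the cluster name "central" reported in this bug (the exact task wording in the merged change may differ):

~~~
# Before the fix: no cluster name is passed, so the ceph client defaults to
# /etc/ceph/ceph.conf, which does not exist here, and the task fails
podman exec ceph-mon-rhosp-con1 ceph osd require-osd-release nautilus

# After the fix: the playbook's configured cluster name is passed via --cluster,
# so the client reads /etc/ceph/central.conf instead
podman exec ceph-mon-rhosp-con1 ceph --cluster central osd require-osd-release nautilus
~~~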

Description Khomesh Thakre 2020-09-07 08:56:11 UTC
Description of problem:
During the upgrade of OSP with Ceph from 16.0 to 16.1 in a DCN environment, the Ceph upgrade fails with the error below:

~~~

        "fatal: [rhosp-con1 -> 172.10.10.101]: FAILED! => {\"changed\": true, \"cmd\": [\"podman\", \"exec\", \"ceph-mon-rhosp-con1\", \"ceph\", \"osd\", \"require-osd-release\", \"nautilus\"], \"delta\": \"0:00:00.435547\", \"end\": \"2020-09-06 15:51:37.055340\", \"msg\": \"non-zero return code\", \"rc\": 1, \"start\": \"2020-09-06 15:51:36.619793\", \"stderr\": \"Error initializing cluster client: ObjectNotFound('error calling conf_read_file',)\\nError: non zero exit code: 1: OCI runtime error\", \"stderr_lines\": [\"Error initializing cluster client: ObjectNotFound('error calling conf_read_file',)\", \"Error: non zero exit code: 1: OCI runtime error\"], \"stdout\": \"\", \"stdout_lines\": []}",
        "NO MORE HOSTS LEFT *************************************************************",
        "PLAY RECAP *********************************************************************",
        "localhost                  : ok=1    changed=0    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0   ",
        "rhosp-con1                 : ok=295  changed=30   unreachable=0    failed=1    skipped=491  rescued=0    ignored=1   ",
        "rhosp-con2                 : ok=213  changed=20   unreachable=0    failed=0    skipped=395  rescued=0    ignored=1   ",
        "rhosp-con3                 : ok=213  changed=20   unreachable=0    failed=0    skipped=395  rescued=0    ignored=1   ",
        "rhosp-hci1                 : ok=207  changed=13   unreachable=0    failed=0    skipped=359  rescued=0    ignored=0   ",
        "rhosp-hci2                 : ok=202  changed=12   unreachable=0    failed=0    skipped=348  rescued=0    ignored=0   ",
        "rhosp-hci3                 : ok=203  changed=12   unreachable=0    failed=0    skipped=347  rescued=0    ignored=0   ",
        "rhosp-nfv1                 : ok=105  changed=5    unreachable=0    failed=0    skipped=225  rescued=0    ignored=0   ",
        "rhosp-nfv2                 : ok=105  changed=5    unreachable=0    failed=0    skipped=225  rescued=0    ignored=0   ",
        "Sunday 06 September 2020  15:51:37 +0200 (0:00:00.747)       0:18:25.016 ****** ",
~~~

Ansible is running the wrong command 

~~~
[root@rhosp-con1 ~]# podman exec ceph-mon-rhosp-con1 ceph osd require-osd-release nautilus
Error initializing cluster client: ObjectNotFound('error calling conf_read_file',)
Error: non zero exit code: 1: OCI runtime error
~~~

The correct command is:
~~~
[root@rhosp-con1 ~]# podman exec ceph-mon-rhosp-con1 ceph -c /etc/ceph/central.conf osd require-osd-release nautilus
~~~
This is because the Ceph cluster name is "central".
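A `--cluster central` argument would work equally well here, since setting the cluster name makes the client look for `/etc/ceph/central.conf`. A quick way to confirm which conf file the mon container actually carries, a sketch reusing the container name from the log above:

~~~
# List the conf files inside the mon container; with a custom cluster name this
# shows central.conf rather than the default ceph.conf
podman exec ceph-mon-rhosp-con1 ls /etc/ceph/

# Passing the cluster name and passing the conf path explicitly are equivalent:
#   ceph --cluster central ...   is the same as   ceph -c /etc/ceph/central.conf ...
~~~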


Version-Release number of selected component (if applicable):
Red Hat OpenStack Platform 16.1 Train 

How reproducible:


Steps to Reproduce:
1. Deploy OSP with Ceph on the central site using a custom cluster name such as "central" (see the sketch after this list).
2. Perform an upgrade of the environment.
3. The Ceph upgrade playbook fails.
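A minimal sketch of setting the custom cluster name for step 1 at deploy time, assuming the deployment uses the TripleO `CephClusterName` parameter to name the cluster (the actual environment files for this DCN setup may differ):

~~~
# Hypothetical environment file naming the Ceph cluster "central"
cat > ceph-cluster-name.yaml <<'EOF'
parameter_defaults:
  CephClusterName: central
EOF

# Include it alongside the deployment's existing environment files (placeholder below)
openstack overcloud deploy --templates \
  -e <existing-environment-files> \
  -e ceph-cluster-name.yaml
~~~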

Actual results:
Ansible does not take the custom Ceph cluster name into account.

Expected results:
Ansible should take the custom Ceph cluster name into account.

Additional info:

Comment 20 errata-xmlrpc 2020-09-30 17:26:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 4.1 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4144

